Journal articles on the topic "Computer vision, object detection, action recognition"

To see other types of publications on this topic, follow the link: Computer vision, object detection, action recognition.

Create an accurate reference in APA, MLA, Chicago, Harvard, and many other styles.

Consult the top 50 journal articles for your research on the topic "Computer vision, object detection, action recognition".

Next to every source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the scholarly publication as a PDF and read its abstract online whenever this information is included in the metadata.

Browse journal articles on a wide variety of disciplines and organize your bibliography correctly.

1

Zhang, Hong-Bo, Yi-Xiang Zhang, Bineng Zhong, Qing Lei, Lijie Yang, Ji-Xiang Du, and Duan-Sheng Chen. "A Comprehensive Survey of Vision-Based Human Action Recognition Methods." Sensors 19, no. 5 (27 February 2019): 1005. http://dx.doi.org/10.3390/s19051005.

Abstract:
Although widely used in many applications, accurate and efficient human action recognition remains a challenging area of research in the field of computer vision. Most recent surveys have focused on narrow problems such as human action recognition methods using depth data, 3D-skeleton data, still image data, spatiotemporal interest point-based methods, and human walking motion recognition. However, there has been no systematic survey of human action recognition. To this end, we present a thorough review of human action recognition methods and provide a comprehensive overview of recent approaches in human action recognition research, including progress in hand-designed action features in RGB and depth data, current deep learning-based action feature representation methods, advances in human–object interaction recognition methods, and the current prominent research topic of action detection methods. Finally, we present several analysis recommendations for researchers. This survey paper provides an essential reference for those interested in further research on human action recognition.
2

Gundu, Sireesha, and Hussain Syed. "Vision-Based HAR in UAV Videos Using Histograms and Deep Learning Techniques." Sensors 23, no. 5 (25 February 2023): 2569. http://dx.doi.org/10.3390/s23052569.

Abstract:
Activity recognition in unmanned aerial vehicle (UAV) surveillance is addressed in various computer vision applications such as image retrieval, pose estimation, face recognition, video action recognition, and object detection in still images, video frames, and videos. In UAV-based surveillance technology, video segments captured from aerial vehicles make it challenging to recognize and distinguish human behavior. In this research, to recognize single and multi-human activities from aerial data, a hybrid model of histogram of oriented gradients (HOG), mask regional convolutional neural network (Mask R-CNN), and bidirectional long short-term memory (Bi-LSTM) is employed. The HOG algorithm extracts patterns, Mask R-CNN extracts feature maps from the raw aerial image data, and the Bi-LSTM network exploits the temporal relationship between the frames for the underlying action in the scene; its bidirectional processing keeps the error rate low. This novel architecture generates enhanced segmentation by utilizing histogram gradient-based instance segmentation and improves the accuracy of classifying human activities using the Bi-LSTM approach. Experimental outcomes demonstrate that the proposed model outperforms other state-of-the-art models, achieving 99.25% accuracy on the YouTube-Aerial dataset.
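To make the pipeline concrete, here is a minimal sketch of how such a HOG + Mask R-CNN + Bi-LSTM hybrid could be wired, assuming torchvision >= 0.13 and its off-the-shelf Mask R-CNN as a stand-in for the paper's detector; the fusion scheme, feature dimensions, and class names are illustrative, not the authors' implementation.

```python
# Hypothetical sketch of a HOG + Mask R-CNN + Bi-LSTM pipeline (not the authors' code).
import cv2
import torch
import torch.nn as nn
import torchvision

detector = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
hog = cv2.HOGDescriptor()                      # default 64x128 window, 3780-dim descriptor

class BiLSTMHead(nn.Module):
    """Classifies an action from a sequence of per-frame feature vectors."""
    def __init__(self, feat_dim, num_classes, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, seq):                    # seq: (batch, time, feat_dim)
        out, _ = self.lstm(seq)
        return self.fc(out[:, -1])             # last step sees both directions

def frame_features(frame_bgr):
    """Concatenate a HOG pattern vector with Mask R-CNN detection scores for one frame."""
    patch = cv2.resize(frame_bgr, (64, 128))
    hog_vec = torch.from_numpy(hog.compute(patch)).flatten().float()   # 3780-dim
    rgb = torch.from_numpy(frame_bgr[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255
    with torch.no_grad():
        det = detector([rgb])[0]               # dict with boxes, labels, scores, masks
    scores = torch.zeros(5)                    # keep the top-5 detection confidences
    top = det["scores"][:5]
    scores[:len(top)] = top
    return torch.cat([hog_vec, scores])        # simplistic fusion, for illustration only
```

In the paper's actual design, Mask R-CNN's instance masks drive the segmentation; the naive score-vector fusion above only shows the data flow from per-frame features into the Bi-LSTM.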
3

Mikhalev, Oleg, and Alexander Yanyushkin. "Machine vision and object recognition using neural networks." Robotics and Technical Cybernetics 10, no. 2 (June 2022): 113–20. http://dx.doi.org/10.31776/rtcj.10204.

Abstract:
Computer vision is becoming one of the important areas of automation of various human activities. Technical systems today are endowed with the ability to see, and along with the use of neural networks, they are also endowed with the ability to act intelligently. Thus, they are able to see and make the right decisions and actions faster and more accurately than a person. The article discusses the possibility of using machine vision and object recognition technology for industrial automation, describes a convolutional neural network and an object detection algorithm.
4

Voulodimos, Athanasios, Nikolaos Doulamis, Anastasios Doulamis, and Eftychios Protopapadakis. "Deep Learning for Computer Vision: A Brief Review." Computational Intelligence and Neuroscience 2018 (2018): 1–13. http://dx.doi.org/10.1155/2018/7068349.

Abstract:
Over recent years, deep learning methods have been shown to outperform previous state-of-the-art machine learning techniques in several fields, with computer vision being one of the most prominent cases. This review paper provides a brief overview of some of the most significant deep learning schemes used in computer vision problems, that is, Convolutional Neural Networks, Deep Boltzmann Machines and Deep Belief Networks, and Stacked Denoising Autoencoders. A brief account of their history, structure, advantages, and limitations is given, followed by a description of their applications in various computer vision tasks, such as object detection, face recognition, action and activity recognition, and human pose estimation. Finally, a brief overview is given of future directions in designing deep learning schemes for computer vision problems and the challenges involved therein.
5

Wang, Chang, Jinyu Sun, Shiwei Ma, Yuqiu Lu, and Wang Liu. "Multi-stream Network for Human-object Interaction Detection." International Journal of Pattern Recognition and Artificial Intelligence 35, no. 08 (12 March 2021): 2150025. http://dx.doi.org/10.1142/s0218001421500257.

Abstract:
Detecting the interaction between humans and objects in images is a critical problem for obtaining a deeper understanding of the visual relationships in a scene, and a key technology in many practical applications, such as augmented reality, video surveillance, and information retrieval. Even so, due to the fine-grained actions and objects in real scenes and the coexistence of multiple interactions in one scene, the problem is far from solved. This paper differs from prior approaches, which focused only on the features of instances, by proposing a method that utilizes a four-stream CNN network for human-object interaction (HOI) detection. More detailed visual features, spatial features, and pose features from human-object pairs are extracted to solve this challenging detection task. Specifically, the core idea is that the region where people interact with objects contains important identifying cues for specific action classes, and these detailed cues can be fused to facilitate HOI recognition. Experiments on two large-scale HOI public benchmarks, V-COCO and HICO-DET, are carried out, and the results show the effectiveness of the proposed method.
6

Gall, J., A. Yao, N. Razavi, L. Van Gool, and V. Lempitsky. "Hough Forests for Object Detection, Tracking, and Action Recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 33, no. 11 (November 2011): 2188–202. http://dx.doi.org/10.1109/tpami.2011.70.

7

Hoshino, Satoshi, and Kyohei Niimura. "Optical Flow for Real-Time Human Detection and Action Recognition Based on CNN Classifiers." Journal of Advanced Computational Intelligence and Intelligent Informatics 23, no. 4 (20 July 2019): 735–42. http://dx.doi.org/10.20965/jaciii.2019.p0735.

Abstract:
Mobile robots equipped with camera sensors are required to perceive surrounding humans and their actions for safe and autonomous navigation. In this work, moving humans are the target objects. For robot vision, real-time performance is an important requirement. Therefore, we propose a robot vision system in which the original images captured by a camera sensor are described by optical flow. These images are then used as inputs to a classifier. For classifying images as human or non-human, and for classifying the actions, we use a convolutional neural network (CNN) rather than hand-coded invariant features. Moreover, we present a local search window as a novel detector for clipping partial images around target objects in an original image. Through the experiments, we ultimately show that the robot vision system is able to detect moving humans and recognize actions in real time.
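As a rough illustration of the "images described by optical flow" idea, the sketch below renders Farneback dense optical flow as an HSV image that a CNN could consume; the flow parameters and the video path are assumptions, not the paper's settings.

```python
import cv2
import numpy as np

def flow_image(prev_gray, gray):
    """Encode dense optical flow as a colour image: hue = direction, brightness = magnitude."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2                      # OpenCV hue range is [0, 180)
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

cap = cv2.VideoCapture("walk.mp4")                           # placeholder video file
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    described = flow_image(prev_gray, gray)                  # would be fed to the CNN classifiers
    prev_gray = gray
```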
8

Sumathi, J. K. "Dynamic Image Forensics and Forgery Analytics using Open Computer Vision Framework." Wasit Journal of Computer and Mathematics Science 1, no. 1 (17 March 2021): 1–8. http://dx.doi.org/10.31185/wjcm.vol1.iss1.3.

Abstract:
The key advances in computer vision and optical image processing are emerging technologies in diverse fields including facial recognition, biometric verification, the Internet of Things (IoT), criminal investigation, signature identification in banking, and several others. These applications use image and live video processing to facilitate analysis and forecasting. Computer vision is used in many activities such as monitoring, face recognition, motion recognition, and object detection, among others. The development of social networking platforms such as Facebook and Instagram has led to an increase in the volume of image data being generated. The use of image and video manipulation software is a major concern for Facebook, because some of the photos and videos that people post to the social network are doctored images. Such images are frequently circulated as fakes and used in malevolent ways, such as inciting violence. Questionable images need to be authenticated before action is taken, and it is very hard to ensure photo authenticity given the power of modern photo manipulation. How an image was formed can be determined by image forensic techniques; the technique of image duplication, for instance, is used to conceal missing areas.
9

Zeng, Wei, Junjian Huang, Wei Zhang, Hai Nan, and Zhenjiang Fu. "SlowFast Action Recognition Algorithm Based on Faster and More Accurate Detectors." Electronics 11, no. 22 (16 November 2022): 3770. http://dx.doi.org/10.3390/electronics11223770.

Abstract:
Object detection algorithms play a crucial role in other vision tasks. This paper finds that Faster R-CNN (Region Convolutional Neural Network), the detection algorithm used by the action recognition algorithm SlowFast, has disadvantages in terms of both detection accuracy and speed, and that the traditional IOU (Intersection over Union) localization loss makes it difficult for the detection model to converge to a stable minimum. To solve these problems, the article uses YOLOv3 (You Only Look Once), YOLOX, and Cascade R-CNN to improve the detection accuracy and speed of SlowFast. This paper also proposes a new localization loss function that adopts the Lance and Williams distance as a new penalty term. The new loss function is more sensitive when the distance difference is smaller, a property well suited to the late convergence phase of a detection model. The experiments were conducted on the VOC (Visual Object Classes) and COCO datasets. In the final video tests, YOLOv3 improved the detection speed by 10.5 s, Cascade R-CNN improved on Faster R-CNN by 3.1% AP on the COCO dataset, and YOLOX's performance on the COCO dataset was also mostly better than that of Faster R-CNN. The new LIOU (Lance and Williams Distance Intersection over Union) localization loss performs better than other loss functions on the VOC dataset. Improving the detection algorithm of SlowFast is thus crucial, and the proposed loss function is indeed effective.
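The abstract does not give the exact formula, but a plausible reading, sketched below under that assumption, follows the DIoU pattern of adding a center-point penalty to 1 - IoU, with the Lance and Williams (Bray-Curtis) distance, d(p, t) = sum(|p_i - t_i|) / sum(p_i + t_i), as the penalty term; treat this as an illustration rather than the paper's definition.

```python
# Hedged sketch of an IoU loss with a Lance and Williams distance penalty.
import torch

def liou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2)."""
    # Intersection over Union
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Lance-Williams (Bray-Curtis) distance between box centers; its relative
    # normalisation makes it more sensitive as the centers converge.
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    lw = (cp - ct).abs().sum(dim=1) / ((cp + ct).sum(dim=1) + eps)

    return (1 - iou + lw).mean()
```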
10

Prahara, Adhi, Murinto Murinto, and Dewi Pramudi Ismi. "Bottom-up visual attention model for still image: a preliminary study." International Journal of Advances in Intelligent Informatics 6, no. 1 (31 March 2020): 82. http://dx.doi.org/10.26555/ijain.v6i1.469.

Abstract:
The philosophy of human visual attention is scientifically explained in the fields of cognitive psychology and neuroscience, and then computationally modeled in the fields of computer science and engineering. Visual attention models have been applied in computer vision systems such as object detection, object recognition, image segmentation, image and video compression, action recognition, visual tracking, and so on. This work studies bottom-up visual attention, namely human fixation prediction and salient object detection models. The preliminary study briefly covers the biological perspective of visual attention, including the visual pathway and the theory of visual attention, through to the computational model of bottom-up visual attention that generates a saliency map. The study compares some models at each stage and observes whether the stage is inspired by the biological architecture, concept, or behavior of human visual attention. From the study, the use of low-level features, center-surround mechanisms, sparse representation, and higher-level guidance with intrinsic cues dominates the bottom-up visual attention approaches. The study also highlights the correlation between bottom-up visual attention and curiosity.
11

Abduljabbar Ali, Mohammed, Abir Jaafar Hussain, and Ahmed T. Sadiq. "Deep Learning Algorithms for Human Fighting Action Recognition." International Journal of Online and Biomedical Engineering (iJOE) 18, no. 02 (16 February 2022): 71–87. http://dx.doi.org/10.3991/ijoe.v18i02.28019.

Abstract:
Human action recognition using skeletons has been employed in various applications, including healthcare robots, human-computer interaction, and surveillance systems. Recently, deep learning systems have been used in various applications, such as object classification. In contrast to conventional techniques, convolutional neural networks, among the most prominent deep learning algorithms, extract image features directly through their operations. Machine learning in computer vision applications faces many challenges, including human action recognition in real time. Despite significant improvements, videos are typically shot at 24 or more frames per second, meaning that even the fastest classification technologies take time. Object detection algorithms must correctly identify and locate essential items, but they must also be speedy at prediction time to meet the real-time requirements of video processing. The fundamental goal of this research paper is to recognize human fighting in real time to provide security in organizations by discovering and identifying problems through video surveillance. First, the images in the videos are investigated to locate human fight scenes using the YOLOv3 algorithm, which has been updated in this work. Our improvements to the YOLOv3 algorithm allowed us to accelerate the detection of a group of humans in the images. The center-locator feature of this algorithm was adopted as an essential indicator for measuring the safety distance between two persons; if it is less than a value specified in the code, they are tracked. A deep sorting algorithm is then used to track people, and frames are filtered and classified according to whether the two people continue to exceed the programmatically defined minimum safety distance. Finally, the content of the filtered frames is categorized using OpenPose technology and a trained VGG-16 network, which classifies the situation as walking, hugging, or fighting. A dataset was created to train these algorithms on the three categories of walking, hugging, and fighting. The proposed methodology proved successful, exhibiting classification accuracies for walking, hugging, and fighting of 95.0%, 87.4%, and 90.1%, respectively.
12

Wu, Youfu, Jun Shen, and Mo Dai. "Traffic object detections and its action analysis." Pattern Recognition Letters 26, no. 13 (October 2005): 1963–84. http://dx.doi.org/10.1016/j.patrec.2005.02.009.

13

Hadi, Namir Mohamed. "Identification Algorithm Faces and Criminal Actions." Computational Nanotechnology 9, no. 3 (28 September 2022): 19–31. http://dx.doi.org/10.33693/2313-223x-2022-9-3-19-31.

Abstract:
Currently, there are a number of unresolved problems in the identification of images. If a person is wearing something on their face, such as a mask or glasses, or if at some point part of the face is covered by clothing, hair, or an object, then the video surveillance system may lose sight of the person; identification deteriorates significantly, and recognition occurs only after some time. The purpose of this work is to improve existing recognition methods. The paper proposes an algorithm based on the multi-cascade method and an object detection method. This algorithm is able to identify a person both by actions of a criminal nature and by the face, highlighting parts of the face as squares and rectangles using a computer vision library. In testing, the algorithm showed high detection accuracy on a GPU with 16 GB of video memory.
14

Ergun, Hilal, Yusuf Caglar Akyuz, Mustafa Sert, and Jianquan Liu. "Early and Late Level Fusion of Deep Convolutional Neural Networks for Visual Concept Recognition." International Journal of Semantic Computing 10, no. 03 (September 2016): 379–97. http://dx.doi.org/10.1142/s1793351x16400158.

Abstract:
Visual concept recognition has been an active research field over the last decade. Reflecting this attention, deep learning architectures are showing great promise in various computer vision domains including image classification, object detection, event detection, and action recognition in videos. In this study, we investigate various aspects of convolutional neural networks for visual concept recognition. We analyze recent studies and different network architectures in terms of both running time and accuracy. In our proposed visual concept recognition system, we first discuss important properties of the popular convolutional network architectures under consideration. Then we describe our method for feature extraction at different levels of abstraction. We present extensive empirical information along with best practices for big data practitioners. Using these best practices, we propose efficient fusion mechanisms for both single and multiple network models. We present state-of-the-art results on benchmark datasets while keeping computational costs low. Our results show that these state-of-the-art results can be reached without using extensive data augmentation techniques.
15

Patil, Ninad, and Vanita Agarwal. "Performance Simulation of a Traffic Sign Recognition based Neural Network on Cadence's Tensilica Vision P6 DSP using Xtensa Xplorer IDE." WSEAS TRANSACTIONS ON COMPUTER RESEARCH 10 (24 March 2022): 35–42. http://dx.doi.org/10.37394/232018.2022.10.5.

Abstract:
Advanced Driver Assistance System (ADAS) technology is currently in an embryonic stage. Many multinational tech companies and startups are developing truly autonomous vehicles that will guarantee the safety and security of passengers, other vehicles, pedestrians, and roadside structures such as traffic signal poles and traffic signposts. However, these autonomous vehicles have not yet been deployed on a large scale for regular use on roads. Autonomous vehicles perform many different object detection/recognition tasks, for example traffic sign recognition, lane detection, and pedestrian detection; usually, the person driving the vehicle performs these tasks. The main goal of such autonomous systems is to perform these tasks in real time, and deep learning performs these object recognition tasks with very high accuracy. The neural network is implemented on a hardware device, which does all the computation work. Many hardware choices from different vendors suit a client's needs; usually, these neural networks are implemented on a CPU, DSP, GPU, FPGA, or other custom AI-specific hardware. The underlying processor forms a vital part of an ADAS: the CNN needs to process the incoming camera frames for real-time object detection/recognition, and real-time processing is necessary to take appropriate actions/decisions depending on the embedded logic. Hence, knowing the performance of the neural network (in terms of frames processed per second) on the underlying hardware is a significant factor in choosing among the hardware options available from different vendors, deciding which CNN model to implement, and judging whether the CNN model suits the system specifications and requirements. In this paper, we trained a CNN using the transfer learning approach to recognize German traffic signs using Nvidia's DIGITS web-based software and analyzed the performance of this trained CNN (in terms of frames per second) by simulating it on Cadence's Xtensa Xplorer software, with Cadence's Tensilica Vision P6 DSP selected as the underlying processor for inference.
16

Kambala, Vijaya Kumar, and Harikiran Jonnadula. "A multi-task learning based hybrid prediction algorithm for privacy preserving human activity recognition framework." Bulletin of Electrical Engineering and Informatics 10, no. 6 (1 December 2021): 3191–201. http://dx.doi.org/10.11591/eei.v10i6.3204.

Abstract:
There is an ever-increasing need to use computer vision devices to capture videos as part of many real-world applications. However, invading people's privacy is a cause for concern, and privacy needs to be protected while videos are used purposefully based on objective functions. One such use case is human activity recognition without disclosing human identity. In this paper, we propose a multi-task learning based hybrid prediction algorithm (MTL-HPA) to realise a privacy-preserving human activity recognition framework (PPHARF). It serves this purpose by recognizing human activities from videos while preserving the identity of the humans present in the multimedia object. The face of any person in the video is anonymized to preserve privacy, while the person's actions remain exposed so that they can be extracted. Anonymization is achieved without losing the utility of human activity recognition: human and face detection methods fail to reveal the identity of the persons in the video. We experimentally confirm on the Joint-annotated Human Motion Database (JHMDB) and Daily Action Localization in YouTube (DALY) datasets that the framework recognises human activities and ensures non-disclosure of private information. Our approach is better than many traditional anonymization techniques such as noise adding, blurring, and masking.
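For contrast, the kind of traditional baseline the authors compare against (simple face blurring) can be sketched in a few lines with OpenCV's bundled Haar cascade; this illustrates the baseline idea only, not the proposed MTL-HPA anonymization.

```python
# Face-blurring baseline (the kind of anonymization the paper claims to improve on).
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def blur_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        roi = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)  # hide identity
    return frame  # body and motion cues for activity recognition remain intact
```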
17

Quinn, Evan, and Niall Corcoran. "Automation of Computer Vision Applications for Real-time Combat Sports Video Analysis." European Conference on the Impact of Artificial Intelligence and Robotics 4, no. 1 (17 November 2022): 162–71. http://dx.doi.org/10.34190/eciair.4.1.930.

Abstract:
This study examines the potential applications of Human Action Recognition (HAR) in combat sports and aims to develop a prototype automation client that examines a video of a combat sports competition or training session and accurately classifies human movements. Computer Vision (CV) architectures that examine real-time video data streams are being investigated by integrating Deep Learning architectures into client-server systems for data storage and analysis using customised algorithms. The development of the automation client for training and deploying CV robots to watch and track specific chains of human actions is a central component of the project. Categorising specific chains of human actions allows for the comparison of multiple athletes' techniques as well as the identification of potential areas for improvement based on posture, accuracy, and other technical details, which can be used as an aid to improve athlete efficiency. The automation client will also be developed for the purpose of scoring, with a focus on the automation of the CV model to analyse and score a competition using a specific ruleset. The model will be validated by comparing performance and accuracy to that of combat sports experts. The primary research domains are CV, automation, robotics, combat sports, and decision science. Decision science is a set of quantitative techniques used to assist people to make decisions. The creation of a new automation client may contribute to the development of more efficient machine learning and CV applications in areas such as process efficiency, which improves user experience, workload management to reduce wait times, and run-time optimisation. This study found that real-time object detection and tracking can be combined with real-time pose estimation to generate performance statistics from a combat sports athlete's movements in a video.
18

Fiedler, Marc-André, Philipp Werner, Aly Khalifa, and Ayoub Al-Hamadi. "SFPD: Simultaneous Face and Person Detection in Real-Time for Human–Robot Interaction." Sensors 21, no. 17 (2 September 2021): 5918. http://dx.doi.org/10.3390/s21175918.

Abstract:
Face and person detection are important tasks in computer vision, as they represent the first component in many recognition systems, such as face recognition, facial expression analysis, body pose estimation, face attribute detection, or human action recognition. Thereby, their detection rate and runtime are crucial for the performance of the overall system. In this paper, we combine both face and person detection in one framework with the goal of reaching a detection performance that is competitive to the state of the art of lightweight object-specific networks while maintaining real-time processing speed for both detection tasks together. In order to combine face and person detection in one network, we applied multi-task learning. The difficulty lies in the fact that no datasets are available that contain both face as well as person annotations. Since we did not have the resources to manually annotate the datasets, as it is very time-consuming and automatic generation of ground truths results in annotations of poor quality, we solve this issue algorithmically by applying a special training procedure and network architecture without the need of creating new labels. Our newly developed method called Simultaneous Face and Person Detection (SFPD) is able to detect persons and faces with 40 frames per second. Because of this good trade-off between detection performance and inference time, SFPD represents a useful and valuable real-time framework especially for a multitude of real-world applications such as, e.g., human–robot interaction.
19

Maraghi, Vali Ollah, and Karim Faez. "Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning." Computational Intelligence and Neuroscience 2021 (9 June 2021): 1–15. http://dx.doi.org/10.1155/2021/9922697.

Abstract:
Recognition of human activities is an essential field in computer vision, and most human activity consists of interaction between humans and objects. Many successful works on human-object interaction (HOI) recognition have achieved acceptable results in recent years, but they are fully supervised and need labeled training data for all HOIs. Due to the enormous space of human-object interactions, listing and providing training data for all possible categories is costly and impractical. We propose an approach for scaling human-object interaction recognition in video data through the zero-shot learning technique to solve this problem. Our method recognizes a verb and an object from the video and composes an HOI class. Recognizing the verbs and objects instead of the HOIs allows new combinations of verbs and objects to be identified, so an HOI class unseen by the recognizer system can still be identified. We introduce a neural network architecture that can understand and represent the video data. The proposed system learns verbs and objects from the available training data at the training phase and can identify verb-object pairs in a video at test time, so it can identify HOI classes with different combinations of objects and verbs. We also propose to use lateral information for combining the verbs and the objects to make valid verb-object pairs; this helps to prevent the detection of rare and probably wrong HOIs. The lateral information comes from word embedding techniques. Furthermore, we propose a new feature aggregation method for aggregating the high-level features extracted from video frames before feeding them to the classifier, and we illustrate that this aggregation is more effective for actions that include multiple subactions. We evaluated our system on the recently introduced and challenging Charades dataset, which has many HOI categories in videos, and we show that the proposed system can detect unseen HOI classes in addition to acceptably recognizing seen ones. The number of classes identifiable by the system is therefore greater than the number of classes used for training.
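A toy sketch of the verb-object composition step, with hypothetical names throughout: two independent recognizers score verbs and objects, and word-embedding similarity serves as the "lateral information" that suppresses implausible pairs. The actual scoring and filtering rules in the paper may differ.

```python
# Illustrative zero-shot HOI composition (function and variable names are assumptions).
import numpy as np

def hoi_scores(verb_probs, obj_probs, verb_vecs, obj_vecs, sim_threshold=0.2):
    """verb_probs: (V,), obj_probs: (O,) softmax outputs of the two recognizers.
    verb_vecs: (V, D), obj_vecs: (O, D) word embeddings (e.g., GloVe)."""
    scores = np.outer(verb_probs, obj_probs)          # (V, O) joint verb-object scores
    # Cosine similarity between verb and object embeddings as lateral information
    vn = verb_vecs / np.linalg.norm(verb_vecs, axis=1, keepdims=True)
    on = obj_vecs / np.linalg.norm(obj_vecs, axis=1, keepdims=True)
    plausible = vn @ on.T > sim_threshold             # mask rare/implausible pairs
    return scores * plausible                         # unseen-but-plausible HOIs survive
```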
20

Zhao, XianPin. "Research on Athlete Behavior Recognition Technology in Sports Teaching Video Based on Deep Neural Network." Computational Intelligence and Neuroscience 2022 (5 January 2022): 1–13. http://dx.doi.org/10.1155/2022/7260894.

Abstract:
In recent years, owing to its simple design and good recognition performance, the deep learning method has attracted more and more researchers' attention in computer vision tasks. Aiming at the problem of athlete behavior recognition in mass sports teaching videos, this paper takes depth video as the research object and uses clipped frame sequences as the input of a deep neural network model. Inspired by the successful application of deep neural networks based on two-dimensional convolution in image detection and recognition, a deep neural network based on three-dimensional convolution is constructed to automatically learn the temporal and spatial characteristics of athletes' behavior. Training results on the UTKinect-Action3D and MSR-Action3D public datasets show that the algorithm can correctly detect athletes' behaviors and actions and exhibits stronger recognition ability compared with using unclipped frames, effectively improving the recognition performance on physical education teaching videos.
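A minimal 3D-convolutional network of the kind described, sketched in PyTorch; the layer sizes and the 16-frame clip length are illustrative assumptions, not the paper's architecture.

```python
# Minimal 3D-CNN sketch: Conv3d kernels span time as well as space,
# so the learned features encode motion across the clipped frame sequence.
import torch
import torch.nn as nn

class Action3DCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),   # input: (B, 3, T, H, W)
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                      # pool space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),                              # pool time and space
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, clip):                              # clip: (B, 3, 16, 112, 112)
        return self.fc(self.features(clip).flatten(1))

logits = Action3DCNN(num_classes=10)(torch.randn(2, 3, 16, 112, 112))
```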
21

Veloso, Manuela, Nicholas Armstrong-Crews, Sonia Chernova, Elisabeth Crawford, Colin McMillen, Maayan Roth, Douglas Vail, and Stefan Zickler. "A Team of Humanoid Game Commentators." International Journal of Humanoid Robotics 05, no. 03 (September 2008): 457–80. http://dx.doi.org/10.1142/s0219843608001479.

Abstract:
We present a team of two humanoid robot commentators for AIBO robot soccer games. The two humanoids stand by the side lines of the playing field, autonomously observe the game, wirelessly listen to a "game controller" computer, recognize events, and select announcing actions that may require coordination with each other. Given the large degree of uncertainty and dynamics of the robot soccer games, we further introduce a "Puppet Master" control that allows humans to intervene, prompting the robots to commentate an event if previously undefined or undetected. The robots recognize events based on input from these three sources, namely own and shared vision, game controller, and occasional Puppet Master. We present the two-humanoid behavioral architecture and the vision-based event recognition, including a SIFT-based vision processing algorithm that allows for the detection of multiple similar objects, such as the identical shaped robot players. We introduce the commentating algorithm that probabilistically selects a commentating action from a set of weighted actions corresponding to a detected event. The probabilistic selection uses the game history and updates the action weights to effectively avoid repetition of comments to enable entertainment. Our work, corresponding to a fully implemented system, CMCast, with two QRIO robots, contributes a team of two humanoids fully executing a challenging observation, modeling, coordination, and reporting task.
22

Jaiswal, Ashish, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, and Fillia Makedon. "A Survey on Contrastive Self-Supervised Learning." Technologies 9, no. 1 (28 December 2020): 2. http://dx.doi.org/10.3390/technologies9010002.

Abstract:
Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and using the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by the different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.
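The core contrastive objective that this family of methods revolves around can be stated compactly. Below is a standard NT-Xent (InfoNCE) loss as used by SimCLR-style methods; it is a generic reference implementation, not code from the survey.

```python
# NT-Xent / InfoNCE: pull two augmented views of the same sample together,
# push all other samples in the batch away.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, D) embeddings of two augmented views of the same N samples."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)       # (2N, D), unit-norm rows
    sim = z @ z.t() / temperature                     # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool)
    sim.masked_fill_(mask, float("-inf"))             # exclude self-similarity
    # The positive for row i is its other view: i+n (first half) or i-n (second half)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)
```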
23

Zhao, Qi, Boxue Zhang, Shuchang Lyu, Hong Zhang, Daniel Sun, Guoqiang Li, and Wenquan Feng. "A CNN-SIFT Hybrid Pedestrian Navigation Method Based on First-Person Vision." Remote Sensing 10, no. 8 (5 August 2018): 1229. http://dx.doi.org/10.3390/rs10081229.

Abstract:
The emergence of new wearable technologies, such as action cameras and smart glasses, has driven the use of the first-person perspective in computer applications. This field is now attracting the attention and investment of researchers aiming to develop methods to process first-person vision (FPV) video. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives, such as object detection, activity recognition, user–machine interaction, etc. FPV-based navigation is necessary in some special areas where the Global Positioning System (GPS) and other radio-wave-strength methods are blocked, and is especially helpful for visually impaired people. In this paper, we propose a hybrid structure with a convolutional neural network (CNN) and local image features to achieve FPV pedestrian navigation. A novel end-to-end trainable global pooling operator, called AlphaMEX, has been designed to improve the scene classification accuracy of CNNs. A scale-invariant feature transform (SIFT)-based tracking algorithm is employed for movement estimation and trajectory tracking of the person through each frame of FPV images. Experimental results demonstrate the effectiveness of the proposed method. The top-1 error rate of the proposed AlphaMEX-ResNet outperforms the original ResNet (k = 12) by 1.7% on the ImageNet dataset. The CNN-SIFT hybrid pedestrian navigation system reaches 0.57 m average absolute error, which is adequate accuracy for pedestrian navigation. Both positions and movements can be well estimated by the proposed pedestrian navigation algorithm with a single wearable camera.
24

Mauri, Antoine, Redouane Khemmar, Benoit Decoux, Madjid Haddad, and Rémi Boutteau. "Real-Time 3D Multi-Object Detection and Localization Based on Deep Learning for Road and Railway Smart Mobility." Journal of Imaging 7, no. 8 (12 August 2021): 145. http://dx.doi.org/10.3390/jimaging7080145.

Abstract:
For smart mobility, autonomous vehicles, and advanced driver-assistance systems (ADASs), perception of the environment is an important task in scene analysis and understanding. Better perception of the environment allows for enhanced decision making, which, in turn, enables very high-precision actions. To this end, we introduce in this work a new real-time deep learning approach for 3D multi-object detection for smart mobility not only on roads, but also on railways. To obtain the 3D bounding boxes of the objects, we modified a proven real-time 2D detector, YOLOv3, to predict 3D object localization, object dimensions, and object orientation. Our method has been evaluated on KITTI’s road dataset as well as on our own hybrid virtual road/rail dataset acquired from the video game Grand Theft Auto (GTA) V. The evaluation of our method on these two datasets shows good accuracy, but more importantly that it can be used in real-time conditions, in road and rail traffic environments. Through our experimental results, we also show the importance of the accuracy of prediction of the regions of interest (RoIs) used in the estimation of 3D bounding box parameters.
25

Guo, Yongping, Ying Chen, Jianzhi Deng, Shuiwang Li, and Hui Zhou. "Identity-Preserved Human Posture Detection in Infrared Thermal Images: A Benchmark." Sensors 23, no. 1 (22 December 2022): 92. http://dx.doi.org/10.3390/s23010092.

Abstract:
Human pose estimation has a variety of real-life applications, including human action recognition, AI-powered personal trainers, robotics, motion capture and augmented reality, gaming, and video surveillance. However, most current human pose estimation systems are based on RGB images, which do not seriously take personal privacy into account. Although identity-preserving algorithms are very desirable when human pose estimation is applied to scenarios where personal privacy matters, developing human pose estimation algorithms based on identity-preserving modalities, such as the thermal images considered here, is very challenging due to the limited amount of training data currently available and the fact that infrared thermal images, unlike RGB images, lack rich texture cues, which makes annotating training data impractical. In this paper, we formulate a new task with privacy protection that lies between human detection and human pose estimation by introducing a benchmark for IPHPDT (Identity-Preserved Human Posture Detection in Thermal images). This task has a threefold purpose: first, to establish an identity-preserving task with thermal images; second, to obtain more information than the person locations provided by human detection, for more advanced computer vision applications; and third, to avoid the difficulty of collecting well-annotated data for human pose estimation in thermal images. The presented IPHPDT dataset contains four types of human postures, consisting of 75,000 images well-annotated with axis-aligned bounding boxes and the postures of the persons. Based on this well-annotated IPHPDT dataset and three state-of-the-art algorithms, i.e., YOLOF (You Only Look One-level Feature), YOLOX (Exceeding YOLO Series in 2021), and TOOD (Task-aligned One-stage Object Detection), we establish three baseline detectors, called IPH-YOLOF, IPH-YOLOX, and IPH-TOOD. In the experiments, the three baseline detectors are used to recognize four infrared human postures, and the mean average precision reaches 70.4%. The results show that the three baseline detectors can effectively perform accurate posture detection on the IPHPDT dataset. By releasing IPHPDT, we expect to encourage more future studies into human posture detection in infrared thermal images and draw more attention to this challenging task.
26

Rezaei, Mahdi, and Mohsen Azarmi. "DeepSOCIAL: Social Distancing Monitoring and Infection Risk Assessment in COVID-19 Pandemic." Applied Sciences 10, no. 21 (26 October 2020): 7514. http://dx.doi.org/10.3390/app10217514.

Abstract:
Social distancing is a solution recommended by the World Health Organisation (WHO) to minimise the spread of COVID-19 in public places. The majority of governments and national health authorities have set 2-m physical distancing as a mandatory safety measure in shopping centres, schools, and other covered areas. In this research, we develop a hybrid Computer Vision and YOLOv4-based Deep Neural Network (DNN) model for automated people detection in crowds in indoor and outdoor environments using common CCTV security cameras. The proposed DNN model, in combination with an adapted inverse perspective mapping (IPM) technique and the SORT tracking algorithm, leads to robust people detection and social distancing monitoring. The model has been trained on two of the most comprehensive datasets available at the time of the research: the Microsoft Common Objects in Context (MS COCO) and Google Open Images datasets. The system has been evaluated against the Oxford Town Centre dataset (including 150,000 instances of people detection) with superior performance compared to three state-of-the-art methods. The evaluation has been conducted in challenging conditions, including occlusion, partial visibility, and lighting variations, with a mean average precision of 99.8% and a real-time speed of 24.1 fps. We also provide an online infection risk assessment scheme by statistical analysis of the spatio-temporal data from people's moving trajectories and the rate of social distancing violations. We identify high-risk zones with the highest possibility of virus spread and infection. This may help authorities to redesign the layout of a public place or to take precautionary actions to mitigate high-risk zones. The developed model is a generic and accurate people detection and tracking solution that can be applied in many other fields such as autonomous vehicles, human action recognition, anomaly detection, sports, crowd analysis, or any other research area where human detection is at the centre of attention.
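The monitoring step after detection reduces to a homography plus pairwise distances. The sketch below assumes four manually calibrated ground points and a metre-scaled ground plane; the paper's IPM calibration is scene-specific and not reproduced here.

```python
# Inverse perspective mapping + 2-m distance check (calibration values are assumed).
import cv2
import numpy as np
from itertools import combinations

image_pts = np.float32([[400, 500], [900, 500], [1200, 700], [100, 700]])  # example pixels
ground_pts = np.float32([[0, 0], [5, 0], [5, 5], [0, 5]])                  # metres, assumed
H = cv2.getPerspectiveTransform(image_pts, ground_pts)

def violations(foot_points, min_dist=2.0):
    """foot_points: list of (x, y) pixel positions of detected people's feet."""
    pts = np.float32(foot_points).reshape(-1, 1, 2)
    ground = cv2.perspectiveTransform(pts, H).reshape(-1, 2)   # bird's-eye view, metres
    return [(i, j) for i, j in combinations(range(len(ground)), 2)
            if np.linalg.norm(ground[i] - ground[j]) < min_dist]
```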
27

Zheng, Zepei. "Human Gesture Recognition in Computer Vision Research." SHS Web of Conferences 144 (2022): 03011. http://dx.doi.org/10.1051/shsconf/202214403011.

Abstract:
Human gesture recognition is a popular issue in computer vision studies, since it provides the technological expertise required to advance interaction between people and computers, virtual environments, smart surveillance, motion tracking, and other domains. Extraction of the human skeleton is a fairly typical gesture recognition approach in existing technologies based on two-dimensional human gesture detection. Likewise, it cannot be overlooked that objects in the surrounding environment give some information about human gestures. To semantically recognize the posture of the human body, the logic system presented in this research integrates the components recognized in the visual environment alongside the human skeletal position. In principle, it can improve the precision of posture recognition and semantically represent people's actions. As such, the paper puts forward a promising approach for recognizing human gestures and for increasing the quantity of information offered through image analysis to enhance interaction between humans and computers.
28

Bello, R. W., A. S. A. Mohamed, A. Z. Talib, D. A. Olubummo, and O. C. Enuma. "Computer vision-based techniques for cow object recognition." IOP Conference Series: Earth and Environmental Science 858, no. 1 (1 September 2021): 012008. http://dx.doi.org/10.1088/1755-1315/858/1/012008.

Abstract:
The productivity of livestock farming depends on the welfare of the livestock, which can be ensured by physically and constantly monitoring their behaviors and activities using human experts. However, achieving high accuracy and consistency with manual monitoring on a commercial farm is herculean, and in most cases impractical; hence, there is a need for a method that can overcome these challenges. Proposed in this paper, therefore, is a cow detection and monitoring method using computer vision techniques. The proposed method is capable of tracking and identifying cow objects in video experiments, thereby actualizing precision livestock farming. The method generates reasonable results when compared to other methods.
29

Liu, Haitao, Yuge Li, and Dongchang Liu. "Object detection and recognition system based on computer vision analysis." Journal of Physics: Conference Series 1976, no. 1 (1 July 2021): 012024. http://dx.doi.org/10.1088/1742-6596/1976/1/012024.

30

Hoshino, Satoshi, and Kyohei Niimura. "Robot Vision System for Human Detection and Action Recognition." Journal of Advanced Computational Intelligence and Intelligent Informatics 24, no. 3 (20 May 2020): 346–56. http://dx.doi.org/10.20965/jaciii.2020.p0346.

Abstract:
Mobile robots equipped with camera sensors are required to perceive humans and their actions for safe autonomous navigation. For simultaneous human detection and action recognition, the real-time performance of the robot vision is an important issue. In this paper, we propose a robot vision system in which original images captured by a camera sensor are described by the optical flow. These images are then used as inputs for the human and action classifications. For the image inputs, two classifiers based on convolutional neural networks are developed. Moreover, we describe a novel detector (a local search window) for clipping partial images around the target human from the original image. Since the camera sensor moves together with the robot, the camera movement has an influence on the calculation of optical flow in the image, which we address by further modifying the optical flow for changes caused by the camera movement. Through the experiments, we show that the robot vision system can detect humans and recognize the action in real time. Furthermore, we show that a moving robot can achieve human detection and action recognition by modifying the optical flow.
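The ego-motion correction can be illustrated with a crude stand-in: treat the median flow over the frame as the camera-induced component and subtract it, leaving mostly human-induced motion. The paper's actual modification may be more sophisticated.

```python
# Hedged sketch of camera-motion compensation for optical flow on a moving robot.
import cv2
import numpy as np

def compensated_flow(prev_gray, gray):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    camera_motion = np.median(flow.reshape(-1, 2), axis=0)  # dominant (ego) motion
    return flow - camera_motion                              # residual object motion
```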
31

Suyadnya, I. Made Arsa, and Duman Care Khrisne. "Residual Neural Network Model for Detecting Waste Disposing Action in Images." Journal of Electrical, Electronics and Informatics 5, no. 2 (27 September 2021): 52. http://dx.doi.org/10.24843/jeei.2021.v05.i02.p03.

Abstract:
Waste in general has become a major problem for people around the world. Evidence internationally shows that everyone, or nearly everyone, admits to polluting at some point, with the majority of people littering at least occasionally. This research aims to overcome these problems by utilizing computer vision and deep learning approaches. It was conducted to detect the actions carried out by humans when disposing of waste in an image, which is useful for providing better information for research on waste disposal behavior. We use a Convolutional Neural Network model with a Residual Neural Network architecture to detect the types of activities that subjects perform in an image. The result is an artificial neural network model that can label the activities occurring in the input image (scene recognition). This model carries out the recognition process with an accuracy of 88% and an F1-score of 0.87.
32

Jiang, Hairong, Juan P. Wachs, and Bradley S. Duerstock. "Integrated vision-based system for efficient, semi-automated control of a robotic manipulator." International Journal of Intelligent Computing and Cybernetics 7, no. 3 (5 August 2014): 253–66. http://dx.doi.org/10.1108/ijicc-09-2013-0042.

Abstract:
Purpose – The purpose of this paper is to develop an integrated, computer vision-based system to operate a commercial wheelchair-mounted robotic manipulator (WMRM). In addition, a gesture recognition interface system was developed specially for individuals with upper-level spinal cord injuries, including object tracking and face recognition, to function as an efficient, hands-free WMRM controller. Design/methodology/approach – Two Kinect® cameras were used synergistically to perform a variety of simple object retrieval tasks. One camera was used to interpret hand gestures and locate the operator's face for object positioning, and then send those as commands to control the WMRM. The other sensor was used to automatically recognize different daily living objects selected by the subjects. An object recognition module employing the Speeded Up Robust Features algorithm was implemented, and recognition results were sent as commands for "coarse positioning" of the robotic arm near the selected object. Automatic face detection was provided as a shortcut enabling objects to be positioned close to the subject's face. Findings – The gesture recognition interface incorporated hand detection, tracking, and recognition algorithms, and yielded a recognition accuracy of 97.5 percent for an eight-gesture lexicon. Task completion times were measured to compare manual (gestures only) and semi-manual (gestures, automatic face detection, and object recognition) WMRM control modes. The use of automatic face and object detection significantly reduced the completion times for retrieving a variety of daily living objects. Originality/value – Three computer vision modules were integrated to construct an effective, hands-free interface for individuals with upper-limb mobility impairments to control a WMRM.
33

Singh, Baljeet, Nitin Kumar, Irshad Ahmed, and Karun Yadav. "Real-Time Object Detection Using Deep Learning." International Journal for Research in Applied Science and Engineering Technology 10, no. 5 (31 May 2022): 3159–60. http://dx.doi.org/10.22214/ijraset.2022.42820.

Abstract:
The computer vision field of real-time object detection is large, dynamic, and complex. Localization refers to finding a single object in an image, while object detection refers to finding multiple objects in an image; it identifies semantic-class objects in digital photos and videos. Real-time object detection is used in feature tracking, video surveillance, pedestrian detection, censuses, self-driving cars, face recognition, sports tracking, and many other applications. Convolutional Neural Networks are a deep learning tool used with OpenCV (Open Source Computer Vision), a library for basic computer vision programming tasks. Computer vision, deep learning, and convolutional neural networks are some of the key terms used in this paper.
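A minimal sketch of real-time detection with OpenCV's DNN module, in the spirit of the paper; the YOLO config and weight file names are placeholders, and the box decoding/NMS step is elided.

```python
# Real-time detection loop with OpenCV's DNN module (file names are placeholders).
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")  # assumed model files
layer_names = net.getUnconnectedOutLayersNames()

cap = cv2.VideoCapture(0)                                # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)                   # per-scale grids of box predictions
    # ... decode boxes, apply a confidence threshold and cv2.dnn.NMSBoxes ...
    cv2.imshow("detections", frame)
    if cv2.waitKey(1) == 27:                             # Esc to quit
        break
```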
34

West, Geoff A. W. "Assessing Feature Importance in the Context of Object Recognition." International Journal of Pattern Recognition and Artificial Intelligence 11, no. 01 (February 1997): 49–77. http://dx.doi.org/10.1142/s0218001497000044.

Abstract:
A popular paradigm in computer vision is based on dividing the vision problem into three stages, namely segmentation, feature extraction, and recognition: for example, edge detection followed by line detection followed by planar object recognition. It can be argued that each of these stages needs to be thoroughly described to enable vision systems to be configured with predictable performance. However, an alternative view is that the performance of each stage is not in itself important as long as the overall performance is acceptable. This paper discusses feature performance, concentrating on the assessment of edge-based feature detection and object recognition. Evaluation techniques are discussed for assessing arc and line detection algorithms and for features in the context of verification and pose refinement strategies. These techniques can then be used for the design and integration of the indexing and verification stages of object recognition. A theme of the paper is the need to assess feature extraction in the context of the chosen task.
35

Wang, Jinding, Haifeng Hu, and Xinlong Lu. "ADN for object detection." IET Computer Vision 14, no. 2 (23 January 2020): 65–72. http://dx.doi.org/10.1049/iet-cvi.2018.5651.

36

Rehman, Amjad, Tanzila Saba, Muhammad Zeeshan Khan, Robertas Damaševičius, and Saeed Ali Bahaj. "Internet-of-Things-Based Suspicious Activity Recognition Using Multimodalities of Computer Vision for Smart City Security." Security and Communication Networks 2022 (5 October 2022): 1–12. http://dx.doi.org/10.1155/2022/8383461.

Abstract:
Automatic human activity recognition is one of the milestones of smart city surveillance projects. Human activity detection and recognition aim to identify activities based on observations of what a subject is performing; hence, vision-based human activity recognition systems have wide scope in video surveillance, health care systems, and human-computer interaction. Currently, the world is moving towards the smart and safe city concept, and automatic human activity recognition is a major challenge of smart city surveillance. The proposed research work employs a fine-tuned YOLO-v4 for activity detection, whereas 3D-CNN has been implemented for classification. Besides classification, the presented research model also leverages human-object interaction with the help of intersection over union (IOU). An Internet of Things (IoT) based architecture is implemented to take efficient and real-time decisions. The classes exploited for activity recognition were taken from the UCF-Crime dataset, whereas a dataset extracted from MS-COCO is used for suspicious object detection in the human-object interaction. This research is also applied to human activity detection and recognition on university premises for real-time suspicious activity detection and automatic alerts. The experiments show that the proposed multimodal approach achieves remarkable activity detection and recognition accuracy.
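The IOU-based interaction test reduces to a few lines; the box format and the overlap threshold below are assumptions for illustration, not values from the paper.

```python
# Linking a detected person to a detected suspicious object via IoU overlap.
def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2); returns intersection over union."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-7)

def interacting(person_box, object_box, threshold=0.1):
    # Any overlap above a small threshold is treated as a human-object interaction.
    return iou(person_box, object_box) > threshold
```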
37

Modwel, Garv, Anu Mehra, Nitin Rakesh, and K. K. Mishra. "Advanced Object Detection in Bio-Medical X-Ray Images for Anomaly Detection and Recognition." International Journal of E-Health and Medical Communications 12, no. 2 (July 2021): 93–110. http://dx.doi.org/10.4018/ijehmc.2021030106.

Abstract:
The human vision system is mimicked in the format of videos and images in the area of computer vision: just as humans can process their memories, videos and images can be processed and perceived with the help of computer vision technology. A broad range of fields, including automobile, biomedical, and space research, are building concepts around applications of computer vision. The case study in this manuscript highlights innovations and future possibilities that could start a new era in the biomedical image-processing sector. A pre-surgical investigation can be pursued with the help of the proposed technology, enabling doctors to analyze a situation with deeper insight. There are different types of biomedical imaging, such as magnetic resonance imaging (MRI), computerized tomography (CT) scans, and X-ray imaging; the focus of the proposed research is X-ray imaging. An eyeball check of fine detail is always error-prone for a human, and the same applies to doctors, who consequently need supporting equipment and related technologies. The methodology proposed in this manuscript analyses details that may be missed by an expert doctor. The input to the algorithm is an X-ray image, and the output of the process is a label on the corresponding objects in the test image. The tool used in the process also mimics the human brain's neuron system: the proposed method uses a convolutional neural network to decide on the labels of the objects in the image it interprets. After some pre-processing of the X-ray images, the neural network receives the input to achieve efficient performance. A result analysis is presented that shows considerable performance in terms of a confusion factor expressed as a percentage. At the end of the manuscript, future possibilities are traced out to guide further research.
Styles APA, Harvard, Vancouver, ISO, etc.
38

Dhaigude, Santosh. « Computer Vision Based Virtual Sketch Using Detection ». International Journal for Research in Applied Science and Engineering Technology 10, no 1 (31 janvier 2022) : 264–68. http://dx.doi.org/10.22214/ijraset.2022.39814.

Texte intégral
Résumé :
In today's world, during the pandemic, online learning is often the only way to learn. Online learning makes students more curious about knowledge and lets them set their own learning path, but to pass a course or exam they still need to make time to study and stay disciplined, and online learning brings its own barriers: students' grasping power weakens because they were used to relying on their teachers and offline classes. Virtual writing and control systems have been a challenging research area in image processing and pattern recognition in recent years. They contribute greatly to the advancement of automation and can improve the man-machine interface in numerous applications. Several research works have focused on new techniques and methods that reduce processing time while providing higher recognition accuracy. Given real-time webcam data, this Jamboard-like Python application uses the OpenCV library to track an object of interest (a human palm or finger in this case) and lets the user draw by moving a finger, which makes it both engaging and fun to sketch simple things. Keywords: Detection, Hand landmark, Keypoints, Computer vision, OpenCV
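A minimal sketch of such a fingertip-drawing loop is shown below, using MediaPipe Hands for the hand-landmark step the keywords mention and OpenCV for capture and drawing; the paper's exact tracking method, line color, and exit key are assumptions here.

    import cv2
    import mediapipe as mp
    import numpy as np

    hands = mp.solutions.hands.Hands(max_num_hands=1)
    cap = cv2.VideoCapture(0)
    canvas, prev = None, None

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if canvas is None:
            canvas = np.zeros_like(frame)  # drawing layer, same size as frame
        res = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if res.multi_hand_landmarks:
            tip = res.multi_hand_landmarks[0].landmark[8]  # index fingertip
            h, w = frame.shape[:2]
            pt = (int(tip.x * w), int(tip.y * h))
            if prev is not None:
                cv2.line(canvas, prev, pt, (255, 0, 0), 4)  # draw stroke
            prev = pt
        else:
            prev = None  # pen up when the hand leaves the frame
        cv2.imshow("virtual sketch", cv2.add(frame, canvas))
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break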
Styles APA, Harvard, Vancouver, ISO, etc.
39

Cahyadi, Septian, Febri Damatraseta et Lodryck Lodefikus S. « Comparative Analysis Of Efficient Image Segmentation Technique For Text Recognition And Human Skin Recognition ». Jurnal Informatika Kesatuan 1, no 1 (13 juillet 2021) : 81–90. http://dx.doi.org/10.37641/jikes.v1i1.775.

Texte intégral
Résumé :
Computer vision and pattern recognition are among the most interesting research subjects in computer science, especially the reading and recognition of objects in real time from a camera device. Object detection spans a wide range of segments; in this study we investigate which methodologies work better for detecting text and human skin. The study aims to develop computer vision technology that helps people with disabilities, especially the illiterate (tuna aksara) and the deaf (penyandang tuli), recognize and learn the letters of the alphabet (A-Z). Our research finds that the best technique for text recognition is a Convolutional Neural Network, with accuracy reaching 93%; an OCR method reached 98% on license-plate reading, and 88% given stable image capture, good lighting conditions, and standard book fonts. Meanwhile, the best technique for detecting human skin is skin color segmentation in the CIELab color space, with an accuracy of 96.87%, while classification with a Convolutional Neural Network (CNN) achieves an accuracy of 98%. Keywords: Computer Vision, Segmentation, Object Recognition, Text Recognition, Skin Color Detection, Motion Detection, Disability Application
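The paper reports CIELab color-space segmentation for skin but does not publish its channel thresholds, so the bounds in this OpenCV sketch are illustrative assumptions; the input filename is likewise a placeholder.

    import cv2
    import numpy as np

    def skin_mask(bgr):
        """Rough skin segmentation in CIELab; threshold bounds are assumed,
        not taken from the paper."""
        lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
        lower = np.array([20, 135, 130], dtype=np.uint8)   # L, a, b lower bounds
        upper = np.array([250, 175, 185], dtype=np.uint8)  # L, a, b upper bounds
        mask = cv2.inRange(lab, lower, upper)
        # Morphological opening removes small false positives.
        kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
        return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

    frame = cv2.imread("hand.jpg")  # hypothetical input image
    skin = cv2.bitwise_and(frame, frame, mask=skin_mask(frame))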
Styles APA, Harvard, Vancouver, ISO, etc.
40

Terada, Kazunori, Takayuki Nakamura, Hideaki Takeda et Tsukasa Ogasawara. « Embodiment-Based Object Recognition for Vision-Based Mobile Agents ». Journal of Robotics and Mechatronics 13, no 1 (20 février 2001) : 88–95. http://dx.doi.org/10.20965/jrm.2001.p0088.

Texte intégral
Résumé :
In this paper, we propose a new architecture for object recognition based on the concept of "embodiment" as a primitive function for a cognitive robot. We define "embodiment" as the extent of the agent itself, its locomotive ability, and its sensors. Based on this concept, an object is represented by reaching-action paths, which correspond to a set of movement sequences the agent can follow to reach the object. Such behavior is acquired by trial and error based on visual and tactile information. Visual information is used to obtain a sensorimotor mapping, which represents the relationship between changes in an object's appearance and the agent's movement. Tactile information is used to evaluate changes in the physical condition of the object caused by such movement. By these means, the agent can recognize an object regardless of its position and orientation in the environment. To demonstrate the feasibility of our method, we detail experimental results from computer simulation.
Styles APA, Harvard, Vancouver, ISO, etc.
41

Heikel, Edvard, et Leonardo Espinosa-Leal. « Indoor Scene Recognition via Object Detection and TF-IDF ». Journal of Imaging 8, no 8 (26 juillet 2022) : 209. http://dx.doi.org/10.3390/jimaging8080209.

Texte intégral
Résumé :
Indoor scene recognition and semantic information can be helpful for social robots. Recently, researchers in indoor scene recognition have incorporated object-level information and shown improved performance. In line with these advances, this paper demonstrates that scene recognition can be performed using object-level information alone. A state-of-the-art object detection model was trained to detect objects typically found in indoor environments and then used to detect objects in scene data; the predicted objects were then used as features to predict room categories. The paper successfully combines approaches conventionally used in computer vision and natural language processing (YOLO and TF-IDF, respectively), which could be further helpful for embodied-agent research and dynamic scene classification, as we elaborate.
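The pipeline the paper describes treats each scene's detected object labels as a text "document" and applies TF-IDF before classifying the room. A small scikit-learn sketch of that idea follows; the object lists, room labels, and choice of logistic regression as the downstream classifier are illustrative stand-ins, not the paper's exact setup.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Each scene becomes a "document" of detector-predicted object names
    # (illustrative data, not from the paper).
    scenes = ["bed lamp pillow wardrobe",
              "oven sink refrigerator",
              "desk chair monitor keyboard"]
    rooms = ["bedroom", "kitchen", "office"]

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(scenes, rooms)
    print(clf.predict(["sink oven microwave"]))  # expected: kitchen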
Styles APA, Harvard, Vancouver, ISO, etc.
42

Meisels, Amnon, et Ronen Versano. « Token-textured object detection by pyramids ». Image and Vision Computing 10, no 1 (janvier 1992) : 55–62. http://dx.doi.org/10.1016/0262-8856(92)90084-g.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
43

Zhang, Hongming, Wen Gao, Xilin Chen et Debin Zhao. « Object detection using spatial histogram features ». Image and Vision Computing 24, no 4 (avril 2006) : 327–41. http://dx.doi.org/10.1016/j.imavis.2005.11.010.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
44

Laptev, Ivan. « Improving object detection with boosted histograms ». Image and Vision Computing 27, no 5 (avril 2009) : 535–44. http://dx.doi.org/10.1016/j.imavis.2008.08.010.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
45

Meghana, K. S. « Face Sketch Recognition Using Computer Vision ». International Journal for Research in Applied Science and Engineering Technology 9, no VII (25 juillet 2021) : 2005–9. http://dx.doi.org/10.22214/ijraset.2021.36806.

Texte intégral
Résumé :
Nowadays, the need for technologies to identify, detect, and recognize suspects has increased. One of the most common biometric techniques is face recognition, since the face is the natural cue people use to identify each other. Understanding how humans recognize face sketches drawn by artists is of significant value to both criminal investigators and forensic researchers in computer vision. However, studies note that hand-drawn face sketches remain very limited in terms of artists and number of sketches, because after an incident a forensic artist prepares sketches from the description provided by an eyewitness. A suspect may use a special mask to hide common facial features such as the nose, eyes, lips, or skin color, but the outline features of facial biometrics can never be hidden. Here we concentrate on specific facial geometric features that can be used to compute similarity ratios between a template photograph database and forensic sketches. The project describes the design of a face sketch recognition system using computer vision techniques such as the Discrete Cosine Transform (DCT) and the Local Binary Pattern Histogram (LBPH) algorithm, together with a supervised machine learning model, the Support Vector Machine (SVM), for face recognition. Tkinter, the standard GUI library for Python, provides a fast and easy way to create GUI applications through a powerful object-oriented interface to the Tk GUI toolkit.
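Of the techniques named, LBPH is directly available in OpenCV's contrib face module, so a minimal sketch-to-photo matching loop can look like the following; the image filenames, identity labels, and the use of two training photos per identity are placeholders, and the sketch assumes opencv-contrib-python is installed.

    import cv2
    import numpy as np

    # LBPH is provided by the cv2.face module (opencv-contrib-python).
    recognizer = cv2.face.LBPHFaceRecognizer_create()

    # Placeholder data: grayscale face crops and integer identity labels.
    faces = [cv2.imread(p, cv2.IMREAD_GRAYSCALE)
             for p in ("id0_a.png", "id0_b.png", "id1_a.png")]
    labels = np.array([0, 0, 1])

    recognizer.train(faces, labels)
    label, distance = recognizer.predict(
        cv2.imread("sketch.png", cv2.IMREAD_GRAYSCALE))
    # A lower distance means a closer match between sketch and photograph.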
Styles APA, Harvard, Vancouver, ISO, etc.
46

Zhang, Zheng, Cong Huang, Fei Zhong, Bote Qi et Binghong Gao. « Posture Recognition and Behavior Tracking in Swimming Motion Images under Computer Machine Vision ». Complexity 2021 (20 mai 2021) : 1–9. http://dx.doi.org/10.1155/2021/5526831.

Texte intégral
Résumé :
This study explores gesture recognition and behavior tracking in swimming motion images under computer machine vision, and extends the application of moving-target detection and tracking algorithms in this field. The objectives are met with moving-target detection and tracking, a Gaussian mixture model, an optimized correlation filtering algorithm, and the Camshift tracking algorithm. First, the Gaussian algorithm is introduced into target tracking and detection to reduce filtering loss and make the acquired motion posture more accurate. Second, an improved kernel correlation filter tracking algorithm is proposed that trains multiple filters, so that the motion trajectory of the monitored target object can be obtained clearly and accurately. Finally, the Kalman algorithm is combined with the Camshift algorithm, completing the tracking and recognition of moving targets. The experimental results show that the target tracking and detection method recovers the movement form of the template object relatively completely, and the kernel correlation filter tracking algorithm also recovers the movement speed of the target object accurately. In addition, the accuracy of the Camshift tracking algorithm reaches 86.02%. These results provide reliable data support and a reference for expanding the application of moving-target detection and tracking methods.
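A compact OpenCV sketch of the Kalman-plus-Camshift combination the abstract describes is given below; the video filename, initial region of interest, histogram bins, and noise covariance are all assumptions, since the paper does not publish its parameters.

    import cv2
    import numpy as np

    kalman = cv2.KalmanFilter(4, 2)  # state: x, y, vx, vy; measurement: x, y
    kalman.transitionMatrix = np.array(
        [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
    kalman.measurementMatrix = np.array(
        [[1, 0, 0, 0], [0, 1, 0, 0]], np.float32)
    kalman.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3  # assumed

    cap = cv2.VideoCapture("swim.mp4")     # hypothetical input video
    ok, frame = cap.read()
    x, y, w, h = 300, 200, 80, 80          # assumed initial ROI on the swimmer
    roi = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([roi], [0], None, [16], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    window = (x, y, w, h)

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        _, window = cv2.CamShift(back, window, term)   # color-based tracking
        cx, cy = window[0] + window[2] / 2, window[1] + window[3] / 2
        kalman.predict()                                # Kalman smooths the track
        state = kalman.correct(np.array([[cx], [cy]], np.float32))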
Styles APA, Harvard, Vancouver, ISO, etc.
47

Jung, Minji, Heekyung Yang et Kyungha Min. « Improving Deep Object Detection Algorithms for Game Scenes ». Electronics 10, no 20 (17 octobre 2021) : 2527. http://dx.doi.org/10.3390/electronics10202527.

Texte intégral
Résumé :
The advancement and popularity of computer games make game scene analysis one of the most interesting research topics in the computer vision community. Among the various computer vision techniques, we employ object detection algorithms for the analysis, since they both recognize and localize objects in a scene. However, applying existing object detection algorithms to game scenes does not guarantee the desired performance, since the algorithms are trained on datasets collected from the real world. To achieve the desired performance on game scenes, we built a dataset of collected game scenes and retrained object detection algorithms pre-trained on real-world datasets. We selected five object detection algorithms, namely YOLOv3, Faster R-CNN, SSD, FPN, and EfficientDet, and eight games from various genres, including first-person shooting, role-playing, sports, and driving. PascalVOC and MS COCO were employed for pre-training the object detection algorithms. We demonstrated the improvement our strategy brings in two aspects: recognition, measured by mean average precision (mAP), and localization, measured by intersection over union (IoU).
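In practice this retraining strategy amounts to swapping the classification head of a COCO-pretrained detector and continuing training on game frames. A torchvision sketch of that step follows; the class count, learning rate, and the hypothetical game_scene_loader are assumptions, not the paper's exact configuration.

    import torch
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    # Start from a detector pre-trained on MS COCO, as the paper does.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

    # Replace the box head for the game-object classes (count is an assumption).
    num_classes = 21  # e.g., 20 game classes + background
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    model.train()
    # for images, targets in game_scene_loader:  # hypothetical DataLoader
    #     losses = model(images, targets)         # dict of detection losses
    #     sum(losses.values()).backward()
    #     optimizer.step(); optimizer.zero_grad()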
Styles APA, Harvard, Vancouver, ISO, etc.
48

Pulla Rao, Chennamsetty, A. Guruva Reddy et C. B. Rama Rao. « Camouflaged object detection for machine vision applications ». International Journal of Speech Technology 23, no 2 (16 mars 2020) : 327–35. http://dx.doi.org/10.1007/s10772-020-09699-7.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
49

Jot Singh, Kiran, Divneet Singh Kapoor, Khushal Thakur, Anshul Sharma et Xiao-Zhi Gao. « Computer-Vision Based Object Detection and Recognition for Service Robot in Indoor Environment ». Computers, Materials & ; Continua 72, no 1 (2022) : 197–213. http://dx.doi.org/10.32604/cmc.2022.022989.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
50

Hidayat, Rahmat, Hendrick, Riandini, Zhi-Hao Wang et Horng Gwo-Jiun. « Mask RCNN Methods for Eyes Modelling ». International Journal of Data Science 2, no 2 (31 décembre 2021) : 63–68. http://dx.doi.org/10.18517/ijods.2.2.63-68.2021.

Texte intégral
Résumé :
Object detection is one of the deep learning areas within computer vision. Applications of computer vision divide into image classification and object detection, where object detection aims to find a specific object in an image. Security applications of object detection include face recognition and face detection, and face detection has also been developed for medical applications that identify emotion from faces. In this research, we propose an eye model built with Mask R-CNN. The eye model was applied to real-time detection combined with OpenCV. The dataset was assembled from an online dataset and webcam images, and the model was trained for 4 epochs and 131 iterations. The final model successfully detected eyes in a real-time application.
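A real-time Mask R-CNN loop of the kind described can be sketched as follows; torchvision's COCO-pretrained Mask R-CNN stands in for the paper's fine-tuned eye model, and the confidence threshold and exit key are assumptions.

    import cv2
    import torch
    import torchvision

    # Stand-in model: the paper would load its Mask R-CNN fine-tuned on the
    # eye dataset; here we use torchvision's COCO-pretrained weights.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(
        weights="DEFAULT").eval()

    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
        with torch.no_grad():
            pred = model([tensor])[0]
        for box, score in zip(pred["boxes"], pred["scores"]):
            if score > 0.7:  # assumed confidence threshold
                x1, y1, x2, y2 = box.int().tolist()
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.imshow("detections", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
            break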
Styles APA, Harvard, Vancouver, ISO, etc.