Journal articles on the topic 'Computer vision, object detection, action recognition'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Computer vision, object detection, action recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Zhang, Hong-Bo, Yi-Xiang Zhang, Bineng Zhong, Qing Lei, Lijie Yang, Ji-Xiang Du, and Duan-Sheng Chen. "A Comprehensive Survey of Vision-Based Human Action Recognition Methods." Sensors 19, no. 5 (February 27, 2019): 1005. http://dx.doi.org/10.3390/s19051005.

Abstract:
Although widely used in many applications, accurate and efficient human action recognition remains a challenging area of research in the field of computer vision. Most recent surveys have focused on narrow problems such as human action recognition methods using depth data, 3D-skeleton data, still image data, spatiotemporal interest point-based methods, and human walking motion recognition. However, there has been no systematic survey of human action recognition. To this end, we present a thorough review of human action recognition methods and provide a comprehensive overview of recent approaches in human action recognition research, including progress in hand-designed action features in RGB and depth data, current deep learning-based action feature representation methods, advances in human–object interaction recognition methods, and the current prominent research topic of action detection methods. Finally, we present several analysis recommendations for researchers. This survey paper provides an essential reference for those interested in further research on human action recognition.
2

Gundu, Sireesha, and Hussain Syed. "Vision-Based HAR in UAV Videos Using Histograms and Deep Learning Techniques." Sensors 23, no. 5 (February 25, 2023): 2569. http://dx.doi.org/10.3390/s23052569.

Abstract:
Activity recognition in unmanned aerial vehicle (UAV) surveillance is addressed in various computer vision applications such as image retrieval, pose estimation, object detection, object detection in videos, object detection in still images, object detection in video frames, face recognition, and video action recognition. In the UAV-based surveillance technology, video segments captured from aerial vehicles make it challenging to recognize and distinguish human behavior. In this research, to recognize a single and multi-human activity using aerial data, a hybrid model of histogram of oriented gradient (HOG), mask-regional convolutional neural network (Mask-RCNN), and bidirectional long short-term memory (Bi-LSTM) is employed. The HOG algorithm extracts patterns, Mask-RCNN extracts feature maps from the raw aerial image data, and the Bi-LSTM network exploits the temporal relationship between the frames for the underlying action in the scene. This Bi-LSTM network reduces the error rate to the greatest extent due to its bidirectional process. This novel architecture generates enhanced segmentation by utilizing the histogram gradient-based instance segmentation and improves the accuracy of classifying human activities using the Bi-LSTM approach. Experimental outcomes demonstrate that the proposed model outperforms the other state-of-the-art models and has achieved 99.25% accuracy on the YouTube-Aerial dataset.
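Illustrative sketch (not the authors' code): the pipeline above amounts to per-frame feature extraction followed by a bidirectional recurrent classifier. The minimal PyTorch fragment below shows only the Bi-LSTM stage and assumes each clip has already been reduced to a sequence of per-frame feature vectors (e.g., pooled Mask-RCNN features); the feature dimension, hidden size, and number of classes are placeholder values, not the paper's.

    import torch
    import torch.nn as nn

    class BiLSTMActionClassifier(nn.Module):
        """Classify an activity from a sequence of per-frame feature vectors."""
        def __init__(self, feat_dim=512, hidden_dim=256, num_classes=10):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                                bidirectional=True)
            # Bidirectional: forward and backward hidden states are concatenated.
            self.fc = nn.Linear(2 * hidden_dim, num_classes)

        def forward(self, x):            # x: (batch, num_frames, feat_dim)
            out, _ = self.lstm(x)        # (batch, num_frames, 2 * hidden_dim)
            return self.fc(out[:, -1])   # classify from the last time step

    # Example: 8 clips, 16 frames each, 512-dim per-frame features.
    features = torch.randn(8, 16, 512)
    logits = BiLSTMActionClassifier()(features)   # shape (8, 10)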
3

Mikhalev, Oleg, and Alexander Yanyushkin. "Machine vision and object recognition using neural networks." Robotics and Technical Cybernetics 10, no. 2 (June 2022): 113–20. http://dx.doi.org/10.31776/rtcj.10204.

Abstract:
Computer vision is becoming one of the important areas of automation of various human activities. Technical systems today are endowed with the ability to see, and along with the use of neural networks, they are also endowed with the ability to act intelligently. Thus, they are able to see and make the right decisions and actions faster and more accurately than a person. The article discusses the possibility of using machine vision and object recognition technology for industrial automation, describes a convolutional neural network and an object detection algorithm.
4

Voulodimos, Athanasios, Nikolaos Doulamis, Anastasios Doulamis, and Eftychios Protopapadakis. "Deep Learning for Computer Vision: A Brief Review." Computational Intelligence and Neuroscience 2018 (2018): 1–13. http://dx.doi.org/10.1155/2018/7068349.

Abstract:
Over the last years deep learning methods have been shown to outperform previous state-of-the-art machine learning techniques in several fields, with computer vision being one of the most prominent cases. This review paper provides a brief overview of some of the most significant deep learning schemes used in computer vision problems, that is, Convolutional Neural Networks, Deep Boltzmann Machines and Deep Belief Networks, and Stacked Denoising Autoencoders. A brief account of their history, structure, advantages, and limitations is given, followed by a description of their applications in various computer vision tasks, such as object detection, face recognition, action and activity recognition, and human pose estimation. Finally, a brief overview is given of future directions in designing deep learning schemes for computer vision problems and the challenges involved therein.
5

Wang, Chang, Jinyu Sun, Shiwei Ma, Yuqiu Lu, and Wang Liu. "Multi-stream Network for Human-object Interaction Detection." International Journal of Pattern Recognition and Artificial Intelligence 35, no. 08 (March 12, 2021): 2150025. http://dx.doi.org/10.1142/s0218001421500257.

Abstract:
Detecting the interaction between humans and objects in images is a critical problem for obtaining a deeper understanding of the visual relationship in a scene and also a critical technology in many practical applications, such as augmented reality, video surveillance and information retrieval. Be that as it may, due to the fine-grained actions and objects in the real scene and the coexistence of multiple interactions in one scene, the problem is far from being solved. This paper differs from prior approaches, which focused only on the features of instances, by proposing a method that utilizes a four-stream CNNs network for human-object interaction (HOI) detection. More detailed visual features, spatial features and pose features from human-object pairs are extracted to solve the challenging task of detection in images. Specially, the core idea is that the region where people interact with objects contains important identifying cues for specific action classes, and the detailed cues can be fused to facilitate HOI recognition. Experiments on two large-scale HOI public benchmarks, V-COCO and HICO-DET, are carried out and the results show the effectiveness of the proposed method.
6

Gall, J., A. Yao, N. Razavi, L. Van Gool, and V. Lempitsky. "Hough Forests for Object Detection, Tracking, and Action Recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 33, no. 11 (November 2011): 2188–202. http://dx.doi.org/10.1109/tpami.2011.70.

7

Hoshino, Satoshi, and Kyohei Niimura. "Optical Flow for Real-Time Human Detection and Action Recognition Based on CNN Classifiers." Journal of Advanced Computational Intelligence and Intelligent Informatics 23, no. 4 (July 20, 2019): 735–42. http://dx.doi.org/10.20965/jaciii.2019.p0735.

Abstract:
Mobile robots equipped with camera sensors are required to perceive surrounding humans and their actions for safe and autonomous navigation. In this work, moving humans are the target objects. For robot vision, real-time performance is an important requirement. Therefore, we propose a robot vision system in which the original images captured by a camera sensor are described by optical flow. These images are then used as inputs to a classifier. For classifying images into human and not-human classifications, and the actions, we use a convolutional neural network (CNN), rather than coding invariant features. Moreover, we present a local search window as a novel detector for clipping partial images around target objects in an original image. Through the experiments, we ultimately show that the robot vision system is able to detect moving humans and recognize action in real time.
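Illustrative sketch (the abstract does not name a specific optical flow algorithm, so OpenCV's dense Farneback flow is used here as a stand-in; the frame file names and parameters are placeholders, and the CNN classification and local-search-window stages are omitted):

    import cv2
    import numpy as np

    prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)   # hypothetical frames
    curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

    # Dense optical flow: per-pixel (dx, dy) motion between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Encode the flow as an HSV image (hue = direction, value = magnitude),
    # a common way to turn motion into an image-like input for a CNN classifier.
    hsv = np.zeros((*prev.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    flow_image = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)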
8

Sumathi, J. k. "Dynamic Image Forensics and Forgery Analytics using Open Computer Vision Framework." Wasit Journal of Computer and Mathematics Science 1, no. 1 (March 17, 2021): 1–8. http://dx.doi.org/10.31185/wjcm.vol1.iss1.3.

Abstract:
The key advances in Computer Vision and Optical Image Processing are the emerging technologies nowadays in diverse fields including Facial Recognition, Biometric Verification, Internet of Things (IoT), Criminal Investigation, Signature Identification in banking and several others. Thus, these applications use image and live video processing to facilitate analysis and forecasting. Computer vision is used in many activities such as monitoring, face recognition, motion recognition, and object detection, among others. The development of social networking platforms such as Facebook and Instagram led to an increase in the volume of image data being generated. The use of image and video processing software is a major concern for Facebook because some of the photos and videos that people post to the social network are doctored images. These kinds of images are frequently cited as fake and used in malevolent ways, such as motivating violence and death. Questionable images need to be authenticated before action is taken. It is very hard to ensure photo authenticity due to the power of photo manipulation. Image forgery can be detected by image forensic techniques; the technique of image duplication is often used to conceal missing areas.
9

Zeng, Wei, Junjian Huang, Wei Zhang, Hai Nan, and Zhenjiang Fu. "SlowFast Action Recognition Algorithm Based on Faster and More Accurate Detectors." Electronics 11, no. 22 (November 16, 2022): 3770. http://dx.doi.org/10.3390/electronics11223770.

Abstract:
Object detection algorithms play a crucial role in other vision tasks. This paper finds that FasterRCNN (Region Convolutional Neural Network), the detector used by the action recognition algorithm SlowFast, has disadvantages in terms of both detection accuracy and speed, and that the traditional IOU (Intersection over Union) localization loss makes it difficult for the detection model to converge to a stable minimum. To solve the above problems, the article uses YOLOv3 (You Only Look Once), YOLOX, and CascadeRCNN to improve the detection accuracy and speed of SlowFast. This paper proposes a new localization loss function that adopts the Lance and Williams distance as a new penalty term. The new loss function is more sensitive when the distance difference is smaller, and this property is very suitable for the late convergence of the detection model. The experiments were conducted on the VOC (Visual Object Classes) dataset and the COCO dataset. In the final video test, YOLOv3 improved the detection speed by 10.5 s, CascadeRCNN improved by 3.1% AP compared to FasterRCNN on the COCO dataset, and YOLOX's performance on the COCO dataset is also mostly better than that of FasterRCNN. The new LIOU (Lance and Williams Distance Intersection over Union) localization loss function performs better than other loss functions on the VOC dataset. It can be seen that improving the detection algorithm of SlowFast is crucial and that the proposed loss function is indeed effective.
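Illustrative sketch only: the abstract does not give the exact LIOU formulation, so the NumPy fragment below merely shows the general idea of adding a Lance and Williams (Bray-Curtis-style) distance penalty to an IoU loss; how the two terms are actually combined and weighted in the paper is an assumption here.

    import numpy as np

    def iou(a, b):
        """IoU of two [x1, y1, x2, y2] boxes."""
        x1, y1 = np.maximum(a[:2], b[:2])
        x2, y2 = np.minimum(a[2:], b[2:])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def lance_williams(p, q):
        """Lance and Williams (Bray-Curtis) distance between two vectors."""
        return np.sum(np.abs(p - q)) / (np.sum(np.abs(p) + np.abs(q)) + 1e-9)

    def liou_style_loss(pred, target):
        """Assumed combination: IoU loss plus a Lance-Williams penalty on box centres."""
        pc = np.array([(pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2])
        tc = np.array([(target[0] + target[2]) / 2, (target[1] + target[3]) / 2])
        return 1.0 - iou(pred, target) + lance_williams(pc, tc)

    print(liou_style_loss(np.array([10, 10, 50, 50]), np.array([12, 14, 52, 48])))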
10

Prahara, Adhi, Murinto Murinto, and Dewi Pramudi Ismi. "Bottom-up visual attention model for still image: a preliminary study." International Journal of Advances in Intelligent Informatics 6, no. 1 (March 31, 2020): 82. http://dx.doi.org/10.26555/ijain.v6i1.469.

Abstract:
The philosophy of human visual attention is scientifically explained in the field of cognitive psychology and neuroscience then computationally modeled in the field of computer science and engineering. Visual attention models have been applied in computer vision systems such as object detection, object recognition, image segmentation, image and video compression, action recognition, visual tracking, and so on. This work studies bottom-up visual attention, namely human fixation prediction and salient object detection models. The preliminary study briefly covers from the biological perspective of visual attention, including visual pathway, the theory of visual attention, to the computational model of bottom-up visual attention that generates saliency map. The study compares some models at each stage and observes whether the stage is inspired by biological architecture, concept, or behavior of human visual attention. From the study, the use of low-level features, center-surround mechanism, sparse representation, and higher-level guidance with intrinsic cues dominate the bottom-up visual attention approaches. The study also highlights the correlation between bottom-up visual attention and curiosity.
11

Abduljabbar Ali, Mohammed, Abir Jaafar Hussain, and Ahmed T. Sadiq. "Deep Learning Algorithms for Human Fighting Action Recognition." International Journal of Online and Biomedical Engineering (iJOE) 18, no. 02 (February 16, 2022): 71–87. http://dx.doi.org/10.3991/ijoe.v18i02.28019.

Abstract:
— Human action recognition using skeletons has been employed in various applications, including healthcare robots, human-computer interaction, and surveillance systems. Recently, deep learning systems have been used in various applications, such as object classification. In contrast to conventional techniques, one of the most prominent convolutional neural network deep learning algorithms extracts image features from its operations. Machine learning in computer vision applications faces many challenges, including human action recognition in real time. Despite significant improvements, videos are typically shot with at least 24 frames per second, meaning that the fastest classification technologies take time. Object detection algorithms must correctly identify and locate essential items, but they must also be speedy at prediction time to meet the real-time requirements of video processing. The fundamental goal of this research paper is to recognize the real-time state of human fighting to provide security in organizations by discovering and identifying problems through video surveillance. First, the images in the videos are investigated to locate human fight scenes using the YOLOv3 algorithm, which has been updated in this work. Our improvements to the YOLOv3 algorithm allowed us to accelerate the exploration of a group of humans in the images. The center locator feature in this algorithm was adopted as an essential indicator for measuring the safety distance between two persons. If it is less than a specific value specified in the code, they are tracked. Then, a deep sorting algorithm is used to track people. This framework is filtered to process and classify whether these two people continue to exceed the programmatically defined minimum safety distance. Finally, the content of the filter frame is categorized as combat scenes using the OpenPose technology and a trained VGG-16 algorithm, which classifies the situation as walking, hugging, or fighting. A dataset was created to train these algorithms in the three categories of walking, hugging, and fighting. The proposed methodology proved successful, exhibiting a classification accuracy for walking, hugging, and fighting of 95.0%, 87.4%, and 90.1%, respectively.
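Illustrative sketch of the safety-distance check described above: flag pairs of detected people whose bounding-box centres are closer than a threshold. The boxes would come from the modified YOLOv3 detector; the pixel threshold is an arbitrary placeholder, not the value used in the paper.

    import numpy as np
    from itertools import combinations

    def close_pairs(person_boxes, max_dist=80.0):
        """Return index pairs of people whose box centres are within max_dist pixels."""
        boxes = np.asarray(person_boxes, dtype=float)       # rows: [x1, y1, x2, y2]
        centres = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                            (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
        pairs = []
        for i, j in combinations(range(len(centres)), 2):
            if np.linalg.norm(centres[i] - centres[j]) < max_dist:
                pairs.append((i, j))   # these pairs would then be tracked and classified
        return pairs

    print(close_pairs([[0, 0, 50, 120], [60, 10, 110, 130], [400, 0, 450, 120]]))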
12

Wu, Youfu, Jun Shen, and Mo Dai. "Traffic object detections and its action analysis." Pattern Recognition Letters 26, no. 13 (October 2005): 1963–84. http://dx.doi.org/10.1016/j.patrec.2005.02.009.

13

Hadi, Namir Mohamed. "Identification Algorithm Faces and Criminal Actions." Computational Nanotechnology 9, no. 3 (September 28, 2022): 19–31. http://dx.doi.org/10.33693/2313-223x-2022-9-3-19-31.

Abstract:
Currently, there are a number of unresolved problems in the identification of images. If a person is wearing something on their face, such as a mask or glasses, or at some point part of the face is covered by clothing, hair or an object, then the video surveillance system may lose sight of the person. Identification deteriorates significantly, and recognition of a person occurs only after some time. The purpose of this work is to improve the existing methods of recognition. The paper proposes an algorithm based on the multi-cascade method and the object detection method. This algorithm is able to identify a person by the actions of a criminal nature and by the face by highlighting some parts of the face in the form of squares and rectangles using the computer vision library. As a result of testing, the algorithm showed high detection accuracy using a GPU with 16 GB of video memory.
14

Ergun, Hilal, Yusuf Caglar Akyuz, Mustafa Sert, and Jianquan Liu. "Early and Late Level Fusion of Deep Convolutional Neural Networks for Visual Concept Recognition." International Journal of Semantic Computing 10, no. 03 (September 2016): 379–97. http://dx.doi.org/10.1142/s1793351x16400158.

Abstract:
Visual concept recognition is an active research field in the last decade. Related to this attention, deep learning architectures are showing great promise in various computer vision domains including image classification, object detection, event detection and action recognition in videos. In this study, we investigate various aspects of convolutional neural networks for visual concept recognition. We analyze recent studies and different network architectures both in terms of running time and accuracy. In our proposed visual concept recognition system, we first discuss various important properties of popular convolutional network architecture under consideration. Then we describe our method for feature extraction at different levels of abstraction. We present extensive empirical information along with best practices for big data practitioners. Using these best practices we propose efficient fusion mechanisms both for single and multiple network models. We present state-of-the-art results on benchmark datasets while keeping computational costs at low level. Our results show that these state-of-the-art results can be reached without using extensive data augmentation techniques.
15

Patil, Ninad, and Vanita Agarwal. "Performance Simulation of a Traffic Sign Recognition based Neural Network on Cadence’s Tensilica Vision P6 DSP using Xtensa Xplorer IDE." WSEAS TRANSACTIONS ON COMPUTER RESEARCH 10 (March 24, 2022): 35–42. http://dx.doi.org/10.37394/232018.2022.10.5.

Abstract:
Advanced Driver Assistance System (ADAS) technology is currently in an embryonic stage. Many multinational tech companies and startups are developing a truly autonomous vehicle that will guarantee the safety and security of the passengers and other vehicles, pedestrians on roads, and roadside structures such as traffic signal poles, traffic signposts, and other structures. However, these autonomous vehicles have not been implemented on a large scale for regular use on roads currently. These autonomous vehicles perform many different object detection/recognition tasks. Examples include traffic sign recognition, lane detection, pedestrian detection. Usually, the person driving the vehicle performs these detection/recognition tasks. The main goal of such autonomous systems should be to perform these tasks in real-time. Deep learning performs these object recognition tasks with very high accuracy. The neural network is implemented on the hardware device, which does all the computation work. Different vendors have many different hardware choices that suit the client's needs. Usually, these neural networks are implemented on a CPU, DSP, GPU, FPGA, and other custom-made AI-specific hardware. The underlying processor forms a vital part of an ADAS. The CNN needs to process the incoming frames from a camera for real-time object detection/recognition tasks. Real-time processing is necessary to take appropriate actions/decisions depending on the logic embedded. Hence knowing the performance of the neural network (in terms of frames processed per second) on the underlying hardware is a significant factor in deciding the various hardware options available from different vendors, which CNN model to implement, whether the CNN model is suitable to implement on the underlying hardware depending upon the system specifications and requirement. In this paper, we trained a CNN using the transfer learning approach to recognize german traffic signs using Nvidia DIGITS web-based software and analyzed the performance of this trained CNN (in terms of frames per second) by simulating the trained CNN on Cadence's Xtensa Xplorer software by selecting Cadence's Tensilica Vision P6 DSP as an underlying processor for inference.
16

Kambala, Vijaya Kumar, and Harikiran Jonnadula. "A multi-task learning based hybrid prediction algorithm for privacy preserving human activity recognition framework." Bulletin of Electrical Engineering and Informatics 10, no. 6 (December 1, 2021): 3191–201. http://dx.doi.org/10.11591/eei.v10i6.3204.

Abstract:
There is an ever-increasing need to use computer vision devices to capture videos as part of many real-world applications. However, invading the privacy of people is a cause for concern. There is a need to protect people's privacy while videos are used purposefully based on objective functions. One such use case is human activity recognition without disclosing human identity. In this paper, we propose a multi-task learning based hybrid prediction algorithm (MTL-HPA) towards realising a privacy-preserving human activity recognition framework (PPHARF). It serves the purpose by recognizing human activities from videos while preserving the identity of the humans present in the multimedia object. The face of any person in the video is anonymized to preserve privacy, while the actions of the person remain exposed so that they can be extracted. Anonymization is achieved without losing the utility of human activity recognition. Human and face detection methods fail to reveal the identity of the persons in the video. We experimentally confirm with the joint-annotated human motion database (JHMDB) and daily action localization in YouTube (DALY) datasets that the framework recognises human activities and ensures non-disclosure of private information. Our approach is better than many traditional anonymization techniques such as noise adding, blurring, and masking.
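Illustrative sketch of the anonymization idea only (the paper's MTL-HPA is a learned multi-task framework, not the cascade-and-blur pipeline shown here): faces are hidden while body motion stays visible for activity recognition. The input file name is hypothetical.

    import cv2

    # Stock frontal-face Haar cascade shipped with OpenCV.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    frame = cv2.imread("frame.jpg")                  # hypothetical input frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:
        roi = frame[y:y + h, x:x + w]
        # A heavy Gaussian blur removes identity cues but keeps the silhouette.
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)

    cv2.imwrite("frame_anonymized.jpg", frame)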
17

Quinn, Evan, and Niall Corcoran. "Automation of Computer Vision Applications for Real-time Combat Sports Video Analysis." European Conference on the Impact of Artificial Intelligence and Robotics 4, no. 1 (November 17, 2022): 162–71. http://dx.doi.org/10.34190/eciair.4.1.930.

Abstract:
This study examines the potential applications of Human Action Recognition (HAR) in combat sports and aims to develop a prototype automation client that examines a video of a combat sports competition or training session and accurately classifies human movements. Computer Vision (CV) architectures that examine real-time video data streams are being investigated by integrating Deep Learning architectures into client-server systems for data storage and analysis using customised algorithms. The development of the automation client for training and deploying CV robots to watch and track specific chains of human actions is a central component of the project. Categorising specific chains of human actions allows for the comparison of multiple athletes' techniques as well as the identification of potential areas for improvement based on posture, accuracy, and other technical details, which can be used as an aid to improve athlete efficiency. The automation client will also be developed for the purpose of scoring, with a focus on the automation of the CV model to analyse and score a competition using a specific ruleset. The model will be validated by comparing performance and accuracy to that of combat sports experts. The primary research domains are CV, automation, robotics, combat sports, and decision science. Decision science is a set of quantitative techniques used to assist people to make decisions. The creation of a new automation client may contribute to the development of more efficient machine learning and CV applications in areas such as process efficiency, which improves user experience, workload management to reduce wait times, and run-time optimisation. This study found that real-time object detection and tracking can be combined with real-time pose estimation to generate performance statistics from a combat sports athlete's movements in a video.
18

Fiedler, Marc-André, Philipp Werner, Aly Khalifa, and Ayoub Al-Hamadi. "SFPD: Simultaneous Face and Person Detection in Real-Time for Human–Robot Interaction." Sensors 21, no. 17 (September 2, 2021): 5918. http://dx.doi.org/10.3390/s21175918.

Abstract:
Face and person detection are important tasks in computer vision, as they represent the first component in many recognition systems, such as face recognition, facial expression analysis, body pose estimation, face attribute detection, or human action recognition. Thereby, their detection rate and runtime are crucial for the performance of the overall system. In this paper, we combine both face and person detection in one framework with the goal of reaching a detection performance that is competitive to the state of the art of lightweight object-specific networks while maintaining real-time processing speed for both detection tasks together. In order to combine face and person detection in one network, we applied multi-task learning. The difficulty lies in the fact that no datasets are available that contain both face as well as person annotations. Since we did not have the resources to manually annotate the datasets, as it is very time-consuming and automatic generation of ground truths results in annotations of poor quality, we solve this issue algorithmically by applying a special training procedure and network architecture without the need of creating new labels. Our newly developed method called Simultaneous Face and Person Detection (SFPD) is able to detect persons and faces with 40 frames per second. Because of this good trade-off between detection performance and inference time, SFPD represents a useful and valuable real-time framework especially for a multitude of real-world applications such as, e.g., human–robot interaction.
19

Maraghi, Vali Ollah, and Karim Faez. "Scaling Human-Object Interaction Recognition in the Video through Zero-Shot Learning." Computational Intelligence and Neuroscience 2021 (June 9, 2021): 1–15. http://dx.doi.org/10.1155/2021/9922697.

Abstract:
Recognition of human activities is an essential field in computer vision. The most human activity consists of the interaction between humans and objects. Many successful works have been done on human-object interaction (HOI) recognition and achieved acceptable results in recent years. Still, they are fully supervised and need to train labeled data for all HOIs. Due to the enormous space of human-object interactions, listing and providing the training data for all possible categories is costly and impractical. We propose an approach for scaling human-object interaction recognition in video data through the zero-shot learning technique to solve this problem. Our method recognizes a verb and an object from the video and makes an HOI class. Recognition of the verbs and objects instead of HOIs allows identifying a new combination of verbs and objects. So, a new HOI class can be identified, which is not seen by the recognizer system. We introduce a neural network architecture that can understand and represent the video data. The proposed system learns verbs and objects from available training data at the training phase and can identify the verb-object pairs in a video at test time. So, the system can identify the HOI class with different combinations of objects and verbs. Also, we propose to use lateral information for combining the verbs and the objects to make valid verb-object pairs. It helps to prevent the detection of rare and probably wrong HOIs. The lateral information comes from word embedding techniques. Furthermore, we propose a new feature aggregation method for aggregating extracted high-level features from video frames before feeding them to the classifier. We illustrate that this feature aggregation method is more effective for actions that include multiple subactions. We evaluated our system by recently introduced Charades challengeable dataset, which has lots of HOI categories in videos. We show that our proposed system can detect unseen HOI classes in addition to the acceptable recognition of seen types. Therefore, the number of classes identifiable by the system is greater than the number of classes used for training.
20

Zhao, XianPin. "Research on Athlete Behavior Recognition Technology in Sports Teaching Video Based on Deep Neural Network." Computational Intelligence and Neuroscience 2022 (January 5, 2022): 1–13. http://dx.doi.org/10.1155/2022/7260894.

Abstract:
In recent years, due to their simple design and good recognition performance, deep learning methods have attracted more and more researchers' attention in computer vision tasks. Aiming at the problem of athlete behavior recognition in mass sports teaching video, and inspired by the successful application of deep neural networks based on two-dimensional convolution in image detection and recognition, this paper takes depth video as the research object and uses clipped frame sequences as the input to a deep neural network model. A deep neural network based on three-dimensional convolution is constructed to automatically learn the temporal and spatial characteristics of athletes' behavior. The training results on the UTKinect-Action3D and MSR-Action3D public datasets show that the algorithm can correctly detect athletes' behaviors and actions and shows stronger recognition ability compared with using images without frame clipping, which effectively improves the recognition effect on physical education teaching videos.
21

VELOSO, MANUELA, NICHOLAS ARMSTRONG-CREWS, SONIA CHERNOVA, ELISABETH CRAWFORD, COLIN MCMILLEN, MAAYAN ROTH, DOUGLAS VAIL, and STEFAN ZICKLER. "A TEAM OF HUMANOID GAME COMMENTATORS." International Journal of Humanoid Robotics 05, no. 03 (September 2008): 457–80. http://dx.doi.org/10.1142/s0219843608001479.

Abstract:
We present a team of two humanoid robot commentators for AIBO robot soccer games. The two humanoids stand by the side lines of the playing field, autonomously observe the game, wirelessly listen to a "game controller" computer, recognize events, and select announcing actions that may require coordination with each other. Given the large degree of uncertainty and dynamics of the robot soccer games, we further introduce a "Puppet Master" control that allows humans to intervene, prompting the robots to commentate an event if previously undefined or undetected. The robots recognize events based on input from these three sources, namely own and shared vision, game controller, and occasional Puppet Master. We present the two-humanoid behavioral architecture and the vision-based event recognition, including a SIFT-based vision processing algorithm that allows for the detection of multiple similar objects, such as the identical shaped robot players. We introduce the commentating algorithm that probabilistically selects a commentating action from a set of weighted actions corresponding to a detected event. The probabilistic selection uses the game history and updates the action weights to effectively avoid repetition of comments to enable entertainment. Our work, corresponding to a fully implemented system, CMCast, with two QRIO robots, contributes a team of two humanoids fully executing a challenging observation, modeling, coordination, and reporting task.
22

Jaiswal, Ashish, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, and Fillia Makedon. "A Survey on Contrastive Self-Supervised Learning." Technologies 9, no. 1 (December 28, 2020): 2. http://dx.doi.org/10.3390/technologies9010002.

Abstract:
Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and use the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.
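The contrastive objective surveyed above, pulling two augmented views of the same sample together while pushing other samples apart, is commonly implemented as an NT-Xent/InfoNCE loss. A minimal PyTorch sketch, assuming an encoder has already produced embeddings for the two views; the batch size, embedding dimension, and temperature are placeholders.

    import torch
    import torch.nn.functional as F

    def nt_xent_loss(z1, z2, temperature=0.5):
        """NT-Xent contrastive loss over a batch of paired embeddings z1, z2: (N, D)."""
        n = z1.size(0)
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D)
        sim = z @ z.t() / temperature                        # cosine similarities
        sim.fill_diagonal_(float("-inf"))                    # ignore self-similarity
        # The positive for sample i is its augmented view at index (i + n) mod 2n.
        targets = (torch.arange(2 * n, device=z.device) + n) % (2 * n)
        return F.cross_entropy(sim, targets)

    z1, z2 = torch.randn(8, 128), torch.randn(8, 128)   # embeddings of two views
    print(nt_xent_loss(z1, z2).item())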
23

Zhao, Qi, Boxue Zhang, Shuchang Lyu, Hong Zhang, Daniel Sun, Guoqiang Li, and Wenquan Feng. "A CNN-SIFT Hybrid Pedestrian Navigation Method Based on First-Person Vision." Remote Sensing 10, no. 8 (August 5, 2018): 1229. http://dx.doi.org/10.3390/rs10081229.

Abstract:
The emergence of new wearable technologies, such as action cameras and smart glasses, has driven the use of the first-person perspective in computer applications. This field is now attracting the attention and investment of researchers aiming to develop methods to process first-person vision (FPV) video. The current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives, such as object detection, activity recognition, user–machine interaction, etc. FPV-based navigation is necessary in some special areas, where Global Position System (GPS) or other radio-wave strength methods are blocked, and is especially helpful for visually impaired people. In this paper, we propose a hybrid structure with a convolutional neural network (CNN) and local image features to achieve FPV pedestrian navigation. A novel end-to-end trainable global pooling operator, called AlphaMEX, has been designed to improve the scene classification accuracy of CNNs. A scale-invariant feature transform (SIFT)-based tracking algorithm is employed for movement estimation and trajectory tracking of the person through each frame of FPV images. Experimental results demonstrate the effectiveness of the proposed method. The top-1 error rate of the proposed AlphaMEX-ResNet outperforms the original ResNet (k = 12) by 1.7% on the ImageNet dataset. The CNN-SIFT hybrid pedestrian navigation system reaches 0.57 m average absolute error, which is an adequate accuracy for pedestrian navigation. Both positions and movements can be well estimated by the proposed pedestrian navigation algorithm with a single wearable camera.
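Illustrative sketch of the SIFT matching that underlies the tracking stage; the frame file names are hypothetical, the ratio-test threshold is a conventional value, and the AlphaMEX pooling and trajectory estimation are not reproduced here.

    import cv2

    img1 = cv2.imread("fpv_t0.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical frame pair
    img2 = cv2.imread("fpv_t1.jpg", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Keep matches that pass Lowe's ratio test.
    matcher = cv2.BFMatcher()
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])

    # Displacements of matched keypoints approximate the frame-to-frame motion.
    shifts = [(kp2[m.trainIdx].pt[0] - kp1[m.queryIdx].pt[0],
               kp2[m.trainIdx].pt[1] - kp1[m.queryIdx].pt[1]) for m in good]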
24

Mauri, Antoine, Redouane Khemmar, Benoit Decoux, Madjid Haddad, and Rémi Boutteau. "Real-Time 3D Multi-Object Detection and Localization Based on Deep Learning for Road and Railway Smart Mobility." Journal of Imaging 7, no. 8 (August 12, 2021): 145. http://dx.doi.org/10.3390/jimaging7080145.

Abstract:
For smart mobility, autonomous vehicles, and advanced driver-assistance systems (ADASs), perception of the environment is an important task in scene analysis and understanding. Better perception of the environment allows for enhanced decision making, which, in turn, enables very high-precision actions. To this end, we introduce in this work a new real-time deep learning approach for 3D multi-object detection for smart mobility not only on roads, but also on railways. To obtain the 3D bounding boxes of the objects, we modified a proven real-time 2D detector, YOLOv3, to predict 3D object localization, object dimensions, and object orientation. Our method has been evaluated on KITTI’s road dataset as well as on our own hybrid virtual road/rail dataset acquired from the video game Grand Theft Auto (GTA) V. The evaluation of our method on these two datasets shows good accuracy, but more importantly that it can be used in real-time conditions, in road and rail traffic environments. Through our experimental results, we also show the importance of the accuracy of prediction of the regions of interest (RoIs) used in the estimation of 3D bounding box parameters.
25

Guo, Yongping, Ying Chen, Jianzhi Deng, Shuiwang Li, and Hui Zhou. "Identity-Preserved Human Posture Detection in Infrared Thermal Images: A Benchmark." Sensors 23, no. 1 (December 22, 2022): 92. http://dx.doi.org/10.3390/s23010092.

Abstract:
Human pose estimation has a variety of real-life applications, including human action recognition, AI-powered personal trainers, robotics, motion capture and augmented reality, gaming, and video surveillance. However, most current human pose estimation systems are based on RGB images, which do not seriously take into account personal privacy. Although identity-preserved algorithms are very desirable when human pose estimation is applied to scenarios where personal privacy does matter, developing human pose estimation algorithms based on identity-preserved modalities, such as thermal images concerned here, is very challenging due to the limited amount of training data currently available and the fact that infrared thermal images, unlike RGB images, lack rich texture cues which makes annotating training data itself impractical. In this paper, we formulate a new task with privacy protection that lies between human detection and human pose estimation by introducing a benchmark for IPHPDT (i.e., Identity-Preserved Human Posture Detection in Thermal images). This task has a threefold novel purpose: the first is to establish an identity-preserved task with thermal images; the second is to achieve more information other than the location of persons as provided by human detection for more advanced computer vision applications; the third is to avoid difficulties in collecting well-annotated data for human pose estimation in thermal images. The presented IPHPDT dataset contains four types of human postures, consisting of 75,000 images well-annotated with axis-aligned bounding boxes and postures of the persons. Based on this well-annotated IPHPDT dataset and three state-of-the-art algorithms, i.e., YOLOF (short for You Only Look One-level Feature), YOLOX (short for Exceeding YOLO Series in 2021) and TOOD (short for Task-aligned One-stage Object Detection), we establish three baseline detectors, called IPH-YOLOF, IPH-YOLOX, and IPH-TOOD. In the experiments, three baseline detectors are used to recognize four infrared human postures, and the mean average precision can reach 70.4%. The results show that the three baseline detectors can effectively perform accurate posture detection on the IPHPDT dataset. By releasing IPHPDT, we expect to encourage more future studies into human posture detection in infrared thermal images and draw more attention to this challenging task.
26

Rezaei, Mahdi, and Mohsen Azarmi. "DeepSOCIAL: Social Distancing Monitoring and Infection Risk Assessment in COVID-19 Pandemic." Applied Sciences 10, no. 21 (October 26, 2020): 7514. http://dx.doi.org/10.3390/app10217514.

Abstract:
Social distancing is a recommended solution by the World Health Organisation (WHO) to minimise the spread of COVID-19 in public places. The majority of governments and national health authorities have set the 2-m physical distancing as a mandatory safety measure in shopping centres, schools and other covered areas. In this research, we develop a hybrid Computer Vision and YOLOv4-based Deep Neural Network (DNN) model for automated people detection in the crowd in indoor and outdoor environments using common CCTV security cameras. The proposed DNN model in combination with an adapted inverse perspective mapping (IPM) technique and SORT tracking algorithm leads to a robust people detection and social distancing monitoring. The model has been trained against two most comprehensive datasets by the time of the research—the Microsoft Common Objects in Context (MS COCO) and Google Open Image datasets. The system has been evaluated against the Oxford Town Centre dataset (including 150,000 instances of people detection) with superior performance compared to three state-of-the-art methods. The evaluation has been conducted in challenging conditions, including occlusion, partial visibility, and under lighting variations with the mean average precision of 99.8% and the real-time speed of 24.1 fps. We also provide an online infection risk assessment scheme by statistical analysis of the spatio-temporal data from people’s moving trajectories and the rate of social distancing violations. We identify high-risk zones with the highest possibility of virus spread and infection. This may help authorities to redesign the layout of a public place or to take precaution actions to mitigate high-risk zones. The developed model is a generic and accurate people detection and tracking solution that can be applied in many other fields such as autonomous vehicles, human action recognition, anomaly detection, sports, crowd analysis, or any other research areas where the human detection is in the centre of attention.
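The inverse perspective mapping (IPM) step mentioned above warps image coordinates onto a bird's-eye-view ground plane so that pixel distances approximate physical distances. A minimal OpenCV sketch with placeholder calibration points (not values from the paper):

    import cv2
    import numpy as np

    # Four ground-plane points in the image and their bird's-eye-view targets (placeholders).
    src = np.float32([[420, 600], [880, 600], [1100, 950], [200, 950]])
    dst = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])
    H = cv2.getPerspectiveTransform(src, dst)

    # Feet positions of detected people (e.g., bottom-centre of their boxes).
    feet = np.float32([[[500, 700]], [[650, 720]]])       # shape (N, 1, 2)
    ground = cv2.perspectiveTransform(feet, H).reshape(-1, 2)

    # In the bird's-eye view, Euclidean distance approximates real-world separation.
    print(np.linalg.norm(ground[0] - ground[1]))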
27

Zheng, Zepei. "Human Gesture Recognition in Computer Vision Research." SHS Web of Conferences 144 (2022): 03011. http://dx.doi.org/10.1051/shsconf/202214403011.

Abstract:
Human gesture recognition is a popular issue in computer vision research, since it provides the technological expertise required to advance the interaction between people and computers, virtual environments, smart surveillance, motion tracking, as well as other domains. Extraction of the human skeleton is a rather typical gesture recognition approach using existing technologies based on two-dimensional human gesture detection. Likewise, it cannot be overlooked that objects in the surrounding environment give some information about human gestures. To semantically recognize the posture of the human body, the logic system presented in this research integrates the components recognized in the visual environment alongside the human skeletal position. In principle, it can improve the precision of recognizing postures and semantically represent people's actions. As such, the paper suggests a potential notion for recognizing human gestures, as well as increasing the quantity of information offered through analysis of images to enhance interaction between humans and computers.
28

Bello, R. W., A. S. A. Mohamed, A. Z. Talib, D. A. Olubummo, and O. C. Enuma. "Computer vision-based techniques for cow object recognition." IOP Conference Series: Earth and Environmental Science 858, no. 1 (September 1, 2021): 012008. http://dx.doi.org/10.1088/1755-1315/858/1/012008.

Abstract:
The productivity of livestock farming depends on the welfare of the livestock. This can be achieved by physically and constantly monitoring their behaviors and activities by human experts. However, the degree of having high accuracy and consistency with manual monitoring in a commercial farm is herculean, and in most cases impractical. Hence, there is a need for a method that can overcome the challenges. Proposed in this paper, therefore, is the cow detection and monitoring method using computer vision techniques. The proposed method is capable of tracking and identifying cow objects in video experiments, thereby actualizing precision livestock farming. The method generates reasonable results when compared to other methods.
29

Liu, Haitao, Yuge Li, and Dongchang Liu. "Object detection and recognition system based on computer vision analysis." Journal of Physics: Conference Series 1976, no. 1 (July 1, 2021): 012024. http://dx.doi.org/10.1088/1742-6596/1976/1/012024.

30

Hoshino, Satoshi, and Kyohei Niimura. "Robot Vision System for Human Detection and Action Recognition." Journal of Advanced Computational Intelligence and Intelligent Informatics 24, no. 3 (May 20, 2020): 346–56. http://dx.doi.org/10.20965/jaciii.2020.p0346.

Abstract:
Mobile robots equipped with camera sensors are required to perceive humans and their actions for safe autonomous navigation. For simultaneous human detection and action recognition, the real-time performance of the robot vision is an important issue. In this paper, we propose a robot vision system in which original images captured by a camera sensor are described by the optical flow. These images are then used as inputs for the human and action classifications. For the image inputs, two classifiers based on convolutional neural networks are developed. Moreover, we describe a novel detector (a local search window) for clipping partial images around the target human from the original image. Since the camera sensor moves together with the robot, the camera movement has an influence on the calculation of optical flow in the image, which we address by further modifying the optical flow for changes caused by the camera movement. Through the experiments, we show that the robot vision system can detect humans and recognize the action in real time. Furthermore, we show that a moving robot can achieve human detection and action recognition by modifying the optical flow.
31

Suyadnya, I. Made Arsa, and Duman Care Khrisne. "Residual Neural Network Model for Detecting Waste Disposing Action in Images." Journal of Electrical, Electronics and Informatics 5, no. 2 (September 27, 2021): 52. http://dx.doi.org/10.24843/jeei.2021.v05.i02.p03.

Abstract:
Waste in general has become a major problem for people around the world. Evidence internationally shows that everyone, or nearly everyone, admits to polluting at some point, with the majority of people littering at least occasionally. This research wants to overcome these problems, by utilizing computer vision and deep learning approaches. This research was conducted to detect the actions carried out by humans in the activities/actions of disposing of waste in an image. This is useful to provide better information for research on better waste disposal behavior than before. We use a Convolutional Neural Network model with a Residual Neural Network architecture to detect the types of activities that objects perform in an image. The result is an artificial neural network model that can label the activities that occur in the input image (scene recognition). This model has been able to carry out the recognition process with an accuracy of 88% with an F1-Score of 0.87.
32

Jiang, Hairong, Juan P. Wachs, and Bradley S. Duerstock. "Integrated vision-based system for efficient, semi-automated control of a robotic manipulator." International Journal of Intelligent Computing and Cybernetics 7, no. 3 (August 5, 2014): 253–66. http://dx.doi.org/10.1108/ijicc-09-2013-0042.

Abstract:
Purpose – The purpose of this paper is to develop an integrated, computer vision-based system to operate a commercial wheelchair-mounted robotic manipulator (WMRM). In addition, a gesture recognition interface system was developed specially for individuals with upper-level spinal cord injuries including object tracking and face recognition to function as an efficient, hands-free WMRM controller. Design/methodology/approach – Two Kinect® cameras were used synergistically to perform a variety of simple object retrieval tasks. One camera was used to interpret the hand gestures and locate the operator's face for object positioning, and then send those as commands to control the WMRM. The other sensor was used to automatically recognize different daily living objects selected by the subjects. An object recognition module employing the Speeded Up Robust Features algorithm was implemented and recognition results were sent as a commands for “coarse positioning” of the robotic arm near the selected object. Automatic face detection was provided as a shortcut enabling the positing of the objects close by the subject's face. Findings – The gesture recognition interface incorporated hand detection, tracking and recognition algorithms, and yielded a recognition accuracy of 97.5 percent for an eight-gesture lexicon. Tasks’ completion time were conducted to compare manual (gestures only) and semi-manual (gestures, automatic face detection, and object recognition) WMRM control modes. The use of automatic face and object detection significantly reduced the completion times for retrieving a variety of daily living objects. Originality/value – Integration of three computer vision modules were used to construct an effective and hand-free interface for individuals with upper-limb mobility impairments to control a WMRM.
33

Singh, Baljeet, Nitin Kumar, Irshad Ahmed, and Karun Yadav. "Real-Time Object Detection Using Deep Learning." International Journal for Research in Applied Science and Engineering Technology 10, no. 5 (May 31, 2022): 3159–60. http://dx.doi.org/10.22214/ijraset.2022.42820.

Abstract:
The computer vision field known as real-time object detection is large, dynamic, and complex. Object localization refers to finding one object in an image, while object detection refers to finding multiple objects in an image. In digital photos and videos, it identifies objects of semantic classes. Feature tracking, video surveillance, pedestrian detection, census applications, self-driving cars, face recognition, sports tracking, and many other applications use real-time object detection. Convolutional Neural Networks are a deep learning tool used with OpenCV (Open Source Computer Vision), a library of basic computer vision programming routines. Computer vision, deep learning, and convolutional neural networks are some of the key terms used in this paper.
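Illustrative sketch of OpenCV-based real-time detection of the kind discussed above: a pre-trained Darknet/YOLO model run through OpenCV's dnn module. The config, weights, and image file names are assumptions, and post-processing is reduced to a single confidence filter.

    import cv2
    import numpy as np

    # Hypothetical Darknet config/weights; any model in this format would work.
    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

    frame = cv2.imread("street.jpg")                 # hypothetical test image
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    h, w = frame.shape[:2]
    for out in outputs:
        for det in out:              # det: [cx, cy, bw, bh, objectness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            if scores[class_id] > 0.5:
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                print(class_id, (int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)))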
34

West, Geoff A. W. "Assessing Feature Importance in the Context of Object Recognition." International Journal of Pattern Recognition and Artificial Intelligence 11, no. 01 (February 1997): 49–77. http://dx.doi.org/10.1142/s0218001497000044.

Abstract:
A popular paradigm in computer vision is based on dividing the vision problem into three stages, namely segmentation, feature extraction and recognition. For example, edge detection followed by line detection followed by planar object recognition. It can be argued that each of these stages needs to be thoroughly described to enable vision systems to be configured with predictable performance. However, an alternative view is that the performance of each stage is not in itself important as long as the overall performance is acceptable. This paper discusses feature performance, concentrating on the assessment of edge-based feature detection and object recognition. Evaluation techniques are discussed for assessing arc and line detection algorithms and for features in the context of verification and pose refinement strategies. These techniques can then be used for the design and integration of indexing and verification stages of object recognition. A theme of the paper is the need to assess feature extraction in the context of the chosen task.
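The first two stages of the segmentation / feature extraction / recognition paradigm discussed above (edge detection, then line detection) can be illustrated with standard OpenCV calls; the thresholds are typical illustrative values, not recommendations from the paper.

    import cv2
    import numpy as np

    img = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input image

    # Stage 1: segmentation via edge detection.
    edges = cv2.Canny(img, 50, 150)

    # Stage 2: feature extraction - straight line segments from the edge map.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                            minLineLength=30, maxLineGap=5)

    # Stage 3 (recognition) would match these line features against object models;
    # the quality of the lines found here bounds what the recognizer can achieve.
    print(0 if lines is None else len(lines), "line segments")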
35

Wang, Jinding, Haifeng Hu, and Xinlong Lu. "ADN for object detection." IET Computer Vision 14, no. 2 (January 23, 2020): 65–72. http://dx.doi.org/10.1049/iet-cvi.2018.5651.

36

Rehman, Amjad, Tanzila Saba, Muhammad Zeeshan Khan, Robertas Damaševičius, and Saeed Ali Bahaj. "Internet-of-Things-Based Suspicious Activity Recognition Using Multimodalities of Computer Vision for Smart City Security." Security and Communication Networks 2022 (October 5, 2022): 1–12. http://dx.doi.org/10.1155/2022/8383461.

Abstract:
Automatic human activity recognition is one of the milestones of smart city surveillance projects. Human activity detection and recognition aim to identify the activities based on the observations that are being performed by the subject. Hence, vision-based human activity recognition systems have a wide scope in video surveillance, health care systems, and human-computer interaction. Currently, the world is moving towards a smart and safe city concept. Automatic human activity recognition is the major challenge of smart city surveillance. The proposed research work employed fine-tuned YOLO-v4 for activity detection, whereas for classification purposes, 3D-CNN has been implemented. Besides the classification, the presented research model also leverages human-object interaction with the help of intersection over union (IOU). An Internet of Things (IoT) based architecture is implemented to take efficient and real-time decisions. The dataset of exploit classes has been taken from the UCF-Crime dataset for activity recognition. At the same time, the dataset extracted from MS-COCO for suspicious object detection is involved in human-object interaction. This research is also applied to human activity detection and recognition in the university premises for real-time suspicious activity detection and automatic alerts. The experiments have exhibited that the proposed multimodal approach achieves remarkable activity detection and recognition accuracy.
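The intersection-over-union cue used above to link a detected person with a suspicious object can be written in a few lines; the overlap threshold below is a placeholder, not the value used in the paper.

    def iou(a, b):
        """IoU of two [x1, y1, x2, y2] boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1]) +
                 (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    def interacting(person_box, object_box, thresh=0.1):
        """Flag a possible human-object interaction when the boxes overlap enough."""
        return iou(person_box, object_box) >= thresh

    print(interacting([100, 50, 220, 400], [180, 300, 260, 380]))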
37

Modwel, Garv, Anu Mehra, Nitin Rakesh, and K. K. Mishra. "Advanced Object Detection in Bio-Medical X-Ray Images for Anomaly Detection and Recognition." International Journal of E-Health and Medical Communications 12, no. 2 (July 2021): 93–110. http://dx.doi.org/10.4018/ijehmc.2021030106.

Abstract:
The human vision system is mimicked in the format of videos and images in the area of computer vision. Just as humans can process their memories, video and images can be processed and perceived with the help of computer vision technology. There is a broad range of fields with great speculation and concept building in the application of computer vision, including automobile, biomedical, and space research. The case study in this manuscript enlightens one about the innovation and future scope possibilities that can start a new era in the biomedical image-processing sector. A pre-surgical investigation can be pursued with the help of the proposed technology, which will enable doctors to analyse situations with deeper insight. There are different types of biomedical imaging, such as magnetic resonance imaging (MRI), computerized tomographic (CT) scans, and x-ray imaging. The focused arena of the proposed research is x-ray imaging. It is always error-prone for a human to do an eyeball check when it comes to fine detail, and the same applies to doctors; subsequently, they need different equipment and related technologies. The methodology proposed in this manuscript analyses details that may be missed by an expert doctor. The input to the algorithm is an x-ray image; eventually, the output of the process is a label on the corresponding objects in the test image. The tool used in the process also mimics the human brain's neuron system. The proposed method uses a convolutional neural network to decide on the labels of the objects in the image it interprets. After some pre-processing of the x-ray images, the neural network receives the input to achieve efficient performance. The result analysis gives considerable performance in terms of a confusion factor that is represented as a percentage. At the end of the manuscript, future possibilities are traced out to guide further research.
APA, Harvard, Vancouver, ISO, and other styles
38

Dhaigude, Santosh. "Computer Vision Based Virtual Sketch Using Detection." International Journal for Research in Applied Science and Engineering Technology 10, no. 1 (January 31, 2022): 264–68. http://dx.doi.org/10.22214/ijraset.2022.39814.

Full text
Abstract:
In today's world, during the pandemic, online learning is the only avenue through which one can learn. Online learning makes students more curious about knowledge, so they choose their own learning path. But considering academics, since they have to pass a given course or exam, they need to make time to study and be disciplined about their dedication. There are also many barriers to online learning: students' grasping power is reduced because every student used to rely on their teacher and offline classes. Virtual writing and control systems have been a challenging research area in image processing and pattern recognition in recent years. They contribute greatly to the advancement of automation and can improve the interface between man and machine in numerous applications. Several research works have focused on new techniques and methods that reduce processing time while providing higher recognition accuracy. Given real-time webcam data, this Jamboard-like Python application uses the OpenCV library to track an object of interest (a human palm/finger in this case) and allows the user to draw by moving the finger, which makes it both fun and interesting to draw simple things. Keywords: detection, hand landmark, keypoints, computer vision, OpenCV
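A minimal sketch of the fingertip-drawing idea using OpenCV together with MediaPipe hand landmarks (MediaPipe is an assumption here, since the abstract only mentions hand landmarks and keypoints); the drawing color and radius are arbitrary:

```python
# Minimal sketch: track the index fingertip with MediaPipe hand landmarks and
# paint onto a canvas overlaid on the webcam feed. Library choice is assumed.
import cv2
import mediapipe as mp
import numpy as np

hands = mp.solutions.hands.Hands(max_num_hands=1)
cap = cv2.VideoCapture(0)
canvas = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if canvas is None:
        canvas = np.zeros_like(frame)
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        tip = result.multi_hand_landmarks[0].landmark[8]   # index fingertip
        h, w = frame.shape[:2]
        cv2.circle(canvas, (int(tip.x * w), int(tip.y * h)), 5, (0, 0, 255), -1)
    cv2.imshow("virtual sketch", cv2.add(frame, canvas))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```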
APA, Harvard, Vancouver, ISO, and other styles
39

Cahyadi, Septian, Febri Damatraseta, and Lodryck Lodefikus S. "Comparative Analysis Of Efficient Image Segmentation Technique For Text Recognition And Human Skin Recognition." Jurnal Informatika Kesatuan 1, no. 1 (July 13, 2021): 81–90. http://dx.doi.org/10.37641/jikes.v1i1.775.

Full text
Abstract:
Computer vision and pattern recognition is one of the most interesting research subjects in computer science, especially for reading or recognizing objects in real time from a camera device. Object detection spans a wide range of segments; in this study we try to find the better methodologies for detecting text and human skin. This study aims to develop computer vision technology that will be used to help people with disabilities, especially the illiterate (tuna aksara) and the deaf (penyandang tuli), to recognize and learn the letters of the alphabet (A-Z). Based on our research, the best method for text recognition is the Convolutional Neural Network, with accuracy reaching 93%; the next best results were obtained with the OCR method, which reached 98% on license-plate reading and 88% with stable image capture, good lighting conditions, and a standard book font. Meanwhile, the best method for detecting human skin is skin color segmentation in the CIELab color space, with an accuracy of 96.87%, while classification using a Convolutional Neural Network (CNN) reaches an accuracy of 98%. Keywords: Computer Vision, Segmentation, Object Recognition, Text Recognition, Skin Color Detection, Motion Detection, Disability Application
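A minimal sketch of skin segmentation in the CIELab color space with OpenCV; the file name and the Lab bounds are rough illustrative values, not those reported in the paper:

```python
# Minimal sketch: threshold an image in OpenCV's 8-bit Lab space to isolate
# skin-colored pixels. Input file and the L/a/b bounds are assumptions.
import cv2
import numpy as np

image = cv2.imread("hand.jpg")                       # hypothetical input image
lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)

lower = np.array([20, 135, 130], dtype=np.uint8)     # approximate skin range
upper = np.array([250, 175, 175], dtype=np.uint8)
mask = cv2.inRange(lab, lower, upper)

skin_only = cv2.bitwise_and(image, image, mask=mask)
cv2.imwrite("skin_mask.png", mask)
cv2.imwrite("skin_only.png", skin_only)
```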
APA, Harvard, Vancouver, ISO, and other styles
40

Terada, Kazunori, Takayuki Nakamura, Hideaki Takeda, and Tsukasa Ogasawara. "Embodiment-Based Object Recognition for Vision-Based Mobile Agents." Journal of Robotics and Mechatronics 13, no. 1 (February 20, 2001): 88–95. http://dx.doi.org/10.20965/jrm.2001.p0088.

Full text
Abstract:
In this paper, we propose a new architecture for object recognition based on the concept of "embodiment" as a primitive function for a cognitive robot. We define the term "embodiment" as the extent of the agent itself, its locomotive ability, and its sensors. Based on this concept, an object is represented by reaching action paths, which correspond to a set of sequences of movements by the agent for reaching the object. Such behavior is acquired by trial and error based on visual and tactile information. Visual information is used to obtain a sensorimotor mapping, which represents the relationship between the change of an object's appearance and the movement of the agent. Tactile information is used to evaluate the change in the physical condition of the object caused by such movement. By these means, the agent can recognize an object regardless of its position and orientation in the environment. To demonstrate the feasibility of our method, we detail experimental results from computer simulation.
APA, Harvard, Vancouver, ISO, and other styles
41

Heikel, Edvard, and Leonardo Espinosa-Leal. "Indoor Scene Recognition via Object Detection and TF-IDF." Journal of Imaging 8, no. 8 (July 26, 2022): 209. http://dx.doi.org/10.3390/jimaging8080209.

Full text
Abstract:
Indoor scene recognition and semantic information can be helpful for social robots. Recently, in the field of indoor scene recognition, researchers have incorporated object-level information and shown improved performances. This paper demonstrates that scene recognition can be performed solely using object-level information in line with these advances. A state-of-the-art object detection model was trained to detect objects typically found in indoor environments and then used to detect objects in scene data. These predicted objects were then used as features to predict room categories. This paper successfully combines approaches conventionally used in computer vision and natural language processing (YOLO and TF-IDF, respectively). These approaches could be further helpful in the field of embodied research and dynamic scene classification, which we elaborate on.
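A minimal sketch of the object-detection-plus-TF-IDF idea described above: the object labels detected in each image are treated as a "document", vectorized with TF-IDF, and fed to a room classifier. The detection lists, room labels, and classifier choice below are made-up examples, not the paper's data or pipeline:

```python
# Minimal sketch: scene classification from detected object labels via TF-IDF.
# Object lists and room labels are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each string is the space-joined list of objects a detector reported per image.
detected_objects = [
    "bed pillow lamp wardrobe",
    "oven sink refrigerator kettle",
    "sofa tv remote coffee_table",
    "bed lamp curtain",
]
rooms = ["bedroom", "kitchen", "living_room", "bedroom"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(detected_objects, rooms)
print(clf.predict(["sink oven microwave"]))  # -> likely 'kitchen'
```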
APA, Harvard, Vancouver, ISO, and other styles
42

Meisels, Amnon, and Ronen Versano. "Token-textured object detection by pyramids." Image and Vision Computing 10, no. 1 (January 1992): 55–62. http://dx.doi.org/10.1016/0262-8856(92)90084-g.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Zhang, Hongming, Wen Gao, Xilin Chen, and Debin Zhao. "Object detection using spatial histogram features." Image and Vision Computing 24, no. 4 (April 2006): 327–41. http://dx.doi.org/10.1016/j.imavis.2005.11.010.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Laptev, Ivan. "Improving object detection with boosted histograms." Image and Vision Computing 27, no. 5 (April 2009): 535–44. http://dx.doi.org/10.1016/j.imavis.2008.08.010.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Meghana, K. S. "Face Sketch Recognition Using Computer Vision." International Journal for Research in Applied Science and Engineering Technology 9, no. VII (July 25, 2021): 2005–9. http://dx.doi.org/10.22214/ijraset.2021.36806.

Full text
Abstract:
Nowadays, the need for technologies for the identification, detection, and recognition of suspects has increased. One of the most common biometric techniques is face recognition, since the face is the most convenient way for people to identify each other. Understanding how humans recognize face sketches drawn by artists is of significant value to both criminal investigators and forensic researchers in computer vision. However, studies show that hand-drawn face sketches are still very limited in terms of artists and number of sketches, because after an incident a forensic artist prepares a sketch based on the description provided by an eyewitness. Sometimes a suspect uses a special mask to hide common facial features such as the nose, eyes, lips, and skin color, but the outline features of facial biometrics can never be hidden. Here we concentrate on specific facial geometric features that can be used to calculate similarity ratios between a template photograph database and the forensic sketches. The project describes the design of a system for face sketch recognition using computer vision approaches such as the Discrete Cosine Transform (DCT) and the Local Binary Pattern Histogram (LBPH) algorithm, together with a supervised machine learning model, the Support Vector Machine (SVM), for face recognition. Tkinter is the standard GUI library for Python; combined with Tkinter, Python provides a fast and easy way to create GUI applications, as Tkinter offers a powerful object-oriented interface to the Tk GUI toolkit.
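A minimal sketch of the LBPH recognition step with OpenCV's contrib module (cv2.face requires opencv-contrib-python); file names and label ids are placeholders, and the DCT features and SVM stage described in the abstract are omitted:

```python
# Minimal sketch: train an LBPH recognizer on photograph templates and query it
# with a forensic sketch. Filenames and labels are placeholders.
import cv2
import numpy as np

# Grayscale training photographs with integer identity labels.
photos = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in ["id0.png", "id1.png"]]
labels = np.array([0, 1])

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(photos, labels)

# Query with a sketch image (assumed pre-aligned to the same size).
sketch = cv2.imread("sketch.png", cv2.IMREAD_GRAYSCALE)
label, distance = recognizer.predict(sketch)
print(f"closest identity: {label}, distance: {distance:.1f}")
```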
APA, Harvard, Vancouver, ISO, and other styles
46

Zhang, Zheng, Cong Huang, Fei Zhong, Bote Qi, and Binghong Gao. "Posture Recognition and Behavior Tracking in Swimming Motion Images under Computer Machine Vision." Complexity 2021 (May 20, 2021): 1–9. http://dx.doi.org/10.1155/2021/5526831.

Full text
Abstract:
This study explores gesture recognition and behavior tracking in swimming motion images using computer machine vision, and expands the application of moving-target detection and tracking algorithms based on machine vision in this field. The objectives are realized through moving-target detection and tracking, a Gaussian mixture model, an optimized correlation filtering algorithm, and the Camshift tracking algorithm. Firstly, a Gaussian algorithm is introduced into target tracking and detection to reduce filtering loss and make the acquired motion posture more accurate. Secondly, an improved kernel correlation filter tracking algorithm is proposed by training multiple filters, which can clearly and accurately obtain the motion trajectory of the monitored target object. Finally, the Kalman algorithm is combined with the Camshift algorithm for optimization, completing the tracking and recognition of moving targets. The experimental results show that the target tracking and detection method can obtain the movement form of the template object relatively completely, and the kernel correlation filter tracking algorithm can also obtain the movement speed of the target object precisely. In addition, the accuracy of the Camshift tracking algorithm reaches 86.02%. The results of this study provide reliable data support and a reference for expanding the application of moving-target detection and tracking methods.
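A minimal sketch that combines MOG2 background subtraction (to find an initial target window) with CamShift tracking on a hue back-projection, roughly in the spirit of the abstract; the Kalman refinement is omitted, and the video file name, histogram size, and area threshold are assumptions:

```python
# Minimal sketch: Gaussian-mixture background subtraction to seed a target
# window, then CamShift tracking on a hue back-projection. Parameters assumed.
import cv2
import numpy as np

cap = cv2.VideoCapture("pool.mp4")                   # hypothetical input video
subtractor = cv2.createBackgroundSubtractorMOG2()
term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
window, hist = None, None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    if window is None:
        fg = subtractor.apply(frame)
        contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if contours and cv2.contourArea(max(contours, key=cv2.contourArea)) > 500:
            window = cv2.boundingRect(max(contours, key=cv2.contourArea))
            x, y, w, h = window
            hist = cv2.calcHist([hsv[y:y+h, x:x+w]], [0], None, [16], [0, 180])
            cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
        continue
    backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    rot_rect, window = cv2.CamShift(backproj, window, term)
    cv2.polylines(frame, [np.int32(cv2.boxPoints(rot_rect))], True, (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
```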
APA, Harvard, Vancouver, ISO, and other styles
47

Jung, Minji, Heekyung Yang, and Kyungha Min. "Improving Deep Object Detection Algorithms for Game Scenes." Electronics 10, no. 20 (October 17, 2021): 2527. http://dx.doi.org/10.3390/electronics10202527.

Full text
Abstract:
The advancement and popularity of computer games make game scene analysis one of the most interesting research topics in the computer vision community. Among the various computer vision techniques, we employ object detection algorithms for the analysis, since they can both recognize and localize objects in a scene. However, applying existing object detection algorithms to game scenes does not guarantee the desired performance, since the algorithms are trained using datasets collected from the real world. To achieve the desired performance on game scenes, we built a dataset by collecting game scenes and retrained object detection algorithms that were pre-trained on real-world datasets. We selected five object detection algorithms, namely YOLOv3, Faster R-CNN, SSD, FPN and EfficientDet, and eight games from various genres including first-person shooting, role-playing, sports, and driving. PascalVOC and MS COCO were employed for the pre-training of the object detection algorithms. We demonstrated the improvement that comes from our strategy in two aspects: recognition and localization. The improvement in recognition performance was measured using mean average precision (mAP) and the improvement in localization using intersection over union (IoU).
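A minimal sketch of the retraining strategy using torchvision: a COCO-pretrained Faster R-CNN has its box predictor replaced so it can be fine-tuned on a custom game-scene dataset; the class count is an assumed example, and the training loop itself is omitted:

```python
# Minimal sketch: swap the box predictor of a COCO-pretrained Faster R-CNN so
# it can be fine-tuned on game-scene classes. Class count (11) is assumed.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=11)

# Fine-tuning then proceeds with a standard detection training loop over the
# game-scene images and boxes; mAP and IoU can be evaluated on a held-out split.
```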
APA, Harvard, Vancouver, ISO, and other styles
48

Pulla Rao, Chennamsetty, A. Guruva Reddy, and C. B. Rama Rao. "Camouflaged object detection for machine vision applications." International Journal of Speech Technology 23, no. 2 (March 16, 2020): 327–35. http://dx.doi.org/10.1007/s10772-020-09699-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Jot Singh, Kiran, Divneet Singh Kapoor, Khushal Thakur, Anshul Sharma, and Xiao-Zhi Gao. "Computer-Vision Based Object Detection and Recognition for Service Robot in Indoor Environment." Computers, Materials & Continua 72, no. 1 (2022): 197–213. http://dx.doi.org/10.32604/cmc.2022.022989.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Hidayat, Rahmat, Hendrick, Riandini, Zhi-Hao Wang, and Horng Gwo-Jiun. "Mask RCNN Methods for Eyes Modelling." International Journal of Data Science 2, no. 2 (December 31, 2021): 63–68. http://dx.doi.org/10.18517/ijods.2.2.63-68.2021.

Full text
Abstract:
Object detection is one of the deep learning areas within computer vision. Applications of computer vision are divided into image classification and object detection; object detection aims to find a specific object in an image. Security applications of object detection include face recognition and face detection, and face detection has also been developed for medical applications to identify emotion from faces. In this research, we propose an eye model built with Mask RCNN. The eye model was applied to real-time detection combined with OpenCV. The dataset was created from an online dataset and images from a webcam. The model was trained for 4 epochs with 131 iterations, and the final model successfully detected eyes in a real-time application.
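A minimal sketch of real-time Mask R-CNN inference with OpenCV, using torchvision's COCO-pretrained model as a stand-in for the custom eye model described in the abstract; the 0.5 score threshold is assumed:

```python
# Minimal sketch: run a pretrained Mask R-CNN on webcam frames and draw the
# boxes of confident detections. The pretrained model is a stand-in only.
import cv2
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT").eval()
cap = cv2.VideoCapture(0)

with torch.no_grad():
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        tensor = torch.from_numpy(frame[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255
        out = model([tensor])[0]
        for box, score in zip(out["boxes"], out["scores"]):
            if score > 0.5:
                x1, y1, x2, y2 = box.int().tolist()
                cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.imshow("mask rcnn", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()
```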
APA, Harvard, Vancouver, ISO, and other styles
