Academic literature on the topic 'Computer vision, object detection, action recognition'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Computer vision, object detection, action recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Computer vision, object detection, action recognition"

1

Zhang, Hong-Bo, Yi-Xiang Zhang, Bineng Zhong, Qing Lei, Lijie Yang, Ji-Xiang Du, and Duan-Sheng Chen. "A Comprehensive Survey of Vision-Based Human Action Recognition Methods." Sensors 19, no. 5 (February 27, 2019): 1005. http://dx.doi.org/10.3390/s19051005.

Full text
Abstract:
Although widely used in many applications, accurate and efficient human action recognition remains a challenging area of research in the field of computer vision. Most recent surveys have focused on narrow problems such as human action recognition methods using depth data, 3D-skeleton data, still image data, spatiotemporal interest point-based methods, and human walking motion recognition. However, there has been no systematic survey of human action recognition. To this end, we present a thorough review of human action recognition methods and provide a comprehensive overview of recent approaches in human action recognition research, including progress in hand-designed action features in RGB and depth data, current deep learning-based action feature representation methods, advances in human–object interaction recognition methods, and the current prominent research topic of action detection methods. Finally, we present several analysis recommendations for researchers. This survey paper provides an essential reference for those interested in further research on human action recognition.
APA, Harvard, Vancouver, ISO, and other styles
2

Gundu, Sireesha, and Hussain Syed. "Vision-Based HAR in UAV Videos Using Histograms and Deep Learning Techniques." Sensors 23, no. 5 (February 25, 2023): 2569. http://dx.doi.org/10.3390/s23052569.

Full text
Abstract:
Activity recognition for unmanned aerial vehicle (UAV) surveillance arises in many computer vision applications, such as image retrieval, pose estimation, face recognition, video action recognition, and object detection in still images, video frames, and full videos. In UAV-based surveillance, video segments captured from aerial vehicles make it challenging to recognize and distinguish human behavior. In this research, a hybrid model of histogram of oriented gradients (HOG), mask region-based convolutional neural network (Mask R-CNN), and bidirectional long short-term memory (Bi-LSTM) is employed to recognize single- and multi-human activities in aerial data. The HOG algorithm extracts patterns, Mask R-CNN extracts feature maps from the raw aerial image data, and the Bi-LSTM network exploits the temporal relationship between frames to capture the underlying action in the scene; processing the sequence in both directions also reduces the error rate. This architecture improves segmentation by using histogram-gradient-based instance segmentation and improves the accuracy of classifying human activities through the Bi-LSTM approach. Experimental outcomes demonstrate that the proposed model outperforms other state-of-the-art models, achieving 99.25% accuracy on the YouTube-Aerial dataset.
APA, Harvard, Vancouver, ISO, and other styles
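The pipeline in the entry above lends itself to a compact illustration. Below is a minimal sketch of its temporal half: per-frame HOG descriptors feeding a Bi-LSTM classifier. The Mask R-CNN stage is omitted, and the feature dimensions, hidden size, class count, and dummy data are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of a HOG + Bi-LSTM action-recognition pipeline (hypothetical
# dimensions; the paper's exact architecture is not reproduced here).
import numpy as np
import torch
import torch.nn as nn
from skimage.feature import hog

def frame_features(frame_gray: np.ndarray) -> np.ndarray:
    """Describe one grayscale frame by its histogram of oriented gradients."""
    return hog(frame_gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

class BiLSTMActionClassifier(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 256, n_classes: int = 10):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)  # 2x for both directions

    def forward(self, x):                 # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)             # (batch, time, 2*hidden)
        return self.head(out[:, -1, :])   # classify from the last time step

# Usage on a dummy 16-frame clip of 128x128 grayscale frames:
frames = [np.random.rand(128, 128) for _ in range(16)]
feats = torch.tensor(np.stack([frame_features(f) for f in frames]),
                     dtype=torch.float32).unsqueeze(0)
model = BiLSTMActionClassifier(feat_dim=feats.shape[-1])
logits = model(feats)                     # (1, n_classes)
```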
3

Mikhalev, Oleg, and Alexander Yanyushkin. "Machine vision and object recognition using neural networks." Robotics and Technical Cybernetics 10, no. 2 (June 2022): 113–20. http://dx.doi.org/10.31776/rtcj.10204.

Full text
Abstract:
Computer vision is becoming one of the important areas of automation of various human activities. Technical systems today are endowed with the ability to see and, with the use of neural networks, with the ability to act intelligently: they can perceive a scene and make correct decisions and actions faster and more accurately than a person. The article discusses the use of machine vision and object recognition technology for industrial automation and describes a convolutional neural network and an object detection algorithm.
APA, Harvard, Vancouver, ISO, and other styles
4

Voulodimos, Athanasios, Nikolaos Doulamis, Anastasios Doulamis, and Eftychios Protopapadakis. "Deep Learning for Computer Vision: A Brief Review." Computational Intelligence and Neuroscience 2018 (2018): 1–13. http://dx.doi.org/10.1155/2018/7068349.

Full text
Abstract:
Over recent years, deep learning methods have been shown to outperform previous state-of-the-art machine learning techniques in several fields, with computer vision being one of the most prominent cases. This review paper provides a brief overview of some of the most significant deep learning schemes used in computer vision problems, namely Convolutional Neural Networks, Deep Boltzmann Machines and Deep Belief Networks, and Stacked Denoising Autoencoders. A brief account of their history, structure, advantages, and limitations is given, followed by a description of their applications in various computer vision tasks, such as object detection, face recognition, action and activity recognition, and human pose estimation. Finally, a brief overview is given of future directions in designing deep learning schemes for computer vision problems and the challenges involved therein.
APA, Harvard, Vancouver, ISO, and other styles
5

Wang, Chang, Jinyu Sun, Shiwei Ma, Yuqiu Lu, and Wang Liu. "Multi-stream Network for Human-object Interaction Detection." International Journal of Pattern Recognition and Artificial Intelligence 35, no. 8 (March 12, 2021): 2150025. http://dx.doi.org/10.1142/s0218001421500257.

Full text
Abstract:
Detecting the interaction between humans and objects in images is a critical problem for obtaining a deeper understanding of the visual relationships in a scene, and a key technology in many practical applications such as augmented reality, video surveillance, and information retrieval. Nevertheless, due to the fine-grained actions and objects in real scenes and the coexistence of multiple interactions in one scene, the problem is far from solved. Unlike prior approaches, which focused only on instance features, this paper proposes a four-stream CNN for human-object interaction (HOI) detection. More detailed visual, spatial, and pose features are extracted from human-object pairs to address this challenging detection task. Specifically, the core idea is that the region where a person interacts with an object contains important identifying cues for specific action classes, and these detailed cues can be fused to facilitate HOI recognition. Experiments on two large-scale public HOI benchmarks, V-COCO and HICO-DET, show the effectiveness of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
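As a rough illustration of the multi-stream idea in the entry above, the sketch below late-fuses action scores from four branches. The MLP branches, feature dimensions, and class count are placeholders, not the paper's architecture.

```python
# Sketch of late score fusion across four streams for HOI recognition
# (stream backbones and dimensions are stand-ins, not the paper's networks).
import torch
import torch.nn as nn

class Stream(nn.Module):
    """One branch: an MLP standing in for a CNN feature extractor + scorer."""
    def __init__(self, in_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, n_actions))
    def forward(self, x):
        return self.net(x)

class FourStreamHOI(nn.Module):
    def __init__(self, n_actions=26):
        super().__init__()
        self.human   = Stream(2048, n_actions)   # appearance of the person crop
        self.object  = Stream(2048, n_actions)   # appearance of the object crop
        self.spatial = Stream(64,   n_actions)   # encoded human/object box layout
        self.pose    = Stream(34,   n_actions)   # 17 keypoints x (x, y)
    def forward(self, fh, fo, fs, fp):
        # Late fusion: sum the per-stream action scores.
        return self.human(fh) + self.object(fo) + self.spatial(fs) + self.pose(fp)

model = FourStreamHOI()
scores = model(torch.randn(8, 2048), torch.randn(8, 2048),
               torch.randn(8, 64), torch.randn(8, 34))   # (8, 26)
```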
6

Gall, J., A. Yao, N. Razavi, L. Van Gool, and V. Lempitsky. "Hough Forests for Object Detection, Tracking, and Action Recognition." IEEE Transactions on Pattern Analysis and Machine Intelligence 33, no. 11 (November 2011): 2188–202. http://dx.doi.org/10.1109/tpami.2011.70.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Hoshino, Satoshi, and Kyohei Niimura. "Optical Flow for Real-Time Human Detection and Action Recognition Based on CNN Classifiers." Journal of Advanced Computational Intelligence and Intelligent Informatics 23, no. 4 (July 20, 2019): 735–42. http://dx.doi.org/10.20965/jaciii.2019.p0735.

Full text
Abstract:
Mobile robots equipped with camera sensors must perceive surrounding humans and their actions for safe autonomous navigation. In this work, moving humans are the target objects. For robot vision, real-time performance is an important requirement. We therefore propose a robot vision system in which the images captured by a camera sensor are described by optical flow and then used as inputs to a classifier. To classify images as human or non-human, and to recognize the actions, we use a convolutional neural network (CNN) rather than hand-coded invariant features. Moreover, we present a local search window as a novel detector that clips partial images around target objects in the original image. Through experiments, we show that the robot vision system detects moving humans and recognizes their actions in real time.
APA, Harvard, Vancouver, ISO, and other styles
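The entry above pairs optical flow with a CNN classifier. A minimal sketch of that coupling follows, using OpenCV's Farneback dense flow; the tiny CNN and the class set are illustrative assumptions, not the authors' network.

```python
# Sketch: describe motion with Farneback dense optical flow (OpenCV), then
# classify the 2-channel flow field with a small CNN.
import cv2
import numpy as np
import torch
import torch.nn as nn

def flow_image(prev_gray, next_gray):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return torch.tensor(flow, dtype=torch.float32).permute(2, 0, 1)  # (2, H, W)

classifier = nn.Sequential(            # tiny CNN over the flow field
    nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 3),                  # e.g. {not-human, walking, running} (assumed)
)

prev_f = np.random.randint(0, 255, (240, 320), np.uint8)   # stand-in frames
next_f = np.random.randint(0, 255, (240, 320), np.uint8)
scores = classifier(flow_image(prev_f, next_f).unsqueeze(0))
```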
8

Sumathi, J. K. "Dynamic Image Forensics and Forgery Analytics using Open Computer Vision Framework." Wasit Journal of Computer and Mathematics Science 1, no. 1 (March 17, 2021): 1–8. http://dx.doi.org/10.31185/wjcm.vol1.iss1.3.

Full text
Abstract:
Key advances in computer vision and optical image processing are emerging technologies in diverse fields, including facial recognition, biometric verification, the Internet of Things (IoT), criminal investigation, and signature identification in banking. These applications use image and live-video processing to support analysis and forecasting. Computer vision is used in many activities such as monitoring, face recognition, motion recognition, and object detection. The growth of social networking platforms such as Facebook and Instagram has led to an increase in the volume of image data being generated, and doctored photos and videos posted to social networks are a major concern: such images are frequently circulated as fakes and used in malevolent ways, for example to incite violence. Questionable images need to be authenticated before action is taken, yet ensuring photo authenticity is very hard given the power of modern photo manipulation. Image forensic techniques can determine how an image was formed; image duplication, for instance, is commonly used to conceal missing areas.
APA, Harvard, Vancouver, ISO, and other styles
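The abstract above mentions image duplication (copy-move forgery) as a common manipulation. One standard OpenCV-based screening approach, sketched below under assumed thresholds, matches an image's keypoints against themselves and flags strong matches between distant regions of the same image.

```python
# Sketch of copy-move forgery screening with OpenCV: if many keypoints in an
# image strongly match *other* keypoints of the same image, regions may have
# been duplicated. Thresholds here are assumptions, not validated settings.
import cv2
import numpy as np

def copy_move_suspect(img_gray, min_matches=20, min_shift=40.0):
    orb = cv2.ORB_create(nfeatures=2000)
    kps, desc = orb.detectAndCompute(img_gray, None)
    if desc is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(desc, desc, k=2)
    suspicious = 0
    for pair in matches:
        if len(pair) < 2:
            continue
        _, second = pair  # first element is the trivial self-match (distance 0)
        p1 = np.array(kps[second.queryIdx].pt)
        p2 = np.array(kps[second.trainIdx].pt)
        # A strong non-self match far away in the same image is suspicious.
        if second.distance < 30 and np.linalg.norm(p1 - p2) > min_shift:
            suspicious += 1
    return suspicious >= min_matches

# Hypothetical usage:
# suspect = copy_move_suspect(cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE))
```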
9

Zeng, Wei, Junjian Huang, Wei Zhang, Hai Nan, and Zhenjiang Fu. "SlowFast Action Recognition Algorithm Based on Faster and More Accurate Detectors." Electronics 11, no. 22 (November 16, 2022): 3770. http://dx.doi.org/10.3390/electronics11223770.

Full text
Abstract:
Object detection algorithms play a crucial role in other vision tasks. This paper finds that Faster R-CNN (Region Convolutional Neural Network), the detector used by the action recognition algorithm SlowFast, has disadvantages in both detection accuracy and speed, and that the traditional IoU (Intersection over Union) localization loss makes it difficult for the detection model to converge to a stable minimum. To solve these problems, the article uses YOLOv3 (You Only Look Once), YOLOX, and Cascade R-CNN to improve the detection accuracy and speed of SlowFast. The paper also proposes a new localization loss function that adopts the Lance and Williams distance as a penalty term. The new loss is more sensitive when the distance difference is small, a property well suited to the late convergence stage of a detection model. Experiments were conducted on the VOC (Visual Object Classes) and COCO datasets. In the final video tests, YOLOv3 improved detection speed by 10.5 s, Cascade R-CNN improved on Faster R-CNN by 3.1% AP on COCO, and YOLOX's performance on COCO was also mostly better than Faster R-CNN's. The new LIOU (Lance and Williams Distance Intersection over Union) localization loss performs better than other loss functions on the VOC dataset. Improving the detector used by SlowFast is thus crucial, and the proposed loss function is effective.
APA, Harvard, Vancouver, ISO, and other styles
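For orientation, here is one plausible reading of the LIOU idea above as code: the standard IoU loss plus a Lance and Williams (Canberra-style) distance penalty between box coordinates. This is a hedged sketch of the concept only; the paper's exact formulation may differ.

```python
# Speculative sketch of an IoU loss with a Lance-Williams penalty term.
import torch

def lance_williams(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Lance-Williams (Canberra-style) distance: sum|a-b| / sum(a+b)."""
    return (a - b).abs().sum(-1) / (a + b).sum(-1).clamp(min=1e-6)

def liou_loss(pred, target):
    """pred, target: (..., 4) boxes as (x1, y1, x2, y2), non-negative coords."""
    lt = torch.max(pred[..., :2], target[..., :2])
    rb = torch.min(pred[..., 2:], target[..., 2:])
    inter = (rb - lt).clamp(min=0).prod(-1)
    area_p = (pred[..., 2:] - pred[..., :2]).clamp(min=0).prod(-1)
    area_t = (target[..., 2:] - target[..., :2]).clamp(min=0).prod(-1)
    iou = inter / (area_p + area_t - inter).clamp(min=1e-6)
    # The Lance-Williams term grows more sensitive as the boxes get close,
    # matching the abstract's claim about late-stage convergence.
    return 1.0 - iou + lance_williams(pred, target)

loss = liou_loss(torch.tensor([[10., 10., 50., 60.]]),
                 torch.tensor([[12., 11., 48., 58.]]))
```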
10

Prahara, Adhi, Murinto Murinto, and Dewi Pramudi Ismi. "Bottom-up visual attention model for still image: a preliminary study." International Journal of Advances in Intelligent Informatics 6, no. 1 (March 31, 2020): 82. http://dx.doi.org/10.26555/ijain.v6i1.469.

Full text
Abstract:
The mechanisms of human visual attention are explained scientifically in cognitive psychology and neuroscience and then modeled computationally in computer science and engineering. Visual attention models have been applied in computer vision systems such as object detection, object recognition, image segmentation, image and video compression, action recognition, and visual tracking. This work studies bottom-up visual attention, namely human fixation prediction and salient object detection models. The preliminary study covers the biological perspective of visual attention, from the visual pathway and theories of visual attention to computational models of bottom-up visual attention that generate saliency maps. The study compares models at each stage and observes whether each stage is inspired by the biological architecture, concepts, or behavior of human visual attention. From the study, the use of low-level features, center-surround mechanisms, sparse representation, and higher-level guidance with intrinsic cues dominates bottom-up visual attention approaches. The study also highlights the correlation between bottom-up visual attention and curiosity.
APA, Harvard, Vancouver, ISO, and other styles
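The center-surround mechanism highlighted above can be illustrated in a few lines: a saliency map computed as the absolute difference between a fine-scale ("center") and a coarse-scale ("surround") blur of the intensity channel. The two scales chosen here are arbitrary, and real models pool over several scales and feature channels.

```python
# Sketch of a center-surround saliency map in the Itti-Koch spirit.
import cv2
import numpy as np

def saliency_map(img_bgr: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
    center = cv2.GaussianBlur(gray, (0, 0), sigmaX=2)     # fine scale
    surround = cv2.GaussianBlur(gray, (0, 0), sigmaX=16)  # coarse scale
    sal = np.abs(center - surround)                       # center-surround contrast
    return cv2.normalize(sal, None, 0.0, 1.0, cv2.NORM_MINMAX)

img = np.random.randint(0, 255, (240, 320, 3), np.uint8)  # stand-in image
sal = saliency_map(img)
```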

Dissertations / Theses on the topic "Computer vision, object detection, action recognition"

1

Anwer, Rao Muhammad. "Color for Object Detection and Action Recognition." Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/120224.

Full text
Abstract:
Recognizing object categories in real-world images is a challenging problem in computer vision. The deformable part-based framework is currently the most successful approach for object detection, and HOG features are generally used for image representation within it. For action recognition, the bag-of-words framework has shown promising results; within it, local image patches are described by the SIFT descriptor. In contrast, for object and scene recognition, combining color and shape has been shown to give the best performance. In the first part of this thesis, we analyze the problem of person detection in still images. Standard person detection approaches rely on intensity-based features for image representation while ignoring color. Channel-based description is one of the most commonly used approaches in object recognition, which inspires us to evaluate the incorporation of color information using channel-based fusion for the task of person detection. In the second part of the thesis, we investigate object detection in still images. Due to its high dimensionality, channel-based fusion increases the computational cost, and it has been found to give inferior results for object categories where one of the visual cues varies significantly; late fusion, on the other hand, is known to provide improved results for a wide range of object categories. A consequence of the late fusion strategy is the need for a pure color descriptor. We therefore propose color attributes as an explicit color representation for object detection. Color attributes are compact and computationally efficient, and combining them with traditional shape features yields excellent results for the object detection task. Finally, we focus on action detection and classification in still images. We investigate the potential of color for action classification and detection, evaluate different fusion approaches for combining color and shape information, and analyze the contribution of color to action recognition. Our results clearly demonstrate that combining color and shape information significantly improves the performance of both action classification and detection in still images.
APA, Harvard, Vancouver, ISO, and other styles
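As a rough illustration of fusing shape with a pure color descriptor, as advocated above, the sketch below concatenates HOG features with a coarse hue histogram. A real implementation would use learned color attributes or color names rather than this stand-in.

```python
# Sketch of combining shape (HOG) with a compact color representation; the
# 11-bin hue histogram is a stand-in for learned color attributes.
import numpy as np
from skimage.color import rgb2hsv
from skimage.feature import hog

def shape_color_descriptor(patch_rgb: np.ndarray) -> np.ndarray:
    gray = patch_rgb.mean(axis=2)
    shape = hog(gray, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2), feature_vector=True)
    hue = rgb2hsv(patch_rgb)[..., 0]
    color, _ = np.histogram(hue, bins=11, range=(0.0, 1.0), density=True)
    return np.concatenate([shape, color])   # late concatenation of both cues

desc = shape_color_descriptor(np.random.rand(64, 64, 3))  # stand-in patch
```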
2

Friberg, Oscar. "Recognizing Semantics in Human Actions with Object Detection." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-212579.

Full text
Abstract:
Two-stream convolutional neural networks are currently one of the most successful approaches to human action recognition. They separate spatial and temporal information into a spatial stream, which accepts a single RGB frame, and a temporal stream, which accepts a sequence of optical flow. There have been attempts to extend the two-stream framework, for instance with a third network for auxiliary information, which is the main focus of this thesis. We extend the two-stream convolutional neural network by introducing a semantic stream that uses object detection systems. Two contributions are made in this thesis. First, we show that the semantic stream can provide slight improvements over two-stream convolutional neural networks for human action recognition on standard benchmarks. Second, we explore divergence enhancement techniques that force the new semantic stream to complement the spatial and temporal streams by modifying the loss function during training; slight gains are seen using these techniques.
APA, Harvard, Vancouver, ISO, and other styles
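A minimal sketch of the kind of loss modification described above: cross-entropy on the fused three-stream scores, minus a small weighted KL term that rewards divergence between the semantic stream and the other two. The weighting and the exact divergence measure are assumptions, not the thesis' formulation.

```python
# Sketch of a three-stream fusion loss with a divergence-encouraging term.
import torch
import torch.nn.functional as F

def combined_loss(spatial_logits, temporal_logits, semantic_logits, target,
                  div_weight=0.1):
    fused = spatial_logits + temporal_logits + semantic_logits
    ce = F.cross_entropy(fused, target)
    # Encourage the semantic stream to differ from the other two streams by
    # rewarding KL divergence between their predictive distributions; the
    # small weight keeps this reward from dominating training.
    p_sem = F.log_softmax(semantic_logits, dim=1)
    div = (F.kl_div(p_sem, F.softmax(spatial_logits, dim=1), reduction="batchmean")
         + F.kl_div(p_sem, F.softmax(temporal_logits, dim=1), reduction="batchmean"))
    return ce - div_weight * div

loss = combined_loss(torch.randn(4, 10), torch.randn(4, 10),
                     torch.randn(4, 10), torch.randint(0, 10, (4,)))
```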
3

Kalogeiton, Vasiliki. "Localizing spatially and temporally objects and actions in videos." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/28984.

Full text
Abstract:
The rise of deep learning has facilitated remarkable progress in video understanding. This thesis addresses three important tasks of video understanding: video object detection, joint object and action detection, and spatio-temporal action localization. Object class detection is one of the most important challenges in computer vision. Object detectors are usually trained on bounding boxes from still images. Recently, video has been used as an alternative source of data. Yet, training an object detector on one domain (either still images or videos) and testing on the other results in a significant performance gap compared to training and testing on the same domain. In the first part of this thesis, we examine the reasons behind this performance gap. We define and evaluate several domain shift factors: spatial location accuracy, appearance diversity, image quality, aspect distribution, and object size and camera framing. We examine the impact of these factors by comparing detection performance before and after cancelling them out. The results show that all five factors affect the performance of the detectors and that their combined effect explains the performance gap. While most existing approaches for detection in videos focus on objects or human actions separately, in the second part of this thesis we aim at detecting non-human-centric actions, i.e., objects performing actions, such as cat eating or dog jumping. We introduce an end-to-end multitask objective that jointly learns object-action relationships. We compare it with different training objectives, validate its effectiveness for detecting object-action pairs in videos, and show that both object and action detection benefit from this joint learning. In experiments on the A2D dataset [Xu et al., 2015], we obtain state-of-the-art results on segmentation of object-action pairs. In the third part, we are the first to propose an action tubelet detector that leverages the temporal continuity of videos instead of operating at the frame level, as state-of-the-art approaches do. Just as modern detectors rely on anchor boxes, our tubelet detector is based on anchor cuboids: it takes as input a sequence of frames and outputs tubelets, i.e., sequences of bounding boxes with associated scores. Our tubelet detector outperforms all state-of-the-art methods on the UCF-Sports [Rodriguez et al., 2008], J-HMDB [Jhuang et al., 2013a], and UCF-101 [Soomro et al., 2012] action localization datasets, especially at high overlap thresholds. The improvement in detection performance is explained by both more accurate scores and more precise localization.
APA, Harvard, Vancouver, ISO, and other styles
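The anchor-cuboid idea above can be sketched as follows: one anchor box shared across K frames plus per-frame regressed offsets yields a tubelet with a single confidence score. All shapes and values are illustrative.

```python
# Sketch of turning anchor cuboids into scored tubelets.
import torch

def cuboids_to_tubelets(anchors, deltas, scores):
    """anchors: (A, 4) boxes shared by all K frames,
       deltas:  (A, K, 4) per-frame offsets predicted by the network,
       scores:  (A,) one confidence per tubelet (not per frame).
       Returns (A, K, 4) tubelets sorted by score."""
    tubelets = anchors[:, None, :] + deltas          # broadcast over frames
    order = scores.argsort(descending=True)
    return tubelets[order], scores[order]

anchors = torch.rand(100, 4) * 300                   # stand-in anchors
deltas = torch.randn(100, 6, 4) * 5                  # K = 6 frames
scores = torch.rand(100)
tubes, conf = cuboids_to_tubelets(anchors, deltas, scores)
```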
4

Ranalli, Lorenzo. "Studio ed implementazione di un modello di Action Recognition. Classificazione delle azioni di gioco e della tipologia di colpi durante un match di Tennis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
Machine learning and sport are increasingly consolidating their marriage. Whether in individual sports, team sports, or sports at any level of professionalism, a smart component is increasingly present, emerging both in officiating and in virtual coaching. It is precisely in the virtual coaching space that IConsulting's idea is placed: with mAIcoach, it seeks to redefine the rules of tennis training, assisting athletes and guiding them in the correct execution of movements. More specifically, the idea is to convey a mathematical method through a smart system of evaluations of the tennis player. Users can submit videos of their own training sessions and receive advice and constructive criticism in order to improve their postures and strokes.
APA, Harvard, Vancouver, ISO, and other styles
5

Liu, Chang. "Human motion detection and action recognition." HKBU Institutional Repository, 2010. http://repository.hkbu.edu.hk/etd_ra/1108.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ta, Anh Phuong. "Inexact graph matching techniques : application to object detection and human action recognition." Lyon, INSA, 2010. http://theses.insa-lyon.fr/publication/2010ISAL0099/these.pdf.

Full text
Abstract:
Object detection and human action recognition are two active fields of research in computer vision, with applications ranging from robotics, video surveillance, medical image analysis, and human-computer interaction to content-based video annotation and retrieval. Building such robust recognition systems remains very challenging because of variations within action and object classes, different possible viewpoints, illumination changes, moving cameras, complex dynamic backgrounds, and occlusions. In this thesis, we deal with object and activity recognition problems. Despite the different application goals, the associated fundamental problems share numerous properties, for instance the necessity of handling non-rigid transformations. Describing a model object or a video by a set of local features, we formulate recognition as a graph matching problem, where nodes represent local features and edges represent spatial and/or spatio-temporal relationships between them. Inexact matching of valued graphs is a well-known NP-hard problem, so we concentrate on finding approximate solutions; to this end, graph matching is formulated as an energy minimization problem. Based on this energy function, we propose two different solutions for the two applications: object detection in images and activity recognition in video sequences. We also propose new features to improve the conventional bag-of-words model, which is widely used in computer vision. Experiments on both standard datasets and our own datasets demonstrate that our methods give good results with respect to the recent state of the art in both domains.
APA, Harvard, Vancouver, ISO, and other styles
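The energy formulation described above can be sketched as a unary term (local descriptor distances) plus pairwise terms (consistency of spatial or temporal relations). A naive greedy pass stands in for the thesis' approximate optimizer, which it does not reproduce.

```python
# Sketch of graph-matching as energy minimization: assign each model node a
# scene node, trading off descriptor distance against relational consistency.
import numpy as np

def match_energy(assign, D_unary, R_model, R_scene):
    """assign[i] = scene node chosen for model node i (may be partial)."""
    e = sum(D_unary[i, assign[i]] for i in range(len(assign)))
    for i in range(len(assign)):
        for j in range(i + 1, len(assign)):
            # Penalize inconsistent pairwise (e.g., spatial) relations.
            e += abs(R_model[i, j] - R_scene[assign[i], assign[j]])
    return e

def greedy_match(D_unary, R_model, R_scene):
    assign = []
    for i in range(D_unary.shape[0]):
        best = min(range(D_unary.shape[1]),
                   key=lambda s: match_energy(assign + [s],
                                              D_unary, R_model, R_scene))
        assign.append(best)
    return assign

D = np.random.rand(4, 10)          # 4 model features vs. 10 scene features
Rm, Rs = np.random.rand(4, 4), np.random.rand(10, 10)
assignment = greedy_match(D, Rm, Rs)
```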
7

Dittmar, George William. "Object Detection and Recognition in Natural Settings." PDXScholar, 2013. https://pdxscholar.library.pdx.edu/open_access_etds/926.

Full text
Abstract:
Much research as of late has focused on biologically inspired vision models that are based on our understanding of how the visual cortex processes information. One prominent example of such a system is HMAX [17]. HMAX attempts to simulate the biological process for object recognition in cortex based on the model proposed by Hubel & Wiesel [10]. This thesis investigates the ability of an HMAX-like system (GLIMPSE [20]) to perform object-detection in cluttered natural scenes. I evaluate these results using the StreetScenes database from MIT [1, 8]. This thesis addresses three questions: (1) Can the GLIMPSE-based object detection system replicate the results on object-detection reported by Bileschi using HMAX? (2) Which features computed by GLIMPSE lead to the best object-detection performance? (3) What effect does elimination of clutter in the training sets have on the performance of our system? As part of this thesis, I built an object detection and recognition system using GLIMPSE [20] and demonstrate that it approximately replicates the results reported in Bileschi's thesis. In addition, I found that extracting and combining features from GLIMPSE using different layers of the HMAX model gives the best overall invariance to position, scale and translation for recognition tasks, but comes with a much higher computational overhead. Further contributions include the creation of modified training and test sets based on the StreetScenes database, with removed clutter in the training data and extending the annotations for the detection task to cover more objects of interest that were not in the original annotations of the database.
APA, Harvard, Vancouver, ISO, and other styles
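The detection experiments above follow the classic crop-and-classify pattern. A generic sliding-window sketch is given below, with a toy scoring function standing in for GLIMPSE features plus a trained classifier.

```python
# Sketch of sliding-window object detection: score crops of a scene and keep
# those above a threshold. Window size, stride, and the scorer are assumptions.
import numpy as np

def sliding_window_detect(image, score_fn, win=64, stride=32, thresh=0.5):
    """Return (row, col, score) for windows whose score exceeds thresh."""
    hits = []
    H, W = image.shape[:2]
    for r in range(0, H - win + 1, stride):
        for c in range(0, W - win + 1, stride):
            s = score_fn(image[r:r + win, c:c + win])
            if s > thresh:
                hits.append((r, c, s))
    return hits

# Toy scorer: mean brightness stands in for a real feature+classifier pipeline.
scene = np.random.rand(256, 256)
detections = sliding_window_detect(scene, score_fn=lambda w: w.mean())
```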
8

Higgs, David Robert. "Parts-based object detection using multiple views /." Link to online version, 2005. https://ritdml.rit.edu/dspace/handle/1850/1000.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Pan, Xiang. "Approaches for edge detection, pose determination and object representation in computer vision." Thesis, Heriot-Watt University, 1994. http://hdl.handle.net/10399/1378.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Tonge, Ashwini Kishor. "Object Recognition Using Scale-Invariant Chordiogram." Thesis, University of North Texas, 2017. https://digital.library.unt.edu/ark:/67531/metadc984116/.

Full text
Abstract:
This thesis describes an approach for object recognition using the chordiogram shape-based descriptor. Global shape representations are highly susceptible to clutter generated by the background or other irrelevant objects in real-world images. To overcome this problem, we extract a precise object shape using superpixel segmentation, perceptual grouping, and connected components. The chordiogram descriptor is based on the geometric relationships of chords generated from pairs of boundary points of an object. It captures holistic properties of the shape and has proven suitable for object detection and digit recognition; it is also translation invariant and robust to shape deformations. In spite of these excellent properties, the chordiogram is not scale-invariant. To this end, we propose scale-invariant chordiogram descriptors that aim to achieve similar performance before and after applying scale invariance. Our experiments show that we achieve similar performance with and without scale invariance for silhouettes and real-world object images, and experiments at different scales confirm that the chordiogram becomes scale-invariant.
APA, Harvard, Vancouver, ISO, and other styles
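A simplified stand-in for the scale-normalized chord statistics discussed above: histogram all boundary-point chords by length and orientation, dividing lengths by the longest chord so that uniform rescaling leaves the descriptor unchanged. Bin counts are arbitrary choices, and the full chordiogram uses richer chord geometry than this sketch.

```python
# Sketch of a scale-normalized chord histogram over an object boundary.
import numpy as np

def chord_histogram(boundary_pts, n_len=8, n_ang=12):
    """boundary_pts: (N, 2) object boundary coordinates."""
    d = boundary_pts[:, None, :] - boundary_pts[None, :, :]   # all chords
    iu = np.triu_indices(len(boundary_pts), k=1)              # each pair once
    lengths = np.hypot(d[..., 0], d[..., 1])[iu]
    angles = np.arctan2(d[..., 1], d[..., 0])[iu] % np.pi
    lengths = lengths / lengths.max()        # normalize -> scale invariance
    h, _, _ = np.histogram2d(lengths, angles, bins=(n_len, n_ang),
                             range=[[0, 1], [0, np.pi]])
    return (h / h.sum()).ravel()

desc = chord_histogram(np.random.rand(50, 2) * 100)  # stand-in boundary
```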

Books on the topic "Computer vision, object detection, action recognition"

1

Amit, Yali. 2D object detection and recognition: Models, algorithms, and networks. Cambridge, Mass: MIT Press, 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Moving Object Detection Using Background Subtraction. Springer International Publishing AG, 2014.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Cyganek, Boguslaw. Object Detection and Recognition in Digital Images: Theory and Practice. Wiley & Sons, Incorporated, John, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Chalupa, Leo M., and John S. Werner, eds. The Visual Neurosciences, 2-vol. set. The MIT Press, 2003. http://dx.doi.org/10.7551/mitpress/7131.001.0001.

Full text
Abstract:
An essential reference book for visual science. Visual science is the model system for neuroscience, its findings relevant to all other areas. This massive collection of papers by leading researchers in the field will become an essential reference for researchers and students in visual neuroscience, and will be of importance to researchers and professionals in other disciplines, including molecular and cellular biology, cognitive science, ophthalmology, psychology, computer science, optometry, and education. Over 100 chapters cover the entire field of visual neuroscience, from its historical foundations to the latest research and findings in molecular mechanisms and network modeling. The book is organized by topic—different sections cover such subjects as the history of vision science; developmental processes; retinal mechanisms and processes; organization of visual pathways; subcortical processing; processing in the primary visual cortex; detection and sampling; brightness and color; form, shape, and object recognition; motion, depth, and spatial relationships; eye movements; attention and cognition; and theoretical and computational perspectives. The list of contributors includes leading international researchers in visual science. Bradford Books imprint
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Computer vision, object detection, action recognition"

1

Gollapudi, Sunila. "Object Detection and Recognition." In Learn Computer Vision Using OpenCV, 97–117. Berkeley, CA: Apress, 2019. http://dx.doi.org/10.1007/978-1-4842-4261-2_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Xia, Jingran, Guowen Kuang, Xu Wang, Zhibin Chen, and Jinfeng Yang. "ORION: Orientation-Sensitive Object Detection." In Pattern Recognition and Computer Vision, 593–607. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-18916-6_47.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Cheng, Qi, Yingjie Wu, Fei Chen, and Yilong Guo. "Balanced Loss for Accurate Object Detection." In Pattern Recognition and Computer Vision, 342–54. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60636-7_29.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Wang, Xiao, Xiaohua Xie, and Jianhuang Lai. "Convolutional LSTM Based Video Object Detection." In Pattern Recognition and Computer Vision, 99–109. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-03335-4_9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Liu, Zhuo, Xuemei Xie, and Xuyang Li. "Scene Semantic Guidance for Object Detection." In Pattern Recognition and Computer Vision, 355–65. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-88004-0_29.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Hu, Zibo, Kun Gao, Xiaodian Zhang, and Zeyang Dou. "Noise Resistant Focal Loss for Object Detection." In Pattern Recognition and Computer Vision, 114–25. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60639-8_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Xie, Xuemei, Quan Liao, Lihua Ma, and Xing Jin. "Gated Feature Pyramid Network for Object Detection." In Pattern Recognition and Computer Vision, 199–208. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-03341-5_17.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Zhao, Wenqing, and Hai Yan. "Penalty Non-maximum Suppression in Object Detection." In Pattern Recognition and Computer Vision, 90–102. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-03341-5_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Liu, Junjie, and Weiyu Yu. "Multi-view LiDAR Guided Monocular 3D Object Detection." In Pattern Recognition and Computer Vision, 520–32. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-18916-6_42.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Yang, Xinbo, Chenglong Li, Rui Ruan, Lei Liu, Wang Chao, and Bin Luo. "EllipseIoU: A General Metric for Aerial Object Detection." In Pattern Recognition and Computer Vision, 537–50. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-18913-5_42.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Computer vision, object detection, action recognition"

1

Pardo, Alejandro, Mengmeng Xu, Ali Thabet, Pablo Arbelaez, and Bernard Ghanem. "BAOD: Budget-Aware Object Detection." In LatinX in AI at Computer Vision and Pattern Recognition Conference 2021. Journal of LatinX in AI Research, 2021. http://dx.doi.org/10.52591/lxai202106254.

Full text
Abstract:
We study the problem of object detection from a novel perspective in which annotation budget constraints are taken into consideration, appropriately coined Budget Aware Object Detection (BAOD). When provided with a fixed budget, we propose a strategy for building a diverse and informative dataset that can be used to optimally train a robust detector. We investigate both optimization and learning-based methods to sample which images to annotate and what type of annotation (strongly or weakly supervised) to annotate them with. We adopt a hybrid supervised learning framework to train the object detector from both these types of annotation. We conduct a comprehensive empirical study showing that a handcrafted optimization method outperforms other selection techniques including random sampling, uncertainty sampling and active learning. By combining an optimal image/annotation selection scheme with hybrid supervised learning to solve the BAOD problem, we show that one can achieve the performance of a strongly supervised detector on PASCAL-VOC 2007 while saving 12.8% of its original annotation budget. Furthermore, when 100% of the budget is used, it surpasses this performance by 2.0 mAP percentage points.
APA, Harvard, Vancouver, ISO, and other styles
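To make the budget-aware selection above concrete, here is a hedged greedy sketch: each image can receive either a strong (box-level) or weak (image-level) annotation at different costs, and image/annotation pairs are picked by estimated gain per unit cost until the budget is spent. The costs, gains, and greedy rule are illustrative placeholders; the paper studies optimization and learning-based selectors rather than this heuristic.

```python
# Hedged sketch of budget-aware annotation selection.
def select_annotations(scores, budget, cost_strong=7.0, cost_weak=1.0):
    """scores: {image_id: (gain_strong, gain_weak)} -> list of (image, type)."""
    candidates = []
    for img, (g_s, g_w) in scores.items():
        candidates.append((g_s / cost_strong, img, "strong", cost_strong))
        candidates.append((g_w / cost_weak, img, "weak", cost_weak))
    candidates.sort(key=lambda c: c[0], reverse=True)  # best gain-per-cost first
    plan, used, taken = [], 0.0, set()
    for ratio, img, kind, cost in candidates:
        if img in taken or used + cost > budget:
            continue
        plan.append((img, kind))
        taken.add(img)
        used += cost
    return plan

# Example with assumed informativeness estimates and a budget of 9 units:
plan = select_annotations({"a": (0.9, 0.2), "b": (0.5, 0.4), "c": (0.7, 0.1)}, 9.0)
```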
2

Zhong, Xubin, Xian Qu, Changxing Ding, and Dacheng Tao. "Glance and Gaze: Inferring Action-aware Points for One-Stage Human-Object Interaction Detection." In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021. http://dx.doi.org/10.1109/cvpr46437.2021.01303.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Ni, Bingbing, Xiaokang Yang, and Shenghua Gao. "Progressively Parsing Interactional Objects for Fine Grained Action Detection." In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2016. http://dx.doi.org/10.1109/cvpr.2016.116.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Mi, Peng, Jianghang Lin, Yiyi Zhou, Yunhang Shen, Gen Luo, Xiaoshuai Sun, Liujuan Cao, Rongrong Fu, Qiang Xu, and Rongrong Ji. "Active Teacher for Semi-Supervised Object Detection." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.01408.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Yuan, Tianning, Fang Wan, Mengying Fu, Jianzhuang Liu, Songcen Xu, Xiangyang Ji, and Qixiang Ye. "Multiple Instance Active Learning for Object Detection." In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021. http://dx.doi.org/10.1109/cvpr46437.2021.00529.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Yu, Weiping, Sijie Zhu, Taojiannan Yang, and Chen Chen. "Consistency-based Active Learning for Object Detection." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2022. http://dx.doi.org/10.1109/cvprw56347.2022.00440.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Gonzalez-Garcia, Abel, Alexander Vezhnevets, and Vittorio Ferrari. "An active search strategy for efficient object class detection." In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2015. http://dx.doi.org/10.1109/cvpr.2015.7298921.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Fu, Qichen, Xingyu Liu, and Kris M. Kitani. "Sequential Voting with Relational Box Fields for Active Object Detection." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00241.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Wu, Jiaxi, Jiaxin Chen, and Di Huang. "Entropy-based Active Learning for Object Detection with Progressive Diversity Constraint." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00918.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Vikas Desai, Sai, and Vineeth N. Balasubramanian. "Towards Fine-grained Sampling for Active Learning in Object Detection." In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2020. http://dx.doi.org/10.1109/cvprw50498.2020.00470.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Computer vision, object detection, action recognition"

1

Bragdon, Sophia, Vuong Truong, and Jay Clausen. Environmentally informed buried object recognition. Engineer Research and Development Center (U.S.), November 2022. http://dx.doi.org/10.21079/11681/45902.

Full text
Abstract:
The ability to detect and classify buried objects using thermal infrared imaging is affected by the environmental conditions at the time of imaging, which leads to an inconsistent probability of detection. For example, periods of dense overcast or recent precipitation events result in the suppression of the soil temperature difference between the buried object and soil, thus preventing detection. This work introduces an environmentally informed framework to reduce the false alarm rate in the classification of regions of interest (ROIs) in thermal IR images containing buried objects. Using a dataset that consists of thermal images containing buried objects paired with the corresponding environmental and meteorological conditions, we employ a machine learning approach to determine which environmental conditions are the most impactful on the visibility of the buried objects. We find the key environmental conditions include incoming shortwave solar radiation, soil volumetric water content, and average air temperature. For each image, ROIs are computed using a computer vision approach and these ROIs are coupled with the most important environmental conditions to form the input for the classification algorithm. The environmentally informed classification algorithm produces a decision on whether the ROI contains a buried object by simultaneously learning on the ROIs with a classification neural network and on the environmental data using a tabular neural network. On a given set of ROIs, we have shown that the environmentally informed classification approach improves the detection of buried objects within the ROIs.
APA, Harvard, Vancouver, ISO, and other styles
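The two-branch design described above — a classification network on the thermal ROI and a tabular network on the environmental measurements, fused before the buried/not-buried decision — can be sketched as follows. All layer sizes and the input dimensions are assumptions, not the report's configuration.

```python
# Sketch of an environmentally informed ROI classifier: CNN branch for the
# thermal image chip, MLP branch for tabular environmental conditions.
import torch
import torch.nn as nn

class EnvInformedClassifier(nn.Module):
    def __init__(self, n_env_features=3):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())          # -> (B, 32)
        self.tab = nn.Sequential(nn.Linear(n_env_features, 16), nn.ReLU())
        self.head = nn.Linear(32 + 16, 2)                   # buried vs. clutter

    def forward(self, roi, env):
        return self.head(torch.cat([self.cnn(roi), self.tab(env)], dim=1))

model = EnvInformedClassifier()
logits = model(torch.randn(4, 1, 64, 64),   # thermal ROIs
               torch.randn(4, 3))           # solar radiation, soil VWC, air temp
```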