Journal articles on the topic 'Computer Vision, Object Recognition, Vision and Scene Understanding, Object Detection'

Consult the top 50 journal articles for your research on the topic 'Computer Vision, Object Recognition, Vision and Scene Understanding, Object Detection.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Achirei, Ștefan-Daniel. "Short Literature Review for Visual Scene Understanding." Bulletin of the Polytechnic Institute of Iași. Electrical Engineering, Power Engineering, Electronics Section 67, no. 3 (September 1, 2021): 57–72. http://dx.doi.org/10.2478/bipie-2021-0017.

Abstract:
Individuals are highly accurate at visually understanding natural scenes. By extracting and extrapolating data, we reach the highest stage of scene understanding. In the past few years it has proved to be an essential part of computer vision applications. It goes further than object detection by bringing machine perception closer to human perception: it integrates meaningful information and extracts semantic relationships and patterns. Researchers in computer vision have focused on scene understanding algorithms, the aim being to obtain semantic knowledge from the environment and to determine the properties of objects and the relations between them. For applications in robotics, gaming, assisted living, augmented reality, etc., a fundamental task is to be aware of spatial position and to capture depth information. The first part of this paper focuses on deep learning solutions for scene recognition along the two main leads: low-level features and object detection. In the second part we present in detail the most relevant datasets for visual scene understanding. We take both directions into consideration with future applications in mind.
2

Singh, Ankita. "Face Mask Detection using Deep Learning to Manage Pandemic Guidelines." Journal of Management and Service Science (JMSS) 1, no. 2 (2021): 1–21. http://dx.doi.org/10.54060/jmss/001.02.003.

Abstract:
Computer vision is a branch of the science of computers and software systems in which a system can visualize and comprehend the images and scenes given as input. The field comprises numerous tasks, for example image recognition, object detection, image generation, and image super-resolution, among others. Object detection is broadly utilized for face detection, vehicle detection, counting pedestrians on a given street, analyzing images displayed on the web, security systems, and self-driving cars. This process also encompasses the precision of every technique for recognizing objects. Object detection is a crucial yet very challenging vision task. It is an analytical subdomain of various applications such as image search, image auto-annotation, scene understanding, and object tracking. Tracking objects in motion across a video image sequence has been one of the most important subjects in computer vision.
3

Bao, Sid Yingze, Min Sun, and Silvio Savarese. "Toward coherent object detection and scene layout understanding." Image and Vision Computing 29, no. 9 (August 2011): 569–79. http://dx.doi.org/10.1016/j.imavis.2011.08.001.

4

Zolghadr, Esfandiar, and Borko Furht. "Context-Based Scene Understanding." International Journal of Multimedia Data Engineering and Management 7, no. 1 (January 2016): 22–40. http://dx.doi.org/10.4018/ijmdem.2016010102.

Abstract:
Context plays an important role in the performance of object detection. There are two popular considerations in building context models for computer vision applications: the type of context (semantic, spatial, scale) and the scope of the relations (pairwise, high-order). In this paper, a new unified framework is presented that combines multiple sources of context in high-order relations to encode the semantic coherence and consistency of scenes. This framework introduces a new descriptor called the context relevance score to model the context-based distribution of the response variables, and applies it to two distributions. The first model incorporates the context descriptor along with the annotation response into a supervised Latent Dirichlet Allocation (LDA) built on a multivariate Bernoulli distribution, called Context-Based LDA (CBLDA). The second model is based on the multivariate Wallenius non-central hypergeometric distribution and is called Wallenius LDA (WLDA); it incorporates context knowledge as a bias parameter. Scene context is modeled as a graph and effectively used in an object detection framework to maximize the semantic consistency of the scene. The graph can also be used to recognize out-of-context objects. Annotation metadata of the Sun397 dataset is used to construct the context model. The performance of the proposed approaches was evaluated on the ImageNet dataset. Comparison with a state-of-the-art multi-class object annotation algorithm shows the superiority of the presented approach in labeling scene content.
5

Sriram, K. V., and R. H. Havaldar. "Analytical review and study on object detection techniques in the image." International Journal of Modeling, Simulation, and Scientific Computing 12, no. 05 (May 21, 2021): 2150031. http://dx.doi.org/10.1142/s1793962321500318.

Abstract:
Object detection is one of the most fundamental yet challenging issues in the field of computer vision. Object detection identifies the presence of various individual objects in an image. Great success has been attained for object detection/recognition problems in controlled environments, but the problem remains unsolved in uncontrolled settings, particularly when objects are placed in arbitrary poses in occluded and cluttered environments. In the last few years, many efforts have been made by researchers to resolve this issue because of its wide range of applications in computer vision tasks, such as content-based image retrieval, event or activity recognition, and scene understanding. This review provides a detailed survey of 50 research papers presenting object detection techniques, such as machine learning-based techniques, gradient-based techniques, the Fast Region-based Convolutional Neural Network (Fast R-CNN) detector, and foreground-based techniques. Here, the machine learning-based approaches are classified into deep learning-based approaches, random forests, Support Vector Machines (SVMs), and so on. Moreover, the challenges faced by the existing techniques are explained in the gaps-and-issues section. An analysis based on classification, toolsets, datasets utilized, publication year, and performance metrics is discussed. The future dimension of the research is based on the gaps and issues identified in the existing research works.
6

Wang, Chang, Jinyu Sun, Shiwei Ma, Yuqiu Lu, and Wang Liu. "Multi-stream Network for Human-object Interaction Detection." International Journal of Pattern Recognition and Artificial Intelligence 35, no. 08 (March 12, 2021): 2150025. http://dx.doi.org/10.1142/s0218001421500257.

Abstract:
Detecting the interaction between humans and objects in images is a critical problem for obtaining a deeper understanding of the visual relationships in a scene, and also a critical technology in many practical applications, such as augmented reality, video surveillance and information retrieval. However, due to the fine-grained actions and objects in real scenes and the coexistence of multiple interactions in one scene, the problem is far from being solved. This paper differs from prior approaches, which focused only on the features of instances, by proposing a method that utilizes a four-stream CNN network for human-object interaction (HOI) detection. More detailed visual features, spatial features and pose features from human-object pairs are extracted to solve this challenging detection task. Specifically, the core idea is that the region where people interact with objects contains important identifying cues for specific action classes, and these detailed cues can be fused to facilitate HOI recognition. Experiments on two large-scale HOI public benchmarks, V-COCO and HICO-DET, are carried out and the results show the effectiveness of the proposed method.
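To make the fusion idea concrete, here is a minimal PyTorch sketch with three illustrative streams (the paper uses four, and this is not the authors' exact architecture; all layer sizes, feature dimensions and the action count are assumptions):

```python
import torch
import torch.nn as nn

class MultiStreamHOI(nn.Module):
    """Fuse per-stream features of a human-object pair into action scores."""

    def __init__(self, visual_dim=2048, spatial_dim=64, pose_dim=34,
                 num_actions=29):
        super().__init__()
        # One small encoder per stream; real systems use CNN backbones here.
        self.visual = nn.Sequential(nn.Linear(visual_dim, 512), nn.ReLU())
        self.spatial = nn.Sequential(nn.Linear(spatial_dim, 128), nn.ReLU())
        self.pose = nn.Sequential(nn.Linear(pose_dim, 128), nn.ReLU())
        self.classifier = nn.Linear(512 + 128 + 128, num_actions)

    def forward(self, visual_feat, spatial_feat, pose_feat):
        # Late fusion: concatenate stream embeddings, then score actions.
        fused = torch.cat([self.visual(visual_feat),
                           self.spatial(spatial_feat),
                           self.pose(pose_feat)], dim=1)
        return self.classifier(fused)

# Toy usage: one human-object pair with random features.
model = MultiStreamHOI()
logits = model(torch.randn(1, 2048), torch.randn(1, 64), torch.randn(1, 34))
```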
7

Achirei, Stefan-Daniel, Mihail-Cristian Heghea, Robert-Gabriel Lupu, and Vasile-Ion Manta. "Human Activity Recognition for Assisted Living Based on Scene Understanding." Applied Sciences 12, no. 21 (October 24, 2022): 10743. http://dx.doi.org/10.3390/app122110743.

Abstract:
The growing share of the population over the age of 65 is putting pressure on the social health insurance system, especially on institutions that provide long-term care services to the elderly or to people who suffer from chronic diseases or mental disabilities. This pressure can be reduced through the assisted living of patients, based on an intelligent system for monitoring vital signs and home automation. In this regard, since 2008, the European Commission has financed the development of medical products and services through the ambient assisted living (AAL) program, Ageing Well in the Digital World. The SmartCare Project, which integrates the proposed computer vision solution, follows the European strategy on AAL. This paper presents an indoor human activity recognition (HAR) system based on scene understanding. The system consists of a ZED 2 stereo camera and an NVIDIA Jetson AGX processing unit. The recognition of human activity is carried out in two stages: first, all humans and objects in the frame are detected using a neural network; then, the results are fed to a second network that detects interactions between humans and objects. The activity score is determined based on the human–object interaction (HOI) detections.
8

Joshi, Rakesh Chandra, Saumya Yadav, Malay Kishore Dutta, and Carlos M. Travieso-Gonzalez. "Efficient Multi-Object Detection and Smart Navigation Using Artificial Intelligence for Visually Impaired People." Entropy 22, no. 9 (August 27, 2020): 941. http://dx.doi.org/10.3390/e22090941.

Abstract:
Visually impaired people face numerous difficulties in their daily lives, and technological interventions may assist them in meeting these challenges. This paper proposes an artificial intelligence-based, fully automatic assistive technology that recognizes different objects and provides auditory feedback to the user in real time, giving the visually impaired person a better understanding of their surroundings. A deep-learning model is trained with multiple images of objects that are highly relevant to the visually impaired person. Training images are augmented and manually annotated to bring more robustness to the trained model. In addition to computer vision-based techniques for object recognition, a distance-measuring sensor is integrated to make the device more comprehensive by recognizing obstacles while navigating from one place to another. The auditory information conveyed to the user after scene segmentation and obstacle identification is optimized to deliver more information in less time for faster processing of video frames. The average accuracy of this proposed method is 95.19% for object detection and 99.69% for recognition. The time complexity is low, allowing a user to perceive the surrounding scene in real time.
9

TIAN, MINGHUI, SHOUHONG WAN, and LIHUA YUE. "A VISUAL ATTENTION MODEL FOR NATURAL SCENES BASED ON DYNAMIC FEATURE COMBINATION." International Journal of Software Engineering and Knowledge Engineering 20, no. 08 (December 2010): 1077–95. http://dx.doi.org/10.1142/s0218194010005043.

Abstract:
In recent years, many research works have indicated that human visual attention is very helpful in research areas related to computer vision, such as object recognition, scene understanding and object-based image/video retrieval or annotation. This paper presents a visual attention model for natural scenes based on a dynamic feature combination strategy. The model can be divided into three parts: feature extraction, dynamic feature combination and salient object detection. First, the saliency features of color, information entropy and salient boundary are extracted from the original color image. After that, two different evaluation measures are proposed for the two categories of feature maps defined in this dynamic combination strategy; these measure the contribution of each feature map to saliency and carry out a dynamic weighting of the individual feature maps. Finally, salient objects are located in an integrated saliency map, and a computational method is given to simulate the location shift of real human visual attention. Experimental results show that this model is effective and robust for saliency detection in natural scenes, and similar to the real human visual attention mechanism.
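As a rough illustration of dynamic feature combination, the sketch below weights each normalized feature map by a simple peakedness score before summation; the weighting rule is a hypothetical stand-in for the paper's two evaluation measures, which are not reproduced here:

```python
import numpy as np

def combine_feature_maps(feature_maps):
    """Weight each normalized map by a crude 'peakedness' score, then sum."""
    saliency = np.zeros_like(feature_maps[0], dtype=np.float64)
    for fmap in feature_maps:
        span = fmap.max() - fmap.min()
        norm = (fmap - fmap.min()) / (span + 1e-8)   # normalize to [0, 1]
        weight = norm.max() - norm.mean()            # peaked maps weigh more
        saliency += weight * norm
    return saliency / (saliency.max() + 1e-8)

# Toy usage with three random "feature maps" (color, entropy, boundary).
maps = [np.random.rand(120, 160) for _ in range(3)]
master_map = combine_feature_maps(maps)
```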
10

XIAO, JIANGJIAN, HUI CHENG, FENG HAN, and HARPREET SAWHNEY. "GEO-BASED AERIAL SURVEILLANCE VIDEO PROCESSING FOR SCENE UNDERSTANDING AND OBJECT TRACKING." International Journal of Pattern Recognition and Artificial Intelligence 23, no. 07 (November 2009): 1285–307. http://dx.doi.org/10.1142/s0218001409007582.

Abstract:
This paper presents an approach to extract semantic layers from aerial surveillance videos for scene understanding and object tracking. The input videos are captured by low-flying aerial platforms and typically contain strong parallax from non-ground-plane structures as well as moving objects. Our approach leverages the geo-registration between video frames and reference images (such as those available from Terraserver and Google satellite imagery) to establish a unique geo-spatial coordinate system for pixels in the video. The geo-registration process enables Euclidean 3D reconstruction with absolute scale, unlike traditional monocular structure from motion, where continuous scale estimation over long periods of time is an issue. Geo-registration also enables correlation of video data to other stored information sources such as GIS (Geo-spatial Information System) databases. In addition to the geo-registration and 3D reconstruction aspects, the other key contributions of this paper include: (1) providing a reliable geo-based solution to estimate camera pose for 3D reconstruction, (2) exploiting appearance and 3D shape constraints derived from geo-registered videos for labeling of structures such as buildings, foliage, and roads for scene understanding, and (3) elimination of moving-object detection and tracking errors using 3D parallax constraints and semantic labels derived from geo-registered videos. Experimental results on extended-time aerial video data demonstrate the qualitative and quantitative aspects of our work.
11

He, Boyong, Xianjiang Li, Bo Huang, Enhui Gu, Weijie Guo, and Liaoni Wu. "UnityShip: A Large-Scale Synthetic Dataset for Ship Recognition in Aerial Images." Remote Sensing 13, no. 24 (December 9, 2021): 4999. http://dx.doi.org/10.3390/rs13244999.

Abstract:
As a data-driven approach, deep learning requires a large amount of annotated data for training to obtain a sufficiently accurate and generalized model, especially in the field of computer vision. However, when compared with generic object recognition datasets, aerial image datasets are more challenging to acquire and more expensive to label. Obtaining a large amount of high-quality aerial image data for object recognition and image understanding is an urgent problem. Existing studies show that synthetic data can effectively reduce the amount of training data required. Therefore, in this paper, we propose the first synthetic aerial image dataset for ship recognition, called UnityShip. This dataset contains over 100,000 synthetic images and 194,054 ship instances, including 79 different ship models in ten categories and six different large virtual scenes with different time periods, weather environments, and altitudes. The annotations include environmental information, instance-level horizontal bounding boxes, oriented bounding boxes, and the type and ID of each ship. This provides the basis for object detection, oriented object detection, fine-grained recognition, and scene recognition. To investigate the applications of UnityShip, the synthetic data were validated for model pre-training and data augmentation using three different object detection algorithms and six existing real-world ship detection datasets. Our experimental results show that for small-sized and medium-sized real-world datasets, the synthetic data achieve an improvement in model pre-training and data augmentation, showing the value and potential of synthetic data in aerial image recognition and understanding tasks.
12

Lee, Alvin Wai Chung, Suet-Peng Yong, and Junzo Watada. "Global Thresholding for Scene Understanding Towards Autonomous Drone Navigation." Journal of Advanced Computational Intelligence and Intelligent Informatics 23, no. 5 (September 20, 2019): 909–19. http://dx.doi.org/10.20965/jaciii.2019.p0909.

Abstract:
Unmanned aerial vehicles, more commonly known as drones, are flying aircraft that do not have a pilot onboard. For drones to fly through an area without GPS signals, scene understanding algorithms that assist in autonomous navigation are useful. In this paper, various thresholding algorithms are evaluated to enhance scene understanding in addition to object detection. Based on the results obtained, Gaussian-filter global thresholding can segment regions of interest in the scene effectively and at the lowest processing-time cost.
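For readers unfamiliar with the technique, a minimal OpenCV sketch of Gaussian filtering followed by global thresholding appears below; the file path and kernel size are placeholders, and Otsu's rule is assumed as the global threshold selector, which the abstract does not specify:

```python
import cv2

# Load a scene frame in grayscale ("frame.png" is a placeholder path).
gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)

# Gaussian filter suppresses noise before the global threshold is applied.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Global threshold; THRESH_OTSU picks the value from the image histogram
# and ignores the supplied 0.
_, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```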
13

Soudy, Mohamed, Yasmine Afify, and Nagwa Badr. "Insights into few shot learning approaches for image scene classification." PeerJ Computer Science 7 (September 20, 2021): e666. http://dx.doi.org/10.7717/peerj-cs.666.

Abstract:
Image understanding and scene classification are keystone tasks in computer vision. The development of technologies and the profusion of existing datasets leave wide room for improvement in the image classification and recognition research area. Notwithstanding the optimal performance of existing machine learning models in image understanding and scene classification, there are still obstacles to overcome. All such models are data-dependent and can only classify samples close to the training set. Moreover, these models require large amounts of data for training and learning. The first problem is addressed by few-shot learning, which achieves optimal performance in object detection and classification but has received little attention in the scene classification task. Motivated by these findings, in this paper we introduce two models for few-shot learning in scene classification. In order to trace the behavior of those models, we also introduce two datasets (MiniSun and MiniPlaces) for image scene classification. Experimental results show that the proposed models outperform the benchmark approaches in terms of classification accuracy.
14

Pan, Qingchao, and Haohua Zhang. "Key Algorithms of Video Target Detection and Recognition in Intelligent Transportation Systems." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 09 (December 16, 2019): 2055016. http://dx.doi.org/10.1142/s0218001420550162.

Abstract:
With the popularization of video detection and recognition systems and the advancement of video image processing technology, applied research on intelligent transportation systems based on computer vision technology has received more and more attention. Such systems comprehensively utilize image processing, pattern recognition, artificial intelligence and other technologies. They involve processing and analyzing the video image sequences collected by the detection system, intelligently understanding the video content, and dealing with problems such as accident information judgment, pedestrian and vehicle classification, traffic flow parameter detection, and moving target tracking. This makes intelligent transportation systems more intelligent and practical, and provides comprehensive, real-time traffic status information for traffic management and control. Therefore, research on traffic information detection methods based on computer vision has important theoretical and practical significance. The detection and recognition of video targets is an important research direction in the fields of intelligent transportation and computer vision. However, due to background complexity, illumination changes, target occlusion and other factors in the detection and recognition environment, applications still face many difficulties, and the robustness and accuracy of detection and recognition need to be further improved. In this paper, several key problems in video object detection and recognition are studied, including accurate segmentation of target, background and shadow in complex scenes; accurate classification of extracted foreground targets; and target recognition against complex backgrounds. For each of these problems, the paper proposes a corresponding solution.
15

Nurwakit, Ahmat. "Literatur Riview Transliterasi Hurub Arab ke Latin." SAINTEKBU 12, no. 2 (August 26, 2020): 58–67. http://dx.doi.org/10.32764/saintekbu.v12i2.816.

Abstract:
Computer vision is an interdisciplinary field that studies how computers can understand digital images and videos. From an engineering perspective, computer vision aims to automate tasks of the human visual system. Its stages include acquiring, processing, analyzing and understanding digital images. Computer vision focuses on intelligent systems that can extract data from digital images into numerical form. Subdomains of computer vision include scene reconstruction, event detection, video tracking, object recognition, 3D pose estimation, learning, indexing, motion estimation, and image restoration. Handwriting character recognition is a branch of object recognition: the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch screens and other devices. Images of written text can be processed offline from a sheet of paper by an optical scanner (optical character recognition); alternatively, pen-tip movements can be interpreted online, for example using a pen-based computer screen surface. One script used as an object of handwriting recognition is the Arabic Pegon script (hereafter, Pegon). Pegon letters are commonly used in translations of classical Islamic texts (kitab kuning) into regional languages, generally Javanese, Sundanese, or Malay. In practice, translations of classical kitab kuning in Indonesia mostly use these three languages, so people who do not master one of them will struggle to obtain a translation. Unlike standard Arabic letters, Pegon has several characters that were engineered so that the script can be read according to the phonetics of the regional language concerned. The problem that arises is that Pegon cannot be read by someone who lacks vocabulary in that language, so a translation process is required, and translation itself is only possible if the sentence is written in Latin letters. Handwriting recognition has been implemented with many methods, most of them variants of artificial neural networks. Although the accuracy of neural networks is high, the computation cost of these methods is very large. An eigenspace is a subspace from linear algebra consisting of a set of eigenvectors, each formed from many eigenvalues. One application is the eigenface technique used in face recognition: a digital image is converted into eigenvalues that are then arranged into eigenvectors, the distance from a test sample to all stored vectors is computed, and the closest one is taken as the match. This method is quite low-cost. Keywords: transliteration, eigen, Arabic letters, Latin letters
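A minimal numpy sketch of the eigenspace matching described above (dimensions are illustrative; PCA via SVD for the projection, nearest neighbor in eigenspace for the classification):

```python
import numpy as np

def build_eigenspace(train_images, k=20):
    """train_images: (n_samples, n_pixels) array of flattened face images."""
    mean = train_images.mean(axis=0)
    centered = train_images - mean
    # SVD yields the top eigenvectors ("eigenfaces") of the covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                    # (k, n_pixels) eigenvector basis
    coords = centered @ basis.T       # training coordinates in eigenspace
    return mean, basis, coords

def classify(test_image, mean, basis, coords, labels):
    proj = (test_image - mean) @ basis.T
    dists = np.linalg.norm(coords - proj, axis=1)
    return labels[np.argmin(dists)]   # label of the nearest training vector
```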
16

Mauri, Antoine, Redouane Khemmar, Benoit Decoux, Madjid Haddad, and Rémi Boutteau. "Real-Time 3D Multi-Object Detection and Localization Based on Deep Learning for Road and Railway Smart Mobility." Journal of Imaging 7, no. 8 (August 12, 2021): 145. http://dx.doi.org/10.3390/jimaging7080145.

Abstract:
For smart mobility, autonomous vehicles, and advanced driver-assistance systems (ADASs), perception of the environment is an important task in scene analysis and understanding. Better perception of the environment allows for enhanced decision making, which, in turn, enables very high-precision actions. To this end, we introduce in this work a new real-time deep learning approach for 3D multi-object detection for smart mobility not only on roads, but also on railways. To obtain the 3D bounding boxes of the objects, we modified a proven real-time 2D detector, YOLOv3, to predict 3D object localization, object dimensions, and object orientation. Our method has been evaluated on KITTI’s road dataset as well as on our own hybrid virtual road/rail dataset acquired from the video game Grand Theft Auto (GTA) V. The evaluation of our method on these two datasets shows good accuracy, but more importantly that it can be used in real-time conditions, in road and rail traffic environments. Through our experimental results, we also show the importance of the accuracy of prediction of the regions of interest (RoIs) used in the estimation of 3D bounding box parameters.
17

KODRATOFF, Y., and S. MOSCATELLI. "MACHINE LEARNING FOR OBJECT RECOGNITION AND SCENE ANALYSIS." International Journal of Pattern Recognition and Artificial Intelligence 08, no. 01 (February 1994): 259–304. http://dx.doi.org/10.1142/s0218001494000139.

Abstract:
Learning is a critical research field for autonomous computer vision systems. It can bring solutions to the knowledge acquisition bottleneck of image understanding systems. Recent developments of machine learning for computer vision are reported in this paper. We describe several different approaches for learning at different levels of the image understanding process, including learning 2-D shape models, learning strategic knowledge for optimizing model matching, learning for adaptive target recognition systems, knowledge acquisition of constraint rules for labelling and automatic parameter optimization for vision systems. Each approach will be commented on and its strong and weak points will be underlined. In conclusion we will suggest what could be the “ideal” learning system for vision.
18

Oe, Shunichiro. "Special Issue on Vision." Journal of Robotics and Mechatronics 11, no. 2 (April 20, 1999): 87. http://dx.doi.org/10.20965/jrm.1999.p0087.

Abstract:
The widely used term Computer Vision applies to when computers are substituted for human visual information processing. As real-world objects, except for characters, symbols, figures and photographs created by people, are 3-dimensional (3-D), their two-dimensional (2-D) images obtained by camera are produced by compressing 3-D information to 2-D. Many methods of 2-D image processing and pattern recognition have been developed and widely applied to industrial and medical processing, etc. Research work enabling computers to recognize 3-D objects using 3-D information extracted from 2-D images has been carried out in artificial intelligence and robotics. Many techniques have been developed and some applied practically in scene analysis or 3-D measurement. These practical applications are based on image sensing, image processing, pattern recognition, image measurement, extraction of 3-D information, and image understanding. New techniques are constantly appearing. The title of this special issue is Vision, and it features 8 papers ranging from basic computer vision theory to industrial applications. These papers include the following: Kohji Kamejima proposes a method to detect self-similarity in random image fields - the basis of human visual processing. Akio Nagasaka et al. developed a way to identify a real scene in real time using run-length encoding of video feature sequences. This technique will become a basis for active video recording and new robotic machine vision. Toshifumi Honda presents a method for visual inspection of solder joints by 3-D image analysis - a very important issue in the inspection of printed circuit boards. Saburo Okada et al. contribute a new technique for simultaneous measurement of shape and normal vector for specular objects. These methods are all useful for obtaining 3-D information. Masato Nakajima presents a human face identification method for security monitoring using 3-D gray-level information. Kenji Terada et al. propose a method of automatically counting passing people using image sensing. These two technologies are very useful in access control. Yoji Ogawa presents a new image processing method for automatic welding in turbid water under a non-preparatory environment. Liu Wei et al. develop a method for detection and management of cutting-tool wear using visual sensors. We are certain that all of these papers will contribute greatly to the development of vision systems in robotics and mechatronics.
19

Heikel, Edvard, and Leonardo Espinosa-Leal. "Indoor Scene Recognition via Object Detection and TF-IDF." Journal of Imaging 8, no. 8 (July 26, 2022): 209. http://dx.doi.org/10.3390/jimaging8080209.

Abstract:
Indoor scene recognition and semantic information can be helpful for social robots. Recently, in the field of indoor scene recognition, researchers have incorporated object-level information and shown improved performances. This paper demonstrates that scene recognition can be performed solely using object-level information in line with these advances. A state-of-the-art object detection model was trained to detect objects typically found in indoor environments and then used to detect objects in scene data. These predicted objects were then used as features to predict room categories. This paper successfully combines approaches conventionally used in computer vision and natural language processing (YOLO and TF-IDF, respectively). These approaches could be further helpful in the field of embodied research and dynamic scene classification, which we elaborate on.
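A minimal scikit-learn sketch of the detector-labels-to-TF-IDF idea: each scene is represented by the bag of object labels a detector emits, TF-IDF weights that bag, and a classifier predicts the room category. The detector output and room labels here are mocked, and the choice of logistic regression as the classifier is an assumption, not necessarily the paper's:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each "document" is the list of objects a detector found in one scene.
scenes = ["bed lamp pillow wardrobe", "oven sink fridge kettle",
          "sofa tv lamp bookshelf", "sink toilet towel mirror"]
rooms = ["bedroom", "kitchen", "living_room", "bathroom"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(scenes, rooms)

# A new scene whose detector reported a TV, a sofa and a lamp.
print(model.predict(["tv sofa lamp"]))   # expected: ['living_room']
```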
20

Bielecki, Andrzej, and Piotr Śmigielski. "Graph representation for two-dimensional scene understanding by the cognitive vision module." International Journal of Advanced Robotic Systems 14, no. 1 (December 23, 2016): 172988141668269. http://dx.doi.org/10.1177/1729881416682694.

Abstract:
In this article, the cognitive vision module of an autonomous flying robot is studied. The problem of scene understanding by a robot flying at high altitude is analyzed. In such conditions, the examined scene can be regarded as two-dimensional. It is assumed that the robot operates in an urban-type environment. The scene representation is stored in a neighborhood graph that collects data about the objects' locations, shapes, and spatial relations. Fragments of the scene are understood by the robot in the context of the neighborhoods of the objects. It is shown that such information can be used effectively for recognition of an object when many objects of similar shape exist in the scene. In the proposed recognition process, not only the information about the shape of the object is utilized but also its spatial relations with other objects in its close neighborhood are examined.
21

Jung, Minji, Heekyung Yang, and Kyungha Min. "Improving Deep Object Detection Algorithms for Game Scenes." Electronics 10, no. 20 (October 17, 2021): 2527. http://dx.doi.org/10.3390/electronics10202527.

Abstract:
The advancement and popularity of computer games make game scene analysis one of the most interesting research topics in the computer vision community. Among the various computer vision techniques, we employ object detection algorithms for the analysis, since they can both recognize and localize objects in a scene. However, applying existing object detection algorithms to game scenes does not guarantee the desired performance, since the algorithms are trained using datasets collected from the real world. In order to achieve the desired performance for analyzing game scenes, we built a dataset by collecting game scenes and retrained object detection algorithms pre-trained with real-world datasets. We selected five object detection algorithms, namely YOLOv3, Faster R-CNN, SSD, FPN and EfficientDet, and eight games from various genres including first-person shooting, role-playing, sports, and driving. PascalVOC and MS COCO were employed for the pre-training of the object detection algorithms. We demonstrated the improvement in performance that comes from our strategy in two aspects: recognition and localization. The improvement in recognition performance was measured using mean average precision (mAP) and the improvement in localization using intersection over union (IoU).
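For reference, the localization measure named above is straightforward to compute; a minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

An mAP evaluation then thresholds this overlap (commonly at 0.5) to decide which detections count as correct before averaging precision over classes.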
22

Drummond, Tom, and Terry Caelli. "Learning Task-Specific Object Recognition and Scene Understanding." Computer Vision and Image Understanding 80, no. 3 (December 2000): 315–48. http://dx.doi.org/10.1006/cviu.2000.0882.

23

Shen, Zong-Ying, Shiang-Yu Han, Li-Chen Fu, Pei-Yung Hsiao, Yo-Chung Lau, and Sheng-Jen Chang. "Deep convolution neural network with scene-centric and object-centric information for object detection." Image and Vision Computing 85 (May 2019): 14–25. http://dx.doi.org/10.1016/j.imavis.2019.03.004.

24

Ben-Yosef, Guy, and Shimon Ullman. "Image interpretation above and below the object level." Interface Focus 8, no. 4 (June 15, 2018): 20180020. http://dx.doi.org/10.1098/rsfs.2018.0020.

Abstract:
Computational models of vision have advanced in recent years at a rapid rate, rivalling human-level performance in some areas. Much of the progress to date has focused on analysing the visual scene at the object level - the recognition and localization of objects in the scene. Human understanding of images reaches a richer and deeper level both 'below' the object level, such as identifying and localizing object parts and sub-parts, and 'above' the object level, such as identifying object relations, and agents with their actions and interactions. In both cases, understanding depends on recovering meaningful structures in the image and their components, properties and inter-relations, a process referred to here as 'image interpretation'. In this paper, we describe recent directions, based on human and computer vision studies, towards human-like image interpretation, beyond the reach of current schemes, both below the object level and at the level of meaningful configurations beyond the recognition of individual objects - in particular, interactions between two people in close contact. In both cases the recognition process depends on the detailed interpretation of so-called 'minimal images', and at both levels recognition depends on combining 'bottom-up' processing, proceeding from low to higher levels of a processing hierarchy, with 'top-down' processing, proceeding from high to lower levels of visual analysis.
25

Szemenyei, Márton, and Ferenc Vajda. "3D Object Detection and Scene Optimization for Tangible Augmented Reality." Periodica Polytechnica Electrical Engineering and Computer Science 62, no. 2 (May 23, 2018): 25–37. http://dx.doi.org/10.3311/ppee.10482.

Abstract:
Object recognition in 3D scenes is one of the fundamental tasks in computer vision. It is used frequently in robotics and augmented reality applications [1]. In our work we intend to apply 3D shape recognition to create a Tangible Augmented Reality system that is able to pair virtual and real objects in natural indoor scenes. In this paper we present a method for arranging virtual objects in a real-world scene based on primitive shape graphs. For our scheme, we propose a graph node embedding algorithm for graphs with vectorial nodes and edges, and genetic operators designed to improve the quality of the global setup of virtual objects. We show that our methods improve the quality of the arrangement significantly.
26

Hosseinyalamdary, S., and A. Yilmaz. "3D SUPER-RESOLUTION APPROACH FOR SPARSE LASER SCANNER DATA." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences II-3/W5 (August 19, 2015): 151–57. http://dx.doi.org/10.5194/isprsannals-ii-3-w5-151-2015.

Abstract:
Laser scanner point clouds have been emerging in photogrammetry and computer vision as a basis for high-level tasks such as object tracking, object recognition and scene understanding. However, low-cost laser scanners are noisy, sparse and prone to systematic errors. This paper proposes a novel 3D super-resolution approach to reconstruct the surfaces of objects in the scene. The method works on sparse, unorganized point clouds and has superior performance over other surface recovery approaches. Since the proposed approach uses the anisotropic diffusion equation, it does not deteriorate object boundaries and it preserves the topology of the object.
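As background, here is a minimal numpy sketch of classic Perona-Malik anisotropic diffusion, the edge-preserving mechanism the abstract refers to, shown on a 2D grid for brevity; the paper itself applies the diffusion equation to 3D surface recovery, and all constants here are illustrative:

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=30.0, step=0.2):
    """Perona-Malik diffusion: smooths flat regions, preserves strong edges."""
    u = img.astype(np.float64).copy()
    for _ in range(n_iter):
        # Finite differences toward the four neighbors (borders wrap via roll).
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u, 1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u, 1, axis=1) - u
        # Conduction coefficient decays with gradient magnitude, so strong
        # edges diffuse little while flat regions are smoothed.
        g = lambda d: np.exp(-(d / kappa) ** 2)
        u += step * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u
```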
27

Poojitha, L. "Anomalous Object Detection with Deep Learning." International Journal for Research in Applied Science and Engineering Technology 10, no. 6 (June 30, 2022): 3227–32. http://dx.doi.org/10.22214/ijraset.2022.44581.

Abstract:
In many computer vision systems, object identification and monitoring are crucial capabilities. Object identification and tracking is a difficult job in the field of computer vision that attempts to detect, recognize and track things across a video sequence of pictures; it aids in understanding and describing object behaviour rather than relying on human operators to monitor the screens. Its goal is to find moving things in a video clip or a security camera feed. The system collects a snapshot from the camera, processes it as per the model's requirements, and passes the data to the TensorFlow framework, which returns a list of the frame's detections together with the objects' confidence scores and the coordinates of their bounding boxes. Before presenting an object to the user, the software takes those coordinates and draws a rectangle around it. This research detects the presence of anomalous items in camera-captured sequences, with anomalies being things that correspond to categories that should not be present in a given scene.
28

Juhás, Martin, Bohuslava Juhásová, Pavol Reménység, and Roman Danel. "Offline Machine Vision in the Production Cell Control." Research Papers Faculty of Materials Science and Technology Slovak University of Technology 27, no. 45 (September 1, 2019): 7–18. http://dx.doi.org/10.2478/rput-2019-0020.

Abstract:
The paper presents the possibility of using machine vision in the industrial area. The case study is oriented toward indirect image processing in a robotic cell using a Matlab tool. The theoretical part of the contribution is devoted to a comparative analysis of various methods of object detection and recognition. The functionality, speed, performance and reliability of selected methods in the object detection and recognition area are analyzed. In the practical part, a method of implementing indirect machine vision is designed to control the handling of objects detected and recognized on the basis of an operator requirement. Based on the analysis of the sample robotic workplace and the identified limitations, the possibility of using indirect computer vision is suggested. In such a case, the image of the workspace scene is saved to storage and then processed by an external element. The processing result is further distributed in a defined form through a selected channel to the control component of the production cell.
29

Golovin, Oleksandr. "Recognition of Geometric Figures and Determination of Their Characteristics by Means of Computer Vision." Cybernetics and Computer Technologies, no. 1 (June 30, 2022): 49–63. http://dx.doi.org/10.34229/2707-451x.22.1.6.

Abstract:
Introduction. Many computer vision applications use procedures for recognizing various shapes and estimating their dimensional characteristics. The entire pipeline of such processing consists of several stages without clearly defined boundaries, but it can be divided into low-, medium-, and high-level processes. Low-level processes deal only with primitive operations such as preprocessing to reduce noise, enhance contrast, or sharpen images; they are characterized by having images at both input and output. Image processing at the middle level covers tasks such as segmentation, description of objects, and their compression into a form convenient for computer processing. Middle-level processes are characterized by having images only at the input, while only features and attributes extracted from the images are produced at the output. High-level processing involves 'understanding' a set of recognized objects and recognizing their interactions. Using the example of the developed software models for recognizing figures and estimating their characteristics, it is shown that the image processing pipeline reduces spatial image data to metadata, compressing the amount of information and significantly increasing the importance of the data. This indicates that the image at the input of the middle level should be as informative as possible (with high contrast and no noise, artifacts, etc.), because after the transformation of spatial image data into metadata, no further procedure is able to correct the data obtained by the video sensors in the direction of improving or increasing the information content. Recognition of figures in an image can be realized quite efficiently through the use of a procedure for determining the contours of figures. To do this, one needs to determine the boundaries of objects and localize them in the image; this is often the first step for procedures such as separating objects from the background, image segmentation, and detection and recognition of various objects. The purpose of the article is to study the image processing pipeline from the moment of image capture to the recognition of a certain set of figures in an image (for example, geometric shapes such as a triangle or quadrilateral), and to develop software models for recognizing figures in an image and determining the center of mass of figures by means of computer vision. Results. Variants of a nonlinear estimating problem were proposed and tested. The properties of such problems depend on the value of a regulating parameter; the dependence of the estimate on the parameter value was studied, and a range of parameter values was defined for which the estimating problem gives an adequate result for the initial task. Numerical examples show how much the volume of calculations is reduced when using a dynamic branching tree. Conclusions. The results obtained can be used in many applications of computer vision, for example, counting objects in a scene, estimating their parameters, and estimating the distance between objects in a scene. Keywords: contour, segmentation, image binarization, computer vision, histogram.
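A minimal OpenCV sketch of the mid-level steps described above: binarize the image, extract contours, classify a figure by its approximated vertex count, and locate its center of mass from image moments. The file path and threshold value are placeholders:

```python
import cv2

gray = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

for cnt in contours:
    # Polygonal approximation: the vertex count hints at the figure type.
    approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
    shape = {3: "triangle", 4: "quadrilateral"}.get(len(approx), "other")
    # Center of mass from the spatial moments m10/m00 and m01/m00.
    m = cv2.moments(cnt)
    if m["m00"] > 0:
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
        print(shape, (round(cx), round(cy)))
```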
30

Gundu, Sireesha, and Hussain Syed. "Vision-Based HAR in UAV Videos Using Histograms and Deep Learning Techniques." Sensors 23, no. 5 (February 25, 2023): 2569. http://dx.doi.org/10.3390/s23052569.

Abstract:
Activity recognition in unmanned aerial vehicle (UAV) surveillance is addressed in various computer vision applications such as image retrieval, pose estimation, object detection in still images, videos and video frames, face recognition, and video action recognition. In UAV-based surveillance technology, video segments captured from aerial vehicles make it challenging to recognize and distinguish human behavior. In this research, to recognize single- and multi-human activities using aerial data, a hybrid model of histogram of oriented gradients (HOG), mask-regional convolutional neural network (Mask-RCNN), and bidirectional long short-term memory (Bi-LSTM) is employed. The HOG algorithm extracts patterns, Mask-RCNN extracts feature maps from the raw aerial image data, and the Bi-LSTM network exploits the temporal relationship between the frames for the underlying action in the scene; its bidirectional processing substantially reduces the error rate. This novel architecture generates enhanced segmentation by utilizing histogram-gradient-based instance segmentation and improves the accuracy of classifying human activities using the Bi-LSTM approach. Experimental outcomes demonstrate that the proposed model outperforms other state-of-the-art models, achieving 99.25% accuracy on the YouTube-Aerial dataset.
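A minimal sketch of the first stage only (HOG pattern extraction, here with scikit-image; the parameters are illustrative, and the Mask-RCNN and Bi-LSTM stages are omitted):

```python
import numpy as np
from skimage.feature import hog

# One aerial frame, mocked here as a random grayscale image.
frame = np.random.rand(128, 128)

# Histogram of oriented gradients: per-cell edge-orientation statistics.
features = hog(frame, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))
print(features.shape)   # 1-D descriptor handed to the downstream stages
```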
31

Kumar, Aayush, Amit Kumar, Avanish Chandra, and Indira Adak. "Custom Object Detection and Analysis in Real Time: YOLOv4." International Journal for Research in Applied Science and Engineering Technology 10, no. 5 (May 31, 2022): 3982–90. http://dx.doi.org/10.22214/ijraset.2022.43303.

Abstract:
Object recognition is one of the most basic and complex problems in computer vision, which seeks to locate object instances from the enormous set of already-defined categories in readily available natural images. The object detection method aims to recognize all the objects or entities in a given picture and determine their categories and position information in order to achieve machine-vision understanding. Several tactics have been put forward to solve this problem, more or less inspired by principles based on the Open Source Computer Vision Library (OpenCV) and deep learning. Some are relatively good, while others fail to detect objects under random geometric transformations. This paper demonstrates the 'HAWKEYE' application, a small initiative to build an application working on the principle of EEE (Explore → Experience → Evolve). Keywords: Convolutional Neural Network, Object detection, Image classification, Deep learning, OpenCV, YOLOv4.
32

Shashank and Indu Sreedevi. "Spatiotemporal Activity Mapping for Enhanced Multi-Object Detection with Reduced Resource Utilization." Electronics 12, no. 1 (December 22, 2022): 37. http://dx.doi.org/10.3390/electronics12010037.

Abstract:
The accuracy of data captured by sensors highly impacts the performance of a computer vision system. To derive highly accurate data, the computer vision system must be capable of identifying critical objects and activities in the field of its sensors and reconfiguring the sensors' configuration space in real time. The majority of modern reconfiguration systems rely on complex computations and thus consume a lot of resources. This may not be a problem for systems with a continuous power supply, but it can be a major setback for computer vision systems employing sensors with limited resources. Further, to develop an appropriate understanding of the scene, the computer vision system must correlate past and present events captured in the sensor's field of view (FOV). To address the abovementioned problems, this article provides a simple yet efficient framework for sensor reconfiguration. The framework performs a spatiotemporal evaluation of the scene to generate adaptive activity maps, based on which the sensors are reconfigured. The activity maps contain normalized values assigned to each pixel in the sensor's FOV, called normalized pixel sensitivity, which represent the impact of activities or events on each pixel in the sensor's FOV. The temporal relationship between past and present events is developed by utilizing a standard half-width Gaussian distribution. The framework further proposes a federated optical-flow-based filter to determine critical activities in the FOV. Based on the activity maps, the sensors are reconfigured to align their centers with the most sensitive area (i.e., the region of importance) of the field. The proposed framework is tested on multiple surveillance and sports datasets and outperforms contemporary reconfiguration systems in terms of multi-object tracking accuracy (MOTA).
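A minimal OpenCV sketch of the core idea: an activity map accumulated from optical-flow magnitude, with a simple exponential decay standing in for the paper's half-width Gaussian temporal weighting. The video path and all constants are illustrative assumptions:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("surveillance.mp4")          # placeholder path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
activity = np.zeros(prev_gray.shape, dtype=np.float64)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Decay past activity, then add present motion (temporal weighting).
    activity = 0.9 * activity + np.linalg.norm(flow, axis=2)
    prev_gray = gray
cap.release()

# Normalized per-pixel sensitivity; its peak marks the region of importance.
sensitivity = activity / (activity.max() + 1e-8)
cy, cx = np.unravel_index(np.argmax(sensitivity), sensitivity.shape)
```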
33

Mishra, Ranjan Kumar, G. Y. Sandesh Reddy, and Himanshu Pathak. "The Understanding of Deep Learning: A Comprehensive Review." Mathematical Problems in Engineering 2021 (April 5, 2021): 1–15. http://dx.doi.org/10.1155/2021/5548884.

Abstract:
Deep learning is a computer-based modeling approach made up of many processing layers that are used to learn representations of data with several levels of abstraction. This review paper presents the state of the art in deep learning to highlight the major challenges and contributions in computer vision. The work mainly gives an overview of the current understanding of deep learning approaches to solving traditional artificial intelligence problems. These computational models have enhanced applications in object detection, visual object recognition, speech recognition, face recognition, vision for driverless cars, virtual assistants, and many other fields such as genomics and drug discovery. Finally, the paper also showcases current developments and challenges in training deep neural networks.
34

Vaca-Castano, Gonzalo, Niels DaVitoria Lobo, and Mubarak Shah. "Holistic object detection and image understanding." Computer Vision and Image Understanding 181 (April 2019): 1–13. http://dx.doi.org/10.1016/j.cviu.2019.02.006.

35

Yehia, Amany, and Shereen A. Taie. "RGB-D and corrupted images in assistive blind systems in smart cities." Bulletin of Electrical Engineering and Informatics 11, no. 4 (August 1, 2022): 1970–82. http://dx.doi.org/10.11591/eei.v11i4.3770.

Abstract:
Assistive systems for the visually impaired in smart cities help visually impaired people perform their daily tasks, but they face two problems when using You Only Look Once version 3 (YOLOv3) object detection. Object recognition is a significant technique used to recognize objects with different technologies, algorithms, and structures. Object detection is a computer vision technique that identifies and locates instances of objects in images or videos. YOLOv3 is a recent object detection technique that has shown promising results; its detection task is to determine all objects, their locations, and their types in the scene at once, making it faster than other object detection techniques. This paper addresses the two problems of red-green-blue-depth (RGB-D) images and corrupted images, introducing two novel preprocessing approaches that improve YOLOv3's handling of these inputs. The first phase introduces a new preprocessing model for automatically handling RGB-D input to YOLOv3, with an accuracy of 61.50% in detection and 57.02% in recognition. The second phase presents a preprocessing step for handling corrupted images with the YOLOv3 architecture, reaching a high accuracy of 77.39% in detection and 71.96% in recognition.
36

Salunkhe, Akilesh, Manthan Raut, Shayantan Santra, and Sumedha Bhagwat. "Android-based object recognition application for visually impaired." ITM Web of Conferences 40 (2021): 03001. http://dx.doi.org/10.1051/itmconf/20214003001.

Abstract:
Detecting objects in real time and converting the detections into audio output is a challenging task. Recent advances in computer vision have allowed the development of various real-time object detection applications. This paper describes a simple Android app that helps visually impaired people understand their surroundings. Information about the surrounding environment is captured through the phone's camera, where real-time object recognition is performed through TensorFlow's Object Detection API. The detected objects are then converted into audio output using Android's text-to-speech library. TensorFlow Lite makes the offline processing of complex algorithms simple. The overall accuracy of the proposed system was found to be approximately 90%.
37

Meghana, K. S. "Face Sketch Recognition Using Computer Vision." International Journal for Research in Applied Science and Engineering Technology 9, no. VII (July 25, 2021): 2005–9. http://dx.doi.org/10.22214/ijraset.2021.36806.

Abstract:
Nowadays, the need for technologies for the identification, detection and recognition of suspects has increased. One of the most common biometric techniques is face recognition, since the face is the most convenient way people identify each other. Understanding how humans recognize face sketches drawn by artists is of significant value to both criminal investigators and forensic researchers in computer vision. However, studies show that hand-drawn face sketches are still very limited in terms of artists and the number of sketches, because after an incident a forensic artist prepares sketches based on the description provided by an eyewitness. Sometimes a suspect uses a special mask to hide common facial features like the nose, eyes, lips, or skin color, but the outline features of the face biometrics can never be hidden. Here we concentrate on specific facial geometric features that can be used to calculate ratios of similarity between the template photograph database and the forensic sketches. The project describes the design of a system for face sketch recognition using computer vision approaches such as the Discrete Cosine Transform (DCT) and the Local Binary Pattern Histogram (LBPH) algorithm, together with a supervised machine learning model called the Support Vector Machine (SVM) for face recognition. Tkinter is the standard GUI library for Python; combined with Tkinter, Python provides a fast and easy way to create GUI applications, as Tkinter offers a powerful object-oriented interface to the Tk GUI toolkit.
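A minimal sketch of the LBPH component (this assumes the opencv-contrib-python package, which provides the cv2.face module; the training data is mocked, and the DCT and SVM stages are omitted):

```python
import cv2
import numpy as np

# Mocked training set: grayscale face crops with integer identity labels.
faces = [np.random.randint(0, 256, (100, 100), dtype=np.uint8)
         for _ in range(4)]
labels = np.array([0, 0, 1, 1], dtype=np.int32)

# Local Binary Pattern Histogram recognizer from the cv2.face module.
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(faces, labels)

# Predict the identity of a probe image; lower confidence means closer match.
label, confidence = recognizer.predict(faces[0])
print(label, confidence)
```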
APA, Harvard, Vancouver, ISO, and other styles
38

Yang, Fan, and Yutai Rao. "Vision-Based Intelligent Vehicle Road Recognition and Obstacle Detection Method." International Journal of Pattern Recognition and Artificial Intelligence 34, no. 07 (October 18, 2019): 2050020. http://dx.doi.org/10.1142/s0218001420500202.

Full text
Abstract:
With the development of the world economy and the accelerating process of urbanization, cars have brought great convenience to people’s lives and become an indispensable means of transportation. Intelligent vehicles can reduce traffic accidents, improve transportation capacity, and open broad market prospects, and they are expected to lead the future development of the automotive industry, so they have received extensive attention. In existing intelligent vehicle systems, lidar is the protagonist because of its excellent speed and precision, and it is an indispensable part of achieving high-precision positioning; however, its price is a major factor hindering commercialization. Compared with lidar sensors, vision sensors have the advantages of a fast sampling rate, low weight, low energy consumption, and low price, so many domestic and foreign research institutions have made them a research focus. However, current vision-based environment sensing for intelligent vehicles is susceptible to factors such as illumination, climate, and road type, leaving algorithms short of the required accuracy and real-time performance. This paper takes the environment perception of intelligent vehicles as its research object and conducts in-depth research on problems in existing road recognition and obstacle detection algorithms, including vanishing-point detection in road images, road image segmentation, and binocular-vision-based three-dimensional reconstruction of road scenes and obstacle detection.
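The binocular-vision reconstruction the abstract names starts from a disparity map; a minimal sketch with OpenCV's block matcher, assuming a rectified grayscale stereo pair at placeholder paths and illustrative parameter values, is shown below.

```python
import cv2

# Assumes left.png / right.png are a rectified grayscale stereo pair.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# numDisparities must be a multiple of 16 and blockSize odd; both are
# illustrative values, not tuned for any particular camera rig.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)  # 16x-scaled int16 disparities

# Nearby obstacles produce large disparities, so thresholding the map is
# a crude obstacle cue. The cutoff here is a placeholder.
obstacle_mask = disparity > 400
```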
APA, Harvard, Vancouver, ISO, and other styles
39

Garg, Dr Kamaldeep. "Understanding the Purpose of Object Detection, Models to Detect Objects, Application Use and Benefits." International Journal on Future Revolution in Computer Science & Communication Engineering 8, no. 2 (June 30, 2022): 01–04. http://dx.doi.org/10.17762/ijfrcsce.v8i2.2066.

Full text
Abstract:
Object detection is a technology used to identify objects and instances of objects in an image or a video. It is a computer vision technique that determines the instances of objects in an image and identifies their locations. Its advantage is that it gives the accurate location of objects and also helps to label them in the image. Object detection contributes to major applications such as crowd detection at a particular place, self-driving cars, and theft identification through video surveillance. It is also used in face recognition, pedestrian recognition (which improves road safety), image retrieval, and more. Many models are used in the object detection process to simplify it and to give accurate, efficient results. This paper discusses the various approaches used alongside the object detection process to enhance the quality of the results it provides.
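As a concrete, self-contained example of the detection task surveyed here (locating and labeling instances, pedestrian recognition included), the sketch below runs OpenCV's built-in HOG pedestrian detector on a placeholder image; it illustrates the general idea rather than any specific model discussed in the paper.

```python
import cv2

# Classic HOG + linear SVM pedestrian detector shipped with OpenCV.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("street.jpg")  # placeholder input image
# Returns one bounding box (x, y, w, h) per detected pedestrian.
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8))
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```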
APA, Harvard, Vancouver, ISO, and other styles
40

Ullah, Habib, Mohib Ullah, Sultan Daud Khan, and Faouzi Alaya Cheikh. "EVALUATING DEEP SEMI-SUPERVISED LEARNING METHODS FOR COMPUTER VISION APPLICATIONS." Electronic Imaging 2021, no. 6 (January 18, 2021): 313–1. http://dx.doi.org/10.2352/issn.2470-1173.2021.6.iriacv-313.

Full text
Abstract:
Deep semi-supervised learning (SSL) has been investigated intensively in the past few years owing to its broad spectrum of theory, algorithms, and applications. SSL methods are used extensively in computer vision, for example in image classification, human activity recognition, object detection, scene segmentation, and image generation. Despite the significant success achieved in these domains, critically analyzing SSL methods on benchmark datasets still presents important challenges, and very limited reviews and surveys are available in the literature. In this paper, we present a short but focused review of the most significant SSL methods. We analyze the basic theory of SSL and the differences among various SSL methods, then present an experimental analysis comparing these methods on standard datasets. We also provide insight into the challenges facing SSL methods.
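One canonical SSL method of the kind such reviews compare is self-training with pseudo-labels; a compact sketch using scikit-learn's SelfTrainingClassifier on synthetic data follows (the label fraction, base learner, and confidence threshold are illustrative choices, not the paper's setup).

```python
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
y_partial = y.copy()
y_partial[50:] = -1  # scikit-learn convention: -1 marks unlabeled samples

# The base learner must expose predict_proba, hence probability=True.
# Points predicted with confidence above `threshold` get pseudo-labels
# and are folded into the training set on the next self-training round.
base = SVC(probability=True, gamma="auto")
model = SelfTrainingClassifier(base, threshold=0.9).fit(X, y_partial)
print("accuracy with 10% labels:", model.score(X, y))
```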
APA, Harvard, Vancouver, ISO, and other styles
41

RAJARAM, DHIVYA, and KOGILAVANI SIVAKUMAR. "MOVING OBJECTS DETECTION, CLASSIFICATION AND TRACKING OF VIDEO STREAMING BY IMPROVED FEATURE EXTRACTION APPROACH USING K-SVM." DYNA 97, no. 3 (May 1, 2022): 274–80. http://dx.doi.org/10.6036/10304.

Full text
Abstract:
Computer vision plays a vital role in a variety of applications such as traffic surveillance, robotics, and human interaction devices. Video surveillance systems are designed to detect, track, and classify moving objects. Moving object detection, classification, and tracking in streaming video pose various challenges that call for novel approaches. Existing work uses spatiotemporal feature analysis with a sample-consistency algorithm for moving object detection and classification, but it does not perform well on complex scenes, and the binary mask representation of moving objects remains a challenging task for researchers. Video streams are partitioned into frames, shots, and scenes; the proposed work applies a kernel Support Vector Machine (K-SVM) learning technique for moving object detection and tracking, using the MIO-TCD dataset. Feature extraction is the major part of foreground and background analysis in the video stream, which here draws on vehicle-feature-based video data. The SURF (Speeded-Up Robust Features) descriptor is used to recognize and register objects and to classify moving objects, while the optical flow method quantifies the relative motion of objects in the video stream: based on the differences between partitioned frames, the optical flow features track the object by measuring the pixels of the moving objects. The feature extraction process is improved by combining the feature class with the intensity level of the optical flow result, yielding a gradient analysis based on the first-order derivative. The proposed method, implemented in MATLAB 2018a, achieves better recall, precision, and F-measure than the existing work. Keywords: Computer Vision and Pattern Recognition; Kernel-SVM; SURF features; Optical Flow; Texture feature; Moving object detection, tracking and classification.
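The paper works in MATLAB; a rough Python analogue of its two feature streams is sketched below. Dense Farneback optical flow is a standard OpenCV call, but SURF sits in the non-free opencv-contrib build, so ORB stands in for it here; the flow parameters, feature count, and commented-out SVM training are all assumptions.

```python
import cv2
import numpy as np

def motion_magnitude(prev_gray, gray):
    """Dense Farneback optical flow; the magnitude map highlights movers."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    return np.linalg.norm(flow, axis=2)

def appearance_descriptor(gray):
    """Per-frame appearance descriptor. SURF lives in the non-free
    opencv-contrib build, so ORB stands in for it in this sketch."""
    orb = cv2.ORB_create(nfeatures=200)
    _, desc = orb.detectAndCompute(gray, None)
    return desc.mean(axis=0) if desc is not None else np.zeros(32)

# A kernel SVM over concatenated motion + appearance features:
# from sklearn.svm import SVC
# clf = SVC(kernel="rbf").fit(X_train, y_train)
```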
APA, Harvard, Vancouver, ISO, and other styles
42

Chou, Chien-Hsing, Yu-Sheng Su, Che-Ju Hsu, Kong-Chang Lee, and Ping-Hsuan Han. "Design of Desktop Audiovisual Entertainment System with Deep Learning and Haptic Sensations." Symmetry 12, no. 10 (October 19, 2020): 1718. http://dx.doi.org/10.3390/sym12101718.

Full text
Abstract:
In this study, we designed a four-dimensional (4D) audiovisual entertainment system called Sense. This system comprises a scene recognition system and hardware modules that provide haptic sensations for users when they watch movies and animations at home. In the scene recognition system, we used Google Cloud Vision to detect common scene elements in a video, such as fire, explosions, wind, and rain, and further determine whether the scene depicts hot weather, rain, or snow. Additionally, for animated videos, we applied deep learning with a single shot multibox detector to detect whether the animated video contained scenes of fire-related objects. The hardware module was designed to provide six types of haptic sensations set as line-symmetry to provide a better user experience. After the system considers the results of object detection via the scene recognition system, the system generates corresponding haptic sensations. The system integrates deep learning, auditory signals, and haptic sensations to provide an enhanced viewing experience.
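The coupling between recognized scene elements and the hardware modules reduces to a label-to-effect mapping; a toy dispatch sketch is shown below, where the label set, effect names, and the stubbed `send` function are all illustrative, not taken from the Sense system.

```python
# Hypothetical mapping from detected scene labels to haptic effects;
# the label set and effect names are illustrative, not from the paper.
EFFECTS = {"fire": "heat", "explosion": "vibration",
           "wind": "airflow", "rain": "mist", "snow": "cold"}

def trigger_haptics(detected_labels, send):
    """Fire the effect module for every recognized scene element.

    `send` stands in for whatever serial/GPIO call drives the hardware.
    """
    for label in detected_labels:
        effect = EFFECTS.get(label)
        if effect is not None:
            send(effect)

trigger_haptics(["fire", "rain"], send=print)  # prints: heat, mist
```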
APA, Harvard, Vancouver, ISO, and other styles
43

Savchenko, A. V., K. V. Demochkin, and I. S. Grechikhin. "Preference prediction based on a photo gallery analysis with scene recognition and object detection." Pattern Recognition 121 (January 2022): 108248. http://dx.doi.org/10.1016/j.patcog.2021.108248.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Mapurisa, W., and G. Sithole. "IMPROVED EDGE DETECTION FOR SATELLITE IMAGES." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences V-2-2022 (May 17, 2022): 185–92. http://dx.doi.org/10.5194/isprs-annals-v-2-2022-185-2022.

Full text
Abstract:
Edges are a key feature employed in various computer vision applications, namely segmentation, object recognition, feature tracking, and 3D reconstruction. Edges provide key information about object presence, shape, form, and detail, which aids many computer vision tasks. While there are various edge detection techniques in the literature, challenges remain: varying image contrast due to non-uniform scene illumination and imaging resolution affects the edge information obtained from any given image, and results are often characterized by missing edges, edge fragmentation, and false positives. Gradient-based edge detectors, the most commonly used, all suffer from these challenges. In this paper, we present an edge detection framework that aims to recover long, unfragmented edges from satellite images. This is achieved with an edge accumulator that operates over the entire edge detection parameter space. Gradient-based edge detectors rely on thresholding to retrieve salient edges, which usually yields missed or noisy edges; to counter this, the accumulator is run over a wide parameter space, growing edges at each accumulator level while maintaining edge position with a localization filter. The result is longer, unbroken edges for most objects, even in shadowed regions and low-contrast areas. The results show improved edge detection that preserves the form and detail of objects compared to current gradient-based detectors.
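The accumulator over the detection parameter space can be pictured as voting across a sweep of detector thresholds; a minimal sketch with OpenCV's Canny detector follows, where the sweep range, hysteresis ratio, and vote cutoff are assumed values rather than the paper's framework.

```python
import cv2
import numpy as np

def accumulated_edges(gray, lows=range(20, 121, 20), ratio=3, min_votes=3):
    """Vote-based edge map: run Canny across a threshold sweep and keep
    pixels that respond at `min_votes` or more parameter settings.
    Stable edges survive the sweep; threshold-dependent noise does not."""
    acc = np.zeros(gray.shape, dtype=np.uint16)
    for low in lows:
        edges = cv2.Canny(gray, low, low * ratio)
        acc += (edges > 0).astype(np.uint16)
    return (acc >= min_votes).astype(np.uint8) * 255
```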
APA, Harvard, Vancouver, ISO, and other styles
45

Hassanin, Mohammed, Salman Khan, and Murat Tahtali. "Visual Affordance and Function Understanding." ACM Computing Surveys 54, no. 3 (June 2021): 1–35. http://dx.doi.org/10.1145/3446370.

Full text
Abstract:
Nowadays, robots are dominating the manufacturing, entertainment, and healthcare industries. Robot vision aims to equip robots with the capabilities to discover information, understand it, and interact with the environment, which requires an agent to effectively understand object affordances and functions in complex visual domains. This literature survey first focuses on “visual affordances”, summarizing current state-of-the-art approaches to the relevant problems as well as open problems and research gaps. It then discusses specific sub-problems such as affordance detection, categorization, segmentation, and high-level affordance reasoning, and covers functional scene understanding together with the descriptors prevalent in the literature. The survey also provides the necessary background to the problem, sheds light on its significance, and highlights the existing challenges of affordance and functionality learning.
APA, Harvard, Vancouver, ISO, and other styles
46

Ma, Xin, Yuzhao Zhang, Weiwei Zhang, Hongbo Zhou, and Haoran Yu. "SDWBF Algorithm: A Novel Pedestrian Detection Algorithm in the Aerial Scene." Drones 6, no. 3 (March 14, 2022): 76. http://dx.doi.org/10.3390/drones6030076.

Full text
Abstract:
Due to the large amount of video data from UAV aerial photography and the small target size from the aerial perspective, pedestrian detection in drone videos remains a challenge. To detect objects in UAV images quickly and accurately, a small-sized pedestrian detection algorithm based on the weighted fusion of static and dynamic bounding boxes is proposed. First, a weighted filtering algorithm for redundant frames, cascading an inter-frame pixel difference measure with structural similarity, addresses the redundancy of UAV video data and thereby reduces delay. Second, the pre-training and detector learning datasets were scale-matched to address the loss of feature representation caused by the scale mismatch between datasets. Finally, the static bounding boxes extracted by YOLOv4 and the motion bounding boxes extracted by LiteFlowNet are combined by the weighted fusion algorithm to enhance semantic information and resolve missed and duplicate detections in UAV object detection. The experimental results showed that the proposed small object recognition method reached an mAP of 70.91% and an IoU of 57.53%, which were 3.51% and 2.05% higher, respectively, than the mainstream target detection algorithm.
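The redundant-frame filtering stage can be sketched with the structural-similarity index from scikit-image; the similarity cutoff below is an assumed value, and the sketch omits the pixel-difference cascade the paper pairs it with.

```python
from skimage.metrics import structural_similarity as ssim

def filter_redundant_frames(frames, threshold=0.95):
    """Keep a frame only if it differs enough from the last kept frame.

    `frames` is an iterable of grayscale arrays; `threshold` is a guessed
    SSIM cutoff above which two frames count as redundant.
    """
    kept = []
    for frame in frames:
        if not kept or ssim(kept[-1], frame) < threshold:
            kept.append(frame)
    return kept
```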
APA, Harvard, Vancouver, ISO, and other styles
47

Khan, Altaf, Alexander Chefranov, and Hasan Demirel. "Image-Level Structure Recognition Using Image Features, Templates, and Ensemble of Classifiers." Symmetry 12, no. 7 (June 30, 2020): 1072. http://dx.doi.org/10.3390/sym12071072.

Full text
Abstract:
Image-level structural recognition is an important problem for many applications of computer vision, such as autonomous vehicle control, scene understanding, and 3D TV. A novel method is proposed that uses image features extracted by exploiting predefined templates, each associated with an individual classifier. The template, reflecting a symmetric structure consisting of a number of components, represents a stage: a rough structure of the image geometry. The following image features are used: histogram of oriented gradients (HOG) features showing the overall object shape; colors representing scene information; Weibull distribution parameters, reflecting relations between image statistics and scene structure; and local binary pattern (LBP) and entropy (E) values representing texture and scene depth information. Each individual classifier learns a discriminative model, and their outcomes are fused with the sum rule to recognize the global structure of an image. The proposed method achieves 86.25% recognition accuracy on the stage dataset and a 92.58% recognition rate on the 15-scene dataset, both significantly higher than other state-of-the-art methods.
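Sum-rule fusion of per-feature classifiers amounts to summing their posterior estimates and taking the argmax; a compact sketch with scikit-learn probability outputs follows, where the classifier choices and commented-out data names are placeholders rather than the paper's exact models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def sum_rule_predict(classifiers, feature_sets):
    """Fuse one classifier per feature type by summing class posteriors.

    `classifiers[i]` must already be fitted on the training split of
    `feature_sets[i]`; each sees a different descriptor (HOG, color,
    LBP, ...) extracted from the same images.
    """
    probs = sum(clf.predict_proba(X)
                for clf, X in zip(classifiers, feature_sets))
    return np.argmax(probs, axis=1)

# e.g. clf_hog = SVC(probability=True).fit(X_hog_train, y_train)
#      clf_lbp = LogisticRegression().fit(X_lbp_train, y_train)
#      labels = sum_rule_predict([clf_hog, clf_lbp], [X_hog_test, X_lbp_test])
```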
APA, Harvard, Vancouver, ISO, and other styles
48

Barz, Michael, and Daniel Sonntag. "Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze." Sensors 21, no. 12 (June 16, 2021): 4143. http://dx.doi.org/10.3390/s21124143.

Full text
Abstract:
Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because it is a tedious manual annotation task, the automatic detection and annotation of visual attention to AOIs can accelerate and objectify eye tracking research, in particular for mobile eye tracking with egocentric video feeds. In this work, we implement two methods to automatically detect visual attention to AOIs using pre-trained deep learning models for image classification and object detection. Furthermore, we develop an evaluation framework based on the VISUS dataset and well-known performance metrics from the field of activity recognition. We systematically evaluate our methods within this framework, discuss potentials and limitations, and propose ways to improve the performance of future automatic visual attention detection methods.
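Mapping gaze to automatically detected AOIs ultimately reduces to a point-in-box test per video frame; a minimal sketch is given below, where the `(label, x1, y1, x2, y2)` box format is an assumption about a typical detector's output rather than the authors' implementation.

```python
def attended_labels(gaze_xy, detections):
    """Return the labels of all detected AOIs containing the gaze point.

    `detections` is assumed to be a list of (label, x1, y1, x2, y2)
    boxes from any object detector; `gaze_xy` is in the same pixel frame.
    """
    gx, gy = gaze_xy
    return [label for label, x1, y1, x2, y2 in detections
            if x1 <= gx <= x2 and y1 <= gy <= y2]

print(attended_labels((120, 80), [("cup", 100, 50, 160, 120),
                                  ("laptop", 300, 40, 620, 400)]))  # ['cup']
```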
APA, Harvard, Vancouver, ISO, and other styles
49

Srividhya, S. R., C. Kavitha, Wen-Cheng Lai, Vinodhini Mani, and Osamah Ibrahim Khalaf. "A Machine Learning Algorithm to Automate Vehicle Classification and License Plate Detection." Wireless Communications and Mobile Computing 2022 (June 6, 2022): 1–12. http://dx.doi.org/10.1155/2022/9273233.

Full text
Abstract:
In the field of intelligent transportation systems (ITS), video surveillance is a hot research topic, used in a variety of applications such as determining the cause of an accident, tracking down a specific vehicle, and discovering routes between major locations. Object detection and shadow elimination are the main tasks in this area. Object detection is a critical part of object and scene recognition in computer vision, with vast applications in surveillance and artificial intelligence, and video surveillance raises further challenges, including text recognition. Based on shadow elevation, we present an inner-outer outline profile (IOOPL) algorithm for detecting three levels of object boundaries, which can be incorporated into traffic video surveillance. It is essential in ITS to identify the type of a detected object so that it can be tracked reliably and traffic parameters can be estimated correctly. This work addresses the problem of object shadows being mistaken for part of the object itself in vehicle image segmentation: vehicles are detected and segmented, and their shadow counterparts eliminated, using the delta learning algorithm (the Widrow-Hoff learning rule), with the system trained on various types of vehicles according to their appearance, colors, and build types. We further propose classifying vehicles with artificial neural networks trained by the same high-performance delta learning algorithm to obtain information regarding their travels. The paper also presents a method for recognizing number plates using text correlation and edge dilation techniques; in video text recognition, number plate recognition is a challenging task.
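The Widrow-Hoff (delta) rule the paper trains with updates each weight in proportion to the prediction error; a bare numpy sketch of one training epoch for a linear unit follows, with the learning rate and synthetic data shapes chosen purely for illustration.

```python
import numpy as np

def delta_rule_epoch(X, y, w, lr=0.01):
    """One Widrow-Hoff epoch: w <- w + lr * (target - output) * x."""
    for x_i, t in zip(X, y):
        output = np.dot(w, x_i)
        w = w + lr * (t - output) * x_i
    return w

# X: (n_samples, n_features) vehicle feature vectors; y: numeric targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, 100).astype(float)
w = np.zeros(8)
for _ in range(20):
    w = delta_rule_epoch(X, y, w)
```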
APA, Harvard, Vancouver, ISO, and other styles
50

Shilpa, Mohan Kumar, et al. "An Effective Framework Using Region Merging and Learning Machine for Shadow Detection and Removal." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 12, no. 2 (April 10, 2021): 2506–14. http://dx.doi.org/10.17762/turcomat.v12i2.2098.

Full text
Abstract:
Moving cast shadows significantly degrade the performance of many high-level computer vision applications such as object tracking, object classification, behavior recognition, and scene interpretation. Because shadows share motion characteristics with the objects that cast them, moving cast shadow detection remains challenging. In this paper, the foreground is detected by background subtraction, and the shadow is detected by a combination of mean-shift and region-merging segmentation. Using the Gabor method, we obtain the moving targets with their texture features; according to the characteristics of shadows in HSV space and the texture features, the shadow is detected and removed to eliminate shadow interference in the subsequent processing of moving targets. Finally, to guarantee the integrity of shadows and objects for further image processing, a simple post-processing procedure refines the results, which also drastically improves the accuracy of moving shadow detection. Extensive experiments on common public datasets show that the performance of the proposed framework is superior to representative state-of-the-art methods.
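The HSV heuristic such methods build on says a shadowed pixel keeps roughly the background's hue and saturation while its brightness drops; a minimal sketch of that test is below, where all four threshold values are assumed for illustration, not the paper's tuned parameters.

```python
import cv2
import numpy as np

def shadow_mask(frame_bgr, bg_bgr, v_lo=0.4, v_hi=0.9, h_tol=10, s_tol=40):
    """Flag pixels whose value (V) dims against the background while hue
    and saturation stay close -- the classic HSV cast-shadow test. The
    hue comparison ignores wrap-around for simplicity."""
    f = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    b = cv2.cvtColor(bg_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    ratio = f[..., 2] / np.maximum(b[..., 2], 1.0)
    return ((ratio > v_lo) & (ratio < v_hi)
            & (np.abs(f[..., 0] - b[..., 0]) < h_tol)
            & (np.abs(f[..., 1] - b[..., 1]) < s_tol))
```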
APA, Harvard, Vancouver, ISO, and other styles