Journal articles on the topic "Multimodal object detection"

To see other types of publications on this topic, follow the link: Multimodal object detection.

Format your source according to APA, MLA, Chicago, Harvard, and other citation styles

Consult the top 50 journal articles for your research on the topic "Multimodal object detection."

Next to every entry in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf file and read its abstract online, when these are available in the metadata.

Browse journal articles on a wide range of disciplines and compile your bibliography correctly.

1

Yang, Dongfang, Xing Liu, Hao He, and Yongfei Li. "Air-to-ground multimodal object detection algorithm based on feature association learning." International Journal of Advanced Robotic Systems 16, no. 3 (May 1, 2019): 172988141984299. http://dx.doi.org/10.1177/1729881419842995.

Full text of the source
Abstract:
Detecting objects on unmanned aerial vehicles is a hard task, due to the long visual distance and the subsequent small size and lack of view. Besides, the traditional ground observation manners based on visible light camera are sensitive to brightness. This article aims to improve the target detection accuracy in various weather conditions, by using both visible light camera and infrared camera simultaneously. In this article, an association network of multimodal feature maps on the same scene is used to design an object detection algorithm, which is the so-called feature association learning method. In addition, this article collects a new cross-modal detection data set and proposes a cross-modal object detection algorithm based on visible light and infrared observations. The experimental results show that the algorithm improves the detection accuracy of small objects in the air-to-ground view. The multimodal joint detection network can overcome the influence of illumination in different weather conditions, which provides a new detection means and ideas for the space-based unmanned platform to the small object detection task.
APA, Harvard, Vancouver, ISO, and other styles
2

Kim, Jinsoo, and Jeongho Cho. "Exploring a Multimodal Mixture-Of-YOLOs Framework for Advanced Real-Time Object Detection." Applied Sciences 10, no. 2 (January 15, 2020): 612. http://dx.doi.org/10.3390/app10020612.

Full text of the source
Abstract:
To construct a safe and sound autonomous driving system, object detection is essential, and research on fusion of sensors is being actively conducted to increase the detection rate of objects in a dynamic environment in which safety must be secured. Recently, considerable performance improvements in object detection have been achieved with the advent of the convolutional neural network (CNN) structure. In particular, the YOLO (You Only Look Once) architecture, which is suitable for real-time object detection by simultaneously predicting and classifying bounding boxes of objects, is receiving great attention. However, securing the robustness of object detection systems in various environments still remains a challenge. In this paper, we propose a weighted mean-based adaptive object detection strategy that enhances detection performance through convergence of individual object detection results based on an RGB camera and a LiDAR (Light Detection and Ranging) for autonomous driving. The proposed system utilizes the YOLO framework to perform object detection independently based on image data and point cloud data (PCD). Each detection result is united to reduce the number of objects not detected at the decision level by the weighted mean scheme. To evaluate the performance of the proposed object detection system, tests on vehicles and pedestrians were carried out using the KITTI Benchmark Suite. Test results demonstrated that the proposed strategy can achieve detection performance with a higher mean average precision (mAP) for targeted objects than an RGB camera and is also robust against external environmental changes.
APA, Harvard, Vancouver, ISO, and other styles
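
The decision-level fusion by a weighted mean that this abstract describes can be illustrated with a short sketch. This is not the authors' implementation: the box format, the IoU matching threshold, and the way scores are combined are assumptions made for illustration.

```python
# Decision-level fusion of two detectors' outputs by a confidence-weighted mean.
# A minimal sketch of the general idea, not the cited paper's exact scheme.
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse_detections(dets_rgb, dets_lidar, iou_thr=0.5):
    """Each detection is (box[4], score). Matched pairs are merged by a
    score-weighted mean of the boxes; unmatched detections from either sensor
    are kept, which is what reduces missed objects at the decision level."""
    fused, used = [], set()
    for box_r, s_r in dets_rgb:
        best_j, best_iou = -1, iou_thr
        for j, (box_l, s_l) in enumerate(dets_lidar):
            if j not in used and iou(box_r, box_l) >= best_iou:
                best_j, best_iou = j, iou(box_r, box_l)
        if best_j >= 0:
            box_l, s_l = dets_lidar[best_j]
            w = np.array([s_r, s_l]) / (s_r + s_l)
            fused.append((w[0] * np.asarray(box_r, float) + w[1] * np.asarray(box_l, float),
                          max(s_r, s_l)))
            used.add(best_j)
        else:
            fused.append((np.asarray(box_r, dtype=float), s_r))
    fused += [(np.asarray(b, dtype=float), s)
              for j, (b, s) in enumerate(dets_lidar) if j not in used]
    return fused
```
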
3

Xiao, Shouguan, and Weiping Fu. "Visual Relationship Detection with Multimodal Fusion and Reasoning." Sensors 22, no. 20 (October 18, 2022): 7918. http://dx.doi.org/10.3390/s22207918.

Full text of the source
Abstract:
Visual relationship detection aims to completely understand visual scenes and has recently received increasing attention. However, current methods only use the visual features of images to train the semantic network, which does not match human habits in which we know obvious features of scenes and infer covert states using common sense. Therefore, these methods cannot predict some hidden relationships of object-pairs from complex scenes. To address this problem, we propose unifying vision–language fusion and knowledge graph reasoning to combine visual feature embedding with external common sense knowledge to determine the visual relationships of objects. In addition, before training the relationship detection network, we devise an object–pair proposal module to solve the combination explosion problem. Extensive experiments show that our proposed method outperforms the state-of-the-art methods on the Visual Genome and Visual Relationship Detection datasets.
APA, Harvard, Vancouver, ISO, and other styles
4

Hong, Bowei, Yuandong Zhou, Huacheng Qin, Zhiqiang Wei, Hao Liu, and Yongquan Yang. "Few-Shot Object Detection Using Multimodal Sensor Systems of Unmanned Surface Vehicles." Sensors 22, no. 4 (February 15, 2022): 1511. http://dx.doi.org/10.3390/s22041511.

Full text of the source
Abstract:
The object detection algorithm is a key component for the autonomous operation of unmanned surface vehicles (USVs). However, owing to complex marine conditions, it is difficult to obtain large-scale, fully labeled surface object datasets. Shipborne sensors are often susceptible to external interference and have unsatisfying performance, compromising the results of traditional object detection tasks. In this paper, a few-shot surface object detection method is proposed based on multimodal sensor systems for USVs. The multi-modal sensors were used for three-dimensional object detection, and the ability of USVs to detect moving objects was enhanced, realizing metric learning-based few-shot object detection for USVs. Compared with conventional methods, the proposed method enhanced the classification results of few-shot tasks. The proposed approach achieves relatively better performance in three sampled sets of well-known datasets, i.e., 2%, 10%, 5% on average precision (AP) and 28%, 24%, 24% on average orientation similarity (AOS). Therefore, this study can be potentially used for various applications where the number of labeled data is not enough to acquire a compromising result.
APA, Harvard, Vancouver, ISO, and other styles
5

Lin, Che-Tsung, Yen-Yi Wu, Po-Hao Hsu, and Shang-Hong Lai. "Multimodal Structure-Consistent Image-to-Image Translation." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11490–98. http://dx.doi.org/10.1609/aaai.v34i07.6814.

Full text of the source
Abstract:
Unpaired image-to-image translation is proven quite effective in boosting a CNN-based object detector for a different domain by means of data augmentation that can well preserve the image-objects in the translated images. Recently, multimodal GAN (Generative Adversarial Network) models have been proposed and were expected to further boost the detector accuracy by generating a diverse collection of images in the target domain, given only a single/labelled image in the source domain. However, images generated by multimodal GANs would achieve even worse detection accuracy than the ones by a unimodal GAN with better object preservation. In this work, we introduce cycle-structure consistency for generating diverse and structure-preserved translated images across complex domains, such as between day and night, for object detector training. Qualitative results show that our model, Multimodal AugGAN, can generate diverse and realistic images for the target domain. For quantitative comparisons, we evaluate other competing methods and ours by using the generated images to train YOLO, Faster R-CNN and FCN models and prove that our model achieves significant improvement and outperforms other methods on the detection accuracies and the FCN scores. Also, we demonstrate that our model could provide more diverse object appearances in the target domain through comparison on the perceptual distance metric.
APA, Harvard, Vancouver, ISO, and other styles
6

Zhang, Liwei, Jiahong Lai, Zenghui Zhang, Zhen Deng, Bingwei He, and Yucheng He. "Multimodal Multiobject Tracking by Fusing Deep Appearance Features and Motion Information." Complexity 2020 (September 25, 2020): 1–10. http://dx.doi.org/10.1155/2020/8810340.

Full text of the source
Abstract:
Multiobject Tracking (MOT) is one of the most important abilities of autonomous driving systems. However, most of the existing MOT methods only use a single sensor, such as a camera, which has the problem of insufficient reliability. In this paper, we propose a novel Multiobject Tracking method by fusing deep appearance features and motion information of objects. In this method, the locations of objects are first determined based on a 2D object detector and a 3D object detector. We use the Nonmaximum Suppression (NMS) algorithm to combine the detection results of the two detectors to ensure the detection accuracy in complex scenes. After that, we use Convolutional Neural Network (CNN) to learn the deep appearance features of objects and employ Kalman Filter to obtain the motion information of objects. Finally, the MOT task is achieved by associating the motion information and deep appearance features. A successful match indicates that the object was tracked successfully. A set of experiments on the KITTI Tracking Benchmark shows that the proposed MOT method can effectively perform the MOT task. The Multiobject Tracking Accuracy (MOTA) is up to 76.40% and the Multiobject Tracking Precision (MOTP) is up to 83.50%.
APA, Harvard, Vancouver, ISO, and other styles
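
As a sketch of the association step described above (matching new detections to existing tracks by combining appearance and motion cues), the snippet below mixes a cosine distance between CNN embeddings with a simple centre-distance motion cost and solves the assignment with the Hungarian algorithm. The cost weights, the gating threshold, and the centre-distance stand-in for a Kalman-filter gate are assumptions, not the cited method.

```python
# Minimal sketch: associate detections with tracks by combining an appearance
# cost (cosine distance of CNN embeddings) with a motion cost (normalized
# distance between box centres, a crude stand-in for a Kalman-filter gate).
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_dist(a, b):
    a = a / (np.linalg.norm(a) + 1e-9)
    b = b / (np.linalg.norm(b) + 1e-9)
    return 1.0 - float(a @ b)

def motion_cost(track_box, det_box):
    centre = lambda b: np.array([(b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0])
    diag = np.hypot(track_box[2] - track_box[0], track_box[3] - track_box[1])
    return min(np.linalg.norm(centre(track_box) - centre(det_box)) / (diag + 1e-9), 1.0)

def associate(track_feats, track_boxes, det_feats, det_boxes,
              w_app=0.6, w_mot=0.4, max_cost=0.7):
    """Returns (track_index, detection_index) pairs for accepted matches."""
    cost = np.zeros((len(track_boxes), len(det_boxes)))
    for i in range(len(track_boxes)):
        for j in range(len(det_boxes)):
            cost[i, j] = (w_app * cosine_dist(track_feats[i], det_feats[j])
                          + w_mot * motion_cost(track_boxes[i], det_boxes[j]))
    rows, cols = linear_sum_assignment(cost)      # optimal one-to-one assignment
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]
```
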
7

Gao, Yueqing, Huachun Zhou, Lulu Chen, Yuting Shen, Ce Guo, and Xinyu Zhang. "Cross-Modal Object Detection Based on a Knowledge Update." Sensors 22, no. 4 (February 10, 2022): 1338. http://dx.doi.org/10.3390/s22041338.

Full text of the source
Abstract:
As an important field of computer vision, object detection has been studied extensively in recent years. However, existing object detection methods merely utilize the visual information of the image and fail to mine the high-level semantic information of the object, which leads to great limitations. To take full advantage of multi-source information, a knowledge update-based multimodal object recognition model is proposed in this paper. Specifically, our method initially uses Faster R-CNN to regionalize the image, then applies a transformer-based multimodal encoder to encode visual region features (region-based image features) and textual features (semantic relationships between words) corresponding to pictures. After that, a graph convolutional network (GCN) inference module is introduced to establish a relational network in which the points denote visual and textual region features, and the edges represent their relationships. In addition, based on an external knowledge base, our method further enhances the region-based relationship expression capability through a knowledge update module. In summary, the proposed algorithm not only learns the accurate relationship between objects in different regions of the image, but also benefits from the knowledge update through an external relational database. Experimental results verify the effectiveness of the proposed knowledge update module and the independent reasoning ability of our model.
APA, Harvard, Vancouver, ISO, and other styles
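
The GCN inference module mentioned above propagates information over a graph whose nodes are visual and textual region features. As a generic illustration only (layer sizes, normalization choice, and the random example graph are assumptions), one graph-convolution step looks like this:

```python
# One graph-convolution step over a relation graph whose nodes are visual and
# textual region features. Generic illustration of a GCN layer, not the cited
# model's architecture.
import numpy as np

def gcn_layer(H, A, W):
    """H: node features (N, d_in); A: adjacency (N, N); W: weights (d_in, d_out).
    Applies symmetric normalization with self-loops, then ReLU."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    H_next = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)                 # ReLU

# Example: 5 region nodes (e.g., 3 visual + 2 textual), 16-d features, 8-d output.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 16))
A = (rng.random((5, 5)) > 0.5).astype(float)
A = np.maximum(A, A.T)                             # make the graph undirected
print(gcn_layer(H, A, rng.normal(size=(16, 8))).shape)   # (5, 8)
```
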
8

Kniaz, V. V., and P. Moshkantseva. "OBJECT RE-IDENTIFICATION USING MULTIMODAL AERIAL IMAGERY AND CONDITIONAL ADVERSARIAL NETWORKS." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIV-2/W1-2021 (April 15, 2021): 131–36. http://dx.doi.org/10.5194/isprs-archives-xliv-2-w1-2021-131-2021.

Full text of the source
Abstract:
Abstract. Object Re-Identification (ReID) is the task of matching a given object in the new environment with its image captured in a different environment. The input for a ReID method includes two sets of images. The probe set includes one or more images of the object that must be identified in the new environment. The gallery set includes images that may contain the object from the probe image. The ReID task’s complexity arises from the differences in the object appearance in the probe and gallery sets. Such difference may originate from changes in illumination or viewpoint locations for multiple cameras that capture images in the probe and gallery sets. This paper focuses on developing a deep learning ThermalReID framework for cross-modality object ReID in thermal images. Our framework aims to provide continuous object detection and re-identification while monitoring a region from a UAV. Given an input probe image captured in the visible range, our ThermalReID framework detects objects in a thermal image and performs the ReID. We evaluate our ThermalReID framework and modern baselines using various metrics. We use the IoU and mAP metrics for the object detection task. We use the cumulative matching characteristic (CMC) curves and normalized area-under-curve (nAUC) for the ReID task. The evaluation demonstrated encouraging results and proved that our ThermalReID framework outperforms existing baselines in the ReID accuracy. Furthermore, we demonstrated that the fusion of the semantic data with the input thermal gallery image increases the object detection and localization scores. We developed the ThermalReID framework for cross-modality object re-identification. We evaluated our framework and two modern baselines on the task of object ReID for four object classes. Our framework successfully performs object ReID in the thermal gallery image from the color probe image. The evaluation using real and synthetic data demonstrated that our ThermalReID framework increases the ReID accuracy compared to modern ReID baselines.
APA, Harvard, Vancouver, ISO, and other styles
9

Muresan, Mircea Paul, Ion Giosan, and Sergiu Nedevschi. "Stabilization and Validation of 3D Object Position Using Multimodal Sensor Fusion and Semantic Segmentation." Sensors 20, no. 4 (February 18, 2020): 1110. http://dx.doi.org/10.3390/s20041110.

Full text of the source
Abstract:
The stabilization and validation process of the measured position of objects is an important step for high-level perception functions and for the correct processing of sensory data. The goal of this process is to detect and handle inconsistencies between different sensor measurements, which result from the perception system. The aggregation of the detections from different sensors consists in the combination of the sensorial data in one common reference frame for each identified object, leading to the creation of a super-sensor. The result of the data aggregation may end up with errors such as false detections, misplaced object cuboids or an incorrect number of objects in the scene. The stabilization and validation process is focused on mitigating these problems. The current paper proposes four contributions for solving the stabilization and validation task, for autonomous vehicles, using the following sensors: trifocal camera, fisheye camera, long-range RADAR (Radio detection and ranging), and 4-layer and 16-layer LIDARs (Light Detection and Ranging). We propose two original data association methods used in the sensor fusion and tracking processes. The first data association algorithm is created for tracking LIDAR objects and combines multiple appearance and motion features in order to exploit the available information for road objects. The second novel data association algorithm is designed for trifocal camera objects and has the objective of finding measurement correspondences to sensor fused objects such that the super-sensor data are enriched by adding the semantic class information. The implemented trifocal object association solution uses a novel polar association scheme combined with a decision tree to find the best hypothesis–measurement correlations. Another contribution we propose for stabilizing object position and unpredictable behavior of road objects, provided by multiple types of complementary sensors, is the use of a fusion approach based on the Unscented Kalman Filter and a single-layer perceptron. The last novel contribution is related to the validation of the 3D object position, which is solved using a fuzzy logic technique combined with a semantic segmentation image. The proposed algorithms have a real-time performance, achieving a cumulative running time of 90 ms, and have been evaluated using ground truth data extracted from a high-precision GPS (global positioning system) with 2 cm accuracy, obtaining an average error of 0.8 m.
APA, Harvard, Vancouver, ISO, and other styles
10

Essen, Helmut, Wolfgang Koch, Sebastian Hantscher, Rüdiger Zimmermann, Paul Warok, Martin Schröder, Marek Schikora, and Goert Luedtke. "A multimodal sensor system for runway debris detection." International Journal of Microwave and Wireless Technologies 4, no. 2 (April 2012): 155–62. http://dx.doi.org/10.1017/s1759078712000116.

Full text of the source
Abstract:
For foreign object detection on runways, highly sensitive radar sensors give the opportunity to detect even very small objects, metallic and non-metallic, also under adverse weather conditions. As it is desirable for airport applications to install only small but robust installations along the traffic areas, millimeter-wave radars offer the advantage of small antenna apertures and miniaturized system hardware. A 220-GHz radar was developed, which is capable to serve this application, if several of these are netted to cover the whole traffic area. Although under fortunate conditions the radar allows a classification or even an identification of the debris, the complete system design incorporates 3-D time-of-flight cameras for assistance in the identification process, which are also distributed along the traffic areas. The system approach further relies upon a change detection algorithm on the netted information to discriminate non-stationary alarms and reduce the false alarm ratio.
APA, Harvard, Vancouver, ISO, and other styles
11

Ezzy, Haitham, Motti Charter, Antonello Bonfante, and Anna Brook. "How the Small Object Detection via Machine Learning and UAS-Based Remote-Sensing Imagery Can Support the Achievement of SDG2: A Case Study of Vole Burrows." Remote Sensing 13, no. 16 (August 12, 2021): 3191. http://dx.doi.org/10.3390/rs13163191.

Full text of the source
Abstract:
Small mammals, and particularly rodents, are common inhabitants of farmlands, where they play key roles in the ecosystem, but when overabundant, they can be major pests, able to reduce crop production and farmers’ incomes, with tangible effects on the achievement of Sustainable Development Goals no 2 (SDG2, Zero Hunger) of the United Nations. Farmers do not currently have a standardized, accurate method of detecting the presence, abundance, and locations of rodents in their fields, and hence do not have environmentally efficient methods of rodent control able to promote sustainable agriculture oriented to reduce the environmental impacts of cultivation. New developments in unmanned aerial system (UAS) platforms and sensor technology facilitate cost-effective data collection through simultaneous multimodal data collection approaches at very high spatial resolutions in environmental and agricultural contexts. Object detection from remote-sensing images has been an active research topic over the last decade. With recent increases in computational resources and data availability, deep learning-based object detection methods are beginning to play an important role in advancing remote-sensing commercial and scientific applications. However, the performance of current detectors on various UAS-based datasets, including multimodal spatial and physical datasets, remains limited in terms of small object detection. In particular, the ability to quickly detect small objects from a large observed scene (at field scale) is still an open question. In this paper, we compare the efficiencies of applying one- and two-stage detector models to a single UAS-based image and a processed (via Pix4D mapper photogrammetric program) UAS-based orthophoto product to detect rodent burrows, for agriculture/environmental applications as to support farmer activities in the achievements of SDG2. Our results indicate that the use of multimodal data from low-cost UASs within a self-training YOLOv3 model can provide relatively accurate and robust detection for small objects (mAP of 0.86 and an F1-score of 93.39%), and can deliver valuable insights for field management with high spatial precision able to reduce the environmental costs of crop production in the direction of precision agriculture management.
APA, Harvard, Vancouver, ISO, and other styles
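
Detecting very small objects such as burrows across a field-scale orthophoto usually requires slicing the large image into overlapping tiles before running a detector. The sketch below shows only that tiling step, which is one practical reading of the "small objects in a large observed scene" problem raised above; the tile size, overlap, and the `detect` callable are placeholders, not parameters from the cited study.

```python
# Sketch: slice a large orthophoto into overlapping tiles so a detector such as
# YOLOv3 can see small objects at a usable scale.
import numpy as np

def tile_image(image, tile=1024, overlap=128):
    """Yield (x0, y0, patch) for overlapping square tiles of an HxWxC array."""
    h, w = image.shape[:2]
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            yield x0, y0, image[y0:y0 + tile, x0:x0 + tile]

def detect_on_tiles(image, detect, tile=1024, overlap=128):
    """Run `detect(patch) -> [(x1, y1, x2, y2, score), ...]` on each tile and
    shift the boxes back into full-image coordinates."""
    boxes = []
    for x0, y0, patch in tile_image(image, tile, overlap):
        for x1, y1, x2, y2, s in detect(patch):
            boxes.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0, s))
    return boxes  # in practice, follow with NMS to merge duplicates in overlaps
```
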
12

Yoon, Sungan, and Jeongho Cho. "Deep Multimodal Detection in Reduced Visibility Using Thermal Depth Estimation for Autonomous Driving." Sensors 22, no. 14 (July 6, 2022): 5084. http://dx.doi.org/10.3390/s22145084.

Full text of the source
Abstract:
Recently, the rapid development of convolutional neural networks (CNN) has consistently improved object detection performance using CNN and has naturally been implemented in autonomous driving due to its operational potential in real-time. Detecting moving targets to realize autonomous driving is an essential task for the safety of drivers and pedestrians, and CNN-based moving target detectors have shown stable performance in fair weather. However, there is a considerable drop in detection performance during poor weather conditions like hazy or foggy situations due to particles in the atmosphere. To ensure stable moving object detection, an image restoration process with haze removal must be accompanied. Therefore, this paper proposes an image dehazing network that estimates the current weather conditions and removes haze using the haze level to improve the detection performance under poor weather conditions due to haze and low visibility. Combined with the thermal image, the restored image is assigned to the two You Only Look Once (YOLO) object detectors, respectively, which detect moving targets independently and improve object detection performance using late fusion. The proposed model showed improved dehazing performance compared with the existing image dehazing models and has proved that images taken under foggy conditions, the poorest weather for autonomous driving, can be restored to normal images. Through the fusion of the RGB image restored by the proposed image dehazing network with thermal images, the proposed model improved the detection accuracy by up to 22% or above in a dense haze environment like fog compared with models using existing image dehazing techniques.
APA, Harvard, Vancouver, ISO, and other styles
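
The haze-level estimation above is learned by a network. As a purely classical stand-in, a scalar haze level can also be read off the dark channel prior (He et al.); the sketch below is only a rough illustration of how "haze level" can be quantified, not the cited dehazing network.

```python
# Classical stand-in for haze-level estimation: the dark channel prior.
import numpy as np
from scipy.ndimage import minimum_filter

def haze_level(rgb, patch=15):
    """rgb: float array in [0, 1], shape (H, W, 3). A hazier image has a
    brighter dark channel, so the mean dark-channel value serves as a rough
    haze score in [0, 1]."""
    dark = minimum_filter(rgb.min(axis=2), size=patch)  # per-pixel min over channels, then local min
    return float(dark.mean())
```
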
13

Yu, Yan, Siyu Zou, and Kejie Yin. "A novel detection fusion network for solid waste sorting." International Journal of Advanced Robotic Systems 17, no. 5 (September 1, 2020): 172988142094177. http://dx.doi.org/10.1177/1729881420941779.

Full text of the source
Abstract:
Vision-based object detection technology plays a very important role in the field of computer vision. It is widely used in many machine vision applications. However, in the specific application scenarios, like a solid waste sorting system, it is very difficult to obtain good accuracy due to the color information of objects that is badly damaged. In this work, we propose a novel multimodal convolutional neural network method for RGB-D solid waste object detection. The depth information is introduced as the new modal to improve the object detection performance. Our method fuses two individual features in multiple scales, which forms an end-to-end network. We evaluate our method on the self-constructed solid waste data set. In comparison with single modal detection and other popular cross modal fusion neural networks, our method achieves remarkable results with high validity, reliability, and real-time detection speed.
APA, Harvard, Vancouver, ISO, and other styles
14

Dupty, Mohammed Haroon, Zhen Zhang, and Wee Sun Lee. "Visual Relationship Detection with Low Rank Non-Negative Tensor Decomposition." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 10737–44. http://dx.doi.org/10.1609/aaai.v34i07.6702.

Full text of the source
Abstract:
We address the problem of Visual Relationship Detection (VRD) which aims to describe the relationships between pairs of objects in the form of triplets of (subject, predicate, object). We observe that given a pair of bounding box proposals, objects often participate in multiple relations implying the distribution of triplets is multimodal. We leverage the strong correlations within triplets to learn the joint distribution of triplet variables conditioned on the image and the bounding box proposals, doing away with the hitherto used independent distribution of triplets. To make learning the triplet joint distribution feasible, we introduce a novel technique of learning conditional triplet distributions in the form of their normalized low rank non-negative tensor decompositions. Normalized tensor decompositions take form of mixture distributions of discrete variables and thus are able to capture multimodality. This allows us to efficiently learn higher order discrete multimodal distributions and at the same time keep the parameter size manageable. We further model the probability of selecting an object proposal pair and include a relation triplet prior in our model. We show that each part of the model improves performance and the combination outperforms state-of-the-art score on the Visual Genome (VG) and Visual Relationship Detection (VRD) datasets.
APA, Harvard, Vancouver, ISO, and other styles
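
The key modeling idea above, representing the conditional triplet distribution as a normalized non-negative low-rank tensor, i.e., a mixture of rank-one distributions, can be written out numerically. The sizes, rank, and random factors below are arbitrary; the sketch only checks that the construction yields a valid multimodal joint distribution.

```python
# P(subject, predicate, object | image) as a normalized non-negative low-rank
# tensor: a mixture of K rank-one distributions.
import numpy as np

rng = np.random.default_rng(0)
S, P, O, K = 6, 4, 6, 3                      # subject/predicate/object classes, rank

def normalize(x, axis=None):
    return x / x.sum(axis=axis, keepdims=axis is not None)

pi = normalize(rng.random(K))                # mixture weights over K components
u = normalize(rng.random((K, S)), axis=1)    # per-component subject distribution
v = normalize(rng.random((K, P)), axis=1)    # per-component predicate distribution
w = normalize(rng.random((K, O)), axis=1)    # per-component object distribution

# P[s, p, o] = sum_k pi_k * u_k[s] * v_k[p] * w_k[o]
P_spo = np.einsum('k,ks,kp,ko->spo', pi, u, v, w)
print(P_spo.shape, P_spo.sum())              # (6, 4, 6), sums to ~1.0: a valid joint distribution
```
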
15

Dou, Shaosong, Zhiquan Feng, Jinglan Tian, Xue Fan, Ya Hou, and Xin Zhang. "An Intention Understanding Algorithm Based on Multimodal Information Fusion." Scientific Programming 2021 (November 18, 2021): 1–11. http://dx.doi.org/10.1155/2021/8354015.

Full text of the source
Abstract:
This paper proposes an intention understanding algorithm (KDI) based on an elderly service robot, which combines Neural Network with a seminaive Bayesian classifier to infer user’s intention. KDI algorithm uses CNN to analyze gesture and action information, and YOLOV3 is used for object detection to provide scene information. Then, we enter them into a seminaive Bayesian classifier and set key properties as super parent to enhance its contribution to an intent, realizing intention understanding based on prior knowledge. In addition, we introduce the actual distance between the users and objects and give each object a different purpose to implement intent understanding based on object-user distance. The two methods are combined to enhance the intention understanding. The main contributions of this paper are as follows: (1) an intention reasoning model (KDI) is proposed based on prior knowledge and distance, which combines Neural Network with seminaive Bayesian classifier. (2) A set of robot accompanying systems based on the robot is formed, which is applied in the elderly service scene.
APA, Harvard, Vancouver, ISO, and other styles
16

Fu, Qiang. "Application and Analysis of RGB-D Salient Object Detection in Photographic Camera Vision Processing." Journal of Sensors 2022 (September 24, 2022): 1–10. http://dx.doi.org/10.1155/2022/5125346.

Full text of the source
Abstract:
To identify the most visually salient regions in a set of paired RGB and depth maps, in this paper, we propose a multimodal feature fusion supervised RGB-D image saliency detection network, which learns RGB and depth data by two independent streams separately, uses a dual-stream side-supervision module to obtain saliency maps based on RGB and depth features for each layer of the network separately, and then uses a multimodal feature fusion module to fuse the latter 3 layers of RGB and depth high-dimensional information to generate high-level significant prediction results. Experiments on three publicly available datasets show that the proposed network outperforms the current mainstream RGB-D saliency detection models with strong robustness due to the use of a dual-stream side-surveillance module and a multimodal feature fusion module. We use the proposed RGB-D SOD model for background defocusing in realistic scenes and achieve excellent visual results.
APA, Harvard, Vancouver, ISO, and other styles
17

Diaz, Carlos, and Shahram Payandeh. "Multimodal Sensing Interface for Haptic Interaction." Journal of Sensors 2017 (2017): 1–24. http://dx.doi.org/10.1155/2017/2072951.

Full text of the source
Abstract:
This paper investigates the integration of a multimodal sensing system for exploring limits of vibrato tactile haptic feedback when interacting with 3D representation of real objects. In this study, the spatial locations of the objects are mapped to the work volume of the user using a Kinect sensor. The position of the user’s hand is obtained using the marker-based visual processing. The depth information is used to build a vibrotactile map on a haptic glove enhanced with vibration motors. The users can perceive the location and dimension of remote objects by moving their hand inside a scanning region. A marker detection camera provides the location and orientation of the user’s hand (glove) to map the corresponding tactile message. A preliminary study was conducted to explore how different users can perceive such haptic experiences. Factors such as total number of objects detected, object separation resolution, and dimension-based and shape-based discrimination were evaluated. The preliminary results showed that the localization and counting of objects can be attained with a high degree of success. The users were able to classify groups of objects of different dimensions based on the perceived haptic feedback.
APA, Harvard, Vancouver, ISO, and other styles
18

Ivorra, Eugenio, Mario Ortega, José Catalán, Santiago Ezquerro, Luis Lledó, Nicolás Garcia-Aracil, and Mariano Alcañiz. "Intelligent Multimodal Framework for Human Assistive Robotics Based on Computer Vision Algorithms." Sensors 18, no. 8 (July 24, 2018): 2408. http://dx.doi.org/10.3390/s18082408.

Full text of the source
Abstract:
Assistive technologies help all persons with disabilities to improve their accessibility in all aspects of their life. The AIDE European project contributes to the improvement of current assistive technologies by developing and testing a modular and adaptive multimodal interface customizable to the individual needs of people with disabilities. This paper describes the computer vision algorithms part of the multimodal interface developed inside the AIDE European project. The main contribution of this computer vision part is the integration with the robotic system and with the other sensory systems (electrooculography (EOG) and electroencephalography (EEG)). The technical achievements solved herein are the algorithm for the selection of objects using the gaze, and especially the state-of-the-art algorithm for the efficient detection and pose estimation of textureless objects. These algorithms were tested in real conditions, and were thoroughly evaluated both qualitatively and quantitatively. The experimental results of the object selection algorithm were excellent (object selection over 90%) in less than 12 s. The detection and pose estimation algorithms evaluated using the LINEMOD database were similar to the state-of-the-art method, and were the most computationally efficient.
APA, Harvard, Vancouver, ISO, and other styles
19

Ren, Guoyin, Xiaoqi Lu, and Yuhao Li. "Research on 24-Hour Dense Crowd Counting and Object Detection System Based on Multimodal Image Optimization Feature Fusion." Scientific Programming 2022 (September 16, 2022): 1–21. http://dx.doi.org/10.1155/2022/9863066.

Full text of the source
Abstract:
Motivation. In the environment of day and night video surveillance, in order to improve the accuracy of machine vision dense crowd counting and target detection, this paper designs a day and night dual-purpose crowd counting and crowd detection network based on multimode image fusion. Methods. Two sub-models, RGBD-Net and RGBT-Net, are designed in this paper. The depth image features and thermal imaging features are effectively fused with the features of visible light images, so that the model has stronger anti-interference characteristics and robustness to the light noise interference caused by the sudden fall of light at night. The above models use density map regression-guided detection method to complete population counting and detection. Results. The model completed daytime training and testing on MICC dataset. Through verification, the average absolute error of the model was 1.025, the mean square error was 1.521, and the recall rate of target detection was 97.11%. Night vision training and testing were completed on the RGBT-CC dataset. After verification, the average absolute error of the network was 18.16, the mean square error was 32.14, and the recall rate of target detection was 97.65%. By verifying the effectiveness of the multimode medium-term fusion network, it is found to exceed the current most advanced bimodal fusion method. Conclusion. The experimental results show that the proposed multimodal fusion network can solve the counting and detection problem in the video surveillance environment during day and night. The ablation experiment further proves the effectiveness of the parameters of the two models.
APA, Harvard, Vancouver, ISO, and other styles
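
For reference, the density-map-regression counting pipeline mentioned above produces a count by integrating the predicted density map and is scored with the mean absolute error and (root) mean squared error over the test images. A minimal sketch of that evaluation, with made-up arrays and the usual convention that "MSE" in crowd counting reports the root of the squared error:

```python
# Density-map-based crowd counting: count = sum of the density map; MAE and
# (root) MSE over a test set. Evaluation sketch only.
import numpy as np

def count_from_density(density_map):
    return float(density_map.sum())          # each annotated person integrates to ~1

def mae_mse(pred_counts, gt_counts):
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    mae = np.abs(pred - gt).mean()
    mse = np.sqrt(((pred - gt) ** 2).mean())  # "MSE" is usually reported as the RMSE
    return mae, mse
```
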
20

Li, Jianing, Xiao Wang, Lin Zhu, Jia Li, Tiejun Huang, and Yonghong Tian. "Retinomorphic Object Detection in Asynchronous Visual Streams." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 1332–40. http://dx.doi.org/10.1609/aaai.v36i2.20021.

Full text of the source
Abstract:
Due to high-speed motion blur and challenging illumination, conventional frame-based cameras have encountered an important challenge in object detection tasks. Neuromorphic cameras that output asynchronous visual streams instead of intensity frames, by taking the advantage of high temporal resolution and high dynamic range, have brought a new perspective to address the challenge. In this paper, we propose a novel problem setting, retinomorphic object detection, which is the first trial that integrates foveal-like and peripheral-like visual streams. Technically, we first build a large-scale multimodal neuromorphic object detection dataset (i.e., PKU-Vidar-DVS) over 215.5k spatio-temporal synchronized labels. Then, we design temporal aggregation representations to preserve the spatio-temporal information from asynchronous visual streams. Finally, we present a novel bio-inspired unifying framework to fuse two sensing modalities via a dynamic interaction mechanism. Our experimental evaluation shows that our approach has significant improvements over the state-of-the-art methods with the single-modality, especially in high-speed motion and low-light scenarios. We hope that our work will attract further research into this newly identified, yet crucial research direction. Our dataset can be available at https://www.pkuml.org/resources/pku-vidar-dvs.html.
APA, Harvard, Vancouver, ISO, and other styles
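
One common way to make asynchronous visual streams digestible by a frame-based detector is a temporal-aggregation representation such as a voxel grid of event polarities. The sketch below is a generic version of that idea, not the representation defined for PKU-Vidar-DVS; the bin count and resolution are assumptions.

```python
# Bin asynchronous events (t, x, y, polarity) into a fixed number of temporal
# slices so a conventional detector can consume them.
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """events: (N, 4) array with columns t, x, y, polarity (+1/-1)."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 0]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)        # map timestamps to [0, 1]
    b = np.clip((t_norm * num_bins).astype(int), 0, num_bins - 1)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    np.add.at(grid, (b, y, x), events[:, 3])                     # accumulate polarity per cell
    return grid
```
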
21

Motlicek, Petr, Stefan Duffner, Danil Korchagin, Hervé Bourlard, Carl Scheffler, Jean-Marc Odobez, Giovanni Del Galdo, Markus Kallinger, and Oliver Thiergart. "Real-Time Audio-Visual Analysis for Multiperson Videoconferencing." Advances in Multimedia 2013 (2013): 1–21. http://dx.doi.org/10.1155/2013/175745.

Full text of the source
Abstract:
We describe the design of a system consisting of several state-of-the-art real-time audio and video processing components enabling multimodal stream manipulation (e.g., automatic online editing for multiparty videoconferencing applications) in open, unconstrained environments. The underlying algorithms are designed to allow multiple people to enter, interact, and leave the observable scene with no constraints. They comprise continuous localisation of audio objects and its application for spatial audio object coding, detection, and tracking of faces, estimation of head poses and visual focus of attention, detection and localisation of verbal and paralinguistic events, and the association and fusion of these different events. Combined all together, they represent multimodal streams with audio objects and semantic video objects and provide semantic information for stream manipulation systems (like a virtual director). Various experiments have been performed to evaluate the performance of the system. The obtained results demonstrate the effectiveness of the proposed design, the various algorithms, and the benefit of fusing different modalities in this scenario.
APA, Harvard, Vancouver, ISO, and other styles
22

Liang, Fangfang, Lijuan Duan, Wei Ma, Yuanhua Qiao, and Jun Miao. "A deep multimodal feature learning network for RGB-D salient object detection." Computers & Electrical Engineering 92 (June 2021): 107006. http://dx.doi.org/10.1016/j.compeleceng.2021.107006.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
23

Wu, Junwei, Wujie Zhou, Ting Luo, Lu Yu, and Jingsheng Lei. "Multiscale multilevel context and multimodal fusion for RGB-D salient object detection." Signal Processing 178 (January 2021): 107766. http://dx.doi.org/10.1016/j.sigpro.2020.107766.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
24

Kim, Jongwon, and Jeongho Cho. "RGDiNet: Efficient Onboard Object Detection with Faster R-CNN for Air-to-Ground Surveillance." Sensors 21, no. 5 (March 1, 2021): 1677. http://dx.doi.org/10.3390/s21051677.

Full text of the source
Abstract:
An essential component for the autonomous flight or air-to-ground surveillance of a UAV is an object detection device. It must possess a high detection accuracy and requires real-time data processing to be employed for various tasks such as search and rescue, object tracking and disaster analysis. With the recent advancements in multimodal data-based object detection architectures, autonomous driving technology has significantly improved, and the latest algorithm has achieved an average precision of up to 96%. However, these remarkable advances may be unsuitable for the image processing of UAV aerial data directly onboard for object detection because of the following major problems: (1) Objects in aerial views generally have a smaller size than in an image and they are uneven and sparsely distributed throughout an image; (2) Objects are exposed to various environmental changes, such as occlusion and background interference; and (3) The payload weight of a UAV is limited. Thus, we propose employing a new real-time onboard object detection architecture, an RGB aerial image and a point cloud data (PCD) depth map image network (RGDiNet). A faster region-based convolutional neural network was used as the baseline detection network and an RGD, an integration of the RGB aerial image and the depth map reconstructed by the light detection and ranging PCD, was utilized as an input for computational efficiency. Performance tests and evaluation of the proposed RGDiNet were conducted under various operating conditions using hand-labeled aerial datasets. Consequently, it was shown that the proposed method has a superior performance for the detection of vehicles and pedestrians than conventional vision-based methods.
APA, Harvard, Vancouver, ISO, and other styles
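
The "RGD" input described above combines the R and G channels of the aerial image with a depth map rasterized from the LiDAR point cloud. A rough sketch of that preprocessing, assuming points already transformed into the camera frame and a pinhole intrinsic matrix K (the cited paper's exact projection and normalization may differ):

```python
# Project LiDAR points into the image, rasterize a depth map, and stack it with
# the R and G channels in place of B to form an "RGD" image.
import numpy as np

def depth_map_from_pcd(points_cam, K, height, width):
    """points_cam: (N, 3) points already in the camera frame; K: 3x3 intrinsics."""
    z = points_cam[:, 2]
    keep = z > 0.1                                    # only points in front of the camera
    uvw = (K @ points_cam[keep].T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth = np.zeros((height, width), dtype=np.float32)
    depth[v[valid], u[valid]] = z[keep][valid]        # last write wins; a z-buffer would keep the nearest
    return depth

def make_rgd(rgb, depth, max_depth=80.0):
    d = np.clip(depth / max_depth, 0.0, 1.0)          # scale depth to [0, 1]
    return np.dstack([rgb[..., 0], rgb[..., 1], d])   # R, G, normalized depth
```
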
25

Dai, Chenguang, Zhenchao Zhang, and Dong Lin. "An Object-Based Bidirectional Method for Integrated Building Extraction and Change Detection between Multimodal Point Clouds." Remote Sensing 12, no. 10 (May 24, 2020): 1680. http://dx.doi.org/10.3390/rs12101680.

Full text of the source
Abstract:
Building extraction and change detection are two important tasks in the remote sensing domain. Change detection between airborne laser scanning data and photogrammetric data is vulnerable to dense matching errors, mis-alignment errors and data gaps. This paper proposes an unsupervised object-based method for integrated building extraction and change detection. Firstly, terrain, roofs and vegetation are extracted from the precise laser point cloud, based on “bottom-up” segmentation and clustering. Secondly, change detection is performed in an object-based bidirectional manner: Heightened buildings and demolished buildings are detected by taking the laser scanning data as reference, while newly-built buildings are detected by taking the dense matching data as reference. Experiments on two urban data sets demonstrate its effectiveness and robustness. The object-based change detection achieves a recall rate of 92.31% and a precision rate of 88.89% for the Rotterdam dataset; it achieves a recall rate of 85.71% and a precision rate of 100% for the Enschede dataset. It can not only extract unchanged building footprints, but also assign heightened or demolished labels to the changed buildings.
APA, Harvard, Vancouver, ISO, and other styles
26

Gonzalez, Alejandro, David Vazquez, Antonio M. Lopez, and Jaume Amores. "On-Board Object Detection: Multicue, Multimodal, and Multiview Random Forest of Local Experts." IEEE Transactions on Cybernetics 47, no. 11 (November 2017): 3980–90. http://dx.doi.org/10.1109/tcyb.2016.2593940.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
27

Wei, Zhiqing, Fengkai Zhang, Shuo Chang, Yangyang Liu, Huici Wu, and Zhiyong Feng. "MmWave Radar and Vision Fusion for Object Detection in Autonomous Driving: A Review." Sensors 22, no. 7 (March 25, 2022): 2542. http://dx.doi.org/10.3390/s22072542.

Full text of the source
Abstract:
With autonomous driving developing in a booming stage, accurate object detection in complex scenarios attract wide attention to ensure the safety of autonomous driving. Millimeter wave (mmWave) radar and vision fusion is a mainstream solution for accurate obstacle detection. This article presents a detailed survey on mmWave radar and vision fusion based obstacle detection methods. First, we introduce the tasks, evaluation criteria, and datasets of object detection for autonomous driving. The process of mmWave radar and vision fusion is then divided into three parts: sensor deployment, sensor calibration, and sensor fusion, which are reviewed comprehensively. Specifically, we classify the fusion methods into data level, decision level, and feature level fusion methods. In addition, we introduce three-dimensional(3D) object detection, the fusion of lidar and vision in autonomous driving and multimodal information fusion, which are promising for the future. Finally, we summarize this article.
APA, Harvard, Vancouver, ISO, and other styles
28

Monir, Islam A., Mohamed W. Fakhr, and Nashwa El-Bendary. "Multimodal deep learning model for human handover classification." Bulletin of Electrical Engineering and Informatics 11, no. 2 (April 1, 2022): 974–85. http://dx.doi.org/10.11591/eei.v11i2.3690.

Full text of the source
Abstract:
Giving and receiving objects between humans and robots is a critical task which collaborative robots must be able to do. In order for robots to achieve that, they must be able to classify different types of human handover motions. Previous works did not mainly focus on classifying the motion type from both giver and receiver perspectives. However, they solely focused on object grasping, handover detection, and handover classification from one side only (giver/receiver). This paper discusses the design and implementation of different deep learning architectures with long short term memory (LSTM) network; and different feature selection techniques for human handover classification from both giver and receiver perspectives. Classification performance while using unimodal and multimodal deep learning models is investigated. The data used for evaluation is a publicly available dataset with four different modalities: motion tracking sensors readings, Kinect readings for 15 joints positions, 6-axis inertial sensor readings, and video recordings. The multimodality added a huge boost in the classification performance; achieving 96% accuracy with the feature selection based deep learning architecture.
APA, Harvard, Vancouver, ISO, and other styles
29

Tong, Xunwei, Ruifeng Li, Lijun Zhao, Lianzheng Ge, and Ke Wang. "Manipulated-Object Detection and Pose Estimation Based on Multimodal Feature Points and Neighboring Patches." Sensors and Materials 32, no. 4 (March 30, 2020): 0. http://dx.doi.org/10.18494/sam..2539.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
30

Tong, Xunwei, Ruifeng Li, Lijun Zhao, Lianzheng Ge, and Ke Wang. "Manipulated-object Detection and Pose Estimation Based on Multimodal Feature Points and Neighboring Patches." Sensors and Materials 32, no. 4 (April 10, 2020): 1171. http://dx.doi.org/10.18494/sam.2020.2539.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
31

Tian, Yonglin, Kunfeng Wang, Yuang Wang, Yulin Tian, Zilei Wang, and Fei-Yue Wang. "Adaptive and azimuth-aware fusion network of multimodal local features for 3D object detection." Neurocomputing 411 (October 2020): 32–44. http://dx.doi.org/10.1016/j.neucom.2020.05.086.

Full text of the source
APA, Harvard, Vancouver, ISO, and other styles
32

Yamaguchi, Akihiko, and Christopher G. Atkeson. "Tactile Behaviors with the Vision-Based Tactile Sensor FingerVision." International Journal of Humanoid Robotics 16, no. 03 (June 2019): 1940002. http://dx.doi.org/10.1142/s0219843619400024.

Full text of the source
Abstract:
This paper introduces a vision-based tactile sensor FingerVision, and explores its usefulness in tactile behaviors. FingerVision consists of a transparent elastic skin marked with dots, and a camera that is easy to fabricate, low cost, and physically robust. Unlike other vision-based tactile sensors, the complete transparency of the FingerVision skin provides multimodal sensation. The modalities sensed by FingerVision include distributions of force and slip, and object information such as distance, location, pose, size, shape, and texture. The slip detection is very sensitive since it is obtained by computer vision directly applied to the output from the FingerVision camera. It provides high-resolution slip detection, which does not depend on the contact force, i.e., it can sense slip of a lightweight object that generates negligible contact force. The tactile behaviors explored in this paper include manipulations that utilize this feature. For example, we demonstrate that grasp adaptation with FingerVision can grasp origami, and other deformable and fragile objects such as vegetables, fruits, and raw eggs.
APA, Harvard, Vancouver, ISO, and other styles
33

Kandylakis, Zacharias, Konstantinos Vasili, and Konstantinos Karantzalos. "Fusing Multimodal Video Data for Detecting Moving Objects/Targets in Challenging Indoor and Outdoor Scenes." Remote Sensing 11, no. 4 (February 21, 2019): 446. http://dx.doi.org/10.3390/rs11040446.

Full text of the source
Abstract:
Single sensor systems and standard optical—usually RGB CCTV video cameras—fail to provide adequate observations, or the amount of spectral information required to build rich, expressive, discriminative features for object detection and tracking tasks in challenging outdoor and indoor scenes under various environmental/illumination conditions. Towards this direction, we have designed a multisensor system based on thermal, shortwave infrared, and hyperspectral video sensors and propose a processing pipeline able to perform in real-time object detection tasks despite the huge amount of the concurrently acquired video streams. In particular, in order to avoid the computationally intensive coregistration of the hyperspectral data with other imaging modalities, the initially detected targets are projected through a local coordinate system on the hypercube image plane. Regarding the object detection, a detector-agnostic procedure has been developed, integrating both unsupervised (background subtraction) and supervised (deep learning convolutional neural networks) techniques for validation purposes. The detected and verified targets are extracted through the fusion and data association steps based on temporal spectral signatures of both target and background. The quite promising experimental results in challenging indoor and outdoor scenes indicated the robust and efficient performance of the developed methodology under different conditions like fog, smoke, and illumination changes.
APA, Harvard, Vancouver, ISO, and other styles
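
The unsupervised branch of the detection stage above (background subtraction producing moving-object candidates that a supervised detector can then verify) can be sketched with OpenCV's stock MOG2 model. The model choice, thresholds, and minimum blob area are generic assumptions, not the pipeline parameters of the cited system.

```python
# Unsupervised moving-object proposals via background subtraction.
import cv2

def moving_object_boxes(frames, min_area=200):
    """frames: iterable of BGR images from one fixed camera.
    Yields, per frame, a list of (x, y, w, h) candidate boxes."""
    bg = cv2.createBackgroundSubtractorMOG2(history=300, varThreshold=25)
    for frame in frames:
        mask = bg.apply(frame)
        mask = cv2.medianBlur(mask, 5)                                 # suppress speckle noise
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)     # drop shadow pixels (value 127)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        yield [cv2.boundingRect(c) for c in contours
               if cv2.contourArea(c) >= min_area]
```
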
34

Chen, Yuyang, and Feng Pan. "Multimodal detection of hateful memes by applying a vision-language pre-training model." PLOS ONE 17, no. 9 (September 12, 2022): e0274300. http://dx.doi.org/10.1371/journal.pone.0274300.

Full text of the source
Abstract:
Detrimental to individuals and society, online hateful messages have recently become a major social issue. Among them, one new type of hateful message, “hateful meme”, has emerged and brought difficulties in traditional deep learning-based detection. Because hateful memes were formatted with both text captions and images to express users’ intents, they cannot be accurately identified by singularly analyzing embedded text captions or images. In order to effectively detect a hateful meme, the algorithm must possess strong vision and language fusion capability. In this study, we move closer to this goal by feeding a triplet by stacking the visual features, object tags, and text features of memes generated by the object detection model named Visual features in Vision-Language (VinVl) and the optical character recognition (OCR) technology into a Transformer-based Vision-Language Pre-Training Model (VL-PTM) OSCAR+ to perform the cross-modal learning of memes. After fine-tuning and connecting to a random forest (RF) classifier, our model (OSCAR+RF) achieved an average accuracy and AUROC of 0.684 and 0.768, respectively, on the hateful meme detection task in a public test set, which was higher than the other eleven (11) published baselines. In conclusion, this study has demonstrated that VL-PTMs with the addition of anchor points can improve the performance of deep learning-based detection of hateful memes by involving a more substantial alignment between the text caption and visual information.
APA, Harvard, Vancouver, ISO, and other styles
35

Wang, Qingwang, Yongke Chi, Tao Shen, Jian Song, Zifeng Zhang, and Yan Zhu. "Improving RGB-Infrared Object Detection by Reducing Cross-Modality Redundancy." Remote Sensing 14, no. 9 (April 22, 2022): 2020. http://dx.doi.org/10.3390/rs14092020.

Full text of the source
Abstract:
In the field of remote sensing image applications, RGB and infrared image object detection is an important technology. The object detection performance can be improved and the robustness of the algorithm will be enhanced by making full use of their complementary information. Existing RGB-infrared detection methods do not explicitly encourage RGB and infrared images to achieve effective multimodal learning. We find that when fusing RGB and infrared images, cross-modal redundant information weakens the degree of complementary information fusion. Inspired by this observation, we propose a redundant information suppression network (RISNet) which suppresses cross-modal redundant information and facilitates the fusion of RGB-Infrared complementary information. Specifically, we design a novel mutual information minimization module to reduce the redundancy between RGB appearance features and infrared radiation features, which enables the network to take full advantage of the complementary advantages of multimodality and improve the object detection performance. In addition, in view of the drawbacks of the current artificial classification of lighting conditions, such as the subjectivity of artificial classification and the lack of comprehensiveness (divided into day and night only), we propose a method based on histogram statistics to classify lighting conditions in more detail. Experimental results on two public RGB-infrared object detection datasets demonstrate the superiorities of our proposed method over the state-of-the-art approaches, especially under challenging conditions such as poor illumination, complex background, and low contrast.
APA, Harvard, Vancouver, ISO, and other styles
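
The histogram-statistics idea for classifying lighting conditions into finer categories than day/night could look roughly like the sketch below. The statistics used and the class boundaries are invented for illustration; the cited paper derives its own, more detailed categories from the data.

```python
# Coarse lighting-condition classification from grayscale histogram statistics.
import numpy as np

def lighting_condition(rgb):
    """rgb: uint8 array of shape (H, W, 3). Returns a coarse lighting label."""
    gray = rgb.mean(axis=2)
    mean, p10 = gray.mean(), np.percentile(gray, 10)   # overall and dark-tail brightness
    if mean > 120:
        return "day"
    if mean > 70:
        return "dusk/dawn"
    return "night" if p10 < 20 else "low-light indoor"
```
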
36

Li, Minle, Yihua Hu, Nanxiang Zhao, and Qishu Qian. "One-Stage Multi-Sensor Data Fusion Convolutional Neural Network for 3D Object Detection." Sensors 19, no. 6 (March 23, 2019): 1434. http://dx.doi.org/10.3390/s19061434.

Full text of the source
Abstract:
Three-dimensional (3D) object detection has important applications in robotics, automatic loading, automatic driving and other scenarios. With the improvement of devices, people can collect multi-sensor/multimodal data from a variety of sensors such as Lidar and cameras. In order to make full use of various information advantages and improve the performance of object detection, we proposed a Complex-Retina network, a convolution neural network for 3D object detection based on multi-sensor data fusion. Firstly, a unified architecture with two feature extraction networks was designed, and the feature extraction of point clouds and images from different sensors realized synchronously. Then, we set a series of 3D anchors and projected them to the feature maps, which were cropped into 2D anchors with the same size and fused together. Finally, the object classification and 3D bounding box regression were carried out on the multipath of fully connected layers. The proposed network is a one-stage convolution neural network, which achieves the balance between the accuracy and speed of object detection. The experiments on KITTI datasets show that the proposed network is superior to the contrast algorithms in average precision (AP) and time consumption, which shows the effectiveness of the proposed network.
APA, Harvard, Vancouver, ISO, and other styles
37

Wu, Zhaoli, Xuehan Wu, Yuancai Zhu, Jingxuan Zhai, Haibo Yang, Zhiwei Yang, Chao Wang, and Jilong Sun. "Research on Multimodal Image Fusion Target Detection Algorithm Based on Generative Adversarial Network." Wireless Communications and Mobile Computing 2022 (January 24, 2022): 1–10. http://dx.doi.org/10.1155/2022/1740909.

Full text of the source
Abstract:
In this paper, we propose a target detection algorithm based on adversarial discriminative domain adaptation for infrared and visible image fusion using unsupervised learning methods to reduce the differences between multimodal image information. Firstly, this paper improves the fusion model based on generative adversarial network and uses the fusion algorithm based on the dual discriminator generative adversarial network to generate high-quality IR-visible fused images and then blends the IR and visible images into a ternary dataset and combines the triple angular loss function to do migration learning. Finally, the fused images are used as the input images of faster RCNN object detection algorithm for detection, and a new nonmaximum suppression algorithm is used to improve the faster RCNN target detection algorithm, which further improves the target detection accuracy. Experiments prove that the method can achieve mutual complementation of multimodal feature information and make up for the lack of information in single-modal scenes, and the algorithm achieves good detection results for information from both modalities (infrared and visible light).
APA, Harvard, Vancouver, ISO, and other styles
38

Holgado, Alexis Carlos, Tito Pradhono Tomo, Sophon Somlor, and Shigeki Sugano. "A Multimodal, Adjustable Sensitivity, Digital 3-Axis Skin Sensor Module." Sensors 20, no. 11 (June 1, 2020): 3128. http://dx.doi.org/10.3390/s20113128.

Full text of the source
Abstract:
This paper presents major improvements to a multimodal, adjustable sensitivity skin sensor module. It employs a geomagnetic 3-axis Hall effect sensor to measure changes in the position of a magnetic field generated by an electromagnet. The electromagnet is mounted on a flexible material, and different current values can be supplied to it, enabling adjustments to the sensitivity of the sensor during operation. Capacitive sensing has been added in this iteration of the module, with two sensing modalities: “pre-touch” detection with proximity sensing and normal force capacitive sensing. The sensor has been designed to be interconnected with other sensor modules to be able to cover large surfaces of a robot with normal and shear force sensing and object proximity detection. Furthermore, this paper introduces important size reductions of the previous sensor design, calibration results, and further analysis of other sensor characteristics.
39

Aldabbas, Omar Subhi. "Sensing as a Service for the Internet of Things." International Journal of Recent Contributions from Engineering, Science & IT (iJES) 4, no. 3 (October 26, 2016): 28. http://dx.doi.org/10.3991/ijes.v4i3.6087.

Abstract:
Internet of Things (IoT) is a ubiquitous embedded ecosystem known for its capability to perform common application functions by coordinating resources that are distributed across on-object and on-network domains. As new applications evolve, the challenge lies in the analysis and use of the multimodal data streamed by diverse kinds of sensors. This paper presents a new service-centric approach for data collection and retrieval. This approach treats objects as highly decentralized, composite and cost-effective services. Such services can be constructed from objects located within close geographical proximity to retrieve spatio-temporal events from the gathered sensor data. To achieve this, we advocate coordination languages and models that fuse multimodal, heterogeneous services by interfacing with each service so that the network objective is met according to the data the services gather and analyze. We also give an application scenario that illustrates how the coordination models enable successful collaboration among IoT objects to retrieve information. The proposed solution reduced the communication delay before service composition by up to 43% and improved the target detection accuracy by up to 70%, while keeping energy consumption 20% lower than its best rivals in the literature.
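As a rough sketch of the service-composition idea (grouping sensor services within close geographical proximity and fusing their readings into one composite detection service), the snippet below is illustrative only; the class names, the fixed radius, and the majority-vote fusion rule are assumptions, not the coordination model proposed in the paper.

import math
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SensorService:
    """A single IoT object exposed as a service: a location plus a read() function."""
    name: str
    x: float
    y: float
    read: Callable[[], bool]   # True if the sensor currently reports a target

def compose_by_proximity(services: List[SensorService], cx: float, cy: float,
                         radius: float) -> List[SensorService]:
    """Select the services within `radius` of the point of interest (cx, cy)."""
    return [s for s in services
            if math.hypot(s.x - cx, s.y - cy) <= radius]

def composite_detect(members: List[SensorService]) -> bool:
    """Fuse member readings with a simple majority vote."""
    votes = [s.read() for s in members]
    return sum(votes) > len(votes) / 2

# Example: three simulated sensors around a point of interest
services = [
    SensorService("cam-1", 0.0, 0.0, lambda: True),
    SensorService("mic-2", 1.0, 1.0, lambda: True),
    SensorService("pir-3", 9.0, 9.0, lambda: False),
]
members = compose_by_proximity(services, 0.5, 0.5, radius=2.0)
print([s.name for s in members], composite_detect(members))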
40

Rehman, Amjad, Tanzila Saba, Muhammad Zeeshan Khan, Robertas Damaševičius, and Saeed Ali Bahaj. "Internet-of-Things-Based Suspicious Activity Recognition Using Multimodalities of Computer Vision for Smart City Security." Security and Communication Networks 2022 (October 5, 2022): 1–12. http://dx.doi.org/10.1155/2022/8383461.

Abstract:
Automatic human activity recognition is one of the milestones of smart city surveillance projects. Human activity detection and recognition aim to identify the activities performed by a subject based on a series of observations. Hence, vision-based human activity recognition systems have a wide scope in video surveillance, health care systems, and human-computer interaction. Currently, the world is moving towards the concept of a smart and safe city, and automatic human activity recognition is a major challenge of smart city surveillance. The proposed research work employs a fine-tuned YOLO-v4 for activity detection, whereas a 3D-CNN is implemented for classification. Besides classification, the presented research model also leverages human-object interaction with the help of intersection over union (IoU). An Internet of Things (IoT) based architecture is implemented to make efficient and real-time decisions. The classes for activity recognition are taken from the UCF-Crime dataset, while a dataset extracted from MS-COCO for suspicious object detection is used for the human-object interaction. This research is also applied to human activity detection and recognition on university premises for real-time suspicious activity detection and automatic alerts. The experiments show that the proposed multimodal approach achieves remarkable activity detection and recognition accuracy.
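A minimal sketch of the IoU-based human-object interaction cue mentioned above: if a detected person box and a detected suspicious-object box overlap beyond a threshold, the pair is flagged for the activity classifier. The box format and the 0.1 threshold are illustrative assumptions.

def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def interacting_pairs(person_boxes, object_boxes, thresh=0.1):
    """Return (person_index, object_index) pairs whose boxes overlap enough
    to suggest a human-object interaction worth classifying."""
    return [(i, j)
            for i, p in enumerate(person_boxes)
            for j, o in enumerate(object_boxes)
            if iou(p, o) > thresh]

persons = [[50, 40, 120, 220]]
objects = [[90, 140, 140, 210], [400, 300, 450, 350]]
print(interacting_pairs(persons, objects))  # [(0, 0)] -> only the nearby object overlaps the person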
41

Gupta, Abhishek, and Xavier Fernando. "Simultaneous Localization and Mapping (SLAM) and Data Fusion in Unmanned Aerial Vehicles: Recent Advances and Challenges." Drones 6, no. 4 (March 28, 2022): 85. http://dx.doi.org/10.3390/drones6040085.

Abstract:
This article presents a survey of simultaneous localization and mapping (SLAM) and data fusion techniques for object detection and environmental scene perception in unmanned aerial vehicles (UAVs). We critically evaluate some current SLAM implementations in robotics and autonomous vehicles and their applicability and scalability to UAVs. SLAM is envisioned as a potential technique for object detection and scene perception to enable UAV navigation through continuous state estimation. In this article, we bridge the gap between SLAM and data fusion in UAVs while also comprehensively surveying related object detection techniques such as visual odometry and aerial photogrammetry. We begin with an introduction to applications where UAV localization is necessary, followed by an analysis of multimodal sensor data fusion for combining the information gathered from the different sensors mounted on UAVs. We then discuss SLAM techniques such as Kalman filters and extended Kalman filters to address scene perception, mapping, and localization in UAVs. The findings are summarized to relate current and emerging SLAM and data fusion approaches for UAV navigation, and some avenues for further research are discussed.
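As a compact illustration of the Kalman-filter state estimation that the survey discusses for UAV localization, here is a generic linear constant-velocity predict/update cycle in 2D; the state layout, noise covariances, and measurement model are assumptions for illustration, not a system described in the article.

import numpy as np

dt = 0.1                                     # time step between position fixes
F = np.array([[1, 0, dt, 0],                 # state transition for state [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],                  # we only measure position
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01                         # process noise covariance
R = np.eye(2) * 0.5                          # measurement noise covariance

def kf_step(x, P, z):
    """One Kalman filter predict/update cycle for a position measurement z = [x, y]."""
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    y = z - H @ x_pred                       # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

x, P = np.zeros(4), np.eye(4)
for z in ([1.0, 0.5], [2.1, 1.1], [3.0, 1.4]):
    x, P = kf_step(x, P, np.array(z))
print(x)   # estimated [x, y, vx, vy] after three position fixes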
42

Gardner, Marcus, C. Sebastian Mancero Castillo, Samuel Wilson, Dario Farina, Etienne Burdet, Boo Cheong Khoo, S. Farokh Atashzar, and Ravi Vaidyanathan. "A Multimodal Intention Detection Sensor Suite for Shared Autonomy of Upper-Limb Robotic Prostheses." Sensors 20, no. 21 (October 27, 2020): 6097. http://dx.doi.org/10.3390/s20216097.

Abstract:
Neurorobotic augmentation (e.g., robotic assist) is now in regular use to support individuals suffering from impaired motor functions. A major unresolved challenge, however, is the excessive cognitive load necessary for the human–machine interface (HMI). Grasp control remains one of the most challenging HMI tasks, demanding simultaneous, agile, and precise control of multiple degrees-of-freedom (DoFs) while following a specific timing pattern in the joint and human–robot task spaces. Most commercially available systems use either an indirect mode-switching configuration or a limited sequential control strategy, limiting activation to one DoF at a time. To address this challenge, we introduce a shared autonomy framework centred around a low-cost multi-modal sensor suite fusing: (a) mechanomyography (MMG) to estimate the intended muscle activation, (b) camera-based visual information for integrated autonomous object recognition, and (c) inertial measurement to enhance intention prediction based on the grasping trajectory. The complete system predicts the user's grasp intent from dynamical features measured during natural motions. A total of 84 motion features were extracted from the sensor suite, and tests were conducted with 10 able-bodied participants and 1 amputee grasping common household objects with a robotic hand. Real-time grasp classification using visual and motion features reached accuracies of 100%, 82.5%, and 88.9% across all participants for detecting and executing grasping actions for a bottle, a lid, and a box, respectively. The proposed multimodal sensor suite is a novel approach for predicting different grasp strategies and automating task performance using a commercial upper-limb prosthetic device. The system also shows potential to improve the usability of modern neurorobotic systems due to the intuitive control design.
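A minimal sketch of the kind of multimodal intent classification described above: window-level features from the MMG, IMU, and vision streams are concatenated and fed to an off-the-shelf classifier. The feature dimensions, the simulated data, and the random-forest choice are assumptions for illustration; the actual system extracts 84 motion features and uses its own pipeline.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def window_features(mmg, imu, vision):
    """Concatenate per-window features from the three modalities."""
    return np.concatenate([mmg, imu, vision])

# Simulated training data: 200 grasp windows, 3 grasp classes (bottle / lid / box)
n, n_mmg, n_imu, n_vis = 200, 32, 40, 12
X = np.stack([window_features(rng.normal(size=n_mmg),
                              rng.normal(size=n_imu),
                              rng.normal(size=n_vis)) for _ in range(n)])
y = rng.integers(0, 3, size=n)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Classify a new window of fused sensor features
x_new = window_features(rng.normal(size=n_mmg),
                        rng.normal(size=n_imu),
                        rng.normal(size=n_vis))
print(clf.predict(x_new.reshape(1, -1)))   # predicted grasp class index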
43

Mimouna, Amira, Ihsen Alouani, Anouar Ben Khalifa, Yassin El Hillali, Abdelmalik Taleb-Ahmed, Atika Menhaj, Abdeldjalil Ouahabi, and Najoua Essoukri Ben Amara. "OLIMP: A Heterogeneous Multimodal Dataset for Advanced Environment Perception." Electronics 9, no. 4 (March 27, 2020): 560. http://dx.doi.org/10.3390/electronics9040560.

Abstract:
Reliable environment perception is a crucial task for autonomous driving, especially in dense traffic areas. Recent improvements and breakthroughs in scene understanding for intelligent transportation systems are mainly based on deep learning and the fusion of different modalities. In this context, we introduce OLIMP: A heterOgeneous Multimodal Dataset for Advanced EnvIronMent Perception. This is the first public, multimodal and synchronized dataset that includes UWB radar data, acoustic data, narrow-band radar data and images. OLIMP comprises 407 scenes and 47,354 synchronized frames, presenting four categories: pedestrian, cyclist, car and tram. The dataset includes various challenges related to dense urban traffic, such as cluttered environments and different weather conditions. To demonstrate the usefulness of the introduced dataset, we propose a fusion framework that combines the four modalities for multi-object detection. The obtained results are promising and encourage future research.
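A hedged sketch of one possible decision-level fusion of the four modalities (camera, UWB radar, narrow-band radar, acoustics): boxes from different modality-specific detectors that overlap strongly are merged by confidence-weighted averaging. This is a generic late-fusion illustration under assumed box and score formats, not the fusion framework actually proposed with OLIMP.

import numpy as np

def iou(a, b):
    """IoU of two boxes [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def fuse_modalities(detections, iou_thresh=0.5):
    """detections: a list (one entry per modality) of (box, score) tuples.
    Overlapping boxes from different modalities are merged by a
    confidence-weighted average of their coordinates."""
    pool = [(np.asarray(b, dtype=float), float(s))
            for dets in detections for b, s in dets]
    pool.sort(key=lambda t: -t[1])                      # highest confidence first
    fused = []
    while pool:
        seed_box, seed_score = pool.pop(0)
        group, remaining = [(seed_box, seed_score)], []
        for box, score in pool:
            (group if iou(seed_box, box) > iou_thresh else remaining).append((box, score))
        pool = remaining
        weights = np.array([s for _, s in group])
        boxes = np.stack([b for b, _ in group])
        merged = (boxes * weights[:, None]).sum(0) / weights.sum()
        fused.append((merged, weights.mean()))
    return fused

camera = [([10, 10, 60, 60], 0.9)]
radar = [([12, 9, 63, 61], 0.6)]
acoustic = []
for box, score in fuse_modalities([camera, radar, acoustic]):
    print(np.round(box, 1), round(score, 2))            # one merged detection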
44

Diete, Alexander, and Heiner Stuckenschmidt. "Fusing Object Information and Inertial Data for Activity Recognition." Sensors 19, no. 19 (September 23, 2019): 4119. http://dx.doi.org/10.3390/s19194119.

Abstract:
In the field of pervasive computing, wearable devices have been widely used for recognizing human activities. One important area in this research is the recognition of activities of daily living, where inertial sensors and interaction sensors (like RFID tags with scanners) are especially popular choices as data sources. Using interaction sensors, however, has one drawback: they may not differentiate between a proper interaction and simply touching an object. A positive signal from an interaction sensor is not necessarily caused by a performed activity, e.g., when an object is only touched but no interaction occurs afterwards. There are, however, many scenarios, like medicine intake, that rely heavily on correctly recognized activities. In our work, we aim to address this limitation and present a multimodal egocentric activity recognition approach. Our solution relies on object detection that recognizes activity-critical objects in a frame. As it is infeasible to always expect a high-quality camera view, we enrich the vision features with inertial sensor data that monitors the user's arm movement. In this way we try to overcome the drawbacks of each respective sensor. We present our results of combining inertial and video features to recognize human activities in different types of scenarios, where we achieve an F1-measure of up to 79.6%.
45

Wang, Tao. "An Adaptive and Integrated Multimodal Sensing And Processing Framework For Long-Range Moving Object Detection And Classification." ELCVIA Electronic Letters on Computer Vision and Image Analysis 13, no. 2 (June 7, 2014): 67. http://dx.doi.org/10.5565/rev/elcvia.613.

46

Zhang, Peng, Peijun Du, Cong Lin, Xin Wang, Erzhu Li, Zhaohui Xue, and Xuyu Bai. "A Hybrid Attention-Aware Fusion Network (HAFNet) for Building Extraction from High-Resolution Imagery and LiDAR Data." Remote Sensing 12, no. 22 (November 16, 2020): 3764. http://dx.doi.org/10.3390/rs12223764.

Abstract:
Automated extraction of buildings from earth observation (EO) data has long been a fundamental but challenging research topic. Combining data from different modalities (e.g., high-resolution imagery (HRI) and light detection and ranging (LiDAR) data) has shown great potential in building extraction. Recent studies have examined the role that deep learning (DL) could play in both multimodal data fusion and urban object extraction. However, DL-based multimodal fusion networks may encounter the following limitations: (1) the individual modal and cross-modal features, which we consider both useful and important for final prediction, cannot be sufficiently learned and utilized and (2) the multimodal features are fused by a simple summation or concatenation, which appears ambiguous in selecting cross-modal complementary information. In this paper, we address these two limitations by proposing a hybrid attention-aware fusion network (HAFNet) for building extraction. It consists of RGB-specific, digital surface model (DSM)-specific, and cross-modal streams to sufficiently learn and utilize both individual modal and cross-modal features. Furthermore, an attention-aware multimodal fusion block (Att-MFBlock) was introduced to overcome the fusion problem by adaptively selecting and combining complementary features from each modality. Extensive experiments conducted on two publicly available datasets demonstrated the effectiveness of the proposed HAFNet for building extraction.
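As a rough sketch of the idea behind attention-aware multimodal fusion (adaptively re-weighting complementary RGB and DSM feature channels before combining them), the PyTorch module below applies squeeze-and-excitation-style channel attention to the concatenated features. It is a generic illustration under assumed tensor shapes, not the Att-MFBlock of HAFNet itself.

import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Fuse two modality-specific feature maps with channel attention."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: global spatial context
            nn.Conv2d(2 * channels, 2 * channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(2 * channels // reduction, 2 * channels, kernel_size=1),
            nn.Sigmoid(),                                  # per-channel weights in (0, 1)
        )
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, dsm_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb_feat, dsm_feat], dim=1)         # (B, 2C, H, W)
        x = x * self.attn(x)                               # re-weight complementary channels
        return self.project(x)                             # back to (B, C, H, W)

# Example: fuse 64-channel RGB and DSM features on a 128x128 grid
fusion = ChannelAttentionFusion(channels=64)
out = fusion(torch.rand(2, 64, 128, 128), torch.rand(2, 64, 128, 128))
print(out.shape)   # torch.Size([2, 64, 128, 128])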
47

Cheng, Xu, Lihua Liu, and Chen Song. "A Cyclic Information–Interaction Model for Remote Sensing Image Segmentation." Remote Sensing 13, no. 19 (September 27, 2021): 3871. http://dx.doi.org/10.3390/rs13193871.

Abstract:
Object detection and segmentation have recently shown encouraging results in image analysis and interpretation, owing to their promising applications in the remote sensing image fusion field. Although numerous methods have been proposed, implementing effective and efficient object detection remains very challenging, especially given the limitations of single-modal data. A single data modality is not always enough to reach the required spectral and spatial resolutions. The rapid expansion in the number and availability of multi-source data poses new challenges for their effective and efficient processing. In this paper, we propose an effective feature information-interaction visual attention model for multimodal data segmentation and enhancement, which utilizes channel information to weight the self-attentive feature maps of different sources, completing the extraction, fusion, and enhancement of global semantic features together with the local contextual information of the object. Additionally, we propose an adaptively cyclic feature information-interaction model, which adopts branch prediction to decide the number of visual perceptions, accomplishing adaptive fusion of global semantic features and local fine-grained information. Numerous experiments on several benchmarks show that the proposed approach achieves significant improvements over the baseline model.
48

Kaur, Baljit, and Jhilik Bhattacharya. "Scene perception system for visually impaired based on object detection and classification using multimodal deep convolutional neural network." Journal of Electronic Imaging 28, no. 01 (February 8, 2019): 1. http://dx.doi.org/10.1117/1.jei.28.1.013031.

49

Kinzig, Christian, Markus Horn, Martin Lauer, Michael Buchholz, Christoph Stiller, and Klaus Dietmayer. "Automatic multimodal sensor calibration of the UNICARagil vehicles." tm - Technisches Messen 89, no. 4 (February 23, 2022): 289–99. http://dx.doi.org/10.1515/teme-2021-0110.

Abstract:
Automated vehicles rely on a precise intrinsic and extrinsic calibration of all sensors. An exact calibration leads to accurate localization and object detection results. Especially for sensor data fusion, the transformation between different sensor frames must be well known. Moreover, modular and redundant platforms require a large number of sensors to cover their full surroundings. This makes the calibration process complex and challenging. In this article, we describe the procedure to calibrate the full sensor setup of a modular autonomous driving platform, consisting of camera, lidar, and radar sensors, in four subsequent steps. At first, the intrinsic and extrinsic camera parameters are determined. Afterwards, the transformation from lidar to camera on the one hand and from lidar to radar on the other hand is estimated. Lastly, the extrinsic calibration between all lidars and the vehicle frame is performed. In our evaluation, we show that these steps lead to an accurate calibration of the complete vehicle.
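A small sketch of how the estimated extrinsics chain together once the calibration steps are complete: with 4x4 homogeneous transforms, the camera-to-vehicle transform follows from the lidar-to-camera and lidar-to-vehicle calibrations. The numeric values are placeholders, not calibration results from the UNICARagil vehicles.

import numpy as np

def transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert(T):
    """Invert a rigid-body transform without a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti

# Placeholder extrinsics (identity rotations, made-up translations in metres)
T_cam_from_lidar = transform(np.eye(3), np.array([0.1, -0.2, 0.3]))
T_vehicle_from_lidar = transform(np.eye(3), np.array([1.5, 0.0, 1.8]))

# Chain them: vehicle <- lidar <- camera
T_vehicle_from_cam = T_vehicle_from_lidar @ invert(T_cam_from_lidar)

point_in_cam = np.array([2.0, 0.0, 10.0, 1.0])       # homogeneous point in the camera frame
print(T_vehicle_from_cam @ point_in_cam)              # the same point in the vehicle frame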
50

Perju, Veaceslav. "MULTIPLE CLASSIFICATION ALGORITHMS UNIMODAL AND MULTIMODAL TARGET RECOGNITION SYSTEMS." Journal of Engineering Science 28, no. 3 (September 2021): 87–95. http://dx.doi.org/10.52326/jes.utm.2021.28(3).07.

Abstract:
Target recognition is of great importance in military and civil applications – object detection, security and surveillance, access and border control, etc. In the article, the general structure and main components of a target recognition system (TRS) are presented. Characteristics that influence the reliability of a TRS, such as availability, distinctiveness, robustness, and accessibility, are described. Graph representations and mathematical descriptions of unimodal and multimodal TRS are given, together with mathematical models for the probability of correct target recognition in these systems. To increase the reliability of a TRS, a new approach is proposed: using a set of classification algorithms in the systems. This approach permits the development of new kinds of systems – Multiple Classification Algorithms Unimodal and Multimodal Systems (MAUMS and MAMMS). Graph representations and mathematical descriptions of the MAUMS and MAMMS are provided, the probability of correct target recognition is evaluated for the different systems, and the conditions for the systems' effectiveness are established. A method is proposed for determining the maximal value of an algorithm's recognition probability for an established threshold level of the system's recognition probability, which describes the requirements for the quality and, accordingly, the cost of the recognition algorithms. The proposed theory permits designing a system for a predetermined recognition probability.
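To make the flavour of such probability models concrete, here is one common simplified model, stated as an assumption for illustration rather than the paper's exact formulation: if the system combines N independent classification algorithms and recognition succeeds when at least one algorithm is correct, the system's probability of correct recognition is P = 1 - (1 - p_1)(1 - p_2)...(1 - p_N).

from math import prod

def system_recognition_probability(p):
    """P(system correct) = 1 - prod(1 - p_i), assuming independent algorithms
    and an 'at least one succeeds' fusion rule (illustrative model only)."""
    return 1 - prod(1 - pi for pi in p)

print(system_recognition_probability([0.90]))              # 0.90  (single algorithm)
print(system_recognition_probability([0.90, 0.85]))        # 0.985 (two algorithms)
print(system_recognition_probability([0.90, 0.85, 0.80]))  # 0.997 (three algorithms)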
