Journal articles on the topic 'Multi-Objects perception'

To see the other types of publications on this topic, follow the link: Multi-Objects perception.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Multi-Objects perception.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Martín, Francisco, Carlos E. Agüero, and José M. Cañas. "Active Visual Perception for Humanoid Robots." International Journal of Humanoid Robotics 12, no. 01 (March 2015): 1550009. http://dx.doi.org/10.1142/s0219843615500097.

Abstract:
Robots detect and keep track of relevant objects in their environment to accomplish their tasks. Many of them are equipped with mobile cameras as their main sensors; they process the images and maintain an internal representation of the detected objects. We propose a novel active visual memory that moves the camera to detect objects in the robot's surroundings and tracks their positions. This visual memory is based on a combination of multi-modal filters that efficiently integrates partial information. The visual attention subsystem is distributed among the software components in charge of detecting relevant objects. We demonstrate the efficiency and robustness of this perception system on a real humanoid robot participating in the RoboCup SPL competition.
2

O’Sullivan, James, Jose Herrero, Elliot Smith, Catherine Schevon, Guy M. McKhann, Sameer A. Sheth, Ashesh D. Mehta, and Nima Mesgarani. "Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech Perception." Neuron 104, no. 6 (December 2019): 1195–209. http://dx.doi.org/10.1016/j.neuron.2019.09.007.

3

Han, Dong, Hong Nie, Jinbao Chen, Meng Chen, Zhen Deng, and Jianwei Zhang. "Multi-modal haptic image recognition based on deep learning." Sensor Review 38, no. 4 (September 17, 2018): 486–93. http://dx.doi.org/10.1108/sr-08-2017-0160.

Abstract:
Purpose: This paper aims to improve the diversity and richness of haptic perception by recognizing multi-modal haptic images. Design/methodology/approach: First, the multi-modal haptic data collected by BioTac sensors from different objects are pre-processed, and then combined into haptic images. Second, a multi-class and multi-label deep learning model is designed, which can simultaneously learn four haptic features (hardness, thermal conductivity, roughness and texture) from the haptic images, and recognize objects based on these features. The haptic images with different dimensions and modalities are provided for testing the recognition performance of this model. Findings: The results imply that multi-modal data fusion has a better performance than single-modal data on tactile understanding, and the haptic images with larger dimension are conducive to more accurate haptic measurement. Practical implications: The proposed method has important potential application in unknown environment perception, dexterous grasping manipulation and other intelligent robotics domains. Originality/value: This paper proposes a new deep learning model for extracting multiple haptic features and recognizing objects from multi-modal haptic images.
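
As a rough illustration of the multi-class, multi-label design described above, the PyTorch sketch below shares one convolutional trunk across four attribute heads for hardness, thermal conductivity, roughness and texture. The layer sizes, class counts and the HapticImageNet name are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class HapticImageNet(nn.Module):
    """Shared CNN trunk with one classification head per haptic attribute (illustrative)."""
    def __init__(self, in_channels=3, n_classes_per_attr=(5, 5, 5, 5)):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One head each for hardness, thermal conductivity, roughness, texture.
        self.heads = nn.ModuleList([nn.Linear(64, n) for n in n_classes_per_attr])

    def forward(self, x):
        z = self.trunk(x)
        return [head(z) for head in self.heads]

def multi_attribute_loss(logits_list, targets):
    """Joint loss: sum of per-attribute cross-entropies; targets is a (B, 4) long tensor."""
    ce = nn.CrossEntropyLoss()
    return sum(ce(logits, targets[:, i]) for i, logits in enumerate(logits_list))
```

A shared trunk lets the four attribute heads regularize each other, which is the usual motivation for such a multi-label formulation.
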
4

Lisowski, Józef. "Radar Perception of Multi-Object Collision Risk Neural Domains during Autonomous Driving." Electronics 13, no. 6 (March 13, 2024): 1065. http://dx.doi.org/10.3390/electronics13061065.

Abstract:
The analysis of the state of the literature in the field of methods of perception and control of the movement of autonomous vehicles shows the possibilities of improving them by using an artificial neural network to generate domains of prohibited maneuvers of passing objects, contributing to increasing the safety of autonomous driving in various real conditions of the surrounding environment. This article concerns radar perception, which involves receiving information about the movement of many autonomous objects, then identifying and assigning them a collision risk and preparing a maneuvering response. In the identification process, each object is assigned a domain generated by a previously trained neural network. The size of the domain is proportional to the risk of collisions and distance changes during autonomous driving. Then, an optimal trajectory is determined from among the possible safe paths, ensuring control in a minimum of time. The presented solution to the radar perception task was illustrated with a computer simulation of autonomous driving in a situation of passing many objects. The main achievements presented in this article are the synthesis of a radar perception algorithm mapping the neural domains of autonomous objects characterizing their collision risk and the assessment of the degree of radar perception on the example of multi-object autonomous driving simulation.
5

Li, Yucheng, Fei Wang, Liangze Tao, and Juan Wu. "Multi-Modal Haptic Rendering Based on Genetic Algorithm." Electronics 11, no. 23 (November 24, 2022): 3878. http://dx.doi.org/10.3390/electronics11233878.

Abstract:
Multi-modal haptic rendering is an important research direction to improve realism in haptic rendering. It can produce various mechanical stimuli that render multiple perceptions, such as hardness and roughness. This paper proposes a multi-modal haptic rendering method based on a genetic algorithm (GA), which generates force and vibration stimuli of haptic actuators according to the user’s target hardness and roughness. The work utilizes a back propagation (BP) neural network to implement the perception model f that establishes the mapping (I=f(G)) from objective stimuli features G to perception intensities I. We use the perception model to design the fitness function of GA and set physically achievable constraints in fitness calculation. The perception model is transformed into the force/vibration control model by GA. Finally, we conducted realism evaluation experiments between real and virtual samples under single or multi-mode haptic rendering, where subjects scored 0-100. The average score was 70.86 for multi-modal haptic rendering compared with 57.81 for hardness rendering and 50.23 for roughness rendering, which proved that the multi-modal haptic rendering is more realistic than the single mode. Based on the work, our method can be applied to render objects in more perceptual dimensions, not only limited to hardness and roughness. It has significant implications for multi-modal haptic rendering.
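
A minimal sketch of the GA fitness computation implied by this abstract, assuming the trained BP perception model is available as a callable perception_model with I = f(G); the penalty weight, the feature layout and the toy usage are invented for illustration.

```python
import numpy as np

def fitness(candidate, target_intensity, perception_model, limits):
    """GA fitness for one candidate stimulus feature vector G (illustrative sketch).

    candidate        : stimulus features G (e.g. force and vibration parameters)
    target_intensity : desired perception intensities I* (e.g. hardness, roughness)
    perception_model : trained mapping I = f(G); any callable standing in for the BP network
    limits           : (low, high) arrays of physically achievable feature ranges
    """
    low, high = limits
    # Penalise stimuli the actuators cannot physically produce.
    violation = np.sum(np.maximum(low - candidate, 0) + np.maximum(candidate - high, 0))
    predicted = perception_model(candidate)                # I = f(G)
    error = np.linalg.norm(predicted - target_intensity)   # perceptual mismatch
    return -(error + 10.0 * violation)                     # GA maximises fitness

# Toy usage: pretend the perceived intensities equal the first two stimulus features.
toy_model = lambda g: g[:2]
score = fitness(np.array([0.5, 0.7, 0.2]), np.array([0.6, 0.6]),
                toy_model, (np.zeros(3), np.ones(3)))
```
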
6

Zhou, Wenjun, Tianfei Wang, Xiaoqin Wu, Chenglin Zuo, Yifan Wang, Quan Zhang, and Bo Peng. "Salient Object Detection via Fusion of Multi-Visual Perception." Applied Sciences 14, no. 8 (April 18, 2024): 3433. http://dx.doi.org/10.3390/app14083433.

Abstract:
Salient object detection aims to distinguish the most visually conspicuous regions, playing an important role in computer vision tasks. However, complex natural scenarios can challenge salient object detection, hindering accurate extraction of objects with rich morphological diversity. This paper proposes a novel method for salient object detection leveraging multi-visual perception, mirroring the human visual system’s rapid identification, and focusing on impressive objects/regions within complex scenes. First, a feature map is derived from the original image. Then, salient object detection results are obtained for each perception feature and combined via a feature fusion strategy to produce a saliency map. Finally, superpixel segmentation is employed for precise salient object extraction, removing interference areas. This multi-feature approach for salient object detection harnesses complementary features to adapt to complex scenarios. Competitive experiments on the MSRA10K and ECSSD datasets place our method in the first tier, achieving 0.1302 MAE and 0.9382 F-measure for the MSRA10K dataset and 0.0783 MAE and 0.9635 F-measure for the ECSSD dataset, demonstrating superior salient object detection performance in complex natural scenarios.
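
The fuse-then-evaluate pipeline can be sketched as below; the equal-weight fusion, the adaptive threshold and the conventional beta^2 = 0.3 are common choices in the saliency literature and only stand in for the paper's actual fusion strategy.

```python
import numpy as np

def fuse_saliency(maps, weights=None):
    """Weighted fusion of per-feature saliency maps; `maps` is a list of HxW arrays in [0, 1]."""
    stack = np.stack(maps, axis=0)
    w = np.ones(len(maps)) if weights is None else np.asarray(weights, float)
    return np.tensordot(w / w.sum(), stack, axes=1)

def mae(saliency, gt):
    """Mean absolute error between a saliency map and the binary ground truth."""
    return np.abs(saliency - gt).mean()

def f_measure(saliency, gt, beta2=0.3):
    """F-measure with the usual beta^2 = 0.3 and a simple adaptive threshold."""
    binary = saliency >= 2 * saliency.mean()
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / max(binary.sum(), 1)
    recall = tp / max((gt > 0.5).sum(), 1)
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom > 0 else 0.0
```
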
7

Hirsch, Herb L., and Cathleen M. Moore. "Simulating Light Source Motion in Single Images for Enhanced Perceptual Object Detection." Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 9, no. 3 (February 22, 2012): 269–78. http://dx.doi.org/10.1177/1548512911431814.

Abstract:
The SIPHER technique uses mathematically-uncomplicated processing to impart interesting effects upon a static image. Importantly, it renders certain areas of an image more perceptible than others, and draws a human observer’s attention to particular objects or portions of an image scene. By varying coefficients of the processing in a time-ordered sequence, we can create a multi-frame video wherein the frame-to-frame temporal dynamics further enhance human perception of image objects. In this article we first explain the mathematical formulations and present results from applying SIPHER to simple three-dimensional shapes. Then we explore SIPHER’s utility in enhancing visual perception of targets or objects of military interest, in imagery with some typical backgrounds. We also explore how and why these effects enhance human visual perception of the image objects.
8

Marmodoro, Anna, and Matteo Grasso. "THE POWER OF COLOR." American Philosophical Quarterly 57, no. 1 (January 1, 2020): 65–78. http://dx.doi.org/10.2307/48570646.

Abstract:
Are colors features of objects “out there in the world” or are they features of our inner experience and only “in our head?” Color perception has been the focus of extensive philosophical and scientific debate. In this paper we discuss the limitations of the view that Chalmers (2006) has characterized as Primitivism, and we develop Marmodoro’s (2006) Constitutionalism further, to provide a metaphysical account of color perception in terms of causal powers. The result is Power-based Constitutionalism, the view that colors are (multi-track and multi-stage) powers of objects, whose (full) manifestations depend on the mutual manifestation of relevant powers of perceivers and the perceived objects being co-realized in mutual interaction. After a presentation of the tenets of Power-based Constitutionalism, we evaluate its strengths in contrast to two other recent power-based accounts: John Heil’s (2003, 2012) powerful qualities view and Max Kistler’s (2017) multi-track view.
9

Wang, Li, Ruifeng Li, Jingwen Sun, Xingxing Liu, Lijun Zhao, Hock Soon Seah, Chee Kwang Quah, and Budianto Tandianus. "Multi-View Fusion-Based 3D Object Detection for Robot Indoor Scene Perception." Sensors 19, no. 19 (September 21, 2019): 4092. http://dx.doi.org/10.3390/s19194092.

Abstract:
To autonomously move and operate objects in cluttered indoor environments, a service robot requires the ability of 3D scene perception. Though 3D object detection can provide an object-level environmental description to fill this gap, a robot always encounters incomplete object observation, recurring detections of the same object, errors in detection, or intersection between objects when conducting detection continuously in a cluttered room. To solve these problems, we propose a two-stage 3D object detection algorithm which fuses multiple views of 3D object point clouds in the first stage and eliminates unreasonable and intersecting detections in the second stage. For each view, the robot performs a 2D object semantic segmentation and obtains 3D object point clouds. Then, an unsupervised segmentation method called Locally Convex Connected Patches (LCCP) is utilized to segment the object accurately from the background. Subsequently, Manhattan Frame estimation is implemented to calculate the main orientation of the object, from which the 3D object bounding box can be obtained. To deal with the detected objects in multiple views, we construct an object database and propose an object fusion criterion to maintain it automatically. Thus, the same object observed in multiple views is fused together and a more accurate bounding box can be calculated. Finally, we propose an object filtering approach based on prior knowledge to remove incorrect and intersecting objects from the object database. Experiments are carried out on both the SceneNN dataset and a real indoor environment to verify the stability and accuracy of 3D semantic segmentation and bounding box detection of objects with multi-view fusion.
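
The object database and its fusion criterion can be illustrated with a simple axis-aligned 3D IoU test that merges repeated detections of the same object by running average; the threshold and the averaging rule are assumptions, not the paper's exact criterion.

```python
import numpy as np

def iou_3d(a, b):
    """Axis-aligned 3D IoU of boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    lo, hi = np.maximum(a[:3], b[:3]), np.minimum(a[3:], b[3:])
    inter = np.prod(np.maximum(hi - lo, 0.0))
    vol = lambda box: np.prod(box[3:] - box[:3])
    return inter / (vol(a) + vol(b) - inter + 1e-9)

class ObjectDatabase:
    """Keeps one fused box per physical object across views (illustrative criterion)."""
    def __init__(self, iou_thresh=0.25):
        self.boxes, self.counts, self.iou_thresh = [], [], iou_thresh

    def add_detection(self, box):
        box = np.asarray(box, float)
        for i, kept in enumerate(self.boxes):
            if iou_3d(kept, box) > self.iou_thresh:
                # Same object seen again: a running average tightens the estimate.
                n = self.counts[i]
                self.boxes[i] = (kept * n + box) / (n + 1)
                self.counts[i] += 1
                return
        self.boxes.append(box)     # first observation of a new object
        self.counts.append(1)
```
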
10

Zhu, Jinchao, Xiaoyu Zhang, Shuo Zhang, and Junnan Liu. "Inferring Camouflaged Objects by Texture-Aware Interactive Guidance Network." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 3599–607. http://dx.doi.org/10.1609/aaai.v35i4.16475.

Abstract:
Camouflaged objects, similar to the background, show indefinable boundaries and deceptive textures, which increases the difficulty of the detection task and makes the model rely on features with more information. Herein, we design a texture label to facilitate our network for accurate camouflaged object segmentation. Motivated by the complementary relationship between texture labels and camouflaged object labels, we propose an interactive guidance framework named TINet, which focuses on finding the indefinable boundary and the texture difference by progressive interactive guidance. It maximizes the guidance effect of refined multi-level texture cues on segmentation. Specifically, the texture perception decoder (TPD) makes a comprehensive analysis of texture information at multiple scales. The feature interaction guidance decoder (FGD) interactively refines multi-level features of camouflaged object detection and texture detection level by level. The holistic perception decoder (HPD) enhances FGD results by multi-level holistic perception. In addition, we propose a boundary weight map to help the loss function pay more attention to the object boundary. Extensive experiments conducted on COD and SOD datasets demonstrate that the proposed method performs favorably against 23 state-of-the-art methods.
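
The boundary weight map can be sketched as a per-pixel weight derived from the ground-truth mask by morphological dilation minus erosion; the kernel size, the weight value and the use of binary cross-entropy here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def boundary_weight_map(gt_mask, kernel_size=5, boundary_weight=5.0):
    """Weight map emphasising pixels near the mask boundary; gt_mask is (B, 1, H, W) binary."""
    pad = kernel_size // 2
    dilated = F.max_pool2d(gt_mask, kernel_size, stride=1, padding=pad)
    eroded = -F.max_pool2d(-gt_mask, kernel_size, stride=1, padding=pad)
    boundary = (dilated - eroded).clamp(0, 1)      # 1 on a band around the object edge
    return 1.0 + boundary_weight * boundary

def boundary_weighted_loss(logits, gt_mask):
    """Per-pixel BCE, up-weighted on the boundary band (illustrative stand-in)."""
    return F.binary_cross_entropy_with_logits(logits, gt_mask,
                                              weight=boundary_weight_map(gt_mask))
```
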
11

Shao, Zhiyu, Juan Wu, Qiangqiang Ouyang, Cong He, and Zhiyong Cao. "Multi-Layered Perceptual Model for Haptic Perception of Compliance." Electronics 8, no. 12 (December 7, 2019): 1497. http://dx.doi.org/10.3390/electronics8121497.

Abstract:
Haptic rendering of compliance is widely used in human–computer haptic interaction. Haptic impressions of virtual objects are usually controlled through rendering algorithms and devices. However, subjective feelings of compliance are easily affected by physical properties of objects, interactive modes, and so on. So it is important to ascertain the mapping relations between controlled physical parameters and subjective perceptual feelings. In this paper, a multi-layered perceptual model was built based on psychophysical experiments to discuss these relationships in a simplified scene. Interactive signals of physical stimuli are collected by the physical receptor layer, handled by the subjective classifier layer and finally generate the evaluation results of compliance. The physical perceptual layer is used to extract useful interaction features affecting perceptual results. The subjective classifier layer is used to analyze the perceptual dimensionality of the compliance perception. The final aim of the model is to determine the mapping relationships between interaction features and dimensions of perception space. Interactive features are extracted from the interaction data collected during the exploring process, perceptual dimensionality of the compliance perception was analyzed by the factor analysis method, and relations between hierarchical layers were obtained by multi-linear regression analysis. A verification test was performed to show whether the proposed model can predict the perceptual result of new samples well. The results indicate that the model was reliable to estimate the perceptual results of compliance with an accuracy of approximately 90%. This paper may contribute a lot to the design and improvement of human-computer interaction and intelligent sensing system.
12

Yang, Mingliang, Kun Jiang, Junze Wen, Liang Peng, Yanding Yang, Hong Wang, Mengmeng Yang, Xinyu Jiao, and Diange Yang. "Real-Time Evaluation of Perception Uncertainty and Validity Verification of Autonomous Driving." Sensors 23, no. 5 (March 6, 2023): 2867. http://dx.doi.org/10.3390/s23052867.

Abstract:
Deep neural network algorithms have achieved impressive performance in object detection. Real-time evaluation of perception uncertainty from deep neural network algorithms is indispensable for safe driving in autonomous vehicles. More research is required to determine how to assess the effectiveness and uncertainty of perception findings in real time. This paper proposes a novel real-time evaluation method combining multi-source perception fusion and deep ensembles. The effectiveness of single-frame perception results is evaluated in real time. Then, the spatial uncertainty of the detected objects and its influencing factors are analyzed. Finally, the accuracy of the spatial uncertainty is validated against the ground truth in the KITTI dataset. The results show that the evaluation of perception effectiveness can reach 92% accuracy, and a positive correlation with the ground truth is found for both the uncertainty and the error. The spatial uncertainty is related to the distance and occlusion degree of the detected objects.
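
A minimal sketch of how a deep ensemble yields spatial uncertainty for one matched object: the spread of the members' position estimates is summarised as the uncertainty. The covariance-trace summary is an assumed, illustrative metric, and the preceding multi-source fusion step is omitted.

```python
import numpy as np

def ensemble_spatial_uncertainty(member_centres):
    """Mean position and a scalar spatial uncertainty for one detected object.

    member_centres: (M, 3) array with the object's (x, y, z) centre as predicted
    by each of the M ensemble members."""
    centres = np.asarray(member_centres, float)
    mean = centres.mean(axis=0)
    cov = np.atleast_2d(np.cov(centres, rowvar=False))   # spread across members
    return mean, float(np.sqrt(np.trace(cov)))

# Example: three ensemble members disagree slightly on a pedestrian's position.
mean, sigma = ensemble_spatial_uncertainty([[10.1, 2.0, 0.0],
                                            [10.4, 2.1, 0.0],
                                            [ 9.9, 1.9, 0.1]])
```
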
13

Wu, Shuiye, Yunbing Yan, and Weiqiang Wang. "CF-YOLOX: An Autonomous Driving Detection Model for Multi-Scale Object Detection." Sensors 23, no. 8 (April 7, 2023): 3794. http://dx.doi.org/10.3390/s23083794.

Abstract:
In self-driving cars, object detection algorithms are becoming increasingly important, and the accurate and fast recognition of objects is critical to realize autonomous driving. The existing detection algorithms are not ideal for the detection of small objects. This paper proposes a YOLOX-based network model for multi-scale object detection tasks in complex scenes. This method adds a CBAM-G module to the backbone of the original network, which performs grouping operations on CBAM. It changes the height and width of the convolution kernel of the spatial attention module to 7 × 1 to improve the ability of the model to extract prominent features. We proposed an object-contextual feature fusion module, which can provide more semantic information and improve the perception of multi-scale objects. Finally, we considered the problem of fewer samples and less loss of small objects and introduced a scaling factor that could increase the loss of small objects to improve the detection ability of small objects. We validated the effectiveness of the proposed method on the KITTI dataset, and the mAP value was 2.46% higher than the original model. Experimental comparisons showed that our model achieved superior detection performance compared to other models.
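
The small-object scaling factor can be illustrated as a loss weight that grows as an object's relative area shrinks; the power-law form, the exponent and the cap below are assumptions rather than the paper's exact formula.

```python
def small_object_scale(box_area, image_area, gamma=0.5, max_scale=3.0):
    """Loss weight that increases as the normalised object area decreases (illustrative)."""
    rel = max(box_area / image_area, 1e-6)      # relative object size
    return min(max_scale, rel ** (-gamma))      # smaller object -> larger weight, capped

# A 32x32 box in a 640x640 image is weighted more heavily than a 256x256 box:
w_small = small_object_scale(32 * 32, 640 * 640)     # capped at 3.0
w_large = small_object_scale(256 * 256, 640 * 640)   # 2.5
```
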
14

Yu, Hai-Tao, and Mofei Song. "MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (March 24, 2024): 6773–81. http://dx.doi.org/10.1609/aaai.v38i7.28501.

Abstract:
In perception, multiple sensory information is integrated to map visual information from 2D views onto 3D objects, which is beneficial for understanding in 3D environments. But in terms of a single 2D view rendered from different angles, only limited partial information can be provided. The richness and value of Multi-view 2D information can provide superior self-supervised signals for 3D objects. In this paper, we propose a novel self-supervised point cloud representation learning method, MM-Point, which is driven by intra-modal and inter-modal similarity objectives. The core of MM-Point lies in the Multi-modal interaction and transmission between 3D objects and multiple 2D views at the same time. In order to more effectively simultaneously perform the consistent cross-modal objective of 2D multi-view information based on contrastive learning, we further propose Multi-MLP and Multi-level Augmentation strategies. Through carefully designed transformation strategies, we further learn Multi-level invariance in 2D Multi-views. MM-Point demonstrates state-of-the-art (SOTA) performance in various downstream tasks. For instance, it achieves a peak accuracy of 92.4% on the synthetic dataset ModelNet40, and a top accuracy of 87.8% on the real-world dataset ScanObjectNN, comparable to fully supervised methods. Additionally, we demonstrate its effectiveness in tasks such as few-shot classification, 3D part segmentation and 3D semantic segmentation.
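
The cross-modal consistency objective between a point cloud and its rendered views can be sketched as an InfoNCE-style contrastive loss; the temperature and the way positives are indexed are illustrative assumptions, not MM-Point's exact objective.

```python
import torch
import torch.nn.functional as F

def cross_modal_info_nce(point_emb, view_embs, temperature=0.07):
    """Align each 3D point-cloud embedding with the 2D views rendered from the same object.

    point_emb : (B, D) embeddings of B point clouds
    view_embs : (B, V, D) embeddings of V rendered views per object"""
    B, V, D = view_embs.shape
    p = F.normalize(point_emb, dim=-1)                    # (B, D)
    v = F.normalize(view_embs.reshape(B * V, D), dim=-1)  # (B*V, D)
    logits = p @ v.t() / temperature                      # similarity of every pair
    loss = 0.0
    for j in range(V):
        # Views i*V + j are the positives for point cloud i.
        targets = torch.arange(B, device=p.device) * V + j
        loss = loss + F.cross_entropy(logits, targets)
    return loss / V
```
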
15

Thomason, Jesse, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, and Raymond Mooney. "Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog." Journal of Artificial Intelligence Research 67 (February 26, 2020): 327–74. http://dx.doi.org/10.1613/jair.1.11485.

Abstract:
In this work, we present methods for using human-robot dialog to improve language understanding for a mobile robot agent. The agent parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red and heavy. The agent can be used for showing navigation routes, delivering objects to people, and relocating objects from one location to another. We use dialog clarification questions both to understand commands and to generate additional parsing training data. The agent employs opportunistic active learning to select questions about how words relate to objects, improving its understanding of perceptual concepts. We evaluated this agent on Amazon Mechanical Turk. After training on data induced from conversations, the agent reduced the number of dialog questions it asked while receiving higher usability ratings. Additionally, we demonstrated the agent on a robotic platform, where it learned new perceptual concepts on the fly while completing a real-world task.
16

Giachritsis, Christos D., Manuel Ferre, Jordi Barrio, and Alan M. Wing. "Unimanual and bimanual weight perception of virtual objects with a new multi-finger haptic interface." Brain Research Bulletin 85, no. 5 (June 2011): 271–75. http://dx.doi.org/10.1016/j.brainresbull.2011.03.017.

17

Gu, Haiwei, Shaowei Fan, Hua Zong, Minghe Jin, and Hong Liu. "Haptic Perception of Unknown Object by Robot Hand: Exploration Strategy and Recognition Approach." International Journal of Humanoid Robotics 13, no. 03 (August 23, 2016): 1650008. http://dx.doi.org/10.1142/s0219843616500080.

Abstract:
In this paper, exploration and recognition for unknown object perception by a robot hand are discussed. Inspired by the touch and exploration behaviour of the human hand, a haptic exploration strategy for a multi-fingered robot hand is proposed. Based on observations from human experiments, the proposed strategy can be used to guide the robot hand to plan a series of movements to obtain tactile information from different unknown objects, with the precondition of avoiding unexpected collisions with the objects. A recognition approach is then presented to recognize object shapes based on the tactile point data collected by the strategy. Geometric feature vectors are extracted from tactile point locations and normal vectors after clustering, and the object shapes are recognized by a random forests classifier. Simulation and experiment results show that the exploration strategy can be used to guide the robot to gather tactile information from unknown objects automatically, and that the recognition approach is effective and robust in object shape recognition. This framework provides a sensible solution to the robot unknown-object perception problem and is suitable for multi-fingered robot hands with low-resolution tactile sensors.
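
A toy version of the recognition stage, assuming tactile contact points and surface normals have already been gathered by the exploration strategy: build a small geometric descriptor and classify it with scikit-learn's random forest. The descriptor and the random training data are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def geometric_features(points, normals):
    """Tiny geometric descriptor from tactile contact points and normals (illustrative)."""
    extent = points.max(axis=0) - points.min(axis=0)   # bounding-box size
    mean_normal = normals.mean(axis=0)
    normal_spread = normals.std(axis=0)                # rough curvature proxy
    return np.concatenate([extent, mean_normal, normal_spread])

# Train a random forest on descriptors of previously explored objects (random toy data).
X_train = np.random.rand(40, 9)         # 40 explorations, 9-D descriptors
y_train = np.random.randint(0, 3, 40)   # 3 shape classes, e.g. sphere / box / cylinder
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Classify a new exploration from its contact points and normals.
pts, nrm = np.random.rand(50, 3), np.random.rand(50, 3)
shape_class = clf.predict(geometric_features(pts, nrm).reshape(1, -1))
```
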
18

Dong, Xiaoyu, Zhihong Xi, Xu Sun, and Lianru Gao. "Transferred Multi-Perception Attention Networks for Remote Sensing Image Super-Resolution." Remote Sensing 11, no. 23 (December 1, 2019): 2857. http://dx.doi.org/10.3390/rs11232857.

Abstract:
Image super-resolution (SR) reconstruction plays a key role in coping with the increasing demand on remote sensing imaging applications with high spatial resolution requirements. Though many SR methods have been proposed over the last few years, further research is needed to improve SR processes with regard to the complex spatial distribution of the remote sensing images and the diverse spatial scales of ground objects. In this paper, a novel multi-perception attention network (MPSR) is developed with performance exceeding those of many existing state-of-the-art models. By incorporating the proposed enhanced residual block (ERB) and residual channel attention group (RCAG), MPSR can super-resolve low-resolution remote sensing images via multi-perception learning and multi-level information adaptive weighted fusion. Moreover, a pre-train and transfer learning strategy is introduced, which improved the SR performance and stabilized the training procedure. Experimental comparisons are conducted using 13 state-of-the-art methods over a remote sensing dataset and benchmark natural image sets. The proposed model proved its excellence in both objective criterion and subjective perspective.
19

Chen, Jun, Zhengyang Yu, and Cunjian Yang. "Key point extraction method for spatial objects in high-resolution remote sensing images based on multi-hot cross-entropy loss." Open Geosciences 14, no. 1 (January 1, 2022): 1409–20. http://dx.doi.org/10.1515/geo-2022-0393.

Abstract:
Extracting spatial objects and their key points from remote sensing images has attracted great attention from researchers worldwide working on intelligent machine perception of the Earth’s surface. However, the key points of spatial objects (KPSOs) extracted by the conventional mask region-convolution neural network model are difficult to sort reasonably, which is a key obstacle to enhancing the ability of intelligent machine perception of spatial objects. Widely distributed artificial structures with stable morphological and spectral characteristics, such as sports fields, cross-river bridges, and urban intersections, are selected to study how to extract their key points with a multi-hot cross-entropy loss function. First, the location point in KPSOs is selected as one category individually, to distinguish it from the morphological feature points. Then, the two categories of key points are arranged in order while maintaining internal disorder, and the mapping relationship between KPSOs and the prediction heat map is improved to one category rather than a single key point. Therefore, the predicted heat map of each category can predict all the corresponding key points at one time. The experimental results demonstrate that the prediction accuracy of KPSOs extracted by the new method is 80.6%, taking part of Huai’an City as an example. It is reasonable to believe that this method will greatly promote the development of intelligent machine perception of the Earth’s surface.
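
The multi-hot idea can be sketched as a target heat map carrying a 1 at every key point of a category, so one predicted channel supervises all key points of that category at once. Per-pixel binary cross-entropy is used below as a stand-in for the paper's multi-hot cross-entropy loss; the exact loss form is an assumption.

```python
import torch
import torch.nn.functional as F

def multi_hot_heatmap(keypoints, height, width):
    """Target map with a 1 at every key point of one category (multi-hot)."""
    target = torch.zeros(height, width)
    for x, y in keypoints:                     # pixel coordinates
        target[int(y), int(x)] = 1.0
    return target

def multi_hot_loss(pred_logits, keypoints_per_category):
    """Illustrative multi-hot loss: per-pixel BCE against the multi-hot targets.

    pred_logits: (C, H, W) raw heat-map logits, one channel per key-point category."""
    _, H, W = pred_logits.shape
    targets = torch.stack([multi_hot_heatmap(kps, H, W)
                           for kps in keypoints_per_category])
    return F.binary_cross_entropy_with_logits(pred_logits, targets)
```
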
20

Zagorskas, Jurgis, and Zenonas Turskis. "MULTI‐ATTRIBUTE MODEL FOR ESTIMATION OF RETAIL CENTRES INFLUENCE ON THE CITY STRUCTURE. KAUNAS CITY CASE STUDY." Technological and Economic Development of Economy 12, no. 4 (December 31, 2006): 347–52. http://dx.doi.org/10.3846/13928619.2006.9637765.

Abstract:
This paper applies a multi‐attribute decision support methodology to analyze the impact of retail centres on the city as a complex system. Influence on the city is described as the sum of the effects the retail centres have on the quality of life of the neighbourhood and the wider city population, on the operation of the city transportation system, and on the architectural and urban perception of the city. The overall impact is estimated and expressed numerically, and different alternative objects are compared using this expression. The task is described by many attributes. The main attributes are chosen to measure the influence of retail centres on quality of life, on the transportation system, on the economy, and on the architectural and urban perception of the city. The weights of the attributes are estimated on the basis of expert judgment. The efficiency attributes are normalized using a linear normalization method. The values of the different attributes are derived from ratings given by urban-planning experts. An ideal alternative is constructed from the ideal values, and influence is estimated using the Multiplicative Summarized Optimal Criterion method. The strategy for retail centre development is defined by comparing existing objects to this ideal value. The multi‐attribute model for estimating the influence of retail centres on the city was used in the “Kaunas city municipality specialised plan for dislocation of retail centres”. The results determined the effectiveness of existing objects and a future development strategy.
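
The normalization, weighting and ideal-alternative comparison described above read as follows in a minimal numerical sketch; the attribute values, weights and benefit/cost split are invented and do not come from the Kaunas case study.

```python
import numpy as np

def score_alternatives(attribute_matrix, weights, benefit_mask):
    """Weighted multi-attribute scores with linear normalisation (illustrative).

    attribute_matrix : (n_alternatives, n_attributes) raw attribute values
    weights          : expert-judged attribute weights
    benefit_mask     : True where larger is better, False where smaller is better"""
    X = np.asarray(attribute_matrix, float)
    w = np.asarray(weights, float)
    norm = np.empty_like(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        rng = (col.max() - col.min()) or 1.0
        norm[:, j] = (col - col.min()) / rng if benefit_mask[j] else (col.max() - col) / rng
    scores = norm @ w
    ideal = norm.max(axis=0) @ w          # score of the hypothetical ideal alternative
    return scores, scores / ideal         # absolute scores and ratio to the ideal

# Three retail centres rated on quality of life, transport load and urban perception:
scores, ratio_to_ideal = score_alternatives(
    [[7, 120, 6], [9, 200, 8], [6, 90, 7]],
    weights=[0.4, 0.3, 0.3],
    benefit_mask=[True, False, True])
```
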
21

McMullin, Margaret A., Nathan C. Higgins, Brian Gygi, Rohit Kumar, Mounya Elhilali, and Joel S. Snyder. "Perception of global properties, objects, and settings in natural auditory scenes." Journal of the Acoustical Society of America 153, no. 3_supplement (March 1, 2023): A329. http://dx.doi.org/10.1121/10.0019028.

Abstract:
Theories of auditory scene analysis suggest our perception of scenes relies on identifying and segregating objects within them. However, a more global process may occur while analyzing scenes, which has been evidenced in the visual domain. In our first experiment, we studied perception of eight global properties (e.g., openness), using a collection of 200 high-quality auditory scenes. Participants showed high agreement on their ratings of global properties. The global properties were explained by a two-factor model. Acoustic features of scenes were explained by a seven-factor model, and linearly predicted the global ratings by different amounts (R-squared = 0.33–0.87), although we also observed nonlinear relationships between acoustical and global variables. A multi-layer neural network trained to recognize auditory objects in everyday soundscapes from YouTube shows that high-level embeddings of our 200 scenes are correlated with some global variables at earlier stages of processing than others. In a second experiment, we evaluated participants’ accuracy in identifying the setting of and objects within scenes across three durations (1, 2, and 4 s). Overall, participants performed better on the object identification task, but needed longer-duration stimuli to do so. These results suggest object identification may require more processing time and/or attention switching than setting identification.
22

Sun, Teng, Zhe Zhang, Zhonghua Miao, and Wen Zhang. "A Recognition Method for Soft Objects Based on the Fusion of Vision and Haptics." Biomimetics 8, no. 1 (February 20, 2023): 86. http://dx.doi.org/10.3390/biomimetics8010086.

Abstract:
For humans and animals to recognise an object, the integration of multiple sensing methods is essential when one sensing modality is only able to acquire limited information. Among the many sensing modalities, vision has been intensively studied and proven to have superior performance for many problems. Nevertheless, there are many problems which are difficult to solve by solitary vision, such as in a dark environment or for objects with a similar outlook but different inclusions. Haptic sensing is another commonly used means of perception, which can provide local contact information and physical features that are difficult to obtain by vision. Therefore, the fusion of vision and touch is beneficial to improve the robustness of object perception. To address this, an end-to-end visual–haptic fusion perceptual method has been proposed. In particular, the YOLO deep network is used to extract vision features, while haptic explorations are used to extract haptic features. Then, visual and haptic features are aggregated using a graph convolutional network, and the object is recognised based on a multi-layer perceptron. Experimental results show that the proposed method excels in distinguishing soft objects that have similar appearance but varied interior fillers, comparing a simple convolutional network and a Bayesian filter. The resultant average recognition accuracy was improved to 0.95 from vision only (mAP is 0.502). Moreover, the extracted physical features could be further used for manipulation tasks targeting soft objects.
23

Wang, Jiayu, Ye Liu, Yongjian Zhu, Dong Wang, and Yu Zhang. "3D Point Cloud Object Detection Method Based on Multi-Scale Dynamic Sparse Voxelization." Sensors 24, no. 6 (March 11, 2024): 1804. http://dx.doi.org/10.3390/s24061804.

Abstract:
Perception plays a crucial role in ensuring the safety and reliability of autonomous driving systems. However, the recognition and localization of small objects in complex scenarios still pose challenges. In this paper, we propose a point cloud object detection method based on dynamic sparse voxelization to enhance the detection performance of small objects. This method employs a specialized point cloud encoding network to learn and generate pseudo-images from point cloud features. The feature extraction part uses sliding windows and transformer-based methods. Furthermore, multi-scale feature fusion is performed to enhance the granularity of small object information. In this experiment, the term “small object” refers to objects such as cyclists and pedestrians, which have fewer pixels compared to vehicles with more pixels, as well as objects of poorer quality in terms of detection. The experimental results demonstrate that, compared to the PointPillars algorithm and other related algorithms on the KITTI public dataset, the proposed algorithm exhibits improved detection accuracy for cyclist and pedestrian target objects. In particular, there is notable improvement in the detection accuracy of objects in the moderate and hard quality categories, with an overall average increase in accuracy of about 5%.
24

Senel, Numan, Gordon Elger, Klaus Kefferpütz, and Kristina Doycheva. "Multi-Sensor Data Fusion for Real-Time Multi-Object Tracking." Processes 11, no. 2 (February 7, 2023): 501. http://dx.doi.org/10.3390/pr11020501.

Abstract:
Sensor data fusion is essential for environmental perception within smart traffic applications. By using multiple sensors cooperatively, the accuracy and probability of the perception are increased, which is crucial for critical traffic scenarios or under bad weather conditions. In this paper, a modular real-time capable multi-sensor fusion framework is presented and tested to fuse data on the object list level from distributed automotive sensors (cameras, radar, and LiDAR). The modular multi-sensor fusion architecture receives an object list (untracked objects) from each sensor. The fusion framework combines classical data fusion algorithms, as it contains a coordinate transformation module, an object association module (Hungarian algorithm), an object tracking module (unscented Kalman filter), and a movement compensation module. Due to the modular design, the fusion framework is adaptable and does not rely on the number of sensors or their types. Moreover, the method continues to operate because of this adaptable design in case of an individual sensor failure. This is an essential feature for safety-critical applications. The architecture targets environmental perception in challenging time-critical applications. The developed fusion framework is tested using simulation and public domain experimental data. Using the developed framework, sensor fusion is obtained well below 10 milliseconds of computing time using an AMD Ryzen 7 5800H mobile processor and the Python programming language. Furthermore, the object-level multi-sensor approach enables the detection of changes in the extrinsic calibration of the sensors and potential sensor failures. A concept was developed to use the multi-sensor framework to identify sensor malfunctions. This feature will become extremely important in ensuring the functional safety of the sensors for autonomous driving.
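
The association step of such a framework can be sketched with SciPy's Hungarian solver over a Euclidean cost matrix; the gating distance and the cost definition are assumptions, and the subsequent track update (an unscented Kalman filter in the paper) is omitted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_positions, detection_positions, gate=3.0):
    """Match untracked detections to existing tracks with the Hungarian algorithm.

    Returns matched (track_idx, det_idx) pairs plus unmatched track and detection indices."""
    tracks = np.asarray(track_positions, float)
    dets = np.asarray(detection_positions, float)
    if len(tracks) == 0 or len(dets) == 0:
        return [], list(range(len(tracks))), list(range(len(dets)))
    cost = np.linalg.norm(tracks[:, None, :] - dets[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]  # distance gating
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_dets = [j for j in range(len(dets)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```
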
25

Fan, Chongshan, Hongpeng Wang, Zhongzhi Cao, Xinwei Chen, and Li Xu. "Path Planning of Autonomous 3-D Scanning and Reconstruction for Robotic Multi-Model Perception System." Machines 11, no. 1 (December 26, 2022): 26. http://dx.doi.org/10.3390/machines11010026.

Abstract:
Applying a three-dimensional (3-D) reconstruction from mapping-oriented offline modeling to intelligent agent-oriented environment understanding and real-world environment construction oriented to agent autonomous behavior has important research and application value. Using a scanner to scan objects is a common way to obtain a 3-D model. However, the existing scanning methods rely heavily on manual work, fail to meet efficiency requirements, and are not sufficiently compatible with scanning objects of different sizes. In this article, we propose a creative visual coverage path planning approach for the robotic multi-model perception system (RMMP) in a 3-D environment under photogrammetric constraints. To realize the 3-D scanning of real scenes automatically, we designed a new robotic multi-model perception system. To reduce the influence of image distortion and resolution loss in 3-D reconstruction, we set scanner-to-scene projective geometric constraints. To optimize the scanning efficiency, we proposed a novel path planning method under photogrammetric and kinematics constraints. Under the RMMP system, a constraints-satisfied coverage path could be generated, and the 3-D reconstruction from the images collected along the way was carried out. In this way, the autonomous planning of the pose of the end scanner in scanning tasks was effectively solved. Experimental results show that the RMMP-based 3-D visual coverage method can improve the efficiency and quality in 3-D reconstruction.
26

Mirkovic, Bojana, and Dejan Popovic. "Prosthetic hand sensor placement: Analysis of touch perception during the grasp." Serbian Journal of Electrical Engineering 11, no. 1 (2014): 1–10. http://dx.doi.org/10.2298/sjee131004001m.

Abstract:
Humans rely on their hands to perform everyday tasks. The hand is used as a tool, but also as the interface to “sense” the world. Current prosthetic hands are based on sophisticated multi-fingered structures, and include many sensors which counterpart natural proprioceptors and exteroceptors. The sensory information is used for control, but not sent to the user of the hand (amputee). Grasping without sensing is not good enough. This research is part of the development of the sensing interface for amputees, specifically addressing the analysis of human perception while grasping. The goal is to determine a small number of preferred positions for sensors on the prosthetic hand. This task has previously been approached by trying to replicate the natural sensory system characteristic of healthy humans, resulting in a multitude of redundant sensors and a basic inability to make the patient aware of the sensor readings on a subconscious level. We based our artificial perception system on the reported sensations of humans when grasping various objects without seeing the objects (obstructed visual feedback). Subjects, with no known sensory deficits, were asked to report on the touch sensation while grasping. The analysis included objects of various sizes, weights, textures and temperatures. Based on these data we formed a map of the preferred positions for the sensors that is appropriate for a five-finger human-like robotic hand. The final map was intentionally minimized in size (number of sensors).
27

Liu, Zihang, and Quande Wang. "Edge-Enhanced Dual-Stream Perception Network for Monocular Depth Estimation." Electronics 13, no. 9 (April 25, 2024): 1652. http://dx.doi.org/10.3390/electronics13091652.

Abstract:
Estimating depth from a single RGB image has a wide range of applications, such as in robot navigation and autonomous driving. Currently, Convolutional Neural Networks based on encoder–decoder architecture are the most popular methods to estimate depth maps. However, convolutional operators have limitations in modeling large-scale dependence, often leading to inaccurate depth predictions at object edges. To address these issues, a new edge-enhanced dual-stream monocular depth estimation method is introduced in this paper. ResNet and Swin Transformer are combined to better extract global and local features, which benefits the estimation of the depth map. To better integrate the information from the two branches of the encoder and the shallow branch of the decoder, we designed a lightweight decoder based on the multi-head Cross-Attention Module. Furthermore, in order to improve the boundary clarity of objects in the depth map, a loss function with an additional penalty for depth estimation error on the edges of objects is presented. The results on three datasets, NYU Depth V2, KITTI, and SUN RGB-D, show that the method presented in this paper achieves better performance for monocular depth estimation. Additionally, it has good generalization capabilities for various scenarios and real-world images.
28

Ye, Zixun, Hongying Zhang, Jingliang Gu, and Xue Li. "YOLOv7-3D: A Monocular 3D Traffic Object Detection Method from a Roadside Perspective." Applied Sciences 13, no. 20 (October 17, 2023): 11402. http://dx.doi.org/10.3390/app132011402.

Abstract:
Current autonomous driving systems predominantly focus on 3D object perception from the vehicle’s perspective. However, the single-camera 3D object detection algorithm in the roadside monitoring scenario provides stereo perception of traffic objects, offering more accurate collection and analysis of traffic information to ensure reliable support for urban traffic safety. In this paper, we propose the YOLOv7-3D algorithm specifically designed for single-camera 3D object detection from a roadside viewpoint. Our approach utilizes various information, including 2D bounding boxes, projected corner keypoints, and offset vectors relative to the center of the 2D bounding boxes, to enhance the accuracy of 3D object bounding box detection. Additionally, we introduce a 5-layer feature pyramid network (FPN) structure and a multi-scale spatial attention mechanism to improve feature saliency for objects of different scales, thereby enhancing the detection accuracy of the network. Experimental results demonstrate that our YOLOv7-3D network achieved significantly higher detection accuracy on the Rope3D dataset while reducing computational complexity by 60%.
29

Li, Jinghui, Feng Shao, Qiang Liu, and Xiangchao Meng. "Global-Local Collaborative Learning Network for Optical Remote Sensing Image Change Detection." Remote Sensing 16, no. 13 (June 27, 2024): 2341. http://dx.doi.org/10.3390/rs16132341.

Abstract:
Due to the widespread applications of change detection technology in urban change analysis, environmental monitoring, agricultural surveillance, disaster detection, and other domains, the task of change detection has become one of the primary applications of Earth orbit satellite remote sensing data. However, the analysis of dual-temporal change detection (CD) remains a challenge in high-resolution optical remote sensing images due to the complexities in remote sensing images, such as intricate textures, seasonal variations in imaging time, climatic differences, and significant differences in the sizes of various objects. In this paper, we propose a novel U-shaped architecture for change detection. In the encoding stage, a multi-branch feature extraction module is employed by combining CNN and transformer networks to enhance the network’s perception capability for objects of varying sizes. Furthermore, a multi-branch aggregation module is utilized to aggregate features from different branches, providing the network with global attention while preserving detailed information. For dual-temporal features, we introduce a spatiotemporal discrepancy perception module to model the context of dual-temporal images. Particularly noteworthy is the construction of channel attention and token attention modules based on the transformer attention mechanism to facilitate information interaction between multi-level features, thereby enhancing the network’s contextual awareness. The effectiveness of the proposed network is validated on three public datasets, demonstrating its superior performance over other state-of-the-art methods through qualitative and quantitative experiments.
30

Guo, Jinghua, Jingyao Wang, Huinian Wang, Baoping Xiao, Zhifei He, and Lubin Li. "Research on Road Scene Understanding of Autonomous Vehicles Based on Multi-Task Learning." Sensors 23, no. 13 (July 7, 2023): 6238. http://dx.doi.org/10.3390/s23136238.

Abstract:
Road scene understanding is crucial to the safe driving of autonomous vehicles. Comprehensive road scene understanding requires a visual perception system to deal with a large number of tasks at the same time, which needs a perception model with a small size, fast speed, and high accuracy. As multi-task learning has evident advantages in performance and computational resources, in this paper, a multi-task model YOLO-Object, Drivable Area, and Lane Line Detection (YOLO-ODL) based on hard parameter sharing is proposed to realize joint and efficient detection of traffic objects, drivable areas, and lane lines. In order to balance tasks of YOLO-ODL, a weight balancing strategy is introduced so that the weight parameters of the model can be automatically adjusted during training, and a Mosaic migration optimization scheme is adopted to improve the evaluation indicators of the model. Our YOLO-ODL model performs well on the challenging BDD100K dataset, achieving the state of the art in terms of accuracy and computational efficiency.
31

Rövid, András, Viktor Remeli, Norbert Paufler, Henrietta Lengyel, Máté Zöldy, and Zsolt Szalay. "Towards Reliable Multisensory Perception and Its Automotive Applications." Periodica Polytechnica Transportation Engineering 48, no. 4 (July 7, 2020): 334–40. http://dx.doi.org/10.3311/pptr.15921.

Abstract:
Autonomous driving poses numerous challenging problems, one of which is perceiving and understanding the environment. Since self-driving is safety critical and many actions taken during driving rely on the outcome of various perception algorithms (for instance all traffic participants and infrastructural objects in the vehicle's surroundings must reliably be recognized and localized), thus the perception might be considered as one of the most critical subsystems in an autonomous vehicle. Although the perception itself might further be decomposed into various sub-problems, such as object detection, lane detection, traffic sign detection, environment modeling, etc. In this paper the focus is on fusion models in general (giving support for multisensory data processing) and some related automotive applications such as object detection, traffic sign recognition, end-to-end driving models and an example of taking decisions in multi-criterial traffic situations that are complex for both human drivers and for the self-driving vehicles as well.
32

Li, Peng, Dezheng Zhang, Aziguli Wulamu, Xin Liu, and Peng Chen. "Semantic Relation Model and Dataset for Remote Sensing Scene Understanding." ISPRS International Journal of Geo-Information 10, no. 7 (July 17, 2021): 488. http://dx.doi.org/10.3390/ijgi10070488.

Abstract:
A deep understanding of our visual world is more than an isolated perception of a series of objects; the relationships between them also contain rich semantic information. Especially in satellite remote sensing images, the span is so large that the various objects are always of different sizes and complex spatial compositions. Therefore, the recognition of semantic relations is conducive to strengthening the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale semantic fusion network (MSFN). In this framework, dilated convolution is introduced into a graph convolutional network (GCN) based on an attentional mechanism to fuse and refine multi-scale semantic context, which is crucial for strengthening the cognitive ability of our model. Besides, based on the mapping between visual features and semantic embeddings, we design a sparse relationship extraction module to remove meaningless connections among entities and improve the efficiency of scene graph generation. Meanwhile, to further promote research on scene understanding in the remote sensing field, this paper also proposes a remote sensing scene graph dataset (RSSGD). We carry out extensive experiments and the results show that our model significantly outperforms previous methods on scene graph generation. In addition, RSSGD effectively bridges the huge semantic gap between low-level perception and high-level cognition of remote sensing images.
33

Meng, Qiao, Huansheng Song, Gang Li, Yu’an Zhang, and Xiangqing Zhang. "A Block Object Detection Method Based on Feature Fusion Networks for Autonomous Vehicles." Complexity 2019 (February 6, 2019): 1–14. http://dx.doi.org/10.1155/2019/4042624.

Abstract:
Nowadays, automatic multi-objective detection remains a challenging problem for autonomous vehicle technologies. In the past decades, deep learning has been demonstrated to be successful for multi-objective detection, such as the Single Shot Multibox Detector (SSD) model. The current trend is to train deep Convolutional Neural Networks (CNNs) with online autonomous vehicle datasets. However, network performance usually degrades when small objects are detected. Moreover, the existing autonomous vehicle datasets cannot meet the needs of the domestic traffic environment. To improve the detection performance of small objects and ensure the validity of the dataset, we propose a new method. Specifically, the original images are divided into blocks that are used as input to a VGG-16 network which adds feature-map fusion after the CNNs. Moreover, an image pyramid is built to project all the block detection results back to the original object scale as closely as possible. In addition to improving the detection method, a new autonomous driving vehicle dataset is created, in which the object categories and labelling criteria are defined, and a data augmentation method is proposed. The experimental results on the new datasets show that the performance of the proposed method is greatly improved, especially for small-object detection in large images. Moreover, the proposed method is adaptive to complex climatic conditions and contributes substantially to autonomous vehicle perception and planning.
34

Alabdulkreem, Eatedal, Jaber Alzahrani, Nadhem Nemri, Olayan Alharbi, Abdullah Mohamed, Radwa Marzouk, and Anwer Hilal. "Computational Intelligence with Wild Horse Optimization Based Object Recognition and Classification Model for Autonomous Driving Systems." Applied Sciences 12, no. 12 (June 20, 2022): 6249. http://dx.doi.org/10.3390/app12126249.

Abstract:
Presently, autonomous systems have gained considerable attention in several fields such as transportation, healthcare, autonomous driving, logistics, etc. It is highly needed to ensure the safe operations of the autonomous system before launching it to the general public. Since the design of a completely autonomous system is a challenging process, perception and decision-making act as vital parts. The effective detection of objects on the road under varying scenarios can considerably enhance the safety of autonomous driving. The recently developed computational intelligence (CI) and deep learning models help to effectively design the object detection algorithms for environment perception depending upon the camera system that exists in the autonomous driving systems. With this motivation, this study designed a novel computational intelligence with a wild horse optimization-based object recognition and classification (CIWHO-ORC) model for autonomous driving systems. The proposed CIWHO-ORC technique intends to effectively identify the presence of multiple static and dynamic objects such as vehicles, pedestrians, signboards, etc. Additionally, the CIWHO-ORC technique involves the design of a krill herd (KH) algorithm with a multi-scale Faster RCNN model for the detection of objects. In addition, a wild horse optimizer (WHO) with an online sequential ridge regression (OSRR) model was applied for the classification of recognized objects. The experimental analysis of the CIWHO-ORC technique is validated using benchmark datasets, and the obtained results demonstrate the promising outcome of the CIWHO-ORC technique in terms of several measures.
35

Matignon, Laetitia, Laurent Jeanpierre, and Abdel-Illah Mouaddib. "DECENTRALIZED MULTI-ROBOT PLANNING TO EXPLORE AND PERCEIVE." Acta Polytechnica 55, no. 3 (June 30, 2015): 169–76. http://dx.doi.org/10.14311/ap.2015.55.0169.

Abstract:
In a recent French robotic contest, the objective was to develop a multi-robot system able to autonomously map and explore an unknown area while also detecting and localizing objects. As a participant in this challenge, we proposed a new decentralized Markov decision process (Dec-MDP) resolution based on distributed value functions (DVF) to compute multi-robot exploration strategies. The idea is to take advantage of sparse interactions by allowing each robot to calculate locally a strategy that maximizes the explored space while minimizing robots interactions. In this paper, we propose an adaptation of this method to improve also object recognition by integrating into the DVF the interest in covering explored areas with photos. The robots will then act to maximize the explored space and the photo coverage, ensuring better perception and object recognition.
36

Han, Tong, Tieyong Cao, Yunfei Zheng, Lei Chen, Yang Wang, and Bingyang Fu. "Improving the Detection and Positioning of Camouflaged Objects in YOLOv8." Electronics 12, no. 20 (October 11, 2023): 4213. http://dx.doi.org/10.3390/electronics12204213.

Abstract:
Camouflaged objects can be perfectly hidden in the surrounding environment by designing their texture and color. Existing object detection models have high false-negative rates and inaccurate localization for camouflaged objects. To resolve this, we improved the YOLOv8 algorithm based on feature enhancement. In the feature extraction stage, an edge enhancement module was built to enhance the edge feature. In the feature fusion stage, multiple asymmetric convolution branches were introduced to obtain larger receptive fields and achieve multi-scale feature fusion. In the post-processing stage, the existing non-maximum suppression algorithm was improved to address the issue of missed detection caused by overlapping boxes. Additionally, a shape-enhanced data augmentation method was designed to enhance the model’s shape perception of camouflaged objects. Experimental evaluations were carried out on camouflaged object datasets, including COD and CAMO, which are publicly accessible. The improved method exhibits enhancements in detection performance by 8.3% and 9.1%, respectively, compared to the YOLOv8 model.
37

Wang, Liman, and Jihong Zhu. "Deformable Object Manipulation in Caregiving Scenarios: A Review." Machines 11, no. 11 (November 7, 2023): 1013. http://dx.doi.org/10.3390/machines11111013.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This paper reviews the robotic manipulation of deformable objects in caregiving scenarios. Deformable objects like clothing, food, and medical supplies are ubiquitous in care tasks, yet pose modeling, control, and sensing challenges. This paper categorises caregiving deformable objects and analyses their distinct properties influencing manipulation. Key sections examine progress in simulation, perception, planning, control, and system designs for deformable object manipulation, along with end-to-end deep learning’s potential. Hybrid analytical data-driven modeling shows promise. While laboratory successes have been achieved, real-world caregiving applications lag behind. Enhancing safety, speed, generalisation, and human compatibility is crucial for adoption. The review synthesises critical technologies, capabilities, and limitations, while also pointing to open challenges in deformable object manipulation for robotic caregiving. It provides a comprehensive reference for researchers tackling this socially valuable domain. In conclusion, multi-disciplinary innovations combining analytical and data-driven methods are needed to advance real-world robot performance and safety in deformable object manipulation for patient care.
38

Chen, Liru, Hantao Zhao, Chenhui Shi, Youbo Wu, Xuewen Yu, Wenze Ren, Ziyi Zhang, and Xiaomeng Shi. "Enhancing Multi-Modal Perception and Interaction: An Augmented Reality Visualization System for Complex Decision Making." Systems 12, no. 1 (December 25, 2023): 7. http://dx.doi.org/10.3390/systems12010007.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Visualization systems play a crucial role in industry, education, and research domains by offering valuable insights and enhancing decision making. These systems enable the representation of complex workflows and data in a visually intuitive manner, facilitating better understanding, analysis, and communication of information. This paper explores the potential of augmented reality (AR) visualization systems that enhance multi-modal perception and interaction for complex decision making. The proposed system combines the physicality and intuitiveness of the real world with the immersive and interactive capabilities of AR systems. By integrating physical objects and virtual elements, users can engage in natural and intuitive interactions, leveraging multiple sensory modalities. Specifically, the system incorporates vision, touch, eye-tracking, and sound as multi-modal interaction methods to further improve the user experience. This multi-modal nature enables users to perceive and interact in a more holistic and immersive manner. The software and hardware engineering of the proposed system are elaborated in detail, and the system’s architecture and preliminary function testing results are also included in the manuscript. The findings aim to aid visualization system designers, researchers, and practitioners in exploring and harnessing the capabilities of this integrated approach, ultimately leading to more engaging and immersive user experiences in various application domains.
39

Zhou, Ziqi, Zheng Wang, Huchuan Lu, Song Wang, and Meijun Sun. "Multi-Type Self-Attention Guided Degraded Saliency Detection." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 13082–89. http://dx.doi.org/10.1609/aaai.v34i07.7010.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Existing saliency detection techniques are sensitive to image quality and perform poorly on degraded images. In this paper, we systematically analyze the current status of the research on detecting salient objects from degraded images and then propose a new multi-type self-attention network, namely MSANet, for degraded saliency detection. The main contributions include: 1) Applying attention transfer learning to promote semantic detail perception and internal feature mining of the target network on degraded images; 2) Developing a multi-type self-attention mechanism to achieve the weight recalculation of multi-scale features. By computing global and local attention scores, we obtain the weighted features of different scales, effectively suppress the interference of noise and redundant information, and achieve a more complete boundary extraction. The proposed MSANet converts low-quality inputs to high-quality saliency maps directly in an end-to-end fashion. Experiments on seven widely-used datasets show that our approach produces good performance on both clear and degraded images.
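
As a rough illustration of recalculating feature weights with combined global and local attention scores, the sketch below mixes a scaled dot-product spatial attention with a channel gate. The layer sizes and the way the two scores are fused are assumptions; MSANet's actual multi-type self-attention may differ.

```python
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    """Illustrative weight recalculation for a feature map: a global
    scaled-dot-product attention combined with a local channel gate."""

    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.local_gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(channels, channels, 1),
                                        nn.Sigmoid())

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)                     # (b, hw, c/8)
        k = self.k(x).flatten(2)                                     # (b, c/8, hw)
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)   # (b, hw, hw)
        v = self.v(x).flatten(2).transpose(1, 2)                     # (b, hw, c)
        global_out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return global_out * self.local_gate(x) + x                   # residual connection
```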
40

Sharifzadeh, Sahand, Sina Moayed Baharlou, and Volker Tresp. "Classification by Attention: Scene Graph Classification with Prior Knowledge." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 6 (May 18, 2021): 5025–33. http://dx.doi.org/10.1609/aaai.v35i6.16636.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
A major challenge in scene graph classification is that the appearance of objects and relations can be significantly different from one image to another. Previous works have addressed this by relational reasoning over all objects in an image or incorporating prior knowledge into classification. Unlike previous works, we do not consider separate models for perception and prior knowledge. Instead, we take a multi-task learning approach by introducing schema representations and implementing the classification as an attention layer between image-based representations and the schemata. This allows for the prior knowledge to emerge and propagate within the perception model. By enforcing the model also to represent the prior, we achieve a strong inductive bias. We show that our model can accurately generate commonsense knowledge and that the iterative injection of this knowledge to scene representations, as a top-down mechanism, leads to significantly higher classification performance. Additionally, our model can be fine-tuned on external knowledge given as triples. When combined with self-supervised learning and with 1% of annotated images only, this gives more than 3% improvement in object classification, 26% in scene graph classification, and 36% in predicate prediction accuracy.
41

Jiao, Yang, Zequn Jie, Shaoxiang Chen, Lechao Cheng, Jingjing Chen, Lin Ma, and Yu-Gang Jiang. "Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 2598–606. http://dx.doi.org/10.1609/aaai.v38i3.28037.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field. Under this paradigm, accurate BEV representation construction relies on reliable depth estimation for multi-camera images. However, existing approaches exhaustively predict depths for every pixel without prioritizing objects, which are precisely the entities requiring detection in 3D space. To this end, we propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector. First, a category-specific structural priors mining approach is proposed to enhance the efficacy of monocular depth generation. In addition, a self-boosting learning strategy is proposed to encourage the model to place more emphasis on challenging objects during computation-expensive temporal stereo matching. Together they provide improved depth estimation results for constructing high-quality BEV features, benefiting the final 3D detection. The proposed method achieves state-of-the-art performance on the challenging nuScenes benchmark, and extensive experimental results demonstrate the effectiveness of our designs.
42

Zhang, Xiao, Bingrong Xu, and Liping Lu. "High-quality 3D Object Detection Based on Instance-aware Sampling." Journal of Physics: Conference Series 2674, no. 1 (December 1, 2023): 012025. http://dx.doi.org/10.1088/1742-6596/2674/1/012025.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Most existing 3D object detection methods for point clouds use random down-sampling algorithms, which preserve the primary shape and feature information of the point clouds and extract robust features from the sampled points to reduce computing resource consumption and improve inference efficiency. However, such sampling may disregard significant areas within the point clouds, reducing the precision of detecting comparatively small objects such as pedestrians and cyclists. In this research, an instance perception-based object detection method is presented that adaptively selects additional foreground points based on the properties of smaller objects in the initial point clouds. It supports end-to-end training and addresses the issue that typical hand-designed multi-scale grouping receptive fields do not accurately reflect object size. To effectively enhance the recall rate of foreground points, we present an instance-aware sampling (IAS) algorithm that builds on farthest point sampling. Specifically, we employ dynamic gating networks to attain instance perception, which enables the sampled candidates to cover a larger number of foreground objects. An aggregation layer then efficiently captures robust contextual features with rich spatial information. Experiments on the KITTI dataset substantiate the effectiveness of our IAS compared with current advanced 3D object detection methods, with a notable improvement especially in identifying small objects.
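
The instance-aware sampling described above builds on farthest point sampling (FPS). The sketch below shows plain FPS only; the dynamic gating that biases sampling toward foreground points is not included and would need to be layered on top.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Standard farthest point sampling over an (N, 3) point cloud."""
    n = points.shape[0]
    selected = np.zeros(n_samples, dtype=np.int64)
    dist = np.full(n, np.inf)
    selected[0] = np.random.randint(n)            # arbitrary seed point
    for i in range(1, n_samples):
        # Distance of every point to the most recently selected point.
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)                # distance to nearest selected point
        selected[i] = dist.argmax()               # pick the farthest remaining point
    return points[selected]
```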
43

Zhao, Hongzhuan, Dihua Sun, Min Zhao, and Senlin Cheng. "A Multi-Classification Method of Improved SVM-based Information Fusion for Traffic Parameters Forecasting." PROMET - Traffic&Transportation 28, no. 2 (April 25, 2016): 117–24. http://dx.doi.org/10.7307/ptt.v28i2.1643.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
With the enrichment of perception methods, the modern transportation system contains many physical objects whose states are influenced by numerous information factors, making it a typical Cyber-Physical System (CPS). Traffic information is therefore generally multi-sourced, heterogeneous, and hierarchical. Existing research shows that accurately classifying multi-sourced traffic information during information fusion can improve parameter forecasting performance. To solve the problem of accurately classifying traffic information, this paper analyses the characteristics of multi-sourced traffic information and uses a redefined binary tree to overcome the shortcomings of the original Support Vector Machine (SVM) classification in information fusion, proposing a multi-classification method using an improved SVM for traffic parameter forecasting. Experiments conducted to examine the performance of the proposed scheme reveal that the method yields more accurate and practical outcomes.
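
To illustrate the binary-tree decomposition of a multi-class problem into binary SVMs, a minimal sketch follows. The split rule used here (halving the class list by index) is a placeholder; the paper's redefined binary tree uses its own criterion derived from the traffic data characteristics.

```python
import numpy as np
from sklearn.svm import SVC

class BinaryTreeSVM:
    """Minimal binary-tree multi-class SVM: at each node the remaining classes
    are split into two groups and a binary SVM separates them."""

    def fit(self, X, y):
        self.tree = self._build(X, y, sorted(set(y)))
        return self

    def _build(self, X, y, classes):
        if len(classes) == 1:
            return classes[0]                          # leaf node: a single class
        left, right = classes[:len(classes) // 2], classes[len(classes) // 2:]
        mask = np.isin(y, left)                        # 1 = sample belongs to left group
        clf = SVC(kernel='rbf').fit(X, mask.astype(int))
        return (clf,
                self._build(X[mask], y[mask], left),
                self._build(X[~mask], y[~mask], right))

    def _predict_one(self, node, x):
        if not isinstance(node, tuple):
            return node
        clf, left, right = node
        go_left = clf.predict(x.reshape(1, -1))[0] == 1
        return self._predict_one(left if go_left else right, x)

    def predict(self, X):
        return np.array([self._predict_one(self.tree, x) for x in X])
```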
44

Wang, Li, Ruifeng Li, Hezi Shi, Jingwen Sun, Lijun Zhao, Hock Seah, Chee Quah, and Budianto Tandianus. "Multi-Channel Convolutional Neural Network Based 3D Object Detection for Indoor Robot Environmental Perception." Sensors 19, no. 4 (February 21, 2019): 893. http://dx.doi.org/10.3390/s19040893.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Environmental perception is a vital feature for service robots when working in an indoor environment for a long time. The general 3D reconstruction is a low-level geometric information description that cannot convey semantics. In contrast, higher level perception similar to humans requires more abstract concepts, such as objects and scenes. Moreover, the 2D object detection based on images always fails to provide the actual position and size of an object, which is quite important for a robot’s operation. In this paper, we focus on the 3D object detection to regress the object’s category, 3D size, and spatial position through a convolutional neural network (CNN). We propose a multi-channel CNN for 3D object detection, which fuses three input channels including RGB, depth, and bird’s eye view (BEV) images. We also propose a method to generate 3D proposals based on 2D ones in the RGB image and semantic prior. Training and test are conducted on the modified NYU V2 dataset and SUN RGB-D dataset in order to verify the effectiveness of the algorithm. We also carry out the actual experiments in a service robot to utilize the proposed 3D object detection method to enhance the environmental perception of the robot.
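
A minimal sketch of the three-channel fusion idea is given below: separate convolutional branches for RGB, depth, and BEV inputs whose features are concatenated before a joint head that outputs class logits and 3D box parameters. Branch depths, the fusion operator, and the output parameterization are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

def small_branch(in_ch):
    # A tiny convolutional branch; the actual backbones in the paper are deeper.
    return nn.Sequential(nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                         nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

class MultiChannelFusion(nn.Module):
    """Sketch of fusing RGB, depth, and bird's-eye-view (BEV) inputs for a
    3D detection head (class logits plus position and size)."""

    def __init__(self, n_classes):
        super().__init__()
        self.rgb, self.depth, self.bev = small_branch(3), small_branch(1), small_branch(3)
        self.head = nn.Linear(64 * 3, n_classes + 6)   # 6 = (x, y, z, w, h, l)

    def forward(self, rgb, depth, bev):
        fused = torch.cat([self.rgb(rgb), self.depth(depth), self.bev(bev)], dim=1)
        out = self.head(fused)
        return out[:, :-6], out[:, -6:]                # class logits, 3D box parameters
```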
45

Phupattanasilp, Pilaiwan, and Sheau-Ru Tong. "Augmented Reality in the Integrative Internet of Things (AR-IoT): Application for Precision Farming." Sustainability 11, no. 9 (May 9, 2019): 2658. http://dx.doi.org/10.3390/su11092658.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Benefiting from the Internet of Things (IoT), visualization capabilities facilitate the improvement of precision farming, especially in dynamic indoor planting. However, conventional IoT data visualization is usually carried out in offsite and textual environments, i.e., text and numbers, which do not promote a user’s sensory perception and interaction. This paper introduces the use of augmented reality (AR) as a support to IoT data visualization, called AR-IoT. The AR-IoT system superimposes IoT data directly onto real-world objects and enhances object interaction. As a case study, this system is applied to crop monitoring. A multi-camera platform, a non-destructive and low-cost IoT imaging platform, is connected to the internet and integrated into the system to measure the three-dimensional (3D) coordinates of objects. The relationships among accuracy, object coordinates, augmented information (e.g., virtual objects), and object interaction are investigated. The proposed system shows great potential to integrate IoT data with AR, which will effectively contribute to updating precision agricultural techniques in an environmentally sustainable manner.
46

Yang, Shuo, Yongqi Wang, Xiaofeng Ji, and Xinxiao Wu. "Multi-Modal Prompting for Open-Vocabulary Video Visual Relationship Detection." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (March 24, 2024): 6513–21. http://dx.doi.org/10.1609/aaai.v38i7.28472.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Open-vocabulary video visual relationship detection aims to extend video visual relationship detection beyond annotated categories by detecting unseen relationships between objects in videos. Recent progress in open-vocabulary perception, primarily driven by large-scale image-text pre-trained models like CLIP, has shown remarkable success in recognizing novel objects and semantic categories. However, directly applying CLIP-like models to video visual relationship detection encounters significant challenges due to the substantial gap between images and video object relationships. To address this challenge, we propose a multi-modal prompting method that adapts CLIP well to open-vocabulary video visual relationship detection by prompt-tuning on both visual representation and language input. Specifically, we enhance the image encoder of CLIP by using spatio-temporal visual prompting to capture spatio-temporal contexts, thereby making it suitable for object-level relationship representation in videos. Furthermore, we propose visual-guided language prompting to leverage CLIP's comprehensive semantic knowledge for discovering unseen relationship categories, thus facilitating the recognition of novel video relationships. Extensive experiments on two public datasets, VidVRD and VidOR, demonstrate the effectiveness of our method, especially achieving a significant gain of nearly 10% in mAP on novel relationship categories on the VidVRD dataset.
47

Kong, Yanzi, Feng Zhu, Haibo Sun, Zhiyuan Lin, and Qun Wang. "A Generic View Planning System Based on Formal Expression of Perception Tasks." Entropy 24, no. 5 (April 20, 2022): 578. http://dx.doi.org/10.3390/e24050578.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
View planning (VP) is a technique that guides the adjustment of the sensor’s postures in multi-view perception tasks. It converts the perception process into active perception, which improves the intelligence and reduces the resource consumption of the robot. We propose a generic VP system for multiple kinds of visual perception. The VP system is built on the basis of the formal description of the visual task, and the next best view is calculated by the system. When dealing with a given visual task, we can simply update its description as the input of the VP system, and obtain the defined best view in real time. Formal description of the perception task includes the task’s status, the objects’ prior information library, the visual representation status and the optimization goal. The task’s status and the visual representation status are updated when data are received at a new view. If the task’s status has not reached its goal, candidate views are sorted based on the updated visual representation status, and the next best view that can minimize the entropy of the model space is chosen as the output of the VP system. Experiments of view planning for 3D recognition and reconstruction tasks are conducted, and the result shows that our algorithm has good performance on different tasks.
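
The core of the VP loop, choosing the candidate view whose predicted model-space entropy is smallest, can be sketched as follows. The `predict_belief` function is hypothetical; in the paper it would be derived from the formal task description and the visual representation status.

```python
import numpy as np

def entropy(p):
    # Shannon entropy of a discretized belief over the model space.
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def next_best_view(belief, candidate_views, predict_belief):
    """Pick the candidate view that minimizes the entropy of the predicted
    posterior belief. `predict_belief(belief, view)` simulates how the belief
    over hypotheses would change after observing from `view` (assumption)."""
    scores = [entropy(predict_belief(belief, v)) for v in candidate_views]
    return candidate_views[int(np.argmin(scores))]
```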
48

Liu, Yong, Cheng Li, Jiade Huang, and Ming Gao. "MineSDS: A Unified Framework for Small Object Detection and Drivable Area Segmentation for Open-Pit Mining Scenario." Sensors 23, no. 13 (June 27, 2023): 5977. http://dx.doi.org/10.3390/s23135977.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
To tackle the challenges posed by dense small objects and fuzzy boundaries on unstructured roads in the mining scenario, we proposed an end-to-end small object detection and drivable area segmentation framework for open-pit mining. We employed a convolutional network backbone as a shared feature extractor for the two tasks, as multi-task learning has yielded promising results in autonomous driving perception. To address small object detection, we introduced a lightweight attention module that allowed our network to focus more on the spatial and channel dimensions of small objects without impeding inference time. We also used a convolutional block attention module in the drivable area segmentation subnetwork, which assigned more weight to road boundaries to improve feature mapping capabilities. Furthermore, to improve the network's perception accuracy on both tasks, we designed the loss function as a weighted summation of the two task losses. We validated the effectiveness of our approach on a pre-collected mining dataset called Minescape. Our detection results on Minescape reached an mAP of 87.8%, 9.3% higher than state-of-the-art algorithms, and our segmentation results surpassed the comparison algorithm by 1 percentage point in MIoU. These experimental results demonstrate that our approach achieves competitive performance.
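
The weighted-summation loss mentioned above can be written compactly; the individual loss terms (smooth L1 for boxes, cross-entropy for segmentation) and the default weights below are illustrative assumptions, since the abstract only states that a weighted sum is used.

```python
import torch.nn.functional as F

def multitask_loss(det_pred, det_target, seg_logits, seg_target,
                   w_det=1.0, w_seg=1.0):
    """Weighted summation of the small-object detection loss and the
    drivable-area segmentation loss (term choices are assumptions)."""
    det_loss = F.smooth_l1_loss(det_pred, det_target)   # box regression term
    seg_loss = F.cross_entropy(seg_logits, seg_target)  # per-pixel segmentation term
    return w_det * det_loss + w_seg * seg_loss
```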
49

Zhang, Rong, Zhongjie Zhu, Long Li, Yongqiang Bai, and Jiong Shi. "BFE-Net: Object Detection with Bidirectional Feature Enhancement." Electronics 12, no. 21 (November 3, 2023): 4531. http://dx.doi.org/10.3390/electronics12214531.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In realistic scenarios, existing object detection models still face challenges in resisting interference and detecting small objects due to complex environmental factors such as light and noise. For this reason, a novel scheme termed BFE-Net based on bidirectional feature enhancement is proposed. Firstly, a new multi-scale feature extraction module is constructed, which uses a self-attention mechanism to simulate human visual perception. It is used to capture global information and long-range dependencies between pixels, thereby optimizing the extraction of multi-scale features from input images. Secondly, a feature enhancement and denoising module is designed, based on bidirectional information flow. In the top-down path, the impact of noise on the feature map is weakened to further enhance feature extraction. In the bottom-up path, multi-scale features are fused to improve the accuracy of small object feature extraction. Lastly, a generalized intersection over union regression loss function is employed to optimize the movement direction of predicted bounding boxes, improving the efficiency and accuracy of object localization. Experimental results using the public PASCAL VOC2007 test set show that our scheme achieves a mean average precision (mAP) of 85% for object detection, which is 2.3% to 8.6% higher than classical methods such as RetinaNet and YOLOv5. In particular, the anti-interference capability and the performance in detecting small objects show a significant enhancement.
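
For reference, the generalized intersection over union (GIoU) used for box regression is defined as the IoU minus the fraction of the smallest enclosing box not covered by the union; a compact sketch follows (the corresponding regression loss is simply 1 - GIoU).

```python
def giou(box_a, box_b):
    """Generalized IoU between two boxes given as [x1, y1, x2, y2].
    GIoU = IoU - |C minus (A union B)| / |C|, where C is the smallest enclosing box."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / (union + 1e-9)
    # Smallest enclosing box C.
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return iou - (c_area - union) / (c_area + 1e-9)
```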
50

Hua, Xia, Xinqing Wang, Ting Rui, Dong Wang, and Faming Shao. "Real-Time Object Detection in Remote Sensing Images Based on Visual Perception and Memory Reasoning." Electronics 8, no. 10 (October 11, 2019): 1151. http://dx.doi.org/10.3390/electronics8101151.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Aiming at the real-time detection of multiple objects and micro-objects in large-scene remote sensing images, a cascaded convolutional neural network real-time object-detection framework for remote sensing images is proposed, which integrates visual perception and convolutional memory network reasoning. The detection framework is composed of two fully convolutional networks, namely, the strengthened object self-attention pre-screening fully convolutional network (SOSA-FCN) and the object accurate detection fully convolutional network (AD-FCN). SOSA-FCN introduces a self-attention module to extract attention feature maps and constructs a depth feature pyramid to optimize the attention feature maps by combining them with convolutional long short-term memory networks. It guides the acquisition of potential sub-regions of the object in the scene, reduces the computational complexity, and enhances the network’s ability to extract multi-scale object features. It adapts to the complex background and small object characteristics of a large-scene remote sensing image. In AD-FCN, the object mask and object orientation estimation layer are designed to achieve fine positioning of candidate frames. The performance of the proposed algorithm is compared with that of other advanced methods on NWPU_VHR-10, DOTA, UCAS-AOD, and other open datasets. The experimental results show that the proposed algorithm significantly improves the efficiency of object detection while ensuring detection accuracy and has high adaptability. It has extensive engineering application prospects.
