Journal articles on the topic 'nuScenes'

Consult the top 22 journal articles for your research on the topic 'nuScenes'.

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Fong, Whye Kit, Rohit Mohan, Juana Valeria Hurtado, Lubing Zhou, Holger Caesar, Oscar Beijbom, and Abhinav Valada. "Panoptic Nuscenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking." IEEE Robotics and Automation Letters 7, no. 2 (April 2022): 3795–802. http://dx.doi.org/10.1109/lra.2022.3148457.

2

He, Qingdong, Hao Zeng, Yi Zeng, and Yijun Liu. "SCIR-Net: Structured Color Image Representation Based 3D Object Detection Network from Point Clouds." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 4 (June 28, 2022): 4486–94. http://dx.doi.org/10.1609/aaai.v36i4.20371.

Abstract:
3D object detection from point cloud data has become an indispensable part of autonomous driving. Previous works process point clouds via either projection or voxelization. However, projection-based methods suffer from information loss, while voxelization-based methods incur heavy computation. In this paper, we propose to encode point clouds into a structured color image representation (SCIR) and utilize a 2D CNN to fulfill the 3D detection task. Specifically, we use the structured color image encoding module to convert the irregular 3D point clouds into a square 2D tensor image, where each point corresponds to a spatial point in 3D space. Furthermore, to fit the Euclidean structure, we apply feature normalization to parameterize the 2D tensor image onto a regular dense color image. Then, we conduct repeated multi-scale fusion across different levels to augment the initial features and learn scale-aware feature representations for box prediction. Extensive experiments on the KITTI benchmark, the Waymo Open Dataset and the more challenging nuScenes dataset show that our proposed method yields decent results and demonstrates the effectiveness of such representations for point clouds.
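For readers unfamiliar with projection-style encodings of LiDAR data, the minimal sketch below shows the classic spherical range-image projection that the abstract contrasts with voxelization. It is a generic illustration only, not the paper's SCIR encoding; the image resolution and field-of-view values are arbitrary assumptions.

```python
import numpy as np

def range_image_projection(points, h=64, w=1024,
                           fov_up_deg=10.0, fov_down_deg=-30.0):
    """Project an (N, 3) LiDAR point cloud onto an (h, w) range image.

    Classic spherical projection: each pixel stores the range of the
    point that falls into it (0 where empty).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points[:, :3], axis=1) + 1e-8

    yaw = np.arctan2(y, x)                      # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                    # elevation

    fov_down = np.radians(fov_down_deg)
    fov = np.radians(fov_up_deg) - fov_down

    # Normalise angles to pixel coordinates.
    u = 0.5 * (1.0 - yaw / np.pi) * w           # column
    v = (1.0 - (pitch - fov_down) / fov) * h    # row
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    image = np.zeros((h, w), dtype=np.float32)
    image[v, u] = r                             # later points overwrite earlier ones
    return image

# Example: 10k random points around the sensor.
pts = np.random.uniform(-50, 50, size=(10000, 3))
print(range_image_projection(pts).shape)        # (64, 1024)
```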
3

Dao, Minh-Quan, and Vincent Frémont. "A Two-Stage Data Association Approach for 3D Multi-Object Tracking." Sensors 21, no. 9 (April 21, 2021): 2894. http://dx.doi.org/10.3390/s21092894.

Abstract:
Multi-Object Tracking (MOT) is an integral part of any autonomous driving pipeline because it produces trajectories of other moving objects in the scene and predicts their future motion. Thanks to the recent advances in 3D object detection enabled by deep learning, track-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system is essentially made of an object detector and a data association algorithm which establishes track-to-detection correspondence. While 3D object detection has been actively researched, association algorithms for 3D MOT have settled on bipartite matching formulated as a Linear Assignment Problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt a two-stage data association method which was successfully applied to image-based tracking to the 3D setting, thus providing an alternative data association method for 3D MOT. Our method outperforms the baseline using one-stage bipartite matching for data association by achieving 0.587 Average Multi-Object Tracking Accuracy (AMOTA) on the NuScenes validation set and 0.365 AMOTA (at level 2) on the Waymo test set.
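The one-stage bipartite-matching baseline mentioned above can be sketched in a few lines with SciPy's linear-assignment solver. The centroid-distance cost and the 2 m gating threshold below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, max_dist=2.0):
    """One-stage bipartite matching between track and detection centroids.

    tracks, detections: (T, 3) and (D, 3) arrays of box centres.
    Returns matched (track_idx, det_idx) pairs plus unmatched indices.
    """
    if len(tracks) == 0 or len(detections) == 0:
        return [], list(range(len(tracks))), list(range(len(detections)))

    # Pairwise Euclidean cost matrix (T x D).
    cost = np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=-1)

    row, col = linear_sum_assignment(cost)      # Hungarian-style optimal assignment
    matches = [(r, c) for r, c in zip(row, col) if cost[r, c] <= max_dist]

    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(tracks)) if i not in matched_t]
    unmatched_dets = [j for j in range(len(detections)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets

tracks = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
dets = np.array([[0.3, 0.1, 0.0], [30.0, 0.0, 0.0]])
print(associate(tracks, dets))   # track 0 <-> det 0; track 1 and det 1 stay unmatched
```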
4

Wang, Jiarong, Ming Zhu, Bo Wang, Deyao Sun, Hua Wei, Changji Liu, and Haitao Nie. "KDA3D: Key-Point Densification and Multi-Attention Guidance for 3D Object Detection." Remote Sensing 12, no. 11 (June 11, 2020): 1895. http://dx.doi.org/10.3390/rs12111895.

Abstract:
In this paper, we propose a novel 3D object detector KDA3D, which achieves high-precision and robust classification, segmentation, and localization with the help of key-point densification and multi-attention guidance. The proposed end-to-end neural network architecture takes LIDAR point clouds as the main inputs that can be optionally complemented by RGB images. It consists of three parts: part-1 segments 3D foreground points and generates reliable proposals; part-2 (optional) enhances point cloud density and reconstructs the more compact full-point feature map; part-3 refines 3D bounding boxes and adds semantic segmentation as extra supervision. Our designed lightweight point-wise and channel-wise attention modules can adaptively strengthen the “skeleton” and “distinctiveness” point-features to help feature learning networks capture more representative or finer patterns. The proposed key-point densification component can generate pseudo-point clouds containing target information from monocular images through the distance preference strategy and K-means clustering so as to balance the density distribution and enrich sparse features. Extensive experiments on the KITTI and nuScenes 3D object detection benchmarks show that our KDA3D produces state-of-the-art results while running in near real-time with a low memory footprint.
5

Xiao, Yanqiu, Shiao Yin, Guangzhen Cui, Lei Yao, Zhanpeng Fang, and Weili Zhang. "A Near-Field Area Object Detection Method for Intelligent Vehicles Based on Multi-Sensor Information Fusion." World Electric Vehicle Journal 13, no. 9 (August 24, 2022): 160. http://dx.doi.org/10.3390/wevj13090160.

Abstract:
To address the difficulty intelligent vehicles face in detecting near-field targets, this paper proposes a near-field object detection method based on multi-sensor information fusion. Firstly, the F-CenterFusion method is proposed to fuse the information from LiDAR, millimeter wave (mmWave) radar, and camera to fully obtain target state information in the near-field area. Secondly, multi-attention modules are constructed in the image and point cloud feature extraction networks, respectively, to locate the targets’ class-dependent features and suppress the expression of useless information. Then, the dynamic connection mechanism is used to fuse image and point cloud information to enhance feature expression capabilities. The fusion results are input into the predictive inference head network to obtain target attributes, locations, and other data. This method is verified on the nuScenes dataset. Compared with the CenterFusion method using mmWave radar and camera fusion information, the NDS and mAP values of our method are improved by 5.1% and 10.9%, respectively, and the average accuracy score of multi-class detection is improved by 22.7%. The experimental results show that the proposed method can enable intelligent vehicles to realize near-field target detection with high accuracy and strong robustness.
6

Zhao, Lin, Siyuan Xu, Liman Liu, Delie Ming, and Wenbing Tao. "SVASeg: Sparse Voxel-Based Attention for 3D LiDAR Point Cloud Semantic Segmentation." Remote Sensing 14, no. 18 (September 7, 2022): 4471. http://dx.doi.org/10.3390/rs14184471.

Abstract:
3D LiDAR has become an indispensable sensor in autonomous driving vehicles. In LiDAR-based 3D point cloud semantic segmentation, most voxel-based 3D segmentors cannot efficiently capture large amounts of context information, resulting in limited receptive fields and limiting their performance. To address this problem, a sparse voxel-based attention network is introduced for 3D LiDAR point cloud semantic segmentation, termed SVASeg, which captures large amounts of context information between voxels through sparse voxel-based multi-head attention (SMHA). Traditional multi-head attention cannot be applied directly to the non-empty sparse voxels. To this end, a hash table indexed by voxel coordinates is built to look up the non-empty neighboring voxels of each sparse voxel. Then, the sparse voxels are grouped into different groups, and each group corresponds to a local region. Afterwards, position embedding, multi-head attention and feature fusion are performed for each group to capture and aggregate the context information. Based on the SMHA module, SVASeg can directly operate on the non-empty voxels, maintaining a computational overhead comparable to the convolutional method. Extensive experimental results on the SemanticKITTI and nuScenes datasets show the superiority of SVASeg.
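A hash table over integer voxel coordinates, as described above, can be sketched with an ordinary Python dictionary. The cubic search radius below is an illustrative assumption and not the grouping scheme actually used in SVASeg.

```python
import numpy as np

def build_voxel_hash(voxel_coords):
    """Map integer voxel coordinates (M, 3) to their row index."""
    return {tuple(c): i for i, c in enumerate(voxel_coords.tolist())}

def nonempty_neighbors(voxel_hash, coord, radius=1):
    """Return indices of the non-empty voxels within a cubic neighborhood."""
    cx, cy, cz = coord
    neighbors = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dz in range(-radius, radius + 1):
                idx = voxel_hash.get((cx + dx, cy + dy, cz + dz))
                if idx is not None:
                    neighbors.append(idx)
    return neighbors

coords = np.array([[0, 0, 0], [0, 1, 0], [5, 5, 5]])
h = build_voxel_hash(coords)
print(nonempty_neighbors(h, (0, 0, 0)))   # [0, 1]
```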
7

Grigorescu, Sorin, Cosmin Ginerica, Mihai Zaha, Gigel Macesanu, and Bogdan Trasnea. "LVD-NMPC: A learning-based vision dynamics approach to nonlinear model predictive control for autonomous vehicles." International Journal of Advanced Robotic Systems 18, no. 3 (May 1, 2021): 172988142110195. http://dx.doi.org/10.1177/17298814211019544.

Abstract:
In this article, we introduce a learning-based vision dynamics approach to nonlinear model predictive control (NMPC) for autonomous vehicles, coined learning-based vision dynamics (LVD) NMPC. LVD-NMPC uses an a-priori process model and a learned vision dynamics model used to calculate the dynamics of the driving scene, the controlled system’s desired state trajectory, and the weighting gains of the quadratic cost function optimized by a constrained predictive controller. The vision system is defined as a deep neural network designed to estimate the dynamics of the image scene. The input is based on historic sequences of sensory observations and vehicle states, integrated by an augmented memory component. Deep Q-learning is used to train the deep network, which once trained can also be used to calculate the desired trajectory of the vehicle. We evaluate LVD-NMPC against a baseline dynamic window approach (DWA) path planning executed using standard NMPC and against the PilotNet neural network. Performance is measured in our simulation environment GridSim, on a real-world 1:8 scaled model car as well as on a real size autonomous test vehicle and the nuScenes computer vision dataset.
8

Huang, Y., J. Zhou, B. Li, J. Xiao, and Y. Cao. "ROLL-SENSITIVE ONLINE CAMERA ORIENTATION DETERMINATION ON THE STRUCTURED ROAD." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2022 (May 30, 2022): 687–93. http://dx.doi.org/10.5194/isprs-archives-xliii-b2-2022-687-2022.

Abstract:
Online camera calibration technology can estimate the pose of the onboard camera in real time, playing an important role in many fields such as HD map production and autonomous vehicles. Some researchers use one vanishing point (VP) to calculate the pitch and yaw angles of the onboard camera. However, this method assumes that the roll angle is zero, which is impractical because of the inevitable installation error. This paper proposes a novel online camera orientation determination method based on a longitudinal vanishing point without the zero-roll hypothesis. The orientation of the camera is determined in two steps: calculating the pitch and yaw angles according to vanishing point theory, and then obtaining the roll angle with a lane-width constraint, which is modeled as an optimization problem. To verify the effectiveness of our algorithm, we evaluated it on the nuScenes dataset. As a result, the rotation errors of the roll and pitch angles reach 0.154° and 0.116°, respectively. We also deployed our method on “Tuyou”, an autonomous vehicle developed by Wuhan University, and tested it on urban structured roads. Our proposed method can reconstruct the ground space accurately compared with previous methods that assume zero roll.
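As a rough illustration of the vanishing-point step, back-projecting the longitudinal VP through the intrinsic matrix gives the road direction in the camera frame, from which pitch and yaw follow. The sign conventions and the nuScenes-like intrinsic values below are assumptions for the sketch; it does not reproduce the paper's roll estimation.

```python
import numpy as np

def pitch_yaw_from_vanishing_point(vp_uv, K):
    """Recover camera pitch/yaw relative to the road direction from the
    longitudinal vanishing point, assuming a standard pinhole model with
    x-right, y-down, z-forward camera axes (signs depend on convention).
    """
    u, v = vp_uv
    # Back-project the vanishing point to a 3D direction in the camera frame.
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    d /= np.linalg.norm(d)
    yaw = np.arctan2(d[0], d[2])                     # rotation about the vertical axis
    pitch = np.arctan2(-d[1], np.hypot(d[0], d[2]))  # positive when looking up
    return pitch, yaw

K = np.array([[1266.0, 0.0, 816.0],
              [0.0, 1266.0, 491.0],
              [0.0, 0.0, 1.0]])          # illustrative nuScenes-like intrinsics
print(pitch_yaw_from_vanishing_point((830.0, 500.0), K))
```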
9

Koh, Junho, Jaekyum Kim, Jin Hyeok Yoo, Yecheol Kim, Dongsuk Kum, and Jun Won Choi. "Joint 3D Object Detection and Tracking Using Spatio-Temporal Representation of Camera Image and LiDAR Point Clouds." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 1210–18. http://dx.doi.org/10.1609/aaai.v36i1.20007.

Abstract:
In this paper, we propose a new joint object detection and tracking (JoDT) framework for 3D object detection and tracking based on camera and LiDAR sensors. The proposed method, referred to as 3D DetecTrack, enables the detector and tracker to cooperate to generate a spatio-temporal representation of the camera and LiDAR data, with which 3D object detection and tracking are then performed. The detector constructs the spatio-temporal features via the weighted temporal aggregation of the spatial features obtained by the camera and LiDAR fusion. Then, the detector reconfigures the initial detection results using information from the tracklets maintained up to the previous time step. Based on the spatio-temporal features generated by the detector, the tracker associates the detected objects with previously tracked objects using a graph neural network (GNN). We devise a fully-connected GNN facilitated by a combination of rule-based edge pruning and attention-based edge gating, which exploits both spatial and temporal object contexts to improve tracking performance. The experiments conducted on both KITTI and nuScenes benchmarks demonstrate that the proposed 3D DetecTrack achieves significant improvements in both detection and tracking performances over baseline methods and achieves state-of-the-art performance among existing methods through collaboration between the detector and tracker.
10

Chen, Chen, Zhe Chen, Jing Zhang, and Dacheng Tao. "SASA: Semantics-Augmented Set Abstraction for Point-Based 3D Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 221–29. http://dx.doi.org/10.1609/aaai.v36i1.19897.

Abstract:
Although point-based networks are demonstrated to be accurate for 3D point cloud modeling, they still fall behind their voxel-based competitors in 3D detection. We observe that the prevailing set abstraction design for down-sampling points may retain too much unimportant background information, which can affect feature learning for detecting objects. To tackle this issue, we propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA). Technically, we first add a binary segmentation module as the side output to help identify foreground points. Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling. In practice, SASA proves effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection. Additionally, it is an easy-to-plug-in module that can boost various point-based detectors, including single-stage and two-stage ones. Extensive experiments on the popular KITTI and nuScenes datasets validate the superiority of SASA, lifting point-based detection models to reach comparable performance to state-of-the-art voxel-based methods. Code is available at https://github.com/blakechen97/SASA.
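The idea of biasing down-sampling toward likely foreground points can be sketched as score-weighted sampling. This simplified probabilistic variant is for illustration only and is not the paper's exact sampler.

```python
import numpy as np

def semantics_guided_sampling(points, fg_scores, num_samples, gamma=1.0):
    """Sample points with probability proportional to (foreground score)^gamma,
    so that likely-foreground points survive the down-sampling stage.
    points: (N, 3), fg_scores: (N,) in [0, 1].
    """
    weights = np.power(fg_scores + 1e-6, gamma)
    probs = weights / weights.sum()
    idx = np.random.choice(len(points), size=num_samples, replace=False, p=probs)
    return points[idx], idx

pts = np.random.randn(1000, 3)
scores = np.random.rand(1000)          # stand-in for predicted foreground scores
sampled, idx = semantics_guided_sampling(pts, scores, num_samples=256)
print(sampled.shape)                   # (256, 3)
```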
11

Rukhovich, D. D. "2D-to-3D Projection for Monocular and Multi-View 3D Object Detection in Outdoor Scenes." Programmnaya Ingeneria 12, no. 7 (October 11, 2021): 373–84. http://dx.doi.org/10.17587/prin.12.373-384.

Abstract:
In this article, we introduce the task of multi-view RGB-based 3D object detection as an end-to-end optimization problem. In a multi-view formulation of the 3D object detection problem, several images of a static scene are used to detect objects in the scene. To address the 3D object detection problem in a multi-view formulation, we propose a novel 3D object detection method named ImVoxelNet. ImVoxelNet is based on a fully convolutional neural network. Unlike existing 3D object detection methods, ImVoxelNet works directly with 3D representations and does not mediate 3D object detection through 2D object detection. The proposed method accepts multi-view inputs. The number of monocular images in each multi-view input can vary during training and inference; actually, this number might be unique for each multi-view input. Moreover, we propose to treat a single RGB image as a special case of a multi-view input. Accordingly, the proposed method can also accept monocular inputs with no modifications. Through extensive evaluation, we demonstrate that the proposed method successfully handles a variety of outdoor scenes. Specifically, it achieves state-of-the-art results in car detection on KITTI (monocular) and nuScenes (multi-view) benchmarks among all methods that accept RGB images. The proposed method operates in real-time, which makes it possible to integrate it into the navigation systems of autonomous devices. The results of this study can be used to address tasks of navigation, path planning, and semantic scene mapping.
12

Yin, Lingmei, Wei Tian, Ling Wang, Zhiang Wang, and Zhuoping Yu. "SPV-SSD: An Anchor-Free 3D Single-Stage Detector with Supervised-PointRendering and Visibility Representation." Remote Sensing 15, no. 1 (December 27, 2022): 161. http://dx.doi.org/10.3390/rs15010161.

Abstract:
Recently, 3D object detection based on multi-modal sensor fusion has been increasingly adopted in automated driving and robotics. For example, the semantic information provided by cameras and the geometric information provided by light detection and ranging (LiDAR) are fused to perceive 3D objects, as single-modal sensors are unable to capture enough information from the environment. Many state-of-the-art methods fuse the signals sequentially for simplicity. By sequentially, we mean that the image semantic signals are used as auxiliary input for LiDAR-based object detectors, which makes the overall performance rely heavily on the semantic signals. Moreover, the error introduced by these signals may lead to detection errors. To remedy this dilemma, we propose an approach coined supervised-PointRendering to correct the potential errors in the image semantic segmentation results by training auxiliary tasks with fused features of the laser point geometry feature, the image semantic feature and a novel laser visibility feature. The laser visibility feature is obtained through the raycasting algorithm and is adopted to constrain the spatial distribution of fore- and background objects. Furthermore, we build an efficient anchor-free Single Stage Detector (SSD) powered by an advanced global-optimal label assignment to achieve a better time–accuracy balance. The new detection framework is evaluated on the extensively used KITTI and nuScenes datasets, achieving the highest inference speed while outperforming most existing single-stage detectors in average precision.
13

Shao, Huixiang, Zhijiang Zhang, Xiaoyu Feng, and Dan Zeng. "SCRnet: A Spatial Consistency Guided Network Using Contrastive Learning for Point Cloud Registration." Symmetry 14, no. 1 (January 12, 2022): 140. http://dx.doi.org/10.3390/sym14010140.

Abstract:
Point cloud registration is used to find a rigid transformation from the source point cloud to the target point cloud. The main challenge in point cloud registration is finding correct correspondences in complex scenes that may contain heavy noise and repetitive structures. At present, many existing methods use outlier rejection to help the network obtain more accurate correspondences, but they often ignore the spatial consistency between keypoints. Therefore, to address this issue, we propose a spatial consistency guided network using contrastive learning for point cloud registration (SCRnet), whose overall structure is symmetrical. SCRnet consists of four blocks, namely a feature extraction block, a confidence estimation block, a contrastive learning block and a registration block. Firstly, we use mini-PointNet to extract coarse local and global features. Secondly, we propose the confidence estimation block, which formulates outlier rejection as a confidence estimation problem over keypoint correspondences. In addition, the local spatial features are encoded into the confidence estimation block, which makes the correspondences possess local spatial consistency. Moreover, we propose the contrastive learning block, which constructs positive point pairs and hard negative point pairs and uses a Point-Pair-InfoNCE contrastive loss to further remove hard outliers through global spatial consistency. Finally, the proposed registration block selects a set of matching points with high spatial consistency and uses these matching sets to calculate multiple transformations, and the best transformation is identified via initial alignment and the Iterative Closest Point (ICP) algorithm. Extensive experiments conducted on the KITTI and nuScenes datasets demonstrate the high accuracy and strong robustness of SCRnet on the point cloud registration task.
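As background for the final registration step, a rigid transform can be recovered from a set of matched points with the standard SVD-based Kabsch method. The sketch below assumes correspondences are already established and is not SCRnet's full pipeline.

```python
import numpy as np

def rigid_transform_from_correspondences(src, dst):
    """Least-squares rigid transform (R, t) mapping src -> dst, given
    matched point sets of shape (N, 3), via the SVD-based Kabsch method.
    """
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ S @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Sanity check with a known rotation about z and a translation.
theta = np.radians(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
src = np.random.randn(100, 3)
dst = src @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = rigid_transform_from_correspondences(src, dst)
print(np.allclose(R, R_true, atol=1e-6), np.round(t, 3))
```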
14

Nobis, Felix, Ehsan Shafiei, Phillip Karle, Johannes Betz, and Markus Lienkamp. "Radar Voxel Fusion for 3D Object Detection." Applied Sciences 11, no. 12 (June 17, 2021): 5598. http://dx.doi.org/10.3390/app11125598.

Abstract:
Automotive traffic scenes are complex due to the variety of possible scenarios, objects, and weather conditions that need to be handled. In contrast to more constrained environments, such as automated underground trains, automotive perception systems cannot be tailored to a narrow field of specific tasks but must handle an ever-changing environment with unforeseen events. As currently no single sensor is able to reliably perceive all relevant activity in the surroundings, sensor data fusion is applied to perceive as much information as possible. Fusing data from different sensors and sensor modalities at a low abstraction level makes it possible to compensate for sensor weaknesses and misdetections before the information-rich sensor data are compressed, and information thereby lost, by sensor-individual object detection. This paper develops a low-level sensor fusion network for 3D object detection, which fuses lidar, camera, and radar data. The fusion network is trained and evaluated on the nuScenes data set. On the test set, fusion of radar data increases the resulting AP (Average Precision) detection score by about 5.1% in comparison to the baseline lidar network. The radar sensor fusion proves especially beneficial in inclement conditions such as rain and night scenes. Fusing additional camera data contributes positively only in conjunction with the radar fusion, which shows that interdependencies of the sensors are important for the detection result. Additionally, the paper proposes a novel loss to handle the discontinuity of a simple yaw representation for object detection. Our updated loss increases the detection and orientation estimation performance for all sensor input configurations. The code for this research has been made available on GitHub.
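The yaw-discontinuity issue mentioned in the abstract is commonly handled by regressing a (sin, cos) encoding of the heading instead of the raw angle. The sketch below shows this common remedy only; it is not a reproduction of the paper's proposed loss.

```python
import numpy as np

def yaw_to_sincos(yaw):
    """Encode yaw as (sin, cos) so that headings near the -pi/pi wrap-around
    stay close in the regression target space."""
    return np.stack([np.sin(yaw), np.cos(yaw)], axis=-1)

def yaw_l1_loss(pred_sincos, target_yaw):
    """Plain L1 loss on the (sin, cos) encoding."""
    return np.abs(pred_sincos - yaw_to_sincos(target_yaw)).mean()

# Two nearly identical headings on opposite sides of the wrap-around.
a, b = np.pi - 0.01, -np.pi + 0.01
print(np.abs(a - b))                      # ~6.26 with a naive yaw difference
print(yaw_l1_loss(yaw_to_sincos(a), b))   # ~0.01 with the (sin, cos) encoding
```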
15

Feng, Yeli, Daniel Jun Xian Ng, and Arvind Easwaran. "Improving Variational Autoencoder based Out-of-Distribution Detection for Embedded Real-time Applications." ACM Transactions on Embedded Computing Systems 20, no. 5s (October 31, 2021): 1–26. http://dx.doi.org/10.1145/3477026.

Abstract:
Uncertainties in machine learning are a significant roadblock for its application in safety-critical cyber-physical systems (CPS). One source of uncertainty arises from distribution shifts in the input data between training and test scenarios. Detecting such distribution shifts in real-time is an emerging approach to address the challenge. The high dimensional input space in CPS applications involving imaging adds extra difficulty to the task. Generative learning models are widely adopted for the task, namely out-of-distribution (OoD) detection. To improve the state-of-the-art, we studied existing proposals from both machine learning and CPS fields. In the latter, safety monitoring in real-time for autonomous driving agents has been a focus. Exploiting the spatiotemporal correlation of motion in videos, we can robustly detect hazardous motion around autonomous driving agents. Inspired by the latest advances in the Variational Autoencoder (VAE) theory and practice, we tapped into the prior knowledge in data to further boost OoD detection’s robustness. Comparison studies over nuScenes and Synthia data sets show our methods significantly improve detection capabilities of OoD factors unique to driving scenarios, 42% better than state-of-the-art approaches. Our model also generalized near-perfectly, 97% better than the state-of-the-art across the real-world and simulation driving data sets experimented. Finally, we customized one proposed method into a twin-encoder model that can be deployed to resource limited embedded devices for real-time OoD detection. Its execution time was reduced over four times in low-precision 8-bit integer inference, while detection capability is comparable to its corresponding floating-point model.
16

Nobis, Felix, Felix Fent, Johannes Betz, and Markus Lienkamp. "Kernel Point Convolution LSTM Networks for Radar Point Cloud Segmentation." Applied Sciences 11, no. 6 (March 15, 2021): 2599. http://dx.doi.org/10.3390/app11062599.

Abstract:
State-of-the-art 3D object detection for autonomous driving is achieved by processing lidar sensor data with deep-learning methods. However, the detection quality of the state of the art is still far from enabling safe driving in all conditions. Additional sensor modalities need to be used to increase the confidence and robustness of the overall detection result. Researchers have recently explored radar data as an additional input source for universal 3D object detection. This paper proposes artificial neural network architectures to segment sparse radar point cloud data. Segmentation is an intermediate step towards radar object detection as a complementary concept to lidar object detection. Conceptually, we adapt Kernel Point Convolution (KPConv) layers for radar data. Additionally, we introduce a long short-term memory (LSTM) variant based on KPConv layers to make use of the information content in the time dimension of radar data. This is motivated by classical radar processing, where tracking of features over time is imperative to generate confident object proposals. We benchmark several variants of the network on the public nuScenes data set against a state-of-the-art pointnet-based approach. The performance of the networks is limited by the quality of the publicly available data. The radar data and radar-label quality is of great importance to the training and evaluation of machine learning models. Therefore, the advantages and disadvantages of the available data set, regarding its radar data, are discussed in detail. The need for a radar-focused data set for object detection is expressed. We assume that higher segmentation scores should be achievable with better-quality data for all models compared, and differences between the models should manifest more clearly. To facilitate research with additional radar data, the modular code for this research will be made available to the public.
17

Qi, Chunyang, Chuanxue Song, Naifu Zhang, Shixin Song, Xinyu Wang, and Feng Xiao. "Millimeter-Wave Radar and Vision Fusion Target Detection Algorithm Based on an Extended Network." Machines 10, no. 8 (August 10, 2022): 675. http://dx.doi.org/10.3390/machines10080675.

Abstract:
The need for a vehicle to perceive information about the external environment as an independent intelligent individual has grown with the progress of intelligent driving from primary driver assistance to high-level autonomous driving. The ability of a common independent sensing unit to sense the external environment is limited by the sensor’s own characteristics and the level of its algorithms. Hence, a common independent sensing unit fails to obtain comprehensive sensing information independently under conditions such as rain, fog, and night. Accordingly, an extended network-based fusion target detection algorithm for millimeter-wave radar and vision fusion is proposed in this work by combining the complementary perceptual performance of in-vehicle sensing elements, cost effectiveness, and maturity of independent detection technologies. Feature-level fusion is first used in this work according to the analysis of technical routes of the millimeter-wave radar and vision fusion. Training and test evaluation of the algorithm are carried out on the nuScenes dataset and test data from a homemade data acquisition platform. An extended investigation of the RetinaNet one-stage target detection algorithm based on the VGG-16+FPN backbone detection network is then conducted in this work to introduce millimeter-wave radar images as auxiliary information for visual image target detection. We use two-channel radar and three-channel visual images as inputs of the fusion network. We also propose an extended VGG-16 network applicable to millimeter-wave radar and visual fusion and an extended feature pyramid network. Test results showed that the mAP of the proposed network improves by 2.9% and the small-target accuracy is enhanced by 18.73% compared with those of the reference network for pure visual image target detection. This finding verified the detection capability and algorithmic feasibility of the proposed extended fusion target detection network for visually insensitive targets.
18

Khemmar, Redouane, Antoine Mauri, Camille Dulompont, Jayadeep Gajula, Vincent Vauchey, Madjid Haddad, and Rémi Boutteau. "Road and Railway Smart Mobility: A High-definition Ground Truth Hybrid Dataset." Sensors 22, no. 10 (May 22, 2022): 3922. http://dx.doi.org/10.3390/s22103922.

Abstract:
A robust visual understanding of complex urban environments using passive optical sensors is an onerous and essential task for autonomous navigation. The problem is heavily characterized by the quality of the available dataset and the number of instances it includes. Regardless of the benchmark results of perception algorithms, a model would only be reliable and capable of enhanced decision making if the dataset covers the exact domain of the end-use case. For this purpose, in order to improve the level of instances in datasets used for the training and validation of Autonomous Vehicles (AV), Advanced Driver Assistance Systems (ADAS), and autonomous driving, and to fill the void left by the absence of datasets in the context of railway smart mobility, we introduce our multimodal hybrid dataset called ESRORAD. ESRORAD comprises 34 videos, 2.7k virtual images, and 100k real images for both road and railway scenes collected in two Normandy towns, Rouen and Le Havre. All the images are annotated with 3D bounding boxes covering at least three different classes: persons, cars, and bicycles. Crucially, our dataset is the first of its kind, with uncompromised efforts on being the best in terms of large volume, abundance of annotation, and diversity of scenes. Our accompanying study provides an in-depth analysis of the dataset’s characteristics as well as a performance evaluation with various state-of-the-art models trained on other popular datasets, namely KITTI and nuScenes. Some examples of image annotations and the prediction results of our lightweight 3D object detection algorithms are available in the ESRORAD dataset. Finally, the dataset is available online. This repository consists of 52 datasets with their respective annotations.
19

Hu, Yihan, Zhuangzhuang Ding, Runzhou Ge, Wenxin Shao, Li Huang, Kun Li, and Qiang Liu. "AFDetV2: Rethinking the Necessity of the Second Stage for Object Detection from Point Clouds." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 969–79. http://dx.doi.org/10.1609/aaai.v36i1.19980.

Abstract:
There have been two streams in 3D detection from point clouds: single-stage methods and two-stage methods. While the former is more computationally efficient, the latter usually provides better detection accuracy. By carefully examining the two-stage approaches, we have found that if appropriately designed, the first stage can produce accurate box regression. In this scenario, the second stage mainly rescores the boxes such that the boxes with better localization get selected. From this observation, we have devised a single-stage anchor-free network that can fulfill these requirements. This network, named AFDetV2, extends the previous work by incorporating a self-calibrated convolution block in the backbone, a keypoint auxiliary supervision, and an IoU prediction branch in the multi-task head. We take a simple product of the predicted IoU score with the classification heatmap to form the final classification confidence. The enhanced backbone strengthens the box localization capability, and the rescoring approach effectively joins the object presence confidence and the box regression accuracy. As a result, the detection accuracy is drastically boosted in the single stage. To evaluate our approach, we have conducted extensive experiments on the Waymo Open Dataset and the nuScenes Dataset. We have observed that our AFDetV2 achieves state-of-the-art results on these two datasets, superior to all prior art, including both single-stage and two-stage 3D detectors. AFDetV2 won 1st place in the Real-Time 3D Detection of the Waymo Open Dataset Challenge 2021. In addition, a variant of our model, AFDetV2-Base, was named the "Most Efficient Model" by the Challenge Sponsor, showing superior computational efficiency. To demonstrate the generality of this single-stage method, we have also applied it to the first stage of the two-stage networks. Without exception, the results show that with the strengthened backbone and the rescoring approach, the second-stage refinement is no longer needed.
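The rescoring step quoted above, a product of the classification heatmap and the predicted IoU, is easy to illustrate. The optional alpha exponent below is a common generalisation and an assumption, not part of the abstract.

```python
import numpy as np

def rescore(cls_heatmap, iou_pred, alpha=1.0):
    """Combine classification confidence with predicted IoU.

    With alpha = 1 this is the plain product described in the abstract;
    alpha < 1 would down-weight the IoU term (a common generalisation).
    """
    return cls_heatmap * np.power(np.clip(iou_pred, 0.0, 1.0), alpha)

heat = np.array([0.9, 0.8, 0.6])     # per-box class confidence
iou = np.array([0.5, 0.9, 0.95])     # per-box predicted localization quality
print(rescore(heat, iou))            # [0.45, 0.72, 0.57] -> well-localised boxes rise
```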
20

Jiang, Kun, Yining Shi, Taohua Zhou, Mengmeng Yang, and Diange Yang. "PTMOT: A Probabilistic Multiple Object Tracker Enhanced by Tracklet Confidence for Autonomous Driving." Automotive Innovation, July 5, 2022. http://dx.doi.org/10.1007/s42154-022-00185-1.

Abstract:
Real driving scenarios, due to occlusions and disturbances, provide disordered and noisy measurements, which makes the task of multi-object tracking quite challenging. The conventional approach is to find a deterministic data association; however, it performs unstably under high clutter density. This paper proposes a novel probabilistic tracklet-enhanced multiple object tracker (PTMOT), which integrates a Poisson multi-Bernoulli mixture (PMBM) filter with tracklet confidence. The proposed method is able to realize efficient and robust probabilistic association for 3D multi-object tracking (MOT) and improve the PMBM filter’s continuity by smoothing single-target hypotheses with the global hypothesis. It consists of two key parts. First, the PMBM tracker based on sets of tracklets is implemented to realize probabilistic fusion of disordered measurements. Second, the confidence of tracklets is smoothed through a smoothing-while-filtering approach. Extensive MOT tests on the nuScenes tracking dataset demonstrate that the proposed method achieves superior performance in different modalities.
21

Yang, Biao, Jicheng Yang, Rongrong Ni, Changchun Yang, and Xiaofeng Liu. "Multi-granularity scenarios understanding network for trajectory prediction." Complex & Intelligent Systems, August 4, 2022. http://dx.doi.org/10.1007/s40747-022-00834-2.

Abstract:
Understanding agents’ motion behaviors in complex scenes is crucial for intelligent autonomous moving systems (such as delivery robots and self-driving cars). It is challenging due to the inherent uncertainty of future trajectories and the large variation in scene layout. However, most recent approaches ignore or underutilize the scenario information. In this work, a Multi-Granularity Scenarios Understanding framework, MGSU, is proposed to explore the scene layout at different granularities. MGSU can be divided into three modules: (1) A coarse-grained fusion module uses cross-attention to fuse the observed trajectory with the semantic information of the scene. (2) The inverse reinforcement learning module generates an optimal path strategy through grid-based policy sampling and outputs multiple scene paths. (3) The fine-grained fusion module integrates the observed trajectory with the scene paths to generate multiple future trajectories. To fully exploit the scene information and improve efficiency, we present a novel scene-fusion Transformer, whose encoder extracts scene features and whose decoder fuses scene and trajectory features to generate future trajectories. Compared with the current state-of-the-art methods, our method decreases the ADE errors by 4.3% and 3.3% on SDD and nuScenes, respectively, by gradually integrating scene information at different granularities. The visualized trajectories demonstrate that our method can accurately predict future trajectories after fusing scene information.
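For reference, the ADE metric cited above is the mean L2 distance between predicted and ground-truth positions over the prediction horizon, with FDE its end-point counterpart. The toy trajectory below is purely illustrative.

```python
import numpy as np

def average_displacement_error(pred, gt):
    """ADE: mean L2 distance between predicted and ground-truth positions
    over all future time steps (and, if batched, over all agents).
    pred, gt: (..., T, 2) arrays of xy positions.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

def final_displacement_error(pred, gt):
    """FDE: L2 distance at the last predicted time step."""
    return np.linalg.norm(pred[..., -1, :] - gt[..., -1, :], axis=-1).mean()

gt = np.cumsum(np.ones((12, 2)) * 0.5, axis=0)          # straight path, 12 steps
pred = gt + np.random.normal(scale=0.2, size=gt.shape)  # noisy prediction
print(average_displacement_error(pred, gt), final_displacement_error(pred, gt))
```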
22

Li, Yan, Kai Zeng, and Tao Shen. "CenterTransFuser: radar point cloud and visual information fusion for 3D object detection." EURASIP Journal on Advances in Signal Processing 2023, no. 1 (January 11, 2023). http://dx.doi.org/10.1186/s13634-022-00944-6.

Abstract:
Sensor fusion is an important component of the perception system in autonomous driving, and the fusion of radar point cloud information and camera visual information can improve the perception capability of autonomous vehicles. However, most existing studies ignore the extraction of local neighborhood information and only consider shallow fusion between the two modalities based on the extracted global information, which cannot achieve a deep fusion of interacting cross-modal contextual information. Meanwhile, in data preprocessing, the noise in radar data is usually filtered only by the depth information derived from image feature prediction; such methods affect the accuracy of the radar branch in generating regions of interest and cannot effectively filter out irrelevant radar points. This paper proposes the CenterTransFuser model, which makes full use of millimeter-wave radar point cloud information and visual information to enable cross-modal fusion of the two heterogeneous information sources. Specifically, a new interaction called cross-transformer is explored, which cooperatively exploits cross-modal cross-multiple attention and joint cross-multiple attention to mine complementary radar and image information. Meanwhile, an adaptive depth thresholding filtering method is designed to reduce the noise from radar modality-independent information projected onto the image. The CenterTransFuser model is evaluated on the challenging nuScenes dataset, and it achieves excellent performance. In particular, the detection accuracy is significantly improved for pedestrians, motorcycles, and bicycles, showing the superiority and effectiveness of the proposed model.
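The depth-consistency filtering idea referenced above can be sketched with a fixed threshold as follows. The paper's method is described as adaptive, so the constant 2 m margin, the array layout, and the toy depth map here are simplifying assumptions.

```python
import numpy as np

def filter_radar_by_depth(radar_uvd, depth_map, threshold=2.0):
    """Keep radar points whose projected depth agrees with an image-predicted
    depth map to within `threshold` metres.

    radar_uvd: (N, 3) array of (u, v, depth) radar points already projected
    into the image; depth_map: (H, W) predicted depth in metres.
    """
    u = radar_uvd[:, 0].astype(np.int32)
    v = radar_uvd[:, 1].astype(np.int32)
    h, w = depth_map.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    keep = np.zeros(len(radar_uvd), dtype=bool)
    keep[inside] = np.abs(radar_uvd[inside, 2] - depth_map[v[inside], u[inside]]) < threshold
    return radar_uvd[keep]

depth = np.full((900, 1600), 20.0)                       # toy constant-depth prediction
radar = np.array([[100.0, 450.0, 19.0],                  # consistent -> kept
                  [200.0, 450.0, 60.0]])                 # inconsistent -> filtered out
print(filter_radar_by_depth(radar, depth))
```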