Academic literature on the topic 'nuScenes'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'nuScenes.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "nuScenes"

1

Fong, Whye Kit, Rohit Mohan, Juana Valeria Hurtado, Lubing Zhou, Holger Caesar, Oscar Beijbom, and Abhinav Valada. "Panoptic nuScenes: A Large-Scale Benchmark for LiDAR Panoptic Segmentation and Tracking." IEEE Robotics and Automation Letters 7, no. 2 (April 2022): 3795–802. http://dx.doi.org/10.1109/lra.2022.3148457.

2

He, Qingdong, Hao Zeng, Yi Zeng, and Yijun Liu. "SCIR-Net: Structured Color Image Representation Based 3D Object Detection Network from Point Clouds." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 4 (June 28, 2022): 4486–94. http://dx.doi.org/10.1609/aaai.v36i4.20371.

Abstract:
3D object detection from point cloud data has become an indispensable part of autonomous driving. Previous works for processing point clouds rely on either projection or voxelization. However, projection-based methods suffer from information loss while voxelization-based methods bring huge computation. In this paper, we propose to encode point clouds into a structured color image representation (SCIR) and utilize a 2D CNN to fulfill the 3D detection task. Specifically, we use the structured color image encoding module to convert the irregular 3D point clouds into a squared 2D tensor image, where each point corresponds to a spatial point in the 3D space. Furthermore, in order to fit the Euclidean structure, we apply feature normalization to parameterize the 2D tensor image onto a regular dense color image. Then, we conduct repeated multi-scale fusion at different levels so as to augment the initial features and learn scale-aware feature representations for box prediction. Extensive experiments on the KITTI benchmark, the Waymo Open Dataset and the more challenging nuScenes dataset show that our proposed method yields decent results and demonstrate the effectiveness of such representations for point clouds.
3

Dao, Minh-Quan, and Vincent Frémont. "A Two-Stage Data Association Approach for 3D Multi-Object Tracking." Sensors 21, no. 9 (April 21, 2021): 2894. http://dx.doi.org/10.3390/s21092894.

Abstract:
Multi-Object Tracking (MOT) is an integral part of any autonomous driving pipeline because it produces trajectories of other moving objects in the scene and predicts their future motion. Thanks to the recent advances in 3D object detection enabled by deep learning, track-by-detection has become the dominant paradigm in 3D MOT. In this paradigm, a MOT system is essentially made of an object detector and a data association algorithm which establishes track-to-detection correspondence. While 3D object detection has been actively researched, association algorithms for 3D MOT have settled on bipartite matching formulated as a Linear Assignment Problem (LAP) and solved by the Hungarian algorithm. In this paper, we adapt a two-stage data association method which was successfully applied to image-based tracking to the 3D setting, thus providing an alternative for data association for 3D MOT. Our method outperforms the baseline using one-stage bipartite matching for data association by achieving 0.587 Average Multi-Object Tracking Accuracy (AMOTA) on the nuScenes validation set and 0.365 AMOTA (at level 2) on the Waymo test set.
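
The baseline mentioned here, one-stage bipartite matching solved as a Linear Assignment Problem, is straightforward to sketch. The snippet below is a generic illustration using SciPy's Hungarian solver with plain centre distance as the cost and a hypothetical gating threshold; it is not the affinity function or the two-stage scheme used in the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_centers, det_centers, gate=2.0):
    """Match existing tracks to new detections by solving a Linear
    Assignment Problem with the Hungarian algorithm.

    track_centers: (T, 3) predicted track positions
    det_centers:   (D, 3) detected object centres
    gate:          hypothetical distance threshold for rejecting matches
    """
    # Pairwise Euclidean distance serves as the assignment cost.
    cost = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)

    matches = []
    unmatched_tracks = set(range(len(track_centers)))
    unmatched_dets = set(range(len(det_centers)))
    for t, d in zip(rows, cols):
        if cost[t, d] <= gate:          # reject implausible pairs
            matches.append((t, d))
            unmatched_tracks.discard(t)
            unmatched_dets.discard(d)
    return matches, sorted(unmatched_tracks), sorted(unmatched_dets)

# Toy example: three tracks, two detections
tracks = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [5.0, 5.0, 0.0]])
dets = np.array([[0.3, 0.1, 0.0], [9.8, 0.2, 0.0]])
print(associate(tracks, dets))
```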
4

Wang, Jiarong, Ming Zhu, Bo Wang, Deyao Sun, Hua Wei, Changji Liu, and Haitao Nie. "KDA3D: Key-Point Densification and Multi-Attention Guidance for 3D Object Detection." Remote Sensing 12, no. 11 (June 11, 2020): 1895. http://dx.doi.org/10.3390/rs12111895.

Abstract:
In this paper, we propose a novel 3D object detector KDA3D, which achieves high-precision and robust classification, segmentation, and localization with the help of key-point densification and multi-attention guidance. The proposed end-to-end neural network architecture takes LIDAR point clouds as the main inputs that can be optionally complemented by RGB images. It consists of three parts: part-1 segments 3D foreground points and generates reliable proposals; part-2 (optional) enhances point cloud density and reconstructs the more compact full-point feature map; part-3 refines 3D bounding boxes and adds semantic segmentation as extra supervision. Our designed lightweight point-wise and channel-wise attention modules can adaptively strengthen the “skeleton” and “distinctiveness” point-features to help feature learning networks capture more representative or finer patterns. The proposed key-point densification component can generate pseudo-point clouds containing target information from monocular images through the distance preference strategy and K-means clustering so as to balance the density distribution and enrich sparse features. Extensive experiments on the KITTI and nuScenes 3D object detection benchmarks show that our KDA3D produces state-of-the-art results while running in near real-time with a low memory footprint.
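
The channel-wise attention mentioned in this abstract follows the familiar squeeze-and-excitation pattern; the sketch below is a generic PyTorch illustration of such a gating block over point-wise features, not the lightweight modules actually defined in KDA3D.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Generic channel-wise attention: re-weight feature channels with a
    gating vector computed from globally pooled statistics."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_points, channels) point-wise features
        weights = self.gate(x.mean(dim=1))      # (batch, channels)
        return x * weights.unsqueeze(1)         # broadcast over points

features = torch.randn(2, 1024, 64)             # toy point features
print(ChannelAttention(64)(features).shape)     # torch.Size([2, 1024, 64])
```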
5

Xiao, Yanqiu, Shiao Yin, Guangzhen Cui, Lei Yao, Zhanpeng Fang, and Weili Zhang. "A Near-Field Area Object Detection Method for Intelligent Vehicles Based on Multi-Sensor Information Fusion." World Electric Vehicle Journal 13, no. 9 (August 24, 2022): 160. http://dx.doi.org/10.3390/wevj13090160.

Abstract:
In order to solve the difficulty for intelligent vehicles in detecting near-field targets, this paper proposes a near-field object detection method based on multi-sensor information fusion. Firstly, the F-CenterFusion method is proposed to fuse the information from LiDAR, millimeter wave (mmWave) radar, and camera to fully obtain target state information in the near-field area. Secondly, multi-attention modules are constructed in the image and point cloud feature extraction networks, respectively, to locate the targets’ class-dependent features and suppress the expression of useless information. Then, the dynamic connection mechanism is used to fuse image and point cloud information to enhance feature expression capabilities. The fusion results are input into the predictive inference head network to obtain target attributes, locations, and other data. This method is verified by the nuScenes dataset. Compared with the CenterFusion method using mmWave radar and camera fusion information, the NDS and mAP values of our method are improved by 5.1% and 10.9%, respectively, and the average accuracy score of multi-class detection is improved by 22.7%. The experimental results show that the proposed method can enable intelligent vehicles to realize near-field target detection with high accuracy and strong robustness.
6

Zhao, Lin, Siyuan Xu, Liman Liu, Delie Ming, and Wenbing Tao. "SVASeg: Sparse Voxel-Based Attention for 3D LiDAR Point Cloud Semantic Segmentation." Remote Sensing 14, no. 18 (September 7, 2022): 4471. http://dx.doi.org/10.3390/rs14184471.

Abstract:
3D LiDAR has become an indispensable sensor in autonomous driving vehicles. In LiDAR-based 3D point cloud semantic segmentation, most voxel-based 3D segmentors cannot efficiently capture large amounts of context information, resulting in limited receptive fields and limiting their performance. To address this problem, a sparse voxel-based attention network is introduced for 3D LiDAR point cloud semantic segmentation, termed SVASeg, which captures large amounts of context information between voxels through sparse voxel-based multi-head attention (SMHA). The traditional multi-head attention cannot directly be applied to the non-empty sparse voxels. To this end, a hash table is built according to the incrementation of voxel coordinates to look up the non-empty neighboring voxels of each sparse voxel. Then, the sparse voxels are grouped into different groups, and each group corresponds to a local region. Afterwards, position embedding, multi-head attention and feature fusion are performed for each group to capture and aggregate the context information. Based on the SMHA module, the SVASeg can directly operate on the non-empty voxels, maintaining a comparable computational overhead to the convolutional method. Extensive experimental results on the SemanticKITTI and nuScenes datasets show the superiority of SVASeg.
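
The hash-table lookup of non-empty neighbouring voxels can be illustrated with an ordinary Python dictionary keyed on integer voxel coordinates. This is a simplified sketch of the grouping step only, not the SMHA module itself; the window radius and the toy coordinates are arbitrary.

```python
import numpy as np

def build_voxel_hash(voxel_coords):
    """Map integer voxel coordinates (N, 3) to their row index."""
    return {tuple(c): i for i, c in enumerate(voxel_coords.tolist())}

def neighbor_indices(voxel_coords, table, radius=1):
    """For each non-empty voxel, list indices of non-empty voxels inside a
    (2*radius+1)^3 window; empty cells are simply skipped."""
    offsets = [(dx, dy, dz)
               for dx in range(-radius, radius + 1)
               for dy in range(-radius, radius + 1)
               for dz in range(-radius, radius + 1)]
    groups = []
    for c in voxel_coords.tolist():
        group = [table[key] for o in offsets
                 if (key := (c[0] + o[0], c[1] + o[1], c[2] + o[2])) in table]
        groups.append(group)
    return groups

coords = np.array([[0, 0, 0], [0, 0, 1], [5, 5, 5]])   # toy sparse voxels
table = build_voxel_hash(coords)
print(neighbor_indices(coords, table))                  # [[0, 1], [0, 1], [2]]
```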
7

Grigorescu, Sorin, Cosmin Ginerica, Mihai Zaha, Gigel Macesanu, and Bogdan Trasnea. "LVD-NMPC: A learning-based vision dynamics approach to nonlinear model predictive control for autonomous vehicles." International Journal of Advanced Robotic Systems 18, no. 3 (May 1, 2021): 172988142110195. http://dx.doi.org/10.1177/17298814211019544.

Abstract:
In this article, we introduce a learning-based vision dynamics approach to nonlinear model predictive control (NMPC) for autonomous vehicles, coined learning-based vision dynamics (LVD) NMPC. LVD-NMPC uses an a-priori process model and a learned vision dynamics model used to calculate the dynamics of the driving scene, the controlled system’s desired state trajectory, and the weighting gains of the quadratic cost function optimized by a constrained predictive controller. The vision system is defined as a deep neural network designed to estimate the dynamics of the image scene. The input is based on historic sequences of sensory observations and vehicle states, integrated by an augmented memory component. Deep Q-learning is used to train the deep network, which once trained can also be used to calculate the desired trajectory of the vehicle. We evaluate LVD-NMPC against a baseline dynamic window approach (DWA) path planning executed using standard NMPC and against the PilotNet neural network. Performance is measured in our simulation environment GridSim, on a real-world 1:8 scaled model car as well as on a real size autonomous test vehicle and the nuScenes computer vision dataset.
8

Huang, Y., J. Zhou, B. Li, J. Xiao, and Y. Cao. "Roll-Sensitive Online Camera Orientation Determination on the Structured Road." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2022 (May 30, 2022): 687–93. http://dx.doi.org/10.5194/isprs-archives-xliii-b2-2022-687-2022.

Abstract:
Online camera calibration technology can estimate the pose of the onboard camera in real time, playing an important role in many fields such as HD map production and autonomous vehicles. Some researchers use one vanishing point (VP) to calculate the pitch and yaw angles of the onboard camera. However, this method assumes that the roll angle is zero, which is impractical because of the inevitable installation error. This paper proposes a novel online camera orientation determination method based on a longitudinal vanishing point without the zero-roll hypothesis. The orientation of the camera is determined in two steps: calculating the pitch and yaw angles according to vanishing point theory, and then obtaining the roll angle with a lane-width constraint, which is modeled as an optimization problem. To verify the effectiveness of our algorithm, we evaluated it on the nuScenes dataset. As a result, the rotation errors of the roll and pitch angles reach 0.154° and 0.116°, respectively. Also, we deployed our method on "Tuyou", an autonomous vehicle developed by Wuhan University, and then tested it on urban structured roads. Our proposed method can reconstruct the ground space accurately compared with previous methods that rely on the zero-roll hypothesis.
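
For context, the classic one-vanishing-point estimate that this paper extends (the zero-roll case it argues against) can be written directly from the camera intrinsics. The sketch below assumes a hypothetical intrinsic matrix and one common sign convention, so it illustrates that baseline rather than the proposed method.

```python
import numpy as np

def pitch_yaw_from_vp(vp_uv, K):
    """Estimate camera pitch and yaw (radians) from the longitudinal
    vanishing point, assuming zero roll.

    vp_uv: (u, v) pixel coordinates of the vanishing point
    K:     3x3 camera intrinsic matrix
    """
    # Back-project the VP: its viewing ray is parallel to the road direction.
    d = np.linalg.inv(K) @ np.array([vp_uv[0], vp_uv[1], 1.0])
    dx, dy, dz = d
    yaw = np.arctan2(dx, dz)                      # rotation about the vertical axis
    pitch = np.arctan2(-dy, np.hypot(dx, dz))     # elevation of the road direction
    return pitch, yaw

# Hypothetical intrinsics and a VP slightly left of and above the principal point
K = np.array([[1266.0, 0.0, 816.0],
              [0.0, 1266.0, 491.0],
              [0.0, 0.0, 1.0]])
pitch, yaw = pitch_yaw_from_vp((800.0, 470.0), K)
print(np.degrees(pitch), np.degrees(yaw))
```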
9

Koh, Junho, Jaekyum Kim, Jin Hyeok Yoo, Yecheol Kim, Dongsuk Kum, and Jun Won Choi. "Joint 3D Object Detection and Tracking Using Spatio-Temporal Representation of Camera Image and LiDAR Point Clouds." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 1210–18. http://dx.doi.org/10.1609/aaai.v36i1.20007.

Abstract:
In this paper, we propose a new joint object detection and tracking (JoDT) framework for 3D object detection and tracking based on camera and LiDAR sensors. The proposed method, referred to as 3D DetecTrack, enables the detector and tracker to cooperate to generate a spatio-temporal representation of the camera and LiDAR data, with which 3D object detection and tracking are then performed. The detector constructs the spatio-temporal features via the weighted temporal aggregation of the spatial features obtained by the camera and LiDAR fusion. Then, the detector reconfigures the initial detection results using information from the tracklets maintained up to the previous time step. Based on the spatio-temporal features generated by the detector, the tracker associates the detected objects with previously tracked objects using a graph neural network (GNN). We devise a fully-connected GNN facilitated by a combination of rule-based edge pruning and attention-based edge gating, which exploits both spatial and temporal object contexts to improve tracking performance. The experiments conducted on both KITTI and nuScenes benchmarks demonstrate that the proposed 3D DetecTrack achieves significant improvements in both detection and tracking performances over baseline methods and achieves state-of-the-art performance among existing methods through collaboration between the detector and tracker.
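
The rule-based edge pruning mentioned above amounts to keeping only plausible tracklet-detection pairs before the GNN sees them. The sketch below builds such an edge list with a hypothetical centre-distance threshold; it is a simplified stand-in for the gating used in 3D DetecTrack, not the authors' implementation.

```python
import numpy as np

def build_edges(tracklet_centers, det_centers, max_dist=3.0):
    """Return a (2, E) edge index connecting tracklets to detections whose
    centres are closer than max_dist; all other edges are pruned."""
    dist = np.linalg.norm(
        tracklet_centers[:, None, :] - det_centers[None, :, :], axis=-1)
    src, dst = np.nonzero(dist < max_dist)      # keep only plausible pairs
    return np.stack([src, dst])

tracklets = np.array([[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
detections = np.array([[0.5, 0.2, 0.0], [19.4, 0.1, 0.0], [50.0, 0.0, 0.0]])
print(build_edges(tracklets, detections))       # edges 0-0 and 1-1 survive
```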
10

Chen, Chen, Zhe Chen, Jing Zhang, and Dacheng Tao. "SASA: Semantics-Augmented Set Abstraction for Point-Based 3D Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 221–29. http://dx.doi.org/10.1609/aaai.v36i1.19897.

Abstract:
Although point-based networks are demonstrated to be accurate for 3D point cloud modeling, they are still falling behind their voxel-based competitors in 3D detection. We observe that the prevailing set abstraction design for down-sampling points may maintain too much unimportant background information that can affect feature learning for detecting objects. To tackle this issue, we propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA). Technically, we first add a binary segmentation module as the side output to help identify foreground points. Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling. In practice, SASA shows to be effective in identifying valuable points related to foreground objects and improving feature learning for point-based 3D detection. Additionally, it is an easy-to-plug-in module and able to boost various point-based detectors, including single-stage and two-stage ones. Extensive experiments on the popular KITTI and nuScenes datasets validate the superiority of SASA, lifting point-based detection models to reach comparable performance to state-of-the-art voxel-based methods. Code is available at https://github.com/blakechen97/SASA.
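
The semantics-guided sampling idea can be illustrated by down-sampling in proportion to predicted foreground scores. SASA's actual S-FPS variant folds these scores into farthest-point sampling, so the snippet below should be read purely as a simplified illustration.

```python
import numpy as np

def score_guided_sample(points, foreground_scores, num_samples, rng=None):
    """Down-sample a point cloud, preferring points with high foreground
    probability (simplified stand-in for semantics-guided sampling).

    points:            (N, 3) coordinates
    foreground_scores: (N,) scores in [0, 1] from a segmentation head
    """
    rng = rng or np.random.default_rng(0)
    probs = foreground_scores + 1e-6            # keep background reachable
    probs = probs / probs.sum()
    idx = rng.choice(len(points), size=num_samples, replace=False, p=probs)
    return points[idx], idx

pts = np.random.default_rng(1).uniform(-50, 50, size=(1000, 3))
scores = np.exp(-np.linalg.norm(pts, axis=1) / 10.0)   # toy "foreground" near origin
sampled, _ = score_guided_sample(pts, scores, num_samples=128)
print(sampled.shape)                                    # (128, 3)
```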

Conference papers on the topic "nuScenes"

1

Caesar, Holger, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. "nuScenes: A Multimodal Dataset for Autonomous Driving." In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020. http://dx.doi.org/10.1109/cvpr42600.2020.01164.
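
Since this dataset underpins every entry in this selection, a minimal loading example with the official nuscenes-devkit may be helpful; the dataroot path and the v1.0-mini split below are placeholder assumptions.

```python
# pip install nuscenes-devkit
from nuscenes.nuscenes import NuScenes

# Assumed local path and split; adjust to wherever the dataset was extracted.
nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)

# Walk the first scene: each sample is a keyframe with synchronised sensor data.
scene = nusc.scene[0]
sample = nusc.get('sample', scene['first_sample_token'])
lidar_path, boxes, _ = nusc.get_sample_data(sample['data']['LIDAR_TOP'])
print(scene['name'], lidar_path, len(boxes))
```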

2

Chen, Zehui, Zhenyu Li, Shiquan Zhang, Liangji Fang, Qinhong Jiang, Feng Zhao, Bolei Zhou, and Hang Zhao. "AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/116.

Abstract:
Object detection through either RGB images or LiDAR point clouds has been extensively explored in autonomous driving. However, it remains challenging to make these two data sources complementary and beneficial to each other. In this paper, we propose AutoAlign, an automatic feature fusion strategy for 3D object detection. Instead of establishing deterministic correspondence with the camera projection matrix, we model the mapping relationship between the image and point clouds with a learnable alignment map. This map enables our model to automate the alignment of non-homogeneous features in a dynamic and data-driven manner. Specifically, a cross-attention feature alignment module is devised to adaptively aggregate pixel-level image features for each voxel. To enhance the semantic consistency during feature alignment, we also design a self-supervised cross-modal feature interaction module, through which the model can learn feature aggregation with instance-level feature guidance. Extensive experimental results show that our approach can lead to 2.3 mAP and 7.0 mAP improvements on the KITTI and nuScenes datasets, respectively. Notably, our best model reaches 70.9 NDS on the nuScenes testing leaderboard, achieving competitive performance among various state-of-the-art methods.
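
The cross-attention feature alignment described here, with voxel features querying image features, follows the standard attention pattern. The toy sketch below uses made-up tensor shapes, a single head, and freshly initialised projections, so it only shows the mechanism, not AutoAlign's module.

```python
import torch
import torch.nn.functional as F

def cross_attention(voxel_feats, image_feats, dim=64):
    """Each voxel (query) aggregates pixel features (keys/values) with
    attention weights derived from feature similarity."""
    q_proj = torch.nn.Linear(voxel_feats.shape[-1], dim)
    k_proj = torch.nn.Linear(image_feats.shape[-1], dim)
    v_proj = torch.nn.Linear(image_feats.shape[-1], dim)

    q = q_proj(voxel_feats)                          # (num_voxels, dim)
    k = k_proj(image_feats)                          # (num_pixels, dim)
    v = v_proj(image_feats)
    attn = F.softmax(q @ k.T / dim ** 0.5, dim=-1)   # (num_voxels, num_pixels)
    return attn @ v                                  # (num_voxels, dim)

voxels = torch.randn(200, 128)      # toy voxel features
pixels = torch.randn(1000, 256)     # toy flattened image features
print(cross_attention(voxels, pixels).shape)   # torch.Size([200, 64])
```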
3

Li, Xiaoyan, Gang Zhang, Tao Jiang, Xufen Cai, and Zhenhua Wang. "PRNet: Point-Range Fusion Network for Real-Time LiDAR Semantic Segmentation." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/156.

Abstract:
Accurate and real-time LiDAR semantic segmentation is necessary for advanced autonomous driving systems. To guarantee a fast inference speed, previous methods utilize highly optimized 2D convolutions to extract features on the range view (RV), which is the most compact representation of the LiDAR point clouds. However, these methods often suffer from lower accuracy for two reasons: 1) the information loss during the projection from 3D points to the RV, and 2) the semantic ambiguity when 3D point labels are assigned according to the RV predictions. In this work, we introduce an end-to-end point-range fusion network (PRNet) that extracts semantic features mainly on the RV and iteratively fuses the RV features back to the 3D points for the final prediction. Besides, a novel range view projection (RVP) operation is designed to alleviate the information loss during the projection to the RV, and a point-range convolution (PRConv) is proposed to automatically mitigate the semantic ambiguity when transmitting features from the RV back to 3D points. Experiments on the SemanticKITTI and nuScenes benchmarks demonstrate that the PRNet pushes the range-based methods to a new state-of-the-art, and achieves a better speed-accuracy trade-off.
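
The range view that PRNet starts from is obtained by a spherical projection of the point cloud; a plain version of that projection is sketched below. The field-of-view limits and image size are arbitrary placeholders rather than the settings used in the paper.

```python
import numpy as np

def project_to_range_view(points, height=64, width=1024, fov_up=10.0, fov_down=-30.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image.
    Each pixel stores the range of the point that falls into it."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8

    yaw = np.arctan2(y, x)                           # azimuth in [-pi, pi]
    pitch = np.arcsin(z / r)                         # elevation

    fov_up_r, fov_down_r = np.radians(fov_up), np.radians(fov_down)
    u = 0.5 * (1.0 - yaw / np.pi) * width                         # column from azimuth
    v = (fov_up_r - pitch) / (fov_up_r - fov_down_r) * height     # row from elevation

    u = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    range_image = np.zeros((height, width), dtype=np.float32)
    range_image[v, u] = r                            # later points overwrite earlier ones
    return range_image

cloud = np.random.default_rng(0).uniform(-40, 40, size=(5000, 3))
print(project_to_range_view(cloud).shape)            # (64, 1024)
```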
4

Azarchenkov, Andrey, and Maksim Lyubimov. "Multi-agent Approach to Predict the Trajectory of Road Infrastructure Agents Using a Convolutional Neural Network." In 31st International Conference on Computer Graphics and Vision. Keldysh Institute of Applied Mathematics, 2021. http://dx.doi.org/10.20948/graphicon-2021-3027-954-961.

Abstract:
The problem of creating a fully autonomous vehicle is one of the most urgent in the field of artificial intelligence. Many companies claim to sell such cars for certain operating conditions. The task of interacting with other road users is to detect them, determine their physical properties, and predict their future states. The result of this prediction is the trajectory of road users' movement over a given period of time in the near future. Based on such trajectories, the planning system determines the behavior of the autonomous vehicle. This paper demonstrates a multi-agent method for determining the trajectories of road users by means of a road map of the surrounding area, using convolutional neural networks. In addition, the neural network receives as input an agent state vector containing additional information about the object. A number of experiments are conducted on the selected neural architecture in order to assess how its modifications affect the prediction result. The results are evaluated using metrics showing the spatial deviation of the predicted trajectory. The method is trained using the nuScenes test dataset obtained from the LGSVL simulator.
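
The spatial-deviation metrics referred to here are typically the average and final displacement errors (ADE/FDE); the snippet below shows that computation on toy trajectories, without claiming it is the exact metric reported in the paper.

```python
import numpy as np

def displacement_errors(pred, gt):
    """ADE: mean L2 distance over all timesteps; FDE: L2 distance at the
    final timestep. Both trajectories have shape (T, 2)."""
    per_step = np.linalg.norm(pred - gt, axis=1)
    return per_step.mean(), per_step[-1]

gt = np.stack([np.arange(6, dtype=float), np.zeros(6)], axis=1)   # straight-line ground truth
pred = gt + np.array([0.0, 0.5])                                   # constant lateral offset
ade, fde = displacement_errors(pred, gt)
print(ade, fde)   # 0.5 0.5
```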
