To see the other types of publications on this topic, follow the link: 3D semantic scene completion.

Journal articles on the topic '3D semantic scene completion'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic '3D semantic scene completion.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Luo, Shoutong, Zhengxing Sun, Yunhan Sun, and Yi Wang. "Resolution‐switchable 3D Semantic Scene Completion." Computer Graphics Forum 41, no. 7 (October 2022): 121–30. http://dx.doi.org/10.1111/cgf.14662.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Tang, Jiaxiang, Xiaokang Chen, Jingbo Wang, and Gang Zeng. "Not All Voxels Are Equal: Semantic Scene Completion from the Point-Voxel Perspective." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2352–60. http://dx.doi.org/10.1609/aaai.v36i2.20134.

Full text
Abstract:
We revisit Semantic Scene Completion (SSC), a useful task to predict the semantic and occupancy representation of 3D scenes, in this paper. A number of methods for this task are always based on voxelized scene representations. Although voxel representations keep local structures of the scene, these methods suffer from heavy computation redundancy due to the existence of visible empty voxels when the network goes deeper. To address this dilemma, we propose our novel point-voxel aggregation network for this task. We first transfer the voxelized scenes to point clouds by removing these visible empty voxels and adopt a deep point stream to capture semantic information from the scene efficiently. Meanwhile, a light-weight voxel stream containing only two 3D convolution layers preserves local structures of the voxelized scenes. Furthermore, we design an anisotropic voxel aggregation operator to fuse the structure details from the voxel stream into the point stream, and a semantic-aware propagation module to enhance the up-sampling process in the point stream by semantic labels. We demonstrate that our model surpasses state-of-the-arts on two benchmarks by a large margin, with only the depth images as input.
APA, Harvard, Vancouver, ISO, and other styles
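The preprocessing step this abstract describes, dropping the visible empty voxels and treating the remaining occupied cells as a point cloud, can be illustrated with a minimal sketch. This is not the authors' code; the grid shape, voxel size, and origin below are assumptions for illustration only.

```python
import numpy as np

def occupied_voxels_to_points(occupancy, voxel_size=0.08, origin=(0.0, 0.0, 0.0)):
    """Drop empty voxels and return the centres of occupied cells as an (N, 3) point cloud."""
    idx = np.argwhere(occupancy > 0)                      # integer indices of occupied voxels
    return (idx + 0.5) * voxel_size + np.asarray(origin)  # convert to metric coordinates

# Toy 60x36x60 grid with a few occupied cells.
grid = np.zeros((60, 36, 60), dtype=np.uint8)
grid[10:12, 5:7, 20:22] = 1
points = occupied_voxels_to_points(grid)
print(points.shape)  # (8, 3)
```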
3

Behley, Jens, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Jürgen Gall, and Cyrill Stachniss. "Towards 3D LiDAR-based semantic scene understanding of 3D point cloud sequences: The SemanticKITTI Dataset." International Journal of Robotics Research 40, no. 8-9 (April 20, 2021): 959–67. http://dx.doi.org/10.1177/02783649211006735.

Full text
Abstract:
A holistic semantic scene understanding exploiting all available sensor modalities is a core capability to master self-driving in complex everyday traffic. To this end, we present the SemanticKITTI dataset that provides point-wise semantic annotations of Velodyne HDL-64E point clouds of the KITTI Odometry Benchmark. Together with the data, we also published three benchmark tasks for semantic scene understanding covering different aspects of semantic scene understanding: (1) semantic segmentation for point-wise classification using single or multiple point clouds as input; (2) semantic scene completion for predictive reasoning on the semantics and occluded regions; and (3) panoptic segmentation combining point-wise classification and assigning individual instance identities to separate objects of the same class. In this article, we provide details on our dataset showing an unprecedented number of fully annotated point cloud sequences, more information on our labeling process to efficiently annotate such a vast amount of point clouds, and lessons learned in this process. The dataset and resources are available at http://www.semantic-kitti.org .
APA, Harvard, Vancouver, ISO, and other styles
4

Xu, Jinfeng, Xianzhi Li, Yuan Tang, Qiao Yu, Yixue Hao, Long Hu, and Min Chen. "CasFusionNet: A Cascaded Network for Point Cloud Semantic Scene Completion by Dense Feature Fusion." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 3 (June 26, 2023): 3018–26. http://dx.doi.org/10.1609/aaai.v37i3.25405.

Full text
Abstract:
Semantic scene completion (SSC) aims to complete a partial 3D scene and predict its semantics simultaneously. Most existing works adopt the voxel representations, thus suffering from the growth of memory and computation cost as the voxel resolution increases. Though a few works attempt to solve SSC from the perspective of 3D point clouds, they have not fully exploited the correlation and complementarity between the two tasks of scene completion and semantic segmentation. In our work, we present CasFusionNet, a novel cascaded network for point cloud semantic scene completion by dense feature fusion. Specifically, we design (i) a global completion module (GCM) to produce an upsampled and completed but coarse point set, (ii) a semantic segmentation module (SSM) to predict the per-point semantic labels of the completed points generated by GCM, and (iii) a local refinement module (LRM) to further refine the coarse completed points and the associated labels from a local perspective. We organize the above three modules via dense feature fusion in each level, and cascade a total of four levels, where we also employ feature fusion between each level for sufficient information usage. Both quantitative and qualitative results on our compiled two point-based datasets validate the effectiveness and superiority of our CasFusionNet compared to state-of-the-art methods in terms of both scene completion and semantic segmentation. The codes and datasets are available at: https://github.com/JinfengX/CasFusionNet.
APA, Harvard, Vancouver, ISO, and other styles
5

Li, Siqi, Changqing Zou, Yipeng Li, Xibin Zhao, and Yue Gao. "Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11402–9. http://dx.doi.org/10.1609/aaai.v34i07.6803.

Full text
Abstract:
This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Compared with previous methods which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously via leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as the reliable depth cues in spatial dimension. It is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset and the results show that our method respectively achieves the gains of 2.5% and 2.6% on the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset against the state-of-the-art method.
APA, Harvard, Vancouver, ISO, and other styles
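As a rough illustration of the kind of residual attention block mentioned above, the following is a minimal, hypothetical 3D block with a squeeze-and-excitation style channel gate; the layer sizes and gating design are assumptions, not the AMFNet architecture.

```python
import torch
import torch.nn as nn

class ResidualAttentionBlock3D(nn.Module):
    """3D residual block with a channel-attention gate (illustrative sketch)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1), nn.BatchNorm3d(channels), nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1), nn.BatchNorm3d(channels),
        )
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv3d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.conv(x)
        return torch.relu(x + y * self.attn(y))  # residual path gated by channel attention

block = ResidualAttentionBlock3D(32)
out = block(torch.randn(1, 32, 16, 16, 16))
print(out.shape)  # torch.Size([1, 32, 16, 16, 16])
```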
6

Wang, Yu, and Chao Tong. "H2GFormer: Horizontal-to-Global Voxel Transformer for 3D Semantic Scene Completion." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 5722–30. http://dx.doi.org/10.1609/aaai.v38i6.28384.

Full text
Abstract:
3D Semantic Scene Completion (SSC) has emerged as a novel task in vision-based holistic 3D scene understanding. Its objective is to densely predict the occupancy and category of each voxel in a 3D scene based on input from either LiDAR or images. Currently, many transformer-based semantic scene completion frameworks employ simple yet popular Cross-Attention and Self-Attention mechanisms to integrate and infer dense geometric and semantic information of voxels. However, they overlook the distinctions among voxels in the scene, especially in outdoor scenarios where the horizontal direction contains more variations. And voxels located at object boundaries and within the interior of objects exhibit varying levels of positional significance. To address this issue, we propose a transformer-based SSC framework called H2GFormer that incorporates a horizontal-to-global approach. This framework takes into full consideration the variations of voxels in the horizontal direction and the characteristics of voxels on object boundaries. We introduce a horizontal window-to-global attention (W2G) module that effectively fuses semantic information by first diffusing it horizontally from reliably visible voxels and then propagating the semantic understanding to global voxels, ensuring a more reliable fusion of semantic-aware features. Moreover, an Internal-External Position Awareness Loss (IoE-PALoss) is utilized during network training to emphasize the critical positions within the transition regions between objects. The experiments conducted on the SemanticKITTI dataset demonstrate that H2GFormer exhibits superior performance in both geometric and semantic completion tasks. Our code is available on https://github.com/Ryanwy1/H2GFormer.
APA, Harvard, Vancouver, ISO, and other styles
7

Wang, Xuzhi, Di Lin, and Liang Wan. "FFNet: Frequency Fusion Network for Semantic Scene Completion." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2550–57. http://dx.doi.org/10.1609/aaai.v36i3.20156.

Full text
Abstract:
Semantic scene completion (SSC) requires the estimation of the 3D geometric occupancies of objects in the scene, along with the object categories. Currently, many methods employ RGB-D images to capture the geometric and semantic information of objects. These methods use simple but popular spatial- and channel-wise operations, which fuse the information of RGB and depth data. Yet, they ignore the large discrepancy of RGB-D data and the uncertainty measurements of depth data. To solve this problem, we propose the Frequency Fusion Network (FFNet), a novel method for boosting semantic scene completion by better utilizing RGB-D data. FFNet explicitly correlates the RGB-D data in the frequency domain, different from the features directly extracted by the convolution operation. Then, the network uses the correlated information to guide the feature learning from the RGB and depth images, respectively. Moreover, FFNet accounts for the properties of different frequency components of RGB-D features. It has a learnable elliptical mask to decompose the features learned from the RGB and depth images, attending to various frequencies to facilitate the correlation process of RGB-D data. We evaluate FFNet intensively on the public SSC benchmarks, where FFNet surpasses the state-of-the-art methods. The code package of FFNet is available at https://github.com/alanWXZ/FFNet.
APA, Harvard, Vancouver, ISO, and other styles
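A minimal sketch of correlating RGB and depth feature maps in the frequency domain is shown below, using a normalized cross-power spectrum computed with torch.fft. It is a generic illustration of the idea, not FFNet's actual correlation operator or its learnable elliptical mask.

```python
import torch

def frequency_correlation(rgb_feat, depth_feat, eps=1e-8):
    """Correlate two (B, C, H, W) feature maps in the frequency domain."""
    R = torch.fft.rfft2(rgb_feat)                 # complex spectra of the RGB features
    D = torch.fft.rfft2(depth_feat)               # complex spectra of the depth features
    cross = R * torch.conj(D)                     # cross-power spectrum
    cross = cross / (cross.abs() + eps)           # keep phase, normalise magnitude
    return torch.fft.irfft2(cross, s=rgb_feat.shape[-2:])

rgb = torch.randn(2, 32, 60, 80)
depth = torch.randn(2, 32, 60, 80)
print(frequency_correlation(rgb, depth).shape)    # torch.Size([2, 32, 60, 80])
```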
8

Shan, Y., Y. Xia, Y. Chen, and D. Cremers. "SCP: SCENE COMPLETION PRE-TRAINING FOR 3D OBJECT DETECTION." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-1/W2-2023 (December 13, 2023): 41–46. http://dx.doi.org/10.5194/isprs-archives-xlviii-1-w2-2023-41-2023.

Full text
Abstract:
Abstract. 3D object detection using LiDAR point clouds is a fundamental task in the fields of computer vision, robotics, and autonomous driving. However, existing 3D detectors heavily rely on annotated datasets, which are both time-consuming and prone to errors during the process of labeling 3D bounding boxes. In this paper, we propose a Scene Completion Pre-training (SCP) method to enhance the performance of 3D object detectors with less labeled data. SCP offers three key advantages: (1) Improved initialization of the point cloud model. By completing the scene point clouds, SCP effectively captures the spatial and semantic relationships among objects within urban environments. (2) Elimination of the need for additional datasets. SCP serves as a valuable auxiliary network that does not impose any additional efforts or data requirements on the 3D detectors. (3) Reduction of the amount of labeled data for detection. With the help of SCP, the existing state-of-the-art 3D detectors can achieve comparable performance while only relying on 20% labeled data.
APA, Harvard, Vancouver, ISO, and other styles
9

Ding, Junzhe, Jin Zhang, Luqin Ye, and Cheng Wu. "Kalman-Based Scene Flow Estimation for Point Cloud Densification and 3D Object Detection in Dynamic Scenes." Sensors 24, no. 3 (January 31, 2024): 916. http://dx.doi.org/10.3390/s24030916.

Full text
Abstract:
Point cloud densification is essential for understanding the 3D environment. It provides crucial structural and semantic information for downstream tasks such as 3D object detection and tracking. However, existing registration-based methods struggle with dynamic targets due to the incompleteness and deformation of point clouds. To address this challenge, we propose a Kalman-based scene flow estimation method for point cloud densification and 3D object detection in dynamic scenes. Our method effectively tackles the issue of localization errors in scene flow estimation and enhances the accuracy and precision of shape completion. Specifically, we introduce a Kalman filter to correct the dynamic target’s position while estimating long sequence scene flow. This approach helps eliminate the cumulative localization error during the scene flow estimation process. Extended experiments on the KITTI 3D tracking dataset demonstrate that our method significantly improves the performance of LiDAR-only detectors, achieving superior results compared to the baselines.
APA, Harvard, Vancouver, ISO, and other styles
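The Kalman-based position correction the abstract relies on can be sketched with a standard constant-velocity filter; the state layout and noise values below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def kalman_step(x, P, z, dt=0.1, q=1e-2, r=1e-1):
    """One predict/update cycle of a constant-velocity Kalman filter on a 3D position.
    State x = [px, py, pz, vx, vy, vz]; z is the observed 3D position."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                    # position += velocity * dt
    H = np.zeros((3, 6)); H[:, :3] = np.eye(3)    # we only observe position
    Q = q * np.eye(6); R = r * np.eye(3)

    x = F @ x                                     # predict
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                           # update
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

x, P = np.zeros(6), np.eye(6)
x, P = kalman_step(x, P, z=np.array([1.0, 0.5, 0.0]))
print(x[:3])  # corrected position estimate
```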
10

Park, Sang-Min, and Jong-Eun Ha. "3D Semantic Scene Completion With Multi-scale Feature Maps and Masked Autoencoder." Journal of Institute of Control, Robotics and Systems 29, no. 12 (December 31, 2023): 966–72. http://dx.doi.org/10.5302/j.icros.2023.23.0143.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Li, Bohan, Yasheng Sun, Jingxin Dong, Zheng Zhu, Jinming Liu, Xin Jin, and Wenjun Zeng. "One at a Time: Progressive Multi-Step Volumetric Probability Learning for Reliable 3D Scene Perception." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 4 (March 24, 2024): 3028–36. http://dx.doi.org/10.1609/aaai.v38i4.28085.

Full text
Abstract:
Numerous studies have investigated the pivotal role of reliable 3D volume representation in scene perception tasks, such as multi-view stereo (MVS) and semantic scene completion (SSC). They typically construct 3D probability volumes directly with geometric correspondence, attempting to fully address the scene perception tasks in a single forward pass. However, such a single-step solution makes it hard to learn accurate and convincing volumetric probability, especially in challenging regions like unexpected occlusions and complicated light reflections. Therefore, this paper proposes to decompose the complicated 3D volume representation learning into a sequence of generative steps to facilitate fine and reliable scene perception. Considering the recent advances achieved by strong generative diffusion models, we introduce a multi-step learning framework, dubbed as VPD, dedicated to progressively refining the Volumetric Probability in a Diffusion process. Specifically, we first build a coarse probability volume from input images with the off-the-shelf scene perception baselines, which is then conditioned as the basic geometry prior before being fed into a 3D diffusion UNet, to progressively achieve accurate probability distribution modeling. To handle the corner cases in challenging areas, a Confidence-Aware Contextual Collaboration (CACC) module is developed to correct the uncertain regions for reliable volumetric learning based on multi-scale contextual contents. Moreover, an Online Filtering (OF) strategy is designed to maintain representation consistency for stable diffusion sampling. Extensive experiments are conducted on scene perception tasks including multi-view stereo (MVS) and semantic scene completion (SSC), to validate the efficacy of our method in learning reliable volumetric representations. Notably, for the SSC task, our work stands out as the first to surpass LiDAR-based methods on the SemanticKITTI dataset.
APA, Harvard, Vancouver, ISO, and other styles
12

Yan, Xu, Jiantao Gao, Jie Li, Ruimao Zhang, Zhen Li, Rui Huang, and Shuguang Cui. "Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 3101–9. http://dx.doi.org/10.1609/aaai.v35i4.16419.

Full text
Abstract:
LiDAR point cloud analysis is a core task for 3D computer vision, especially for autonomous driving. However, due to the severe sparsity and noise interference in the single sweep LiDAR point cloud, the accurate semantic segmentation is non-trivial to achieve. In this paper, we propose a novel sparse LiDAR point cloud semantic segmentation framework assisted by learned contextual shape priors. In practice, an initial semantic segmentation (SS) of a single sweep point cloud can be achieved by any appealing network and then flows into the semantic scene completion (SSC) module as the input. By merging multiple frames in the LiDAR sequence as supervision, the optimized SSC module has learned the contextual shape priors from sequential LiDAR data, completing the sparse single sweep point cloud to the dense one. Thus, it inherently improves SS optimization through fully end-to-end training. Besides, a Point-Voxel Interaction (PVI) module is proposed to further enhance the knowledge fusion between SS and SSC tasks, i.e., promoting the interaction of incomplete local geometry of point cloud and complete voxel-wise global structure. Furthermore, the auxiliary SSC and PVI modules can be discarded during inference without extra burden for SS. Extensive experiments confirm that our JS3C-Net achieves superior performance on both SemanticKITTI and SemanticPOSS benchmarks, i.e., 4% and 3% improvement correspondingly.
APA, Harvard, Vancouver, ISO, and other styles
13

Deschaud, Jean-Emmanuel, David Duque, Jean Pierre Richa, Santiago Velasco-Forero, Beatriz Marcotegui, and François Goulette. "Paris-CARLA-3D: A Real and Synthetic Outdoor Point Cloud Dataset for Challenging Tasks in 3D Mapping." Remote Sensing 13, no. 22 (November 21, 2021): 4713. http://dx.doi.org/10.3390/rs13224713.

Full text
Abstract:
Paris-CARLA-3D is a dataset of several dense colored point clouds of outdoor environments built by a mobile LiDAR and camera system. The data are composed of two sets with synthetic data from the open source CARLA simulator (700 million points) and real data acquired in the city of Paris (60 million points), hence the name Paris-CARLA-3D. One of the advantages of this dataset is to have simulated the same LiDAR and camera platform in the open source CARLA simulator as the one used to produce the real data. In addition, manual annotation of the classes using the semantic tags of CARLA was performed on the real data, allowing the testing of transfer methods from the synthetic to the real data. The objective of this dataset is to provide a challenging dataset to evaluate and improve methods on difficult vision tasks for the 3D mapping of outdoor environments: semantic segmentation, instance segmentation, and scene completion. For each task, we describe the evaluation protocol as well as the experiments carried out to establish a baseline.
APA, Harvard, Vancouver, ISO, and other styles
14

Jarvis, R. A. "3D Shape and surface colour sensor fusion for robot vision." Robotica 10, no. 5 (September 1992): 389–96. http://dx.doi.org/10.1017/s0263574700010596.

Full text
Abstract:
This paper argues the case for extracting as complete a set of sensory data as practicable from scenes consisting of complex assemblages of objects with the goal of completing the task of scene analysis, including placement, pose, identity and relationship amongst the components in a robust manner which supports goal directed robotic action, including collision-free trajectory planning, grip site location and manipulation of selected object classes. The emphasis of the paper is that of sensor fusion of range and surface colour data including preliminary results in proximity, surface normal directionality and colour based scene segmentation through semantic-free clustering processes. The larger context is that of imbedding the results of such analysis in a graphics world containing an articulated robotic manipulator and of carrying out experiments in that world prior to replication of safe manipulation sequences in the real world.
APA, Harvard, Vancouver, ISO, and other styles
15

Hu, Shengshan, Junwei Zhang, Wei Liu, Junhui Hou, Minghui Li, Leo Yu Zhang, Hai Jin, and Lichao Sun. "PointCA: Evaluating the Robustness of 3D Point Cloud Completion Models against Adversarial Examples." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 872–80. http://dx.doi.org/10.1609/aaai.v37i1.25166.

Full text
Abstract:
Point cloud completion, as the upstream procedure of 3D recognition and segmentation, has become an essential part of many tasks such as navigation and scene understanding. While various point cloud completion models have demonstrated their powerful capabilities, their robustness against adversarial attacks, which have been proven to be fatally malicious towards deep neural networks, remains unknown. In addition, existing attack approaches towards point cloud classifiers cannot be applied to the completion models due to different output forms and attack purposes. In order to evaluate the robustness of the completion models, we propose PointCA, the first adversarial attack against 3D point cloud completion models. PointCA can generate adversarial point clouds that maintain high similarity with the original ones, while being completed as another object with totally different semantic information. Specifically, we minimize the representation discrepancy between the adversarial example and the target point set to jointly explore the adversarial point clouds in the geometry space and the feature space. Furthermore, to launch a stealthier attack, we innovatively employ the neighbourhood density information to tailor the perturbation constraint, leading to geometry-aware and distribution-adaptive modifications for each point. Extensive experiments against different premier point cloud completion networks show that PointCA can cause the performance degradation from 77.9% to 16.7%, with the structure chamfer distance kept below 0.01. We conclude that existing completion models are severely vulnerable to adversarial examples, and state-of-the-art defenses for point cloud classification will be partially invalid when applied to incomplete and uneven point cloud data.
APA, Harvard, Vancouver, ISO, and other styles
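The "structure chamfer distance kept below 0.01" reported above refers to a standard point-set metric. A minimal sketch of the symmetric Chamfer distance follows, using torch.cdist for the pairwise distances; this is illustrative only, and the paper's exact structure chamfer distance may differ.

```python
import torch

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3)."""
    d = torch.cdist(a, b)                                  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

a = torch.rand(1024, 3)
b = torch.rand(2048, 3)
print(chamfer_distance(a, b).item())
```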
16

Mahmoud, Mostafa, Wu Chen, Yang Yang, Tianxia Liu, and Yaxin Li. "Leveraging Deep Learning for Automated Reconstruction of Indoor Unstructured Elements in Scan-to-BIM." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-1-2024 (May 10, 2024): 479–86. http://dx.doi.org/10.5194/isprs-archives-xlviii-1-2024-479-2024.

Full text
Abstract:
Abstract. Achieving automatic 3D reconstruction for indoor scenes is extremely useful in the field of scene understanding. Building information modeling (BIM) models are essential for lowering project costs, assisting in building planning and renovations, as well as improving building management efficiency. However, nearly all current available scan-to-BIM approaches employ manual or semi-automatic methods. These approaches concentrate solely on significant structured objects, neglecting other unstructured elements such as furniture. The limitation arises from challenges in modeling incomplete point clouds of obstructed objects and capturing indoor scene details. Therefore, this research introduces an innovative and effective reconstruction framework based on deep learning semantic segmentation and model-driven techniques to address these limitations. The proposed framework utilizes wall segment recognition, feature extraction, opening detection, and automatic modeling to reconstruct 3D structured models of point clouds with different room layouts in both Manhattan and non-Manhattan architectures. Moreover, it provides 3D BIM models of actual unstructured elements by detecting objects, completing point clouds, establishing bounding boxes, determining type and orientation, and automatically generating 3D BIM models with a parametric algorithm implemented into the Revit software. We evaluated this framework using publicly available and locally generated point cloud datasets with varying furniture combinations and layout complexity. The results demonstrate the proposed framework's efficiency in reconstructing structured indoor elements, exhibiting completeness and geometric accuracy, and achieving precision and recall values greater than 98%. Furthermore, the generated unstructured 3D BIM models keep essential real-scene characteristics such as geometry, spatial locations, numerical aspects, various shapes, and orientations compared to literature methods.
APA, Harvard, Vancouver, ISO, and other styles
17

Hu, Yubin, Sheng Ye, Wang Zhao, Matthieu Lin, Yuze He, Yu-Hui Wen, Ying He, and Yong-Jin Liu. "O^2-Recon: Completing 3D Reconstruction of Occluded Objects in the Scene with a Pre-trained 2D Diffusion Model." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 2285–93. http://dx.doi.org/10.1609/aaai.v38i3.28002.

Full text
Abstract:
Occlusion is a common issue in 3D reconstruction from RGB-D videos, often blocking the complete reconstruction of objects and presenting an ongoing problem. In this paper, we propose a novel framework, empowered by a 2D diffusion-based in-painting model, to reconstruct complete surfaces for the hidden parts of objects. Specifically, we utilize a pre-trained diffusion model to fill in the hidden areas of 2D images. Then we use these in-painted images to optimize a neural implicit surface representation for each instance for 3D reconstruction. Since creating the in-painting masks needed for this process is tricky, we adopt a human-in-the-loop strategy that involves very little human engagement to generate high-quality masks. Moreover, some parts of objects can be totally hidden because the videos are usually shot from limited perspectives. To ensure recovering these invisible areas, we develop a cascaded network architecture for predicting signed distance field, making use of different frequency bands of positional encoding and maintaining overall smoothness. Besides the commonly used rendering loss, Eikonal loss, and silhouette loss, we adopt a CLIP-based semantic consistency loss to guide the surface from unseen camera angles. Experiments on ScanNet scenes show that our proposed framework achieves state-of-the-art accuracy and completeness in object-level reconstruction from scene-level RGB-D videos. Code: https://github.com/THU-LYJ-Lab/O2-Recon.
APA, Harvard, Vancouver, ISO, and other styles
18

Camuffo, Elena, Daniele Mari, and Simone Milani. "Recent Advancements in Learning Algorithms for Point Clouds: An Updated Overview." Sensors 22, no. 4 (February 10, 2022): 1357. http://dx.doi.org/10.3390/s22041357.

Full text
Abstract:
Recent advancements in self-driving cars, robotics, and remote sensing have widened the range of applications for 3D Point Cloud (PC) data. This data format poses several new issues concerning noise levels, sparsity, and required storage space; as a result, many recent works address PC problems using Deep Learning (DL) solutions thanks to their capability to automatically extract features and achieve high performances. Such evolution has also changed the structure of processing chains and posed new problems to both academic and industrial researchers. The aim of this paper is to provide a comprehensive overview of the latest state-of-the-art DL approaches for the most crucial PC processing operations, i.e., semantic scene understanding, compression, and completion. With respect to the existing reviews, the work proposes a new taxonomical classification of the approaches, taking into account the characteristics of the acquisition set up, the peculiarities of the acquired PC data, the presence of side information (depending on the adopted dataset), the data formatting, and the characteristics of the DL architectures. This organization allows one to better comprehend some final performance comparisons on common test sets and cast a light on the future research trends.
APA, Harvard, Vancouver, ISO, and other styles
19

Pellis, E., A. Masiero, G. Tucci, M. Betti, and P. Grussenmeyer. "ASSEMBLING AN IMAGE AND POINT CLOUD DATASET FOR HERITAGE BUILDING SEMANTIC SEGMENTATION." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVI-M-1-2021 (August 28, 2021): 539–46. http://dx.doi.org/10.5194/isprs-archives-xlvi-m-1-2021-539-2021.

Full text
Abstract:
Abstract. Creating three-dimensional as-built models from point clouds is still a challenging task in the Cultural Heritage environment. Nowadays, performing such task typically requires the quite time-consuming manual intervention of an expert operator, in particular to deal with the complexities and peculiarities of heritage buildings. Motivated by these considerations, the development of automatic or semi-automatic tools to ease the completion of such task has recently became a very hot topic in the research community. Among the tools that can be considered to such aim, the use of deep learning methods for the semantic segmentation and classification of 2D and 3D data seems to be one of the most promising approaches. Indeed, these kinds of methods have already been successfully applied in several applications enabling scene understanding and comprehension, and, in particular, to ease the process of geometrical and informative model creation. Nevertheless, their use in the specific case of heritage buildings is still quite limited, and the already published results not completely satisfactory. The quite limited availability of dedicated benchmarks for the considered task in the heritage context can also be one of the factors for the not so satisfying results in the literature.Hence, this paper aims at partially reducing the issues related to the limited availability of benchmarks in the heritage context by presenting a new dataset for semantic segmentation of heritage buildings. The dataset is composed by both images and point clouds of the considered buildings, in order to enable the implementation, validation and comparison of both point-based and multiview-based semantic segmentation approaches. Ground truth segmentation is provided, for both the images and point clouds related to each building, according to the class definition used in the ARCHdataset, hence potentially enabling also the integration and comparison of the results obtained on such dataset.
APA, Harvard, Vancouver, ISO, and other styles
20

Abbasi, Ali, Sinan Kalkan, and Yusuf Sahillioğlu. "Deep 3D semantic scene extrapolation." Visual Computer 35, no. 2 (August 17, 2018): 271–79. http://dx.doi.org/10.1007/s00371-018-1586-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Zhang, Shoulong, Shuai Li, Aimin Hao, and Hong Qin. "Point Cloud Semantic Scene Completion from RGB-D Images." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 3385–93. http://dx.doi.org/10.1609/aaai.v35i4.16451.

Full text
Abstract:
In this paper, we devise a novel semantic completion network, called point cloud semantic scene completion network (PCSSC-Net), for indoor scenes solely based on point clouds. Existing point cloud completion networks still suffer from their inability of fully recovering complex structures and contents from global geometric descriptions neglecting semantic hints. To extract and infer comprehensive information from partial input, we design a patch-based contextual encoder to hierarchically learn point-level, patch-level, and scene-level geometric and contextual semantic information with a divide-and-conquer strategy. Consider that the scene semantics afford a high-level clue of constituting geometry for an indoor scene environment, we articulate a semantics-guided completion decoder where semantics could help cluster isolated points in the latent space and infer complicated scene geometry. Given the fact that real-world scans tend to be incomplete as ground truth, we choose to synthesize scene dataset with RGB-D images and annotate complete point clouds as ground truth for the supervised training purpose. Extensive experiments validate that our new method achieves the state-of-the-art performance, in contrast with the current methods applied to our dataset.
APA, Harvard, Vancouver, ISO, and other styles
22

Xia, Wei, Rongfeng Lu, Yaoqi Sun, Chenghao Xu, Kun Lv, Yanwei Jia, Zunjie Zhu, and Bolun Zheng. "3D Indoor Scene Completion via Room Layout Estimation." Journal of Physics: Conference Series 2025, no. 1 (September 1, 2021): 012102. http://dx.doi.org/10.1088/1742-6596/2025/1/012102.

Full text
Abstract:
Recent advances in 3D reconstructions have shown impressive progress in 3D indoor scene reconstruction, enabling automatic scene modeling; however, holes in the 3D scans hinder the further usage of the reconstructed models. Thus, we propose the task of layout-based hole filling for the incomplete indoor scene scans: from the mesh of a scene model, we estimate the scene layout by detecting the principal planes of a scene and leverage the layout as the prior for the accurate completion of planar regions. Experiments show that guiding scene model completion through the scene layout prior significantly outperforms the alternative approach to the task of scene model completion.
APA, Harvard, Vancouver, ISO, and other styles
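Detecting the principal planes of a scene, which the abstract uses as the layout prior, is commonly done with RANSAC plane fitting. The following minimal sketch fits one dominant plane to a point set; the threshold and iteration count are assumptions, and this is not the paper's detector.

```python
import numpy as np

def ransac_plane(points, iters=200, thresh=0.02, seed=0):
    """Fit a dominant plane (n, d) with n.p + d = 0 to an (N, 3) point set via RANSAC."""
    rng = np.random.default_rng(seed)
    best_count, best_model = 0, None
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:                                   # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ sample[0]
        inliers = np.abs(points @ n + d) < thresh         # point-to-plane distances
        if inliers.sum() > best_count:
            best_count, best_model = inliers.sum(), (n, d)
    return best_model, best_count

pts = np.random.rand(5000, 3); pts[:4000, 2] = 0.0        # a noisy floor plus clutter
(model, count) = ransac_plane(pts)
print(model, count)
```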
23

Zhang, Junbo, Guofan Fan, Guanghan Wang, Zhengyuan Su, Kaisheng Ma, and Li Yi. "Language-Assisted 3D Feature Learning for Semantic Scene Understanding." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 3 (June 26, 2023): 3445–53. http://dx.doi.org/10.1609/aaai.v37i3.25453.

Full text
Abstract:
Learning descriptive 3D features is crucial for understanding 3D scenes with diverse objects and complex structures. However, it is usually unknown whether important geometric attributes and scene context obtain enough emphasis in an end-to-end trained 3D scene understanding network. To guide 3D feature learning toward important geometric attributes and scene context, we explore the help of textual scene descriptions. Given some free-form descriptions paired with 3D scenes, we extract the knowledge regarding the object relationships and object attributes. We then inject the knowledge to 3D feature learning through three classification-based auxiliary tasks. This language-assisted training can be combined with modern object detection and instance segmentation methods to promote 3D semantic scene understanding, especially in a label-deficient regime. Moreover, the 3D feature learned with language assistance is better aligned with the language features, which can benefit various 3D-language multimodal tasks. Experiments on several benchmarks of 3D-only and 3D-language tasks demonstrate the effectiveness of our language-assisted 3D feature learning. Code is available at https://github.com/Asterisci/Language-Assisted-3D.
APA, Harvard, Vancouver, ISO, and other styles
24

Wald, Johanna, Nassir Navab, and Federico Tombari. "Learning 3D Semantic Scene Graphs with Instance Embeddings." International Journal of Computer Vision 130, no. 3 (January 22, 2022): 630–51. http://dx.doi.org/10.1007/s11263-021-01546-9.

Full text
Abstract:
A 3D scene is more than the geometry and classes of the objects it comprises. An essential aspect beyond object-level perception is the scene context, described as a dense semantic network of interconnected nodes. Scene graphs have become a common representation to encode the semantic richness of images, where nodes in the graph are object entities connected by edges, so-called relationships. Such graphs have been shown to be useful in achieving state-of-the-art performance in image captioning, visual question answering and image generation or editing. While scene graph prediction methods so far focused on images, we propose instead a novel neural network architecture for 3D data, where the aim is to learn to regress semantic graphs from a given 3D scene. With this work, we go beyond object-level perception, by exploring relations between object entities. Our method learns instance embeddings alongside a scene segmentation and is able to predict semantics for object nodes and edges. We leverage 3DSSG, a large scale dataset based on 3RScan that features scene graphs of changing 3D scenes. Finally, we show the effectiveness of graphs as an intermediate representation on a retrieval task.
APA, Harvard, Vancouver, ISO, and other styles
25

Wang, Zifan, Zhuorui Ye, Haoran Wu, Junyu Chen, and Li Yi. "Semantic Complete Scene Forecasting from a 4D Dynamic Point Cloud Sequence." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 5867–75. http://dx.doi.org/10.1609/aaai.v38i6.28400.

Full text
Abstract:
We study a new problem of semantic complete scene forecasting (SCSF) in this work. Given a 4D dynamic point cloud sequence, our goal is to forecast the complete scene corresponding to the future next frame along with its semantic labels. To tackle this challenging problem, we properly model the synergetic relationship between future forecasting and semantic scene completion through a novel network named SCSFNet. SCSFNet leverages a hybrid geometric representation for high-resolution complete scene forecasting. To leverage multi-frame observation as well as the understanding of scene dynamics to ease the completion task, SCSFNet introduces an attention-based skip connection scheme. To ease the need to model occlusion variations and to better focus on the occluded part, SCSFNet utilizes auxiliary visibility grids to guide the forecasting task. To evaluate the effectiveness of SCSFNet, we conduct experiments on various benchmarks including two large-scale indoor benchmarks we contributed and the outdoor SemanticKITTI benchmark. Extensive experiments show SCSFNet outperforms baseline methods on multiple metrics by a large margin, and also prove the synergy between future forecasting and semantic scene completion. The project page with code is available at scsfnet.github.io.
APA, Harvard, Vancouver, ISO, and other styles
26

Zhang, Chi, Zhong Yang, Bayang Xue, Haoze Zhuo, Luwei Liao, Xin Yang, and Zekun Zhu. "Perceiving like a Bat: Hierarchical 3D Geometric–Semantic Scene Understanding Inspired by a Biomimetic Mechanism." Biomimetics 8, no. 5 (September 19, 2023): 436. http://dx.doi.org/10.3390/biomimetics8050436.

Full text
Abstract:
Geometric–semantic scene understanding is a spatial intelligence capability that is essential for robots to perceive and navigate the world. However, understanding a natural scene remains challenging for robots because of restricted sensors and time-varying situations. In contrast, humans and animals are able to form a complex neuromorphic concept of the scene they move in. This neuromorphic concept captures geometric and semantic aspects of the scenario and reconstructs the scene at multiple levels of abstraction. This article seeks to reduce the gap between robot and animal perception by proposing an ingenious scene-understanding approach that seamlessly captures geometric and semantic aspects in an unexplored environment. We proposed two types of biologically inspired environment perception methods, i.e., a set of elaborate biomimetic sensors and a brain-inspired parsing algorithm related to scene understanding, that enable robots to perceive their surroundings like bats. Our evaluations show that the proposed scene-understanding system achieves competitive performance in image semantic segmentation and volumetric–semantic scene reconstruction. Moreover, to verify the practicability of our proposed scene-understanding method, we also conducted real-world geometric–semantic scene reconstruction in an indoor environment with our self-developed drone.
APA, Harvard, Vancouver, ISO, and other styles
27

Yeom, Sang-Sik, and Jong-Eun Ha. "3D Indoor Scene Semantic Segmentation using 2D Semantic Segmentation Projection." Journal of Institute of Control, Robotics and Systems 26, no. 11 (November 30, 2020): 949–54. http://dx.doi.org/10.5302/j.icros.2020.20.0120.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Cotofrei, Paul, Christophe Künzi, and Kilian Stoffel. "Semantic Interpretation of 3D Point Clouds of Historical Objects." Digital Presentation and Preservation of Cultural and Scientific Heritage 1 (September 30, 2011): 127–39. http://dx.doi.org/10.55630/dipp.2011.1.14.

Full text
Abstract:
This paper presents the main concepts of a project under development concerning the analysis process of a scene containing a large number of objects, represented as unstructured point clouds. To achieve what we called the “optimal scene interpretation” (the shortest scene description satisfying the MDL principle) we follow an approach for managing 3-D objects based on a semantic framework based on ontologies for adding and sharing conceptual knowledge about spatial objects.
APA, Harvard, Vancouver, ISO, and other styles
29

Park, Jisun, and Kyungeun Cho. "Neural Rendering-Based 3D Scene Style Transfer Method via Semantic Understanding Using a Single Style Image." Mathematics 11, no. 14 (July 24, 2023): 3243. http://dx.doi.org/10.3390/math11143243.

Full text
Abstract:
In the rapidly emerging era of untact (“contact-free”) technologies, the requirement for three-dimensional (3D) virtual environments utilized in virtual reality (VR)/augmented reality (AR) and the metaverse has seen significant growth, owing to their extensive application across various domains. Current research focuses on the automatic transfer of the style of rendering images within a 3D virtual environment using artificial intelligence, which aims to minimize human intervention. However, the prevalent studies on rendering-based 3D environment-style transfers have certain inherent limitations. First, the training of a style transfer network dedicated to 3D virtual environments demands considerable style image data. These data must align with viewpoints that closely resemble those of the virtual environment. Second, there was noticeable inconsistency within the 3D structures. Predominant studies often neglect 3D scene geometry information instead of relying solely on 2D input image features. Finally, style adaptation fails to accommodate the unique characteristics inherent in each object. To address these issues, we propose a novel approach: a neural rendering-based 3D scene-style conversion technique. This methodology employs semantic nearest-neighbor feature matching, thereby facilitating the transfer of style within a 3D scene while considering the distinctive characteristics of each object, even when employing a single style image. The neural radiance field enables the network to comprehend the geometric information of a 3D scene in relation to its viewpoint. Subsequently, it transfers style features by employing the unique features of a single style image via semantic nearest-neighbor feature matching. In an empirical context, our proposed semantic 3D scene style transfer method was applied to 3D scene style transfers for both interior and exterior environments. This application utilizes the replica, 3DFront, and Tanks and Temples datasets for testing. The results illustrate that the proposed methodology surpasses existing style transfer techniques in terms of maintaining 3D viewpoint consistency, style uniformity, and semantic coherence.
APA, Harvard, Vancouver, ISO, and other styles
30

Zhang, Suiyun, Zhizhong Han, Ralph R. Martin, and Hui Zhang. "Semantic 3D indoor scene enhancement using guide words." Visual Computer 33, no. 6-8 (May 15, 2017): 925–35. http://dx.doi.org/10.1007/s00371-017-1394-5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Cao, Xue, Hong-ming Cai, and Feng-lin Bu. "Semantic driven design reuse for 3D scene modeling." Journal of Shanghai Jiaotong University (Science) 17, no. 2 (April 2012): 233–36. http://dx.doi.org/10.1007/s12204-012-1258-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Sun, Jiahao, Chunmei Qing, Junpeng Tan, and Xiangmin Xu. "Superpoint Transformer for 3D Scene Instance Segmentation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 2 (June 26, 2023): 2393–401. http://dx.doi.org/10.1609/aaai.v37i2.25335.

Full text
Abstract:
Most existing methods realize 3D instance segmentation by extending those models used for 3D object detection or 3D semantic segmentation. However, these non-straightforward methods suffer from two drawbacks: 1) Imprecise bounding boxes or unsatisfactory semantic predictions limit the performance of the overall 3D instance segmentation framework. 2) Existing method requires a time-consuming intermediate step of aggregation. To address these issues, this paper proposes a novel end-to-end 3D instance segmentation method based on Superpoint Transformer, named as SPFormer. It groups potential features from point clouds into superpoints, and directly predicts instances through query vectors without relying on the results of object detection or semantic segmentation. The key step in this framework is a novel query decoder with transformers that can capture the instance information through the superpoint cross-attention mechanism and generate the superpoint masks of the instances. Through bipartite matching based on superpoint masks, SPFormer can implement the network training without the intermediate aggregation step, which accelerates the network. Extensive experiments on ScanNetv2 and S3DIS benchmarks verify that our method is concise yet efficient. Notably, SPFormer exceeds compared state-of-the-art methods by 4.3% on ScanNetv2 hidden test set in terms of mAP and keeps fast inference speed (247ms per frame) simultaneously. Code is available at https://github.com/sunjiahao1999/SPFormer.
APA, Harvard, Vancouver, ISO, and other styles
33

Kim, GeonU, Kim Youwang, and Tae-Hyun Oh. "FPRF: Feed-Forward Photorealistic Style Transfer of Large-Scale 3D Neural Radiance Fields." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 3 (March 24, 2024): 2750–58. http://dx.doi.org/10.1609/aaai.v38i3.28054.

Full text
Abstract:
We present FPRF, a feed-forward photorealistic style transfer method for large-scale 3D neural radiance fields. FPRF stylizes large-scale 3D scenes with arbitrary, multiple style reference images without additional optimization while preserving multi-view appearance consistency. Prior arts required tedious per-style/-scene optimization and were limited to small-scale 3D scenes. FPRF efficiently stylizes large-scale 3D scenes by introducing a style-decomposed 3D neural radiance field, which inherits AdaIN’s feed-forward stylization machinery, supporting arbitrary style reference images. Furthermore, FPRF supports multi-reference stylization with the semantic correspondence matching and local AdaIN, which adds diverse user control for 3D scene styles. FPRF also preserves multi-view consistency by applying semantic matching and style transfer processes directly onto queried features in 3D space. In experiments, we demonstrate that FPRF achieves favorable photorealistic quality 3D scene stylization for large-scale scenes with diverse reference images.
APA, Harvard, Vancouver, ISO, and other styles
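AdaIN, the feed-forward stylization machinery the abstract builds on, re-normalizes content features to the channel-wise statistics of style features. A minimal sketch on 2D feature maps follows; FPRF applies the idea to 3D radiance-field features, which is not shown here.

```python
import torch

def adain(content, style, eps=1e-5):
    """Adaptive instance normalisation: shift content features to the style's
    per-channel mean and standard deviation."""
    c_mean = content.mean(dim=(-2, -1), keepdim=True)
    c_std = content.std(dim=(-2, -1), keepdim=True) + eps
    s_mean = style.mean(dim=(-2, -1), keepdim=True)
    s_std = style.std(dim=(-2, -1), keepdim=True) + eps
    return (content - c_mean) / c_std * s_std + s_mean

content = torch.randn(1, 256, 32, 32)
style = torch.randn(1, 256, 32, 32)
print(adain(content, style).shape)  # torch.Size([1, 256, 32, 32])
```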
34

Li, Jie, Yu Liu, Xia Yuan, Chunxia Zhao, Roland Siegwart, Ian Reid, and Cesar Cadena. "Depth Based Semantic Scene Completion With Position Importance Aware Loss." IEEE Robotics and Automation Letters 5, no. 1 (January 2020): 219–26. http://dx.doi.org/10.1109/lra.2019.2953639.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Han, Chaolin, Hongwei Li, Jian Xu, Bing Dong, Yalin Wang, Xiaowen Zhou, and Shan Zhao. "Unbiased 3D Semantic Scene Graph Prediction in Point Cloud Using Deep Learning." Applied Sciences 13, no. 9 (May 4, 2023): 5657. http://dx.doi.org/10.3390/app13095657.

Full text
Abstract:
As a core task of computer vision perception, 3D scene understanding has received widespread attention. However, the current research mainly focuses on the semantic understanding task at the level of entity objects and often neglects the semantic relationships between objects in the scene. This paper proposes a 3D scene graph prediction model based on deep learning methods for scanned point cloud data of indoor scenes to predict the semantic graph about the class of entity objects and their relationships. The model uses a multi-scale pyramidal feature extraction network, MP-DGCNN, to fuse features with the learned category-related unbiased meta-embedding vectors, and the relationship inference of the scene graph uses an ENA-GNN network incorporating node and edge cross-attention; in addition, considering the long-tail distribution effect, a category grouping re-weighting scheme is used in the embedded prior knowledge and loss function. For the 3D scene graph prediction task, experiments on the indoor point cloud 3DSSG dataset show that the model proposed in this paper performs well compared with the latest baseline model, and the prediction effectiveness and accuracy are substantially improved.
APA, Harvard, Vancouver, ISO, and other styles
36

Huang, Shi-Sheng, Ze-Yu Ma, Tai-Jiang Mu, Hongbo Fu, and Shi-Min Hu. "Supervoxel Convolution for Online 3D Semantic Segmentation." ACM Transactions on Graphics 40, no. 3 (August 2021): 1–15. http://dx.doi.org/10.1145/3453485.

Full text
Abstract:
Online 3D semantic segmentation, which aims to perform real-time 3D scene reconstruction along with semantic segmentation, is an important but challenging topic. A key challenge is to strike a balance between efficiency and segmentation accuracy. There are very few deep-learning-based solutions to this problem, since the commonly used deep representations based on volumetric-grids or points do not provide efficient 3D representation and organization structure for online segmentation. Observing that on-surface supervoxels, i.e., clusters of on-surface voxels, provide a compact representation of 3D surfaces and brings efficient connectivity structure via supervoxel clustering, we explore a supervoxel-based deep learning solution for this task. To this end, we contribute a novel convolution operation (SVConv) directly on supervoxels. SVConv can efficiently fuse the multi-view 2D features and 3D features projected on supervoxels during the online 3D reconstruction, and leads to an effective supervoxel-based convolutional neural network, termed as Supervoxel-CNN , enabling 2D-3D joint learning for 3D semantic prediction. With the Supervoxel-CNN , we propose a clustering-then-prediction online 3D semantic segmentation approach. The extensive evaluations on the public 3D indoor scene datasets show that our approach significantly outperforms the existing online semantic segmentation systems in terms of efficiency or accuracy.
APA, Harvard, Vancouver, ISO, and other styles
37

Stathopoulou, E. K., and F. Remondino. "MULTI-VIEW STEREO WITH SEMANTIC PRIORS." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W15 (August 26, 2019): 1135–40. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w15-1135-2019.

Full text
Abstract:
Abstract. Patch-based stereo is nowadays a commonly used image-based technique for dense 3D reconstruction in large scale multi-view applications. The typical steps of such a pipeline can be summarized in stereo pair selection, depth map computation, depth map refinement and, finally, fusion in order to generate a complete and accurate representation of the scene in 3D. In this study, we aim to support the standard dense 3D reconstruction of scenes as implemented in the open source library OpenMVS by using semantic priors. To this end, during the depth map fusion step, along with the depth consistency check between depth maps of neighbouring views referring to the same part of the 3D scene, we impose extra semantic constraints in order to remove possible errors and selectively obtain segmented point clouds per label, boosting automation towards this direction. In order to reassure semantic coherence between neighbouring views, additional semantic criterions can be considered, aiming to eliminate mismatches of pixels belonging in different classes.
APA, Harvard, Vancouver, ISO, and other styles
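A toy version of the extra semantic constraint described above, keeping a fused 3D point only when enough of the contributing views agree on its label, could look as follows. The agreement threshold and data layout are assumptions, not the OpenMVS-based implementation.

```python
import numpy as np

def semantic_consistency_mask(labels_per_view, min_agreement=0.75):
    """labels_per_view: (V, N) semantic labels of the same N fused points seen from V views.
    Keep a point only if at least `min_agreement` of the views agree on its majority label."""
    V, N = labels_per_view.shape
    keep = np.zeros(N, dtype=bool)
    for i in range(N):
        _, counts = np.unique(labels_per_view[:, i], return_counts=True)
        keep[i] = counts.max() / V >= min_agreement
    return keep

labels = np.array([[1, 2, 3], [1, 2, 0], [1, 5, 5], [1, 2, 3]])  # 4 views, 3 points
print(semantic_consistency_mask(labels))  # [ True  True False]
```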
38

Hwang, Hyeong Jae, and Sang Min Yoon. "Single image‐based 3D scene estimation from semantic prior." Electronics Letters 51, no. 22 (October 2015): 1788–89. http://dx.doi.org/10.1049/el.2015.1458.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Harasymowicz-Boggio, Bogdan, and Barbara Siemiątkowska. "Using Ignorance in 3D Scene Understanding." Mathematical Problems in Engineering 2014 (2014): 1–11. http://dx.doi.org/10.1155/2014/902039.

Full text
Abstract:
Awareness of its own limitations is a fundamental feature of the human sight, which has been almost completely omitted in computer vision systems. In this paper we present a method of explicitly using information about perceptual limitations of a 3D vision system, such as occluded areas, limited field of view, loss of precision along with distance increase, and imperfect segmentation for a better understanding of the observed scene. The proposed mechanism integrates metric and semantic inference using Dempster-Shafer theory, which makes it possible to handle observations that have different degrees and kinds of uncertainty. The system has been implemented and tested in a real indoor environment, showing the benefits of the proposed approach.
APA, Harvard, Vancouver, ISO, and other styles
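Dempster-Shafer evidence combination, which the abstract uses to fuse metric and semantic cues of different certainty, follows a standard rule. Below is a minimal sketch with mass functions over object hypotheses; the class names are made up for illustration.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two Dempster-Shafer mass functions given as {frozenset: mass} dicts."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb                      # mass assigned to contradictory evidence
    norm = 1.0 - conflict                            # renormalise over non-conflicting mass
    return {k: v / norm for k, v in combined.items()}

m1 = {frozenset({"chair"}): 0.6, frozenset({"chair", "table"}): 0.4}
m2 = {frozenset({"chair"}): 0.5, frozenset({"table"}): 0.3, frozenset({"chair", "table"}): 0.2}
print(dempster_combine(m1, m2))
```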
40

Rai, A., N. Srivastava, K. Khoshelham, and K. Jain. "SEMANTIC ENRICHMENT OF 3D POINT CLOUDS USING 2D IMAGE SEGMENTATION." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-1/W2-2023 (December 14, 2023): 1659–66. http://dx.doi.org/10.5194/isprs-archives-xlviii-1-w2-2023-1659-2023.

Full text
Abstract:
Abstract. 3D point cloud segmentation is computationally intensive due to the lack of inherent structural information and the unstructured nature of the point cloud data, which hinders the identification and connection of neighboring points. Understanding the structure of the point cloud data plays a crucial role in obtaining a meaningful and accurate representation of the underlying 3D environment. In this paper, we propose an algorithm that builds on existing state-of-the-art techniques of 2D image segmentation and point cloud registration to enrich point clouds with semantic information. DeepLab2 with ResNet50 as backbone architecture trained on the COCO dataset is used for indoor scene semantic segmentation into several classes like wall, floor, ceiling, doors, and windows. Semantic information from 2D images is propagated along with other input data, i.e., RGB images, depth images, and sensor information to generate 3D point clouds with semantic information. Iterative Closest Point (ICP) algorithm is used for the pair-wise registration of consecutive point clouds and finally, optimization is applied using the pose graph optimization on the whole set of point clouds to generate the combined point cloud of the whole scene. 3D point cloud of the whole scene contains pseudo-color information which denotes the semantic class to which each point belongs. The proposed methodology use an off-the-shelf 2D semantic segmentation deep learning model to semantically segment 3D point clouds collected using handheld mobile LiDAR sensor. We demonstrate a comparison of the accuracy achieved compared to a manually segmented point cloud on an in-house dataset as well as a 2D3DS benchmark dataset.
APA, Harvard, Vancouver, ISO, and other styles
41

Zhang, Taolin, Sunan He, Tao Dai, Zhi Wang, Bin Chen, and Shu-Tao Xia. "Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (March 24, 2024): 7296–304. http://dx.doi.org/10.1609/aaai.v38i7.28559.

Full text
Abstract:
In recent years, vision-language pre-training frameworks have made significant progress in natural language processing and computer vision, achieving remarkable performance improvements on various downstream tasks. However, when extended to point cloud data, existing works mainly focus on building task-specific models and fail to extract universal 3D vision-language embeddings that generalize well. We carefully investigate three common tasks in semantic 3D scene understanding and derive key insights into the development of a pre-training model. Motivated by these observations, we propose a vision-language pre-training framework, 3DVLP (3D vision-language pre-training with object contrastive learning), which transfers flexibly to 3D vision-language downstream tasks. 3DVLP takes visual grounding as the proxy task and introduces an Object-level IoU-guided Detection (OID) loss to obtain high-quality proposals in the scene. Moreover, we design an Object-level Cross-Contrastive alignment (OCC) task and an Object-level Self-Contrastive learning (OSC) task to align objects with descriptions and to distinguish different objects in the scene, respectively. Extensive experiments verify the excellent performance of 3DVLP on three 3D vision-language tasks, reflecting its superiority in semantic 3D scene understanding. Code is available at https://github.com/iridescentttt/3DVLP.
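As a rough illustration of the object-level cross-contrastive alignment idea, the sketch below implements a symmetric InfoNCE-style loss over paired object and description embeddings. It is not the paper's exact OCC formulation; the temperature and the row-wise pairing convention are assumptions.

# A minimal InfoNCE-style sketch of object-description contrastive alignment.
import torch
import torch.nn.functional as F

def object_cross_contrastive_loss(obj_feats, txt_feats, temperature=0.07):
    """Pull matched (object, description) pairs together, push mismatched pairs apart."""
    obj = F.normalize(obj_feats, dim=-1)           # N x D object embeddings
    txt = F.normalize(txt_feats, dim=-1)           # N x D description embeddings
    logits = obj @ txt.t() / temperature           # N x N similarity matrix
    targets = torch.arange(obj.size(0), device=obj.device)
    # symmetric loss: object-to-text and text-to-object directions
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))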
APA, Harvard, Vancouver, ISO, and other styles
42

Zou, Nan, Zhiyu Xiang, Yiman Chen, Shuya Chen, and Chengyu Qiao. "Simultaneous Semantic Segmentation and Depth Completion with Constraint of Boundary." Sensors 20, no. 3 (January 23, 2020): 635. http://dx.doi.org/10.3390/s20030635.

Full text
Abstract:
As core tasks of scene understanding, semantic segmentation and depth completion play a vital role in many applications such as robot navigation, AR/VR and autonomous driving. They are responsible for parsing scenes from the angles of semantics and geometry, respectively. While great progress has been made in both tasks through deep learning technologies, little work has been done on building a joint model that deeply explores the inner relationship between the two tasks. In this paper, semantic segmentation and depth completion are jointly considered under a multi-task learning framework. By sharing a common encoder and introducing boundary features as inner constraints in the decoder, the two tasks can properly share the required information with each other. An extra boundary detection sub-task is responsible for providing the boundary features and for constructing cross-task joint loss functions for network training. The entire network is implemented end-to-end and evaluated with both RGB and sparse depth input. Experiments conducted on synthesized and real scene datasets show that our proposed multi-task CNN model can effectively improve the performance of each single task.
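A minimal sketch of how such a cross-task training signal might be assembled: a weighted sum of a semantic segmentation term, a depth completion term over valid pixels, and an auxiliary boundary term. The weights and head names are assumptions, not the paper's exact loss.

# A minimal sketch of a joint loss for segmentation, depth completion, and a boundary sub-task.
import torch
import torch.nn.functional as F

def joint_loss(seg_logits, seg_gt, depth_pred, depth_gt, boundary_logits, boundary_gt,
               w_seg=1.0, w_depth=1.0, w_boundary=0.5):
    """Cross-task training signal: semantic, geometric, and boundary terms summed with weights."""
    loss_seg = F.cross_entropy(seg_logits, seg_gt)                      # per-pixel semantic labels
    valid = depth_gt > 0                                                # supervise only valid depths
    loss_depth = F.l1_loss(depth_pred[valid], depth_gt[valid])
    loss_boundary = F.binary_cross_entropy_with_logits(boundary_logits, boundary_gt)
    return w_seg * loss_seg + w_depth * loss_depth + w_boundary * loss_boundary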
APA, Harvard, Vancouver, ISO, and other styles
43

Orlova, Svetlana, and Alexander Lopota. "Scene recognition for confined spaces in mobile robotics: current state and tendencies." Robotics and Technical Cybernetics 10, no. 1 (March 2022): 14–24. http://dx.doi.org/10.31776/rtcj.10102.

Full text
Abstract:
The article discusses the problem of scene recognition for mobile robotics. It considers the subtasks that have to be solved to achieve a high-level understanding of the environment. The basis here is an understanding of the geometry and semantics of the scene, which can be decomposed into the subtasks of robot localization, mapping and semantic analysis. Simultaneous localization and mapping (SLAM) techniques have already been applied successfully and, although some problems remain unresolved in dynamic environments, they do not present a major obstacle for this task. The focus of the work is on semantic analysis of the scene, which requires three-dimensional segmentation. The field of 3D segmentation, like the field of image segmentation, has split into semantic and object segmentation, contrary to the needs of many potential applications. At present, however, panoptic segmentation is beginning to develop; it combines the two previous tasks and describes the scene most completely. The paper reviews methods of 3D panoptic segmentation and identifies promising approaches. Open problems in scene recognition are also discussed. There is a clear trend towards complex incremental methods of metric-semantic SLAM, which combine segmentation with SLAM, and towards the use of scene graphs, which describe the geometry and semantics of scene elements and the relationships between them. Scene graphs are especially promising for mobile robotics, since they provide a transition from low-level representations of objects and spaces (for example, segmented point clouds) to a description of the scene at a high level of abstraction, close to a human one (a list of objects in the scene, their properties and their locations relative to each other).
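To make the scene-graph idea concrete, the following sketch shows a minimal graph structure with objects and spaces as nodes and spatial relations as edges; the attributes and relation names are illustrative only and not tied to any particular system from the review.

# A minimal sketch of a scene graph: nodes for objects/spaces, edges for spatial relations.
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    node_id: int
    category: str                      # e.g. "chair", "room"
    centroid: tuple                    # (x, y, z) in the map frame
    attributes: dict = field(default_factory=dict)

@dataclass
class SceneGraph:
    nodes: dict = field(default_factory=dict)          # node_id -> SceneNode
    edges: list = field(default_factory=list)          # (subject_id, relation, object_id)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def relate(self, subj_id, relation, obj_id):
        self.edges.append((subj_id, relation, obj_id))

# Example: a chair standing on the floor, which is part of room 0.
g = SceneGraph()
g.add_node(SceneNode(0, "room", (0.0, 0.0, 0.0)))
g.add_node(SceneNode(1, "floor", (0.0, 0.0, 0.0)))
g.add_node(SceneNode(2, "chair", (1.2, 0.4, 0.0)))
g.relate(2, "standing_on", 1)
g.relate(1, "part_of", 0)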
APA, Harvard, Vancouver, ISO, and other styles
44

Ito, Seiya, Naoshi Kaneko, and Kazuhiko Sumi. "Latent 3D Volume for Joint Depth Estimation and Semantic Segmentation from a Single Image." Sensors 20, no. 20 (October 12, 2020): 5765. http://dx.doi.org/10.3390/s20205765.

Full text
Abstract:
This paper proposes a novel 3D representation, namely, a latent 3D volume, for joint depth estimation and semantic segmentation. Most previous studies encoded an input scene (typically given as a 2D image) into a set of feature vectors arranged over a 2D plane. However, considering the real world is three-dimensional, this 2D arrangement reduces one dimension and may limit the capacity of feature representation. In contrast, we examine the idea of arranging the feature vectors in 3D space rather than in a 2D plane. We refer to this 3D volumetric arrangement as a latent 3D volume. We will show that the latent 3D volume is beneficial to the tasks of depth estimation and semantic segmentation because these tasks require an understanding of the 3D structure of the scene. Our network first constructs an initial 3D volume using image features and then generates latent 3D volume by passing the initial 3D volume through several 3D convolutional layers. We apply depth regression and semantic segmentation by projecting the latent 3D volume onto a 2D plane. The evaluation results show that our method outperforms previous approaches on the NYU Depth v2 dataset.
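A minimal sketch of the latent-3D-volume idea: 2D image features are lifted into a voxel grid, refined by 3D convolutions, and projected back onto a 2D plane for the depth and segmentation heads. The channel sizes and the simple repeat-along-depth lifting are assumptions, not the paper's architecture.

# A minimal sketch: lift 2D features to a 3D volume, refine with 3D convs, project back to 2D.
import torch
import torch.nn as nn

class LatentVolumeNet(nn.Module):
    def __init__(self, feat_ch=32, depth_bins=16, num_classes=13):
        super().__init__()
        self.lift = lambda f: f.unsqueeze(2).repeat(1, 1, depth_bins, 1, 1)   # B,C,H,W -> B,C,D,H,W
        self.vol_conv = nn.Sequential(
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(feat_ch, 1, 1)          # depth regression after projection
        self.seg_head = nn.Conv2d(feat_ch, num_classes, 1)  # semantic segmentation after projection

    def forward(self, image_feats):
        volume = self.vol_conv(self.lift(image_feats))       # latent 3D volume
        projected = volume.mean(dim=2)                        # collapse the depth axis back to 2D
        return self.depth_head(projected), self.seg_head(projected)

# Usage with an assumed feature map size:
net = LatentVolumeNet()
depth, seg = net(torch.randn(1, 32, 60, 80))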
APA, Harvard, Vancouver, ISO, and other styles
45

Zhang, Liang, Le Wang, Xiangdong Zhang, Peiyi Shen, Mohammed Bennamoun, Guangming Zhu, Syed Afaq Ali Shah, and Juan Song. "Semantic scene completion with dense CRF from a single depth image." Neurocomputing 318 (November 2018): 182–95. http://dx.doi.org/10.1016/j.neucom.2018.08.052.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Wang, Xuzhi, Wei Feng, and Liang Wan. "Multi-modal fusion architecture search for camera-based semantic scene completion." Expert Systems with Applications 243 (June 2024): 122885. http://dx.doi.org/10.1016/j.eswa.2023.122885.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Mansour, Mostafa, Pavel Davidson, Oleg Stepanov, and Robert Piché. "Towards Semantic SLAM: 3D Position and Velocity Estimation by Fusing Image Semantic Information with Camera Motion Parameters for Traffic Scene Analysis." Remote Sensing 13, no. 3 (January 23, 2021): 388. http://dx.doi.org/10.3390/rs13030388.

Full text
Abstract:
In this paper, an EKF (Extended Kalman Filter)-based algorithm is proposed to estimate the 3D position and velocity components of different cars in a scene by fusing the semantic information and car models extracted from successive frames with camera motion parameters. First, a 2D virtual image of the scene is rendered using prior knowledge of the 3D Computer-Aided Design (CAD) models of the detected cars and their predicted positions. Then, a discrepancy, i.e., a distance, between the actual image and the virtual image is calculated. The 3D position and velocity components are recursively estimated by minimizing this discrepancy with the EKF. Experiments on the KITTI dataset show good performance of the proposed algorithm, with a position estimation error of up to 3–5% at 30 m and a velocity estimation error of up to 1 m/s.
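A minimal constant-velocity EKF sketch in the spirit of the estimator described above, with the state holding 3D position and velocity. For brevity the measurement is treated as a direct noisy position observation rather than the image discrepancy used in the paper, so this only illustrates the predict/update loop; the noise levels are assumptions.

# A minimal predict/update cycle for a constant-velocity Kalman filter over [position, velocity].
import numpy as np

def ekf_step(x, P, z, dt, q=0.1, r=0.5):
    """One cycle for state x = [px, py, pz, vx, vy, vz] with position measurement z."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                    # position integrates velocity
    Q = q * np.eye(6)                             # process noise
    H = np.hstack([np.eye(3), np.zeros((3, 3))])  # we observe position only
    R = r * np.eye(3)                             # measurement noise

    x_pred = F @ x                                # predict
    P_pred = F @ P @ F.T + Q
    y = z - H @ x_pred                            # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
    x_new = x_pred + K @ y                        # update
    P_new = (np.eye(6) - K @ H) @ P_pred
    return x_new, P_new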
APA, Harvard, Vancouver, ISO, and other styles
48

Han, Fulin, Liang Huo, Tao Shen, Xiaoyong Zhang, Tianjia Zhang, and Na Ma. "Research on the Symbolic 3D Route Scene Expression Method Based on the Importance of Objects." Applied Sciences 12, no. 20 (October 19, 2022): 10532. http://dx.doi.org/10.3390/app122010532.

Full text
Abstract:
In the study of 3D route scene construction, the expression of key targets needs to be highlighted. This is because, compared with a 3D model, abstract 3D symbols can reflect the number and spatial distribution characteristics of entities more intuitively. Therefore, this research proposes a symbolic 3D route scene representation method based on the importance of objects. The method takes an object importance evaluation model as its theoretical basis, calculates the spatial importance of objects of the same type according to the spatial characteristics of the geographical objects in the 3D route scene, and constructs the object importance evaluation model by combining semantic factors. The 3D symbols are then designed in a hierarchical manner on the basis of the results of the object importance evaluation and the CityGML standard. Finally, an LOD0–LOD4 symbolic 3D railway scene was constructed on the basis of railroad data to realise the multi-scale expression of a symbolic 3D route scene. Compared with the conventional loading method, the real-time frame rate of the scene was improved by 20 fps and was more stable, and the scene loading speed was improved by 5–10 s. The results show that the method can effectively improve the efficiency of 3D route scene construction and the prominence of key objects in the 3D route scene.
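A minimal sketch of an object importance score that blends spatial factors (proximity to the route, footprint size) with a semantic class weight. The weights, normalisation constants, and example values are assumptions rather than the paper's evaluation model.

# A minimal sketch of a weighted object importance score combining spatial and semantic factors.
def object_importance(distance_to_route, footprint_area, semantic_weight,
                      w_spatial=0.6, w_semantic=0.4, max_distance=500.0):
    """Score in [0, 1]: nearer and larger objects of important classes rank higher."""
    proximity = max(0.0, 1.0 - distance_to_route / max_distance)    # spatial factor: closeness
    size = min(1.0, footprint_area / 1000.0)                         # spatial factor: footprint
    spatial = 0.5 * (proximity + size)
    return w_spatial * spatial + w_semantic * semantic_weight

# Example: a station building close to the track, with a hypothetical class weight of 0.9.
print(object_importance(distance_to_route=40.0, footprint_area=800.0, semantic_weight=0.9))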
APA, Harvard, Vancouver, ISO, and other styles
49

Dmitriev, E. A., and V. V. Myasnikov. "Possibility estimation of 3D scene reconstruction from multiple images." Information Technology and Nanotechnology, no. 2391 (2019): 293–96. http://dx.doi.org/10.18287/1613-0073-2019-2391-293-296.

Full text
Abstract:
This paper presents a pixel-by-pixel estimation of the possibility of 3D scene reconstruction from multiple images. The method estimates the number of conjugate pairs with convolutional neural networks for subsequent 3D reconstruction using a classic approach. We considered neural networks that have shown good results on the semantic segmentation problem. The efficiency criterion of an algorithm is the resulting estimation accuracy. All experiments were conducted on images from the Unity 3D program. The experimental results show the effectiveness of our approach to the 3D scene reconstruction problem.
APA, Harvard, Vancouver, ISO, and other styles
50

Jiang, J., Z. Kang, and J. Li. "CONSTRUCTION OF A DUAL-TASK MODEL FOR INDOOR SCENE RECOGNITION AND SEMANTIC SEGMENTATION BASED ON POINT CLOUDS." ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences X-1/W1-2023 (December 5, 2023): 469–78. http://dx.doi.org/10.5194/isprs-annals-x-1-w1-2023-469-2023.

Full text
Abstract:
Abstract. Indoor scene recognition remains a challenging problem in the fields of artificial intelligence and computer vision due to the complexity, similarity, and spatial variability of indoor scenes. The existing research is mainly based on 2D data, which lacks 3D information about the scene and cannot accurately identify scenes with a high frequency of changes in lighting, shading, layout, etc. Moreover, the existing research usually focuses on the global features of the scene, which cannot represent indoor scenes with cluttered objects and complex spatial layouts. To solve the above problems, this paper proposes a dual-task model for indoor scene recognition and semantic segmentation based on point cloud data. The model expands the data loading method by giving the dataset loader the ability to return multi-dimensional labels and then realizes the dual-task model of scene recognition and semantic segmentation by fine-tuning PointNet++, setting task state control parameters, and adding a common feature layer. Finally, in order to solve the problem that the similarity of indoor scenes leads to the wrong scene recognition results, the rules of scenes and elements are constructed to correct the scene recognition results. The experimental results showed that with the assistance of scene-element rules, the overall accuracy of scene recognition with the proposed method in this paper is 82.4%, and the overall accuracy of semantic segmentation is 98.9%, which is better than the comparison model in this paper and provides a new method for cognition of indoor scenes based on 3D point clouds.
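A minimal sketch of a dual-task point-cloud network with a shared feature stage, a global head for scene recognition, and a per-point head for semantic segmentation, switched by a task flag. The simple layers stand in for the fine-tuned PointNet++ backbone and the task state control parameters described in the paper.

# A minimal sketch of a dual-task point-cloud model: shared features, two heads, one task flag.
import torch
import torch.nn as nn

class DualTaskPointNet(nn.Module):
    def __init__(self, num_scenes=10, num_classes=13, feat_dim=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv1d(3, feat_dim, 1), nn.ReLU(),
                                    nn.Conv1d(feat_dim, feat_dim, 1), nn.ReLU())
        self.scene_head = nn.Linear(feat_dim, num_scenes)       # global scene label
        self.seg_head = nn.Conv1d(feat_dim, num_classes, 1)     # per-point labels

    def forward(self, points, task="both"):
        feats = self.shared(points)                              # B x C x N shared features
        global_feat = feats.max(dim=2).values                    # symmetric pooling over points
        outputs = {}
        if task in ("scene", "both"):
            outputs["scene"] = self.scene_head(global_feat)
        if task in ("segmentation", "both"):
            outputs["segmentation"] = self.seg_head(feats)
        return outputs

# Usage with an assumed batch of 1024 points per cloud:
model = DualTaskPointNet()
out = model(torch.randn(2, 3, 1024), task="both")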
APA, Harvard, Vancouver, ISO, and other styles