Selected scientific literature on the topic "3D semantic scene completion"

Create an accurate reference in APA, MLA, Chicago, Harvard, and other styles


Consult the list of current articles, books, theses, conference proceedings, and other scientific sources relevant to the topic "3D semantic scene completion".

Next to each source in the reference list there is an "Add to bibliography" button. Click it and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the abstract of the work online if it is present in the metadata.

Journal articles on the topic "3D semantic scene completion"

1

Luo, Shoutong, Zhengxing Sun, Yunhan Sun, and Yi Wang. "Resolution‐switchable 3D Semantic Scene Completion". Computer Graphics Forum 41, no. 7 (October 2022): 121–30. http://dx.doi.org/10.1111/cgf.14662.

2

Tang, Jiaxiang, Xiaokang Chen, Jingbo Wang, and Gang Zeng. "Not All Voxels Are Equal: Semantic Scene Completion from the Point-Voxel Perspective". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2352–60. http://dx.doi.org/10.1609/aaai.v36i2.20134.

Abstract:
We revisit Semantic Scene Completion (SSC), a useful task to predict the semantic and occupancy representation of 3D scenes, in this paper. A number of methods for this task are always based on voxelized scene representations. Although voxel representations keep local structures of the scene, these methods suffer from heavy computation redundancy due to the existence of visible empty voxels when the network goes deeper. To address this dilemma, we propose our novel point-voxel aggregation network for this task. We first transfer the voxelized scenes to point clouds by removing these visible empty voxels and adopt a deep point stream to capture semantic information from the scene efficiently. Meanwhile, a light-weight voxel stream containing only two 3D convolution layers preserves local structures of the voxelized scenes. Furthermore, we design an anisotropic voxel aggregation operator to fuse the structure details from the voxel stream into the point stream, and a semantic-aware propagation module to enhance the up-sampling process in the point stream by semantic labels. We demonstrate that our model surpasses state-of-the-arts on two benchmarks by a large margin, with only the depth images as input.
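The key preprocessing step described above, discarding visible empty voxels so that only informative voxels feed the point stream, can be illustrated with a short NumPy sketch. The occupancy grid, visibility mask, and voxel size below are placeholder assumptions, not the paper's actual pipeline.

import numpy as np

# Placeholder inputs: an occupancy grid and a visibility mask that would normally
# come from voxelizing the input depth image and ray-casting from the camera.
occupancy = np.random.rand(60, 36, 60) > 0.7   # True where a voxel contains geometry
visible = np.random.rand(60, 36, 60) > 0.5     # True where a voxel is visible to the camera

# Voxels that are visible *and* empty carry no surface information; removing them
# shrinks the dense grid into a compact point set for the point stream.
keep = ~(visible & ~occupancy)
voxel_idx = np.argwhere(keep)                  # (M, 3) integer voxel coordinates

voxel_size = 0.08
points = (voxel_idx + 0.5) * voxel_size        # voxel centers as a 3D point cloud
print(f"kept {points.shape[0]} of {occupancy.size} voxels as points")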
3

Behley, Jens, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Jürgen Gall, and Cyrill Stachniss. "Towards 3D LiDAR-based semantic scene understanding of 3D point cloud sequences: The SemanticKITTI Dataset". International Journal of Robotics Research 40, no. 8-9 (April 20, 2021): 959–67. http://dx.doi.org/10.1177/02783649211006735.

Abstract:
A holistic semantic scene understanding exploiting all available sensor modalities is a core capability to master self-driving in complex everyday traffic. To this end, we present the SemanticKITTI dataset that provides point-wise semantic annotations of Velodyne HDL-64E point clouds of the KITTI Odometry Benchmark. Together with the data, we also published three benchmark tasks for semantic scene understanding covering different aspects of semantic scene understanding: (1) semantic segmentation for point-wise classification using single or multiple point clouds as input; (2) semantic scene completion for predictive reasoning on the semantics and occluded regions; and (3) panoptic segmentation combining point-wise classification and assigning individual instance identities to separate objects of the same class. In this article, we provide details on our dataset showing an unprecedented number of fully annotated point cloud sequences, more information on our labeling process to efficiently annotate such a vast amount of point clouds, and lessons learned in this process. The dataset and resources are available at http://www.semantic-kitti.org .
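For readers who want to work with the dataset, scans and point-wise labels follow a simple binary layout (float32 x/y/z/remission per point, and a uint32 label whose lower 16 bits hold the semantic class), as described in the dataset documentation. The loader below is a minimal sketch of that layout; the file paths are placeholders.

import numpy as np

def load_semantickitti_scan(scan_path: str, label_path: str):
    """Load one SemanticKITTI scan (.bin) and its point-wise labels (.label).

    Each point is stored as (x, y, z, remission) in float32; each label is a
    uint32 whose lower 16 bits hold the semantic class id and upper 16 bits
    the instance id.
    """
    points = np.fromfile(scan_path, dtype=np.float32).reshape(-1, 4)
    labels = np.fromfile(label_path, dtype=np.uint32)
    semantic = labels & 0xFFFF
    instance = labels >> 16
    assert points.shape[0] == semantic.shape[0], "scan/label length mismatch"
    return points, semantic, instance

# Usage (placeholder paths):
# pts, sem, inst = load_semantickitti_scan("sequences/00/velodyne/000000.bin",
#                                          "sequences/00/labels/000000.label")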
4

Xu, Jinfeng, Xianzhi Li, Yuan Tang, Qiao Yu, Yixue Hao, Long Hu, and Min Chen. "CasFusionNet: A Cascaded Network for Point Cloud Semantic Scene Completion by Dense Feature Fusion". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 3 (June 26, 2023): 3018–26. http://dx.doi.org/10.1609/aaai.v37i3.25405.

Abstract:
Semantic scene completion (SSC) aims to complete a partial 3D scene and predict its semantics simultaneously. Most existing works adopt the voxel representations, thus suffering from the growth of memory and computation cost as the voxel resolution increases. Though a few works attempt to solve SSC from the perspective of 3D point clouds, they have not fully exploited the correlation and complementarity between the two tasks of scene completion and semantic segmentation. In our work, we present CasFusionNet, a novel cascaded network for point cloud semantic scene completion by dense feature fusion. Specifically, we design (i) a global completion module (GCM) to produce an upsampled and completed but coarse point set, (ii) a semantic segmentation module (SSM) to predict the per-point semantic labels of the completed points generated by GCM, and (iii) a local refinement module (LRM) to further refine the coarse completed points and the associated labels from a local perspective. We organize the above three modules via dense feature fusion in each level, and cascade a total of four levels, where we also employ feature fusion between each level for sufficient information usage. Both quantitative and qualitative results on our compiled two point-based datasets validate the effectiveness and superiority of our CasFusionNet compared to state-of-the-art methods in terms of both scene completion and semantic segmentation. The codes and datasets are available at: https://github.com/JinfengX/CasFusionNet.
5

Li, Siqi, Changqing Zou, Yipeng Li, Xibin Zhao, and Yue Gao. "Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11402–9. http://dx.doi.org/10.1609/aaai.v34i07.6803.

Abstract:
This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Compared with previous methods which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously via leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as the reliable depth cues in spatial dimension. It is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset and the results show that our method respectively achieves the gains of 2.5% and 2.6% on the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset against the state-of-the-art method.
6

Wang, Yu, and Chao Tong. "H2GFormer: Horizontal-to-Global Voxel Transformer for 3D Semantic Scene Completion". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 5722–30. http://dx.doi.org/10.1609/aaai.v38i6.28384.

Abstract:
3D Semantic Scene Completion (SSC) has emerged as a novel task in vision-based holistic 3D scene understanding. Its objective is to densely predict the occupancy and category of each voxel in a 3D scene based on input from either LiDAR or images. Currently, many transformer-based semantic scene completion frameworks employ simple yet popular Cross-Attention and Self-Attention mechanisms to integrate and infer dense geometric and semantic information of voxels. However, they overlook the distinctions among voxels in the scene, especially in outdoor scenarios where the horizontal direction contains more variations. And voxels located at object boundaries and within the interior of objects exhibit varying levels of positional significance. To address this issue, we propose a transformer-based SSC framework called H2GFormer that incorporates a horizontal-to-global approach. This framework takes into full consideration the variations of voxels in the horizontal direction and the characteristics of voxels on object boundaries. We introduce a horizontal window-to-global attention (W2G) module that effectively fuses semantic information by first diffusing it horizontally from reliably visible voxels and then propagating the semantic understanding to global voxels, ensuring a more reliable fusion of semantic-aware features. Moreover, an Internal-External Position Awareness Loss (IoE-PALoss) is utilized during network training to emphasize the critical positions within the transition regions between objects. The experiments conducted on the SemanticKITTI dataset demonstrate that H2GFormer exhibits superior performance in both geometric and semantic completion tasks. Our code is available on https://github.com/Ryanwy1/H2GFormer.
7

Wang, Xuzhi, Di Lin, and Liang Wan. "FFNet: Frequency Fusion Network for Semantic Scene Completion". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 2550–57. http://dx.doi.org/10.1609/aaai.v36i3.20156.

Abstract:
Semantic scene completion (SSC) requires the estimation of the 3D geometric occupancies of objects in the scene, along with the object categories. Currently, many methods employ RGB-D images to capture the geometric and semantic information of objects. These methods use simple but popular spatial- and channel-wise operations, which fuse the information of RGB and depth data. Yet, they ignore the large discrepancy of RGB-D data and the uncertainty measurements of depth data. To solve this problem, we propose the Frequency Fusion Network (FFNet), a novel method for boosting semantic scene completion by better utilizing RGB-D data. FFNet explicitly correlates the RGB-D data in the frequency domain, different from the features directly extracted by the convolution operation. Then, the network uses the correlated information to guide the feature learning from the RGB and depth images, respectively. Moreover, FFNet accounts for the properties of different frequency components of RGB-D features. It has a learnable elliptical mask to decompose the features learned from the RGB and depth images, attending to various frequencies to facilitate the correlation process of RGB-D data. We evaluate FFNet intensively on the public SSC benchmarks, where FFNet surpasses the state-of-the-art methods. The code package of FFNet is available at https://github.com/alanWXZ/FFNet.
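As a rough illustration of relating RGB and depth responses in the frequency domain, the sketch below computes a normalized cross-power spectrum between two feature maps with torch.fft. It is a generic phase-correlation example under assumed tensor shapes, not FFNet's actual formulation (which, per the abstract, also includes a learnable elliptical mask).

import torch

def frequency_correlation(rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
    """Correlate two (B, C, H, W) feature maps in the frequency domain and
    return a spatial correlation map of the same shape."""
    f_rgb = torch.fft.fft2(rgb_feat)
    f_depth = torch.fft.fft2(depth_feat)
    cross = f_rgb * torch.conj(f_depth)            # cross-power spectrum
    cross = cross / (cross.abs() + 1e-8)           # keep only phase information
    return torch.fft.ifft2(cross).real             # back to the spatial domain

rgb = torch.randn(2, 16, 64, 64)
depth = torch.randn(2, 16, 64, 64)
print(frequency_correlation(rgb, depth).shape)     # torch.Size([2, 16, 64, 64])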
8

Shan, Y., Y. Xia, Y. Chen, and D. Cremers. "SCP: Scene Completion Pre-training for 3D Object Detection". International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-1/W2-2023 (December 13, 2023): 41–46. http://dx.doi.org/10.5194/isprs-archives-xlviii-1-w2-2023-41-2023.

Abstract. 3D object detection using LiDAR point clouds is a fundamental task in the fields of computer vision, robotics, and autonomous driving. However, existing 3D detectors heavily rely on annotated datasets, which are both time-consuming and prone to errors during the process of labeling 3D bounding boxes. In this paper, we propose a Scene Completion Pre-training (SCP) method to enhance the performance of 3D object detectors with less labeled data. SCP offers three key advantages: (1) Improved initialization of the point cloud model. By completing the scene point clouds, SCP effectively captures the spatial and semantic relationships among objects within urban environments. (2) Elimination of the need for additional datasets. SCP serves as a valuable auxiliary network that does not impose any additional efforts or data requirements on the 3D detectors. (3) Reduction of the amount of labeled data for detection. With the help of SCP, the existing state-of-the-art 3D detectors can achieve comparable performance while only relying on 20% labeled data.
9

Ding, Junzhe, Jin Zhang, Luqin Ye, and Cheng Wu. "Kalman-Based Scene Flow Estimation for Point Cloud Densification and 3D Object Detection in Dynamic Scenes". Sensors 24, no. 3 (January 31, 2024): 916. http://dx.doi.org/10.3390/s24030916.

Abstract:
Point cloud densification is essential for understanding the 3D environment. It provides crucial structural and semantic information for downstream tasks such as 3D object detection and tracking. However, existing registration-based methods struggle with dynamic targets due to the incompleteness and deformation of point clouds. To address this challenge, we propose a Kalman-based scene flow estimation method for point cloud densification and 3D object detection in dynamic scenes. Our method effectively tackles the issue of localization errors in scene flow estimation and enhances the accuracy and precision of shape completion. Specifically, we introduce a Kalman filter to correct the dynamic target’s position while estimating long sequence scene flow. This approach helps eliminate the cumulative localization error during the scene flow estimation process. Extended experiments on the KITTI 3D tracking dataset demonstrate that our method significantly improves the performance of LiDAR-only detectors, achieving superior results compared to the baselines.
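The idea of using a Kalman filter to correct a dynamic target's position over a long sequence can be illustrated with a textbook constant-velocity filter over noisy 3D centroids. The state layout, time step, and noise values below are generic assumptions, not the authors' parameterization.

import numpy as np

def smooth_centroids(observations, dt=0.1, q=1e-2, r=1e-1):
    """Constant-velocity Kalman filter over noisy 3D centroid observations.
    State is [x, y, z, vx, vy, vz]; only the position is observed."""
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)                     # position += velocity * dt
    H = np.zeros((3, 6))
    H[:, :3] = np.eye(3)
    Q, R = q * np.eye(6), r * np.eye(3)
    x, P = np.zeros(6), np.eye(6)
    smoothed = []
    for z in observations:
        x, P = F @ x, F @ P @ F.T + Q              # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (np.asarray(z) - H @ x)        # correct with the observation
        P = (np.eye(6) - K @ H) @ P
        smoothed.append(x[:3].copy())
    return np.array(smoothed)

noisy_track = np.cumsum(np.random.randn(50, 3) * 0.05, axis=0) + [1.0, 2.0, 0.0]
print(smooth_centroids(noisy_track).shape)         # (50, 3)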
10

Park, Sang-Min, and Jong-Eun Ha. "3D Semantic Scene Completion With Multi-scale Feature Maps and Masked Autoencoder". Journal of Institute of Control, Robotics and Systems 29, no. 12 (December 31, 2023): 966–72. http://dx.doi.org/10.5302/j.icros.2023.23.0143.


Theses / dissertations on the topic "3D semantic scene completion"

1

Roldão, Jimenez Luis Guillermo. "3D Scene Reconstruction and Completion for Autonomous Driving". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS415.

Abstract:
In this thesis, we address the challenges of 3D scene reconstruction and completion from sparse and heterogeneous density point clouds. Therefore proposing different techniques to create a 3D model of the surroundings.In the first part, we study the use of 3-dimensional occupancy grids for multi-frame reconstruction, useful for localization and HD-Maps applications. This is done by exploiting ray-path information to resolve ambiguities in partially occupied cells. Our sensor model reduces discretization inaccuracies and enables occupancy updates in dynamic scenarios.We also focus on single-frame environment perception by the introduction of a 3D implicit surface reconstruction algorithm capable to deal with heterogeneous density data by employing an adaptive neighborhood strategy. Our method completes small regions of missing data and outputs a continuous representation useful for physical modeling or terrain traversability assessment.We dive into deep learning applications for the novel task of semantic scene completion, which completes and semantically annotates entire 3D input scans. Given the little consensus found in the literature, we present an in-depth survey of existing methods and introduce our lightweight multiscale semantic completion network for outdoor scenarios. Our method employs a new hybrid pipeline based on a 2D CNN backbone branch to reduce computation overhead and 3D segmentation heads to predict the complete semantic scene at different scales, being significantly lighter and faster than existing approaches
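The first contribution, using ray-path information to update a 3D occupancy grid, corresponds in its simplest textbook form to a log-odds update along each sensor ray: traversed cells become more likely free, the endpoint cell more likely occupied. The sketch below samples cells along the ray for brevity and uses assumed update constants; it is not the thesis' sensor model.

import numpy as np

def update_ray(log_odds, origin, end, voxel_size, l_free=-0.4, l_occ=0.85):
    """Log-odds occupancy update for one ray: cells crossed by the ray get a
    'free' update, the cell containing the endpoint an 'occupied' update."""
    origin, end = np.asarray(origin, float), np.asarray(end, float)
    n_steps = max(int(np.linalg.norm(end - origin) / (0.5 * voxel_size)), 1)
    for t in np.linspace(0.0, 1.0, n_steps, endpoint=False):
        cell = tuple(((origin + t * (end - origin)) / voxel_size).astype(int))
        if all(0 <= c < s for c, s in zip(cell, log_odds.shape)):
            log_odds[cell] += l_free
    hit = tuple((end / voxel_size).astype(int))
    if all(0 <= c < s for c, s in zip(hit, log_odds.shape)):
        log_odds[hit] += l_occ
    return log_odds

grid = np.zeros((100, 100, 20))                    # log-odds 0 means p(occupied) = 0.5
update_ray(grid, origin=[0.5, 0.5, 0.5], end=[6.0, 4.0, 0.8], voxel_size=0.1)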
2

Garbade, Martin [Verfasser]. "Semantic Segmentation and Completion of 2D and 3D Scenes / Martin Garbade". Bonn : Universitäts- und Landesbibliothek Bonn, 2019. http://d-nb.info/1201728010/34.

3

Jaritz, Maximilian. "2D-3D scene understanding for autonomous driving". Thesis, Université Paris sciences et lettres, 2020. https://pastel.archives-ouvertes.fr/tel-02921424.

Abstract:
In this thesis, we address the challenges of label scarcity and fusion of heterogeneous 3D point clouds and 2D images. We adopt the strategy of end-to-end race driving where a neural network is trained to directly map sensor input (camera image) to control output, which makes this strategy independent from annotations in the visual domain. We employ deep reinforcement learning where the algorithm learns from reward by interaction with a realistic simulator. We propose new training strategies and reward functions for better driving and faster convergence. However, training time is still very long which is why we focus on perception to study point cloud and image fusion in the remainder of this thesis. We propose two different methods for 2D-3D fusion. First, we project 3D LiDAR point clouds into 2D image space, resulting in sparse depth maps. We propose a novel encoder-decoder architecture to fuse dense RGB and sparse depth for the task of depth completion that enhances point cloud resolution to image level. Second, we fuse directly in 3D space to prevent information loss through projection. Therefore, we compute image features with a 2D CNN of multiple views and then lift them all to a global 3D point cloud for fusion, followed by a point-based network to predict 3D semantic labels. Building on this work, we introduce the more difficult novel task of cross-modal unsupervised domain adaptation, where one is provided with multi-modal data in a labeled source and an unlabeled target dataset. We propose to perform 2D-3D cross-modal learning via mutual mimicking between image and point cloud networks to address the source-target domain shift. We further showcase that our method is complementary to the existing uni-modal technique of pseudo-labeling
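The first fusion strategy, projecting a LiDAR point cloud into the image plane to obtain a sparse depth map, is easy to sketch with a pinhole camera model. The intrinsics and point cloud below are made-up stand-ins, and the LiDAR-to-camera extrinsic transform that a real pipeline needs is omitted.

import numpy as np

def project_to_sparse_depth(points_cam, K, height, width):
    """Project camera-frame 3D points (N, 3) into a sparse (H, W) depth map.
    Points behind the camera or outside the image are dropped; when several
    points hit the same pixel, the closest one wins."""
    depth = np.zeros((height, width), dtype=np.float32)
    pts = points_cam[points_cam[:, 2] > 0.1]          # keep points in front of the camera
    uvw = (K @ pts.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts[:, 2]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    order = np.argsort(-z[inside])                    # far-to-near so near points overwrite far ones
    depth[v[inside][order], u[inside][order]] = z[inside][order]
    return depth

K = np.array([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])  # KITTI-like intrinsics
pts = np.random.uniform([-10, -2, 2], [10, 2, 50], size=(5000, 3))
sparse = project_to_sparse_depth(pts, K, height=375, width=1242)
print(int((sparse > 0).sum()), "pixels received a depth value")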
4

Dewan, Ayush [Verfasser], and Wolfram [Akademischer Betreuer] Burgard. "Leveraging motion and semantic cues for 3D scene understanding". Freiburg : Universität, 2020. http://d-nb.info/1215499493/34.

5

Lind, Johan. "Make it Meaningful : Semantic Segmentation of Three-Dimensional Urban Scene Models". Thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-143599.

Abstract:
Semantic segmentation of a scene aims to give meaning to the scene by dividing it into meaningful — semantic — parts. Understanding the scene is of great interest for all kinds of autonomous systems, but manual annotation is simply too time consuming, which is why there is a need for an alternative approach. This thesis investigates the possibility of automatically segmenting 3D-models of urban scenes, such as buildings, into a predetermined set of labels. The approach was to first acquire ground truth data by manually annotating five 3D-models of different urban scenes. The next step was to extract features from the 3D-models and evaluate which ones constitutes a suitable feature space. Finally, three supervised learners were implemented and evaluated: k-Nearest Neighbour (KNN), Support Vector Machine (SVM) and Random Classification Forest (RCF). The classifications were done point-wise, classifying each 3D-point in the dense point cloud belonging to the model being classified. The result showed that the best suitable feature space is not necessarily the one containing all features. The KNN classifier got the highest average accuracy overall models — classifying 42.5% of the 3D points correct. The RCF classifier managed to classify 66.7% points correct in one of the models, but had worse performance for the rest of the models and thus resulting in a lower average accuracy compared to KNN. In general, KNN, SVM, and RCF seemed to have different benefits and drawbacks. KNN is simple and intuitive but by far the slowest classifier when dealing with a large set of training data. SVM and RCF are both fast but difficult to tune as there are more parameters to adjust. Whether the reason for obtaining the relatively low highest accuracy was due to the lack of ground truth training data, unbalanced validation models, or the capacity of the learners, was never investigated due to a limited time span. However, this ought to be investigated in future studies.
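The point-wise classification setup described here (feature vectors per 3D point, a supervised learner such as KNN) maps directly onto a few lines of scikit-learn. The feature dimensionality and label set below are invented for illustration only.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical per-point feature vectors (e.g., height, normal, color statistics)
# for an annotated training model and an unlabeled model to classify.
train_features = np.random.rand(10000, 9)
train_labels = np.random.randint(0, 5, size=10000)     # e.g., wall / roof / ground / vegetation / other
test_features = np.random.rand(2000, 9)

knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(train_features, train_labels)
predicted = knn.predict(test_features)                 # one semantic label per 3D point
print(predicted.shape)                                 # (2000,)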
6

Piewak, Florian [Verfasser], and J. M. [Akademischer Betreuer] Zöllner. "LiDAR-based Semantic Labeling : Automotive 3D Scene Understanding / Florian Pierre Joseph Piewak ; Betreuer: J. M. Zöllner". Karlsruhe : KIT-Bibliothek, 2020. http://d-nb.info/1212512405/34.

7

Minto, Ludovico. "Deep learning for scene understanding with color and depth data". Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3422424.

Abstract:
Significant advancements have been made in the recent years concerning both data acquisition and processing hardware, as well as optimization and machine learning techniques. On one hand, the introduction of depth sensors in the consumer market has made possible the acquisition of 3D data at a very low cost, allowing to overcome many of the limitations and ambiguities that typically affect computer vision applications based on color information. At the same time, computationally faster GPUs have allowed researchers to perform time-consuming experimentations even on big data. On the other hand, the development of effective machine learning algorithms, including deep learning techniques, has given a highly performing tool to exploit the enormous amount of data nowadays at hand. Under the light of such encouraging premises, three classical computer vision problems have been selected and novel approaches for their solution have been proposed in this work that both leverage the output of a deep Convolutional Neural Network (ConvNet) as well jointly exploit color and depth data to achieve competing results. In particular, a novel semantic segmentation scheme for color and depth data is presented that uses the features extracted from a ConvNet together with geometric cues. A method for 3D shape classification is also proposed that uses a deep ConvNet fed with specific 3D data representations. Finally, a ConvNet for ToF and stereo confidence estimation has been employed underneath a ToF-stereo fusion algorithm thus avoiding to rely on complex yet inaccurate noise models for the confidence estimation task.
8

Lai, Po Kong. "Immersive Dynamic Scenes for Virtual Reality from a Single RGB-D Camera". Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39663.

Abstract:
In this thesis we explore the concepts and components which can be used as individual building blocks for producing immersive virtual reality (VR) content from a single RGB-D sensor. We identify the properties of immersive VR videos and propose a system composed of a foreground/background separator, a dynamic scene re-constructor and a shape completer. We initially explore the foreground/background separator component in the context of video summarization. More specifically, we examined how to extract trajectories of moving objects from video sequences captured with a static camera. We then present a new approach for video summarization via minimization of the spatial-temporal projections of the extracted object trajectories. New evaluation criterion are also presented for video summarization. These concepts of foreground/background separation can then be applied towards VR scene creation by extracting relative objects of interest. We present an approach for the dynamic scene re-constructor component using a single moving RGB-D sensor. By tracking the foreground objects and removing them from the input RGB-D frames we can feed the background only data into existing RGB-D SLAM systems. The result is a static 3D background model where the foreground frames are then super-imposed to produce a coherent scene with dynamic moving foreground objects. We also present a specific method for extracting moving foreground objects from a moving RGB-D camera along with an evaluation dataset with benchmarks. Lastly, the shape completer component takes in a single view depth map of an object as input and "fills in" the occluded portions to produce a complete 3D shape. We present an approach that utilizes a new data minimal representation, the additive depth map, which allows traditional 2D convolutional neural networks to accomplish the task. The additive depth map represents the amount of depth required to transform the input into the "back depth map" which would exist if there was a sensor exactly opposite of the input. We train and benchmark our approach using existing synthetic datasets and also show that it can perform shape completion on real world data without fine-tuning. Our experiments show that our data minimal representation can achieve comparable results to existing state-of-the-art 3D networks while also being able to produce higher resolution outputs.
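The additive depth map idea reduces shape completion to predicting, per pixel, how much depth must be added to the observed front surface to reach the hypothetical back surface. Assuming a pinhole camera and an already-predicted additive map, the reconstruction step looks roughly like the sketch below; the intrinsics and the constant maps are placeholders for real data and a real network output.

import numpy as np

def back_project(depth, K):
    """Back-project an (H, W) depth map to camera-frame 3D points."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    return np.stack([x, y, depth], axis=-1)

K = np.array([[500.0, 0.0, 64.0], [0.0, 500.0, 64.0], [0.0, 0.0, 1.0]])
front_depth = np.full((128, 128), 2.0, dtype=np.float32)     # observed single-view depth
additive = np.full((128, 128), 0.3, dtype=np.float32)        # stand-in for the network prediction
back_depth = front_depth + additive                          # "back depth map" per the description

mask = front_depth > 0
front_pts = back_project(front_depth, K)[mask]
back_pts = back_project(back_depth, K)[mask]
completed = np.concatenate([front_pts, back_pts], axis=0)    # crude completed shape (front + back)
print(completed.shape)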
9

Yalcin, Bayramoglu Neslihan. "Range Data Recognition: Segmentation, Matching, And Similarity Retrieval". PhD thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613586/index.pdf.

Abstract:
The improvements in 3D scanning technologies have led the necessity for managing range image databases. Hence, the requirement of describing and indexing this type of data arises. Up to now, rather much work is achieved on capturing, transmission and visualization; however, there is still a gap in the 3D semantic analysis between the requirements of the applications and the obtained results. In this thesis we studied 3D semantic analysis of range data. Under this broad title we address segmentation of range scenes, correspondence matching of range images and the similarity retrieval of range models. Inputs are considered as single view depth images. First, possible research topics related to 3D semantic analysis are introduced. Planar structure detection in range scenes are analyzed and some modifications on available methods are proposed. Also, a novel algorithm to segment 3D point cloud (obtained via TOF camera) into objects by using the spatial information is presented. We proposed a novel local range image matching method that combines 3D surface properties with the 2D scale invariant feature transform. Next, our proposal for retrieving similar models where the query and the database both consist of only range models is presented. Finally, analysis of heat diffusion process on range data is presented. Challenges and some experimental results are presented.

Book chapters on the topic "3D semantic scene completion"

1

Ding, Laiyan, Panwen Hu, Jie Li, and Rui Huang. "Towards Balanced RGB-TSDF Fusion for Consistent Semantic Scene Completion by 3D RGB Feature Completion and a Classwise Entropy Loss Function". In Pattern Recognition and Computer Vision, 128–41. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8432-9_11.

2

Romero-González, Cristina, Jesus Martínez-Gómez, and Ismael García-Varea. "3D Semantic Maps for Scene Segmentation". In ROBOT 2017: Third Iberian Robotics Conference, 603–12. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-70833-1_49.

3

Zhang, Jiahui, Hao Zhao, Anbang Yao, Yurong Chen, Li Zhang, and Hongen Liao. "Efficient Semantic Scene Completion Network with Spatial Group Convolution". In Computer Vision – ECCV 2018, 749–65. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01258-8_45.

4

Akadas, Kiran, and Shankar Gangisetty. "3D Semantic Segmentation for Large-Scale Scene Understanding". In Computer Vision – ACCV 2020 Workshops, 87–102. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-69756-3_7.

5

Dai, Angela, and Matthias Nießner. "3DMV: Joint 3D-Multi-view Prediction for 3D Semantic Scene Segmentation". In Computer Vision – ECCV 2018, 458–74. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01249-6_28.

6

Henlein, Alexander, Attila Kett, Daniel Baumartz, Giuseppe Abrami, Alexander Mehler, Johannes Bastian, Yannic Blecher et al. "Semantic Scene Builder: Towards a Context Sensitive Text-to-3D Scene Framework". In Digital Human Modeling and Applications in Health, Safety, Ergonomics and Risk Management, 461–79. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-35748-0_32.

7

Wang, Jianan, Hanyu Xuan, and Zhiliang Wu. "Semantic-Guided Completion Network for Video Inpainting in Complex Urban Scene". In Pattern Recognition and Computer Vision, 224–36. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8552-4_18.

8

Srinivasan, Sharadha, Shreya Kumar, Vallikannu Chockalingam, and Chitrakala S. "3DSRASG: 3D Scene Retrieval and Augmentation Using Semantic Graphs". In Progress in Artificial Intelligence, 313–24. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-86230-5_25.

9

Bultmann, Simon, and Sven Behnke. "3D Semantic Scene Perception Using Distributed Smart Edge Sensors". In Intelligent Autonomous Systems 17, 313–29. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-22216-0_22.

10

Cao, Chuqi, Mohammad Rafiq Swash, and Hongying Meng. "Semantic 3D Scene Classification Based on Holoscopic 3D Camera for Autonomous Vehicles". In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery, 897–904. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-70665-4_96.


Conference papers on the topic "3D semantic scene completion"

1

Garbade, Martin, Yueh-Tung Chen, Johann Sawatzky, and Juergen Gall. "Two Stream 3D Semantic Scene Completion". In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2019. http://dx.doi.org/10.1109/cvprw.2019.00055.

2

Cao, Anh-Quan, and Raoul de Charette. "MonoScene: Monocular 3D Semantic Scene Completion". In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.00396.

3

Wang, Yida, David Joseph Tan, Nassir Navab, and Federico Tombari. "Adversarial Semantic Scene Completion from a Single Depth Image". In 2018 International Conference on 3D Vision (3DV). IEEE, 2018. http://dx.doi.org/10.1109/3dv.2018.00056.

4

Wu, Shun-Cheng, Keisuke Tateno, Nassir Navab, and Federico Tombari. "SCFusion: Real-time Incremental Scene Reconstruction with Semantic Completion". In 2020 International Conference on 3D Vision (3DV). IEEE, 2020. http://dx.doi.org/10.1109/3dv50981.2020.00090.

5

Li, Jie, Kai Han, Peng Wang, Yu Liu, and Xia Yuan. "Anisotropic Convolutional Networks for 3D Semantic Scene Completion". In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020. http://dx.doi.org/10.1109/cvpr42600.2020.00341.

6

Li, Jie, Laiyan Ding, and Rui Huang. "IMENet: Joint 3D Semantic Scene Completion and 2D Semantic Segmentation through Iterative Mutual Enhancement". In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/110.

Abstract:
3D semantic scene completion and 2D semantic segmentation are two tightly correlated tasks that are both essential for indoor scene understanding, because they predict the same semantic classes, using positively correlated high-level features. Current methods use 2D features extracted from early-fused RGB-D images for 2D segmentation to improve 3D scene completion. We argue that this sequential scheme does not ensure these two tasks fully benefit each other, and present an Iterative Mutual Enhancement Network (IMENet) to solve them jointly, which interactively refines the two tasks at the late prediction stage. Specifically, two refinement modules are developed under a unified framework for the two tasks. The first is a 2D Deformable Context Pyramid (DCP) module, which receives the projection from the current 3D predictions to refine the 2D predictions. In turn, a 3D Deformable Depth Attention (DDA) module is proposed to leverage the reprojected results from 2D predictions to update the coarse 3D predictions. This iterative fusion happens to the stable high-level features of both tasks at a late stage. Extensive experiments on NYU and NYUCAD datasets verify the effectiveness of the proposed iterative late fusion scheme, and our approach outperforms the state of the art on both 3D semantic scene completion and 2D semantic segmentation.
7

Zhang, Pingping, Wei Liu, Yinjie Lei, Huchuan Lu, and Xiaoyun Yang. "Cascaded Context Pyramid for Full-Resolution 3D Semantic Scene Completion". In 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2019. http://dx.doi.org/10.1109/iccv.2019.00789.

8

Dourado, Aloisio, Frederico Guth, and Teofilo de Campos. "Data Augmented 3D Semantic Scene Completion with 2D Segmentation Priors". In 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). IEEE, 2022. http://dx.doi.org/10.1109/wacv51458.2022.00076.

9

Yao, Jiawei, Chuming Li, Keqiang Sun, Yingjie Cai, Hao Li, Wanli Ouyang, and Hongsheng Li. "NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space". In 2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, 2023. http://dx.doi.org/10.1109/iccv51070.2023.00867.

10

Guo, Yuxiao, and Xin Tong. "View-Volume Network for Semantic Scene Completion from a Single Depth Image". In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/101.

Abstract:
We introduce a View-Volume convolutional neural network (VVNet) for inferring the occupancy and semantic labels of a volumetric 3D scene from a single depth image. Our method extracts the detailed geometric features from the input depth image with a 2D view CNN and then projects the features into a 3D volume according to the input depth map via a projection layer. After that, we learn the 3D context information of the scene with a 3D volume CNN for computing the result volumetric occupancy and semantic labels. With combined 2D and 3D representations, the VVNet efficiently reduces the computational cost, enables feature extraction from multi-channel high resolution inputs, and thus significantly improve the result accuracy. We validate our method and demonstrate its efficiency and effectiveness on both synthetic SUNCG and real NYU dataset.
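The projection layer described here, lifting 2D view features into a 3D volume according to the depth map, can be approximated by scattering per-pixel features into voxels and averaging collisions. The sketch below uses assumed intrinsics, grid extent, and feature size, and ignores the learned parts of the network.

import numpy as np

def lift_features_to_volume(feat2d, depth, K, grid_shape, voxel_size, origin):
    """Scatter per-pixel 2D features (C, H, W) into a 3D feature volume using the
    depth map, averaging features that fall into the same voxel."""
    C, H, W = feat2d.shape
    volume = np.zeros((C,) + grid_shape, dtype=np.float32)
    counts = np.zeros(grid_shape, dtype=np.float32)
    v, u = np.mgrid[0:H, 0:W]
    x = (u - K[0, 2]) / K[0, 0] * depth                      # back-project every pixel
    y = (v - K[1, 2]) / K[1, 1] * depth
    xyz = np.stack([x, y, depth], axis=-1) - np.asarray(origin)
    idx = np.floor(xyz / voxel_size).astype(int)             # voxel index per pixel
    valid = (depth > 0) & np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=-1)
    for (i, j, k), f in zip(idx[valid], feat2d[:, valid].T):
        volume[:, i, j, k] += f
        counts[i, j, k] += 1
    occupied = counts > 0
    volume[:, occupied] /= counts[occupied]                  # average features per voxel
    return volume

feat = np.random.rand(8, 60, 80).astype(np.float32)          # stand-in 2D view features
depth = np.random.uniform(0.5, 4.0, size=(60, 80)).astype(np.float32)
K = np.array([[100.0, 0.0, 40.0], [0.0, 100.0, 30.0], [0.0, 0.0, 1.0]])
vol = lift_features_to_volume(feat, depth, K, (64, 64, 64), 0.08, origin=(-2.5, -2.5, 0.0))
print(vol.shape)                                             # (8, 64, 64, 64)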