
Theses on the topic "3D semantic scene completion"


Consult the 15 best theses for your research on the topic "3D semantic scene completion".

Next to each source in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication in PDF format and read its abstract online whenever it is available in the metadata.

Explore theses on a wide variety of disciplines and organize your bibliography correctly.

1

Roldão Jimenez, Luis Guillermo. "3D Scene Reconstruction and Completion for Autonomous Driving". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS415.

Full text
Abstract
In this thesis, we address the challenges of 3D scene reconstruction and completion from sparse point clouds of heterogeneous density, proposing different techniques to create a 3D model of the surroundings. In the first part, we study the use of 3-dimensional occupancy grids for multi-frame reconstruction, useful for localization and HD-map applications. This is done by exploiting ray-path information to resolve ambiguities in partially occupied cells. Our sensor model reduces discretization inaccuracies and enables occupancy updates in dynamic scenarios. We also focus on single-frame environment perception by introducing a 3D implicit surface reconstruction algorithm capable of dealing with heterogeneous-density data through an adaptive neighborhood strategy. Our method completes small regions of missing data and outputs a continuous representation useful for physical modeling or terrain traversability assessment. Finally, we dive into deep learning applications for the novel task of semantic scene completion, which completes and semantically annotates entire 3D input scans. Given the little consensus found in the literature, we present an in-depth survey of existing methods and introduce our lightweight multiscale semantic completion network for outdoor scenarios. Our method employs a new hybrid pipeline based on a 2D CNN backbone branch to reduce computation overhead and 3D segmentation heads to predict the complete semantic scene at different scales, being significantly lighter and faster than existing approaches.
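To make the occupancy-grid idea above concrete, here is a minimal, hedged Python sketch (not the thesis code): cells crossed by a ray are pushed toward "free" and the cell containing the hit point toward "occupied" via log-odds updates. The voxel size, log-odds increments and the sampling-based ray traversal are illustrative assumptions.

```python
import numpy as np

L_FREE, L_OCC = -0.4, 0.85   # assumed log-odds increments for "free" / "occupied"
VOXEL = 0.2                  # assumed voxel size in metres

def ray_cells(origin, endpoint, voxel=VOXEL, steps=200):
    """Approximate the set of voxels crossed by the ray by dense sampling."""
    pts = origin + np.linspace(0.0, 1.0, steps)[:, None] * (endpoint - origin)
    cells = np.unique(np.floor(pts / voxel).astype(int), axis=0)
    return [tuple(c) for c in cells]

def integrate_ray(grid, origin, endpoint):
    """grid: dict mapping voxel index -> log-odds occupancy; returns the updated grid."""
    end_cell = tuple(np.floor(endpoint / VOXEL).astype(int))
    for cell in ray_cells(origin, endpoint):
        grid[cell] = grid.get(cell, 0.0) + (L_OCC if cell == end_cell else L_FREE)
    return grid

grid = integrate_ray({}, origin=np.zeros(3), endpoint=np.array([2.0, 0.5, 0.0]))
```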
2

Garbade, Martin [Verfasser]. "Semantic Segmentation and Completion of 2D and 3D Scenes / Martin Garbade". Bonn : Universitäts- und Landesbibliothek Bonn, 2019. http://d-nb.info/1201728010/34.

Full text
3

Jaritz, Maximilian. "2D-3D scene understanding for autonomous driving". Thesis, Université Paris sciences et lettres, 2020. https://pastel.archives-ouvertes.fr/tel-02921424.

Full text
Abstract
In this thesis, we address the challenges of label scarcity and the fusion of heterogeneous 3D point clouds and 2D images. We first adopt the strategy of end-to-end race driving, where a neural network is trained to directly map sensor input (camera image) to control output, which makes this strategy independent from annotations in the visual domain. We employ deep reinforcement learning, where the algorithm learns from a reward obtained by interaction with a realistic simulator. We propose new training strategies and reward functions for better driving and faster convergence. However, training time is still very long, which is why we focus on perception and study point cloud and image fusion in the remainder of this thesis. We propose two different methods for 2D-3D fusion. First, we project 3D LiDAR point clouds into 2D image space, resulting in sparse depth maps. We propose a novel encoder-decoder architecture that fuses dense RGB and sparse depth for the task of depth completion, enhancing point cloud resolution to image level. Second, we fuse directly in 3D space to prevent information loss through projection. To that end, we compute image features from multiple views with a 2D CNN and then lift them to a global 3D point cloud for fusion, followed by a point-based network that predicts 3D semantic labels. Building on this work, we introduce the more difficult novel task of cross-modal unsupervised domain adaptation, where one is provided with multi-modal data in a labeled source and an unlabeled target dataset. We propose to perform 2D-3D cross-modal learning via mutual mimicking between image and point cloud networks to address the source-target domain shift. We further show that our method is complementary to the existing uni-modal technique of pseudo-labeling.
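As a hedged illustration of the first fusion route described above (projecting LiDAR into image space), the following Python sketch builds a sparse depth map with an assumed pinhole camera model; it is not the thesis pipeline, and the intrinsics and data are placeholders.

```python
import numpy as np

def lidar_to_sparse_depth(points_cam, fx, fy, cx, cy, height, width):
    """points_cam: (N, 3) LiDAR points already expressed in the camera frame (z forward)."""
    depth = np.zeros((height, width), dtype=np.float32)   # 0 means "no measurement"
    z = points_cam[:, 2]
    front = z > 0
    u = np.round(fx * points_cam[front, 0] / z[front] + cx).astype(int)
    v = np.round(fy * points_cam[front, 1] / z[front] + cy).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth[v[inside], u[inside]] = z[front][inside]        # last point wins; enough for a sketch
    return depth

# Example with random points and placeholder intrinsics.
pts = np.random.rand(5000, 3) * [10.0, 2.0, 30.0] + [-5.0, -1.0, 1.0]
sparse = lidar_to_sparse_depth(pts, fx=720.0, fy=720.0, cx=640.0, cy=360.0, height=720, width=1280)
```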
4

Dewan, Ayush [Verfasser] and Wolfram [Akademischer Betreuer] Burgard. "Leveraging motion and semantic cues for 3D scene understanding". Freiburg : Universität, 2020. http://d-nb.info/1215499493/34.

Full text
5

Lind, Johan. "Make it Meaningful : Semantic Segmentation of Three-Dimensional Urban Scene Models". Thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-143599.

Full text
Abstract
Semantic segmentation of a scene aims to give meaning to the scene by dividing it into meaningful, i.e. semantic, parts. Understanding the scene is of great interest for all kinds of autonomous systems, but manual annotation is simply too time-consuming, which is why there is a need for an alternative approach. This thesis investigates the possibility of automatically segmenting 3D models of urban scenes, such as buildings, into a predetermined set of labels. The approach was to first acquire ground-truth data by manually annotating five 3D models of different urban scenes. The next step was to extract features from the 3D models and evaluate which ones constitute a suitable feature space. Finally, three supervised learners were implemented and evaluated: k-Nearest Neighbour (KNN), Support Vector Machine (SVM) and Random Classification Forest (RCF). The classification was done point-wise, classifying each 3D point in the dense point cloud belonging to the model being classified. The results showed that the most suitable feature space is not necessarily the one containing all features. The KNN classifier achieved the highest average accuracy over all models, classifying 42.5% of the 3D points correctly. The RCF classifier managed to classify 66.7% of the points correctly in one of the models, but performed worse on the rest of the models, resulting in a lower average accuracy compared to KNN. In general, KNN, SVM, and RCF seemed to have different benefits and drawbacks. KNN is simple and intuitive but by far the slowest classifier when dealing with a large set of training data. SVM and RCF are both fast but difficult to tune as there are more parameters to adjust. Whether the relatively low peak accuracy was due to the lack of ground-truth training data, unbalanced validation models, or the capacity of the learners was never investigated due to the limited time span. However, this ought to be investigated in future studies.
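A hedged sketch of the point-wise classification setup described above, using scikit-learn's KNN; the feature dimensionality, class count and random data are illustrative stand-ins for the thesis's actual feature space and annotated models.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
train_features = rng.normal(size=(1000, 6))    # per-point features, e.g. normals, height, color
train_labels = rng.integers(0, 4, size=1000)   # e.g. wall / roof / ground / vegetation

knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(train_features, train_labels)

test_features = rng.normal(size=(200, 6))      # one row per 3D point of the model to classify
predicted_labels = knn.predict(test_features)  # point-wise class predictions
```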
6

Piewak, Florian [Verfasser] and J. M. [Akademischer Betreuer] Zöllner. "LiDAR-based Semantic Labeling : Automotive 3D Scene Understanding / Florian Pierre Joseph Piewak ; Betreuer: J. M. Zöllner". Karlsruhe : KIT-Bibliothek, 2020. http://d-nb.info/1212512405/34.

Full text
7

Piewak, Florian Pierre Joseph [Verfasser] and J. M. [Akademischer Betreuer] Zöllner. "LiDAR-based Semantic Labeling : Automotive 3D Scene Understanding / Florian Pierre Joseph Piewak ; Betreuer: J. M. Zöllner". Karlsruhe : KIT-Bibliothek, 2020. http://d-nb.info/1212512405/34.

Full text
8

Minto, Ludovico. "Deep learning for scene understanding with color and depth data". Doctoral thesis, Università degli studi di Padova, 2018. http://hdl.handle.net/11577/3422424.

Full text
Abstract
Significant advancements have been made in recent years concerning both data acquisition and processing hardware, as well as optimization and machine learning techniques. On one hand, the introduction of depth sensors in the consumer market has made the acquisition of 3D data possible at a very low cost, allowing many of the limitations and ambiguities that typically affect computer vision applications based on color information to be overcome. At the same time, computationally faster GPUs have allowed researchers to perform time-consuming experimentation even on big data. On the other hand, the development of effective machine learning algorithms, including deep learning techniques, has provided a highly performing tool to exploit the enormous amount of data nowadays at hand. In light of such encouraging premises, three classical computer vision problems have been selected, and novel approaches for their solution are proposed in this work that both leverage the output of a deep Convolutional Neural Network (ConvNet) and jointly exploit color and depth data to achieve competitive results. In particular, a novel semantic segmentation scheme for color and depth data is presented that uses the features extracted from a ConvNet together with geometric cues. A method for 3D shape classification is also proposed that uses a deep ConvNet fed with specific 3D data representations. Finally, a ConvNet for ToF and stereo confidence estimation has been employed within a ToF-stereo fusion algorithm, thus avoiding reliance on complex yet inaccurate noise models for the confidence estimation task.
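As a hedged illustration of the last point (confidence-driven ToF-stereo fusion), the sketch below fuses two depth maps with a per-pixel confidence-weighted average; in the thesis the confidences come from a ConvNet, and the actual fusion algorithm is more involved than this.

```python
import numpy as np

def fuse_tof_stereo(depth_tof, depth_stereo, conf_tof, conf_stereo, eps=1e-6):
    """All inputs are HxW arrays; confidences are assumed to lie in [0, 1]."""
    weight_sum = conf_tof + conf_stereo + eps
    return (conf_tof * depth_tof + conf_stereo * depth_stereo) / weight_sum
```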
9

Lai, Po Kong. "Immersive Dynamic Scenes for Virtual Reality from a Single RGB-D Camera". Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39663.

Full text
Abstract
In this thesis we explore the concepts and components which can be used as individual building blocks for producing immersive virtual reality (VR) content from a single RGB-D sensor. We identify the properties of immersive VR videos and propose a system composed of a foreground/background separator, a dynamic scene re-constructor and a shape completer. We initially explore the foreground/background separator component in the context of video summarization. More specifically, we examine how to extract trajectories of moving objects from video sequences captured with a static camera. We then present a new approach for video summarization via minimization of the spatial-temporal projections of the extracted object trajectories. New evaluation criteria are also presented for video summarization. These concepts of foreground/background separation can then be applied towards VR scene creation by extracting relevant objects of interest. We present an approach for the dynamic scene re-constructor component using a single moving RGB-D sensor. By tracking the foreground objects and removing them from the input RGB-D frames, we can feed the background-only data into existing RGB-D SLAM systems. The result is a static 3D background model onto which the foreground frames are then superimposed to produce a coherent scene with dynamic moving foreground objects. We also present a specific method for extracting moving foreground objects from a moving RGB-D camera, along with an evaluation dataset with benchmarks. Lastly, the shape completer component takes a single-view depth map of an object as input and "fills in" the occluded portions to produce a complete 3D shape. We present an approach that utilizes a new data-minimal representation, the additive depth map, which allows traditional 2D convolutional neural networks to accomplish the task. The additive depth map represents the amount of depth required to transform the input into the "back depth map" which would exist if there were a sensor exactly opposite the input. We train and benchmark our approach using existing synthetic datasets and also show that it can perform shape completion on real-world data without fine-tuning. Our experiments show that our data-minimal representation can achieve results comparable to existing state-of-the-art 3D networks while also being able to produce higher-resolution outputs.
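The additive depth map idea lends itself to a short, hedged sketch: the completed "back" surface is the input depth plus the predicted additive depth, and both maps can be lifted to a point cloud with an assumed pinhole model. The function names and intrinsics below are illustrative, not from the thesis.

```python
import numpy as np

def back_depth_from_additive(front_depth, additive_depth):
    """Back depth map = input (front) depth + predicted additive depth, per pixel."""
    return front_depth + additive_depth

def unproject(depth, fx, fy, cx, cy):
    """Lift an HxW depth map to an (N, 3) point cloud with a pinhole camera model."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]                      # drop pixels with no depth
```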
10

Yalcin Bayramoglu, Neslihan. "Range Data Recognition: Segmentation, Matching, And Similarity Retrieval". PhD thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613586/index.pdf.

Full text
Abstract
The improvements in 3D scanning technologies have led to the necessity of managing range image databases; hence the requirement of describing and indexing this type of data arises. Up to now, considerable work has been done on capturing, transmission and visualization; however, there is still a gap in 3D semantic analysis between the requirements of applications and the obtained results. In this thesis we study the 3D semantic analysis of range data. Under this broad title we address segmentation of range scenes, correspondence matching of range images and similarity retrieval of range models. Inputs are considered to be single-view depth images. First, possible research topics related to 3D semantic analysis are introduced. Planar structure detection in range scenes is analyzed and some modifications to available methods are proposed. Also, a novel algorithm to segment a 3D point cloud (obtained via a ToF camera) into objects using spatial information is presented. We propose a novel local range image matching method that combines 3D surface properties with the 2D scale-invariant feature transform. Next, our proposal for retrieving similar models, where both the query and the database consist only of range models, is presented. Finally, an analysis of the heat diffusion process on range data is presented, along with challenges and some experimental results.
11

Duan, Liuyun. "Modélisation géométrique de scènes urbaines par imagerie satellitaire". Thesis, Université Côte d'Azur (ComUE), 2017. http://www.theses.fr/2017AZUR4025.

Full text
Abstract
Automatic city modeling from satellite imagery is one of the biggest challenges in urban reconstruction. The ultimate goal is to produce compact and accurate 3D city models that benefit many application fields such as urban planning, telecommunications and disaster management. Compared with aerial acquisition, satellite imagery provides appealing advantages such as low acquisition cost, worldwide coverage and high collection frequency. However, the satellite context also imposes a set of technical constraints, such as a lower pixel resolution, that challenge 3D city reconstruction. In this PhD thesis, we present a set of methodological tools for generating compact, semantically-aware and geometrically accurate 3D city models from stereo pairs of satellite images. The proposed pipeline relies on two key ingredients. First, geometry and semantics are retrieved simultaneously, providing robust handling of occlusion areas and low image quality. Second, it operates at the scale of geometric atomic regions, which allows the shape of urban objects to be well preserved, with a gain in scalability and efficiency. Images are first decomposed into convex polygons that capture geometric details via a Voronoi diagram. Semantic classes, elevations, and 3D geometric shapes are then retrieved in a joint classification and reconstruction process operating on polygons. Experimental results on various cities around the world show the robustness, scalability and efficiency of the proposed approach.
12

Yin, Wei. "3D Scene Reconstruction from A Monocular Image". Thesis, 2022. https://hdl.handle.net/2440/134585.

Full text
Abstract
3D scene reconstruction is a fundamental task in computer vision. The established approaches to this task are based on multi-view geometry, which establishes correspondences of feature points across consecutive frames or multiple views; the 3D positions of these feature points can then be recovered. In contrast, we aim to achieve dense 3D scene shape reconstruction from a single in-the-wild image. Without multiple views available, we rely on deep learning techniques. Recently, deep neural networks have been the dominant solution for various computer vision problems. Thus, we propose a two-stage, learning-based method. First, we employ fully-convolutional neural networks to learn accurate depth from a monocular image. To recover high-quality depth, we lift the depth to 3D space and propose a global geometric constraint, termed the virtual normal loss. To improve the generalization ability of the monocular depth estimation module, we construct a large-scale and diverse dataset and propose to learn affine-invariant depth on it. Experiments demonstrate that our monocular depth estimation methods work robustly in the wild and recover high-quality 3D geometric information. Furthermore, we propose a novel second stage to predict the focal length with a point cloud network. Instead of predicting it directly, the point cloud module leverages point cloud encoder networks that predict focal length adjustment factors from an initial guess of the scene point cloud reconstruction. The domain gap is significantly less of an issue for point clouds than for images. Combining the two stages, 3D shape can be recovered from a single image input. Note that such a reconstruction is up to scale. To recover metric 3D shape, we propose to input sparse points as guidance. Our proposed training method significantly improves the robustness of the system, including robustness to various sparsity patterns and diverse scenes.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2022
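A hedged sketch of a virtual-normal-style geometric constraint (an illustration of the idea named above, not the thesis's exact loss): sample random point triplets from the predicted and ground-truth point clouds (which correspond pixel-by-pixel after unprojection) and compare the normals of the planes they span.

```python
import numpy as np

def plane_normals(points, triplets):
    a, b, c = points[triplets[:, 0]], points[triplets[:, 1]], points[triplets[:, 2]]
    n = np.cross(b - a, c - a)
    return n / (np.linalg.norm(n, axis=1, keepdims=True) + 1e-8)

def virtual_normal_style_loss(pred_points, gt_points, num_triplets=1000, seed=0):
    """pred_points, gt_points: (N, 3) corresponding point clouds."""
    rng = np.random.default_rng(seed)
    triplets = rng.integers(0, len(pred_points), size=(num_triplets, 3))
    n_pred = plane_normals(pred_points, triplets)
    n_gt = plane_normals(gt_points, triplets)
    return float(np.mean(np.linalg.norm(n_pred - n_gt, axis=1)))   # the exact penalty form is assumed
```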
13

Zhuo, Wei. "2D+3D Indoor Scene Understanding from a Single Monocular Image". Phd thesis, 2018. http://hdl.handle.net/1885/144616.

Full text
Abstract
Scene understanding, as a broad field encompassing many subtopics, has gained great interest in recent years. Among these subtopics, indoor scene understanding, having its own specific attributes and challenges compared to outdoor scene understanding, has drawn a lot of attention. It has potential applications in a wide variety of domains, such as robotic navigation, object grasping for personal robotics, augmented reality, etc. To our knowledge, existing research for indoor scenes typically makes use of depth sensors, such as Kinect, which are, however, not always available. In this thesis, we focus on addressing indoor scene understanding tasks in the general case where only a monocular color image of the scene is available. Specifically, we first study the problem of estimating a detailed depth map from a monocular image. Then, benefiting from deep-learning-based depth estimation, we tackle the higher-level tasks of 3D box proposal generation and scene parsing with instance segmentation, semantic labeling and support relationship inference from a monocular image. Our research on indoor scene understanding provides a comprehensive scene interpretation at various perspectives and scales. For monocular image depth estimation, previous approaches are limited in that they only reason about depth locally on a single scale and do not utilize the important information of geometric scene structures. Here, we developed a novel graphical model which reasons about detailed depth while leveraging geometric scene structures at multiple scales. For 3D box proposals, to the best of our knowledge, our approach constitutes the first attempt to reason about class-independent 3D box proposals from a single monocular image. To this end, we developed a novel integrated, differentiable framework that estimates depth, extracts a volumetric scene representation and generates 3D proposals. At the core of this framework lies a novel residual, differentiable truncated signed distance function module, which is able to handle the relatively low accuracy of the predicted depth map. For scene parsing, we tackled its three subtasks of instance segmentation, semantic labeling, and support relationship inference on instances. Existing work typically reasons about these individual subtasks independently. Here, we leverage the fact that they bear strong connections, which can facilitate addressing these subtasks if modeled properly. To this end, we developed an integrated graphical model that reasons about the mutual relationships of the above subtasks. In summary, in this thesis we introduced novel and effective methodologies for each of three indoor scene understanding tasks, i.e., depth estimation, 3D box proposal generation, and scene parsing, and exploited the dependencies of the latter two tasks on depth estimates. Evaluation on several benchmark datasets demonstrated the effectiveness of our algorithms and the benefits of utilizing depth estimates for higher-level tasks.
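The truncated signed distance function mentioned above can be illustrated with a hedged, plain (non-residual, non-differentiable) sketch: every voxel stores the difference between the observed depth along its pixel ray and its own depth, clipped to a truncation band. The grid layout, truncation distance and intrinsics are illustrative assumptions, and the grid is assumed to lie entirely in front of the camera.

```python
import numpy as np

def tsdf_from_depth(depth, fx, fy, cx, cy, grid_origin, voxel, dims, trunc=0.12):
    """Return a dims[0] x dims[1] x dims[2] grid of truncated signed distances in [-1, 1]."""
    tsdf = np.ones(dims, dtype=np.float32)            # 1 = far in front of any surface
    ii, jj, kk = np.indices(dims)
    centers = grid_origin + (np.stack([ii, jj, kk], axis=-1) + 0.5) * voxel
    x, y, z = centers[..., 0], centers[..., 1], centers[..., 2]
    u = np.clip(np.round(fx * x / z + cx).astype(int), 0, depth.shape[1] - 1)
    v = np.clip(np.round(fy * y / z + cy).astype(int), 0, depth.shape[0] - 1)
    observed = depth[v, u]                            # depth along each voxel's pixel ray
    sdf = observed - z                                # positive in front of the surface, negative behind
    valid = observed > 0
    tsdf[valid] = np.clip(sdf[valid] / trunc, -1.0, 1.0)
    return tsdf
```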
14

Wu, Sheng-Han (吳昇翰). "Indoor Scene Semantic Modeling with Scalable 3D Model Retrieval to Interact with Real-world Environment in Virtual Reality". Thesis, 2019. http://ndltd.ncl.edu.tw/handle/p74u4k.

Full text
Abstract
Master's thesis, National Chiao Tung University, Institute of Computer Science and Engineering, academic year 108 (2019).
In recent years, Virtual Reality (VR) applications have developed rapidly. However, few of them support interaction with the real-world environment, because large efforts are required to build 3D models that closely match real objects and put them into the VR environment. In this paper, we propose a fully automatic method for indoor scene semantic modeling. Moreover, the reconstructed 3D model completely fits the real scene and allows the user to touch real objects in VR. First, we acquire the real indoor scene using SemanticFusion, which provides a point cloud with semantic labels. After that, we present a method to handle incorrect labels and extract individual object point clouds. Finally, a novel 3D object model retrieval is proposed. Unlike existing works, our method is able to generate geometrically faithful models and works well even when there is no exactly matching 3D object in the shape database or the object point cloud is incomplete. The result has been applied to a VR application, which shows that the reconstructed model is precise enough for haptic touch in VR.
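A hedged sketch of the object-extraction step described above (illustrative only, not the thesis's method): points sharing a semantic label are split into individual object instances with density-based clustering; the eps and min_samples values are placeholder parameters.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def extract_object_point_clouds(points, semantic_labels, eps=0.05, min_samples=50):
    """points: (N, 3) positions; semantic_labels: (N,) class ids. Returns (class, points) pairs."""
    objects = []
    for cls in np.unique(semantic_labels):
        cls_points = points[semantic_labels == cls]
        if len(cls_points) < min_samples:
            continue                                   # treat tiny label blobs as noise
        cluster_ids = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(cls_points)
        for cid in set(cluster_ids) - {-1}:            # -1 marks DBSCAN noise points
            objects.append((int(cls), cls_points[cluster_ids == cid]))
    return objects
```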
15

Najafi, Mohammad. "On the Role of Context at Different Scales in Scene Parsing". Phd thesis, 2017. http://hdl.handle.net/1885/116302.

Full text
Abstract
Scene parsing can be formulated as a labeling problem where each visual data element, e.g., each pixel of an image or each 3D point in a point cloud, is assigned a semantic class label. One can approach this problem by training a classifier and predicting a class label for the data elements purely based on their local properties. This approach, however, does not take into account any kind of contextual information between different elements in the image or point cloud. For example, in an application where we are interested in labeling roadside objects, the fact that most utility poles are connected to power wires can be very helpful in disambiguating them from other similar-looking classes. Recurrence of certain class combinations can also be considered a good contextual hint, since they are very likely to co-occur again. These forms of high-level contextual information are often formulated using pairwise and higher-order Conditional Random Fields (CRFs). A CRF is a probabilistic graphical model that encodes the contextual relationships between the data elements in a scene. In this thesis, we study the potential of contextual information at different scales (ranges) in scene parsing problems. First, we propose a model that utilizes the local context of the scene via a pairwise CRF. Our model acquires contextual interactions between different classes by assessing their misclassification rates using only the local properties of the data. In other words, no extra training is required for obtaining the class interaction information. Next, we expand the context field of view from a local range to a longer range and make use of higher-order models to encode more complex contextual cues. More specifically, we introduce a new model to employ geometric higher-order terms in a CRF for semantic labeling of 3D point cloud data. Despite the potential of the above models at capturing contextual cues in the scene, there are higher-level context cues that cannot be encoded via pairwise and higher-order CRFs. For instance, a vehicle is very unlikely to appear in a sea scene, whereas buildings are frequently observed in a street scene. Such information can be described using scene context and is modeled using global image descriptors. In particular, through an image retrieval procedure, we find images whose content is similar to that of the query image and use them for scene parsing. Another problem of the above methods is that they rely on a computationally expensive training process for classification using the local properties of data elements, which needs to be repeated every time the training data is modified. We address this issue by proposing a fast and efficient approach that exempts us from the cumbersome training task by transferring the ground-truth information directly from the training data to the test data.
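As a hedged illustration of the pairwise CRF formulation the abstract refers to (a generic Potts-style energy, not the thesis's learned potentials): the labeling cost is the sum of per-element unary costs plus a penalty for each pair of neighbouring elements whose labels disagree.

```python
import numpy as np

def crf_energy(labels, unary, edges, pairwise_weight=1.0):
    """
    labels: (N,) label assignment; unary: (N, L) per-element label costs;
    edges: iterable of (i, j) neighbour pairs (e.g. adjacent pixels or k-NN 3D points).
    """
    unary_term = unary[np.arange(len(labels)), labels].sum()
    pairwise_term = sum(pairwise_weight for i, j in edges if labels[i] != labels[j])
    return float(unary_term + pairwise_term)
```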