Dissertations on the topic "RGB-D Image"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Consult the top 50 dissertations for research on the topic "RGB-D Image".
Next to every work in the bibliography there is an "Add to bibliography" option. Use it, and your bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scholarly publication as a PDF and read its online annotation, provided the relevant parameters are available in the metadata.
Browse dissertations from a wide range of disciplines and compile your bibliography correctly.
Murgia, Julian. „Segmentation d'objets mobiles par fusion RGB-D et invariance colorimétrique“. Thesis, Belfort-Montbéliard, 2016. http://www.theses.fr/2016BELF0289/document.
This PhD thesis falls within the scope of video surveillance, and more precisely focuses on the detection of moving objects in image sequences. In many applications, good detection of moving objects is an indispensable prerequisite to any treatment applied to these objects, such as tracking of people or cars, passenger counting, detection of dangerous situations in specific environments (level crossings, pedestrian crossings, intersections, etc.), or control of autonomous vehicles. The reliability of computer-vision-based systems requires robustness against difficult conditions often caused by lighting (day/night, shadows), weather (rain, wind, snow...) and the topology of the observed scene (occlusions...). The work detailed in this PhD thesis aims at reducing the impact of illumination conditions by improving the quality of the detection of mobile objects in indoor or outdoor environments and at any time of the day. Thus, we propose three strategies working in combination to improve the detection of moving objects: i) using colorimetric invariants and/or color spaces that provide invariant properties; ii) using a passive stereoscopic camera (in outdoor environments) and the Microsoft Kinect active camera (in indoor environments) in order to partially reconstruct the 3D environment, providing an additional dimension (depth information) to the background/foreground subtraction algorithm; iii) a new fusion algorithm based on fuzzy logic in order to combine color and depth information with a certain level of uncertainty for the pixel classification.
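The third strategy lends itself to a compact illustration. Below is a minimal NumPy sketch of fuzzy color/depth fusion for pixel classification; the ramp-shaped membership functions, the thresholds, and the probabilistic-sum operator are illustrative assumptions, not the rules used in the thesis.

```python
import numpy as np

def fuzzy_foreground(color_diff, depth_diff, c_lo=10, c_hi=40, d_lo=0.02, d_hi=0.10):
    """Fuse per-pixel color and depth background differences with fuzzy memberships.

    color_diff: |I - B| per pixel (grayscale units); depth_diff: |D - D_bg| in meters.
    The ramp thresholds are illustrative, not taken from the thesis.
    """
    # Linear ramp membership: 0 = background, 1 = foreground.
    mu_color = np.clip((color_diff - c_lo) / (c_hi - c_lo), 0.0, 1.0)
    mu_depth = np.clip((depth_diff - d_lo) / (d_hi - d_lo), 0.0, 1.0)

    # A simple fuzzy OR (probabilistic sum): evidence from either cue
    # raises the foreground membership; defuzzify with a 0.5 cut.
    mu = mu_color + mu_depth - mu_color * mu_depth
    return mu > 0.5

# Example: 2x2 toy image, one pixel differs strongly in both cues.
color_diff = np.array([[5.0, 50.0], [12.0, 8.0]])
depth_diff = np.array([[0.01, 0.20], [0.03, 0.00]])
print(fuzzy_foreground(color_diff, depth_diff))
```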
Tykkälä, Tommi. „Suivi de caméra image en temps réel base et cartographie de l'environnement“. PhD thesis, Université Nice Sophia Antipolis, 2013. http://tel.archives-ouvertes.fr/tel-00933813.
Lai, Po Kong. „Immersive Dynamic Scenes for Virtual Reality from a Single RGB-D Camera“. Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39663.
Der volle Inhalt der QuelleKadkhodamohammadi, Abdolrahim. „3D detection and pose estimation of medical staff in operating rooms using RGB-D images“. Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAD047/document.
In this thesis, we address the two problems of person detection and pose estimation in Operating Rooms (ORs), which are key ingredients in the development of surgical assistance applications. We perceive the OR using compact RGB-D cameras that can be conveniently integrated in the room. These sensors provide complementary information about the scene, which enables us to develop methods that can cope with numerous challenges present in the OR, e.g. clutter, textureless surfaces and occlusions. We present novel part-based approaches that take advantage of depth, multi-view and temporal information to construct robust human detection and pose estimation models. Evaluation is performed on new single- and multi-view datasets recorded in operating rooms. We demonstrate very promising results and show that our approaches outperform state-of-the-art methods on this challenging data acquired during real surgeries.
Meilland, Maxime. „Cartographie RGB-D dense pour la localisation visuelle temps-réel et la navigation autonome“. Phd thesis, Ecole Nationale Supérieure des Mines de Paris, 2012. http://tel.archives-ouvertes.fr/tel-00686803.
Villota, Juan Carlos Perafán. „Adaptive registration using 2D and 3D features for indoor scene reconstruction“. Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/3/3139/tde-17042017-090901/.
Der volle Inhalt der QuelleO alinhamento entre pares de nuvens de pontos é uma tarefa importante na construção de mapas de ambientes em 3D. A combinação de características locais 2D com informação de profundidade fornecida por câmeras RGB-D são frequentemente utilizadas para melhorar tais alinhamentos. No entanto, em ambientes internos com baixa iluminação ou pouca textura visual o método usando somente características locais 2D não é particularmente robusto. Nessas condições, as características 2D são difíceis de serem detectadas, conduzindo a um desalinhamento entre pares de quadros consecutivos. A utilização de características 3D locais pode ser uma solução uma vez que tais características são extraídas diretamente de pontos 3D e são resistentes a variações na textura visual e na iluminação. Como situações de variações em cenas reais em ambientes internos são inevitáveis, essa tese apresenta um novo sistema desenvolvido com o objetivo de melhorar o alinhamento entre pares de quadros usando uma combinação adaptativa de características esparsas 2D e 3D. Tal combinação está baseada nos níveis de estrutura geométrica e de textura visual contidos em cada cena. Esse sistema foi testado com conjuntos de dados RGB-D, incluindo vídeos com movimentos irrestritos da câmera e mudanças naturais na iluminação. Os resultados experimentais mostram que a nossa proposta supera aqueles métodos que usam características 2D ou 3D separadamente, obtendo uma melhora da precisão no alinhamento de cenas em ambientes internos reais.
Shi, Yangyu. „Infrared Imaging Decision Aid Tools for Diagnosis of Necrotizing Enterocolitis“. Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/40714.
Baban, a. erep Thierry Roland. „Contribution au développement d'un système intelligent de quantification des nutriments dans les repas d'Afrique subsaharienne“. Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSEP100.
Der volle Inhalt der QuelleMalnutrition, including under- and overnutrition, is a global health challenge affecting billions of people. It impacts all organ systems and is a significant risk factor for noncommunicable diseases such as cardiovascular diseases, diabetes, and some cancers. Assessing food intake is crucial for preventing malnutrition but remains challenging. Traditional methods for dietary assessment are labor-intensive and prone to bias. Advancements in AI have made Vision-Based Dietary Assessment (VBDA) a promising solution for automatically analyzing food images to estimate portions and nutrition. However, food image segmentation in VBDA faces challenges due to food's non-rigid structure, high intra-class variation (where the same dish can look very different), inter-class resemblance (where different foods appear similar) and scarcity of publicly available datasets.Almost all food segmentation research has focused on Asian and Western foods, with no datasets for African cuisines. However, African dishes often involve mixed food classes, making accurate segmentation challenging. Additionally, research has largely focus on RGB images, which provides color and texture but may lack geometric detail. To address this, RGB-D segmentation combines depth data with RGB images. Depth images provide crucial geometric details that enhance RGB data, improve object discrimination, and are robust to factors like illumination and fog. Despite its success in other fields, RGB-D segmentation for food is underexplored due to difficulties in collecting food depth images.This thesis makes key contributions by developing new deep learning models for RGB (mid-DeepLabv3+) and RGB-D (ESeNet-D) image segmentation and introducing the first food segmentation datasets focused on African food images. Mid-DeepLabv3+ is based on DeepLabv3+, featuring a simplified ResNet backbone with and added skip layer (middle layer) in the decoder and SimAM attention mechanism. This model offers an optimal balance between performance and efficiency, matching DeepLabv3+'s performance while cutting computational load by half. ESeNet-D consists on two encoder branches using EfficientNetV2 as backbone, with a fusion block for multi-scale integration and a decoder employing self-calibrated convolution and learned interpolation for precise segmentation. ESeNet-D outperforms many RGB and RGB-D benchmark models while having fewer parameters and FLOPs. Our experiments show that, when properly integrated, depth information can significantly improve food segmentation accuracy. We also present two new datasets: AfricaFoodSeg for “food/non-food” segmentation with 3,067 images (2,525 for training, 542 for validation), and CamerFood focusing on Cameroonian cuisine. CamerFood datasets include CamerFood10 with 1,422 images from ten food classes, and CamerFood15, an enhanced version with 15 food classes, 1,684 training images, and 514 validation images. Finally, we address the challenge of scarce depth data in RGB-D food segmentation by demonstrating that Monocular Depth Estimation (MDE) models can aid in generating effective depth maps for RGB-D datasets
Hasnat, Md Abul. „Unsupervised 3D image clustering and extension to joint color and depth segmentation“. Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4013/document.
Access to 3D images at a reasonable frame rate is widespread now, thanks to recent advances in low-cost depth sensors as well as efficient methods to compute 3D from 2D images. As a consequence, there is strong demand for enhancing the capability of existing computer vision applications by incorporating 3D information. Indeed, numerous studies have demonstrated that the accuracy of different tasks increases when 3D information is included as an additional feature. However, for the task of indoor scene analysis and segmentation, several important issues remain, such as: (a) how can the 3D information itself be exploited? and (b) what is the best way to fuse color and 3D in an unsupervised manner? In this thesis, we address these issues and propose novel unsupervised methods for 3D image clustering and joint color and depth image segmentation. To this aim, we consider image normals as the prominent feature from the 3D image and cluster them with methods based on finite statistical mixture models. We consider the Bregman Soft Clustering method to ensure computationally efficient clustering. Moreover, we exploit several probability distributions from directional statistics, such as the von Mises-Fisher distribution and the Watson distribution. By combining these, we propose novel Model-Based Clustering methods. We empirically validate these methods using synthetic data and then demonstrate their application to 3D/depth image analysis. Afterward, we extend these methods to segment synchronized 3D and color images, also called RGB-D images. To this aim, we first propose a statistical image generation model for the RGB-D image. Then, we propose a novel RGB-D segmentation method using joint color-spatial-axial clustering and a statistical planar region merging method. Results show that the proposed method is comparable with state-of-the-art methods and requires less computation time. Moreover, it opens interesting perspectives for fusing color and geometry in an unsupervised manner. We believe that the methods proposed in this thesis are equally applicable and extendable for clustering different types of data, such as speech, gene expressions, etc. Moreover, they can be used for complex tasks, such as joint image-speech data analysis.
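As a worked illustration of clustering surface normals with directional statistics, the sketch below runs a toy EM loop for a von Mises-Fisher mixture with a fixed, shared concentration. The thesis additionally estimates the concentration parameters and uses Bregman soft clustering for efficiency, so treat this as a simplified stand-in.

```python
import numpy as np

def vmf_logpdf(X, mu, kappa):
    """Log-density of the von Mises-Fisher distribution on the unit sphere S^2."""
    log_c = np.log(kappa) - np.log(4 * np.pi * np.sinh(kappa))
    return log_c + kappa * (X @ mu)

def vmf_soft_clustering(X, k=3, kappa=20.0, iters=50, seed=0):
    """Toy EM for a vMF mixture over unit surface normals (fixed, shared kappa)."""
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(len(X), k, replace=False)]          # init means from data
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities of each component for each normal.
        log_r = np.stack([np.log(w) + vmf_logpdf(X, m, kappa)
                          for w, m in zip(weights, mus)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r); r /= r.sum(axis=1, keepdims=True)
        # M-step: mean directions are renormalized weighted resultant vectors.
        s = r.T @ X
        mus = s / np.linalg.norm(s, axis=1, keepdims=True)
        weights = r.mean(axis=0)
    return mus, weights

# Normals drawn around two dominant plane orientations (toy data).
X = np.vstack([[0, 0, 1] + 0.1 * np.random.randn(100, 3),
               [1, 0, 0] + 0.1 * np.random.randn(100, 3)])
X /= np.linalg.norm(X, axis=1, keepdims=True)
print(vmf_soft_clustering(X, k=2)[0])
```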
Řehánek, Martin. „Detekce objektů pomocí Kinectu“. Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236602.
Hammond, Patrick Douglas. „Deep Synthetic Noise Generation for RGB-D Data Augmentation“. BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7516.
SILVA, DJALMA LUCIO SOARES DA. „USING PLANAR STRUCTURES EXTRACTED FROM RGB-D IMAGES IN AUGMENTED REALITY APPLICATIONS“. PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2016. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28675@1.
Der volle Inhalt der QuelleCOORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
Esta dissertação investiga o uso das estruturas planares extraídas de imagens RGB-D em aplicações de Realidade Aumentada. Ter o modelo da cena é fundamental para as aplicações de realidade aumentada. O uso de imagens RGB-D auxilia bastante o processo da construção destes modelos, pois elas fornecem a geometria e os aspectos fotométricos da cena. Devido a grande parte das aplicações de realidade aumentada utilizarem superfícies planares como sua principal componente para projeção de objetos virtuais, é fundamental ter um método robusto e eficaz para obter e representar as estruturas que constituem estas superfícies planares. Neste trabalho, apresentaremos um método para identificar, segmentar e representar estruturas planares a partir de imagens RGB-D. Nossa representação das estruturas planares são polígonos bidimensionais triangulados, simplificados e texturizados, que estão no sistema de coordenadas do plano, onde os pontos destes polígonos definem as regiões deste plano. Demonstramos através de diversos experimentos e da implementação de uma aplicação de realidade aumentada, as técnicas e métodos utilizados para extrair as estruturas planares a partir das imagens RGB-D.
This dissertation investigates the use of planar geometric structures extracted from RGB-D images in augmented reality applications. The model of a scene is essential for augmented reality applications. RGB-D images can greatly help the construction of these models because they provide geometric and photometric information about the scene. Planar structures are prevalent in many 3D scenes and, for this reason, augmented reality applications use planar surfaces as one of the main components for the projection of virtual objects. Therefore, it is extremely important to have robust and efficient methods to acquire and represent the structures that compose these planar surfaces. In this work, we present a method for identifying, segmenting and representing planar structures from RGB-D images. Our representation of planar structures consists of simplified, textured, triangulated two-dimensional polygons, forming a triangle mesh intrinsic to the plane that defines regions in this space corresponding to object surfaces in the 3D scene. We have demonstrated, through various experiments and the implementation of an augmented reality application, the techniques and methods used to extract the planar structures from the RGB-D images.
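A common way to obtain such planar structures is greedy RANSAC plane fitting on the point cloud. The sketch below uses Open3D's segment_plane as a stand-in for the detector developed in the dissertation; the thresholds and the stopping rule are illustrative assumptions.

```python
import open3d as o3d

def extract_planes(pcd, max_planes=4, dist=0.02):
    """Greedy RANSAC plane extraction from an RGB-D point cloud (sketch)."""
    planes = []
    rest = pcd
    for _ in range(max_planes):
        if len(rest.points) < 500:
            break
        # Fit one plane: returns (a, b, c, d) with ax + by + cz + d = 0.
        model, inliers = rest.segment_plane(distance_threshold=dist,
                                            ransac_n=3, num_iterations=1000)
        planes.append((model, rest.select_by_index(inliers)))
        rest = rest.select_by_index(inliers, invert=True)  # remove inliers, repeat
    return planes

# Usage sketch: planes = extract_planes(o3d.io.read_point_cloud("scene.pcd"))
```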
Basso, Marcos Aurélio. „Um método robusto para modelagem 3D de ambientes internos usando dados RGB-D“. reponame:Repositório Institucional da UFPR, 2015. http://hdl.handle.net/1884/45430.
Doctoral thesis (tese de doutorado) – Universidade Federal do Paraná, Setor de Ciências da Terra, Programa de Pós-Graduação em Ciências Geodésicas. Defense: Curitiba, 12 November 2015.
Includes references: ff. 106-112.
Resumo: O objetivo deste trabalho é propor um método robusto para modelagem 3D de ambientes internos usando dados RGB-D. Basicamente, a modelagem 3D de ambientes está dividida em quatro tarefas, a saber: a escolha do sensor de imageamento; o problema do registro de nuvem de pontos 3D adquiridos pelo sensor de imageamento em diferentes pontos de vista; o problema da detecção de lugares anteriormente visitados (loop closure); e o problema da análise de consistência global. Atualmente, o Kinect é o sensor RGB-D mais empregado na aquisição de dados para modelagem de ambientes internos, uma vez que é leve, flexível e de fácil manuseio. A etapa de registro consiste em determinar os parâmetros de transformação relativa entre pares de nuvens de pontos e, neste trabalho, é dividida em duas partes: a primeira parte consiste em executar o registro inicial dos dados 3D usando pontos visuais e o modelo de corpo rígido 3D; na segunda parte, os parâmetros iniciais são refinados empregando um modelo matemático baseado numa abordagem paralaxe-a-plano, o que torna o método robusto. Para minimizar os efeitos da propagação de erros provocados na etapa de registro dos pares de nuvens de pontos 3D, o método proposto detecta lugares anteriormente visitados usando uma imagem de referência (frame-chave). Basicamente, é feita uma busca por imagens com grau de similaridade com a imagem de referência e, por fim, é obtida uma nova restrição espacial. A etapa de consistência global cria um grafo dirigido e ponderado, sendo cada vértice do grafo representado pelos parâmetros de transformação obtidos na etapa de registro dos dados, enquanto suas arestas representam as restrições espaciais definidas pelos parâmetros de transformação obtidos entre os lugares revisitados. A otimização deste grafo é feita através do método GraphSLAM. Experimentos foram realizados em cinco ambientes internos e o método proposto propiciou uma acurácia relativa em torno de 6,85 cm. Palavras-chave: sensor RGB-D; modelagem 3D; otimização da trajetória baseado em grafos; registro de pares de nuvens de pontos; análise de consistência global.
Abstract: The objective of this work is to propose a robust method for 3D modeling of indoor environments using RGB-D data. Basically, the 3D modeling of environments is divided into four problems, namely: the choice of the imaging sensor; the registration problem for 3D point clouds acquired by the imaging sensor from different viewpoints; the problem of detecting previously visited places (loop closure); and the problem of global consistency analysis. Currently, the Kinect is the RGB-D sensor most employed in data acquisition for modeling indoor environments, since it is lightweight, flexible and easy to use. The registration step determines the relative transformation parameters between pairs of point clouds and is divided in this work into two parts: the first part runs the initial registration of the 3D data using visual points and a 3D rigid-body model; in the second part, the initial parameters are refined using a mathematical model based on a parallax-to-plane approach, which makes the method robust. To minimize the propagation of errors caused in the registration step for pairs of 3D point clouds, the proposed method detects previously visited places using a reference image (key-frame). Basically, a search is made for images with a sufficient degree of similarity to the reference image, and finally a new spatial constraint is obtained. The global consistency step creates a directed, weighted graph, with each node of the graph represented by the transformation parameters obtained in the data registration step, while its edges represent the spatial constraints defined by the transformation parameters obtained between revisited places. The optimization of this graph is performed by the GraphSLAM method. Experiments were carried out in five indoor environments and the proposed method provided a relative accuracy of around 6.85 cm. Keywords: RGB-D sensor; 3D mapping; graph-based trajectory optimization; registration of point cloud pairs; global consistency analysis.
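The loop-closure and global-consistency stages map naturally onto a pose graph. The following sketch builds and optimizes a minimal pose graph with Open3D, standing in for the GraphSLAM optimizer used in the thesis; all transforms and information matrices are placeholders.

```python
import numpy as np
import open3d as o3d

# Odometry edges from sequential registration, plus one loop-closure edge
# from a revisited place (all transforms here are identity placeholders).
reg = o3d.pipelines.registration

graph = reg.PoseGraph()
graph.nodes.append(reg.PoseGraphNode(np.eye(4)))           # frame 0 = origin
for i, T in enumerate([np.eye(4), np.eye(4)]):             # odometry estimates
    graph.nodes.append(reg.PoseGraphNode(np.linalg.inv(T)))
    graph.edges.append(reg.PoseGraphEdge(i, i + 1, T, np.eye(6), uncertain=False))

# Loop closure: a weaker, "uncertain" constraint between frames 2 and 0.
graph.edges.append(reg.PoseGraphEdge(2, 0, np.eye(4), np.eye(6), uncertain=True))

option = reg.GlobalOptimizationOption(max_correspondence_distance=0.05,
                                      edge_prune_threshold=0.25, reference_node=0)
reg.global_optimization(graph, reg.GlobalOptimizationLevenbergMarquardt(),
                        reg.GlobalOptimizationConvergenceCriteria(), option)
print([np.asarray(n.pose) for n in graph.nodes])           # optimized poses
```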
Martins, Renato. „Odométrie visuelle directe et cartographie dense de grands environnements à base d'images panoramiques RGB-D“. Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEM004/document.
This thesis is in the context of self-localization and 3D mapping from RGB-D cameras for mobile robots and autonomous systems. We present image alignment and mapping techniques to perform camera localization (tracking), notably for large camera motions or low frame rates. Possible domains of application are the localization of autonomous vehicles, 3D reconstruction of environments, security, or virtual and augmented reality. We propose a consistent localization and 3D dense mapping framework considering as input a sequence of RGB-D images acquired from a mobile platform. The core of this framework explores and extends the domain of applicability of direct/dense appearance-based image registration methods. With regard to feature-based techniques, direct/dense image registration (or image alignment) techniques are more accurate and allow a more consistent dense representation of the scene. However, these techniques have a smaller domain of convergence and rely on the assumption that the camera motion is small. In the first part of the thesis, we propose two formulations to relax this assumption. Firstly, we describe a fast pose estimation strategy to compute a rough estimate of large motions, based on the normal vectors of the scene surfaces and on the geometric properties between the RGB-D images. This rough estimate can be used as initialization for direct registration methods for refinement. Secondly, we propose a direct RGB-D camera tracking method that adaptively exploits the photometric and geometric error properties to improve the convergence of the image alignment. In the second part of the thesis, we propose techniques of regularization and fusion to create compact and accurate representations of large-scale environments. The regularization is performed from a segmentation of spherical frames into piecewise patches, using the photometric and geometric information simultaneously to improve the accuracy and consistency of the 3D scene reconstruction. This segmentation is also adapted to tackle the non-uniform resolution of panoramic images. Finally, the regularized frames are combined to build a compact keyframe-based map composed of spherical RGB-D panoramas optimally distributed in the environment. These representations are helpful for autonomous navigation and guiding tasks, as they allow access in constant time with limited storage that does not depend on the size of the environment.
Zeni, Luis Felipe de Araujo. „Reconhecimento facial tolerante à variação de pose utilizando uma câmera RGB-D de baixo custo“. reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/101659.
Recognizing the identity of human beings from recorded digital images of their faces is important for a variety of applications, namely security access, human-computer interaction, digital entertainment, etc. This dissertation proposes a new method for automatic face recognition that uses both 2D and 3D information from an RGB-D (Kinect) camera. The method uses the color information of the 2D image to locate faces in the scene; once a face is properly located, it is cropped and normalized to a standard size and color. Afterwards, using depth information, the method estimates the pose of the head relative to the camera. With the normalized faces and their respective pose information, the proposed method trains a model of faces that is robust to pose and expressions, using a new automatic technique that separates different poses into different face models. With the trained model, the method is able to identify whether the people used to train the model are present or not in newly acquired images, to which the model had no access during the training phase. The experiments demonstrate that the proposed method considerably improves classification results on real images with varying pose and expression.
Vo, Duc My [Verfasser], und Andreas [Akademischer Betreuer] Zell. „Person Detection, Tracking and Identification by Mobile Robots Using RGB-D Images / Duc My Vo ; Betreuer: Andreas Zell“. Tübingen : Universitätsbibliothek Tübingen, 2015. http://d-nb.info/1163396826/34.
Gokhool, Tawsif Ahmad Hussein. „Cartographie dense basée sur une représentation compacte RGB-D dédiée à la navigation autonome“. Thesis, Nice, 2015. http://www.theses.fr/2015NICE4028/document.
Our aim is centered around building ego-centric topometric maps, represented as a graph of keyframe nodes, which can be efficiently used by autonomous agents. The keyframe nodes, which combine a spherical image and a depth map (augmented visual sphere), synthesise information collected in a local area of space by an embedded acquisition system. The representation of the global environment consists of a collection of augmented visual spheres that provide the necessary coverage of an operational area. A "pose" graph that links these spheres together in six degrees of freedom also defines the domain potentially exploitable for navigation tasks in real time. As part of this research, an approach to map-based representation has been proposed by considering the following issues: how to robustly apply visual odometry by making the most of both the photometric and geometric information available from our augmented spherical database; how to determine the quantity and optimal placement of these augmented spheres to cover an environment completely; how to model sensor uncertainties and update the dense information of the augmented spheres; and how to compactly represent the information contained in the augmented sphere to ensure robustness, accuracy and stability along an explored trajectory by making use of saliency maps.
Thörnberg, Jesper. „Combining RGB and Depth Images for Robust Object Detection using Convolutional Neural Networks“. Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-174137.
Chiesa, Valeria. „Revisiting face processing with light field images“. Electronic Thesis or Diss., Sorbonne université, 2019. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2019SORUS059.pdf.
Being able to predict the macroscopic response of a material from the knowledge of its constituents at a microscopic or mesoscopic scale has always been the Holy Grail pursued by material science, for it provides building bricks for the understanding of complex structures as well as for the development of tailor-made optimized materials. The homogenization theory nowadays constitutes a well-established theoretical framework to estimate the overall response of composite materials for a broad range of mechanical behaviors. Such a framework is still lacking for brittle fracture, which is (i) a dissipative evolution problem that (ii) localizes at the crack tip and (iii) is related to a structural one. In this work, we propose a theoretical framework based on a perturbative approach of Linear Elastic Fracture Mechanics to model (i) crack propagation in large-scale disordered materials as well as (ii) the dissipative processes involved at the crack tip during the interaction of a crack with material heterogeneities. Their ultimate contribution to the macroscopic toughness of the composite is (iii) estimated from the resolution of the structural problem using an approach inspired by statistical physics. The theoretical and numerical inputs presented in the thesis are finally compared to experimental measurements of crack propagation in 3D-printed heterogeneous polymers obtained through digital image correlation.
Shiu, Feng-Shuo, und 許峯碩. „Stereoscopic Image Generation from RGB-D Images“. Thesis, 2017. http://ndltd.ncl.edu.tw/handle/454p5w.
義守大學
資訊工程學系
105
Traditional stereoscopic images are generated using multiple color cameras. With the popularity of depth-capture devices, it has become possible to generate stereoscopic images from color and depth images. During the generating process, holes are created because of visual occlusion or reflective materials. Based on the color and depth images from an RGB-D camera, this study explores techniques for generating stereoscopic images and filling the holes. First, the DIBR algorithm is used to generate the left-eye and right-eye images respectively, and then an improved inpainting method is proposed to fill the holes. To verify the actual effect of the stereoscopic images, a head-mounted display is used to view the inpainted stereoscopic images.
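For readers unfamiliar with DIBR, the sketch below warps a color image to a virtual eye view by a depth-dependent pixel shift and then fills the disocclusion holes. cv2.inpaint (Telea) stands in for the improved inpainting method proposed in the thesis, and the inverse-depth disparity model is a toy assumption rather than calibrated camera geometry.

```python
import cv2
import numpy as np

def dibr_view(color, depth, baseline_px=8.0):
    """Toy forward-mapping DIBR: color is uint8 BGR, depth is in meters."""
    h, w = depth.shape
    out = np.zeros_like(color)
    hole = np.full((h, w), 255, np.uint8)              # 255 marks unfilled pixels
    disp = (baseline_px / np.maximum(depth, 1e-3)).astype(int)
    xs = np.arange(w)
    for y in range(h):
        xt = np.clip(xs + disp[y], 0, w - 1)           # shifted column index
        out[y, xt] = color[y, xs]
        hole[y, xt] = 0
    # Disocclusion holes: fill with Telea inpainting as a stand-in.
    return cv2.inpaint(out, hole, 3, cv2.INPAINT_TELEA)

# Usage sketch: right_eye = dibr_view(bgr_image, depth_in_meters)
```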
Kuo, Hung-Yu, und 郭弘裕. „Image Segmentation from RGB-D Data“. Thesis, 2017. http://ndltd.ncl.edu.tw/handle/n72ag9.
義守大學
資訊工程學系
105
Image segmentation is one of the most important foundations of computer vision. In many applications, such as image retrieval, pattern recognition, machine vision and related fields, a good segmentation technology is necessary to facilitate follow-up retrieval and recognition work. Traditional image segmentation methods are mainly based on the color information in images. With the growing popularity of cheap RGB-D cameras, a new way of performing image segmentation has become available. This study uses a Kinect camera to obtain color and depth information for image segmentation. First, the image is initially segmented according to color information; then color and depth information are used together to merge adjacent blocks into the final segmentation result. The depth information compensates for the shortcomings of color-only segmentation and yields more appropriate results.
Tu, Chieh-Min, und 杜介民. „Depth Image Inpainting with RGB-D Camera“. Thesis, 2015. http://ndltd.ncl.edu.tw/handle/k4m42a.
義守大學
資訊工程學系
103
Since Microsoft released the inexpensive Kinect sensor as a new natural user interface, stereoscopic imaging has moved from the earlier synthesis of multi-view color images to the synthesis of a color image and a depth image. However, the captured depth images may be missing some depth values, so the stereoscopic effect is often poor. This thesis develops an object-based depth inpainting method based on the Kinect RGB-D camera. First, background differencing, frame differencing and depth thresholding strategies are used as a basis for segmenting foreground objects from a dynamic background image. Then the task of hole inpainting is divided into background and foreground areas: background holes are inpainted from a background depth image, and foreground holes are inpainted with a best-fit neighboring depth value. Experimental results show that such an inpainting method is helpful for filling holes and improves the contour edges and image quality.
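A compact way to realize the two-branch fill is sketched below: background holes are copied from a background depth model, while foreground holes take the nearest valid depth measurement found via OpenCV's labeled distance transform. The mask conventions are assumptions, and mapping distance-transform labels back to pixel indices relies on OpenCV assigning labels to zero pixels in scan order, which holds in practice but is not guaranteed by the documentation.

```python
import cv2
import numpy as np

def fill_depth_holes(depth, bg_depth, fg_mask):
    """Object-aware depth hole filling (sketch of the two-branch idea)."""
    filled = depth.copy()
    hole = depth == 0
    # Background branch: trust the accumulated background depth model.
    bg_hole = hole & ~fg_mask
    filled[bg_hole] = bg_depth[bg_hole]
    # Foreground branch: propagate the nearest valid measurement.
    valid = (depth > 0).astype(np.uint8)
    _, labels = cv2.distanceTransformWithLabels(1 - valid, cv2.DIST_L2, 3,
                                                labelType=cv2.DIST_LABEL_PIXEL)
    idx = np.flatnonzero(valid)             # flat indices of valid pixels
    # labels[p] is the 1-based id of the nearest zero pixel of (1 - valid),
    # i.e. the nearest valid depth pixel; map ids back to flat indices.
    nearest = idx[labels - 1]
    fg_hole = hole & fg_mask
    filled[fg_hole] = depth.ravel()[nearest[fg_hole]]
    return filled
```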
Ting, Hao-Chan, und 丁浩展. „Human Skeleton Correction Based on RGB-D Image“. Thesis, 2013. http://ndltd.ncl.edu.tw/handle/c77uu5.
國立臺灣科技大學
電子工程系
102
The currently accepted human skeleton extraction techniques depend on the OpenNI framework and the NITE middleware. Using this technique, the human skeleton can be tracked in real time once the human position has been recognized at the beginning. However, incorrect skeleton detection may happen when the person holds an object and the corresponding depth image is affected by it. In this thesis, we propose a method to reduce this kind of problem and increase the accuracy of human skeleton detection. We detect the object the person is holding and then filter it out of the corresponding depth map. After the object has been filtered from the depth map, the human skeleton detection technique of the NITE middleware obtains the correct skeleton information. Meanwhile, we can obtain human skeleton information including the positions, orientations and corresponding confidence values of 15 joints. Experimental results show that the human skeleton obtained by the proposed method is less affected when the person holds an object, and the skeleton tracking process remains real-time for developers.
Madeira, Tiago de Matos Ferreira. „Enhancement of RGB-D image alignment using fiducial markers“. Master's thesis, 2019. http://hdl.handle.net/10773/29603.
3D reconstruction is the creation of three-dimensional models from the captured shape and appearance of real objects. It is a field that originated in several branches of computer vision and computer graphics, and it has gained great importance in areas such as architecture, robotics, autonomous driving, medicine and archaeology. Most current model acquisition technologies are based on LiDAR, RGB-D cameras, and image-based approaches such as visual SLAM. Despite the improvements achieved, methods that depend on professional instruments and their operation entail high capital and logistical costs. In this dissertation, an optimization process was developed that improves 3D reconstructions created with a portable, consumer-grade RGB-D camera that is easy to handle and offers a familiar interface to smartphone users, through the use of fiducial markers placed in the environment. In addition, a tool was developed to remove said fiducial markers from the scene texture, as a complement that mitigates a drawback of the adopted approach but may also be useful in other contexts.
Master's degree in Computer and Telematics Engineering (Mestrado em Engenharia de Computadores e Telemática)
Lee, Chi-cheng, und 李其真. „Image Database and RGB-D Camera Image Based Simultaneous Localization and Mapping“. Thesis, 2014. http://ndltd.ncl.edu.tw/handle/71614046675523628278.
國立臺灣科技大學
機械工程系
102
Recently, due to advances in technology and the growing popularity of social networks, almost everyone owns a smartphone. Many people are happy to upload the photographs they take to the internet and share them with others, so it is easy to obtain historical images, and their associated information, for an unfamiliar environment. Therefore, if we can use the numerous historical images uploaded to the cloud to achieve localization and mapping in an unfamiliar environment, we can reduce the cost of creating a database and processing large amounts of information. The purpose of this thesis is to rely on a computer vision system to assist simultaneous localization and mapping for a real-time camera; the system includes an RGB-D depth camera for capturing images and a computer for processing and analysis. In the image pre-processing part, we first assume that the real-time camera coordinate frame is the world coordinate frame; then, by matching feature points between historical images and real-time images, we obtain the projection model. From the definition of the coordinate system and the camera calibration properties, we obtain the position and angle of the historical image database relative to the real-time camera, and with a coordinate transformation we obtain the relative information of the real-time camera in world coordinates. In the localization and mapping part, we use an extended Kalman filter SLAM estimator to generate stable measurement results for the state of the real-time camera; the image database achieves good convergence, and the path and map for the real-time camera can be created. The contribution of this thesis is that, unlike general SLAM, we do not have to match highly similar features in continuous images; we can directly find the relative state between two images instead, which reduces the time spent finding features. Additionally, because our experimental equipment is an RGB-D camera, we do not have to use two ordinary images to find 3D features. Instead, we can directly obtain the 3D information, reduce the number of coordinate transformations, and obtain the relative state between the real-time image and a database image faster. In this thesis, the application is that we can use an existing image database of an unknown area and a real-time RGB-D camera to achieve positioning.
Chuang, Hui-Chi, und 莊惠琪. „Real-Time Fingerspelling Recognition System Design Based on RGB-D Image Information“. Thesis, 2014. http://ndltd.ncl.edu.tw/handle/78417928215280967136.
國立交通大學
電控工程研究所
102
Communication is a very important part of human-computer interaction. This thesis presents a high-accuracy fingerspelling recognition system based on RGB-D images. The system is separated into three parts: ROI selection, hand feature extraction, and fingerspelling recognition. For the ROI selection, the regions of the hand and face are first obtained by skin color detection and connected component labeling (CCL), and then the hand, the ROI, is determined by feature point extraction based on the distance transform. Next follows hand feature extraction, which consists of the hand structure and the hand texture. From the feature points of the ROI, the locations of the palm and fingertips, the palm direction, and the finger vectors form the hand structure. In addition to the hand structure, this thesis adopts the LBP operator to generate the hand texture, to deal with fingerspelling signs that are not recognizable from the hand structure alone. Finally, the extracted hand features are sent to the fingerspelling recognition system, which is built with several different neural network classifiers. The experimental results show that this is an effective real-time recognition system whose accuracy is higher than 80% for most of the fingerspelling signs in ASL.
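The ROI-selection steps have standard OpenCV counterparts. Below is a hedged sketch that builds a skin mask, keeps the largest connected component, and takes the distance-transform maximum as the palm center; the YCrCb skin range is a common heuristic, not the thesis's calibrated values.

```python
import cv2
import numpy as np

def find_palm(bgr):
    """Skin-color ROI and palm center, following the pipeline in the abstract."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))   # rough skin range
    # Keep the largest skin blob (connected component labeling).
    n, labels, stats, _ = cv2.connectedComponentsWithStats(skin)
    if n < 2:
        return None
    best = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    blob = (labels == best).astype(np.uint8)
    # Palm center = point deepest inside the blob (distance-transform maximum).
    dist = cv2.distanceTransform(blob, cv2.DIST_L2, 5)
    cy, cx = np.unravel_index(np.argmax(dist), dist.shape)
    return (cx, cy), dist[cy, cx]   # center and approximate palm radius
```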
Lai, Chih-Chia, und 賴志嘉. „On Constructing the Registration Graph of a 3-D Scene Using RGB-D Image Streams“. Thesis, 2013. http://ndltd.ncl.edu.tw/handle/42601737493375454708.
國立暨南國際大學
資訊工程學系
101
The key problem in using a mobile robot equipped with an RGB-D camera to explore an unknown environment is how to fuse the information contained in the acquired images. Due to the limited field of view of the camera, registering the acquired images is inevitable. If we represent each image as a node and each pairwise registration result as an edge linking two registered images, then the completed registration results can be expressed as a registration graph. Constructing a registration graph from a series of input images can greatly simplify the 3D scene reconstruction problem. Notably, the critical issue in registration graph construction is to determine whether a given pair of images overlap. If two images are determined to overlap, the second problem is to determine their registration parameters and to add an edge linking those two images. In this work, we use the number of SIFT feature correspondences to select possibly overlapping images. However, the computational complexity of the traditional SIFT feature matching method is too high. Hence, we propose a fast SIFT feature matching algorithm based on the visual word (VW) technique. We first quantize the SIFT features via vector quantization with a specified codebook. If two SIFT features are quantized to different VWs, then those two SIFT features are deemed unmatched. Therefore, when matching SIFT features, we only have to consider features having the same VW, and the computation cost can thus be greatly reduced. The matched SIFT features computed with the VW approach are further verified with the RANSAC algorithm to remove incorrect matching results and to estimate the registration parameters. Experimental results show that the proposed method improves the computation speed by a factor of 38 without sacrificing too much matching accuracy.
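The quantize-then-match idea fits in a few lines. The sketch below matches descriptors only within a shared visual word and leaves geometric verification to cv2.findHomography with RANSAC; the brute-force quantizer and the homography model (rather than the rigid 3D model estimated in the thesis) are simplifying assumptions.

```python
import cv2
import numpy as np

def vw_match(desc1, desc2, codebook):
    """Match SIFT descriptors only within the same visual word (sketch).

    codebook: (k, 128) cluster centers learned offline (e.g., by k-means).
    Restricting candidates to one word replaces exhaustive matching.
    """
    def quantize(d):
        # Nearest codebook center per descriptor.
        return np.argmin(((d[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1)
    w1, w2 = quantize(desc1), quantize(desc2)
    matches = []
    for w in np.intersect1d(w1, w2):
        i1, i2 = np.flatnonzero(w1 == w), np.flatnonzero(w2 == w)
        d = ((desc1[i1][:, None] - desc2[i2][None]) ** 2).sum(-1)  # small block
        matches += [(i1[a], i2[b]) for a, b in enumerate(d.argmin(1))]
    return matches

# Geometric verification (sketch): pts1, pts2 are (n, 2) arrays built from the
# matched keypoint coordinates.
# H, inlier_mask = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
```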
Fan, Chuan-Chi, und 范銓奇. „Low Power High Speed 8-Bit Pipelined A/D Converter for RGB Image Processing“. Thesis, 2005. http://ndltd.ncl.edu.tw/handle/40871581804580984571.
國立中正大學
電機工程所
93
Three architectures of 8-bit high-speed pipelined A/D converters for RGB image processing are implemented. Firstly, a conventional 1.5-bit/stage pipelined ADC requiring 7 amplifiers and 15 comparators is designed and fabricated. According to the measured results, an SNDR of 33.45 dB is obtained at a sampling frequency of 2 MHz with an input signal of 9.8 kHz. The ENOB is 5.26 bits. In order to reduce the power consumption, an amplifier-sharing technique with only 4 amplifiers is included in the second 1.5-bit/stage pipelined ADC design. Finally, a design with only 4 amplifiers and 9 comparators is proposed for the third 1.5-bit/stage pipelined ADC to further reduce the power consumption. For the op-amp implementation, a fully differential structure is used. The ADC is implemented in TSMC 0.35 um 2P4M mixed-signal process technology. Based on the post-layout simulation, the ADC SNDR and ENOB are 44.02 dB and 7.02 bits, respectively, with an input frequency of 9.34 MHz at a sampling frequency of 140 MHz. The DNL is about +0.45/-0.5 LSB, and the INL is about +2.33/-0.36 LSB. The total ADC power consumption under a supply voltage of 3.3 V is about 118.1 mW. The technique achieves a power saving of 33% compared with the conventional pipelined ADC.
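To clarify what "1.5 bit/stage" means, here is a small ideal behavioral model of such a pipeline: each stage resolves one of three codes and passes an amplified residue to the next stage, and the overlapping stage codes are combined by shift-and-add digital correction. This is textbook behavior under ideal assumptions, not a model of the fabricated circuits.

```python
def stage_15bit(vin, vref=1.0):
    """One 1.5-bit pipeline stage: 3-level sub-ADC plus residue amplifier.

    Comparator thresholds at +/- Vref/4 give codes d in {0, 1, 2}; the
    residue 2*vin - (d - 1)*vref stays inside +/- Vref, which is what makes
    the stage tolerant to comparator offset (digital redundancy).
    """
    if vin > vref / 4:
        d = 2
    elif vin < -vref / 4:
        d = 0
    else:
        d = 1
    return d, 2 * vin - (d - 1) * vref

def pipeline_adc(vin, n_stages=7, vref=1.0):
    """Ideal 1.5-bit/stage pipeline with shift-and-add digital correction
    (an 8-bit converter needs a final flash stage, omitted for brevity)."""
    code, residue = 0, vin
    for _ in range(n_stages):
        d, residue = stage_15bit(residue, vref)
        code = (code << 1) + d          # overlap-and-add correction
    return code

print(pipeline_adc(0.3), pipeline_adc(-0.3))  # larger input -> larger code
```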
Chiu, Meng-Tzu, und 邱夢姿. „Real-Time Finger Writing Digit Recognition System Design Based on RGB-D Image Information“. Thesis, 2015. http://ndltd.ncl.edu.tw/handle/69874109674929453878.
國立交通大學
電控工程研究所
103
Finger-writing recognition has been introduced in a diversity of fields, such as video games and remote control systems, because it provides natural and intuitive communication for human-computer interaction (HCI). This thesis proposes a high-accuracy real-time finger-writing digit recognition system based on RGB-D information. The system is divided into three main parts: ROI selection, feature extraction, and finger-writing digit recognition. For the ROI selection, the skin color regions are detected first; then the palm and the fingertips are determined by connected component labeling (CCL) and the distance transform, respectively. Further, the fingertip is tracked to create the trajectory, and its directional features are extracted for digit recognition. However, since 0 and 6 are often confused, three extra features are added to increase their recognition rate. Finally, with a series of k-NN classifiers, the experimental results show that the accuracy rate is higher than 95% in finger-writing digit recognition, which implies the proposed real-time recognition system is indeed effective and efficient.
CENG, YUN-FENG, und 曾雲楓. „Deep-Learning-Based Object Classification and Grasping Point Determination Based on RGB-D Image for Robot Arm Operation“. Thesis, 2019. http://ndltd.ncl.edu.tw/handle/6epvc5.
Liu, Chen-Yu, und 劉貞佑. „Multiview Stereo Images Generation from RGB-D Images“. Thesis, 2014. http://ndltd.ncl.edu.tw/handle/29963224176167062421.
國立臺灣師範大學
資訊工程學系
102
Nowadays, 3D display technology is well developed and has gradually matured. However, limited 3D content obstructs the popularization of this technology in the market. Even if customers can afford expensive media equipment, there is still a lack of usable content for 3D displays. This research provides a solution that converts RGB-D images to 3D images to partially relieve the shortage of 3D content. In recent decades, much research has addressed how to create 3D images, which always involves depth measurement and the generation of an image from another perspective. Depth measurement can be done by manual judgment, by depth cues, or by using depth cameras. The former two solutions are more time-consuming than the latter, and depth cues in particular often cause inaccuracy. Using depth cameras simplifies the acquisition of depth data and decreases the inaccuracy as well, but there is a problem when using such cameras to collect depth data: the images may contain holes, depending on the shooting scenario. The depth data need to be repaired under reasonable conditions, because these two factors impact the quality of the 3D images. In the past, many studies have proposed solutions for image inpainting, mainly considering color and texture. This research implements two methods to process the missing values of depth images: one exploits the low-rank property of images using a matrix completion technique; the other repairs the depth image based on an image segmentation technique. The results of the experiments show that our 3D depth quality is clearly higher than that of the traditional 2D-to-3D conversion method. Furthermore, the depth camera collects depth data with higher accuracy, so we can provide viewers a better experience of 3D display technology.
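The low-rank route can be illustrated in a few lines of NumPy: alternately project the depth map onto rank-r matrices via a truncated SVD and restore the observed entries. The rank, iteration count, and initialization are illustrative assumptions; the thesis's matrix completion formulation may differ.

```python
import numpy as np

def complete_depth(depth, mask, rank=8, iters=100):
    """Fill missing depth via iterative truncated-SVD completion (sketch).

    depth: 2-D array; entries where mask == False are treated as missing.
    Assumes the depth map is approximately low-rank.
    """
    filled = np.where(mask, depth, depth[mask].mean())     # crude initialization
    for _ in range(iters):
        # Project onto the set of rank-r matrices...
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        # ...then restore the observed entries (hard data constraint).
        filled = np.where(mask, depth, low_rank)
    return filled

# Toy example: a rank-1 "scene" with 30% of entries missing.
rng = np.random.default_rng(0)
gt = np.outer(np.linspace(1, 2, 40), np.linspace(1, 1.5, 30))
mask = rng.random(gt.shape) > 0.3
print(np.abs(complete_depth(gt * mask, mask, rank=1) - gt).max())
```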
Wu, Shang-Yu, und 吳尚諭. „Parallel Hierarchical 3-D Matching of RGB-D Images“. Thesis, 2013. http://ndltd.ncl.edu.tw/handle/cbae6h.
國立暨南國際大學
資訊工程學系
101
This thesis proposes a new method for RGB-D image matching that differs from the traditional point-to-point/point-to-plane matching methods. An objective function is proposed that fuses both depth and color information for estimating the transformation matrix between two RGB-D images. A hierarchical scale-space parameter estimation method is proposed for dealing with image matching under large motion. The main idea is to smooth the input image appropriately so that minute features are temporarily ignored, simplifying the matching problem for the main 3D structures. Notably, image smoothing eliminates a portion of the image information. To fully utilize the RGB-D information, the degree of blurriness is reduced gradually to introduce the minute image features into the parameter estimation process in a coarse-to-fine matching approach. The image matching method is implemented with the CUDA parallel processing framework. Experimental results show that the proposed method can efficiently match two RGB-D images.
Kuo, Pei-Hsuan, und 郭姵萱. „Object Retrieval Based on RGB-D Images“. Thesis, 2017. http://ndltd.ncl.edu.tw/handle/fpsqr5.
義守大學
資訊工程學系
105
Nowadays, modern multimedia are widely used in daily life. In response to these requirements, people continuously explore how to effectively manage and retrieve multimedia data. Most previous studies focused on 2D images and 3D models. However, while RGB-D image data keep growing, the related retrieval technologies remain inadequate, so the development of suitable retrieval algorithms is an urgent task. This research uses RGB-D image data obtained by a Kinect sensing device as the input source, studies how to extract both color and geometry features from the point cloud data, and designs a new 3D object retrieval system. A testing platform is established to exhibit the results and verify the actual effectiveness.
Peng, Hsiao-Chia, und 彭小佳. „3D Face Reconstruction on RGB and RGB-D Images for Recognition Across Pose“. Thesis, 2015. http://ndltd.ncl.edu.tw/handle/88142215912683274078.
國立臺灣科技大學
機械工程系
103
Face recognition across pose is a challenging problem in computer vision. Two scenarios are considered in this thesis. One is the common setup with one single frontal facial image of each subject in the gallery set and images of other poses in the probe set. The other considers an RGB-D image of the frontal face for each subject in the gallery, while the probe set is the same as in the previous case and only contains RGB images of other poses. The second scenario simulates the case where an RGB-D camera is available for user registration only, and recognition must be performed on regular RGB images without the depth channel. Two approaches are proposed for handling the first scenario: one is holistic and the other is component-based. The former is extended from a face reconstruction approach and improved with different sets of landmarks for alignment and multiple reference models considered in the reconstruction phase. The latter focuses on the reconstruction of facial components obtained from pose-invariant landmarks, with different components considered at different poses. Such component-based reconstruction for handling cross-pose recognition is rarely seen in the literature. Although the approach for handling the second scenario, i.e., RGB-D-based recognition, is partially similar to the approach for the first scenario, the novelty lies in the handling of depth readings corrupted by quantization noise, which are often encountered when the face is not close enough to the RGB-D camera at registration. An approach is proposed to resurface the corrupted depth map and substantially improve the recognition performance. All of the proposed approaches are evaluated on benchmark databases and proven comparable to state-of-the-art approaches.
Patrisia, Sherryl Santoso, und 溫夏夢. „Learning-based Pedestrian Detection Applied to RGB-D Images“. Thesis, 2016. http://ndltd.ncl.edu.tw/handle/04755862474315325960.
國立交通大學
電機資訊國際學程
104
In the complicated environments of the real world, accurate pedestrian detection is still a challenging topic. To overcome this issue, we adopt the R-CNN method, which is able to extract robust features and localize objects as well. The process starts with region proposals (Selective Search) for generating detection candidates, followed by deep learning (CNNs) to produce robust features. Furthermore, depth information is often helpful in detecting pedestrians and/or objects. We thus use an RGB-D dataset and combine color image and depth map information for pedestrian detection. In this thesis, we use a depth-encoding method to convert the original depth map to the HHA format so that it can be processed by CNNs. The HHA encoding comprises three channels: horizontal disparity, height above ground, and angle with gravity. Another technique we adopt is the selective search method that generates region proposals (object candidates); we can use either RGB or HHA images to generate the candidates. In our system, we use CNNs to learn and extract features based on either the RGB or the HHA generated candidates. We found that the two kinds of region proposals make a significant difference in our pedestrian detection problem: the HHA proposals lead to much better results. One step further, we can combine the outputs produced by the RGB data and the HHA data in the detection. The information fusion process can be inserted at different points in the system: we can process each data source (RGB and HHA) separately and examine their individual decisions (probabilities) to make the final binary decision, or we can combine the feature spaces. In order to combine the features of the two sources, we also add an SVM process to make the final decision. Furthermore, we use PCA to reduce redundant data in the fusion. We design two techniques: pre-PCA, which applies PCA before feature fusion, and post-PCA, which applies it after. The final experiments indicate that generating bounding boxes from HHA Selective Search and then applying them to the RGB and HHA images produces more robust region proposals; PCA removes redundant data and keeps only the important features; and fusing RGB and HHA region-proposal features combined with pre-PCA produces a good pedestrian detection rate and the lowest false-positive rate.
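The three HHA channels are easy to state precisely. The sketch below encodes a depth map into an 8-bit HHA image given per-pixel height and normals; real implementations estimate the height above ground and the gravity direction from the point cloud, and the normalization ranges here are illustrative assumptions.

```python
import numpy as np

def hha_encode(depth, height, normals, gravity=np.array([0.0, 1.0, 0.0])):
    """Toy HHA encoding of a depth map into a 3-channel 8-bit image.

    depth: meters; height: height above ground per pixel (meters);
    normals: (h, w, 3) unit surface normals.
    """
    disparity = 1.0 / np.maximum(depth, 1e-3)                         # channel 1
    angle = np.degrees(np.arccos(np.clip(normals @ gravity, -1, 1)))  # channel 3

    def to8(x, lo, hi):                        # normalize a channel to [0, 255]
        return np.clip(255 * (x - lo) / (hi - lo), 0, 255).astype(np.uint8)

    return np.dstack([to8(disparity, 0.0, 10.0),
                      to8(height, 0.0, 3.0),
                      to8(angle, 0.0, 180.0)])
```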
Aguiar, Mário André Pinto Ferraz de. „3D reconstruction from multiple RGB-D images with different perspectives“. Master's thesis, 2015. https://repositorio-aberto.up.pt/handle/10216/89542.
3D model reconstruction can be a useful tool for multiple purposes. Some examples are modeling a person or objects for an animation, modeling spaces for exploration in robotics or, for clinical purposes, modeling patients over time to keep a history of the patient's body. The reconstruction process consists of capturing the object to be reconstructed, converting these captures to point clouds, and registering each point cloud to achieve the 3D model. The implemented methodology for the registration process was kept as general as possible, to be usable for the multiple purposes discussed above, with a special focus on non-rigid objects. This focus comes from the need to reconstruct high-quality 3D models of patients treated for breast cancer, for the evaluation of the aesthetic outcome. With the non-rigid algorithms, the reconstruction process is more robust to small movements during the captures. The sensor used for the captures was the Microsoft Kinect, due to the possibility of obtaining both color (RGB) and depth images, called RGB-D images. With this type of data the final 3D model can be textured, which is an advantage in many cases. The other main reason for this choice was the fact that the Microsoft Kinect is low-cost equipment, thereby becoming an alternative to expensive systems available on the market. The main achieved objectives were the reconstruction of 3D models with good quality from noisy captures, using a low-cost sensor; the registration of point clouds without knowing the sensor's pose, allowing the free movement of the sensor around the objects; and finally the registration of point clouds with small deformations between them, where conventional rigid registration algorithms could not be used.
Liu, Che-Wei, und 劉哲瑋. „Evaluation of Disparity Estimation Schemes using Captured RGB-D Images“. Thesis, 2013. http://ndltd.ncl.edu.tw/handle/41941973935343767817.
國立交通大學
電子工程學系 電子研究所
101
In 3D image processing, depth estimation based on given left and right images (so-called stereo matching algorithms) has been widely used in many 3D applications. One type of application tracks body motion and/or poses with the aid of depth information. How to evaluate depth estimation algorithms for different applications becomes an issue. The conventional method of evaluating these depth estimation algorithms often uses a small number of computer-generated test images, which is insufficient to reflect the problems in real-world applications. In this study, we design a number of scenes and capture them using RGB-D cameras; that is, our dataset consists of stereo image pairs and their corresponding ground-truth disparity maps. Our dataset covers two categories of factors that may affect the performance of stereo matching algorithms: image content factors and image quality factors. The image content group includes simple and complex backgrounds, different numbers of objects, different hand poses, and clothing with various color patterns. In the image quality group, we create images with different PSNR values and rectification errors. In addition, each stereo pair has its ground-truth disparity map. All images and depth maps are captured by a pair of Kinect devices. To generate appropriate images for the test dataset, we need to calibrate and rectify the captured RGB image pairs, process the captured depth maps, and create so-called trimaps for evaluation purposes. For the left and right color images, because they come from different sensors, we must perform camera calibration to obtain the camera parameters, and color calibration to match the colors in the two images. Also, we align the left and right images using an existing camera rectification technique. To generate the ground-truth disparity map, we first capture the raw depth map from the Kinect and warp it from the view of the IR camera to the RGB camera. These depth maps have many black holes due to the sensing mechanism. To make the ground-truth disparity map more reliable, we propose an adaptive hole-filling algorithm. Last, we adopt the matting segmentation concept to create a tri-value map (trimap) that classifies image pixels into foreground, background, and in-between regions. Our error metrics are the bad-matching-pixel rate and the mean square error between the ground-truth disparity map and the estimated disparity map, with a focus on performance in the foreground region. In our experiments, three stereo matching algorithms are used to test our dataset and evaluation methodology, and we analyze these algorithms based on the collected data.
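The two error metrics are simple to state in code. The sketch below computes the bad-matching-pixel rate and the MSE over a trimap-selected region; the 1-pixel badness threshold and the convention that gt == 0 marks invalid ground truth are assumptions.

```python
import numpy as np

def disparity_errors(est, gt, region_mask, bad_thresh=1.0):
    """Bad-matching-pixel rate and MSE against ground truth, per region.

    region_mask selects e.g. the foreground pixels of the trimap; gt == 0
    marks pixels without a valid ground-truth disparity.
    """
    valid = region_mask & (gt > 0)
    diff = np.abs(est[valid] - gt[valid])
    bad_rate = (diff > bad_thresh).mean()   # fraction off by more than 1 px
    mse = (diff ** 2).mean()
    return bad_rate, mse
```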
Lin, Ku-Ying, und 林谷穎. „Real-time Human Detection System Design Based on RGB-D Images“. Thesis, 2013. http://ndltd.ncl.edu.tw/handle/21485387071019525935.
Der volle Inhalt der QuelleNational Chiao Tung University
Institute of Electrical and Control Engineering
101
This thesis proposes a real-time human detection system based on the RGB-D images generated by a Kinect, which locates humans in a sequence of images. The system is separated into four parts: region-of-interest (ROI) selection, feature extraction, human shape recognition, and motionless human checking. First, histogram projection, connected component labeling, and moving object segmentation are applied to select the ROIs, exploiting the fact that a walking or standing human exhibits motion. Second, the ROIs are resized with bilinear interpolation and the human shape feature is extracted using the Histogram of Oriented Gradients (HOG). Then a support vector machine or an artificial neural network is trained as a classifier on the Leeds Sports Pose dataset, and human shape recognition is carried out by this classifier. Finally, the system checks whether the image contains any motionless human and recognizes it as well. The experimental results show that the system can detect humans in real time with a high accuracy rate.
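The resize-HOG-SVM stage described above might look as follows; this is a minimal sketch with scikit-image and scikit-learn, and the 128x64 window and HOG parameters are common defaults assumed here, not the thesis's settings:

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.svm import LinearSVC

def roi_to_feature(roi_gray):
    """Resize an ROI with bilinear interpolation and describe it with HOG."""
    patch = resize(roi_gray, (128, 64), order=1)   # order=1 -> bilinear
    return hog(patch, orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_classifier(rois, labels):
    """rois: grayscale candidate windows; labels: 1 = human, 0 = non-human."""
    X = np.array([roi_to_feature(r) for r in rois])
    clf = LinearSVC()
    clf.fit(X, labels)
    return clf
```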
Pintor, António Bastos. „A rigid 3D registration framework of women body RGB-D images“. Master's thesis, 2016. https://repositorio-aberto.up.pt/handle/10216/88729.
Der volle Inhalt der Quelle
Lourenço, Francisco Rodrigues. „6DoF Object Pose Estimation from RGB-D Images Using Machine Learning Approaches“. Master's thesis, 2021. http://hdl.handle.net/10316/96141.
Der volle Inhalt der Quelle
Object pose estimation using RGB-D images has gained increasing attention in the past decade with the emergence of consumer-level RGB-D sensors on the market. Their low cost, coupled with relevant technical specifications, led to their application in areas such as autonomous driving, augmented reality, and robotics. Depth information has, in general, brought additional complexity to most applications that previously used only RGB images. Moreover, when trying to estimate an object pose, one may face challenges such as cluttered scenes, occlusion, symmetric objects, texture-less objects, and low visibility due to insufficient illumination. Accordingly, researchers started to adopt machine learning approaches to tackle the 6DoF object pose estimation problem. Such approaches are often quite complex to implement and computationally demanding. Furthermore, research was directed to RGB-D videos only quite recently, with the first benchmark dataset containing videos published in 2017; therefore, only very recent methods were designed to process videos, and questions regarding real-time applicability arise. That being said, this thesis aims to explore all the tools required to build a 6DoF pose estimator, provide a comprehensive review of each tool, compare and evaluate them, assess how a practitioner can implement such tools, evaluate whether or not it is possible to estimate 6DoF poses in real time, and also evaluate how these tools generalize to a real-world scenario. In addition, the thesis proposes the use of directional statistics to evaluate RGB-D sensor precision, a tweak to a well-known 6DoF object pose estimation model, a pipeline that uses a novel 3D point cloud registration algorithm to aid the pose estimator, and a metric that measures the precision/repeatability of both the estimated poses of a model and the ground-truth poses of a dataset.
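As background, the accuracy of a 6DoF estimate against a ground-truth pose is commonly scored with the ADD metric (average distance of model points); the abstract does not name its metrics, so this sketch is illustrative only:

```python
import numpy as np

def add_metric(model_pts, R_est, t_est, R_gt, t_gt):
    """ADD: mean distance between the object's model points under the
    estimated pose (R_est, t_est) and the ground-truth pose (R_gt, t_gt).
    model_pts: (N, 3); R_*: (3, 3) rotations; t_*: (3,) translations."""
    est = model_pts @ R_est.T + t_est
    gt = model_pts @ R_gt.T + t_gt
    return float(np.mean(np.linalg.norm(est - gt, axis=1)))
```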
Kuo, Syuan-Wei, und 郭宣瑋. „Orientation Modelling for RGB-D Images using Angle and Distance Combined Adjustment“. Thesis, 2018. http://ndltd.ncl.edu.tw/handle/z3b6d4.
Der volle Inhalt der QuelleNational Chiao Tung University
Department of Civil Engineering
106
RGB-D cameras, which capture an RGB image and a per-pixel depth image simultaneously, are widely applied in indoor mapping and pattern recognition. Acquiring correct orientations is important for indoor mapping, yet feature points are hard to detect in homogeneous areas. Taking advantage of RGB-D data, the observations contain both angle and range information, from which control points can be constructed to constrain the orientation modeling. This thesis proposes a novel orientation modeling method for the registration of sequential point clouds. The main process of this study comprises three parts. First, the intrinsic parameters of the two sensors and the depth distortion are calibrated. Second, four orientation modeling methods are compared: triangulation, which optimizes only the angle information through the collinearity equations; trilateration, which optimizes the ranges; a combination of triangulation and trilateration (called combine-1 in this study); and a scale-fixed adjustment with rigid constraints on every ray (called combine-2 in this study). Finally, the iterative closest point (ICP) algorithm is applied to register the transformed sequential point clouds. The experimental results show that the image distortion and depth distortion of RGB-D sensors need to be considered in data preprocessing. To evaluate the different orientation modeling methods, we simulated different numbers of control points, variances of depth, and distributions of control points on the images, and registered sequential point clouds using both RGB and depth information. The standard deviations of the camera position for triangulation, combine-1, and combine-2 are 14.108, 0.677, and 0.595 mm, respectively, while the standard deviations of the camera rotation angles are 0.005, 0.007, and 0.001 rad. The results indicate that the combined adjustments achieve better precision than the triangulation method. The point-to-point distances of the point cloud pairs computed by the ICP algorithm are better than 11.3 mm, about 1.5 times the range precision (i.e., 3.5 mm).
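The idea of stacking angle and range observations into one adjustment can be shown with a toy 2D Gauss-Newton solver. This is only a sketch: the thesis works with the full 3D collinearity equations, and proper weighting of the two observation groups is omitted here:

```python
import numpy as np

def combined_adjustment(ctrl, bearings, ranges, x0, iters=10):
    """Estimate a 2D camera position from bearing (angle) and range
    observations to known control points by stacking both residual
    types into one least-squares system.
    ctrl: (N, 2) control points; bearings, ranges: (N,) observations."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = ctrl - x                                  # camera -> control point
        dist2 = np.sum(d ** 2, axis=1)
        dist = np.sqrt(dist2)
        pred_az = np.arctan2(d[:, 1], d[:, 0])
        # residuals (observed minus predicted); angles wrapped to [-pi, pi]
        r_ang = np.arctan2(np.sin(bearings - pred_az),
                           np.cos(bearings - pred_az))
        r_rng = ranges - dist
        r = np.concatenate([r_ang, r_rng])
        # Jacobian of the predicted observations w.r.t. the position x
        J_ang = np.column_stack([d[:, 1] / dist2, -d[:, 0] / dist2])
        J_rng = np.column_stack([-d[:, 0] / dist, -d[:, 1] / dist])
        J = np.vstack([J_ang, J_rng])
        x += np.linalg.lstsq(J, r, rcond=None)[0]     # Gauss-Newton step
    return x
```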
Hsieh, Kai-Nan, und 謝鎧楠. „Rear obstacle detection using a deep convolutional neural network with RGB-D images“. Thesis, 2018. http://ndltd.ncl.edu.tw/handle/ghsw4p.
Der volle Inhalt der QuelleNational Central University
Department of Computer Science and Information Engineering
106
Car accidents happen frequently now that cars have become the most popular means of transportation in daily life, and they cost lives and property when drivers are negligent. Therefore, many motor manufacturers have invested in and developed driver assistance systems in order to promote driving safety. Computer vision (CV) has been adopted for its ability to detect and recognize objects, and the dramatic development of convolutional neural networks (CNNs) in recent years has made computer vision much more reliable. We train our rear obstacle detection and recognition system with a deep learning model, using the color and depth images acquired from a Microsoft Kinect v2. Because the color and depth sensors of the Kinect v2 have different fields of view (FOV), we calibrate the color and depth images with the Kinect SDK to reduce the disparity in pixel positions. Our detection and recognition system is based on Faster R-CNN. The input consists of the two images, and we experiment with two different convolutional architectures for extracting feature maps: a single feature extractor with a single classifier, and two feature extractors with a single classifier. The two-feature-extractor variant produces the best detection results. Furthermore, we run experiments using only the color image or only the depth image as input to compare with the previous two methods. Finally, after detecting an obstacle, we use the depth image to estimate the distance between the vehicle and the obstacle.
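Turning a detection box plus the registered depth map into a distance can be sketched as below; the median statistic and millimeter depth units are assumptions, not details from the thesis:

```python
import numpy as np

def obstacle_distance(depth_mm, box):
    """Distance in meters to a detected obstacle. depth_mm: (H, W) array
    of registered depths in millimeters (0 = no reading); box: the
    detector's (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    roi = depth_mm[y1:y2, x1:x2]
    valid = roi[roi > 0]                 # drop missing depth readings
    if valid.size == 0:
        return None
    return float(np.median(valid)) / 1000.0   # median resists box background
```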
Cho, Shih-Hsuan, und 卓士軒. „Semantic Segmentation of Indoor-Scene RGB-D Images Based on Iterative Contraction and Merging“. Thesis, 2017. http://ndltd.ncl.edu.tw/handle/c9a9vg.
Der volle Inhalt der QuelleNational Chiao Tung University
Institute of Electronics
105
For the semantic segmentation of indoor-scene images, we propose a method that combines convolutional neural networks (CNNs) with the Iterative Contraction and Merging (ICM) algorithm, while simultaneously utilizing depth images to analyze the 3-D space of the scene efficiently. The raw depth image from the depth camera is processed by two bilateral filters to recover a smoother and more complete depth image. The ICM algorithm, in turn, is an unsupervised segmentation method that preserves boundary information well. We use the dense prediction from the CNN, the depth image, and the normal-vector map as high-level information to guide the ICM process toward more accurate image segments. In other words, we progressively generate regions from high resolution to low resolution, producing a hierarchical segmentation tree. We also propose a decision process that determines the final semantic segmentation from the hierarchical segmentation tree, using the dense prediction map as a reference. The proposed method generates more accurate object boundaries than state-of-the-art methods, and our experiments show that the use of high-level information does improve the performance of semantic segmentation compared with using RGB information only.
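The depth-smoothing step can be illustrated with OpenCV's bilateral filter; the two-pass arrangement mirrors the description above, but all filter parameters are assumptions, and the hole filling needed for a truly complete depth map is not shown:

```python
import cv2
import numpy as np

def smooth_depth(depth_raw):
    """Two bilateral-filter passes over a raw depth map.
    depth_raw: (H, W) uint16 depth image (e.g., Kinect, millimeters)."""
    img = depth_raw.astype(np.float32)
    img = cv2.bilateralFilter(img, 5, 30.0, 5.0)   # pass 1: suppress noise
    img = cv2.bilateralFilter(img, 9, 50.0, 9.0)   # pass 2: smooth, keep edges
    return img
```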
Yuan-Cheng, Lee, und 李元正. „Accurate and robust face recognition from RGB-D images with a deep learning approach“. Thesis, 2016. http://ndltd.ncl.edu.tw/handle/34473208854998399783.
Der volle Inhalt der QuelleNational Tsing Hua University
Department of Computer Science and Information Engineering
104
Face recognition from RGB-D images utilizes two complementary types of image data, i.e., color and depth images, to achieve more accurate recognition. In this thesis, we propose a deep learning based face recognition system that can verify and identify a subject from the color and depth face images captured with a consumer-level RGB-D camera (e.g., the Microsoft Kinect). To recognize faces with color and depth information, our system contains three parts: depth image recovery, deep learning for feature extraction, and joint classification. To improve the recognition performance on depth face images, we propose a series of image processing techniques that recover and enhance a depth image from its neighboring depth frames, thus reconstructing a precise 3D facial model. With multi-view resampling, we can compute the depth images corresponding to various viewing angles of a single 3D face model. To alleviate the limited amount of RGB-D data available for deep learning, transfer learning is applied: our deep network architecture contains recently popular components, and we first train the network on a color face dataset and then fine-tune it with depth images. The deep networks are used to extract discriminative features (deep representations) from the color and depth images. Beyond these deep representations, we analyze the relation between each image and the other images in the database when designing our classifier, reaching higher recognition accuracy and better robustness. Our experiments show that the proposed face recognition system provides very accurate results on public datasets and is robust against variations in head pose and illumination.
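The joint use of the two modalities can be sketched as a score-level fusion of deep embeddings. This is a minimal illustration with an assumed equal weighting; the thesis's classifier additionally models relations among the database images:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def joint_score(rgb_probe, depth_probe, rgb_gallery, depth_gallery, w=0.5):
    """Fuse color and depth feature similarities for one gallery subject.
    Each argument is a 1-D deep feature vector; w weights the color cue."""
    return (w * cosine(rgb_probe, rgb_gallery)
            + (1 - w) * cosine(depth_probe, depth_gallery))
```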
Gama, Filipe Xavier da Graça. „Efficient processing techniques for depth maps generated by rgb-d sensors“. Master's thesis, 2015. http://hdl.handle.net/10400.8/2524.
Der volle Inhalt der Quelle
DENG, JU-CHIEH, und 鄧茹潔. „The Application of Deep Learning in RGB-D Images for the Control of Robot Arm“. Thesis, 2017. http://ndltd.ncl.edu.tw/handle/36348540372513931289.
Der volle Inhalt der QuelleMing Chuan University
Master's Program, Department of Computer and Communication Engineering
105
Robot research is one of the important topics in the development of science and technology. With the progress of robotics and artificial intelligence, the work completed by robots is no longer limited to simple, repetitive movements; robots are expected to think independently, which widens their range of applications and improves their practicality, making robot vision one of the most critical technologies. At the Google I/O conference, one could see Google promoting the Google Impact Challenge and Google.org projects worldwide, bringing technology and new teams together to make the world better, including for people with limb impairments, hearing impairments, Parkinson's disease, and other conditions. The aim of this study is therefore to assist people with upper limb disabilities, caused for example by sports injuries, joint degeneration in the elderly, or spinal muscular atrophy, in grasping distant objects, using a robot arm to enhance the convenience of their daily life. In this study, an RA605 jointed robot arm is combined with visual images and deep learning, and the whole system is integrated to achieve precise positioning of the robot arm, target recognition, motion control, and grasping of the target object. The vision system uses a Kinect v2 camera and a Logitech C525 camera. The environment image is captured by the Kinect v2, and a deep learning algorithm recognizes the target object and obtains its coordinate position. The Logitech C525, mounted on the sixth joint of the robot so that it rotates with the joint, is used to confirm the position computed from the Kinect v2 data; from its close-up image the system calculates the gripping position of the target object and controls the electric gripper so as to successfully grasp the target, achieving the goal of assisting people with upper limb disabilities in grasping distant objects.
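Handing a Kinect-detected target to the arm requires expressing it in the robot's base frame, which is a plain homogeneous transform. In this sketch the 4x4 matrix T_base_cam is assumed to come from a prior hand-eye calibration; it is not described in the abstract:

```python
import numpy as np

def camera_to_base(p_cam, T_base_cam):
    """Map a 3-D point from the camera frame to the robot base frame.
    p_cam: (3,) point from the RGB-D detection, in meters;
    T_base_cam: (4, 4) homogeneous camera-to-base transform."""
    p_h = np.append(p_cam, 1.0)          # homogeneous coordinates
    return (T_base_cam @ p_h)[:3]
```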
HUNG, CHENG-HSIAO, und 洪承孝. „Study of Real-time Workpieces Recognition in Powder Coating Production Line Based on RGB-D Images“. Thesis, 2019. http://ndltd.ncl.edu.tw/handle/gsue24.
Der volle Inhalt der QuelleChaoyang University of Technology
Department of Computer Science and Information Engineering
107
Powder coating is often applied to metal or aluminum for protective as well as decorative purposes, for example on office furniture and bicycle frames. However, a powder coating production line may be over 400 meters long, and the elapsed time to finish the powder coating of a workpiece is about one hour. The powder coating process consists of several steps, including cleaning, pre-treatment, rinse/dry, powder coating, and curing. Tracking and tracing workpieces along the production line and collecting real-time production data is therefore an important issue for a manufacturing execution system (MES), and it also provides key information for intelligent manufacturing. To achieve this goal, the cooperating company, MaChan International Co., Ltd., attempted to develop an RFID-based system two years ago; however, several problems caused that system to fail, including excessive cost, one hook carrying multiple workpieces, multiple hooks carrying one workpiece, lost workpieces, and duplicated processing. This study instead pursues the goal with pattern recognition techniques. Several monitoring stations are installed along the production line. All workpieces are coated in groups, and the workpieces in the same group are almost identical; at every station the workpieces are detected, grouped, and counted. In addition, a synchronized hardware counter runs at every monitoring station, and its value is used to identify the same group as well as lost or duplicate-processed workpieces. In the experimental study, the accuracy of group identification reaches 90% in both daytime and nighttime, and the accuracy of line-stop detection reaches 90% in daytime and 100% in nighttime. These results show that the proposed group identification method is feasible.
Marques, Márcio Filipe Santos. „Sistemas de monitorização e proteção baseados em visão 3D : desenvolvimento de uma aplicação de segurança e proteção industrial utilizandos Sensores RGB-D“. Master's thesis, 2017. http://hdl.handle.net/10400.26/23023.
Der volle Inhalt der Quelle