
Dissertations / Theses on the topic 'RGB-D Image'



Consult the top 50 dissertations / theses for your research on the topic 'RGB-D Image.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Murgia, Julian. "Segmentation d'objets mobiles par fusion RGB-D et invariance colorimétrique." Thesis, Belfort-Montbéliard, 2016. http://www.theses.fr/2016BELF0289/document.

Full text
Abstract:
This PhD thesis falls within the scope of video surveillance, and more precisely focuses on the detection of moving objects in image sequences. In many applications, good detection of moving objects is an indispensable prerequisite to any processing applied to these objects, such as people or car tracking, passenger counting, detection of dangerous situations in specific environments (level crossings, pedestrian crossings, intersections, etc.), or control of autonomous vehicles. The reliability of computer-vision-based systems requires robustness against difficult conditions often caused by lighting conditions (day/night, shadows), weather conditions (rain, wind, snow...) and the topology of the observed scene (occlusions...). The work detailed in this PhD thesis aims at reducing the impact of illumination conditions by improving the quality of the detection of mobile objects in indoor or outdoor environments and at any time of the day. Thus, we propose three strategies working in combination to improve the detection of moving objects: i) using colorimetric invariants and/or color spaces that provide invariant properties; ii) using a passive stereoscopic camera (in outdoor environments) and the Microsoft Kinect active camera (in indoor environments) in order to partially reconstruct the 3D environment, providing an additional dimension (depth information) to the background/foreground subtraction algorithm; iii) a new fusion algorithm based on fuzzy logic in order to combine color and depth information with a certain level of uncertainty for pixel classification.
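The fuzzy fusion idea in strategy (iii) can be sketched as a per-pixel combination of color and depth evidence. This is a minimal illustration, not the thesis's actual rules: the ramp-style memberships, thresholds, and equal weights below are all assumptions.

```python
import numpy as np

def fuzzy_membership(diff, low, high):
    # Ramp membership: 0 below `low`, 1 above `high`, linear in between.
    return np.clip((diff - low) / (high - low), 0.0, 1.0)

def fuse_color_depth(color_diff, depth_diff, w_color=0.5, w_depth=0.5):
    # Weighted fuzzy combination of color- and depth-based foreground
    # evidence; scores near 0.5 stay "uncertain" instead of being
    # hard-classified as background or foreground.
    mu_color = fuzzy_membership(color_diff, low=10.0, high=40.0)
    mu_depth = fuzzy_membership(depth_diff, low=0.05, high=0.20)
    return w_color * mu_color + w_depth * mu_depth

color_diff = np.array([5.0, 25.0, 60.0])   # |frame - background| per pixel (gray levels)
depth_diff = np.array([0.01, 0.10, 0.30])  # depth change per pixel (meters)
score = fuse_color_depth(color_diff, depth_diff)
mask = score > 0.5                          # final foreground decision
```

Only the third pixel, which disagrees with the background model in both color and depth, is confidently classified as foreground.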
APA, Harvard, Vancouver, ISO, and other styles
2

Tykkälä, Tommi. "Suivi de caméra image en temps réel base et cartographie de l'environnement." Phd thesis, Université Nice Sophia Antipolis, 2013. http://tel.archives-ouvertes.fr/tel-00933813.

Full text
Abstract:
In this work, image-based estimation methods, also known as direct methods, are studied; they avoid feature extraction and matching entirely. The goal is to produce accurate 3D pose and structure estimates. The presented cost functions minimize the sensor error, since the measurements are not transformed or modified. In photometric camera pose estimation, the 3D rotation and translation parameters are estimated by minimizing a sequence of image-based cost functions, which are non-linear due to perspective projection and lens distortion. In image-based structure refinement, on the other hand, the 3D structure is refined using a number of additional views and an image-based cost metric. The main application domains in this work are indoor reconstruction, robotics, and augmented reality. The overall goal of the project is to improve image-based estimation methods and to produce computationally efficient methods that can be adopted in real applications. The main questions for this work are: What is an efficient formulation for an image-based 3D pose estimation and structure refinement task? How should computation be organized to enable an efficient real-time implementation? What are the practical considerations of using image-based estimation methods in applications such as augmented reality and 3D reconstruction?
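The core of a direct method is that the "cost" is the raw photometric residual itself. A toy 1-D illustration, assuming an integer shift stands in for the pose and brute-force search stands in for the Gauss-Newton minimization a real system would use:

```python
import numpy as np

def photometric_cost(ref, cur, shift):
    # Sum of squared photometric residuals after "warping" the current
    # image by an integer shift -- a 1-D stand-in for the nonlinear,
    # pose-parameterized warp of real direct methods. No features are
    # extracted or matched; raw intensities are compared directly.
    residual = ref - np.roll(cur, shift)
    return float(residual @ residual)

ref = np.array([0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0])
cur = np.roll(ref, 2)  # observed image, displaced by the unknown "pose"
costs = {s: photometric_cost(ref, cur, -s) for s in range(-3, 4)}
best_shift = min(costs, key=costs.get)  # direct estimate of the displacement
```

The cost is exactly zero at the true displacement, because the measurements are compared untransformed.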
3

Lai, Po Kong. "Immersive Dynamic Scenes for Virtual Reality from a Single RGB-D Camera." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39663.

Full text
Abstract:
In this thesis we explore the concepts and components which can be used as individual building blocks for producing immersive virtual reality (VR) content from a single RGB-D sensor. We identify the properties of immersive VR videos and propose a system composed of a foreground/background separator, a dynamic scene re-constructor and a shape completer. We initially explore the foreground/background separator component in the context of video summarization. More specifically, we examined how to extract trajectories of moving objects from video sequences captured with a static camera. We then present a new approach for video summarization via minimization of the spatial-temporal projections of the extracted object trajectories. New evaluation criteria are also presented for video summarization. These concepts of foreground/background separation can then be applied towards VR scene creation by extracting relevant objects of interest. We present an approach for the dynamic scene re-constructor component using a single moving RGB-D sensor. By tracking the foreground objects and removing them from the input RGB-D frames we can feed the background-only data into existing RGB-D SLAM systems. The result is a static 3D background model onto which the foreground frames are then superimposed to produce a coherent scene with dynamic moving foreground objects. We also present a specific method for extracting moving foreground objects from a moving RGB-D camera, along with an evaluation dataset with benchmarks. Lastly, the shape completer component takes a single-view depth map of an object as input and "fills in" the occluded portions to produce a complete 3D shape. We present an approach that utilizes a new data-minimal representation, the additive depth map, which allows traditional 2D convolutional neural networks to accomplish the task.
The additive depth map represents the amount of depth required to transform the input into the "back depth map" which would exist if there were a sensor exactly opposite the input. We train and benchmark our approach using existing synthetic datasets and also show that it can perform shape completion on real-world data without fine-tuning. Our experiments show that our data-minimal representation can achieve results comparable to existing state-of-the-art 3D networks while also being able to produce higher-resolution outputs.
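The additive depth map representation can be illustrated with a minimal sketch. The values below are hypothetical; in the thesis the additive map is regressed by a 2D CNN rather than given:

```python
import numpy as np

def complete_back_surface(front_depth, additive_depth):
    # The additive depth map is the amount of depth added to the observed
    # front surface to reach the "back depth map" that a sensor exactly
    # opposite the input view would have measured.
    return front_depth + additive_depth

front = np.array([[1.00, 1.20],
                  [1.10, 1.30]])        # single-view depth of an object (meters)
additive = np.array([[0.50, 0.40],
                     [0.45, 0.30]])     # hypothetical 2D-CNN regression output
back = complete_back_surface(front, additive)
# front and back together bound the completed 3D shape between two surfaces
```

Because the network only has to predict a 2D map of offsets, ordinary 2D convolutions suffice where a volumetric approach would need 3D convolutions.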
4

Kadkhodamohammadi, Abdolrahim. "3D detection and pose estimation of medical staff in operating rooms using RGB-D images." Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAD047/document.

Full text
Abstract:
In this thesis, we address the two problems of person detection and pose estimation in Operating Rooms (ORs), which are key ingredients in the development of surgical assistance applications. We perceive the OR using compact RGB-D cameras that can be conveniently integrated in the room. These sensors provide complementary information about the scene, which enables us to develop methods that can cope with numerous challenges present in the OR, e.g. clutter, textureless surfaces and occlusions. We present novel part-based approaches that take advantage of depth, multi-view and temporal information to construct robust human detection and pose estimation models. Evaluation is performed on new single- and multi-view datasets recorded in operating rooms. We demonstrate very promising results and show that our approaches outperform state-of-the-art methods on this challenging data acquired during real surgeries.
5

Meilland, Maxime. "Cartographie RGB-D dense pour la localisation visuelle temps-réel et la navigation autonome." Phd thesis, Ecole Nationale Supérieure des Mines de Paris, 2012. http://tel.archives-ouvertes.fr/tel-00686803.

Full text
Abstract:
In the context of autonomous navigation in urban environments, accurate localization of the vehicle is important for safe and reliable navigation. Since existing low-cost sensors such as GPS lack precision, other low-cost sensors must be used as well. Cameras measure rich and accurate photometric information about the environment, but require advanced processing algorithms to obtain information about the geometry and about the position of the camera in the environment. This problem is known as Simultaneous Localization and Mapping (visual SLAM). In general, SLAM techniques are incremental and drift over long trajectories. To simplify the localization step, it is proposed to decouple mapping and localization into two phases: the map is built offline during a learning phase, and localization is performed efficiently online from the 3D map of the environment. Unlike classical approaches, which use an approximate global 3D model, a new dense ego-centric representation is proposed. This representation consists of a graph of spherical images augmented with dense depth information (RGB+D), and makes it possible to map large environments. During online localization, this type of model provides all the information needed for accurate localization in the neighborhood of the graph, and allows the image perceived by a camera on board a vehicle to be registered in real time against the images of the graph using a direct image alignment technique. The proposed localization method is accurate, robust to outliers, and accounts for illumination changes between the database model and the images perceived by the camera.
Finally, the accuracy and robustness of the localization allow an autonomous vehicle equipped with a camera to navigate safely in an urban environment.
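The online phase first has to pick which reference sphere in the graph to align against. A minimal sketch of that selection step, with a hypothetical two-dimensional map layout (the direct image alignment against the chosen sphere is not shown):

```python
import numpy as np

# Hypothetical ego-centric map: augmented spherical keyframes indexed by
# the position where each sphere was captured during the learning phase.
keyframes = [
    {"pos": np.array([0.0, 0.0]),  "sphere": "S0"},
    {"pos": np.array([5.0, 0.0]),  "sphere": "S1"},
    {"pos": np.array([10.0, 2.0]), "sphere": "S2"},
]

def nearest_reference(vehicle_pos):
    # Online localization selects the closest reference sphere in the graph;
    # the live camera image is then registered against it in real time by
    # direct image alignment.
    dists = [np.linalg.norm(vehicle_pos - kf["pos"]) for kf in keyframes]
    return keyframes[int(np.argmin(dists))]["sphere"]

ref = nearest_reference(np.array([4.0, 1.0]))
```

Because localization only ever needs the local neighborhood of the graph, the map scales to large environments without a globally consistent 3D model.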
6

Villota, Juan Carlos Perafán. "Adaptive registration using 2D and 3D features for indoor scene reconstruction." Universidade de São Paulo, 2016. http://www.teses.usp.br/teses/disponiveis/3/3139/tde-17042017-090901/.

Full text
Abstract:
Pairwise alignment between point clouds is an important task in building 3D maps of indoor environments with partial information. The combination of 2D local features with depth information provided by RGB-D cameras is often used to improve such alignment. However, under varying lighting or low visual texture, indoor pairwise frame registration with sparse 2D local features is not particularly robust. In these conditions, features are hard to detect, leading to misalignment between consecutive pairs of frames. The use of 3D local features can be a solution, as such features come from the 3D points themselves and are resistant to variations in visual texture and illumination. Because varying conditions in real indoor scenes are unavoidable, we propose a new framework to improve pairwise frame alignment using an adaptive combination of sparse 2D and 3D features based on both the level of geometric structure and the level of visual texture contained in each scene. Experiments with datasets including unrestricted RGB-D camera motion and natural changes in illumination show that the proposed framework convincingly outperforms methods using 2D or 3D features separately, as reflected in a higher level of alignment accuracy.
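The adaptive combination can be sketched as a simple weighting of the two feature families by how much each scene offers them. How the texture and structure levels are actually measured is not specified here; they are assumed given in [0, 1]:

```python
import numpy as np

def adaptive_weights(texture_level, structure_level, eps=1e-9):
    # Normalize the two scene measurements into mixing weights for the
    # sparse 2D (visual) and 3D (geometric) feature sets used during
    # pairwise frame registration.
    total = texture_level + structure_level + eps
    return texture_level / total, structure_level / total

# A dim, textureless corridor with strong geometry: rely mostly on 3D features.
w2d, w3d = adaptive_weights(texture_level=0.1, structure_level=0.7)
```

In a well-lit, richly textured scene the weights reverse, so neither feature type is relied on where it is unreliable.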
7

Shi, Yangyu. "Infrared Imaging Decision Aid Tools for Diagnosis of Necrotizing Enterocolitis." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/40714.

Full text
Abstract:
Neonatal necrotizing enterocolitis (NEC) is one of the most severe digestive tract emergencies in neonates, involving bowel edema, hemorrhage, and necrosis, and can lead to serious complications including death. Because it is difficult to diagnose early, morbidity and mortality rates are high due to severe complications in later stages of NEC; early detection is therefore key to its treatment. In this thesis, a novel automatic image acquisition and analysis system combining a color and depth (RGB-D) sensor with an infrared (IR) camera is proposed for NEC diagnosis. A design for the sensor configuration and a data acquisition process are introduced. A calibration method between the three cameras is described, which aims to ensure frame synchronization and observation consistency among the color, depth, and IR images. Subsequently, complete segmentation procedures based on the original color, depth, and IR information are proposed to automatically separate the human body from the background, remove other interfering items, identify feature points on the human body joints, distinguish the human torso and limbs, and extract the abdominal region of interest. Finally, first-order statistical analysis is performed on thermal data collected over the entire extracted abdominal region to compare differences in thermal data distribution between patient groups. Experimental validation in a real clinical environment is reported and shows encouraging results.
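The final analysis step reduces the extracted abdominal region to first-order statistics of its thermal values. A minimal sketch with hypothetical temperatures (the thesis's actual feature set may differ):

```python
import numpy as np

def first_order_stats(thermal_roi):
    # First-order statistics over the thermal values inside the extracted
    # abdominal region of interest; these summaries are what gets compared
    # between patient groups.
    t = np.asarray(thermal_roi, dtype=float).ravel()
    return {"mean": t.mean(), "std": t.std(), "min": t.min(), "max": t.max()}

# Hypothetical abdominal skin temperatures (degrees Celsius).
roi = np.array([[36.1, 36.4],
                [36.8, 37.1]])
stats = first_order_stats(roi)
```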
8

Baban, a. erep Thierry Roland. "Contribution au développement d'un système intelligent de quantification des nutriments dans les repas d'Afrique subsaharienne." Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSEP100.

Full text
Abstract:
Malnutrition, including under- and overnutrition, is a global health challenge affecting billions of people. It impacts all organ systems and is a significant risk factor for noncommunicable diseases such as cardiovascular diseases, diabetes, and some cancers. Assessing food intake is crucial for preventing malnutrition but remains challenging. Traditional methods for dietary assessment are labor-intensive and prone to bias. Advancements in AI have made Vision-Based Dietary Assessment (VBDA) a promising solution for automatically analyzing food images to estimate portions and nutrition. However, food image segmentation in VBDA faces challenges due to food's non-rigid structure, high intra-class variation (where the same dish can look very different), inter-class resemblance (where different foods appear similar), and the scarcity of publicly available datasets. Almost all food segmentation research has focused on Asian and Western foods, with no datasets for African cuisines. However, African dishes often involve mixed food classes, making accurate segmentation challenging. Additionally, research has largely focused on RGB images, which provide color and texture but may lack geometric detail. To address this, RGB-D segmentation combines depth data with RGB images. Depth images provide crucial geometric details that enhance RGB data, improve object discrimination, and are robust to factors like illumination and fog. Despite its success in other fields, RGB-D segmentation for food is underexplored due to the difficulty of collecting food depth images. This thesis makes key contributions by developing new deep learning models for RGB (mid-DeepLabv3+) and RGB-D (ESeNet-D) image segmentation and introducing the first food segmentation datasets focused on African food images. Mid-DeepLabv3+ is based on DeepLabv3+, featuring a simplified ResNet backbone with an added skip layer (middle layer) in the decoder and a SimAM attention mechanism.
This model offers an optimal balance between performance and efficiency, matching DeepLabv3+'s performance while cutting computational load in half. ESeNet-D consists of two encoder branches using EfficientNetV2 as the backbone, with a fusion block for multi-scale integration and a decoder employing self-calibrated convolution and learned interpolation for precise segmentation. ESeNet-D outperforms many RGB and RGB-D benchmark models while having fewer parameters and FLOPs. Our experiments show that, when properly integrated, depth information can significantly improve food segmentation accuracy. We also present two new datasets: AfricaFoodSeg for "food/non-food" segmentation with 3,067 images (2,525 for training, 542 for validation), and CamerFood, focusing on Cameroonian cuisine. The CamerFood datasets include CamerFood10, with 1,422 images from ten food classes, and CamerFood15, an enhanced version with 15 food classes, 1,684 training images, and 514 validation images. Finally, we address the challenge of scarce depth data in RGB-D food segmentation by demonstrating that Monocular Depth Estimation (MDE) models can aid in generating effective depth maps for RGB-D datasets.
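The basic input pairing behind RGB-D segmentation can be sketched as stacking a depth channel (possibly produced by an MDE model) alongside the RGB channels. This is only an illustration of the data layout; ESeNet-D's actual two-branch, multi-scale fusion is more involved:

```python
import numpy as np

def make_rgbd(rgb, depth):
    # Stack an RGB image (H, W, 3) with a single-channel depth map (H, W)
    # into a 4-channel RGB-D input; depth is min-max normalized so both
    # modalities share a comparable value range.
    d_range = depth.max() - depth.min()
    d = (depth - depth.min()) / (d_range + 1e-9)
    return np.concatenate([rgb, d[..., None]], axis=-1)

rgb = np.zeros((4, 4, 3))                          # placeholder color image
depth = np.linspace(0.5, 2.0, 16).reshape(4, 4)    # meters, e.g. from an MDE model
rgbd = make_rgbd(rgb, depth)
```

Because the depth channel only has to exist at training time, an MDE model can synthesize it for RGB-only datasets, which is exactly the scarcity workaround the thesis demonstrates.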
9

Hasnat, Md Abul. "Unsupervised 3D image clustering and extension to joint color and depth segmentation." Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4013/document.

Full text
Abstract:
Access to 3D images at a reasonable frame rate is now widespread, thanks to recent advances in low-cost depth sensors as well as efficient methods to compute 3D from 2D images. As a consequence, there is strong demand for enhancing existing computer vision applications by incorporating 3D information. Indeed, numerous studies have demonstrated that the accuracy of different tasks increases when 3D information is included as an additional feature. However, for the task of indoor scene analysis and segmentation, several important issues remain, such as: (a) how can the 3D information itself be exploited? and (b) what is the best way to fuse color and 3D in an unsupervised manner? In this thesis, we address these issues and propose novel unsupervised methods for 3D image clustering and joint color and depth image segmentation. To this aim, we consider image normals as the prominent feature from the 3D image and cluster them with methods based on finite statistical mixture models. We use the Bregman Soft Clustering method to ensure computationally efficient clustering. Moreover, we exploit several probability distributions from directional statistics, such as the von Mises-Fisher distribution and the Watson distribution. By combining these, we propose novel model-based clustering methods. We empirically validate these methods using synthetic data and then demonstrate their application to 3D/depth image analysis. Afterward, we extend these methods to segment synchronized 3D and color images, also called RGB-D images. To this aim, we first propose a statistical image generation model for RGB-D images. Then, we propose a novel RGB-D segmentation method using joint color-spatial-axial clustering and a statistical planar region merging method. Results show that the proposed method is comparable with state-of-the-art methods and requires less computation time.
Moreover, it opens interesting perspectives for fusing color and geometry in an unsupervised manner. We believe that the methods proposed in this thesis are equally applicable and extendable to clustering different types of data, such as speech, gene expressions, etc. Moreover, they can be used for complex tasks, such as joint image-speech data analysis.
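Clustering surface normals on the unit sphere can be sketched with a hard-assignment simplification. Note this is spherical k-means, not the thesis's soft von Mises-Fisher/Watson mixtures fitted via Bregman Soft Clustering; the deterministic initialization is also an assumption for illustration:

```python
import numpy as np

def spherical_kmeans(normals, init_idx, iters=10):
    # Hard-assignment clustering of unit normals by cosine similarity --
    # a simplification of soft mixture-model clustering on the sphere
    # (e.g. von Mises-Fisher mixtures).
    centers = normals[init_idx].astype(float).copy()
    labels = np.zeros(len(normals), dtype=int)
    for _ in range(iters):
        labels = np.argmax(normals @ centers.T, axis=1)
        for j in range(len(centers)):
            mean_dir = normals[labels == j].sum(axis=0)
            norm = np.linalg.norm(mean_dir)
            if norm > 0:
                centers[j] = mean_dir / norm   # re-project onto the sphere
    return labels, centers

# Two dominant surface orientations in an indoor scene: floor and wall.
floor = np.tile([0.0, 0.0, 1.0], (10, 1))
wall = np.tile([1.0, 0.0, 0.0], (10, 1))
normals = np.vstack([floor, wall])
labels, centers = spherical_kmeans(normals, init_idx=[0, 19])
```

Grouping normals this way separates planar regions (floor, walls, ceiling), which is the geometric cue the joint color-spatial-axial segmentation builds on.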
10

Řehánek, Martin. "Detekce objektů pomocí Kinectu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236602.

Full text
Abstract:
With the release of the Kinect device, new possibilities appeared that allow simple use of image depth in image processing. The aim of this thesis is to propose a method for object detection and recognition in a depth map. The well-known Bag of Words method and a descriptor based on the Spin Image method are used for object recognition. The Spin Image method is one of several existing approaches to depth maps described in this thesis. Detection of objects in the image is ensured by the sliding-window technique, which is improved and sped up by utilizing the depth information.
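One way depth can speed up sliding-window detection is by pruning windows whose depth is implausible for the target object before running the expensive classifier. This is a minimal sketch of that idea, with assumed window size and depth range, not the thesis's exact scheme:

```python
import numpy as np

def candidate_windows(depth, win=2, min_d=0.5, max_d=2.0):
    # Slide a window over the depth map and keep only positions whose
    # median depth lies in the range where the target object can plausibly
    # appear -- the depth-based pruning that speeds up the detector.
    h, w = depth.shape
    keep = []
    for y in range(h - win + 1):
        for x in range(w - win + 1):
            if min_d <= np.median(depth[y:y + win, x:x + win]) <= max_d:
                keep.append((y, x))
    return keep

depth = np.array([
    [0.9, 1.0, 5.0],
    [1.1, 1.2, 5.0],
    [5.0, 5.0, 5.0],
])  # meters; the far background (5 m) is skipped entirely
windows = candidate_windows(depth)
```

Only the near, object-scale window survives, so the recognition stage runs on a fraction of the positions.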
11

Hammond, Patrick Douglas. "Deep Synthetic Noise Generation for RGB-D Data Augmentation." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7516.

Full text
Abstract:
Considerable effort has been devoted to finding reliable methods of correcting noisy RGB-D images captured with unreliable depth-sensing technologies. Supervised neural networks have been shown to be capable of RGB-D image correction, but require copious amounts of carefully corrected ground-truth data to train effectively. Data collection is laborious and time-intensive, especially for large datasets, and the generation of ground-truth training data tends to be subject to human error. It might be possible to train an effective method on a relatively small dataset using synthetically damaged depth data as input to the network, but this requires some understanding of the latent noise distribution of the respective camera. It is possible to augment datasets to a certain degree using naive noise generation, such as random dropout or Gaussian noise, but these tend to generalize poorly to real data. A superior method would imitate real camera noise to damage input depth images realistically, so that the network is able to learn to correct the appropriate depth-noise distribution. We propose a novel noise-generating CNN capable of producing realistic noise customized to a variety of different depth-noise distributions. In order to demonstrate the effects of synthetic augmentation, we also contribute a large novel RGB-D dataset captured with the Intel RealSense D415 and D435 depth cameras. This dataset pairs many examples of noisy depth images with automatically completed RGB-D images, which we use as a proxy for ground-truth data. We further provide an automated depth-denoising pipeline which may be used to produce proxy ground-truth data for novel datasets. We train a modified sparse-to-dense depth-completion network on splits of varying size from our dataset to determine reasonable baselines for improvement.
We determine through these tests that adding more noisy depth frames to each RGB-D image in the training set has a nearly identical impact on depth-completion training as gathering more ground-truth data. We leverage these findings to produce additional synthetic noisy depth images for each RGB-D image in our baseline training sets using our noise-generating CNN. Through the use of our augmentation method, it is possible to achieve greater than 50% error reduction in supervised depth-completion training, even for small datasets.
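The naive augmentation baseline the abstract argues against (random dropout plus Gaussian jitter) can be written in a few lines; the learned noise-generating CNN is precisely what replaces a hand-tuned model like this. Parameter values are arbitrary assumptions:

```python
import numpy as np

def naive_depth_noise(depth, dropout_p=0.1, sigma=0.01, seed=0):
    """Naive augmentation baseline: random dropout plus Gaussian jitter.

    The thesis argues such noise generalises poorly compared with a learned
    noise model; this sketch only illustrates the baseline being compared
    against, with arbitrary parameter values.
    """
    rng = np.random.default_rng(seed)
    noisy = depth + rng.normal(0.0, sigma, depth.shape)  # per-pixel jitter
    holes = rng.random(depth.shape) < dropout_p
    noisy[holes] = 0.0                                   # simulate missing returns
    return noisy
```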
APA, Harvard, Vancouver, ISO, and other styles
12

SILVA, DJALMA LUCIO SOARES DA. "USING PLANAR STRUCTURES EXTRACTED FROM RGB-D IMAGES IN AUGMENTED REALITY APPLICATIONS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2016. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28675@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
Esta dissertação investiga o uso das estruturas planares extraídas de imagens RGB-D em aplicações de Realidade Aumentada. Ter o modelo da cena é fundamental para as aplicações de realidade aumentada. O uso de imagens RGB-D auxilia bastante o processo da construção destes modelos, pois elas fornecem a geometria e os aspectos fotométricos da cena. Devido a grande parte das aplicações de realidade aumentada utilizarem superfícies planares como sua principal componente para projeção de objetos virtuais, é fundamental ter um método robusto e eficaz para obter e representar as estruturas que constituem estas superfícies planares. Neste trabalho, apresentaremos um método para identificar, segmentar e representar estruturas planares a partir de imagens RGB-D. Nossa representação das estruturas planares são polígonos bidimensionais triangulados, simplificados e texturizados, que estão no sistema de coordenadas do plano, onde os pontos destes polígonos definem as regiões deste plano. Demonstramos através de diversos experimentos e da implementação de uma aplicação de realidade aumentada, as técnicas e métodos utilizados para extrair as estruturas planares a partir das imagens RGB-D.
This dissertation investigates the use of planar geometric structures extracted from RGB-D images in Augmented Reality applications. The model of a scene is essential for augmented reality applications. RGB-D images can greatly help the construction of these models because they provide geometric and photometric information about the scene. Planar structures are prevalent in many 3D scenes and, for this reason, augmented reality applications use planar surfaces as one of the main components for the projection of virtual objects. Therefore, it is extremely important to have robust and efficient methods to acquire and represent the structures that compose these planar surfaces. In this work, we present a method for identifying, segmenting and representing planar structures from RGB-D images. Our representation of planar structures consists of simplified, textured, triangulated two-dimensional polygons forming a triangle mesh intrinsic to the plane, whose regions correspond to object surfaces in the 3D scene. We demonstrate, through various experiments and the implementation of an augmented reality application, the techniques and methods used to extract the planar structures from RGB-D images.
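A basic building block for extracting such planar structures is fitting a plane to a set of 3D points. A least-squares fit via PCA is sketched below; this is a generic technique, not necessarily the exact estimator used in the dissertation:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through 3D points via PCA.

    The plane normal is the right singular vector associated with the
    smallest singular value of the centred point set; the plane satisfies
    n . p + d = 0 for points p on it.
    """
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]                 # direction of least variance
    d = -normal @ centroid          # plane offset
    return normal, d
```

Once a plane is estimated, inlier points can be projected into the plane's 2D coordinate system and triangulated to form the textured polygons described above.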
APA, Harvard, Vancouver, ISO, and other styles
13

Basso, Marcos Aurélio. "Um método robusto para modelagem 3D de ambientes internos usando dados RGB-D." reponame:Repositório Institucional da UFPR, 2015. http://hdl.handle.net/1884/45430.

Full text
Abstract:
Advisor: Daniel Rodrigues dos Santos
Doctoral thesis - Universidade Federal do Paraná, Setor de Ciências da Terra, Graduate Program in Geodetic Sciences. Defense: Curitiba, 12/11/2015
Includes references: f. 106-112
Resumo: O objetivo deste trabalho é propor um método robusto para modelagem 3D de ambientes internos usando dados RGB-D. Basicamente, a modelagem 3D de ambientes está dividida em quatro tarefas, a saber: a escolha do sensor de imageamento; o problema do registro de nuvem de pontos 3D adquiridos pelo sensor de imageamento em diferentes pontos de vista; o problema da detecção de lugares anteriormente visitados (loop closure); e o problema da análise de consistência global. Atualmente, o Kinect é o sensor RGB-D mais empregado na aquisição de dados para modelagem de ambientes internos, uma vez que é leve, flexível e de fácil manuseio. A etapa de registro consiste em determinar os parâmetros de transformação relativa entre pares de nuvens de pontos e, neste trabalho, é dividida em duas partes: a primeira parte consiste em executar o registro inicial dos dados 3D usando pontos visuais e o modelo de corpo rígido 3D; na segunda parte, os parâmetros iniciais são refinados empregando um modelo matemático baseado numa abordagem paralaxe-a-plano, o que torna o método robusto. Para minimizar os efeitos da propagação de erros provocados na etapa de registro dos pares de nuvens de pontos 3D, o método proposto detecta lugares anteriormente visitados usando uma imagem de referência (frame-chave). Basicamente, é feita uma busca por imagens com grau de similaridade com a imagem de referência e, por fim, é obtida uma nova restrição espacial. A etapa de consistência global cria um grafo dirigido e ponderado, sendo cada vértice do grafo representado pelos parâmetros de transformação obtidos na etapa de registro dos dados, enquanto suas arestas representam as restrições espaciais definidas pelos parâmetros de transformação obtidos entre os lugares revisitados. A otimização deste grafo é feita através do método GraphSLAM. Experimentos foram realizados em cinco ambientes internos e o método proposto propiciou uma acurácia relativa em torno de 6,85 cm.
Palavras-chave: sensor RGB-D; modelagem 3D; otimização da trajetória baseada em grafos; registro de pares de nuvens de pontos; análise de consistência global.
Abstract: The objective of this work is to propose a robust method for 3D modeling of indoor environments using RGB-D data. Basically, the 3D modeling of environments is divided into four problems, namely: the choice of the imaging sensor; the registration of 3D point clouds acquired by the imaging sensor from different viewpoints; the detection of previously visited places (loop closure); and the global consistency analysis. Currently, the Kinect is the RGB-D sensor most employed in data acquisition for modeling indoor environments, since it is lightweight, flexible and easy to use. The registration step consists of determining the relative transformation parameters between pairs of point clouds and, in this work, is divided into two parts: the first part performs the initial registration of the 3D data using visual points and the 3D rigid body model; in the second part, the initial parameters are refined using a mathematical model based on a parallax-to-plane approach, which makes the method robust. To minimize the effects of error propagation in the registration of 3D point cloud pairs, the proposed method detects previously visited places using a reference image (key-frame). Basically, a search is made for images with a high degree of similarity to the reference image, and finally a new spatial constraint is obtained. The global consistency step creates a directed and weighted graph, with each node of the graph represented by the transformation parameters obtained in the data registration step, while its edges represent the spatial constraints defined by the transformation parameters obtained between revisited places. The optimization of this graph is performed by the GraphSLAM method. Experiments were carried out in five indoor environments and the proposed method provided a relative accuracy of around 6.85 cm. Keywords: RGB-D sensor; 3D mapping; GraphSLAM; registration of point cloud pairs; global consistency analysis.
APA, Harvard, Vancouver, ISO, and other styles
14

Martins, Renato. "Odométrie visuelle directe et cartographie dense de grands environnements à base d'images panoramiques RGB-D." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEM004/document.

Full text
Abstract:
Cette thèse se situe dans le domaine de l'auto-localisation et de la cartographie 3D des caméras RGB-D pour des robots mobiles et des systèmes autonomes avec des caméras RGB-D. Nous présentons des techniques d'alignement et de cartographie pour effectuer la localisation d'une caméra (suivi), notamment pour des caméras avec mouvements rapides ou avec faible cadence. Les domaines d'application possibles sont la réalité virtuelle et augmentée, la localisation de véhicules autonomes ou la reconstruction 3D des environnements.Nous proposons un cadre consistant et complet au problème de localisation et cartographie 3D à partir de séquences d'images RGB-D acquises par une plateforme mobile. Ce travail explore et étend le domaine d'applicabilité des approches de suivi direct dites "appearance-based". Vis-à-vis des méthodes fondées sur l'extraction de primitives, les approches directes permettent une représentation dense et plus précise de la scène mais souffrent d'un domaine de convergence plus faible nécessitant une hypothèse de petits déplacements entre images.Dans la première partie de la thèse, deux contributions sont proposées pour augmenter ce domaine de convergence. Tout d'abord une méthode d'estimation des grands déplacements est développée s'appuyant sur les propriétés géométriques des cartes de profondeurs contenues dans l'image RGB-D. Cette estimation grossière (rough estimation) peut être utilisée pour initialiser la fonction de coût minimisée dans l'approche directe. Une seconde contribution porte sur l'étude des domaines de convergence de la partie photométrique et de la partie géométrique de cette fonction de coût. 
Il en résulte une nouvelle fonction de coût exploitant de manière adaptative l'erreur photométrique et géométrique en se fondant sur leurs propriétés de convergence respectives.Dans la deuxième partie de la thèse, nous proposons des techniques de régularisation et de fusion pour créer des représentations précises et compactes de grands environnements. La régularisation s'appuie sur une segmentation de l'image sphérique RGB-D en patchs utilisant simultanément les informations géométriques et photométriques afin d'améliorer la précision et la stabilité de la représentation 3D de la scène. Cette segmentation est également adaptée pour la résolution non uniforme des images panoramiques. Enfin les images régularisées sont fusionnées pour créer une représentation compacte de la scène, composée de panoramas RGB-D sphériques distribués de façon optimale dans l'environnement. Ces représentations sont particulièrement adaptées aux applications de mobilité, tâches de navigation autonome et de guidage, car elles permettent un accès en temps constant avec une faible occupation de mémoire qui ne dépendent pas de la taille de l'environnement
This thesis is in the context of self-localization and 3D mapping with RGB-D cameras for mobile robots and autonomous systems. We present image alignment and mapping techniques to perform camera localization (tracking), notably for large camera motions or low frame rates. Possible domains of application are the localization of autonomous vehicles, 3D reconstruction of environments, security, and virtual and augmented reality. We propose a consistent localization and 3D dense mapping framework that takes as input a sequence of RGB-D images acquired from a mobile platform. The core of this framework explores and extends the domain of applicability of direct/dense appearance-based image registration methods. Compared with feature-based techniques, direct/dense image registration (or image alignment) techniques are more accurate and allow a more consistent dense representation of the scene. However, these techniques have a smaller domain of convergence and rely on the assumption that the camera motion is small. In the first part of the thesis, we propose two formulations to relax this assumption. Firstly, we describe a fast pose estimation strategy to compute a rough estimate of large motions, based on the normal vectors of the scene surfaces and on the geometric properties between the RGB-D images. This rough estimate can be used to initialize direct registration methods for refinement. Secondly, we propose a direct RGB-D camera tracking method that adaptively exploits the photometric and geometric error properties to improve the convergence of the image alignment. In the second part of the thesis, we propose regularization and fusion techniques to create compact and accurate representations of large-scale environments. The regularization is performed from a segmentation of spherical frames into piecewise patches, using the photometric and geometric information simultaneously to improve the accuracy and consistency of the 3D scene reconstruction.
This segmentation is also adapted to tackle the non-uniform resolution of panoramic images. Finally, the regularized frames are combined to build a compact keyframe-based map composed of spherical RGB-D panoramas optimally distributed in the environment. These representations are helpful for autonomous navigation and guiding tasks, as they allow access in constant time with limited storage that does not depend on the size of the environment.
APA, Harvard, Vancouver, ISO, and other styles
15

Zeni, Luis Felipe de Araujo. "Reconhecimento facial tolerante à variação de pose utilizando uma câmera RGB-D de baixo custo." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2014. http://hdl.handle.net/10183/101659.

Full text
Abstract:
Reconhecer a identidade de seres humanos a partir de imagens digitais gravadas de suas faces é uma etapa importante para uma variedade de aplicações que incluem segurança de acesso, iteração humano computador, entretenimento digital, entre outras. Neste trabalho é proposto um novo método automático para reconhecimento facial que utiliza simultaneamente a informação 2D e 3D de uma câmera RGB-D(Kinect). O método proposto utiliza a informação de cor da imagem 2D para localizar faces na cena, uma vez que uma face é localizada ela é devidamente recortada e normalizada para um padrão de tamanho e cor. Posteriormente com a informação de profundidade o método estima a pose da cabeça em relação com à câmera. Com faces recortadas e suas respectivas informações de pose, o método proposto treina um modelo de faces robusto à variação de poses e expressões propondo uma nova técnica automática que separa diferentes poses em diferentes modelos de faces. Com o modelo treinado o método é capaz de identificar se as pessoas utilizadas para aprender o modelo estão ou não presentes em novas imagens adquiridas, as quais o modelo não teve acesso na etapa de treinamento. Os experimentos realizados demonstram que o método proposto melhora consideravelmente o resultado de classificação em imagens reais com variação de pose e expressão.
Recognizing the identity of human beings from recorded digital images of their faces is important for a variety of applications, such as access control, human-computer interaction, and digital entertainment. This dissertation proposes a new method for automatic face recognition that uses both the 2D and 3D information of an RGB-D (Kinect) camera. The method uses the color information of the 2D image to locate faces in the scene; once a face is properly located, it is cropped and normalized to a standard size and color. Afterwards, using depth information, the method estimates the pose of the head relative to the camera. With the normalized faces and their respective pose information, the proposed method trains a model of faces that is robust to pose and expression, using a new automatic technique that separates different poses into different face models. With the trained model, the method is able to identify whether the people used to train the model are present in newly acquired images, to which the model had no access during the training phase. The experiments demonstrate that the proposed method considerably improves classification results on real images with varying pose and expression.
APA, Harvard, Vancouver, ISO, and other styles
16

Vo, Duc My [Verfasser], and Andreas [Akademischer Betreuer] Zell. "Person Detection, Tracking and Identification by Mobile Robots Using RGB-D Images / Duc My Vo ; Betreuer: Andreas Zell." Tübingen : Universitätsbibliothek Tübingen, 2015. http://d-nb.info/1163396826/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Gokhool, Tawsif Ahmad Hussein. "Cartographie dense basée sur une représentation compacte RGB-D dédiée à la navigation autonome." Thesis, Nice, 2015. http://www.theses.fr/2015NICE4028/document.

Full text
Abstract:
Dans ce travail, nous proposons une représentation efficace de l’environnement adaptée à la problématique de la navigation autonome. Cette représentation topométrique est constituée d’un graphe de sphères de vision augmentées d’informations de profondeur. Localement la sphère de vision augmentée constitue une représentation égocentrée complète de l’environnement proche. Le graphe de sphères permet de couvrir un environnement de grande taille et d’en assurer la représentation. Les "poses" à 6 degrés de liberté calculées entre sphères sont facilement exploitables par des tâches de navigation en temps réel. Dans cette thèse, les problématiques suivantes ont été considérées : Comment intégrer des informations géométriques et photométriques dans une approche d’odométrie visuelle robuste ; comment déterminer le nombre et le placement des sphères augmentées pour représenter un environnement de façon complète ; comment modéliser les incertitudes pour fusionner les observations dans le but d’augmenter la précision de la représentation ; comment utiliser des cartes de saillances pour augmenter la précision et la stabilité du processus d’odométrie visuelle
Our aim is centred on building ego-centric topometric maps, represented as a graph of keyframe nodes, that can be efficiently used by autonomous agents. Each keyframe node, which combines a spherical image and a depth map (an augmented visual sphere), synthesises the information collected in a local area of space by an embedded acquisition system. The representation of the global environment consists of a collection of augmented visual spheres that provide the necessary coverage of an operational area. A pose graph that links these spheres together in six degrees of freedom also defines the domain potentially exploitable for navigation tasks in real time. As part of this research, an approach to map-based representation has been proposed by considering the following issues: how to robustly apply visual odometry by making the most of both the photometric and geometric information available from our augmented spherical database; how to determine the quantity and optimal placement of these augmented spheres to cover an environment completely; how to model sensor uncertainties and update the dense information of the augmented spheres; and how to compactly represent the information contained in the augmented sphere to ensure robustness, accuracy and stability along an explored trajectory by making use of saliency maps.
APA, Harvard, Vancouver, ISO, and other styles
18

Thörnberg, Jesper. "Combining RGB and Depth Images for Robust Object Detection using Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-174137.

Full text
Abstract:
We investigated the advantage of combining RGB images with depth data to obtain more robust object classification and detection using pre-trained deep convolutional neural networks. We relied on the raw images from publicly available datasets captured using Microsoft Kinect cameras. The raw images varied in size and therefore required resizing to fit our network. We designed a resizing method called "bleeding edge" to avoid distorting the objects in the images. We present a novel method for interpolating the missing depth pixel values by comparing them with similar RGB values; this method proved superior to the other methods tested. We showed that a simple colormap transformation of the depth image can provide close to state-of-the-art performance. Using our methods, we present state-of-the-art performance on the Washington Object dataset, and we provide some results on the Washington Scenes (V1) dataset. Specifically, for detection we used contours at different thresholds to find likely object locations in the images. For the classification task we report state-of-the-art results using only RGB and RGB-D images; depth data alone gave close to state-of-the-art results. For the detection task we found the RGB-only detector to be superior to the other detectors.
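The colormap transformation mentioned above can be sketched as follows: depth is normalised and mapped through a jet-like ramp to three channels, so that a network pretrained on RGB images can ingest it. The ramp here is a hand-rolled approximation, an assumption rather than the exact colormap the thesis used:

```python
import numpy as np

def depth_to_color(depth):
    """Map a depth image to a 3-channel jet-like colormap image.

    Near depths map toward blue and far depths toward red; pixels with
    missing depth (value 0) are left black. The piecewise-linear ramp is
    a simple approximation of a jet colormap.
    """
    d = depth.astype(float)
    valid = d > 0
    lo, hi = d[valid].min(), d[valid].max()
    t = np.zeros_like(d)
    t[valid] = (d[valid] - lo) / max(hi - lo, 1e-9)  # normalise to [0, 1]
    r = np.clip(1.5 - abs(4 * t - 3), 0, 1)
    g = np.clip(1.5 - abs(4 * t - 2), 0, 1)
    b = np.clip(1.5 - abs(4 * t - 1), 0, 1)
    rgb = np.stack([r, g, b], axis=-1)
    rgb[~valid] = 0.0                                # missing depth stays black
    return rgb
```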
APA, Harvard, Vancouver, ISO, and other styles
19

Chiesa, Valeria. "Revisiting face processing with light field images." Electronic Thesis or Diss., Sorbonne université, 2019. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2019SORUS059.pdf.

Full text
Abstract:
L'objectif principal de cette thèse est de présenter une technologie d'acquisition non conventionnelle, d'étudier les performances d'analyse du visage en utilisant des images collectées avec une caméra spéciale, de comparer les résultats avec ceux obtenus en élaborant des données à partir de dispositifs similaires et de démontrer le bénéfice apporté par l'utilisation de dispositifs modernes par rapport à des caméras standards utilisées en biométrie. Au début de la thèse, la littérature sur l'analyse du visage à l'aide de données "light field" a été étudiée. Le problème de la rareté des données biométriques (et en particulier des images de visages humains) recueillies à l'aide de caméras plénoptiques a été résolu par l'acquisition systématique d'une base de données de visages "light field", désormais accessible au public. Grâce aux données recueillies, il a été possible de concevoir et de développer des expériences en analyse du visage. De plus, une base de référence exhaustive pour une comparaison entre deux technologies RGB-D a été créée pour appuyer les études en perspective. Pendant la période de cette thèse, l'intérêt pour la technologie du plénoptique appliquée à l'analyse du visage s'est accrue et la nécessité d'une étude d'un algorithme dédié aux images "light field" est devenue incontournable. Ainsi, une vue d'ensemble complète des méthodes existantes a été élaborée
The main objective of this thesis is to present an unconventional acquisition technology, to study face analysis performance using images collected with a special camera, to compare the results with those obtained by processing data from similar devices, and to demonstrate the benefit of using modern devices compared with the standard cameras used in biometrics. At the beginning of the thesis, the literature on face analysis using light field data was surveyed. The problem of the scarcity of biometric data (and in particular of human face images) collected with plenoptic cameras was addressed through the systematic acquisition of a light field face database, now publicly available. With the collected data, it was possible to design and develop experiments in face analysis. In addition, an exhaustive baseline for a comparison between two RGB-D technologies was created to support prospective studies. Over the period of this thesis, interest in plenoptic technology applied to face analysis grew, and the need for a study of algorithms dedicated to light field images became unavoidable. A comprehensive overview of existing methods was therefore produced.
APA, Harvard, Vancouver, ISO, and other styles
20

Shiu, Feng-Shuo, and 許峯碩. "Stereoscopic Image Generation from RGB-D Images." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/454p5w.

Full text
Abstract:
Master's degree
I-Shou University
Department of Information Engineering
Academic year 105
Traditional stereoscopic images are generated using multiple color cameras. With the popularity of depth-capture devices, it has become possible to generate stereoscopic images from color and depth images. During the generation process, holes are created because of occlusion or reflective materials. Based on color and depth images from an RGB-D camera, this study explores techniques for generating stereoscopic images and filling the holes. First, the DIBR algorithm is used to generate the left-eye and right-eye images, and then an improved inpainting method is proposed to fill the holes. To verify the actual effect of the stereoscopic images, a head-mounted display is used to view the inpainted results.
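The DIBR step can be illustrated with a minimal warp: each pixel is shifted horizontally by a disparity inversely proportional to its depth, far pixels are painted first so near pixels win occlusions, and positions never written become the holes the inpainting stage must fill. The baseline scale is an illustrative assumption, and a single-channel image is used for brevity:

```python
import numpy as np

def dibr_view(color, depth, baseline_px=8.0, direction=1):
    """Warp a (grayscale) image to a virtual eye view via depth-image-based
    rendering: horizontal shift by disparity = baseline_px / depth.

    Returns the warped view and a boolean hole mask (positions that no
    source pixel mapped to, e.g. disocclusions).
    """
    h, w = depth.shape
    view = np.zeros_like(color)
    filled = np.zeros((h, w), dtype=bool)
    order = np.argsort(-depth, axis=None)       # paint far pixels first
    for idx in order:
        y, x = divmod(int(idx), w)
        z = depth[y, x]
        if z <= 0:
            continue                             # missing depth: nothing to warp
        xt = x + int(round(direction * baseline_px / z))
        if 0 <= xt < w:
            view[y, xt] = color[y, x]            # nearer pixels overwrite farther
            filled[y, xt] = True
    return view, ~filled
```

Running it with `direction=1` and `direction=-1` yields the right-eye and left-eye views; the returned hole masks mark where the inpainting method must operate.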
APA, Harvard, Vancouver, ISO, and other styles
21

Kuo, Hung-Yu, and 郭弘裕. "Image Segmentation from RGB-D Data." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/n72ag9.

Full text
Abstract:
Master's degree
I-Shou University
Department of Information Engineering
Academic year 105
Image segmentation is one of the most important foundations of computer vision. In many applications, such as image retrieval, pattern recognition, and machine vision, good segmentation technology is necessary to facilitate subsequent retrieval and recognition work. Traditional image segmentation methods are mainly based on the color information in images. The growing popularity of cheap RGB-D cameras offers a new way to perform image segmentation. This study uses a Kinect camera to acquire color and depth information for image segmentation. The image is first segmented according to color information, and then adjacent blocks are merged using both color and depth information to obtain the final segmentation. The depth information compensates for the shortcomings of color-only segmentation and yields more appropriate results.
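The merge step can be sketched as a predicate over adjacent segments that requires agreement in both cues, together with area-weighted statistics for the merged region. The thresholds and the region representation are illustrative assumptions, not values from the thesis:

```python
def should_merge(region_a, region_b, color_thresh=20.0, depth_thresh=0.15):
    """Decide whether two adjacent segments belong together by combining
    a color cue with a depth cue. Regions are dicts holding a mean
    'color' (per channel), a mean 'depth' in metres, and an 'area'.
    """
    color_diff = max(abs(a - b) for a, b in zip(region_a["color"], region_b["color"]))
    depth_diff = abs(region_a["depth"] - region_b["depth"])
    return color_diff < color_thresh and depth_diff < depth_thresh

def merge_stats(region_a, region_b):
    """Area-weighted statistics of the union of two merged segments."""
    n = region_a["area"] + region_b["area"]
    color = tuple((ca * region_a["area"] + cb * region_b["area"]) / n
                  for ca, cb in zip(region_a["color"], region_b["color"]))
    depth = (region_a["depth"] * region_a["area"]
             + region_b["depth"] * region_b["area"]) / n
    return {"color": color, "depth": depth, "area": n}
```

The depth test is what prevents two similarly colored but physically separate surfaces (e.g. an object and the wall behind it) from being merged.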
APA, Harvard, Vancouver, ISO, and other styles
22

Tu, Chieh-Min, and 杜介民. "Depth Image Inpainting with RGB-D Camera." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/k4m42a.

Full text
Abstract:
Master's degree
I-Shou University
Department of Information Engineering
Academic year 103
Since Microsoft released the cheap Kinect sensor as a new natural user interface, stereo imaging has moved from the synthesis of multi-view color images to the synthesis of a color image and a depth image. However, the captured depth images may be missing depth values, so the stereoscopic effect is often poor. This thesis develops an object-based depth inpainting method based on a Kinect RGB-D camera. Firstly, background differencing, frame differencing and depth thresholding strategies are used as a basis for segmenting foreground objects from a dynamic background image. Then, the task of hole inpainting is divided into background and foreground areas: the background area is inpainted using the background depth image, and the foreground area is inpainted with a best-fit neighborhood depth value. Experimental results show that such an inpainting method is helpful for filling holes and improves the contour edges and image quality.
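The two-branch hole-filling rule can be sketched as follows: background holes copy the stored background depth, while foreground holes borrow the nearest valid depth within the object mask (simplified here to a same-row search standing in for the thesis' best-fit neighborhood value):

```python
import numpy as np

def inpaint_depth(depth, background_depth, foreground_mask):
    """Fill depth holes (zeros) with a two-branch rule.

    Background holes take the stored background depth; foreground holes
    take the nearest valid depth inside the object mask along the same
    row. A simplified stand-in for a best-fit neighborhood search.
    """
    out = depth.copy()
    holes = depth <= 0
    bg_holes = holes & ~foreground_mask
    out[bg_holes] = background_depth[bg_holes]        # background branch
    for y, x in zip(*np.where(holes & foreground_mask)):
        row_valid = np.where((depth[y] > 0) & foreground_mask[y])[0]
        if len(row_valid):                            # foreground branch
            out[y, x] = depth[y, row_valid[np.argmin(np.abs(row_valid - x))]]
    return out
```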
APA, Harvard, Vancouver, ISO, and other styles
23

Ting, Hao-Chan, and 丁浩展. "Human Skeleton Correction Based on RGB-D Image." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/c77uu5.

Full text
Abstract:
Master's degree
National Taiwan University of Science and Technology
Department of Electronic Engineering
Academic year 102
The currently accepted human skeleton extraction techniques depend on the OpenNI framework and NITE middleware. With this technique, the human skeleton can be tracked in real time once the human position has been recognized. However, incorrect skeleton detection may occur when a person holds an object and the corresponding depth image is affected by it. In this thesis, we propose a method to reduce this kind of problem and increase the accuracy of human skeleton detection. We detect the object a person is holding and then filter it out of the corresponding depth map. After filtering the object from the depth map, the skeleton detection of the NITE middleware obtains the correct skeleton information. Meanwhile, we obtain skeleton information comprising 15 joint positions, orientations and corresponding confidences. Experimental results show that the skeleton obtained with the proposed method reduces the effect of held objects, and the skeleton tracking process remains real time for the developer.
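The filtering idea can be sketched as a local rule around the detected hand joint: within a window centred on the hand, depth readings clearly nearer than the body are attributed to the held object and cleared before the depth map is handed to the skeleton tracker. The window size and tolerance are illustrative assumptions:

```python
import numpy as np

def filter_held_object(depth, hand_xy, body_depth, tol=0.1, win=3, fill=0.0):
    """Suppress a held object around the hand joint.

    Inside a (2*win+1)-pixel window centred on the hand, pixels whose
    depth is clearly nearer than the body are treated as the object and
    cleared, so downstream skeleton tracking is not confused by them.
    """
    out = depth.copy()
    x, y = hand_xy
    y0, y1 = max(0, y - win), min(depth.shape[0], y + win + 1)
    x0, x1 = max(0, x - win), min(depth.shape[1], x + win + 1)
    patch = out[y0:y1, x0:x1]
    patch[patch < body_depth - tol] = fill   # object sits in front of the body
    return out
```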
APA, Harvard, Vancouver, ISO, and other styles
24

Madeira, Tiago de Matos Ferreira. "Enhancement of RGB-D image alignment using fiducial markers." Master's thesis, 2019. http://hdl.handle.net/10773/29603.

Full text
Abstract:
3D reconstruction is the creation of three-dimensional models from the captured shape and appearance of real objects. It is a field that has its roots in several areas within computer vision and graphics, and has gained high importance in others, such as architecture, robotics, autonomous driving, medicine, and archaeology. Most current model-acquisition technologies are based on LiDAR, RGB-D cameras, and image-based approaches such as visual SLAM. Despite the improvements that have been achieved, methods that rely on professional instruments and operation result in high costs, both capital and logistical. In this dissertation, we develop an optimization procedure capable of enhancing 3D reconstructions created using a consumer-level hand-held RGB-D camera (a product that is widely available, easily handled, and with an interface familiar to the average smartphone user) through the use of fiducial markers placed in the environment. Additionally, a tool was developed to allow the removal of said fiducial markers from the texture of the scene, mitigating a downside of the approach taken; it may also prove useful in other contexts.
A reconstrução 3D é a criação de modelos tridimensionais a partir da forma e aparência capturadas de objetos reais. É um campo que teve origem em diversos ramos da visão computacional e computação gráfica, e que ganhou grande importância em áreas como a arquitetura, robótica, condução autónoma, medicina e arqueologia. A maioria das tecnologias de aquisição de modelos atuais são baseadas em LiDAR, câmeras RGB-D e abordagens baseadas em imagens, como o SLAM visual. Apesar das melhorias que foram alcançadas, os métodos que dependem de instrumentos profissionais e da sua operação resultam em elevados custos, tanto de capital, como logísticos. Nesta dissertação foi desenvolvido um processo de otimização capaz de melhorar as reconstruções 3D criadas usando uma câmera RGB-D portátil, disponível ao nível do consumidor, de fácil manipulação e que tem uma interface familiar para o utilizador de smartphones, através da utilização de marcadores fiduciais colocados no ambiente. Além disso, uma ferramenta foi desenvolvida para permitir a remoção dos ditos marcadores fiduciais da textura da cena, como um complemento para mitigar uma desvantagem da abordagem adotada, mas que pode ser útil em outros contextos.
Master's in Computer and Telematics Engineering
APA, Harvard, Vancouver, ISO, and other styles
25

Lee, Chi-cheng, and 李其真. "Image Database and RGB-D Camera Image Based Simultaneous Localization and Mapping." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/71614046675523628278.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Mechanical Engineering
102
Recently, due to advances in technology and the growing popularity of social networks, almost everyone owns a smartphone. Most people are happy to upload the photographs they take to the internet and share them with others, so historical images and their associated information are easy to obtain for an unfamiliar environment. Therefore, when we are in an unfamiliar environment, if we can use the numerous historical images uploaded by people to the cloud to achieve localization and mapping, we can reduce the cost of creating a database and processing large amounts of information. The purpose of this thesis is to use a computer vision system to assist simultaneous localization and mapping for a real-time camera; the system includes an RGB-D depth camera for capturing images and a computer for processing and analysis. In the image pre-processing part, we first assume that the real-time camera coordinate frame is the world frame; then, by matching feature points between historical and real-time images, we obtain the projection model. From the definition of the coordinate systems and the camera calibration, we obtain the position and orientation of the historical image database relative to the real-time camera and, through a coordinate transformation, the state of the real-time camera in world coordinates. In the localization and mapping part, we use an extended Kalman filter SLAM estimator to produce a stable estimate of the real-time camera state; the image database converges well, so the path and map of the real-time camera can be created. Our contribution in this thesis is that, unlike general SLAM, we do not have to match highly similar features across continuous images; instead, we directly find the relative state between two images, which reduces the time spent on feature matching.
In addition, because our experimental equipment is an RGB-D camera, we do not need two ordinary images to recover 3D features; we can obtain the 3D information directly, which reduces the number of coordinate transformations and finds the relative state between the real-time image and the database image faster. In this thesis, the application is that an existing image database of an unknown area and a real-time RGB-D camera can be used together to achieve positioning.
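The relative state between two RGB-D views described in this abstract is typically recovered from matched 3-D point pairs. As an illustration only (not the thesis's exact formulation), the classical Kabsch/Procrustes solution computes the least-squares rigid transform between two sets of corresponding 3-D points:

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rotation R and translation t with R @ P_i + t ~= Q_i
    (Kabsch/Procrustes on matched 3-D points given as rows of P and Q)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)          # cross-covariance of centred points
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # fix improper rotation (reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cq - R @ cp
    return R, t
```

With at least three non-collinear matched points, this recovers the exact pose when the correspondences are noise-free.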
APA, Harvard, Vancouver, ISO, and other styles
26

Chuang, Hui-Chi, and 莊惠琪. "Real-Time Fingerspelling Recognition System Design Based on RGB-D Image Information." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/78417928215280967136.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Electrical and Control Engineering
102
Communication is a very important part of human-computer interaction. This thesis provides a fingerspelling recognition system with a high accuracy rate based on RGB-D images. The system is separated into three parts: ROI selection, hand feature extraction, and fingerspelling recognition. For the ROI selection, the regions of the hand and face are first obtained by skin-color detection and connected component labeling (CCL), and then the hand, i.e., the ROI, is determined by feature point extraction based on the distance transform. Next comes the hand feature extraction, which consists of the hand structure and the hand texture. From the feature points of the ROI, the locations of the palm and fingertips, the palm direction, and the finger vectors form the hand structure. In addition to the hand structure, this thesis adopts the LBP operator to generate the hand texture, to deal with fingerspelling not recognizable by the hand structure alone. Finally, the extracted hand features are sent to the fingerspelling recognition system, which is built with several different neural network classifiers. The experimental results show that this is an effective real-time recognition system whose accuracy is higher than 80% for most of the fingerspelling alphabet in ASL.
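The LBP (local binary pattern) operator mentioned in this abstract compares each pixel with its eight neighbours and packs the comparisons into one byte; histograms of these codes then serve as a texture descriptor. A minimal NumPy sketch of the basic operator (an illustration, not the thesis's exact implementation):

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour LBP: threshold each neighbour against the centre
    pixel and pack the results into an 8-bit code (interior pixels only)."""
    c = img[1:-1, 1:-1]
    # Neighbours in clockwise order starting at the top-left.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (n >= c).astype(np.uint8) << bit
    return code
```

A histogram of the returned codes over a hand region would give the texture feature vector.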
APA, Harvard, Vancouver, ISO, and other styles
27

Lai, Chih-Chia, and 賴志嘉. "On Constructing the Registration Graph of a 3-D Scene Using RGB-D Image Streams." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/42601737493375454708.

Full text
Abstract:
Master's thesis
National Chi Nan University
Department of Computer Science and Information Engineering
101
The key problem of using a mobile robot equipped with an RGB-D camera to explore an unknown environment is how to fuse the information contained in the acquired images. Due to the limited field of view of the camera, it is inevitable to register the acquired images. If we represent each image as a node and each pairwise registration result as an edge linking two registered images, then the complete registration results can be expressed as a registration graph. Constructing a registration graph from a series of input images can greatly simplify the 3-D scene reconstruction problem. Notably, the critical issue of registration graph construction is to determine whether a given pair of images overlap. If two images are determined to overlap, the second problem is to determine their registration parameters and to add an edge linking those two images. In this work, we use the number of SIFT feature correspondences to select possibly overlapping images. However, the computational complexity of the traditional SIFT feature matching method is too high. Hence, we propose a fast SIFT feature matching algorithm based on the visual word (VW) technique. We first quantize the SIFT features via vector quantization with a specified codebook. If two SIFT features are quantized to different VWs, those two features are deemed not matched. Therefore, when matching SIFT features, we only have to consider features having the same VW, and thus the computation cost can be greatly reduced. The matched SIFT features computed with the VW approach are further verified with the RANSAC algorithm to remove incorrect matches and to estimate the registration parameters. Experimental results show that the proposed method speeds up the computation by a factor of 38 without sacrificing too much matching accuracy.
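The visual-word pre-filtering described in this abstract can be sketched as follows. The 2-D toy descriptors and the two-word codebook are hypothetical; a real system would use 128-D SIFT descriptors and a learned codebook:

```python
import numpy as np

def quantize(descriptors, codebook):
    """Assign each descriptor to its nearest codebook centroid (visual word)."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def vw_match(desc_a, desc_b, codebook):
    """Match descriptors, only considering pairs that share a visual word."""
    words_a = quantize(desc_a, codebook)
    words_b = quantize(desc_b, codebook)
    matches = []
    for i, w in enumerate(words_a):
        candidates = np.where(words_b == w)[0]   # same-word features only
        if len(candidates) == 0:
            continue
        d = np.linalg.norm(desc_b[candidates] - desc_a[i], axis=1)
        matches.append((i, int(candidates[d.argmin()])))
    return matches
```

Restricting the nearest-neighbour search to same-word candidates is what yields the speed-up; the resulting matches would then be verified with RANSAC, as the abstract notes.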
APA, Harvard, Vancouver, ISO, and other styles
28

Fan, Chuan-Chi, and 范銓奇. "Low Power High Speed 8-Bit Pipelined A/D Converter for RGB Image Processing." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/40871581804580984571.

Full text
Abstract:
Master's thesis
National Chung Cheng University
Institute of Electrical Engineering
93
Three architectures of high-speed 8-bit pipelined A/D converters for RGB image processing are implemented. First, a conventional 1.5-bit/stage pipelined ADC requiring 7 amplifiers and 15 comparators is designed and fabricated. According to the measured results, an SNDR of 33.45 dB is obtained at a sampling frequency of 2 MHz with a 9.8 kHz input signal; the ENOB is 5.26 bits. In order to reduce the power consumption, the amplifier-sharing technique, requiring only 4 amplifiers, is included in the second 1.5-bit/stage pipelined ADC design. Finally, a third 1.5-bit/stage pipelined ADC with only 4 amplifiers and 9 comparators is proposed to further reduce the power consumption. For the OPAMP implementation, a fully differential structure is used. The ADC is implemented in the TSMC 0.35 um 2P4M mixed-signal process technology. Based on the post-layout simulation, the ADC SNDR and ENOB are 44.02 dB and 7.02 bits, respectively, with an input frequency of 9.34 MHz at a sampling frequency of 140 MHz. The DNL is about +0.45/-0.5 LSB, and the INL is about +2.33/-0.36 LSB. The total ADC power consumption under a supply voltage of 3.3 V is about 118.1 mW. The technique achieves a power saving of 33% compared with the conventional pipelined ADC.
APA, Harvard, Vancouver, ISO, and other styles
29

Chiu, Meng-Tzu, and 邱夢姿. "Real-Time Finger Writing Digit Recognition System Design Based on RGB-D Image Information." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/69874109674929453878.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Electrical and Control Engineering
103
The finger writing recognition approach has been introduced into a diversity of fields, like video games and remote control systems, because it provides natural and intuitive communication for Human-Computer Interaction (HCI). This thesis proposes a real-time finger writing digit recognition system with a high accuracy rate based on RGB-D information. The system is divided into three main parts: ROI selection, feature extraction, and finger writing digit recognition. For the ROI selection, the system first detects the skin-color regions, then determines the palm and the fingertips based on connected component labeling (CCL) and the distance transform, respectively. Further, it tracks the fingertip to create the trajectory and extracts its directional features for digit recognition. However, since 0 and 6 are often confused, three extra features are added to increase their recognition rate. Finally, with a series of k-NN classifiers, the experimental results show that the accuracy rate is higher than 95% in finger writing digit recognition, which implies the proposed real-time recognition system is indeed effective and efficient.
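Directional features of a fingertip trajectory are commonly built by quantising successive displacements into eight compass directions (a chain code). A small sketch of this idea, not necessarily the exact feature set used in the thesis:

```python
import numpy as np

def direction_codes(traj):
    """Quantise successive fingertip displacements into 8 direction codes
    (0 = east, counter-clockwise in 45-degree steps)."""
    traj = np.asarray(traj, dtype=float)
    d = np.diff(traj, axis=0)            # displacement vectors between samples
    ang = np.arctan2(d[:, 1], d[:, 0])   # angle of each segment
    return (np.round(ang / (np.pi / 4)).astype(int) % 8).tolist()
```

A histogram or sequence of such codes could then be fed to a k-NN classifier.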
APA, Harvard, Vancouver, ISO, and other styles
30

CENG, YUN-FENG, and 曾雲楓. "Deep-Learning-Based Object Classification and Grasping Point Determination Based on RGB-D Image for Robot Arm Operation." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/6epvc5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Liu, Chen-Yu, and 劉貞佑. "Multiview Stereo Images Generation from RGB-D Images." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/29963224176167062421.

Full text
Abstract:
Master's thesis
National Taiwan Normal University
Department of Computer Science and Information Engineering
102
Nowadays, 3D display technology is well developed and has gradually matured. However, limited 3D content resources prevent this technology from being popularized in the market: even if customers can afford expensive media equipment, there is still a lack of usable content for 3D displays. This research provides a solution for converting RGB-D images into 3D images, to partially alleviate the shortage of 3D content. In recent decades, many studies have already worked on how to create 3D images, which usually involves depth measurement and generating an image from another perspective. Depth measurement can be done by manual judgment, by depth cues, or by using depth cameras. The former two solutions are more time-consuming than the latter, and depth cues in particular often cause inaccuracy. Using depth cameras simplifies the acquisition of depth data and decreases the inaccuracy as well, but the captured depth images may contain holes, depending on the shooting scenario. The depth data need to be repaired under reasonable conditions, because these factors affect the quality of the 3D images. In the past, many studies have proposed solutions to image inpainting, mainly considering color and texture. This research implements two methods to process the missing values of depth images: one exploits the low-rank property of images and uses a matrix completion technique; the other repairs the depth image based on image segmentation. The experimental results show that our 3D depth quality is clearly higher than that of the traditional 2D-to-3D conversion method. Furthermore, the depth camera collects depth data with higher accuracy, so we can provide viewers with a better 3D viewing experience.
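As a minimal illustration of depth hole repair, the sketch below fills invalid pixels by diffusing values from valid 4-neighbours. This is a generic baseline only, not the matrix-completion or segmentation-based method of the thesis:

```python
import numpy as np

def fill_depth_holes(depth, invalid=0):
    """Iteratively fill invalid depth pixels with the mean of their valid
    4-neighbours, sweeping until no fillable hole remains."""
    d = depth.astype(float).copy()
    hole = (d == invalid)
    while hole.any():
        progress = False
        for y, x in zip(*np.where(hole)):
            vals = [d[y + dy, x + dx]
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= y + dy < d.shape[0] and 0 <= x + dx < d.shape[1]
                    and not hole[y + dy, x + dx]]
            if vals:
                d[y, x] = sum(vals) / len(vals)
                hole[y, x] = False
                progress = True
        if not progress:      # isolated hole region, give up
            break
    return d
```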
APA, Harvard, Vancouver, ISO, and other styles
32

Wu, Shang-Yu, and 吳尚諭. "Parallel Hierarchical 3-D Matching of RGB-D Images." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/cbae6h.

Full text
Abstract:
Master's thesis
National Chi Nan University
Department of Computer Science and Information Engineering
101
This thesis proposes a new method for RGB-D image matching that differs from traditional point-to-point/point-to-plane matching methods. An objective function is proposed that fuses both depth and color information for estimating the transformation matrix between two RGB-D images. A hierarchical scale-space parameter estimation method is proposed for dealing with image matching under large motion. The main idea is to smooth the input image appropriately so that minute features are temporarily ignored, which simplifies the matching of the main 3-D structures. Notably, image smoothing eliminates a portion of the image information. To fully utilize the RGB-D information, the degree of blurriness is reduced gradually to reintroduce the minute image features into the parameter estimation process, in a coarse-to-fine matching approach. The image matching method is implemented with the CUDA parallel processing framework. Experimental results show that the proposed method can efficiently match two RGB-D images.
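The coarse-to-fine idea can be illustrated in one dimension: estimate the alignment on heavily blurred signals first, then refine it over a small search window as the blur is reduced. A toy sketch, using an integer shift in place of a full transformation matrix; the function names and blur schedule are illustrative only:

```python
import numpy as np

def box_blur(x, k):
    """Moving-average blur with window k (odd); wrap padding keeps the
    blur exactly shift-equivariant for this periodic toy example."""
    pad = k // 2
    xp = np.pad(x, pad, mode='wrap')
    return np.convolve(xp, np.ones(k) / k, mode='valid')

def coarse_to_fine_shift(a, b, blurs=(9, 5, 1), search=3):
    """Estimate the integer shift mapping a to b, refining the estimate as
    the blur decreases; each level searches only near the current estimate."""
    shift = 0
    for k in blurs:
        sa, sb = box_blur(a, k), box_blur(b, k)
        best, best_err = shift, np.inf
        for s in range(shift - search, shift + search + 1):
            err = np.mean((np.roll(sa, s) - sb) ** 2)
            if err < best_err:
                best, best_err = s, err
        shift = best
    return shift
```

Note that a shift of 4 is found even though each level searches only within ±3 of the previous estimate: the coarse levels carry the estimate most of the way.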
APA, Harvard, Vancouver, ISO, and other styles
33

Kuo, Pei-Hsuan, and 郭姵萱. "Object Retrieval Based on RGB-D Images." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/fpsqr5.

Full text
Abstract:
Master's thesis
I-Shou University
Department of Computer Science and Information Engineering
105
Nowadays, modern multimedia is widely used in daily life. To meet the resulting requirements, people continuously explore how to effectively manage and retrieve multimedia data. Most previous studies focused on 2D images and 3D models. However, although RGB-D image data keep increasing, the related retrieval technologies are still inadequate, so developing retrieval algorithms for them is an urgent task. This research uses RGB-D image data obtained by a Kinect sensing device as the input source, studies how to extract both color and geometry features from the point cloud data, and designs a new 3D object retrieval system. A testing platform is established to exhibit the results and verify the system's actual effectiveness.
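One simple color feature for comparing coloured point clouds, offered here purely as a generic illustration (the abstract does not specify the thesis's actual descriptors), is a normalised joint RGB histogram compared by histogram intersection:

```python
import numpy as np

def color_histogram(points_rgb, bins=4):
    """Joint RGB histogram of a coloured point cloud (values in [0, 1)),
    normalised so clouds of different sizes are comparable."""
    idx = np.minimum((points_rgb * bins).astype(int), bins - 1)
    flat = idx[:, 0] * bins * bins + idx[:, 1] * bins + idx[:, 2]
    h = np.bincount(flat, minlength=bins ** 3).astype(float)
    return h / h.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical colour distributions."""
    return float(np.minimum(h1, h2).sum())
```

A retrieval system would rank database objects by this (or a richer) similarity against the query cloud, typically combined with geometric descriptors.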
APA, Harvard, Vancouver, ISO, and other styles
34

Peng, Hsiao-Chia, and 彭小佳. "3D Face Reconstruction on RGB and RGB-D Images for Recognition Across Pose." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/88142215912683274078.

Full text
Abstract:
Doctoral dissertation
National Taiwan University of Science and Technology
Department of Mechanical Engineering
103
Face recognition across pose is a challenging problem in computer vision. Two scenarios are considered in this thesis. One is the common setup with a single frontal facial image of each subject in the gallery set and images of other poses in the probe set. The other considers an RGB-D image of the frontal face for each subject in the gallery, while the probe set is the same as in the previous case, containing only RGB images of other poses. The second scenario simulates the case in which an RGB-D camera is available for user registration only, and recognition must be performed on regular RGB images without the depth channel. Two approaches are proposed for handling the first scenario, one holistic and the other component-based. The former is extended from a face reconstruction approach and improved with different sets of landmarks for alignment and multiple reference models considered in the reconstruction phase. The latter focuses on the reconstruction of facial components obtained from pose-invariant landmarks, and on recognition with different components considered at different poses. Such a component-based reconstruction for handling cross-pose recognition is rarely seen in the literature. Although the approach for handling the second scenario, i.e., RGB-D based recognition, is partially similar to the approach for the first scenario, its novelty lies in the handling of depth readings corrupted by quantization noise, which are often encountered when the face is not close enough to the RGB-D camera at registration. An approach is proposed to resurface the corrupted depth map and substantially improve the recognition performance. All of the proposed approaches are evaluated on benchmark databases and shown to be comparable to state-of-the-art approaches.
APA, Harvard, Vancouver, ISO, and other styles
35

Patrisia, Sherryl Santoso, and 溫夏夢. "Learning-based Pedestrian Detection Applied to RGB-D Images." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/04755862474315325960.

Full text
Abstract:
Master's thesis
National Chiao Tung University
EECS International Graduate Program
104
In the complicated environments of the real world, accurate pedestrian detection is still a challenging topic. To address it, we adopt the R-CNN method, which extracts robust features and performs localization as well. The process starts with region proposals (Selective Search) for generating detection candidates, followed by deep learning (CNNs) to produce robust features. Furthermore, depth information is often helpful in detecting pedestrians and/or objects. We therefore use an RGB-D dataset and combine both the color picture and the depth map for pedestrian detection. In this thesis, we use a depth-encoding method to convert the original depth map to the HHA format so that it can be processed by CNNs. The HHA encoding comprises three channels: horizontal disparity, height above ground, and angle with gravity. Another technique we adopt is the selective search method that generates region proposals (object candidates); we could use either RGB or HHA images to generate these candidates. In our system, we use CNNs to learn and extract features based on either the RGB- or HHA-generated candidates. We found that the two kinds of region proposals make a significant difference in our pedestrian detection problem: the HHA proposals lead to much better results. One step further, we can combine the outputs produced by the RGB data and the HHA data in the detection. The information fusion process can be inserted at different points in the system: we can process each data source (RGB and HHA) separately and examine its individual decision (probability) to make the final binary decision, or we can combine the feature spaces. To combine the features of the two sources, we add an SVM to make the final decision. Furthermore, we use PCA to reduce redundant data in the fusion. We design two techniques: pre-PCA, applied before feature fusion, and post-PCA, applied after feature fusion.
The final experiments indicate that generating bounding boxes from HHA Selective Search and then applying them to both RGB and HHA images produces more robust region proposals. Next, PCA removes unnecessary features, leaving only the important ones. Finally, fusing the RGB and HHA region proposal features with pre-PCA produces a good pedestrian detection rate and the lowest false-positive rate.
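The pre-PCA versus post-PCA distinction can be sketched with a plain NumPy PCA; the feature matrices below are random stand-ins for the CNN features of the two sources:

```python
import numpy as np

def pca_fit(X, k):
    """Return the feature mean and the top-k principal axes of X (n x d)."""
    mu = X.mean(axis=0)
    # SVD of the centred data; rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def pca_transform(X, mu, axes):
    return (X - mu) @ axes.T

# Hypothetical per-source features (stand-ins for RGB and HHA CNN outputs).
rgb = np.random.RandomState(0).randn(20, 8)
hha = np.random.RandomState(1).randn(20, 8)

# Pre-PCA: reduce each source first, then concatenate.
mu_r, ax_r = pca_fit(rgb, 3)
mu_h, ax_h = pca_fit(hha, 3)
pre = np.hstack([pca_transform(rgb, mu_r, ax_r),
                 pca_transform(hha, mu_h, ax_h)])

# Post-PCA: concatenate first, then reduce the joint feature.
joint = np.hstack([rgb, hha])
mu_j, ax_j = pca_fit(joint, 6)
post = pca_transform(joint, mu_j, ax_j)
```

Both variants end with 6-D features here, but pre-PCA decorrelates each source on its own, while post-PCA can also exploit cross-source redundancy; either reduced feature would then be fed to the SVM.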
APA, Harvard, Vancouver, ISO, and other styles
36

Aguiar, Mário André Pinto Ferraz de. "3D reconstruction from multiple RGB-D images with different perspectives." Master's thesis, 2015. https://repositorio-aberto.up.pt/handle/10216/89542.

Full text
Abstract:
3D model reconstruction can be a useful tool for multiple purposes. Some examples are modeling a person or objects for an animation, modeling spaces for exploration in robotics, or, for clinical purposes, modeling patients over time to keep a history of the patient's body. The reconstruction process consists of capturing the object to be reconstructed, converting these captures to point clouds, and registering each point cloud to achieve the 3D model. The implemented methodology for the registration process was as general as possible, to be usable for the multiple purposes discussed above, with a special focus on non-rigid objects. This focus comes from the need to reconstruct high-quality 3D models of patients treated for breast cancer, for the evaluation of the aesthetic outcome. With the non-rigid algorithms, the reconstruction process is more robust to small movements during the captures. The sensor used for the captures was the Microsoft Kinect, due to the possibility of obtaining both color (RGB) and depth images, called RGB-D images. With this type of data the final 3D model can be textured, which is an advantage in many cases. The other main reason for this choice was the fact that the Microsoft Kinect is low-cost equipment, thereby offering an alternative to the expensive systems available on the market. The main achieved objectives were the reconstruction of 3D models of good quality from noisy captures using a low-cost sensor; the registration of point clouds without knowing the sensor's pose, allowing free movement of the sensor around the objects; and, finally, the registration of point clouds with small deformations between them, where conventional rigid registration algorithms cannot be used.
APA, Harvard, Vancouver, ISO, and other styles
37

Liu, Che-Wei, and 劉哲瑋. "Evaluation of Disparity Estimation Schemes using Captured RGB-D Images." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/41941973935343767817.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Electronics Engineering, Institute of Electronics
101
In 3D image processing, depth estimation from given left and right images (the so-called stereo matching algorithms) is widely used in many 3D applications. One type of application tracks body motion and/or poses with the aid of depth information. How to evaluate depth estimation algorithms for different applications thus becomes an issue. The conventional way of evaluating these algorithms often uses a small number of computer-generated test images, which is insufficient to reflect the problems in real-world applications. In this study, we design a number of scenes and capture them using RGB-D cameras; that is, our dataset consists of stereo image pairs and their corresponding ground truth disparity maps. Our dataset covers two categories of factors that may affect the performance of stereo matching algorithms: image content factors and image quality factors. The image content factor group includes simple and complex backgrounds, different numbers of objects, different hand poses, and clothing with various color patterns. In the image quality factor group, we create images with different PSNR and rectification errors. In addition, each stereo pair has its ground truth disparity map. All images and depth maps are captured by a pair of Kinect devices. To generate appropriate images for the test dataset, we calibrate and rectify the captured RGB image pairs, process the captured depth maps, and create so-called trimaps for evaluation purposes. For the left and right color images, because they come from different sensors, we must perform camera calibration to obtain the camera parameters and color calibration to match the colors in the two images. We also align the left and right images using existing camera rectification techniques.
To generate the ground truth disparity map, we first capture the raw depth map from the Kinect and warp it from the view of the IR camera to that of the RGB camera. These depth maps have many black holes due to the sensing mechanism. To make the ground truth disparity map more reliable, we propose an adaptive hole-filling algorithm. Lastly, we adopt the matting segmentation concept to create a tri-value map (trimap) that classifies image pixels into foreground, background, and in-between regions. Our error metrics are the bad-matching pixel rate and the mean square error between the ground truth disparity map and the estimated disparity map, with a focus on performance in the foreground region. In our experiments, three stereo matching algorithms are used to test our dataset and evaluation methodology, and we analyze these algorithms based on the collected data.
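The two error metrics above are straightforward to state precisely. A short sketch, evaluating only pixels selected by a mask (e.g. the foreground region of the trimap); the one-disparity-level threshold for a "bad" pixel is a common choice assumed here, not taken from the thesis:

```python
import numpy as np

def bad_pixel_rate(est, gt, mask, thresh=1.0):
    """Fraction of evaluated pixels whose disparity error exceeds thresh."""
    err = np.abs(est - gt)[mask]
    return float((err > thresh).mean())

def masked_mse(est, gt, mask):
    """Mean square disparity error over the evaluated pixels."""
    d = (est - gt)[mask].astype(float)
    return float((d ** 2).mean())
```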
APA, Harvard, Vancouver, ISO, and other styles
38

Aguiar, Mário André Pinto Ferraz de. "3D reconstruction from multiple RGB-D images with different perspectives." Dissertação, 2015. https://repositorio-aberto.up.pt/handle/10216/89542.

Full text
Abstract:
3D model reconstruction can be a useful tool for multiple purposes. Some examples are modeling a person or objects for an animation, modeling spaces for exploration in robotics, or, for clinical purposes, modeling patients over time to keep a history of the patient's body. The reconstruction process consists of capturing the object to be reconstructed, converting these captures to point clouds, and registering each point cloud to achieve the 3D model. The implemented methodology for the registration process was as general as possible, to be usable for the multiple purposes discussed above, with a special focus on non-rigid objects. This focus comes from the need to reconstruct high-quality 3D models of patients treated for breast cancer, for the evaluation of the aesthetic outcome. With the non-rigid algorithms, the reconstruction process is more robust to small movements during the captures. The sensor used for the captures was the Microsoft Kinect, due to the possibility of obtaining both color (RGB) and depth images, called RGB-D images. With this type of data the final 3D model can be textured, which is an advantage in many cases. The other main reason for this choice was the fact that the Microsoft Kinect is low-cost equipment, thereby offering an alternative to the expensive systems available on the market. The main achieved objectives were the reconstruction of 3D models of good quality from noisy captures using a low-cost sensor; the registration of point clouds without knowing the sensor's pose, allowing free movement of the sensor around the objects; and, finally, the registration of point clouds with small deformations between them, where conventional rigid registration algorithms cannot be used.
APA, Harvard, Vancouver, ISO, and other styles
39

Lin, Ku-Ying, and 林谷穎. "Real-time Human Detection System Design Based on RGB-D Images." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/21485387071019525935.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Electrical and Control Engineering
101
This thesis proposes a real-time human detection system based on RGB-D images generated by a Kinect, which detects humans in a sequence of images. The system is separated into four parts: region-of-interest (ROI) selection, feature extraction, human shape recognition, and motionless human checking. First, histogram projection, connected component labeling, and moving object segmentation are applied to select the ROIs, exploiting the fact that walking or standing humans exhibit motion. Second, the ROIs are resized using bilinear interpolation and the human shape feature is extracted with Histograms of Oriented Gradients (HOG). Then, a support vector machine or an artificial neural network is adopted to train the classifier on the Leeds Sports Pose dataset, and human shape recognition is performed with this classifier. Finally, the system checks whether the image contains any motionless human and recognizes it as well. The experimental results show that the system can detect humans in real time with a high accuracy rate.
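The HOG feature mentioned above accumulates gradient magnitudes into orientation bins per cell. A minimal single-cell sketch (unsigned gradients, no block normalisation or bilinear vote interpolation, unlike a full HOG implementation):

```python
import numpy as np

def hog_cell(cell, bins=9):
    """Histogram of gradient orientations for one cell, weighted by gradient
    magnitude (unsigned gradients over 0-180 degrees)."""
    gy, gx = np.gradient(cell.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), mag.ravel())   # magnitude-weighted votes
    return hist
```

In a full descriptor, such cell histograms are concatenated over a detection window and block-normalised before being passed to the SVM or neural network classifier.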
APA, Harvard, Vancouver, ISO, and other styles
40

Pintor, António Bastos. "A rigid 3D registration framework of women body RGB-D images." Master's thesis, 2016. https://repositorio-aberto.up.pt/handle/10216/88729.

Full text
Abstract:
The work carried out focuses on improving and automating the framework developed for the PICTURE project of the VCMI research group at INESC-TEC. The main objective is the creation of 3D models of the torso of breast cancer patients from data acquired with low-cost RGB-D sensors, such as Microsoft's Kinect. The contributions of the thesis include the creation of algorithms for the automation of processes such as: selection of the woman's pose, torso segmentation, noise removal, and the registration of multiple point clouds. The work has proceeded approximately according to the plan outlined in the PDI report, and is currently in the final phase of implementation and testing for the validation of the developed algorithms. Meanwhile, the writing of the final Dissertation document has already begun.
APA, Harvard, Vancouver, ISO, and other styles
41

Pintor, António Bastos. "A rigid 3D registration framework of women body RGB-D images." Dissertation, 2016. https://repositorio-aberto.up.pt/handle/10216/88729.

Full text
Abstract:
This work focuses on improving and automating the framework developed for the PICTURE project of the VCMI research group at INESC-TEC. The main goal is the creation of 3D models of the torso of breast cancer patients from data acquired with low-cost RGB-D sensors such as the Microsoft Kinect. The thesis contributions include algorithms for automating processes such as selection of the woman's pose, torso segmentation, noise removal, and registration of multiple point clouds. The work has followed approximately the plan outlined in the PDI report, and it is currently in the final stage of implementation and testing to validate the developed algorithms. Meanwhile, the writing of the final Dissertation document has already begun.
APA, Harvard, Vancouver, ISO, and other styles
42

Lourenço, Francisco Rodrigues. "6DoF Object Pose Estimation from RGB-D Images Using Machine Learning Approaches." Master's thesis, 2021. http://hdl.handle.net/10316/96141.

Full text
Abstract:
Master's dissertation in Electrical and Computer Engineering presented to the Faculdade de Ciências e Tecnologia
Object pose estimation using RGB-D images has gained increasing attention in the past decade with the emergence of consumer-level RGB-D sensors in the market. Their low cost, coupled with relevant technical specifications, has led to their application in areas such as autonomous driving, augmented reality, and robotics. Depth information has, in general, brought additional complexity to most applications that previously used only RGB images. Moreover, when trying to estimate an object pose, one may face challenges such as cluttered scenes, occlusion, symmetric objects, texture-less objects, and low visibility due to insufficient illumination. Accordingly, researchers started to adopt machine learning approaches to tackle the 6DoF object pose estimation problem. Such approaches are often quite complex to implement and computationally demanding. Furthermore, research was directed to RGB-D videos only quite recently, with the first benchmark dataset containing videos published in 2017; therefore, only very recent methods were designed to process videos, and questions regarding real-time applicability arise. That being said, this thesis aims to explore all the tools required to build a 6DoF pose estimator, provide a comprehensive review of each tool, compare and evaluate them, assess how a practitioner can implement such tools, evaluate whether or not it is possible to estimate 6DoF poses in real time, and evaluate how these tools generalize to a real-world scenario. In addition, it proposes the use of directional statistics to evaluate RGB-D sensor precision, a tweak to a well-known 6DoF object pose estimation model, a pipeline that uses a novel 3D point cloud registration algorithm to aid the pose estimator, and a metric that measures the precision/repeatability of both the estimated poses of a model and the ground-truth poses of a dataset.
APA, Harvard, Vancouver, ISO, and other styles
43

Kuo, Syuan-Wei, and 郭宣瑋. "Orientation Modelling for RGB-D Images using Angle and Distance Combined Adjustment." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/z3b6d4.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Civil Engineering
106
RGB-D cameras, which capture an RGB image and a per-pixel depth image simultaneously, are widely applied in indoor mapping and pattern recognition. Acquiring correct orientations is important for indoor mapping, yet feature points are hard to detect in homogeneous areas. Taking advantage of RGB-D data, the observations therefore contain both angle and range information, from which control points are constructed to constrain the orientation modeling. This thesis proposes a novel orientation modeling method for sequential point cloud registration. The main process of this study comprises three parts. First, the intrinsic parameters of the two sensors and the depth distortion are calibrated. Second, four orientation modeling methods are introduced: triangulation, which optimizes only angle information with collinearity equations; trilateration, which optimizes range; a combination of triangulation and trilateration (called combine-1 in this study); and a scale-fixed adjustment (called combine-2 in this study) with rigid constraints on every ray. Finally, the iterative closest point (ICP) algorithm is applied to register the transformed sequential point clouds. The experimental results show that the image distortion and depth distortion of RGB-D sensors need to be considered in data preprocessing. To evaluate the different orientation modeling methods, we first simulated various numbers of control points, variances of depth, and distributions of control points on the images. This research registers sequential point clouds using RGB and depth information. The standard deviations of the camera position under triangulation, combine-1, and combine-2 are 14.108, 0.677, and 0.595 mm, respectively, while the standard deviations of the camera rotation angles are 0.005, 0.007, and 0.001 rad, respectively. These results indicate that the combined adjustments achieve better precision than the triangulation method. The point-to-point distances of point cloud pairs computed by the ICP algorithm are better than 11.3 mm, about 1.5 times the range precision (i.e., 3.5 mm).
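The final ICP registration step mentioned above can be illustrated with a minimal point-to-point variant: brute-force nearest-neighbour pairing followed by a closed-form Kabsch/SVD update. This is a generic sketch, not the thesis's implementation or parameters.

```python
import numpy as np

def best_rigid(src, dst):
    """Closed-form rigid alignment (Kabsch/SVD) of paired 3-D points."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=20):
    """Minimal point-to-point ICP: nearest neighbours + SVD update per iteration."""
    cur = src.copy()
    for _ in range(iters):
        # brute-force nearest neighbours (fine for small clouds)
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        pairs = dst[d2.argmin(1)]
        R, t = best_rigid(cur, pairs)
        cur = cur @ R.T + t
    return cur

# synthetic test: dst is src rotated slightly about z and translated
rng = np.random.default_rng(1)
dst = rng.normal(size=(200, 3))
th = 0.05
R = np.array([[np.cos(th), -np.sin(th), 0],
              [np.sin(th),  np.cos(th), 0],
              [0, 0, 1]])
src = dst @ R.T + np.array([0.02, -0.01, 0.03])
aligned = icp(src, dst)
rmse = np.sqrt(((aligned - dst) ** 2).mean())
```

With a small initial misalignment, the residual RMSE shrinks to near zero after a few iterations.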
APA, Harvard, Vancouver, ISO, and other styles
44

Hsieh, Kai-Nan, and 謝鎧楠. "Rear obstacle detection using a deep convolutional neural network with RGB-D images." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/ghsw4p.

Full text
Abstract:
Master's thesis
National Central University
Department of Computer Science and Information Engineering
106
Car accidents happen frequently now that cars have become the most popular means of daily transportation, and driver negligence costs lives and property. Therefore, many motor manufacturers have invested in and developed driver assistance systems to improve driving safety. Computer vision (CV) has been adopted for its ability to detect and recognize objects, and in recent years convolutional neural networks (CNNs) have developed dramatically, making computer vision much more reliable. We train our rear obstacle detection and recognition system via a deep learning model, using the color and depth images received from a Microsoft Kinect v2. Because the fields of view (FOV) of the Kinect v2's two cameras differ, we calibrate the color and depth images using the Kinect SDK to reduce the disparity in pixel positions. Our detection and recognition system is based on Faster R-CNN. The input consists of two images, and we experiment with two different convolutional network architectures for extracting feature maps from the input: one with a single feature extractor and a single classifier, and the other with two feature extractors and a single classifier. The two-feature-extractor design produces the best detection results. Furthermore, we run experiments using only the color image or only the depth image as input, for comparison with the previous two methods. Finally, after detecting an obstacle, we use the depth image to estimate the distance between the vehicle and the obstacle.
APA, Harvard, Vancouver, ISO, and other styles
45

Cho, Shih-Hsuan, and 卓士軒. "Semantic Segmentation of Indoor-Scene RGB-D Images Based on Iterative Contraction and Merging." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/c9a9vg.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Electronics
105
For semantic segmentation of indoor-scene images, we propose a method that combines convolutional neural networks (CNNs) and the Iterative Contraction & Merging (ICM) algorithm, while also utilizing depth images to efficiently analyze the 3D space of indoor scenes. The raw depth image from the depth camera is processed by two bilateral filters to recover a smoother and more complete depth image. The ICM algorithm, in turn, is an unsupervised segmentation method that preserves boundary information well. We use the dense prediction from the CNN, the depth image, and the normal-vector map as high-level information to guide the ICM process so that it generates image segments more accurately. In other words, we progressively generate regions from high resolution to low resolution, producing a hierarchical segmentation tree. We also propose a decision process that determines the final semantic segmentation from the hierarchical segmentation tree, using the dense prediction map as a reference. The proposed method generates more accurate object boundaries than state-of-the-art methods, and our experiments show that the use of high-level information does improve semantic segmentation performance compared to using RGB information only.
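The depth-recovery step above relies on bilateral filtering, which smooths while preserving edges by weighting neighbours by both spatial and intensity closeness. A brute-force single-pass sketch (not the thesis's two-filter pipeline) is:

```python
import numpy as np

def bilateral(img, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Brute-force bilateral filter sketch: each output pixel is a weighted
    mean of its neighbours, with spatial * range (intensity) Gaussian weights."""
    h, w = img.shape
    out = np.zeros((h, w))
    pad = np.pad(img.astype(float), radius, mode='edge')
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            range_w = np.exp(-(patch - img[i, j])**2 / (2 * sigma_r**2))
            wgt = spatial * range_w
            out[i, j] = (wgt * patch).sum() / wgt.sum()
    return out

# a sharp step edge survives filtering: cross-edge neighbours get near-zero weight
step = np.zeros((10, 10))
step[:, 5:] = 100.0
smoothed = bilateral(step)
```

The range weight is what distinguishes this from a plain Gaussian blur: pixels across a depth discontinuity contribute almost nothing, so object boundaries stay sharp.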
APA, Harvard, Vancouver, ISO, and other styles
46

Yuan-Cheng, Lee, and 李元正. "Accurate and robust face recognition from RGB-D images with a deep learning approach." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/34473208854998399783.

Full text
Abstract:
Master's thesis
National Tsing Hua University
Department of Computer Science
104
Face recognition from RGB-D images utilizes two complementary types of image data, i.e., color and depth images, to achieve more accurate recognition. In this thesis, we propose a face recognition system based on deep learning, which can verify and identify a subject from the color and depth face images captured with a consumer-level RGB-D camera (e.g., Microsoft Kinect). To recognize faces with color and depth information, our system contains three parts: depth image recovery, deep learning for feature extraction, and joint classification. To improve recognition performance on depth face images, we propose a series of image processing techniques that recover and enhance a depth image from its neighboring depth frames, thus reconstructing a precise 3D facial model. With multi-view resampling, we can compute the depth images corresponding to various viewing angles of a single 3D face model. To alleviate the problem of the limited amount of RGB-D data available for deep learning, transfer learning is applied: our deep network, built from recently popular architectural components, is first trained on a color face dataset and then fine-tuned with depth images. The deep networks are used to extract discriminative features (deep representations) from color and depth images. Beyond these deep representations, we also analyze the relation between each image and the other images in the database when designing our classifier, to reach higher recognition accuracy and better robustness. Our experiments show that the proposed face recognition system provides very accurate results on public datasets and is robust against variations in head pose and illumination.
APA, Harvard, Vancouver, ISO, and other styles
47

Gama, Filipe Xavier da Graça. "Efficient processing techniques for depth maps generated by rgb-d sensors." Master's thesis, 2015. http://hdl.handle.net/10400.8/2524.

Full text
Abstract:
In this dissertation, a new method for improving depth maps generated by RGB-D sensors such as the Microsoft Kinect and the Asus Xtion Pro Live is presented. Depth maps generated by RGB-D sensors usually contain several artifacts, such as occlusions or holes, due to depth measurement errors and the surface properties of objects. Together, these problems have a negative impact on applications that depend on the quality of the depth maps. In this work, a new method was developed to ensure that holes are filled and that object contours are well defined and aligned with the texture image. The proposed method combines segmentation information from the texture image with an inpainting method based on the Navier-Stokes equations. In addition, the proposed method integrates a non-linear filter for noise reduction, which uses information from the texture image for this purpose. Experimental tests show that the proposed method matches the performance of methods from the current literature in terms of depth map enhancement quality.
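The dissertation fills depth-map holes with Navier-Stokes-based inpainting guided by texture-image segmentation. The underlying propagation idea, valid values flowing into invalid regions from their boundary, can be sketched with a naive diffusion scheme; this only illustrates the concept, not the actual method.

```python
import numpy as np

def fill_depth_holes(depth, iters=50):
    """Naive hole-filling sketch: invalid (zero) depth pixels are repeatedly
    replaced by the mean of their valid 4-neighbours, so valid depth values
    diffuse inward from the hole boundary."""
    d = depth.astype(float).copy()
    for _ in range(iters):
        hole = d == 0
        if not hole.any():
            break
        padded = np.pad(d, 1)  # zero padding: borders count as invalid
        neigh = np.stack([padded[:-2, 1:-1], padded[2:, 1:-1],
                          padded[1:-1, :-2], padded[1:-1, 2:]])
        valid = neigh > 0
        cnt = valid.sum(0)
        # invalid neighbours are zero, so the sum only includes valid values
        avg = np.where(cnt > 0, neigh.sum(0) / np.maximum(cnt, 1), 0.0)
        fillable = hole & (cnt > 0)
        d[fillable] = avg[fillable]
    return d

# flat synthetic depth map (1000 mm) with a simulated occlusion hole
depth = np.full((32, 32), 1000.0)
depth[10:15, 10:18] = 0.0
filled = fill_depth_holes(depth)
```

Unlike this sketch, Navier-Stokes inpainting also propagates image structure (isophote directions) into the hole rather than just averaging.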
APA, Harvard, Vancouver, ISO, and other styles
48

DENG, JU-CHIEH, and 鄧茹潔. "The Application of Deep Learning in RGB-D Images for the Control of Robot Arm." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/36348540372513931289.

Full text
Abstract:
Master's thesis
Ming Chuan University
Master's Program, Department of Computer and Communication Engineering
105
Robot research is one of the important topics in the development of science and technology. With advances in robotics and artificial intelligence, the work completed by robots is no longer limited to simple, repetitive movements; robots are expected to have independent decision-making abilities, which broadens their applications and improves their practicality, and robot vision has become one of the most critical technologies. At the Google I/O conference, Google promoted the Impact Challenge and Google.org projects worldwide, bringing together technology and new teams to use technology to make the world better, including for people with limb or hearing impairments, Parkinson's disease, and other conditions. The aim of this study is therefore to assist people with upper-limb disabilities in grasping distant objects. Conditions such as sports injuries, joint degeneration in the elderly, and spinal muscular atrophy can impair upper-limb movement; robot arms can help such people and make daily life more convenient. In this study, an RA605 articulated robot arm is combined with visual images and deep learning, and the system is integrated to achieve precise positioning of the robot arm, target recognition, motion control, and grasping of the target object. The vision system uses a Kinect v2 camera and a Logitech C525 camera. The environment image is captured by the Kinect v2, and a deep learning algorithm is used to recognize the target object and obtain its coordinate position. The Logitech C525 camera, mounted on the sixth joint of the robot, rotates with that joint; it confirms the position computed from the Kinect v2 data, captures an image, calculates the grasping position of the target object, and controls the electric gripper so as to successfully grasp the target object, achieving the goal of assisting people with upper-limb disabilities in grasping distant objects.
APA, Harvard, Vancouver, ISO, and other styles
49

HUNG, CHENG-HSIAO, and 洪承孝. "Study of Real-time Workpieces Recognition in Powder Coating Production Line Based on RGB-D Images." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/gsue24.

Full text
Abstract:
Master's thesis
Chaoyang University of Technology
Department of Computer Science and Information Engineering
107
Powder coating is often applied to metal or aluminum for protective as well as decorative purposes, for example in office furniture and bicycle frames. However, a powder coating production line may be over 400 meters long, and the elapsed time to finish powder coating a workpiece is about one hour. The powder coating process involves several steps, including cleaning, pre-treatment, rinse/dry, powder coating, and curing. Tracking and tracing workpieces on the production line and collecting real-time production data is therefore an important issue for a manufacturing execution system (MES), and it is also key information for intelligent manufacturing. To achieve this goal, the cooperating company, MaChan International Co., Ltd., attempted to develop an RFID-based system two years ago. However, several problems caused the system to fail, including excessive cost, one hook carrying multiple workpieces, multiple hooks carrying one workpiece, lost workpieces, and duplicate processing. This study proposes to achieve the above goal using pattern recognition techniques. Several monitoring stations are installed along the production line. All the workpieces are coated in groups, and the workpieces in the same group are almost identical. At every station, the workpieces are detected, grouped, and counted. In addition, a synchronized hardware counter is used at every monitoring station; the counter value can be used to identify the same group, lost workpieces, or duplicately processed workpieces. In the experimental study, the accuracy of group identification reaches 90% in both daytime and nighttime, and the accuracy of line-stop detection reaches 90% in daytime and 100% in nighttime. These results show that the proposed group identification method is feasible.
APA, Harvard, Vancouver, ISO, and other styles
50

Marques, Márcio Filipe Santos. "Sistemas de monitorização e proteção baseados em visão 3D : desenvolvimento de uma aplicação de segurança e proteção industrial utilizandos Sensores RGB-D." Master's thesis, 2017. http://hdl.handle.net/10400.26/23023.

Full text
Abstract:
The main objective of this work was the study of the monitoring of areas or volumes in a wide range of environments. A more specific objective was the development of a safety and protection monitoring system for industrial environments using RGB-D sensors. A survey was carried out of the legislation in force for the implementation and use of video-based monitoring systems in public and private spaces, as well as of the technologies and systems currently available on the market for the same purpose. Regarding the legislation to consider, two different perspectives had to be analyzed. The first is that of video surveillance and monitoring of spaces to guarantee the safety of people and property against improper and criminal behavior, such as vandalism, theft, and violence, which is governed by rules defined in specific legislation for its implementation and use. The other perspective concerns the industrial environment, where monitoring systems are intended to support production and the safety of people and equipment, and where the most relevant aspects are compliance with standards relating to procedures, equipment, and facilities. These two perspectives can be combined when images and video are captured and recorded in a way that allows places and people to be identified. In technological terms, the techniques and technologies associated with 3D image and video capture were analyzed first, such as triangulation, stereo vision, and the Time-of-Flight technique. Considering the Time-of-Flight technique, the most advanced and the one with the best results in terms of accuracy and precision, a search was made for hardware that would satisfy the specific needs of the project. Microsoft's Kinect V2 sensor satisfies these needs and was therefore the natural choice to move the project forward.
Regarding the software, Matlab was used as the development environment; together with the Microsoft SDK and specific C code for integrating the Kinect V2 into Matlab, it served as the basis for the development and achievement of the project's objectives. The application was developed in two phases: a calibration phase and a monitoring and detection phase. In the calibration phase, the volume to be monitored was defined and its occupancy was quantified without any intrusive elements present. In the monitoring and detection phase, 3D information was captured, making it possible to identify external elements that represent an intrusion into, or escape from, the constantly monitored volume. This monitoring process was achieved by implementing an algorithm that quantifies the occupied volume, based on 3D Delaunay triangulation. To test the developed system in an industrial environment, it was placed in the operating space of a robotic arm running in continuous mode, in order to verify its correct operation and evaluate its performance. Finally, as a future development objective, the integrated association of several Kinect V2 systems was proposed, in order to completely map the entire 3D volume to be monitored, enabling full flexibility of monitoring and detection.
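The occupied-volume quantification described above is based on 3D Delaunay triangulation: the cloud is tetrahedralized and the volumes of the tetrahedra are summed. A minimal sketch using SciPy's Qhull wrapper (the thesis used Matlab, so this is only an illustration of the idea) is:

```python
import numpy as np
from scipy.spatial import Delaunay

def occupied_volume(points):
    """Volume of a point cloud's 3-D Delaunay tetrahedralisation:
    sum of |det| / 6 over all tetrahedra (equals the convex-hull volume)."""
    tri = Delaunay(points)
    tets = points[tri.simplices]            # (n_tets, 4, 3)
    a, b, c, d = (tets[:, k] for k in range(4))
    # scalar triple product gives 6x the signed volume of each tetrahedron
    vols = np.abs(np.einsum('ij,ij->i', a - d, np.cross(b - d, c - d))) / 6.0
    return vols.sum()

# sanity check: a unit cube's corners plus interior points should give volume 1
rng = np.random.default_rng(0)
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], float)
pts = np.vstack([cube, rng.uniform(0.1, 0.9, size=(20, 3))])
vol = occupied_volume(pts)
```

Comparing this quantity between the calibration phase and the live phase is one way to flag an intrusion into the monitored volume.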
APA, Harvard, Vancouver, ISO, and other styles