
Dissertations / Theses on the topic 'RGB-Depth Image'


Consult the top 16 dissertations / theses for your research on the topic 'RGB-Depth Image.'


1

Deng, Zhuo. "RGB-DEPTH IMAGE SEGMENTATION AND OBJECT RECOGNITION FOR INDOOR SCENES." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/427631.

Abstract:
Computer and Information Science
Ph.D.
With the advent of Microsoft Kinect, the landscape of various vision-related tasks has changed. First, using an active infrared structured-light sensor, the Kinect directly provides depth information that is hard to infer from traditional RGB images. Second, RGB and depth information are generated synchronously and can be easily aligned, which makes their direct integration possible. In this thesis, I propose several algorithms and systems that integrate depth information with traditional visual appearance to address different computer vision applications. These applications cover both low-level (image segmentation, class-agnostic object proposals) and high-level (object detection, semantic segmentation) computer vision tasks. To first understand whether and how depth information helps improve computer vision performance, I start with image segmentation, a fundamental problem that has been studied extensively in natural color images. We propose an unsupervised segmentation algorithm that is carefully crafted to balance the contribution of color and depth features in RGB-D images. The segmentation problem is then formulated as solving the Maximum Weight Independent Set (MWIS) problem. Given superpixels obtained from different layers of a hierarchical segmentation, the saliency of each superpixel is estimated from a balanced combination of features originating from depth, gray-level intensity, and texture information. We evaluate the segmentation quality with five standard measures on the commonly used NYU-v2 RGB-Depth dataset. A surprising finding from the experiments is that unsupervised segmentation of RGB-D images yields results comparable to supervised segmentation.
In image segmentation, an image is partitioned into several groups of pixels (or superpixels). We take one step further and investigate the problem of assigning class labels to every pixel, i.e., semantic scene segmentation. We propose a novel image region labeling method that augments the CRF formulation with hard mutual exclusion (mutex) constraints. In this way, our approach can make use of the rich and accurate 3D geometric structure coming from the Kinect in a principled manner. The final labeling result must satisfy all mutex constraints, which allows us to eliminate configurations that violate common-sense physical laws, such as placing a floor above a night stand. Three classes of mutex constraints are proposed: a global object co-occurrence constraint, a relative height relationship constraint, and a local support relationship constraint. Segments obtained from image segmentation can be either too fine or too coarse. A full object region not only conveys global features but also arguably enriches contextual features, since confusing background is separated out. We propose a novel unsupervised framework for automatically generating bottom-up, class-independent object candidates for detection and recognition in cluttered indoor environments. Using the raw depth map, we propose a novel plane segmentation algorithm that divides an indoor scene into predominant planar regions and non-planar regions. Based on this partition, we are able to effectively predict object locations and their spatial extents. Our approach automatically generates object proposals from five different sources: Non-planar Regions (NPR), Planar Regions (PR), Detected Planes (DP), Merged Detected Planes (MDP), and Hierarchical Clustering (HC) of 3D point clouds. Object region proposals include both bounding boxes and instance segments.
Although 2D computer vision tasks can roughly identify where objects are placed on the image plane, their true locations and poses in the physical 3D world are difficult to determine due to factors such as occlusion and the uncertainty arising from perspective projection. However, it is very natural for human beings to understand from still images how far objects are from the viewer, what their poses are, and what their full extents are. These kinds of features are extremely desirable for many applications, such as robot navigation, grasp estimation, and augmented reality (AR). To fill this gap, we address the problem of amodal perception for 3D object detection. The task is not only to localize objects in the 3D world, but also to estimate their physical sizes and poses, even if only parts of them are visible in the RGB-D image. Recent approaches have attempted to harness point clouds from the depth channel to exploit 3D features directly in 3D space, and have demonstrated superiority over traditional 2D representation approaches. We revisit the amodal 3D detection problem by sticking to the 2D representation framework and directly relating 2D visual appearance to 3D objects. We propose a novel 3D object detection system that simultaneously predicts objects' 3D locations, physical sizes, and orientations in indoor scenes.
Temple University--Theses
2

Hasnat, Md Abul. "Unsupervised 3D image clustering and extension to joint color and depth segmentation." Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4013/document.

Abstract:
Access to 3D images at a reasonable frame rate is now widespread, thanks to recent advances in low-cost depth sensors and to efficient methods for computing 3D information from 2D images. As a consequence, there is strong demand in the computer vision community for enhancing existing applications by incorporating 3D information. Indeed, numerous studies have demonstrated that the accuracy of different tasks increases when 3D information is included as an additional feature. However, several important issues remain for indoor scene analysis and segmentation, such as: (a) how can the 3D information itself best be exploited? and (b) what is the best way to fuse color and 3D information in an unsupervised manner? In this thesis, we address these issues and propose novel unsupervised methods for 3D image clustering and for joint color and depth image segmentation. To this end, we treat surface normals as the prominent feature of a 3D image and cluster them with methods based on finite statistical mixture models. We use the Bregman Soft Clustering method to ensure computationally efficient clustering. Moreover, we exploit several probability distributions from directional statistics, such as the von Mises-Fisher distribution and the Watson distribution. By combining these, we propose novel model-based clustering methods. We validate these methods empirically on synthetic data and then demonstrate their application to 3D/depth image analysis. We then extend these methods to segment synchronized 3D and color images, also called RGB-D images. To this end, we first propose a statistical image generation model for RGB-D images. We then propose a novel RGB-D segmentation method that uses joint color-spatial-axial clustering and a graph-based statistical planar region merging method. Results show that the proposed method is at least comparable with state-of-the-art methods while requiring less computation time.
Moreover, it opens interesting perspectives for fusing color and geometry in an unsupervised manner. We believe that the methods proposed in this thesis are equally applicable and extendable to clustering other types of data, such as speech and gene expression data. They could also be used for complex tasks such as joint image-speech data analysis.
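For reference, the two directional distributions named in this abstract have the following standard forms for unit vectors in three dimensions. The notation is an assumption made here and is not taken from the thesis: x is a unit surface normal, μ is a unit mean direction (or axis), and κ is a concentration parameter.

```latex
% von Mises-Fisher density on the unit sphere in R^3 (mean direction \mu, concentration \kappa > 0)
f_{\mathrm{vMF}}(x;\mu,\kappa) = \frac{\kappa}{4\pi\sinh\kappa}\,\exp\!\left(\kappa\,\mu^{\top}x\right)

% Watson density, stated up to its normalizing constant
f_{\mathrm{W}}(x;\mu,\kappa) \propto \exp\!\left(\kappa\,(\mu^{\top}x)^{2}\right)
```

Because the Watson density depends on the squared inner product, it assigns the same value to x and -x, which is what makes it suitable for axial data such as normals whose sign is ambiguous.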
3

Baban, a. erep Thierry Roland. "Contribution au développement d'un système intelligent de quantification des nutriments dans les repas d'Afrique subsaharienne." Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSEP100.

Abstract:
Malnutrition, including under- and overnutrition, is a global health challenge affecting billions of people. It impacts all organ systems and is a significant risk factor for noncommunicable diseases such as cardiovascular diseases, diabetes, and some cancers. Assessing food intake is crucial for preventing malnutrition but remains challenging. Traditional methods for dietary assessment are labor-intensive and prone to bias. Advances in AI have made Vision-Based Dietary Assessment (VBDA) a promising solution for automatically analyzing food images to estimate portions and nutritional composition. However, food image segmentation in VBDA faces challenges due to the non-rigid structure of food, high intra-class variation (where the same dish can look very different), inter-class resemblance (where different foods appear similar), and the scarcity of publicly available datasets.
Almost all food segmentation research has focused on Asian and Western foods, with no datasets for African cuisines. However, African dishes often involve mixed food classes, making accurate segmentation challenging. Additionally, research has largely focused on RGB images, which provide color and texture but may lack geometric detail. To address this, RGB-D segmentation combines depth data with RGB images. Depth images provide crucial geometric details that enrich RGB data, improve object discrimination, and are robust to factors like illumination and fog. Despite its success in other fields, RGB-D segmentation for food is underexplored due to difficulties in collecting food depth images.
This thesis makes key contributions by developing new deep learning models for RGB (mid-DeepLabv3+) and RGB-D (ESeNet-D) image segmentation and by introducing the first food segmentation datasets focused on African food images. Mid-DeepLabv3+ is based on DeepLabv3+, featuring a simplified ResNet backbone with an added skip layer (middle layer) in the decoder and SimAM attention mechanism layers. This model offers an optimal balance between performance and efficiency, matching DeepLabv3+'s performance while cutting computational load by half. ESeNet-D consists of two encoder branches using EfficientNetV2 as backbone, with a fusion block for multi-scale integration and a decoder employing self-calibrated convolution and learned interpolation for precise segmentation. ESeNet-D outperforms many RGB and RGB-D benchmark models while having fewer parameters and FLOPs. Our experiments show that, when properly integrated, depth information can significantly improve food segmentation accuracy.
We also present two new datasets: AfricaFoodSeg for "food/non-food" segmentation with 3,067 images (2,525 for training, 542 for validation), and CamerFood, focusing on Cameroonian cuisine. The CamerFood datasets include CamerFood10 with 1,422 images from ten food classes, and CamerFood15, an enhanced version with 15 food classes, 1,684 training images, and 514 validation images. Finally, we address the challenge of scarce depth data in RGB-D food segmentation by demonstrating that Monocular Depth Estimation (MDE) models can help generate effective depth maps for RGB-D datasets.
4

Řehánek, Martin. "Detekce objektů pomocí Kinectu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236602.

Abstract:
The release of the Kinect device opened new possibilities by making image depth easy to use in image processing. The aim of this thesis is to propose a method for object detection and recognition in a depth map. The well-known Bag of Words method and a descriptor based on the Spin Image method are used for object recognition. The Spin Image method is one of several existing approaches to depth maps that are described in this thesis. Objects are detected in the image with a sliding-window technique, which is improved and sped up by using the depth information.
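The abstract does not say how depth speeds up the sliding window, so the following is only a sketch of one common idea, not the thesis's method: under a pinhole camera model, an object of known physical size at depth z appears with a side of roughly focal_px * size / z pixels, so each position needs a single window scale instead of a full scale pyramid. All names and parameters here (`window_side_px`, `depth_scaled_windows`, `object_size_m`, `focal_px`) are hypothetical.

```python
def window_side_px(depth_m, object_size_m, focal_px):
    """Pinhole projection: apparent side length in pixels at a given depth."""
    return max(1, round(focal_px * object_size_m / depth_m))


def depth_scaled_windows(depth, object_size_m, focal_px, stride=1):
    """Yield (row, col, side) windows, one scale per anchor pixel.

    depth is a list of rows of depths in metres; 0 marks a missing reading.
    This is an illustrative assumption, not the method from the thesis.
    """
    rows, cols = len(depth), len(depth[0])
    for r in range(0, rows, stride):
        for c in range(0, cols, stride):
            z = depth[r][c]
            if z <= 0:
                continue  # no depth reading: skip instead of guessing a scale
            side = window_side_px(z, object_size_m, focal_px)
            if r + side <= rows and c + side <= cols:
                yield r, c, side  # window fits inside the image
```

Each yielded window would then be passed to the recognizer; the saving comes from evaluating one scale per position rather than many.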
5

SANTOS, LEANDRO TAVARES ARAGAO DOS. "GENERATING SUPERRESOLVED DEPTH MAPS USING LOW COST SENSORS AND RGB IMAGES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28673@1.

Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
Three-dimensional reconstruction of real scenes has a wide range of applications. The emergence of low-cost depth sensors such as the Kinect suggests the development of reconstruction systems that are cheaper than existing ones. Nevertheless, the data provided by this device are still considerably poorer than those provided by more sophisticated systems. In the academic and commercial worlds, some initiatives, such as those of Tong et al. [1] and Cui et al. [2], set out to solve this problem. Building on a study of these approaches, this work proposes a modification of the super-resolution algorithm described by Mitzel et al. [3] so that its calculations also take into account the color images provided by the device, following the approach of Cui et al. [2]. This change improved the super-resolved depth maps, mitigating interference caused by sudden movements in the captured scene. The tests confirm the improvement of the generated maps and analyze the impact of CPU and GPU implementations of the algorithms in this super-resolution step. The work is restricted to this step; the subsequent stages of 3D reconstruction were not implemented.
6

Thörnberg, Jesper. "Combining RGB and Depth Images for Robust Object Detection using Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-174137.

Abstract:
We investigated the advantage of combining RGB images with depth data to obtain more robust object classification and detection using pre-trained deep convolutional neural networks. We relied on the raw images from publicly available datasets captured with Microsoft Kinect cameras. The raw images varied in size and therefore required resizing to fit our network. We designed a resizing method called "bleeding edge" to avoid distorting the objects in the images. We present a novel method of interpolating missing depth pixel values by comparing them to similar RGB values. This method proved superior to the other methods tested. We showed that a simple colormap transformation of the depth image can provide close to state-of-the-art performance. Using our methods, we present state-of-the-art performance on the Washington Object dataset and provide some results on the Washington Scenes (V1) dataset. Specifically, for detection, we used contours at different thresholds to find the likely object locations in the images. For the classification task, we report state-of-the-art results using RGB-only and RGB-D images, while depth data alone gave close to state-of-the-art results. For the detection task, we found the RGB-only detector to be superior to the other detectors.
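As an illustration of the colormap transformation mentioned above, the sketch below turns a single-channel depth image into three color channels so that a network pre-trained on RGB images can consume it. The abstract does not name the colormap; the piecewise-linear jet-like map, the function names, and the convention that 0 marks a missing reading are all assumptions made for this example.

```python
def _clamp01(x):
    """Clamp a float to the range [0, 1]."""
    return 0.0 if x < 0.0 else 1.0 if x > 1.0 else x


def depth_to_rgb(d, d_min, d_max):
    """Map one depth value to an (R, G, B) tuple of 0-255 integers.

    Missing readings (d <= 0) become black. Valid depths are normalised to
    t in [0, 1] and passed through a jet-like colormap (an assumed choice),
    so near pixels come out blue and far pixels red.
    """
    if d <= 0:
        return (0, 0, 0)
    t = _clamp01((d - d_min) / (d_max - d_min))
    r = _clamp01(1.5 - abs(4.0 * t - 3.0))
    g = _clamp01(1.5 - abs(4.0 * t - 2.0))
    b = _clamp01(1.5 - abs(4.0 * t - 1.0))
    return tuple(int(ch * 255 + 0.5) for ch in (r, g, b))


def colorize_depth(depth, d_min, d_max):
    """Apply depth_to_rgb to every pixel of a depth image (list of rows)."""
    return [[depth_to_rgb(d, d_min, d_max) for d in row] for row in depth]
```

For example, `colorize_depth([[0.0, 0.5], [2.5, 4.5]], 0.5, 4.5)` encodes the missing pixel as black and spreads the valid range from blue through green to red.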
7

Möckelind, Christoffer. "Improving deep monocular depth predictions using dense narrow field of view depth images." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235660.

Abstract:
In this work we study a depth prediction problem in which we provide a narrow field-of-view depth image and a wide field-of-view RGB image to a deep network tasked with predicting the depth for the entire RGB image. We show that providing a narrow field-of-view depth image improves results for the area outside the provided depth, compared with an earlier approach that uses only a single RGB image for depth prediction. We also show that larger depth maps provide a greater advantage than smaller ones and that the accuracy of the model decreases with the distance from the provided depth. Further, we investigate several architectures and study the effect of adding noise to the provided depth image and lowering its resolution. Our results show that models given low-resolution, noisy depth perform on par with models given unaltered depth.
8

Hammond, Patrick Douglas. "Deep Synthetic Noise Generation for RGB-D Data Augmentation." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7516.

Abstract:
Considerable effort has been devoted to finding reliable methods of correcting noisy RGB-D images captured with unreliable depth-sensing technologies. Supervised neural networks have been shown to be capable of RGB-D image correction, but require copious amounts of carefully-corrected ground-truth data to train effectively. Data collection is laborious and time-intensive, especially for large datasets, and generation of ground-truth training data tends to be subject to human error. It might be possible to train an effective method on a relatively smaller dataset using synthetically damaged depth-data as input to the network, but this requires some understanding of the latent noise distribution of the respective camera. It is possible to augment datasets to a certain degree using naive noise generation, such as random dropout or Gaussian noise, but these tend to generalize poorly to real data. A superior method would imitate real camera noise to damage input depth images realistically so that the network is able to learn to correct the appropriate depth-noise distribution.
We propose a novel noise-generating CNN capable of producing realistic noise customized to a variety of different depth-noise distributions. In order to demonstrate the effects of synthetic augmentation, we also contribute a large novel RGB-D dataset captured with the Intel RealSense D415 and D435 depth cameras. This dataset pairs many examples of noisy depth images with automatically completed RGB-D images, which we use as proxy for ground-truth data. We further provide an automated depth-denoising pipeline which may be used to produce proxy ground-truth data for novel datasets. We train a modified sparse-to-dense depth-completion network on splits of varying size from our dataset to determine reasonable baselines for improvement.
We determine through these tests that adding more noisy depth frames to each RGB-D image in the training set has a nearly identical impact on depth-completion training as gathering more ground-truth data. We leverage these findings to produce additional synthetic noisy depth images for each RGB-D image in our baseline training sets using our noise-generating CNN. Through use of our augmentation method, it is possible to achieve greater than 50% error reduction on supervised depth-completion training, even for small datasets.
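The "naive noise generation" baseline that the abstract contrasts with its learned approach can be sketched as follows. This shows only the simple random-dropout and Gaussian-noise augmentation, not the noise-generating CNN; the function name, the parameters, and the convention that 0 marks a missing reading are assumptions for illustration.

```python
import random


def add_naive_noise(depth, dropout_p=0.1, sigma=0.01, seed=None):
    """Return a noisy copy of a depth image (list of rows, metres).

    Each pixel is zeroed with probability dropout_p (a simulated missing
    reading); surviving pixels get zero-mean Gaussian noise with standard
    deviation sigma, clipped so depth never goes negative.
    """
    rng = random.Random(seed)
    noisy = []
    for row in depth:
        out_row = []
        for d in row:
            if rng.random() < dropout_p:
                out_row.append(0.0)  # simulated dropout
            else:
                out_row.append(max(0.0, d + rng.gauss(0.0, sigma)))
        noisy.append(out_row)
    return noisy
```

Calling it several times with different seeds on the same clean frame yields several noisy inputs per ground-truth image, which is the kind of augmentation the abstract describes, although with independent per-pixel noise rather than a realistic camera noise distribution.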
9

Tu, Chieh-Min, and 杜介民. "Depth Image Inpainting with RGB-D Camera." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/k4m42a.

Abstract:
Master's
I-Shou University (義守大學)
Department of Information Engineering (資訊工程學系)
Academic year 103 (ROC calendar)
Since Microsoft released the inexpensive Kinect sensor as a new natural user interface, stereo imaging has moved from multi-view color image synthesis to synthesis from a color image and a depth image. However, the captured depth images may be missing some depth values, so the stereoscopic effect is often poor. This thesis develops an object-based depth inpainting method for the Kinect RGB-D camera. First, background differencing, frame differencing, and depth thresholding are used as the basis for segmenting foreground objects from a dynamic background image. Then the hole inpainting task is divided into a background area and a foreground area: the background area is inpainted from the background depth image, and the foreground area is inpainted with a best-fit neighborhood depth value. Experimental results show that this inpainting method helps fill holes and improves contour edges and image quality.
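The two-branch hole filling described above can be sketched as follows. The abstract does not define "best-fit neighborhood depth value", so this example swaps in the median of valid foreground neighbours as a stand-in; the function name, the 0-means-hole convention, and the precomputed foreground mask are likewise assumptions.

```python
from statistics import median


def inpaint_depth(depth, background, is_foreground, radius=1):
    """Fill zero-valued holes in a depth frame (list of rows).

    Background holes copy the stored background depth. Foreground holes take
    the median of valid foreground neighbours within `radius` pixels (an
    assumed stand-in for the thesis's best-fit value) and stay 0 if none exist.
    """
    rows, cols = len(depth), len(depth[0])
    filled = [row[:] for row in depth]  # leave the input untouched
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] != 0:
                continue  # valid reading, nothing to fill
            if not is_foreground[r][c]:
                filled[r][c] = background[r][c]
                continue
            neighbours = [
                depth[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                if depth[rr][cc] != 0 and is_foreground[rr][cc]
            ]
            if neighbours:
                filled[r][c] = median(neighbours)
    return filled
```

Restricting the neighbourhood to foreground pixels keeps background depth from bleeding across object contours, which is the motivation the abstract gives for treating the two areas separately.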
10

Lin, Shih-Pi, and 林士筆. "In-air Handwriting Chinese Character Recognition Base on RGB Image without Depth Information." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/2mhfzk.

Abstract:
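Master's
National Central University (國立中央大學)
Department of Computer Science and Information Engineering (資訊工程學系)
Academic year 107 (ROC calendar)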
As technology changes rapidly, human-computer interaction (HCI) is no longer limited to the keyboard. Existing handwriting products provide features that are sufficient, in density and stability, for recognizing handwriting trajectories. For Chinese characters, it is relatively difficult for machines to obtain a stable trajectory compared with English letters and numerals. In the past, in-air hand detection and tracking often relied on devices that provide depth information. For example, Kinect uses two infrared cameras to obtain depth information, which raises the price of the device. Therefore, using RGB information from a single camera for object detection and tracking has become a trend in recent years. Using an RGB camera as the HCI medium for in-air handwriting requires accurate hand detection and stable tracking. In addition, an in-air trajectory is written in one continuous stroke, so it contains both real strokes and virtual strokes, which makes recognition more difficult. The hand database used to build the model contains self-recorded handwriting videos and relevant hand datasets collected from the Internet. Adding Multiple Receptive Field (MRF) processing to the data, which scales the ground truth and treats each scaled version as a new object, increases the robustness of detection. This work uses YOLO v3 as the core neural network model and adds a Convolutional Recurrent Neural Network (CRNN) to convert YOLO into a time-sequential network that stabilizes tracking. Analysis of the experimental results shows that hand detection is more robust after the data are processed with MRF, and that the converted YOLO improves the stability of hand tracking. Overall, using several Chinese character recognition methods, the accuracy of recognizing in-air handwritten Chinese character trajectories is about 96.33%.
11

Lu, Kaiyue. "Learning to Enhance RGB and Depth Images with Guidance." Phd thesis, 2022. http://hdl.handle.net/1885/258498.

Abstract:
Image enhancement improves the visual quality of the input image to better identify key features and make it more suitable for other vision applications. Structure degradation remains a challenging problem in image enhancement, which refers to blurry edges or discontinuous structures due to unbalanced or inconsistent intensity transitions on structural regions. To overcome this issue, it is popular to make use of a guidance image to provide additional structural cues. In this thesis, we focus on two image enhancement tasks, i.e., RGB image smoothing and depth image completion. Through the two research problems, we aim to have a better understanding of what constitutes suitable guidance and how its proper use can benefit the reduction of structure degradation in image enhancement. Image smoothing retains salient structures and removes insignificant textures in an image. Structure degradation results from the difficulty in distinguishing structures and textures with low-level cues. Structures may be inevitably blurred if the filter tries to remove some strong textures that have high contrast. Moreover, these strong textures may also be mistakenly retained as structures. We address this issue by applying two forms of guidance for structures and textures respectively. We first design a kernel-based double-guided filter (DGF), where we adopt semantic edge detection as structure guidance, and texture decomposition as texture guidance. The DGF is the first kernel filter that simultaneously leverages structure guidance and texture guidance to be both "structure-aware" and "texture-aware". Considering that textures present high randomness and variations in spatial distribution and intensities, it is not robust to localize and identify textures with hand-crafted features. Hence, we take advantage of deep learning for richer feature extraction and better generalization. Specifically, we generate synthetic data by blending natural textures with clean structure-only images. 
With the data, we build a texture prediction network (TPN) that estimates the location and magnitude of textures. We then combine the texture prediction results from TPN with a semantic structure prediction network so that the final texture and structure aware filtering network (TSAFN) is able to distinguish structures and textures more effectively. Our model achieves better smoothing results than existing filters. Depth completion recovers dense depth from sparse measurements, e.g., LiDAR. Existing depth-only methods use sparse depth as the only input and suffer from structure degradation, i.e., failing to recover semantically consistent boundaries or small/thin objects due to (1) the sparse nature of depth points and (2) the lack of images to provide structural cues. In the thesis, we deal with the structure degradation issue by using RGB image guidance in both supervised and unsupervised depth-only settings. For the supervised model, the unique design is that it simultaneously outputs a reconstructed image and a dense depth map. Specifically, we treat image reconstruction from sparse depth as an auxiliary task during training that is supervised by the image. For the unsupervised model, we regard dense depth as a reconstructed result of the sparse input, and formulate our model as an auto-encoder. To reduce structure degradation, we employ the image to guide latent features by penalizing their difference in the training process. The image guidance loss in both models enables them to acquire more dense and structural cues that are beneficial for producing more accurate and consistent depth values. For inference, the two models only take sparse depth as input and no image is required. On the KITTI Depth Completion Benchmark, we validate the effectiveness of the proposed image guidance through extensive experiments and achieve performance competitive with state-of-the-art supervised and unsupervised methods. Our approach is also applicable to indoor scenes.
12

Huang, BO-XI, and 黃博熙. "Improve Alignment of RGB and Depth Images for KINECT V2." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/xmzj9h.

Abstract:
Master's
Feng Chia University
Department of Information Engineering
106
Due to the vigorous development of computer vision, 3D reconstruction has become more and more widely used in recent years, for example in industrial design, archaeological research, and the entertainment industry. Therefore, there is an urgent need for more accurate reconstruction information. Instruments for 3D reconstruction used to be quite expensive. In recent years, however, Microsoft introduced the KINECT V2, which is not only cheaper but also provides both depth and color information. Because the KINECT V2 color and depth cameras are not in the same horizontal position, the images captured by the two cameras cannot be directly aligned, so the depth image coordinates need to be converted to align with the color image. The main purpose of this thesis is to align the KINECT V2 color and depth images by a coordinate transformation method and to compare the result with an alignment method that uses an affine transformation. Furthermore, since the images captured by the KINECT V2 color camera and depth camera may be distorted before alignment, the cameras need to be calibrated to achieve higher accuracy after alignment. This thesis is divided into three parts. The first part is camera calibration, which is divided into internal correction, external correction, and distortion correction; it mainly builds on and improves the methods of D. C. Herrera [1] and Zhengyou Zhang [2]. The second part uses the coordinate conversion method to align the color and depth images. The third part compares our proposed coordinate transformation method with the affine transformation method based on the root mean square error of the aligned images.
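The coordinate conversion described in this abstract can be illustrated with a minimal sketch. It assumes a plain pinhole model without lens distortion, which is the usual textbook formulation and not necessarily the exact procedure of the thesis. The function names, the intrinsics tuples (fx, fy, cx, cy), and the point-pair form of the RMSE are all hypothetical choices made for illustration.

```python
import math


def depth_pixel_to_color_pixel(u, v, z, depth_k, color_k, rotation, translation):
    """Map a depth pixel (u, v) with depth z to color-image coordinates.

    depth_k and color_k are hypothetical (fx, fy, cx, cy) intrinsics tuples.
    rotation is a 3x3 nested list and translation a 3-tuple describing the
    pose of the depth camera relative to the color camera (same unit as z).
    Lens distortion is ignored in this sketch.
    """
    fx_d, fy_d, cx_d, cy_d = depth_k
    fx_c, fy_c, cx_c, cy_c = color_k
    # Back-project the pixel to a 3D point in the depth camera frame.
    point_d = ((u - cx_d) * z / fx_d, (v - cy_d) * z / fy_d, z)
    # Rigid transform into the color camera frame.
    point_c = [
        sum(rotation[r][i] * point_d[i] for i in range(3)) + translation[r]
        for r in range(3)
    ]
    if point_c[2] <= 0:
        raise ValueError("point lies behind the color camera")
    # Project onto the color image plane.
    return (fx_c * point_c[0] / point_c[2] + cx_c,
            fy_c * point_c[1] / point_c[2] + cy_c)


def rmse(points_a, points_b):
    """Root mean square distance between two lists of matching 2D points."""
    squared = [(ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(points_a, points_b)]
    return math.sqrt(sum(squared) / len(squared))
```

With identical intrinsics, an identity rotation, a 0.05 m horizontal offset, a 500-pixel focal length, and a depth of 1 m, the mapped pixel shifts horizontally by 500 × 0.05 / 1 = 25 pixels, which shows why the offset between the two cameras cannot be ignored.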
13

RUSSO, PAOLO. "Broadening deep learning horizons: models for RGB and depth images adaptation." Doctoral thesis, 2020. http://hdl.handle.net/11573/1365047.

Abstract:
Deep Learning has revolutionized the whole field of Computer Vision. Very deep models with a huge number of parameters have been successfully applied to big image datasets for difficult tasks like object classification, person re-identification, and semantic segmentation. Two-fold results have been obtained: on one hand, astonishing performance, with accuracy often comparable to or better than that of a human counterpart, and on the other, the development of robust, complex and powerful visual features which exhibit the ability to generalize to new visual tasks. Still, the success of Deep Learning methods relies on the availability of big datasets: whenever the available labeled data is limited or redundant, a deep neural network model will typically overfit on training data, showing poor performance on new, unseen data. A typical solution used by the Deep Learning community in those cases is to rely on some Transfer Learning technique; among the several available methods, the most successful one has been to pre-train the deep model on a big heterogeneous dataset (like ImageNet) and then to finetune the model on the available training data. Among several fields of application, this approach has been heavily used by the robotics community for object recognition in depth images. Depth images are usually provided by depth sensors (e.g., Kinect) and their availability is somewhat scarce: the biggest publicly available depth image dataset includes 50,000 samples, making the use of a pre-trained network the only successful method to exploit deep models on depth data. Without any doubt, this method provides suboptimal results, as the network is trained on traditional RGB images whose perceptual information is very different from that of depth maps; better results could be obtained if a big enough depth dataset were available, enabling the training of a deep model from scratch. Another frequent issue is the difference in statistical properties between training and test data (domain gap). 
In this case, even in the presence of enough training data, the generalization ability of the model will be poor, which calls for a Domain Adaptation method able to reduce the domain gap; this can improve both the robustness of the model and its final classification performance. In this thesis both problems have been tackled by developing a series of Deep Learning solutions for Domain Adaptation and Transfer Learning tasks on the RGB and depth image domains: a new synthetic depth image dataset is presented, showing the performance of a deep model trained from scratch on depth-only data. At the same time, a new powerful depth-to-RGB mapping module is analyzed, to optimize the classification accuracy on depth image tasks while using deep models pretrained on ImageNet. The study of the depth domain ends with a recurrent neural network for egocentric action recognition capable of exploiting depth images as an additional source of attention. A novel GAN model and a hybrid pixel/feature adaptation architecture for RGB images have been developed: the former for single-domain adaptation tasks, the latter for multi-domain adaptation and generalization tasks. Finally, a preliminary approach to the problem of multi-source Domain Adaptation on a semantic segmentation task is examined, based on the combination of a multi-branch segmentation model and an adversarial technique, capable of exploiting all the available synthetic training datasets and increasing the overall performance. The performance obtained by using the proposed algorithms is often better than or equivalent to that of the currently available state-of-the-art methods on several datasets and domains, demonstrating the superiority of our approach. Moreover, our analysis shows that the creation of ad-hoc domain adaptation and transfer learning techniques is mandatory in order to obtain the best accuracy in the presence of any domain gap, with little or negligible additional computational cost.
14

hao, Huang wei, and 黃偉豪. "Accurate Alignment of RGB and Depth Images for KINECT by Camera Calibration." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/16807955249441238409.

Abstract:
Master's
Feng Chia University
Department of Information Engineering
103
Three-dimensional reconstruction is widely applied in various fields, such as robot vision, medical imaging, and archeology. Recently, Microsoft released KINECT as a tool for three-dimensional reconstruction. It not only provides fairly accurate depth values but is also much cheaper than traditional depth cameras. The main topic of this paper is the alignment of the KINECT’s color and depth images. Because the color camera and the depth camera of the KINECT are located at different horizontal positions, the captured images cannot be accurately aligned. Aligning the two images is necessary for subsequent image processing and applications. Furthermore, calibrating the two cameras before alignment helps produce a more accurate alignment result. Therefore, the research is divided into two parts. The first part is camera calibration, which includes internal and external calibration. Our method is mainly based on the method proposed by Zheng-you Zhang [7]. The second part is the alignment of the color and depth images. Our method is mainly based on observing the difference between the two images under an affine transformation, and then finding the correspondence between them to transform the depth image. Finally, experiments are performed to validate the proposed approach. In terms of camera calibration, we observe that the calibrated image becomes less distorted. In terms of alignment, the method aligns the color and depth images significantly better.
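As an illustration of affine alignment, the sketch below fits an affine map from matched points in the depth image to the corresponding points in the color image by ordinary least squares. The abstract does not state how the correspondence is estimated, so the least-squares fit and every function name here are assumptions made for illustration only.

```python
def solve_3x3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correspondences are degenerate (e.g. collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine map (x, y) -> (a*x + b*y + c, d*x + e*y + f).

    src and dst are equally long lists of matched 2D points (at least three,
    not all collinear). Returns ([a, b, c], [d, e, f]).
    """
    # Build the normal equations A^T A p = A^T t, where each row of A is [x, y, 1].
    ata = [[0.0] * 3 for _ in range(3)]
    at_u = [0.0] * 3
    at_v = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            at_u[i] += row[i] * u
            at_v[i] += row[i] * v
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve_3x3(ata, at_u), solve_3x3(ata, at_v)


def apply_affine(params, point):
    """Apply the fitted affine map to one (x, y) point."""
    (a, b, c), (d, e, f) = params
    x, y = point
    return a * x + b * y + c, d * x + e * y + f
```

An affine map applies one global scale, shear, and shift to the whole image, so unlike a per-pixel reprojection it cannot account for a parallax that varies with depth.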
15

Kang, Wen-Yao, and 康文耀. "Combined RGB and Depth Images to Detect Crop Rows for Sanshin Green Onion." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/dy9y8h.

Abstract:
Master's
National Ilan University
Department of Biomechatronic Engineering, Master's Program
106
Ilan Sanshin green onion is an important economic crop in Ilan. Because the weather in Ilan is wet and rainy, most crops are grown by ridge and furrow farming. Cultivating and managing Sanshin green onion is very labor-intensive and time-consuming. Therefore, detecting the crop row lines of Sanshin green onion on the ridges by machine vision will be a critical technology for unmanned field vehicles. The purpose of this study is to propose a method that combines the results from the color image and the depth image acquired by a Kinect sensor to automatically detect crop row lines under varying light brightness and growth periods. A Kinect sensor was used to acquire both color and depth images of the crop rows in a Sanshin green onion field. Green onion features were segmented from the background using the difference of the RGB components of the color image. The Hough transform was then used to find crop lines from feature points selected by the proposed grid squares. In the depth image, the green onion features were segmented differently, by inspecting the depth differences along horizontal lines, and the Hough transform was also used. The slopes of the crop lines and the distances between the upper and lower endpoints of those lines were used to determine whether the results from the color image or the depth image were correct. If both were correct, the final result was obtained by applying specific weighting factors. Those factors were obtained by observing the success rates and average errors in the experimental results of both the color and depth images. A crop row line detection method was successfully developed by combining the results of the color and depth images acquired by Kinect. The experimental results showed that the average success rate of finding crop lines from color images was 90.4% on sunny days and 93.5% on cloudy days. With depth images, the success rate was 53.8% on sunny days and 91.3% on cloudy days. 
The proposed combined method detects crop lines with a success rate of 92.3% on sunny days and 100% on cloudy days. Therefore, the combined method can effectively improve the overall success rate compared with using only the color image or only the depth image.
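Two steps of this pipeline lend themselves to a small sketch: segmenting green pixels from RGB differences and fusing the color-based and depth-based line estimates with weights. The abstract does not give the exact color index, threshold, or weights, so the excess-green index (2G - R - B), the threshold, the default weights, and the function names below are all hypothetical stand-ins.

```python
def excess_green_mask(pixels, threshold=20):
    """Mark pixels whose excess-green value 2G - R - B exceeds a threshold.

    pixels is a list of rows, each row a list of (r, g, b) tuples.
    The index and threshold are assumptions, not values from the thesis.
    """
    return [[(2 * g - r - b) > threshold for (r, g, b) in row] for row in pixels]


def fuse_lines(color_line, depth_line, w_color=0.5, w_depth=0.5):
    """Combine two crop-line estimates given as (x_top, x_bottom) in pixels.

    A line is None when its detector's result was rejected by the validity
    check. If only one line is valid, it is used as is; if both are valid,
    the endpoints are averaged with the given (hypothetical) weights.
    """
    if color_line is None and depth_line is None:
        return None
    if depth_line is None:
        return color_line
    if color_line is None:
        return depth_line
    total = w_color + w_depth
    return tuple((w_color * c + w_depth * d) / total
                 for c, d in zip(color_line, depth_line))
```

For example, `fuse_lines((100, 200), (110, 220), 0.75, 0.25)` weights the color result three times as heavily as the depth result, which would suit the sunny-day conditions in which the depth images were less reliable.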
16

Gama, Filipe Xavier da Graça. "Efficient processing techniques for depth maps generated by rgb-d sensors." Master's thesis, 2015. http://hdl.handle.net/10400.8/2524.

Abstract:
In this dissertation, a new method for improving depth maps generated by RGB-D sensors such as the Microsoft Kinect and the Asus Xtion Pro Live is presented. Depth maps generated by RGB-D sensors usually contain several artifacts, such as occlusions or holes, caused by depth measurement errors and by the surface properties of objects. Together, these problems have a negative impact on applications that depend on the quality of the depth maps. In this work, a new method was developed to ensure that holes are filled and that object contours are well defined and aligned with the texture image. The proposed method combines segmentation information from the texture image with an inpainting method based on the Navier-Stokes equations. In addition, the proposed method integrates a non-linear filter for noise reduction, which uses information from the texture image for this purpose. Experimental tests show that the proposed method achieves the same performance as methods in the current literature in terms of the quality of depth map enhancement.