Academic literature on the topic 'RGB-Depth Image'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'RGB-Depth Image.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "RGB-Depth Image"

1

Li, Hengyu, Hang Liu, Ning Cao, Yan Peng, Shaorong Xie, Jun Luo, and Yu Sun. "Real-time RGB-D image stitching using multiple Kinects for improved field of view." International Journal of Advanced Robotic Systems 14, no. 2 (March 1, 2017): 172988141769556. http://dx.doi.org/10.1177/1729881417695560.

Full text
Abstract:
This article concerns the problems of a defective depth map and limited field of view of Kinect-style RGB-D sensors. An anisotropic-diffusion-based hole-filling method is proposed to recover invalid depth data in the depth map. The field of view of the Kinect-style RGB-D sensor is extended by stitching depth and color images from several RGB-D sensors. By aligning the depth map with the color image, the registration data calculated by registering color images can be used to stitch depth and color images into a depth and color panoramic image concurrently in real time. Experiments show that the proposed stitching method can generate an RGB-D panorama with no invalid depth data and little distortion in real time and can be extended to incorporate more RGB-D sensors to construct even a 360° field-of-view panoramic RGB-D image.
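
The hole-filling step described here lends itself to a compact illustration. The sketch below is not the authors' algorithm; it is a minimal diffusion-style fill in which invalid (zero) depth pixels are iteratively replaced by an edge-aware average of their valid neighbours, with the conductance constant `kappa` (in depth units) and the iteration count chosen arbitrarily.

```python
import numpy as np

def diffuse_fill(depth, iterations=200, kappa=50.0):
    """Fill zero-valued (invalid) depth pixels by iteratively averaging valid
    4-neighbours, with a Perona-Malik-style conductance that damps diffusion
    across large depth discontinuities. Measured pixels are never modified."""
    d = depth.astype(np.float64).copy()
    holes = d == 0
    for _ in range(iterations):
        up = np.pad(d, ((1, 0), (0, 0)), mode="edge")[:-1, :]
        down = np.pad(d, ((0, 1), (0, 0)), mode="edge")[1:, :]
        left = np.pad(d, ((0, 0), (1, 0)), mode="edge")[:, :-1]
        right = np.pad(d, ((0, 0), (0, 1)), mode="edge")[:, 1:]
        neighbours = np.stack([up, down, left, right])
        valid = neighbours > 0
        # if the centre already has an estimate, weight neighbours by similarity;
        # otherwise take a plain average of whatever valid neighbours exist
        conductance = np.exp(-((neighbours - d) / kappa) ** 2)
        weights = valid * np.where(d > 0, conductance, 1.0)
        norm = weights.sum(axis=0)
        estimate = (weights * neighbours).sum(axis=0) / np.maximum(norm, 1e-9)
        update = holes & (norm > 0)
        d[update] = estimate[update]
    return d
```
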
APA, Harvard, Vancouver, ISO, and other styles
2

Wu, Yan, Jiqian Li, and Jing Bai. "Multiple Classifiers-Based Feature Fusion for RGB-D Object Recognition." International Journal of Pattern Recognition and Artificial Intelligence 31, no. 05 (February 27, 2017): 1750014. http://dx.doi.org/10.1142/s0218001417500148.

Full text
Abstract:
RGB-D-based object recognition has been enthusiastically investigated in the past few years. RGB and depth images provide useful and complementary information. Fusing RGB and depth features can significantly increase the accuracy of object recognition. However, previous works simply take the depth image as the fourth channel of the RGB image and concatenate the RGB and depth features, ignoring the different power of RGB and depth information for different objects. In this paper, a new method which contains three different classifiers is proposed to fuse features extracted from the RGB image and depth image for RGB-D-based object recognition. Firstly, an RGB classifier and a depth classifier are trained by cross-validation to get the accuracy difference between RGB and depth features for each object. Then a variant RGB-D classifier is trained with different initialization parameters for each class according to the accuracy difference. The variant RGB-D classifier can result in a more robust classification performance. The proposed method is evaluated on two benchmark RGB-D datasets. Compared with previous methods, ours achieves comparable performance with the state-of-the-art method.
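
As a rough illustration of weighting modalities by their measured reliability, the sketch below performs a per-class late fusion of RGB and depth classifier scores, using per-class cross-validated accuracies as weights. It is a simplification of the paper's three-classifier scheme, and the function name, array shapes, and toy numbers are assumptions.

```python
import numpy as np

def fuse_by_accuracy(rgb_scores, depth_scores, rgb_acc, depth_acc):
    """Late fusion of two modality classifiers.

    rgb_scores, depth_scores : (n_samples, n_classes) class scores
    rgb_acc, depth_acc       : (n_classes,) per-class cross-validated accuracies
    Returns the fused class prediction for each sample.
    """
    w_rgb = rgb_acc / np.maximum(rgb_acc + depth_acc, 1e-9)  # per-class weight
    fused = w_rgb * rgb_scores + (1.0 - w_rgb) * depth_scores
    return fused.argmax(axis=1)

# toy usage: depth is more reliable for class 0, RGB for class 1
rgb_s = np.array([[0.6, 0.4], [0.2, 0.8]])
dep_s = np.array([[0.9, 0.1], [0.5, 0.5]])
print(fuse_by_accuracy(rgb_s, dep_s,
                       rgb_acc=np.array([0.6, 0.9]),
                       depth_acc=np.array([0.9, 0.5])))
```
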
APA, Harvard, Vancouver, ISO, and other styles
3

OYAMA, Tadahiro, and Daisuke MATSUZAKI. "Depth Image Generation from monocular RGB image." Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2019 (2019): 2P2-H09. http://dx.doi.org/10.1299/jsmermd.2019.2p2-h09.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Cao, Hao, Xin Zhao, Ang Li, and Meng Yang. "Depth Image Rectification Based on an Effective RGB–Depth Boundary Inconsistency Model." Electronics 13, no. 16 (August 22, 2024): 3330. http://dx.doi.org/10.3390/electronics13163330.

Full text
Abstract:
Depth images have been widely involved in various tasks of 3D systems with the advancement of depth acquisition sensors in recent years. Depth images suffer from serious distortions near object boundaries due to the limitations of depth sensors or estimation methods. In this paper, a simple method is proposed to rectify the erroneous object boundaries of depth images with the guidance of reference RGB images. First, an RGB–Depth boundary inconsistency model is developed to measure whether collocated pixels in depth and RGB images belong to the same object. The model extracts the structures of the RGB and depth images, respectively, by Gaussian functions. The inconsistency of two collocated pixels is then statistically determined inside large-sized local windows. In this way, pixels near object boundaries of depth images are identified as erroneous when they are inconsistent with collocated ones in the RGB images. Second, a depth image rectification method is proposed by embedding the model into a simple weighted mean filter (WMF). Experimental results on two datasets verify that the proposed method improves the RMSE and SSIM of depth images by 2.556 and 0.028, respectively, compared with recent optimization-based and learning-based methods.
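
For intuition, the sketch below implements a plain RGB-guided weighted mean filter: each depth pixel is re-estimated from neighbours whose colours resemble its own, so depth boundaries are pulled towards colour boundaries. This is a simplified stand-in for the paper's inconsistency-weighted filter, written as a slow reference loop with placeholder parameter values.

```python
import numpy as np

def rgb_guided_wmf(depth, rgb, radius=5, sigma_c=12.0):
    """Weighted mean filter on depth, with weights from RGB similarity.
    depth : (H, W) array, 0 marks invalid pixels
    rgb   : (H, W, 3) array
    """
    h, w = depth.shape
    out = depth.astype(np.float64).copy()
    rgbf = rgb.astype(np.float64)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch_d = depth[y0:y1, x0:x1].astype(np.float64)
            colour_diff = rgbf[y0:y1, x0:x1] - rgbf[y, x]
            weights = np.exp(-(colour_diff ** 2).sum(axis=-1) / (2 * sigma_c ** 2))
            weights = weights * (patch_d > 0)  # ignore invalid depth samples
            if weights.sum() > 0:
                out[y, x] = (weights * patch_d).sum() / weights.sum()
    return out
```
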
APA, Harvard, Vancouver, ISO, and other styles
5

Zhang, Longyu, Hao Xia, and Yanyou Qiao. "Texture Synthesis Repair of RealSense D435i Depth Images with Object-Oriented RGB Image Segmentation." Sensors 20, no. 23 (November 24, 2020): 6725. http://dx.doi.org/10.3390/s20236725.

Full text
Abstract:
A depth camera is a kind of sensor that can directly collect distance information between an object and the camera. The RealSense D435i is a low-cost depth camera that is currently in widespread use. When collecting data, an RGB image and a depth image are acquired simultaneously. The quality of the RGB image is good, whereas the depth image typically has many holes. In many applications that use depth images, these holes can lead to serious problems. In this study, a repair method for depth images was proposed. The depth image is repaired using a texture synthesis algorithm with the RGB image, which is segmented through a multi-scale object-oriented method. An object difference parameter is added to the process of selecting the best sample block. In contrast with previous methods, the experimental results show that the proposed method avoids erroneously filling holes, the edges of the filled holes are consistent with the edges of the RGB images, and the repair accuracy is better. The root mean square error, peak signal-to-noise ratio, and structural similarity index measure between the repaired depth images and the ground-truth image were better than those obtained by two other methods. We believe that repairing the depth image can improve the effectiveness of depth image applications.
APA, Harvard, Vancouver, ISO, and other styles
6

Kwak, Jeonghoon, and Yunsick Sung. "Automatic 3D Landmark Extraction System Based on an Encoder–Decoder Using Fusion of Vision and LiDAR." Remote Sensing 12, no. 7 (April 3, 2020): 1142. http://dx.doi.org/10.3390/rs12071142.

Full text
Abstract:
To provide a realistic environment for remote sensing applications, point clouds are used to realize a three-dimensional (3D) digital world for the user. Motion recognition of objects, e.g., humans, is required to provide realistic experiences in the 3D digital world. To recognize a user's motions, 3D landmarks are provided by analyzing a 3D point cloud collected through a light detection and ranging (LiDAR) system or a red green blue (RGB) image collected visually. However, manual supervision is required to extract 3D landmarks, whether they originate from the RGB image or the 3D point cloud. Thus, there is a need for a method for extracting 3D landmarks without manual supervision. Herein, an RGB image and a 3D point cloud are used to extract 3D landmarks. The 3D point cloud is utilized as the relative distance between a LiDAR and a user. Because it cannot contain all the information about the user's entire body due to disparities, it cannot generate a dense depth image that provides the boundary of the user's body. Therefore, up-sampling is performed to increase the density of the depth image generated based on the 3D point cloud; the density depends on the 3D point cloud. This paper proposes a system for extracting 3D landmarks using 3D point clouds and RGB images without manual supervision. A depth image provides the boundary of a user's motion and is generated by using the 3D point cloud and the RGB image collected by a LiDAR and an RGB camera, respectively. To extract 3D landmarks automatically, an encoder–decoder model is trained with the generated depth images, and the RGB images and 3D landmarks are extracted from these images with the trained encoder model. The method of extracting 3D landmarks using RGB depth (RGBD) images was verified experimentally, and 3D landmarks were extracted to evaluate the user's motions with RGBD images. In this manner, landmarks could be extracted according to the user's motions, rather than by extracting them using the RGB images. The depth images generated by the proposed method were 1.832 times denser than the up-sampling-based depth images generated with bilateral filtering.
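
A core ingredient here is turning a LiDAR point cloud into a (sparse) depth image aligned with the RGB camera. The sketch below shows only the standard pinhole projection step, assuming the points are already expressed in the camera frame and that a 3x3 intrinsic matrix `K` is known; the subsequent densification (up-sampling) is the part the paper focuses on and is omitted.

```python
import numpy as np

def project_points_to_depth(points_xyz, K, image_shape):
    """Project 3D points (camera frame, metres) into a sparse depth image.
    points_xyz  : (N, 3) array of points
    K           : (3, 3) pinhole intrinsic matrix
    image_shape : (height, width) of the target depth image
    """
    h, w = image_shape
    depth = np.zeros((h, w), dtype=np.float64)
    pts = points_xyz[points_xyz[:, 2] > 0]  # keep points in front of the camera
    uvw = (K @ pts.T).T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[inside], v[inside], pts[inside, 2]
    order = np.argsort(-z)               # write far points first,
    depth[v[order], u[order]] = z[order]  # so nearer points win collisions
    return depth
```
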
APA, Harvard, Vancouver, ISO, and other styles
7

Tang, Shengjun, Qing Zhu, Wu Chen, Walid Darwish, Bo Wu, Han Hu, and Min Chen. "ENHANCED RGB-D MAPPING METHOD FOR DETAILED 3D MODELING OF LARGE INDOOR ENVIRONMENTS." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III-1 (June 2, 2016): 151–58. http://dx.doi.org/10.5194/isprsannals-iii-1-151-2016.

Full text
Abstract:
RGB-D sensors are novel sensing systems that capture RGB images along with pixel-wise depth information. Although they are widely used in various applications, RGB-D sensors have significant drawbacks with respect to 3D dense mapping of indoor environments. First, they only allow a limited measurement distance (e.g., within 3 m) and a limited field of view. Second, the error of the depth measurement increases with increasing distance to the sensor. In this paper, we propose an enhanced RGB-D mapping method for detailed 3D modeling of large indoor environments by combining RGB image-based modeling and depth-based modeling. The scale ambiguity problem during pose estimation with RGB image sequences can be resolved by integrating the depth and visual information provided by the proposed system. A robust rigid-transformation recovery method is developed to register the RGB image-based and depth-based 3D models together. The proposed method is examined with two datasets collected in indoor environments, for which the experimental results demonstrate the feasibility and robustness of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
8

Tang, Shengjun, Qing Zhu, Wu Chen, Walid Darwish, Bo Wu, Han Hu, and Min Chen. "ENHANCED RGB-D MAPPING METHOD FOR DETAILED 3D MODELING OF LARGE INDOOR ENVIRONMENTS." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III-1 (June 2, 2016): 151–58. http://dx.doi.org/10.5194/isprs-annals-iii-1-151-2016.

Full text
Abstract:
RGB-D sensors are novel sensing systems that capture RGB images along with pixel-wise depth information. Although they are widely used in various applications, RGB-D sensors have significant drawbacks with respect to 3D dense mapping of indoor environments. First, they only allow a limited measurement distance (e.g., within 3 m) and a limited field of view. Second, the error of the depth measurement increases with increasing distance to the sensor. In this paper, we propose an enhanced RGB-D mapping method for detailed 3D modeling of large indoor environments by combining RGB image-based modeling and depth-based modeling. The scale ambiguity problem during pose estimation with RGB image sequences can be resolved by integrating the depth and visual information provided by the proposed system. A robust rigid-transformation recovery method is developed to register the RGB image-based and depth-based 3D models together. The proposed method is examined with two datasets collected in indoor environments, for which the experimental results demonstrate the feasibility and robustness of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
9

Lee, Ki-Seung. "Improving the Performance of Automatic Lip-Reading Using Image Conversion Techniques." Electronics 13, no. 6 (March 9, 2024): 1032. http://dx.doi.org/10.3390/electronics13061032.

Full text
Abstract:
Variation in lighting conditions is a major cause of performance degradation in pattern recognition when using optical imaging. In this study, infrared (IR) and depth images were considered as possible robust alternatives against variations in illumination, particularly for improving the performance of automatic lip-reading. The variations due to lighting conditions were quantitatively analyzed for optical, IR, and depth images. Then, deep neural network (DNN)-based lip-reading rules were built for each image modality. Speech recognition techniques based on IR or depth imaging required an additional light source that emitted light in the IR range, along with a special camera. To mitigate this problem, we propose a method that does not use an IR/depth image directly, but instead estimates images based on the optical RGB image. To this end, a modified U-net was adopted to estimate the IR/depth image from an optical RGB image. The results show that the IR and depth images were rarely affected by the lighting conditions. The recognition rates for the optical, IR, and depth images were 48.29%, 95.76%, and 92.34%, respectively, under various lighting conditions. Using the estimated IR and depth images, the recognition rates were 89.35% and 80.42%, respectively. This was significantly higher than for the optical RGB images.
APA, Harvard, Vancouver, ISO, and other styles
10

Kao, Yueying, Weiming Li, Qiang Wang, Zhouchen Lin, Wooshik Kim, and Sunghoon Hong. "Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11221–28. http://dx.doi.org/10.1609/aaai.v34i07.6781.

Full text
Abstract:
Monocular object pose estimation is an important yet challenging computer vision problem. Depth features can provide useful information for pose estimation. However, existing methods rely on real depth images to extract depth features, which limits their applicability. In this paper, we aim at extracting RGB and depth features from a single RGB image with the help of synthetic RGB-depth image pairs for object pose estimation. Specifically, a deep convolutional neural network is proposed with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module. The embedding module is trained with synthetic pair data to learn a depth-oriented embedding space between RGB and depth images optimized for object pose estimation. The adaptation module further aligns distributions from synthetic to real data. Compared to existing methods, our method does not need any real depth images and can be trained easily with large-scale synthetic data. Extensive experiments and comparisons show that our method achieves the best performance on the challenging public PASCAL 3D+ dataset in all metrics, which substantiates the superiority of our method and the above modules.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "RGB-Depth Image"

1

Deng, Zhuo. "RGB-DEPTH IMAGE SEGMENTATION AND OBJECT RECOGNITION FOR INDOOR SCENES." Diss., Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/427631.

Full text
Abstract:
Computer and Information Science
Ph.D.
With the advent of Microsoft Kinect, the landscape of various vision-related tasks has been changed. Firstly, using an active infrared structured light sensor, the Kinect can provide directly the depth information that is hard to infer from traditional RGB images. Secondly, RGB and depth information are generated synchronously and can be easily aligned, which makes their direct integration possible. In this thesis, I propose several algorithms or systems that focus on how to integrate depth information with traditional visual appearances for addressing different computer vision applications. Those applications cover both low level (image segmentation, class agnostic object proposals) and high level (object detection, semantic segmentation) computer vision tasks. To firstly understand whether and how depth information is helpful for improving computer vision performances, I start research on the image segmentation field, which is a fundamental problem and has been studied extensively in natural color images. We propose an unsupervised segmentation algorithm that is carefully crafted to balance the contribution of color and depth features in RGB-D images. The segmentation problem is then formulated as solving the Maximum Weight Independence Set (MWIS) problem. Given superpixels obtained from different layers of a hierarchical segmentation, the saliency of each superpixel is estimated based on balanced combination of features originating from depth, gray level intensity, and texture information. We evaluate the segmentation quality based on five standard measures on the commonly used NYU-v2 RGB-Depth dataset. A surprising message indicated from experiments is that unsupervised image segmentation of RGB-D images yields comparable results to supervised segmentation. In image segmentation, an image is partitioned into several groups of pixels (or super-pixels). We take one step further to investigate on the problem of assigning class labels to every pixel, i.e., semantic scene segmentation. We propose a novel image region labeling method which augments CRF formulation with hard mutual exclusion (mutex) constraints. This way our approach can make use of rich and accurate 3D geometric structure coming from Kinect in a principled manner. The final labeling result must satisfy all mutex constraints, which allows us to eliminate configurations that violate common sense physics laws like placing a floor above a night stand. Three classes of mutex constraints are proposed: global object co-occurrence constraint, relative height relationship constraint, and local support relationship constraint. Segments obtained from image segmentation can be either too fine or too coarse. A full object region not only conveys global features but also arguably enriches contextual features as confusing background is separated. We propose a novel unsupervised framework for automatically generating bottom up class independent object candidates for detection and recognition in cluttered indoor environments. Utilizing raw depth map, we propose a novel plane segmentation algorithm for dividing an indoor scene into predominant planar regions and non-planar regions. Based on this partition, we are able to effectively predict object locations and their spatial extensions. Our approach automatically generates object proposals considering five different aspects: Non-planar Regions (NPR), Planar Regions (PR), Detected Planes (DP), Merged Detected Planes (MDP) and Hierarchical Clustering (HC) of 3D point clouds. 
Object region proposals include both bounding boxes and instance segments. Although 2D computer vision tasks can roughly identify where objects are placed on image planes, their true locations and poses in the physical 3D world are difficult to determine due to multiple factors such as occlusions and the uncertainty arising from perspective projections. However, it is very natural for human beings to understand how far objects are from viewers, object poses, and their full extents from still images. These kinds of features are extremely desirable for many applications such as robot navigation, grasp estimation, and Augmented Reality (AR), etc. In order to fill the gap, we address the problem of amodal 3D object detection. The task is to not only find object localizations in the 3D world, but also estimate their physical sizes and poses, even if only parts of them are visible in the RGB-D image. Recent approaches have attempted to harness the point cloud from the depth channel to exploit 3D features directly in the 3D space and have demonstrated superiority over traditional 2D representation approaches. We revisit the amodal 3D detection problem by sticking to the 2D representation framework, and directly relate 2D visual appearance to 3D objects. We propose a novel 3D object detection system that simultaneously predicts objects' 3D locations, physical sizes, and orientations in indoor scenes.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
2

Hasnat, Md Abul. "Unsupervised 3D image clustering and extension to joint color and depth segmentation." Thesis, Saint-Etienne, 2014. http://www.theses.fr/2014STET4013/document.

Full text
Abstract:
Access to 3D images at a reasonable frame rate is now widespread, thanks to recent advances in low-cost depth sensors as well as efficient methods to compute 3D from 2D images. As a consequence, there is a strong demand to enhance the capability of existing computer vision applications by incorporating 3D information. Indeed, it has been demonstrated in numerous studies that the accuracy of different tasks increases by including 3D information as an additional feature. However, for the task of indoor scene analysis and segmentation, several important issues remain, such as: (a) how can the 3D information itself be exploited? and (b) what is the best way to fuse color and 3D in an unsupervised manner? In this thesis, we address these issues and propose novel unsupervised methods for 3D image clustering and joint color and depth image segmentation. To this aim, we consider image normals as the prominent feature from the 3D image and cluster them with methods based on finite statistical mixture models. We use the Bregman Soft Clustering method to ensure computationally efficient clustering. Moreover, we exploit several probability distributions from directional statistics, such as the von Mises-Fisher distribution and the Watson distribution. By combining these, we propose novel Model-Based Clustering methods. We empirically validate these methods using synthetic data and then demonstrate their application to 3D/depth image analysis. Afterward, we extend these methods to segment synchronized 3D and color images, also called RGB-D images. To this aim, we first propose a statistical image generation model for the RGB-D image. Then, we propose a novel RGB-D segmentation method using a joint color-spatial-axial clustering and a statistical planar region merging method. Results show that the proposed method is comparable with state-of-the-art methods and requires less computation time. Moreover, it opens interesting perspectives for fusing color and geometry in an unsupervised manner. We believe that the methods proposed in this thesis are equally applicable and extendable for clustering different types of data, such as speech, gene expressions, etc. Moreover, they can be used for complex tasks, such as joint image-speech data analysis.
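
The clustering of surface normals that this thesis builds on can be illustrated with a small sketch: estimate per-pixel normals from the depth image with finite differences, then cluster them by cosine similarity. The spherical k-means below is a crude, hard-assignment stand-in for the von Mises-Fisher mixture actually used, and the normal estimation ignores camera intrinsics.

```python
import numpy as np

def depth_to_normals(depth):
    """Approximate per-pixel surface normals from a depth image using central
    differences of the depth gradient (pixel spacing treated as unit)."""
    z = depth.astype(np.float64)
    dzdx = (np.roll(z, -1, axis=1) - np.roll(z, 1, axis=1)) / 2.0
    dzdy = (np.roll(z, -1, axis=0) - np.roll(z, 1, axis=0)) / 2.0
    n = np.dstack([-dzdx, -dzdy, np.ones_like(z)])
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def spherical_kmeans(normals, k=4, iters=20, seed=0):
    """Cluster unit normals by cosine similarity (hard assignments)."""
    rng = np.random.default_rng(seed)
    x = normals.reshape(-1, 3)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        labels = (x @ centres.T).argmax(axis=1)  # index of most-aligned centre
        for j in range(k):
            members = x[labels == j]
            if len(members):
                s = members.sum(axis=0)
                centres[j] = s / np.linalg.norm(s)
    return labels.reshape(normals.shape[:2]), centres
```
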
APA, Harvard, Vancouver, ISO, and other styles
3

Baban, a. erep Thierry Roland. "Contribution au développement d'un système intelligent de quantification des nutriments dans les repas d'Afrique subsaharienne." Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSEP100.

Full text
Abstract:
Malnutrition, including under- and overnutrition, is a global health challenge affecting billions of people. It impacts all organ systems and is a significant risk factor for noncommunicable diseases such as cardiovascular diseases, diabetes, and some cancers. Assessing food intake is crucial for preventing malnutrition but remains challenging. Traditional methods for dietary assessment are labor-intensive and prone to bias. Advancements in AI have made Vision-Based Dietary Assessment (VBDA) a promising solution for automatically analyzing food images to estimate portions and nutrition. However, food image segmentation in VBDA faces challenges due to food's non-rigid structure, high intra-class variation (where the same dish can look very different), inter-class resemblance (where different foods appear similar), and scarcity of publicly available datasets. Almost all food segmentation research has focused on Asian and Western foods, with no datasets for African cuisines. However, African dishes often involve mixed food classes, making accurate segmentation challenging. Additionally, research has largely focused on RGB images, which provide color and texture but may lack geometric detail. To address this, RGB-D segmentation combines depth data with RGB images. Depth images provide crucial geometric details that enhance RGB data, improve object discrimination, and are robust to factors like illumination and fog. Despite its success in other fields, RGB-D segmentation for food is underexplored due to difficulties in collecting food depth images. This thesis makes key contributions by developing new deep learning models for RGB (mid-DeepLabv3+) and RGB-D (ESeNet-D) image segmentation and introducing the first food segmentation datasets focused on African food images. Mid-DeepLabv3+ is based on DeepLabv3+, featuring a simplified ResNet backbone with an added skip layer (middle layer) in the decoder and a SimAM attention mechanism. This model offers an optimal balance between performance and efficiency, matching DeepLabv3+'s performance while cutting the computational load by half. ESeNet-D consists of two encoder branches using EfficientNetV2 as the backbone, with a fusion block for multi-scale integration and a decoder employing self-calibrated convolution and learned interpolation for precise segmentation. ESeNet-D outperforms many RGB and RGB-D benchmark models while having fewer parameters and FLOPs. Our experiments show that, when properly integrated, depth information can significantly improve food segmentation accuracy. We also present two new datasets: AfricaFoodSeg for “food/non-food” segmentation with 3,067 images (2,525 for training, 542 for validation), and CamerFood focusing on Cameroonian cuisine. The CamerFood datasets include CamerFood10 with 1,422 images from ten food classes, and CamerFood15, an enhanced version with 15 food classes, 1,684 training images, and 514 validation images. Finally, we address the challenge of scarce depth data in RGB-D food segmentation by demonstrating that Monocular Depth Estimation (MDE) models can aid in generating effective depth maps for RGB-D datasets.
APA, Harvard, Vancouver, ISO, and other styles
4

Řehánek, Martin. "Detekce objektů pomocí Kinectu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236602.

Full text
Abstract:
With the release of the Kinect device, new possibilities appeared, allowing a simple use of image depth in image processing. The aim of this thesis is to propose a method for object detection and recognition in a depth map. The well-known Bag of Words method and a descriptor based on the Spin Image method are used for object recognition. The Spin Image method is one of several existing approaches to depth maps that are described in this thesis. Detection of objects in the image is performed with the sliding-window technique, which is improved and sped up by utilizing the depth information.
APA, Harvard, Vancouver, ISO, and other styles
5

SANTOS, LEANDRO TAVARES ARAGAO DOS. "GENERATING SUPERRESOLVED DEPTH MAPS USING LOW COST SENSORS AND RGB IMAGES." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2014. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=28673@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
There are many applications for the three-dimensional reconstruction of real scenes. The rise of low-cost sensors, like the Kinect, suggests the development of reconstruction systems cheaper than the existing ones. Nevertheless, the data provided by this device are worse than those provided by more sophisticated sensors. In the academic and commercial world, some initiatives, described in Tong et al. [1] and in Cui et al. [2], try to solve this problem. Building on those attempts, this work modifies the super-resolution algorithm described by Mitzel et al. [3] so that its calculations also consider the colour images provided by the Kinect, following the approach of Cui et al. [2]. This change improved the super-resolved depth maps, mitigating interference caused by sudden changes in the captured scene. The tests confirmed the improvement of the generated maps and analysed the impact of CPU and GPU implementations of the algorithms in this super-resolution step. This work is restricted to this step; the subsequent stages of 3D reconstruction have not been implemented.
APA, Harvard, Vancouver, ISO, and other styles
6

Thörnberg, Jesper. "Combining RGB and Depth Images for Robust Object Detection using Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-174137.

Full text
Abstract:
We investigated the advantage of combining RGB images with depth data to get more robust object classifications and detections using pre-trained deep convolutional neural networks. We relied upon the raw images from publicly available datasets captured using Microsoft Kinect cameras. The raw images varied in size, and therefore required resizing to fit our network. We designed a resizing method called "bleeding edge" to avoid distorting the objects in the images. We present a novel method of interpolating the missing depth pixel values by comparing them with similar RGB values. This method proved superior to the other methods tested. We showed that a simple colormap transformation of the depth image can provide close to state-of-the-art performance. Using our methods, we can present state-of-the-art performance on the Washington Object dataset and we provide some results on the Washington Scenes (V1) dataset. Specifically, for the detection, we used contours at different thresholds to find the likely object locations in the images. For the classification task we can report state-of-the-art results using only RGB and RGB-D images; depth data alone gave close to state-of-the-art results. For the detection task we found the RGB-only detector to be superior to the other detectors.
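
The "simple colormap transformation" mentioned here is easy to sketch: a single-channel depth map is rescaled and mapped to three channels so that a network pre-trained on RGB can consume it. The snippet below uses an arbitrary blue-to-red ramp as a stand-in for whichever colormap the thesis actually used, and assumes zero marks invalid depth.

```python
import numpy as np

def colorize_depth(depth, d_min=None, d_max=None):
    """Turn a single-channel depth image into a 3-channel uint8 pseudo-colour
    image; invalid (zero) pixels stay black."""
    d = depth.astype(np.float64)
    valid = d > 0
    lo = d[valid].min() if d_min is None else d_min
    hi = d[valid].max() if d_max is None else d_max
    t = np.clip((d - lo) / max(hi - lo, 1e-9), 0.0, 1.0)
    out = np.zeros(depth.shape + (3,), dtype=np.uint8)
    out[..., 0] = (255 * t).astype(np.uint8)          # red increases with depth
    out[..., 2] = (255 * (1.0 - t)).astype(np.uint8)  # blue decreases with depth
    out[~valid] = 0
    return out
```
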
APA, Harvard, Vancouver, ISO, and other styles
7

Möckelind, Christoffer. "Improving deep monocular depth predictions using dense narrow field of view depth images." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235660.

Full text
Abstract:
In this work we study a depth prediction problem where we provide a narrow field-of-view depth image and a wide field-of-view RGB image to a deep network tasked with predicting the depth for the entire RGB image. We show that by providing a narrow field-of-view depth image, we improve results for the area outside the provided depth compared to an earlier approach that utilizes only a single RGB image for depth prediction. We also show that larger depth maps provide a greater advantage than smaller ones and that the accuracy of the model decreases with the distance from the provided depth. Further, we investigate several architectures as well as study the effect of adding noise and lowering the resolution of the provided depth image. Our results show that models provided with low-resolution, noisy data perform on par with the models provided with unaltered depth.
APA, Harvard, Vancouver, ISO, and other styles
8

Hammond, Patrick Douglas. "Deep Synthetic Noise Generation for RGB-D Data Augmentation." BYU ScholarsArchive, 2019. https://scholarsarchive.byu.edu/etd/7516.

Full text
Abstract:
Considerable effort has been devoted to finding reliable methods of correcting noisy RGB-D images captured with unreliable depth-sensing technologies. Supervised neural networks have been shown to be capable of RGB-D image correction, but require copious amounts of carefully corrected ground-truth data to train effectively. Data collection is laborious and time-intensive, especially for large datasets, and generation of ground-truth training data tends to be subject to human error. It might be possible to train an effective method on a relatively smaller dataset using synthetically damaged depth data as input to the network, but this requires some understanding of the latent noise distribution of the respective camera. It is possible to augment datasets to a certain degree using naive noise generation, such as random dropout or Gaussian noise, but these tend to generalize poorly to real data. A superior method would imitate real camera noise to damage input depth images realistically so that the network is able to learn to correct the appropriate depth-noise distribution. We propose a novel noise-generating CNN capable of producing realistic noise customized to a variety of different depth-noise distributions. In order to demonstrate the effects of synthetic augmentation, we also contribute a large novel RGB-D dataset captured with the Intel RealSense D415 and D435 depth cameras. This dataset pairs many examples of noisy depth images with automatically completed RGB-D images, which we use as a proxy for ground-truth data. We further provide an automated depth-denoising pipeline which may be used to produce proxy ground-truth data for novel datasets. We train a modified sparse-to-dense depth-completion network on splits of varying size from our dataset to determine reasonable baselines for improvement. We determine through these tests that adding more noisy depth frames to each RGB-D image in the training set has a nearly identical impact on depth-completion training as gathering more ground-truth data. We leverage these findings to produce additional synthetic noisy depth images for each RGB-D image in our baseline training sets using our noise-generating CNN. Through the use of our augmentation method, it is possible to achieve greater than 50% error reduction on supervised depth-completion training, even for small datasets.
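
The "naive noise generation" baseline that the abstract contrasts itself with (random dropout plus Gaussian noise) is straightforward to sketch. The function name and parameter values below are placeholders, and this is explicitly the baseline the thesis argues a learned noise model should outperform.

```python
import numpy as np

def naive_depth_noise(depth, dropout_p=0.1, sigma=20.0, seed=None):
    """Synthetically damage a clean depth image: add Gaussian jitter to valid
    pixels and randomly zero out a fraction of pixels to mimic dropout."""
    rng = np.random.default_rng(seed)
    noisy = depth.astype(np.float64).copy()
    valid = noisy > 0
    noisy[valid] += rng.normal(0.0, sigma, size=int(valid.sum()))
    dropped = rng.random(depth.shape) < dropout_p
    noisy[dropped] = 0.0
    return np.clip(noisy, 0.0, None)
```
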
APA, Harvard, Vancouver, ISO, and other styles
9

Tu, Chieh-Min, and 杜介民. "Depth Image Inpainting with RGB-D Camera." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/k4m42a.

Full text
Abstract:
Master's thesis
I-Shou University
Department of Information Engineering
Academic year 103
Since Microsoft released the inexpensive Kinect sensor as a new natural user interface, stereoscopic imaging has moved from the earlier synthesis of multi-view color images to the synthesis of a color image and a depth image. However, the captured depth images may lose some depth values, so the stereoscopic effect is often poor. This thesis uses a Kinect RGB-D camera to develop an object-based depth inpainting method. First, background differencing, frame differencing, and depth thresholding strategies are used as a basis for segmenting foreground objects from a dynamic background image. Then, the hole-inpainting task is divided into the background area and the foreground area, in which the background area is inpainted from a background depth image and the foreground area is inpainted with a best-fit neighborhood depth value. Experimental results show that such an inpainting method helps to fill holes and to improve the contour edges and image quality.
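
A rough illustration of the segmentation-then-fill logic described here is given below: a foreground mask is built from background differencing, frame differencing, and depth thresholding, and background holes are filled from a reference background depth map. The thresholds and the simple combination rule are assumptions, not the thesis's exact procedure.

```python
import numpy as np

def foreground_mask(rgb, rgb_bg, rgb_prev, depth, depth_bg,
                    c_thresh=30.0, d_thresh=50.0):
    """Combine background differencing, frame differencing, and depth
    thresholding into a single foreground mask."""
    bg_diff = np.abs(rgb.astype(np.float64) - rgb_bg).mean(axis=-1) > c_thresh
    frame_diff = np.abs(rgb.astype(np.float64) - rgb_prev).mean(axis=-1) > c_thresh
    depth_diff = np.abs(depth.astype(np.float64) - depth_bg) > d_thresh
    return (bg_diff | frame_diff) & depth_diff

def fill_background_holes(depth, depth_bg, fg_mask):
    """Fill invalid (zero) background pixels from a reference background depth
    map; foreground holes would need a neighbourhood-based fill instead."""
    out = depth.astype(np.float64).copy()
    holes = (out == 0) & ~fg_mask
    out[holes] = depth_bg[holes]
    return out
```
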
APA, Harvard, Vancouver, ISO, and other styles
10

Lin, Shih-Pi, and 林士筆. "In-air Handwriting Chinese Character Recognition Base on RGB Image without Depth Information." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/2mhfzk.

Full text
Abstract:
Master's thesis
National Central University
Department of Computer Science and Information Engineering
Academic year 107
As technology changes rapidly, human-computer interaction (HCI) is no longer limited to the keyboard. Existing handwriting products provide sufficiently dense and stable trajectories for recognizing handwriting. For Chinese characters, it is relatively difficult for machines to obtain a stable trajectory compared with English letters and numerals. In the past, in-air hand detection and tracking often relied on devices with depth information; for example, the Kinect uses two infrared cameras to obtain depth information, which raises the price of the device. Therefore, using RGB information from a single camera for object detection and tracking has become a trend in recent years. Using an RGB camera as the HCI medium for in-air handwriting requires accurate hand detection and stable tracking. Moreover, an in-air handwriting trajectory is written in one continuous stroke, which means it contains both real and virtual strokes, and this increases the difficulty of recognition. The hand database used to build the model contains self-recorded handwriting videos and relevant hand datasets collected from the Internet. A Multiple Receptive Field (MRF) step is added to the data processing, which scales the ground truth and treats each scaled region as a new object, increasing the robustness of detection. This work uses YOLOv3 as the core neural network model and adds a Convolutional Recurrent Neural Network (CRNN) to convert YOLO into a time-sequential network that stabilizes tracking. Analysis of the experimental results shows that hand detection is more robust after the data are processed with the MRF, and the converted YOLO improves the stability of hand tracking. Overall, using several Chinese character recognition methods, the accuracy of recognizing in-air handwritten Chinese character trajectories is about 96.33%.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "RGB-Depth Image"

1

Pan, Hong, Søren Ingvor Olsen, and Yaping Zhu. "Joint Spatial-Depth Feature Pooling for RGB-D Object Classification." In Image Analysis, 314–26. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-19665-7_26.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Liu, Shirui, Hamid A. Jalab, and Zhen Dai. "Intrinsic Face Image Decomposition from RGB Images with Depth Cues." In Advances in Visual Informatics, 149–56. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-34032-2_14.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Guo, Jinxin, Qingxiang Wang, and Xiaoqiang Ren. "Target Recognition Based on Kinect Combined RGB Image with Depth Image." In Advances in Intelligent Systems and Computing, 726–32. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-25128-4_89.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Mechal, Chaymae El, Najiba El Amrani El Idrissi, and Mostefa Mesbah. "CNN-Based Obstacle Avoidance Using RGB-Depth Image Fusion." In Lecture Notes in Electrical Engineering, 867–76. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-33-6893-4_78.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Petrelli, Alioscia, and Luigi Di Stefano. "Learning to Weight Color and Depth for RGB-D Visual Search." In Image Analysis and Processing - ICIAP 2017, 648–59. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-68560-1_58.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Chen, Anran, Yao Zhao, and Chunyu Lin. "RGB Image Guided Depth Hole-Filling Using Bidirectional Attention Mechanism." In Advances in Intelligent Information Hiding and Multimedia Signal Processing, 173–82. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-1053-1_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Farahnakian, Fahimeh, and Jukka Heikkonen. "RGB and Depth Image Fusion for Object Detection Using Deep Learning." In Advances in Intelligent Systems and Computing, 73–93. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-3357-7_3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Khaire, Pushpajit, Javed Imran, and Praveen Kumar. "Human Activity Recognition by Fusion of RGB, Depth, and Skeletal Data." In Proceedings of 2nd International Conference on Computer Vision & Image Processing, 409–21. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-10-7895-8_32.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Li, Yijin, Xinyang Liu, Wenqi Dong, Han Zhou, Hujun Bao, Guofeng Zhang, Yinda Zhang, and Zhaopeng Cui. "DELTAR: Depth Estimation from a Light-Weight ToF Sensor and RGB Image." In Lecture Notes in Computer Science, 619–36. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-19769-7_36.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Kam, Jaewon, Jungeon Kim, Soongjin Kim, Jaesik Park, and Seungyong Lee. "CostDCNet: Cost Volume Based Depth Completion for a Single RGB-D Image." In Lecture Notes in Computer Science, 257–74. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20086-1_15.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "RGB-Depth Image"

1

Qiu, Zhouyan, Shang Zeng, Joaquín Martínez Sánchez, and Pedro Arias. "Comparative analysis of image super-resolution: A concurrent study of RGB and depth images." In 2024 International Workshop on the Theory of Computational Sensing and its Applications to Radar, Multimodal Sensing and Imaging (CoSeRa), 36–41. IEEE, 2024. http://dx.doi.org/10.1109/cosera60846.2024.10720360.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Morisset, Maxime, Marc Donias, and Christian Germain. "Principal Curvatures as Pose-Invariant Features of Depth Maps for RGB-D Object Recognition." In 2024 IEEE Thirteenth International Conference on Image Processing Theory, Tools and Applications (IPTA), 1–6. IEEE, 2024. http://dx.doi.org/10.1109/ipta62886.2024.10755742.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Baban A Erep, Thierry Roland, Lotfi Chaari, Pierre Ele, and Eugene Sobngwi. "ESeNet-D : Efficient Semantic Segmentation for RGB-Depth Food Images." In 2024 IEEE 34th International Workshop on Machine Learning for Signal Processing (MLSP), 1–6. IEEE, 2024. http://dx.doi.org/10.1109/mlsp58920.2024.10734761.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Yu, Yeh-Wei, Tzu-Kai Wang, Chi-Chung Lau, Jia-Ching Wang, Tsung-Hsun Yang, Jann-Long Chern, and Ching-Cherng Sun. "Repairing IR depth image with 2D RGB image." In Current Developments in Lens Design and Optical Engineering XIX, edited by R. Barry Johnson, Virendra N. Mahajan, and Simon Thibault. SPIE, 2018. http://dx.doi.org/10.1117/12.2321205.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Issaranon, Theerasit, Chuhang Zou, and David Forsyth. "Counterfactual Depth from a Single RGB Image." In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE, 2019. http://dx.doi.org/10.1109/iccvw.2019.00268.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Li, Wenju, Wenkang Hu, Tianzhen Dong, and Jiantao Qu. "Depth Image Enhancement Algorithm Based on RGB Image Fusion." In 2018 11th International Symposium on Computational Intelligence and Design (ISCID). IEEE, 2018. http://dx.doi.org/10.1109/iscid.2018.10126.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Hui, Tak-Wai, and King Ngi Ngan. "Depth enhancement using RGB-D guided filtering." In 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014. http://dx.doi.org/10.1109/icip.2014.7025778.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Bai, Jinghui, Jingyu Yang, Xinchen Ye, and Chunping Hou. "Depth refinement for binocular kinect RGB-D cameras." In 2016 Visual Communications and Image Processing (VCIP). IEEE, 2016. http://dx.doi.org/10.1109/vcip.2016.7805545.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Waskitho, Suryo Aji, Ardiansyah Alfarouq, Sritrusta Sukaridhoto, and Dadet Pramadihanto. "FloW vision: Depth image enhancement by combining stereo RGB-depth sensor." In 2016 International Conference on Knowledge Creation and Intelligent Computing (KCIC). IEEE, 2016. http://dx.doi.org/10.1109/kcic.2016.7883644.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Yan, Zengqiang, Li Yu, and Zixiang Xiong. "Large-area depth recovery for RGB-D camera." In 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 2015. http://dx.doi.org/10.1109/icip.2015.7351032.

Full text
APA, Harvard, Vancouver, ISO, and other styles