
Journal articles on the topic 'RGB-Depth Image'


Consult the top 50 journal articles for your research on the topic 'RGB-Depth Image.'


Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Li, Hengyu, Hang Liu, Ning Cao, Yan Peng, Shaorong Xie, Jun Luo, and Yu Sun. "Real-time RGB-D image stitching using multiple Kinects for improved field of view." International Journal of Advanced Robotic Systems 14, no. 2 (March 1, 2017): 172988141769556. http://dx.doi.org/10.1177/1729881417695560.

Abstract:
This article addresses the problems of defective depth maps and the limited field of view of Kinect-style RGB-D sensors. An anisotropic-diffusion-based hole-filling method is proposed to recover invalid depth data in the depth map. The field of view of the Kinect-style RGB-D sensor is extended by stitching depth and color images from several RGB-D sensors. By aligning the depth map with the color image, the registration data calculated by registering color images can be used to stitch depth and color images into depth and color panoramas concurrently in real time. Experiments show that the proposed stitching method can generate an RGB-D panorama with no invalid depth data and little distortion in real time, and that it can be extended to incorporate more RGB-D sensors to construct a panoramic RGB-D image with a 360° field of view.
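The hole-filling idea can be sketched in a few lines of Python. The sketch below is a simplified diffusion scheme, not the authors' exact anisotropic-diffusion formulation; the edge-stopping parameter `kappa`, the iteration count, and the toy depth frame are illustrative assumptions.

```python
import numpy as np

def fill_depth_holes(depth, iters=200, kappa=20.0):
    """Diffuse valid depth into zero-valued holes.

    Hole pixels are repeatedly replaced by an edge-weighted average of
    their valid 4-neighbours, so values propagate inward while large
    depth jumps damp the diffusion and keep object boundaries intact.
    """
    d = depth.astype(np.float32).copy()
    hole = d == 0                                    # invalid pixels to recover
    for _ in range(iters):
        up = np.pad(d, ((1, 0), (0, 0)), mode='edge')[:-1, :]
        down = np.pad(d, ((0, 1), (0, 0)), mode='edge')[1:, :]
        left = np.pad(d, ((0, 0), (1, 0)), mode='edge')[:, :-1]
        right = np.pad(d, ((0, 0), (0, 1)), mode='edge')[:, 1:]
        num = np.zeros_like(d)
        den = np.zeros_like(d)
        for nb in (up, down, left, right):
            w = np.where(d > 0, np.exp(-((nb - d) / kappa) ** 2), 1.0)
            w = np.where(nb > 0, w, 0.0)             # never borrow from invalid pixels
            num += w * nb
            den += w
        updated = np.where(den > 0, num / den, d)
        d = np.where(hole, updated, d)               # only hole pixels change
    return d

# toy usage: a flat 1.5 m plane (in mm) with a 20 x 20 pixel hole
depth = np.full((100, 100), 1500.0, np.float32)
depth[40:60, 40:60] = 0.0
print(fill_depth_holes(depth)[50, 50])               # ~1500 after diffusion
```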
2

Wu, Yan, Jiqian Li, and Jing Bai. "Multiple Classifiers-Based Feature Fusion for RGB-D Object Recognition." International Journal of Pattern Recognition and Artificial Intelligence 31, no. 05 (February 27, 2017): 1750014. http://dx.doi.org/10.1142/s0218001417500148.

Abstract:
RGB-D-based object recognition has been enthusiastically investigated in the past few years. RGB and depth images provide useful and complementary information. Fusing RGB and depth features can significantly increase the accuracy of object recognition. However, previous works simply take the depth image as the fourth channel of the RGB image and concatenate the RGB and depth features, ignoring the different power of RGB and depth information for different objects. In this paper, a new method containing three different classifiers is proposed to fuse features extracted from the RGB image and the depth image for RGB-D-based object recognition. Firstly, an RGB classifier and a depth classifier are trained by cross-validation to obtain the accuracy difference between RGB and depth features for each object. Then a variant RGB-D classifier is trained with different initialization parameters for each class according to the accuracy difference. The variant RGB-D classifier results in more robust classification performance. The proposed method is evaluated on two benchmark RGB-D datasets. Compared with previous methods, ours achieves performance comparable to the state-of-the-art method.
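The cross-validated accuracy gap between modalities can be turned into per-class fusion weights along the lines described above. This is a minimal sketch under stated assumptions: `LogisticRegression` stands in for the paper's classifiers, the softmax weighting and the random placeholder features are illustrative choices, not the authors' exact scheme.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def modality_weights(X_rgb, X_depth, y, n_classes, cv=5):
    """Per-class fusion weights from the RGB/depth accuracy gap.

    Cross-validated predictions yield one accuracy per class and modality;
    a softmax over the two accuracies decides how strongly each modality
    should influence the fused classifier for that class.
    """
    pred_rgb = cross_val_predict(LogisticRegression(max_iter=1000), X_rgb, y, cv=cv)
    pred_d = cross_val_predict(LogisticRegression(max_iter=1000), X_depth, y, cv=cv)
    weights = np.zeros((n_classes, 2))
    for c in range(n_classes):
        mask = y == c
        acc = np.array([(pred_rgb[mask] == c).mean(), (pred_d[mask] == c).mean()])
        weights[c] = np.exp(acc) / np.exp(acc).sum()   # [RGB weight, depth weight]
    return weights

# toy usage with random placeholder features
rng = np.random.default_rng(0)
X_rgb, X_depth = rng.normal(size=(300, 64)), rng.normal(size=(300, 32))
y = rng.integers(0, 5, size=300)
print(modality_weights(X_rgb, X_depth, y, n_classes=5))
```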
3

OYAMA, Tadahiro, and Daisuke MATSUZAKI. "Depth Image Generation from monocular RGB image." Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2019 (2019): 2P2-H09. http://dx.doi.org/10.1299/jsmermd.2019.2p2-h09.

4

Cao, Hao, Xin Zhao, Ang Li, and Meng Yang. "Depth Image Rectification Based on an Effective RGB–Depth Boundary Inconsistency Model." Electronics 13, no. 16 (August 22, 2024): 3330. http://dx.doi.org/10.3390/electronics13163330.

Abstract:
Depth images have been widely used in various tasks of 3D systems with the advancement of depth acquisition sensors in recent years. Depth images suffer from serious distortions near object boundaries due to the limitations of depth sensors or estimation methods. In this paper, a simple method is proposed to rectify the erroneous object boundaries of depth images with the guidance of reference RGB images. First, an RGB–Depth boundary inconsistency model is developed to measure whether collocated pixels in the depth and RGB images belong to the same object. The model extracts the structures of the RGB and depth images, respectively, with Gaussian functions. The inconsistency of two collocated pixels is then statistically determined inside large-sized local windows. In this way, pixels near object boundaries of depth images are identified as erroneous when they are inconsistent with their collocated pixels in the RGB image. Second, a depth image rectification method is proposed by embedding the model into a simple weighted mean filter (WMF). Experimental results on two datasets verify that the proposed method improves the RMSE and SSIM of depth images by 2.556 and 0.028, respectively, compared with recent optimization-based and learning-based methods.
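The general flavour of an RGB-guided weighted mean filter can be illustrated as follows. This is a plain cross/joint filter, not the paper's boundary inconsistency model; the Gaussian weights, window radius, and sigmas are assumed values for illustration.

```python
import numpy as np

def guided_weighted_mean(depth, rgb, radius=5, sigma_s=3.0, sigma_c=10.0):
    """Cross weighted mean filter: each depth pixel is replaced by a mean of
    its neighbours, weighted by spatial distance and RGB similarity, so depth
    values do not leak across object boundaries visible in the RGB image."""
    h, w = depth.shape
    gray = rgb.astype(np.float32).mean(axis=2)           # simple guidance signal
    pad_d = np.pad(depth.astype(np.float32), radius, mode='edge')
    pad_g = np.pad(gray, radius, mode='edge')
    out = np.zeros_like(depth, dtype=np.float32)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys ** 2 + xs ** 2) / (2 * sigma_s ** 2))
    for i in range(h):
        for j in range(w):
            win_d = pad_d[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            win_g = pad_g[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            color = np.exp(-(win_g - gray[i, j]) ** 2 / (2 * sigma_c ** 2))
            wgt = spatial * color
            out[i, j] = (wgt * win_d).sum() / wgt.sum()
    return out
```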
5

Zhang, Longyu, Hao Xia, and Yanyou Qiao. "Texture Synthesis Repair of RealSense D435i Depth Images with Object-Oriented RGB Image Segmentation." Sensors 20, no. 23 (November 24, 2020): 6725. http://dx.doi.org/10.3390/s20236725.

Abstract:
A depth camera is a kind of sensor that can directly collect distance information between an object and the camera. The RealSense D435i is a low-cost depth camera that is currently in widespread use. When collecting data, an RGB image and a depth image are acquired simultaneously. The quality of the RGB image is good, whereas the depth image typically has many holes. In many applications that use depth images, these holes can lead to serious problems. In this study, a repair method for depth images was proposed. The depth image is repaired using a texture synthesis algorithm with the RGB image, which is segmented through a multi-scale object-oriented method. An object difference parameter is added to the process of selecting the best sample block. In contrast with previous methods, the experimental results show that the proposed method avoids erroneous filling of holes, the edges of the filled holes are consistent with the edges in the RGB images, and the repair accuracy is better. The root mean square error, peak signal-to-noise ratio, and structural similarity index measure between the repaired depth images and the ground-truth image were better than those obtained by two other methods. We believe that repairing the depth image can improve the effects of depth image applications.
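The three quality metrics named in the abstract (RMSE, PSNR, SSIM) are straightforward to compute for a repaired depth map against ground truth; a minimal sketch using scikit-image is shown below, with the input arrays assumed to be same-sized floating-point depth maps.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score_repair(repaired, ground_truth):
    """RMSE, PSNR and SSIM between a repaired depth map and its ground truth."""
    diff = repaired.astype(np.float64) - ground_truth.astype(np.float64)
    rmse = np.sqrt(np.mean(diff ** 2))
    rng = float(ground_truth.max() - ground_truth.min())
    psnr = peak_signal_noise_ratio(ground_truth, repaired, data_range=rng)
    ssim = structural_similarity(ground_truth, repaired, data_range=rng)
    return rmse, psnr, ssim
```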
6

Kwak, Jeonghoon, and Yunsick Sung. "Automatic 3D Landmark Extraction System Based on an Encoder–Decoder Using Fusion of Vision and LiDAR." Remote Sensing 12, no. 7 (April 3, 2020): 1142. http://dx.doi.org/10.3390/rs12071142.

Abstract:
To provide a realistic environment for remote sensing applications, point clouds are used to realize a three-dimensional (3D) digital world for the user. Motion recognition of objects, e.g., humans, is required to provide realistic experiences in the 3D digital world. To recognize a user’s motions, 3D landmarks are provided by analyzing a 3D point cloud collected through a light detection and ranging (LiDAR) system or a red green blue (RGB) image collected visually. However, manual supervision is required to extract 3D landmarks, whether they originate from the RGB image or the 3D point cloud. Thus, there is a need for a method of extracting 3D landmarks without manual supervision. Herein, an RGB image and a 3D point cloud are used to extract 3D landmarks. The 3D point cloud is utilized as the relative distance between a LiDAR and a user. Because it cannot capture complete information about the user’s entire body due to disparities, it cannot by itself generate a dense depth image that provides the boundary of the user’s body. Therefore, up-sampling is performed to increase the density of the depth image generated from the 3D point cloud. This paper proposes a system for extracting 3D landmarks using 3D point clouds and RGB images without manual supervision. A depth image provides the boundary of a user’s motion and is generated by using a 3D point cloud and an RGB image collected by a LiDAR and an RGB camera, respectively. To extract 3D landmarks automatically, an encoder–decoder model is trained with the generated depth images and the RGB images, and 3D landmarks are extracted from these images with the trained encoder model. The method of extracting 3D landmarks using RGB depth (RGBD) images was verified experimentally, and 3D landmarks were extracted to evaluate the user’s motions with RGBD images. In this manner, landmarks could be extracted according to the user’s motions rather than from the RGB images alone. The depth images generated by the proposed method were 1.832 times denser than the up-sampling-based depth images generated with bilateral filtering.
7

Tang, Shengjun, Qing Zhu, Wu Chen, Walid Darwish, Bo Wu, Han Hu, and Min Chen. "ENHANCED RGB-D MAPPING METHOD FOR DETAILED 3D MODELING OF LARGE INDOOR ENVIRONMENTS." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III-1 (June 2, 2016): 151–58. http://dx.doi.org/10.5194/isprsannals-iii-1-151-2016.

Abstract:
RGB-D sensors are novel sensing systems that capture RGB images along with pixel-wise depth information. Although they are widely used in various applications, RGB-D sensors have significant drawbacks with respect to 3D dense mapping of indoor environments. First, they only allow a measurement range with a limited distance (e.g., within 3 m) and a limited field of view. Second, the error of the depth measurement increases with increasing distance to the sensor. In this paper, we propose an enhanced RGB-D mapping method for detailed 3D modeling of large indoor environments by combining RGB image-based modeling and depth-based modeling. The scale ambiguity problem during pose estimation with RGB image sequences can be resolved by integrating the depth and visual information provided by the proposed system. A robust rigid-transformation recovery method is developed to register the RGB image-based and depth-based 3D models together. The proposed method is examined with two datasets collected in indoor environments, and the experimental results demonstrate the feasibility and robustness of the proposed method.
9

Lee, Ki-Seung. "Improving the Performance of Automatic Lip-Reading Using Image Conversion Techniques." Electronics 13, no. 6 (March 9, 2024): 1032. http://dx.doi.org/10.3390/electronics13061032.

Abstract:
Variation in lighting conditions is a major cause of performance degradation in pattern recognition when using optical imaging. In this study, infrared (IR) and depth images were considered as possible robust alternatives against variations in illumination, particularly for improving the performance of automatic lip-reading. The variations due to lighting conditions were quantitatively analyzed for optical, IR, and depth images. Then, deep neural network (DNN)-based lip-reading rules were built for each image modality. Speech recognition techniques based on IR or depth imaging required an additional light source that emitted light in the IR range, along with a special camera. To mitigate this problem, we propose a method that does not use an IR/depth image directly, but instead estimates images based on the optical RGB image. To this end, a modified U-net was adopted to estimate the IR/depth image from an optical RGB image. The results show that the IR and depth images were rarely affected by the lighting conditions. The recognition rates for the optical, IR, and depth images were 48.29%, 95.76%, and 92.34%, respectively, under various lighting conditions. Using the estimated IR and depth images, the recognition rates were 89.35% and 80.42%, respectively. This was significantly higher than for the optical RGB images.
10

Kao, Yueying, Weiming Li, Qiang Wang, Zhouchen Lin, Wooshik Kim, and Sunghoon Hong. "Synthetic Depth Transfer for Monocular 3D Object Pose Estimation in the Wild." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 07 (April 3, 2020): 11221–28. http://dx.doi.org/10.1609/aaai.v34i07.6781.

Abstract:
Monocular object pose estimation is an important yet challenging computer vision problem. Depth features can provide useful information for pose estimation. However, existing methods rely on real depth images to extract depth features, which limits their use in various applications. In this paper, we aim at extracting RGB and depth features from a single RGB image with the help of synthetic RGB-depth image pairs for object pose estimation. Specifically, a deep convolutional neural network is proposed with an RGB-to-Depth Embedding module and a Synthetic-Real Adaptation module. The embedding module is trained with synthetic pair data to learn a depth-oriented embedding space between RGB and depth images optimized for object pose estimation. The adaptation module further aligns distributions from synthetic to real data. Compared to existing methods, our method does not need any real depth images and can be trained easily with large-scale synthetic data. Extensive experiments and comparisons show that our method achieves the best performance on the challenging public PASCAL 3D+ dataset in all metrics, which substantiates the superiority of our method and the above modules.
11

Ding, Ing-Jr, and Nai-Wei Zheng. "CNN Deep Learning with Wavelet Image Fusion of CCD RGB-IR and Depth-Grayscale Sensor Data for Hand Gesture Intention Recognition." Sensors 22, no. 3 (January 21, 2022): 803. http://dx.doi.org/10.3390/s22030803.

Abstract:
Pixel-based images captured by a charge-coupled device (CCD) with infrared (IR) LEDs around the image sensor are the well-known CCD Red–Green–Blue IR (so-called CCD RGB-IR) data. CCD RGB-IR data are generally acquired for video surveillance applications. Currently, CCD RGB-IR information has been further used to perform human gesture recognition in surveillance. Gesture recognition, including hand gesture intention recognition, is attracting great attention in the field of deep neural network (DNN) calculations. To further enhance conventional CCD RGB-IR gesture recognition by DNN, this work proposes a deep learning framework for gesture recognition in which a convolutional neural network (CNN) incorporated with wavelet image fusion of CCD RGB-IR and additional depth-based depth-grayscale images (captured from the depth sensors of the Microsoft Kinect device) is constructed for gesture intention recognition. In the proposed CNN with wavelet image fusion, a five-level discrete wavelet transformation (DWT) with three different wavelet decomposition merge strategies, namely, max-min, min-max and mean-mean, is employed; the visual geometry group (VGG)-16 CNN is used for deep learning and recognition of the wavelet-fused gesture images. Experiments on the classification of ten hand gesture intention actions (specified in a scenario of laboratory interactions) show that by additionally incorporating depth-grayscale data into CCD RGB-IR gesture recognition, the average recognition accuracy can be further increased to 83.88% for the VGG-16 CNN with min-max wavelet image fusion of the CCD RGB-IR and depth-grayscale data, which is clearly superior to the 75.33% of the VGG-16 CNN with CCD RGB-IR data only.
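DWT-based fusion of two grayscale frames can be sketched with PyWavelets. The interpretation of "min-max" below (one rule for the approximation band, another for the detail bands), the `db2` wavelet, and the synthetic frames are assumptions made for illustration and may differ from the paper's exact merge strategies.

```python
import numpy as np
import pywt

def dwt_fuse(img_a, img_b, wavelet='db2', level=5,
             approx_rule='min', detail_rule='max'):
    """Fuse two grayscale images in the wavelet domain.

    Both images are decomposed with a 5-level 2D DWT; the approximation
    band and the detail bands are merged with separate rules, and the
    fused coefficients are inverted back to an image.
    """
    rules = {'min': np.minimum, 'max': np.maximum,
             'mean': lambda x, y: 0.5 * (x + y)}
    ca = pywt.wavedec2(img_a.astype(np.float32), wavelet, level=level)
    cb = pywt.wavedec2(img_b.astype(np.float32), wavelet, level=level)
    fused = [rules[approx_rule](ca[0], cb[0])]               # approximation band
    for (ha, va, da), (hb, vb, db) in zip(ca[1:], cb[1:]):
        fused.append(tuple(rules[detail_rule](x, y)          # detail bands
                           for x, y in ((ha, hb), (va, vb), (da, db))))
    return pywt.waverec2(fused, wavelet)

# toy usage: fuse a synthetic RGB-IR frame with a depth-grayscale frame
rgb_ir = np.random.rand(128, 128)
depth_gray = np.random.rand(128, 128)
fused = dwt_fuse(rgb_ir, depth_gray, approx_rule='min', detail_rule='max')
```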
12

Kostusiak, Aleksander, and Piotr Skrzypczyński. "Enhancing Visual Odometry with Estimated Scene Depth: Leveraging RGB-D Data with Deep Learning." Electronics 13, no. 14 (July 13, 2024): 2755. http://dx.doi.org/10.3390/electronics13142755.

Abstract:
Advances in visual odometry (VO) systems have benefited from the widespread use of affordable RGB-D cameras, improving indoor localization and mapping accuracy. However, older sensors like the Kinect v1 face challenges due to depth inaccuracies and incomplete data. This study compares indoor VO systems that use RGB-D images, exploring methods to enhance depth information. We examine conventional image inpainting techniques and a deep learning approach, utilizing newer depth data from devices like the Kinect v2. Our research highlights the importance of refining data from lower-quality sensors, which is crucial for cost-effective VO applications. By integrating deep learning models with richer context from RGB images and more comprehensive depth references, we demonstrate improved trajectory estimation compared to standard methods. This work advances budget-friendly RGB-D VO systems for indoor mobile robots, emphasizing deep learning’s role in leveraging connections between image appearance and depth data.
13

Zhao, Bohu, Lebao Li, and Haipeng Pan. "Non-Local Means Hole Repair Algorithm Based on Adaptive Block." Applied Sciences 14, no. 1 (December 24, 2023): 159. http://dx.doi.org/10.3390/app14010159.

Abstract:
RGB-D cameras provide depth and color information and are widely used in 3D reconstruction and computer vision. In the majority of existing RGB-D cameras, a considerable portion of depth values is often lost due to severe occlusion or limited camera coverage, thereby adversely impacting the precise localization and three-dimensional reconstruction of objects. In this paper, to address the issue of poor-quality depth images captured by RGB-D cameras, a depth image hole repair algorithm based on non-local means is proposed first, leveraging the structural similarities between grayscale and depth images. Second, considering the cumbersome parameter tuning associated with the non-local means hole repair method for determining the size of structural blocks, an intelligent block factor is introduced, which automatically determines the optimal search and repair block sizes for various hole sizes, resulting in an adaptive block-based non-local means algorithm for repairing depth image holes. Furthermore, the proposed algorithm’s performance is evaluated using both the Middlebury stereo matching dataset and a self-constructed RGB-D dataset, with performance assessment carried out by comparing the algorithm against other methods using five metrics: RMSE, SSIM, PSNR, DE, and ALME. Finally, experimental results demonstrate that the algorithm resolves the parameter-tuning complexity inherent in depth image hole repair, effectively fills the holes, suppresses noise within depth images, enhances image quality, and achieves elevated precision and accuracy.
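A non-local, grayscale-guided fill can be sketched as below. This is a heavily simplified fixed-block version for illustration; the adaptive block selection that is the paper's contribution is not reproduced, and the patch size, search radius, and sigma are assumed values.

```python
import numpy as np

def nlm_fill(depth, gray, patch=3, search=15, sigma=10.0):
    """Fill zero-valued depth holes with a non-local weighted mean.

    For each hole pixel, grayscale patches inside a search window are
    compared to the patch around the hole; depth values at similar patches
    (with valid depth) are averaged with exponential weights.
    """
    h, w = depth.shape
    out = depth.astype(np.float32).copy()
    pg = np.pad(gray.astype(np.float32), patch, mode='edge')
    for i, j in zip(*np.nonzero(depth == 0)):
        ref = pg[i:i + 2 * patch + 1, j:j + 2 * patch + 1]
        num = den = 0.0
        for y in range(max(0, i - search), min(h, i + search + 1)):
            for x in range(max(0, j - search), min(w, j + search + 1)):
                if depth[y, x] == 0:
                    continue                          # only borrow valid depth
                cand = pg[y:y + 2 * patch + 1, x:x + 2 * patch + 1]
                wgt = np.exp(-np.mean((cand - ref) ** 2) / (2 * sigma ** 2))
                num += wgt * depth[y, x]
                den += wgt
        if den > 0:
            out[i, j] = num / den
    return out
```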
14

Peng, M., W. Wan, Y. Xing, Y. Wang, Z. Liu, K. Di, Q. Zhao, B. Teng, and X. Mao. "INTEGRATING DEPTH AND IMAGE SEQUENCES FOR PLANETARY ROVER MAPPING USING RGB-D SENSOR." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-3 (April 30, 2018): 1369–74. http://dx.doi.org/10.5194/isprs-archives-xlii-3-1369-2018.

Abstract:
An RGB-D camera allows the capture of depth and color information at high data rates, which makes it possible and beneficial to integrate depth and image sequences for planetary rover mapping. The proposed mapping method consists of three steps. First, the strict projection relationship among 3D space, depth data and visual texture data is established based on the imaging principle of the RGB-D camera; then, an extended bundle adjustment (BA)-based SLAM method with integrated 2D and 3D measurements is applied to the image network for high-precision pose estimation. Next, as the interior and exterior orientation elements of the RGB image sequence are available, dense matching is completed with the CMPMVS tool. Finally, according to the registration parameters after ICP, the 3D scene from the RGB images can be registered to the 3D scene from the depth images, and the fused point cloud can be obtained. An experiment was performed in an outdoor field to simulate the lunar surface. The experimental results demonstrated the feasibility of the proposed method.
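The projection relationship and the final fusion step can be illustrated with the standard pinhole equations; this sketch assumes placeholder intrinsics and an already-computed 4x4 registration (e.g., an ICP result) and is not tied to the paper's specific pipeline.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (metres) into an N x 3 point cloud with the
    pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    v, u = np.indices(depth.shape)
    z = depth.astype(np.float32)
    valid = z > 0
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=1)

def fuse_clouds(points_depth, points_rgb, T_rgb_to_depth):
    """Apply a 4x4 rigid transform (e.g. an ICP result) to the image-based
    cloud and concatenate it with the depth-based cloud."""
    homo = np.c_[points_rgb, np.ones(len(points_rgb))]
    aligned = (T_rgb_to_depth @ homo.T).T[:, :3]
    return np.vstack([points_depth, aligned])

# toy usage with placeholder intrinsics and an identity registration
depth = np.full((480, 640), 2.0, np.float32)
cloud_d = depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
cloud_rgb = cloud_d + np.random.normal(scale=0.01, size=cloud_d.shape)
fused = fuse_clouds(cloud_d, cloud_rgb, np.eye(4))
```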
15

Zhang, Xiaomin, Yanning Zhang, Jinfeng Geng, Jinming Pan, Xinyao Huang, and Xiuqin Rao. "Feather Damage Monitoring System Using RGB-Depth-Thermal Model for Chickens." Animals 13, no. 1 (December 28, 2022): 126. http://dx.doi.org/10.3390/ani13010126.

Abstract:
Feather damage is a continuous health and welfare challenge among laying hens. Infrared thermography is a tool that can evaluate the changes in the surface temperature, derived from an inflammatory process that would make it possible to objectively determine the depth of the damage to the dermis. Therefore, the objective of this article was to develop an approach to feather damage assessment based on visible light and infrared thermography. Fusing information obtained from these two bands can highlight their strengths, which is more evident in the assessment of feather damage. A novel pipeline was proposed to reconstruct the RGB-Depth-Thermal maps of the chicken using binocular color cameras and a thermal infrared camera. The process of stereo matching based on binocular color images allowed for a depth image to be obtained. Then, a heterogeneous image registration method was presented to achieve image alignment between thermal infrared and color images so that the thermal infrared image was also aligned with the depth image. The chicken image was segmented from the background using a deep learning-based network based on the color and depth images. Four kinds of images, namely, color, depth, thermal and mask, were utilized as inputs to reconstruct the 3D model of a chicken with RGB-Depth-Thermal maps. The depth of feather damage can be better assessed with the proposed model compared to the 2D thermal infrared image or color image during both day and night, which provided a reference for further research in poultry farming.
16

Chen, Songnan, Mengxia Tang, Ruifang Dong, and Jiangming Kan. "Encoder–Decoder Structure Fusing Depth Information for Outdoor Semantic Segmentation." Applied Sciences 13, no. 17 (September 1, 2023): 9924. http://dx.doi.org/10.3390/app13179924.

Abstract:
The semantic segmentation of outdoor images is the cornerstone of scene understanding and plays a crucial role in the autonomous navigation of robots. Although RGB-D images can provide additional depth information for improving the performance of semantic segmentation tasks, current state-of-the-art methods directly use ground truth depth maps for depth information fusion, which relies on highly developed and expensive depth sensors. Aiming to solve such a problem, we proposed a self-calibrated RGB-D image semantic segmentation neural network model based on an improved residual network without relying on depth sensors, which utilizes multi-modal information from depth maps predicted with depth estimation models and RGB image fusion for image semantic segmentation to enhance the understanding of a scene. First, we designed a novel convolution neural network (CNN) with an encoding and decoding structure as our semantic segmentation model. The encoder was constructed using IResNet to extract the semantic features of the RGB image and the predicted depth map and then effectively fuse them with the self-calibration fusion structure. The decoder restored the resolution of the output features with a series of successive upsampling structures. Second, we presented a feature pyramid attention mechanism to extract the fused information at multiple scales and obtain features with rich semantic information. The experimental results using the publicly available Cityscapes dataset and collected forest scene images show that our model trained with the estimated depth information can achieve comparable performance to the ground truth depth map in improving the accuracy of the semantic segmentation task and even outperforming some competitive methods.
17

Büker, Linda Christin, Finnja Zuber, Andreas Hein, and Sebastian Fudickar. "HRDepthNet: Depth Image-Based Marker-Less Tracking of Body Joints." Sensors 21, no. 4 (February 14, 2021): 1356. http://dx.doi.org/10.3390/s21041356.

Abstract:
With approaches for the detection of joint positions in color images such as HRNet and OpenPose being available, consideration of corresponding approaches for depth images is limited, even though depth images have several advantages over color images, such as robustness to light variation and color and texture invariance. Correspondingly, we introduce High-Resolution Depth Net (HRDepthNet)—a machine learning driven approach to detect human joints (body, head, and upper and lower extremities) in purely depth images. HRDepthNet retrains the original HRNet for depth images. Therefore, a dataset is created holding depth (and RGB) images recorded with subjects conducting the timed up and go test—an established geriatric assessment. The images were manually annotated based on the RGB images. The training and evaluation were conducted with this dataset. For accuracy evaluation, detection of body joints was evaluated via COCO’s evaluation metrics and indicated that the resulting depth image-based model achieved better results than the HRNet trained and applied on corresponding RGB images. An additional evaluation of the position errors showed a median deviation of 1.619 cm (x-axis), 2.342 cm (y-axis) and 2.4 cm (z-axis).
18

Yan, Zhiqiang, Hongyuan Wang, Qianhao Ning, and Yinxi Lu. "Robust Image Matching Based on Image Feature and Depth Information Fusion." Machines 10, no. 6 (June 8, 2022): 456. http://dx.doi.org/10.3390/machines10060456.

Abstract:
In this paper, we propose a robust image feature extraction and fusion method to effectively fuse image feature and depth information and improve the registration accuracy of RGB-D images. The proposed method directly splices the image feature point descriptors with the corresponding point cloud feature descriptors to obtain the fusion descriptor of the feature points. The fusion feature descriptor is constructed based on the SIFT, SURF, and ORB feature descriptors and the PFH and FPFH point cloud feature descriptors. Furthermore, the registration performance based on fusion features is tested through the RGB-D datasets of YCB and KITTI. ORBPFH reduces the false-matching rate by 4.66~16.66%, and ORBFPFH reduces the false-matching rate by 9~20%. The experimental results show that the RGB-D robust feature extraction and fusion method proposed in this paper is suitable for the fusion of ORB with PFH and FPFH, which can improve feature representation and registration, representing a novel approach for RGB-D image matching.
19

Uddin, Md Kamal, Amran Bhuiyan, and Mahmudul Hasan. "Fusion in Dissimilarity Space Between RGB-D and Skeleton for Person Re-Identification." International Journal of Innovative Technology and Exploring Engineering 10, no. 12 (October 30, 2021): 69–75. http://dx.doi.org/10.35940/ijitee.l9566.10101221.

Abstract:
Person re-identification (Re-id) is one of the important tools of video surveillance systems, which aims to recognize an individual across the multiple disjoint sensors of a camera network. Despite the recent advances in RGB camera-based person re-identification methods under normal lighting conditions, Re-id researchers fail to take advantage of modern RGB-D sensor-based additional information (e.g., depth and skeleton information). When traditional RGB-based cameras fail to capture video under poor illumination conditions, RGB-D sensor-based additional information can be advantageous in tackling these constraints. This work takes depth images and skeleton joint points as additional information along with RGB appearance cues and proposes a person re-identification method. We combine 4-channel RGB-D image features with skeleton information using a score-level fusion strategy in dissimilarity space to increase re-identification accuracy. Moreover, our proposed method overcomes the illumination problem because we use illumination-invariant depth images and skeleton information. We carried out rigorous experiments on two publicly available RGBD-ID re-identification datasets and showed that the use of combined features of 4-channel RGB-D images and skeleton information boosts the rank-1 recognition accuracy.
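Score-level fusion in dissimilarity space amounts to combining two normalised probe-gallery distance matrices; a minimal sketch is given below, where the min-max normalisation, the weight `alpha`, and the matrix names are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def fused_distances(d_rgbd, d_skel, alpha=0.6):
    """Score-level fusion in dissimilarity space: normalise each distance
    matrix (probe x gallery) to [0, 1] and combine with a weighted sum."""
    def minmax(d):
        return (d - d.min()) / (d.max() - d.min() + 1e-12)
    return alpha * minmax(d_rgbd) + (1 - alpha) * minmax(d_skel)

def rank1_accuracy(dist, probe_ids, gallery_ids):
    """Fraction of probes whose nearest gallery entry shares their identity."""
    nearest = gallery_ids[np.argmin(dist, axis=1)]
    return (nearest == probe_ids).mean()
```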
20

Jiao, Yuzhong, Kayton Wai Keung Cheung, Mark Ping Chan Mok, and Yiu Kei Li. "Spatial Distance-based Interpolation Algorithm for Computer Generated 2D+Z Images." Electronic Imaging 2020, no. 2 (January 26, 2020): 140–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.2.sda-140.

Abstract:
Computer generated 2D plus Depth (2D+Z) images are common input data for 3D displays with the depth image-based rendering (DIBR) technique. Due to their simplicity, linear interpolation methods are usually used to convert low-resolution images into high-resolution images, not only for depth maps but also for 2D RGB images. However, linear methods suffer from zigzag artifacts in both the depth map and the RGB images, which severely affects the 3D visual experience. In this paper, a spatial distance-based interpolation algorithm for computer generated 2D+Z images is proposed. The method interpolates RGB images with the help of depth and edge information from depth maps. The spatial distance from the interpolated pixel to the surrounding available pixels is utilized to obtain the weight factors of the surrounding pixels. Experimental results show that such spatial distance-based interpolation can achieve sharp edges and fewer artifacts for 2D RGB images. Naturally, it can improve the performance of 3D displays. Since bilinear interpolation is used in homogeneous areas, the proposed algorithm keeps the computational complexity low.
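A minimal sketch of depth/edge-aware upsampling along these lines is shown below; the depth-edge threshold, the weight suppression factor, and the per-channel interface are assumptions made for illustration, not the published algorithm.

```python
import numpy as np

def depth_guided_upsample(channel, depth, scale, edge_thresh=10.0):
    """Upsample one 2D+Z channel by an integer factor.

    Each high-resolution pixel is a weighted mean of the four surrounding
    low-resolution pixels; the weights start as plain bilinear (spatial
    distance) weights and are strongly reduced for neighbours lying across
    a depth edge, so object boundaries stay sharp.  In depth-homogeneous
    areas this degenerates to ordinary bilinear interpolation.
    """
    h, w = channel.shape
    H, W = h * scale, w * scale
    out = np.zeros((H, W), np.float32)
    for Y in range(H):
        for X in range(W):
            y, x = Y / scale, X / scale
            y0, x0 = min(int(y), h - 2), min(int(x), w - 2)
            dy, dx = min(y - y0, 1.0), min(x - x0, 1.0)
            nbrs = [(y0, x0, (1 - dy) * (1 - dx)), (y0, x0 + 1, (1 - dy) * dx),
                    (y0 + 1, x0, dy * (1 - dx)), (y0 + 1, x0 + 1, dy * dx)]
            d_ref = depth[min(int(round(y)), h - 1), min(int(round(x)), w - 1)]
            vsum = wsum = 0.0
            for yy, xx, wgt in nbrs:
                if abs(float(depth[yy, xx]) - float(d_ref)) > edge_thresh:
                    wgt *= 1e-3                       # suppress cross-edge neighbours
                vsum += wgt * channel[yy, xx]
                wsum += wgt
            out[Y, X] = vsum / wsum
    return out
```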
21

Wang, Z., T. Li, L. Pan, and Z. Kang. "SCENE SEMANTIC SEGMENTATION FROM INDOOR RGB-D IMAGES USING ENCODE-DECODER FULLY CONVOLUTIONAL NETWORKS." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W7 (September 12, 2017): 397–404. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w7-397-2017.

Abstract:
With increasing attention on the indoor environment and the development of low-cost RGB-D sensors, indoor RGB-D images are easily acquired. However, scene semantic segmentation is still an open area, which restricts indoor applications. The depth information can help to distinguish regions that are difficult to segment out of the RGB images because of similar color or texture in indoor scenes. How to utilize the depth information is the key problem of semantic segmentation for RGB-D images. In this paper, we propose an Encode-Decoder Fully Convolutional Network for RGB-D image classification. We use the Multiple Kernel Maximum Mean Discrepancy (MK-MMD) as a distance measure to find common and special features of RGB and D images in the network to enhance the classification performance automatically. To explore better ways of applying MMD, we designed two strategies: the first calculates MMD for each feature map, and the other calculates MMD for the whole batch of features. Based on the result of classification, we use fully connected CRFs for the semantic segmentation. The experimental results show that our method can achieve good performance on indoor RGB-D image semantic segmentation.
22

Zheng, Huiming, and Wei Gao. "End-to-End RGB-D Image Compression via Exploiting Channel-Modality Redundancy." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (March 24, 2024): 7562–70. http://dx.doi.org/10.1609/aaai.v38i7.28588.

Abstract:
As a kind of 3D data, RGB-D images have been extensively used in object tracking, 3D reconstruction, remote sensing mapping, and other tasks. In the realm of computer vision, the significance of RGB-D images is progressively growing. However, existing learning-based image compression methods usually process RGB images and depth images separately, which cannot fully exploit the redundant information between the modalities, limiting further improvement of the rate-distortion performance. To overcome this defect, in this paper we propose a learning-based dual-branch RGB-D image compression framework. Compared with the traditional RGB-domain compression scheme, a YUV-domain compression scheme is presented for spatial redundancy removal. In addition, Intra-Modality Attention (IMA) and Cross-Modality Attention (CMA) are introduced for modal redundancy removal. In order to benefit from cross-modal prior information, a Context Prediction Module (CPM) and a Context Fusion Module (CFM) are introduced in the conditional entropy model, which makes the context probability prediction more accurate. The experimental results demonstrate that our method outperforms existing image compression methods on two RGB-D image datasets. Compared with BPG, our proposed framework can achieve up to 15% bit rate savings for RGB images.
23

Jiang, Ming-xin, Chao Deng, Ming-min Zhang, Jing-song Shan, and Haiyan Zhang. "Multimodal Deep Feature Fusion (MMDFF) for RGB-D Tracking." Complexity 2018 (November 28, 2018): 1–8. http://dx.doi.org/10.1155/2018/5676095.

Abstract:
Visual tracking is still a challenging task due to occlusion, appearance changes, complex motion, etc. We propose a novel RGB-D tracker based on multimodal deep feature fusion (MMDFF) in this paper. The MMDFF model consists of four deep convolutional neural networks (CNNs): a Motion-specific CNN, an RGB-specific CNN, a Depth-specific CNN, and an RGB-Depth correlated CNN. The depth image is encoded into three channels which are sent into the Depth-specific CNN to extract deep depth features. The optical flow image is calculated for every frame and then fed to the Motion-specific CNN to learn deep motion features. Deep RGB, depth, and motion information can be effectively fused at multiple layers via the MMDFF model. Finally, the multimodal fused deep features are sent into the C-COT tracker to obtain the tracking result. For evaluation, experiments are conducted on two recent large-scale RGB-D datasets, and the results demonstrate that our proposed RGB-D tracking method achieves better performance than other state-of-the-art RGB-D trackers.
24

Kozlova, Y. K., and V. V. Myasnikov. "Head model reconstruction and animation method using color image with depth information." Computer Optics 48, no. 1 (February 2024): 118–22. http://dx.doi.org/10.18287/2412-6179-co-1334.

Abstract:
The article presents a method for reconstructing and animating a digital model of a human head from a single RGBD image, a color RGB image with depth information. An approach is proposed for optimizing the parametric FLAME model using a point cloud of a face corresponding to a single RGBD image. The results of experimental studies have shown that the proposed optimization approach makes it possible to obtain a head model with more prominent features of the original face compared to optimization approaches using RGB images or the same approaches generalized to RGBD images.
25

Lv, Ying, and Wujie Zhou. "Hierarchical Multimodal Adaptive Fusion (HMAF) Network for Prediction of RGB-D Saliency." Computational Intelligence and Neuroscience 2020 (November 20, 2020): 1–9. http://dx.doi.org/10.1155/2020/8841681.

Abstract:
Visual saliency prediction for RGB-D images is more challenging than that for their RGB counterparts. Additionally, very few investigations have been undertaken concerning RGB-D-saliency prediction. The proposed study presents a method based on a hierarchical multimodal adaptive fusion (HMAF) network to facilitate end-to-end prediction of RGB-D saliency. In the proposed method, hierarchical (multilevel) multimodal features are first extracted from an RGB image and depth map using a VGG-16-based two-stream network. Subsequently, the most significant hierarchical features of the said RGB image and depth map are predicted using three two-input attention modules. Furthermore, adaptive fusion of saliencies concerning the above-mentioned fused saliency features of different levels (hierarchical fusion saliency features) can be accomplished using a three-input attention module to facilitate high-accuracy RGB-D visual saliency prediction. Comparisons based on the application of the proposed HMAF-based approach against those of other state-of-the-art techniques on two challenging RGB-D datasets demonstrate that the proposed method outperforms other competing approaches consistently by a considerable margin.
26

Sun, Wenbo, Zhi Gao, Jinqiang Cui, Bharath Ramesh, Bin Zhang, and Ziyao Li. "Semantic Segmentation Leveraging Simultaneous Depth Estimation." Sensors 21, no. 3 (January 20, 2021): 690. http://dx.doi.org/10.3390/s21030690.

Abstract:
Semantic segmentation is one of the most widely studied problems in the computer vision community and contributes to a variety of applications. Many learning-based approaches, such as convolutional neural networks (CNNs), have made great progress on this problem. While rich context information of the input images can be learned from multi-scale receptive fields by convolutions with deep layers, traditional CNNs have great difficulty in learning the geometrical relationship and distribution of objects in the RGB image due to the lack of depth information, which may lead to inferior segmentation quality. To solve this problem, we propose a method that improves segmentation quality with depth estimation on RGB images. Specifically, we estimate depth information on RGB images via a depth estimation network, and then feed the depth map into a CNN which is able to guide the semantic segmentation. Furthermore, in order to parse the depth map and RGB images simultaneously, we construct a multi-branch encoder–decoder network and fuse the RGB and depth features step by step. Extensive experimental evaluation on four baseline networks demonstrates that our proposed method can enhance the segmentation quality considerably and obtain better performance compared to other segmentation networks.
27

Cai, Ziyun, Yang Long, and Ling Shao. "Adaptive RGB Image Recognition by Visual-Depth Embedding." IEEE Transactions on Image Processing 27, no. 5 (May 2018): 2471–83. http://dx.doi.org/10.1109/tip.2018.2806839.

28

Kanda, Takuya, Kazuya Miyakawa, Jeonghwang Hayashi, Jun Ohya, Hiroyuki Ogata, Kenji Hashimoto, Xiao Sun, Takashi Matsuzawa, Hiroshi Naito, and Atsuo Takanishi. "Locating Mechanical Switches Using RGB-D Sensor Mounted on a Disaster Response Robot." Electronic Imaging 2020, no. 6 (January 26, 2020): 16–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.6.iriacv-016.

Abstract:
To achieve one of the tasks required for disaster response robots, this paper proposes a method for locating the points of 3D structured switches to be pressed by the robot in disaster sites, using RGB-D images acquired by a Kinect sensor attached to our disaster response robot. Our method consists of the following five steps: 1) Obtain RGB and depth images using an RGB-D sensor. 2) Detect the bounding box of the switch area from the RGB image using YOLOv3. 3) Generate 3D point cloud data of the target switch by combining the bounding box and the depth image. 4) Detect the center position of the switch button from the RGB image in the bounding box using a convolutional neural network (CNN). 5) Estimate the center of the button’s face in real space from the detection result in step 4) and the 3D point cloud data generated in step 3). In the experiment, the proposed method is applied to two types of 3D structured switch boxes to evaluate its effectiveness. The results show that our proposed method can locate the switch button accurately enough for robot operation.
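Steps 3) and 5) essentially combine a detector box, the depth image, and the pinhole model; the sketch below is a simplified stand-in (median depth inside the box, placeholder Kinect-like intrinsics, millimetre depth units assumed), not the paper's exact estimation procedure.

```python
import numpy as np

def switch_center_3d(depth, bbox, center_px, fx, fy, cx, cy):
    """Estimate the 3D position of a detected switch button.

    `bbox` is the (x0, y0, x1, y1) detector output and `center_px` the button
    centre found inside it; the median of the valid depth values in the box
    gives a robust Z, which is back-projected with the pinhole model.
    """
    x0, y0, x1, y1 = bbox
    patch = depth[y0:y1, x0:x1].astype(np.float32)
    z = np.median(patch[patch > 0]) / 1000.0          # mm -> m (assumed units)
    u, v = center_px
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

# toy usage with a synthetic frame at 0.8 m and placeholder intrinsics
depth = np.full((480, 640), 800, np.uint16)
print(switch_center_3d(depth, (300, 200, 340, 240), (320, 220),
                       fx=525.0, fy=525.0, cx=319.5, cy=239.5))
```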
29

Salazar, Isail, Said Pertuz, and Fabio Martínez. "Multi-modal RGB-D Image Segmentation from Appearance and Geometric Depth Maps." TecnoLógicas 23, no. 48 (May 15, 2020): 143–61. http://dx.doi.org/10.22430/22565337.1538.

Abstract:
Classical image segmentation algorithms exploit the detection of similarities and discontinuities of different visual cues to define and differentiate multiple regions of interest in images. However, due to the high variability and uncertainty of image data, producing accurate results is difficult. In other words, segmentation based just on color is often insufficient for a large percentage of real-life scenes. This work presents a novel multi-modal segmentation strategy that integrates depth and appearance cues from RGB-D images by building a hierarchical region-based representation, i.e., a multi-modal segmentation tree (MM-tree). For this purpose, RGB-D image pairs are represented in a complementary fashion by different segmentation maps. Based on color images, a color segmentation tree (C-tree) is created to obtain segmented and over-segmented maps. From depth images, two independent segmentation maps are derived by computing planar and 3D edge primitives. Then, an iterative region merging process can be used to locally group the previously obtained maps into the MM-tree. Finally, the top emerging MM-tree level coherently integrates the available information from depth and appearance maps. The experiments were conducted using the NYU-Depth V2 RGB-D dataset, which demonstrated the competitive results of our strategy compared to state-of-the-art segmentation methods. Specifically, using test images, our method reached average scores of 0.56 in Segmentation Covering and 2.13 in Variation of Information.
30

Feng, Guanyuan, Lin Ma, and Xuezhi Tan. "Visual Map Construction Using RGB-D Sensors for Image-Based Localization in Indoor Environments." Journal of Sensors 2017 (2017): 1–18. http://dx.doi.org/10.1155/2017/8037607.

Abstract:
RGB-D sensors capture RGB images and depth images simultaneously, which makes it possible to acquire the depth information at pixel level. This paper focuses on the use of RGB-D sensors to construct a visual map which is an extended dense 3D map containing essential elements for image-based localization, such as poses of the database camera, visual features, and 3D structures of the building. Taking advantage of matched visual features and corresponding depth values, a novel local optimization algorithm is proposed to achieve point cloud registration and database camera pose estimation. Next, graph-based optimization is used to obtain the global consistency of the map. On the basis of the visual map, the image-based localization method is investigated, making use of the epipolar constraint. The performance of the visual map construction and the image-based localization are evaluated on typical indoor scenes. The simulation results show that the average position errors of the database camera and the query camera can be limited to within 0.2 meters and 0.9 meters, respectively.
31

Li, Shipeng, Di Li, Chunhua Zhang, Jiafu Wan, and Mingyou Xie. "RGB-D Image Processing Algorithm for Target Recognition and Pose Estimation of Visual Servo System." Sensors 20, no. 2 (January 12, 2020): 430. http://dx.doi.org/10.3390/s20020430.

Abstract:
This paper studies the control performance of a visual servoing system with planar and RGB-D cameras. Its contribution is to strengthen performance indicators of the visual servoing system, such as real-time operation and accuracy, through rapid identification of target RGB-D images and precise measurement in the depth direction. Firstly, color images acquired by the RGB-D camera are segmented based on optimized normalized cuts. Next, the gray scale is restored according to the histogram feature of the target image. Then, the obtained 2D depth information and the enhanced gray-image information are merged to complete the target pose estimation based on the Hausdorff distance, and the current image pose is matched with the target image pose. The end angle and speed of the robot are calculated to complete a control cycle, and the process is iterated until the servo task is completed. Finally, the accuracy and real-time performance of the control system based on the proposed algorithm are tested under a position-based visual servoing system. The results validate that the RGB-D image processing algorithm proposed in this paper delivers the required performance in these aspects of the visual servoing system.
32

Hristova, H., M. Abegg, C. Fischer, and N. Rehush. "MONOCULAR DEPTH ESTIMATION IN FOREST ENVIRONMENTS." International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2022 (May 30, 2022): 1017–23. http://dx.doi.org/10.5194/isprs-archives-xliii-b2-2022-1017-2022.

Abstract:
Depth estimation from a single image is a challenging task, especially inside the highly structured forest environment. In this paper, we propose a supervised deep learning model for monocular depth estimation based on forest imagery. We train our model on a new data set of forest RGB-D images that we collected using a terrestrial laser scanner. Alongside the input RGB image, our model uses a sparse depth channel as input to recover the dense depth information. The prediction accuracy of our model is significantly higher than that of state-of-the-art methods when applied in the context of forest depth estimation. Our model brings the RMSE down to 2.1 m, compared to 4 m and above for reference methods.
33

Kong, Yuqiu, He Wang, Lingwei Kong, Yang Liu, Cuili Yao, and Baocai Yin. "Absolute and Relative Depth-Induced Network for RGB-D Salient Object Detection." Sensors 23, no. 7 (March 30, 2023): 3611. http://dx.doi.org/10.3390/s23073611.

Abstract:
Detecting salient objects in complicated scenarios is a challenging problem. In addition to semantic features from the RGB image, spatial information from the depth image also provides sufficient cues about the object. Therefore, it is crucial to rationally integrate RGB and depth features for the RGB-D salient object detection task. Most existing RGB-D saliency detectors modulate RGB semantic features with absolute depth values. However, they ignore the appearance contrast and structure knowledge indicated by relative depth values between pixels. In this work, we propose a depth-induced network (DIN) for RGB-D salient object detection, to take full advantage of both absolute and relative depth information, and further enforce the in-depth fusion of the RGB-D cross-modalities. Specifically, an absolute depth-induced module (ADIM) is proposed to hierarchically integrate absolute depth values and RGB features, to allow the interaction between the appearance and structural information in the encoding stage. A relative depth-induced module (RDIM) is designed to capture detailed saliency cues by exploring contrastive and structural information from relative depth values in the decoding stage. By combining the ADIM and RDIM, we can accurately locate salient objects with clear boundaries, even in complex scenes. The proposed DIN is a lightweight network, and the model size is much smaller than that of state-of-the-art algorithms. Extensive experiments on six challenging benchmarks show that our method outperforms most existing RGB-D salient object detection models.
34

Liu, Botao, Kai Chen, Sheng-Lung Peng, and Ming Zhao. "Depth Map Super-Resolution Based on Semi-Couple Deformable Convolution Networks." Mathematics 11, no. 21 (November 5, 2023): 4556. http://dx.doi.org/10.3390/math11214556.

Abstract:
Depth images obtained from lightweight, real-time depth estimation models and consumer-oriented sensors typically have low-resolution issues. Traditional interpolation methods for depth image up-sampling result in a significant information loss, especially in edges with discontinuous depth variations (depth discontinuities). To address this issue, this paper proposes a semi-coupled deformable convolution network (SCD-Net) based on the idea of guided depth map super-resolution (GDSR). The method employs a semi-coupled feature extraction scheme to learn unique and similar features between RGB images and depth images. We utilize a Coordinate Attention (CA) to suppress redundant information in RGB features. Finally, a deformable convolutional module is employed to restore the original resolution of the depth image. The model is tested on NYUv2, Middlebury, Lu, and a RealSense real-world dataset created using an Intel RealSense D455 structured-light camera. The super-resolution accuracy of SCD-Net at multiple scales is much higher than that of traditional methods and superior to recent state-of-the-art (SOTA) models, which demonstrates the effectiveness and flexibility of our model on GDSR tasks. In particular, our method further solves the problem of an RGB texture being over-transferred in GDSR tasks.
35

Du, Qinsheng, Yingxu Bian, Jianyu Wu, Shiyan Zhang, and Jian Zhao. "Cross-Modal Adaptive Interaction Network for RGB-D Saliency Detection." Applied Sciences 14, no. 17 (August 23, 2024): 7440. http://dx.doi.org/10.3390/app14177440.

Abstract:
The salient object detection (SOD) task aims to automatically detect the most prominent areas observed by the human eye in an image. Since RGB images and depth images contain different information, how to effectively integrate cross-modal features in the RGB-D SOD task remains a major challenge. Therefore, this paper proposes a cross-modal adaptive interaction network (CMANet) for the RGB-D salient object detection task, which consists of a cross-modal feature integration module (CMF) and an adaptive feature fusion module (AFFM). These modules are designed to integrate and enhance multi-scale features from both modalities, improve the effect of integrating cross-modal complementary information of RGB and depth images, enhance feature information, and generate richer and more representative feature maps. Extensive experiments were conducted on four RGB-D datasets to verify the effectiveness of CMANet. Compared with 17 RGB-D SOD methods, our model accurately detects salient regions in images and achieves state-of-the-art performance across four evaluation metrics.
36

Chinnala Balakrishna and Shepuri Srinivasulu. "Astronomical bodies detection with stacking of CoAtNets by fusion of RGB and depth Images." International Journal of Science and Research Archive 12, no. 2 (July 30, 2024): 423–27. http://dx.doi.org/10.30574/ijsra.2024.12.2.1234.

Abstract:
Space situational awareness (SSA) systems require the detection of space objects that vary in size, shape, and type. Space images are difficult because of factors such as illumination and noise, which make the recognition task complex. Image fusion is an important area in image processing for a variety of applications, including RGB-D sensor fusion, remote sensing, medical diagnostics, and infrared and visible image fusion. In recent times, various image fusion algorithms have been developed, and they have shown superior performance in exploring information that is not available in single images. In this paper, various methods of RGB and depth image fusion are compared for the space object classification task. Experiments were carried out, and the performance was evaluated using fusion performance metrics. It was found that guided filter context enhancement (GFCE) outperformed other image fusion methods in terms of average gradient, spatial frequency, and entropy. Additionally, due to its ability to balance good performance and inference speed, GFCE was selected for the RGB and depth image fusion stage before the feature extraction and classification stage. The outcome of the fusion method is merged images that were used to train a deep ensemble of CoAtNets to classify space objects into ten categories. Deep ensemble learning methods including bagging, boosting, and stacking were trained and evaluated for classification purposes. It was found that the combination of fusion and stacking was able to improve classification accuracy.
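The stacking stage can be illustrated with scikit-learn's generic stacked ensemble; the base learners, meta-learner, and random placeholder features below are stand-ins for the CoAtNet features and ensemble described in the abstract, chosen only to show the mechanics.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Features extracted from the fused (RGB + depth) images would replace the
# random placeholders below; ten classes mirror the ten object categories.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 128)), rng.integers(0, 10, size=500)

stack = StackingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=100)),
                ('svc', SVC(probability=True))],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5)
stack.fit(X, y)
print(stack.score(X, y))
```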
37

Zhou, Yang, Danqing Chen, Jun Wu, Mingyi Huang, and Yubin Weng. "Calibration of RGB-D Camera Using Depth Correction Model." Journal of Physics: Conference Series 2203, no. 1 (February 1, 2022): 012032. http://dx.doi.org/10.1088/1742-6596/2203/1/012032.

Abstract:
This paper proposes a calibration method for an RGB-D camera, especially its depth camera. First, a checkerboard calibration board is used under an auxiliary infrared light source to collect calibration images. Then, the internal and external parameters of the depth camera are calculated by Zhang’s calibration method, which improves the accuracy of the internal parameters. Next, a depth correction model is proposed to directly calibrate the distortion of the depth image, which is more intuitive and faster than the disparity distortion correction model. The method is simple, highly precise, and suitable for most depth cameras.
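The Zhang-style intrinsic calibration step maps directly onto OpenCV's checkerboard routines; the sketch below shows that step plus a simple quadratic depth-correction fit. The pattern size, square size, and the quadratic form of the correction are assumptions for illustration, not the paper's specific model.

```python
import numpy as np
import cv2

def calibrate_ir_camera(image_files, pattern=(9, 6), square_mm=25.0):
    """Zhang-style intrinsic calibration from checkerboard images captured
    under an auxiliary IR light source (OpenCV implementation of the method)."""
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm
    obj_pts, img_pts, size = [], [], None
    for fname in image_files:
        gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
        size = gray.shape[::-1]
        ok, corners = cv2.findChessboardCorners(gray, pattern)
        if ok:
            obj_pts.append(objp)
            img_pts.append(corners)
    rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    return rms, K, dist

def fit_depth_correction(measured, reference):
    """Quadratic correction d_true ≈ a*d² + b*d + c fitted to paired
    (measured, reference) depth samples; returns a callable corrector."""
    a, b, c = np.polyfit(np.asarray(measured, float),
                         np.asarray(reference, float), 2)
    return lambda d: a * d ** 2 + b * d + c
```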
38

Vashpanov, Yuriy, Jung-Young Son, Gwanghee Heo, Tatyana Podousova, and Yong Suk Kim. "Determination of Geometric Parameters of Cracks in Concrete by Image Processing." Advances in Civil Engineering 2019 (October 30, 2019): 1–14. http://dx.doi.org/10.1155/2019/2398124.

Abstract:
The 8-bit RGB image of a cracked concrete surface, obtained with a high-resolution camera through close-distance photography and an optical microscope, is used to estimate the geometrical parameters of the crack. Parameters such as the crack’s width, depth, and morphology can be determined from the pixel intensity distribution of the image. For the estimation, the image is transformed into 16-bit gray scale to enhance the geometrical parameters of the crack, and then a mathematical relationship relating the intensity distribution to the depth and width is derived based on the enhanced image. This relationship makes it possible to estimate the width and depth with ±10% and ±15% accuracy, respectively, for the crack samples used in the experiments. It is expected that the accuracy can be further improved if the 8-bit RGB image is synthesized from images of the cracks obtained with different illumination directions.
APA, Harvard, Vancouver, ISO, and other styles
39

Chi, Chen Tung, Shih Chien Yang, and Yin Tien Wang. "Calibration of RGB-D Sensors for Robot SLAM." Applied Mechanics and Materials 479-480 (December 2013): 677–81. http://dx.doi.org/10.4028/www.scientific.net/amm.479-480.677.

Full text
Abstract:
This paper presents a calibration procedure for a Kinect RGB-D sensor and its application to robot simultaneous localization and mapping (SLAM). The calibration procedure consists of two stages: in the first stage, the RGB image is aligned with the depth image using bilinear interpolation; in the second stage, the distorted RGB image is corrected. The calibrated RGB-D sensor is used as the sensing device for robot navigation in an unknown environment. In the SLAM tasks, speeded-up robust features (SURF) are detected in the RGB image and used as landmarks in the environment map, while the depth image provides the stereo information for each landmark. Meanwhile, the robot estimates its own state and the landmark locations by means of the Extended Kalman Filter (EKF). EKF SLAM was carried out in the paper, and the experimental results showed that the Kinect sensor can provide the mobile robot with reliable measurements when navigating in an unknown environment.
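A minimal sketch of the landmark-extraction step is given below: SURF keypoints detected in the RGB image with depth values read from the already-aligned depth map. The EKF itself is omitted, and SURF requires the non-free opencv-contrib build; parameters are assumptions.

```python
import cv2
import numpy as np

def surf_landmarks(rgb, depth_m, hessian_threshold=400):
    """Detect SURF keypoints in the RGB image and attach depth (metres) to each.

    Assumes the depth map has already been aligned to the RGB image, as in the
    paper's first calibration stage. Returns (u, v, z) triples for valid depths.
    """
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold)
    keypoints, descriptors = surf.detectAndCompute(gray, None)

    landmarks = []
    for kp in keypoints:
        u, v = int(round(kp.pt[0])), int(round(kp.pt[1]))
        z = float(depth_m[v, u])
        if z > 0:                      # 0 marks invalid Kinect depth
            landmarks.append((u, v, z))
    return landmarks, descriptors
```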
APA, Harvard, Vancouver, ISO, and other styles
40

Zeng, Hui, Bin Yang, Xiuqing Wang, Jiwei Liu, and Dongmei Fu. "RGB-D Object Recognition Using Multi-Modal Deep Neural Network and DS Evidence Theory." Sensors 19, no. 3 (January 27, 2019): 529. http://dx.doi.org/10.3390/s19030529.

Full text
Abstract:
With the development of low-cost RGB-D (Red Green Blue-Depth) sensors, RGB-D object recognition has attracted more and more researchers' attention in recent years. The deep learning technique has become popular in the field of image analysis and has achieved competitive results. To make full use of the effective identification information in the RGB and depth images, we propose a multi-modal deep neural network and DS (Dempster-Shafer) evidence theory based RGB-D object recognition method. First, the RGB and depth images are preprocessed and two convolutional neural networks are trained, respectively. Next, we perform multi-modal feature learning using the proposed quadruplet-sample-based objective function to fine-tune the network parameters. Then, two probability classification results are obtained using two sigmoid SVMs (Support Vector Machines) with the learned RGB and depth features. Finally, the DS evidence theory based decision fusion method is used to integrate the two classification results. Compared with other RGB-D object recognition methods, our proposed method adopts two fusion strategies: multi-modal feature learning and DS decision fusion. Both the discriminative information of each modality and the correlation information between the two modalities are exploited. Extensive experimental results have validated the effectiveness of the proposed method.
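The decision-fusion step can be illustrated with Dempster's rule of combination applied to the class-probability outputs of the two classifiers. This is a simplified sketch in which the singleton class probabilities are used directly as mass functions, which is only an approximation of full DS theory.

```python
import numpy as np

def dempster_combine(m_rgb, m_depth, eps=1e-12):
    """Combine two mass vectors over singleton classes with Dempster's rule.

    m_rgb, m_depth: 1-D arrays of non-negative masses summing to 1
    (e.g. the probability outputs of the RGB and depth classifiers).
    """
    m_rgb = np.asarray(m_rgb, dtype=np.float64)
    m_depth = np.asarray(m_depth, dtype=np.float64)

    joint = np.outer(m_rgb, m_depth)          # joint[i, j] = m1(Ci) * m2(Cj)
    agreement = np.trace(joint)               # mass assigned to the same class by both
    # agreement = 1 - conflict, so dividing by it normalises out the conflict mass.
    combined = np.diag(joint) / (agreement + eps)
    return combined / combined.sum()

# Usage: fused = dempster_combine([0.7, 0.2, 0.1], [0.5, 0.4, 0.1])
# Predicted class: int(np.argmax(fused))
```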
APA, Harvard, Vancouver, ISO, and other styles
41

Zhang, Zhijie, Yan Liu, Junjie Chen, Li Niu, and Liqing Zhang. "Depth Privileged Object Detection in Indoor Scenes via Deformation Hallucination." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 3456–64. http://dx.doi.org/10.1609/aaai.v35i4.16459.

Full text
Abstract:
RGB-D object detection has achieved significant advances, because depth provides complementary geometric information to RGB images. Considering depth images are unavailable in some scenarios, we focus on depth privileged object detection in indoor scenes, where the depth images are only available in the training phase. Under this setting, one prevalent research line is modality hallucination, in which depth image and depth feature are the common choices for hallucinating. In contrast, we choose to hallucinate depth deformation, which is explicit geometric information and efficient to hallucinate. Specifically, we employ the deformable convolution layer with augmented offsets as our deformation module and regard the offsets as geometric deformation, because the offsets enable flexibly sampling over the object and transforming to a canonical shape for ease of detection. In addition, we design a quality-based mechanism to avoid negative transfer of depth deformation. Experimental results and analyses on NYUDv2 and SUN RGB-D demonstrate the effectiveness of our method against the state-of-the-art methods for depth privileged object detection.
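A minimal sketch of the deformation module described in the abstract, using torchvision's deformable convolution with a small convolution branch predicting the augmented offsets. Channel sizes and initialization are assumptions; this is not the paper's exact module.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformationModule(nn.Module):
    """Deformable 3x3 convolution whose offsets play the role of 'geometric deformation'."""

    def __init__(self, in_ch=256, out_ch=256, k=3):
        super().__init__()
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, kernel_size=3, padding=1)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        offsets = self.offset_pred(x)                       # (N, 2*k*k, H, W) sampling offsets
        out = deform_conv2d(x, offsets, self.weight, self.bias, padding=1)
        return out, offsets

# Usage: y, offsets = DeformationModule()(torch.randn(1, 256, 32, 32))
# 'offsets' is the quantity a hallucination branch would be trained to predict from RGB alone.
```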
APA, Harvard, Vancouver, ISO, and other styles
42

Gopalapillai, Radhakrishnan, Deepa Gupta, Mohammed Zakariah, and Yousef Ajami Alotaibi. "Convolution-Based Encoding of Depth Images for Transfer Learning in RGB-D Scene Classification." Sensors 21, no. 23 (November 28, 2021): 7950. http://dx.doi.org/10.3390/s21237950.

Full text
Abstract:
Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened up a new research area of using depth information in addition to color image (RGB) data for scene understanding. Transfer learning of deep convolutional networks with pairs of RGB and depth (RGB-D) images has to deal with integrating these two modalities. Single-channel depth images are often converted to three-channel images by extracting horizontal disparity, height above ground, and the angle of the pixel’s local surface normal (HHA) to apply transfer learning using networks trained on the Places365 dataset. The high computational cost of HHA encoding can be a major disadvantage for the real-time prediction of scenes, although this may be less important during the training phase. We propose a new, computationally efficient encoding method that can be integrated with any convolutional neural network. We show that our encoding approach performs equally well or better in a multimodal transfer learning setup for scene classification. Our encoding is implemented in a customized and pretrained VGG16 Net. We address the class imbalance problem seen in the image dataset using a method based on the synthetic minority oversampling technique (SMOTE) at the feature level. With appropriate image augmentation and fine-tuning, our network achieves scene classification accuracy comparable to that of other state-of-the-art architectures.
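The abstract does not specify the proposed encoding, so the sketch below shows only a generic, computationally cheap three-channel depth encoding (normalized depth plus Sobel gradients) that could feed a network pretrained on RGB data; it is a stand-in, not the paper's convolution-based encoding.

```python
import cv2
import numpy as np

def encode_depth_3ch(depth):
    """Encode a single-channel depth image as three channels for transfer learning.

    Channels: min-max normalized depth, horizontal gradient, vertical gradient.
    A cheap alternative to HHA, not the encoding proposed in the paper.
    """
    d = depth.astype(np.float32)
    d = (d - d.min()) / max(float(d.max() - d.min()), 1e-6)

    gx = cv2.Sobel(d, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(d, cv2.CV_32F, 0, 1, ksize=3)
    gx = cv2.normalize(gx, None, 0.0, 1.0, cv2.NORM_MINMAX)
    gy = cv2.normalize(gy, None, 0.0, 1.0, cv2.NORM_MINMAX)

    return np.stack([d, gx, gy], axis=-1)   # HxWx3, ready for a VGG16-style backbone
```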
APA, Harvard, Vancouver, ISO, and other styles
43

Roman-Rivera, Luis-Rogelio, Israel Sotelo-Rodríguez, Jesus Carlos Pedraza-Ortega, Marco Antonio Aceves-Fernandez, Juan Manuel Ramos-Arreguín, and Efrén Gorrostieta-Hurtado. "Reduced Calibration Strategy Using a Basketball for RGB-D Cameras." Mathematics 10, no. 12 (June 16, 2022): 2085. http://dx.doi.org/10.3390/math10122085.

Full text
Abstract:
RGB-D cameras produce depth and color information commonly used in 3D reconstruction and computer vision. Different cameras of the same model usually produce images with different calibration errors. The color and depth layers usually require calibration to minimize alignment errors, adjust precision, and improve data quality in general. Standard calibration protocols for RGB-D cameras require a controlled environment that allows operators to take many RGB and depth image pairs as input for calibration frameworks, which makes the calibration protocol challenging to implement without ideal conditions and operator experience. In this work, we propose a novel strategy that simplifies the calibration protocol by requiring fewer images than other methods. Our strategy uses an ordinary object, a basketball of known size, as a ground-truth sphere geometry during calibration. Our experiments show results comparable to a reference method for aligning the color and depth image layers, while requiring fewer images and tolerating non-ideal scene conditions.
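The geometric core of such a strategy can be sketched as a least-squares sphere fit to 3D points back-projected from the depth image, so that the known basketball radius exposes depth scale or alignment error. The full calibration pipeline of the paper is not reproduced.

```python
import numpy as np

def fit_sphere(points):
    """Least-squares sphere fit. points: (N, 3) array of 3D coordinates.

    Uses the linearisation |p|^2 = 2 c.p + (r^2 - |c|^2); returns (center, radius).
    """
    p = np.asarray(points, dtype=np.float64)
    A = np.hstack([2.0 * p, np.ones((p.shape[0], 1))])
    b = (p ** 2).sum(axis=1)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = w[:3]
    radius = np.sqrt(w[3] + center @ center)
    return center, radius

# Usage: compare the fitted radius against the known basketball radius (e.g. ~0.12 m)
# to quantify depth error before and after applying a correction.
```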
APA, Harvard, Vancouver, ISO, and other styles
44

Chen, Songnan, Mengxia Tang, and Jiangming Kan. "Predicting Depth from Single RGB Images with Pyramidal Three-Streamed Networks." Sensors 19, no. 3 (February 6, 2019): 667. http://dx.doi.org/10.3390/s19030667.

Full text
Abstract:
Predicting depth from a monocular image is an ill-posed and inherently ambiguous problem in computer vision. In this paper, we propose a pyramidal three-streamed network (PTSN) that recovers depth information from a single RGB image. PTSN takes pyramidal-structure images as the network input, from which multiresolution features can be extracted to improve the robustness of the network. The fully connected layer is replaced with fully convolutional layers using a new upconvolution structure, which reduces the network parameters and computational complexity. We propose a new loss function comprising scale-invariant, horizontal-gradient, and vertical-gradient terms, which not only helps predict depth values but also recovers clear local contours. We evaluate PTSN on the NYU Depth v2 dataset, and the experimental results show that our depth predictions are more accurate than those of competing methods.
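A minimal PyTorch sketch of a loss of this kind follows: a scale-invariant log-depth term plus horizontal and vertical gradient terms. The weightings are assumptions, since the abstract does not give the exact formulation.

```python
import torch

def depth_loss(pred, gt, lam=0.5, w_grad=1.0, eps=1e-6):
    """Scale-invariant log-depth loss plus horizontal/vertical gradient terms.

    pred, gt: (N, 1, H, W) positive depth maps. Weights are illustrative.
    """
    d = torch.log(pred + eps) - torch.log(gt + eps)
    si = (d ** 2).mean() - lam * d.mean() ** 2          # scale-invariant term

    # Gradient terms encourage sharp local contours in the prediction.
    dx = d[..., :, 1:] - d[..., :, :-1]
    dy = d[..., 1:, :] - d[..., :-1, :]
    grad = dx.abs().mean() + dy.abs().mean()

    return si + w_grad * grad
```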
APA, Harvard, Vancouver, ISO, and other styles
45

Wang, Dongyu. "Automatic Depth Image Generation for Unseen Object Amodal Instance Segmentation." Journal of Physics: Conference Series 2405, no. 1 (December 1, 2022): 012023. http://dx.doi.org/10.1088/1742-6596/2405/1/012023.

Full text
Abstract:
This research proposes an automatic instance-wise depth image estimation method for Unseen Object Amodal Instance Segmentation (UOAIS), which predicts depth images automatically by sharing features with other prediction branches. The depth images provide valuable information for amodal mask generation. To achieve good performance, the model estimates the depth images in two steps by adding two extra depth branches to the original UOAIS model. A new feature fusion scheme incorporating these two branches is also designed to facilitate information sharing between tasks. The new model is evaluated on the OCID and UOAIS-Sim evaluation datasets. Compared with the performance of the original model with pure RGB input, the AP50 of the new model increases by 0.685% on the UOAIS-Sim evaluation dataset, and the overlap and boundary F-measures increase by 2.3% and 1.6%, respectively, on the OCID dataset. Consequently, the model achieves performance similar to that of models requiring RGB-D images as input. It has the potential to serve as a fallback for UOAIS when real scene depth images are difficult or impossible to collect.
APA, Harvard, Vancouver, ISO, and other styles
46

Han, Daechan, and Yukyung Choi. "Monocular Depth Estimation from a Single Infrared Image." Electronics 11, no. 11 (May 30, 2022): 1729. http://dx.doi.org/10.3390/electronics11111729.

Full text
Abstract:
Thermal infrared imaging is attracting much attention due to its strength against illuminance variation. However, because of the spectral difference between thermal infrared images and RGB images, the existing research on self-supervised monocular depth estimation has performance limitations. Therefore, in this study, we propose a novel Self-Guided Framework using a Pseudolabel predicted from RGB images. Our proposed framework, which solves the problem of appearance matching loss in the existing framework, transfers the high accuracy of Pseudolabel to the thermal depth estimation network by comparing low- and high-level pixels. Furthermore, we propose Patch-NetVLAD Loss, which strengthens local detail and global context information in the depth map from thermal infrared imaging by comparing locally global patch-level descriptors. Finally, we introduce an Image Matching Loss to estimate a more accurate depth map in a thermal depth network by enhancing the performance of the Pseudolabel. We demonstrate that the proposed framework shows significant performance improvement even when applied to various depth networks in the KAIST Multispectral Dataset.
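The pseudolabel idea can be sketched as below: the thermal depth network is supervised by depth predicted from the paired RGB image by a frozen teacher model. The Patch-NetVLAD and Image Matching losses from the paper are not reproduced, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def pseudolabel_loss(thermal_net, rgb_teacher, thermal_img, rgb_img):
    """Distil an RGB-predicted depth map into the thermal depth network.

    rgb_teacher is a frozen monocular depth model for RGB images; its output
    serves as the pseudolabel for the thermal branch.
    """
    with torch.no_grad():
        pseudo_depth = rgb_teacher(rgb_img)            # (N, 1, H, W) pseudolabel

    pred_depth = thermal_net(thermal_img)              # thermal depth prediction
    return F.l1_loss(pred_depth, pseudo_depth)
```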
APA, Harvard, Vancouver, ISO, and other styles
47

Heravi, Hamed, Roghaieh Aghaeifard, Ali Rahimpour Jounghani, Afshin Ebrahimi, and Masumeh Delgarmi. "EXTRACTING FEATURES OF THE HUMAN FACE FROM RGB-D IMAGES TO PLAN FACIAL SURGERIES." Biomedical Engineering: Applications, Basis and Communications 32, no. 06 (December 2020): 2050042. http://dx.doi.org/10.4015/s1016237220500428.

Full text
Abstract:
Biometric identification of the human face is a pervasive subject that involves a wide range of disciplines, such as image processing, computer vision, pattern recognition, artificial intelligence, and cognitive psychology. Extracting key face points for developing software and commercial devices for facial surgery analysis is one of the most challenging fields in computer vision and image processing. Many studies have developed a variety of techniques to extract facial features from color and gray images. In recent years, the use of depth information has opened up new approaches for researchers in the field of image processing. Hence, in this study, a statistical method is proposed to extract key nose points from color-depth (RGB-D) images of the frontal view of the face. The Microsoft Kinect sensor is used to produce the face RGB-D images. To assess the capability of the proposed method, the algorithm is applied to 20 RGB-D face images from the database collected in the ICT lab of Sahand University of Technology, and promising results are achieved for extracting key points of the face. The results indicate that using the information available in the two color and depth bands makes key points of the face more easily accessible and yields better results; we conclude that the proposed algorithm provides a promising outcome for extracting the positions of key facial points.
APA, Harvard, Vancouver, ISO, and other styles
48

Wang, Xianghan, Jie Jiang, Yanming Guo, Lai Kang, Yingmei Wei, and Dan Li. "CFAM: Estimating 3D Hand Poses from a Single RGB Image with Attention." Applied Sciences 10, no. 2 (January 15, 2020): 618. http://dx.doi.org/10.3390/app10020618.

Full text
Abstract:
Precise 3D hand pose estimation can be used to improve the performance of human–computer interaction (HCI). Specifically, computer-vision-based hand pose estimation can make this process more natural. Most traditional computer-vision-based hand pose estimation methods use depth images as the input, which requires complicated and expensive acquisition equipment. Estimation through a single RGB image is more convenient and less expensive. Previous methods based on RGB images utilize only 2D keypoint score maps to recover 3D hand poses but ignore the hand texture features and the underlying spatial information in the RGB image, which leads to a relatively low accuracy. To address this issue, we propose a channel fusion attention mechanism that combines 2D keypoint features and RGB image features at the channel level. In particular, the proposed method replans weights by using cascading RGB images and 2D keypoint features, which enables rational planning and the utilization of various features. Moreover, our method improves the fusion performance of different types of feature maps. Multiple contrast experiments on public datasets demonstrate that the accuracy of our proposed method is comparable to the state-of-the-art accuracy.
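Channel-level fusion with attention, in the spirit of the described mechanism, can be sketched as concatenating the RGB features with the 2D keypoint score maps and reweighting channels with a squeeze-and-excitation style gate. Channel sizes are assumptions, and this is not the paper's exact CFAM block.

```python
import torch
import torch.nn as nn

class ChannelFusionAttention(nn.Module):
    """Concatenate RGB and keypoint features, then reweight channels with a learned gate."""

    def __init__(self, rgb_ch=256, kp_ch=21, reduction=8):
        super().__init__()
        ch = rgb_ch + kp_ch
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: per-channel statistics
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )

    def forward(self, rgb_feat, kp_feat):
        x = torch.cat([rgb_feat, kp_feat], dim=1)          # channel-level concatenation
        return x * self.gate(x)                            # attention-reweighted fusion

# Usage: fused = ChannelFusionAttention()(torch.randn(1, 256, 32, 32), torch.randn(1, 21, 32, 32))
```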
APA, Harvard, Vancouver, ISO, and other styles
49

Li, Zun, and Jin Wu. "Learning Deep CNN Denoiser Priors for Depth Image Inpainting." Applied Sciences 9, no. 6 (March 15, 2019): 1103. http://dx.doi.org/10.3390/app9061103.

Full text
Abstract:
Due to the rapid development of RGB-D sensors, increasing attention is being paid to depth image applications. Depth images play an important role in computer vision research. In this paper, we address the problem of inpainting for single depth images without corresponding color images as a guide. Within the framework of model-based optimization methods for depth image inpainting, the split Bregman iteration algorithm was used to transform depth image inpainting into the corresponding denoising subproblem. Then, we trained a set of efficient convolutional neural network (CNN) denoisers to solve this subproblem. Experimental results demonstrate the effectiveness of the proposed algorithm in comparison with three traditional methods in terms of visual quality and objective metrics.
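The plug-and-play structure behind the method can be sketched by alternating a data-consistency step on the observed (non-hole) pixels with a denoising step, where the denoiser stands in for the trained CNN prior. Here a Gaussian blur is used purely as a placeholder denoiser, and the iteration count is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def inpaint_depth(depth, mask, iters=50, sigma=1.5):
    """Fill holes in a depth image by alternating data consistency and denoising.

    depth: observed depth (float array); mask: True where depth is valid.
    gaussian_filter is a placeholder for the learned CNN denoiser in the paper.
    """
    x = depth.copy()
    x[~mask] = depth[mask].mean()              # crude initialisation of the holes

    for _ in range(iters):
        x = gaussian_filter(x, sigma)          # denoising sub-problem (CNN prior in the paper)
        x[mask] = depth[mask]                  # data term: keep observed pixels fixed
    return x
```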
APA, Harvard, Vancouver, ISO, and other styles
50

Wang, Liang, and Zhiqiu Wu. "RGB-D SLAM with Manhattan Frame Estimation Using Orientation Relevance." Sensors 19, no. 5 (March 1, 2019): 1050. http://dx.doi.org/10.3390/s19051050.

Full text
Abstract:
Due to image noise, image blur, and inconsistency between depth data and color images, the accuracy and robustness of the pairwise spatial transformation computed by matching extracted features of detected key points in existing sparse Red Green Blue-Depth (RGB-D) Simultaneous Localization and Mapping (SLAM) algorithms are poor. Considering that most indoor environments follow the Manhattan World assumption and that the Manhattan Frame can be used as a reference to compute the pairwise spatial transformation, a new RGB-D SLAM algorithm is proposed. It first performs Manhattan Frame Estimation using the introduced concept of orientation relevance. Then the pairwise spatial transformation between two RGB-D frames is computed with the Manhattan Frame Estimation. Finally, the Manhattan Frame Estimation using orientation relevance is incorporated into the RGB-D SLAM to improve its performance. Experimental results show that the proposed RGB-D SLAM algorithm shows clear improvements in accuracy, robustness, and runtime.
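A naive Manhattan-frame estimate from depth-derived surface normals can be sketched as below: each normal is assigned to its closest current axis and the axes are re-orthonormalised with an SVD. This is only an illustrative approximation and does not reproduce the paper's orientation-relevance formulation.

```python
import numpy as np

def manhattan_frame(normals, iters=10):
    """Estimate a rotation whose columns approximate the three dominant orthogonal directions.

    normals: (N, 3) unit surface normals (e.g. computed from a smoothed depth map).
    """
    R = np.eye(3)
    for _ in range(iters):
        # Assign each normal to the closest axis (up to sign).
        dots = normals @ R                                   # (N, 3) cosines to current axes
        idx = np.abs(dots).argmax(axis=1)
        signs = np.sign(dots[np.arange(len(normals)), idx])

        # Average the assigned normals per axis, then project back to the nearest rotation.
        M = np.zeros((3, 3))
        for k in range(3):
            sel = idx == k
            M[:, k] = (normals[sel] * signs[sel, None]).mean(axis=0) if sel.any() else R[:, k]
        U, _, Vt = np.linalg.svd(M)
        D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])       # guard against reflections
        R = U @ D @ Vt
    return R   # columns approximate the Manhattan Frame axes
```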
APA, Harvard, Vancouver, ISO, and other styles