
Journal articles on the topic 'RGB-D Image'



Consult the top 50 journal articles for your research on the topic 'RGB-D Image.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Uddin, Md Kamal, Amran Bhuiyan, and Mahmudul Hasan. "Fusion in Dissimilarity Space Between RGB-D and Skeleton for Person Re-Identification." International Journal of Innovative Technology and Exploring Engineering 10, no. 12 (October 30, 2021): 69–75. http://dx.doi.org/10.35940/ijitee.l9566.10101221.

Full text
Abstract:
Person re-identification (Re-id) is one of the important tools of video surveillance systems, which aims to recognize an individual across the multiple disjoint cameras of a camera network. Despite recent advances in RGB camera-based person re-identification methods under normal lighting conditions, Re-id researchers fail to take advantage of the additional information provided by modern RGB-D sensors (e.g., depth and skeleton information). When traditional RGB-based cameras fail to capture the video under poor illumination conditions, this RGB-D sensor-based additional information can be advantageous for tackling these constraints. This work takes depth images and skeleton joint points as additional information along with RGB appearance cues and proposes a person re-identification method. We combine 4-channel RGB-D image features with skeleton information using a score-level fusion strategy in dissimilarity space to increase re-identification accuracy. Moreover, our proposed method overcomes the illumination problem because we use illumination-invariant depth images and skeleton information. We carried out rigorous experiments on two publicly available RGBD-ID re-identification datasets and showed that combining the features of 4-channel RGB-D images with skeleton information boosts the rank-1 recognition accuracy.
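A minimal sketch of the score-level fusion idea in dissimilarity space, assuming precomputed probe-gallery distance matrices for the RGB-D appearance cue and the skeleton cue; the per-row min-max normalization, the equal fusion weight, and the one-to-one probe/gallery labelling in the rank-1 helper are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def min_max(d):
    """Normalize a dissimilarity matrix to [0, 1] per probe row."""
    lo, hi = d.min(axis=1, keepdims=True), d.max(axis=1, keepdims=True)
    return (d - lo) / (hi - lo + 1e-12)

def fuse_and_rank(d_rgbd, d_skel, alpha=0.5):
    """Score-level fusion of two dissimilarity matrices (probes x gallery),
    followed by ranking of gallery identities for each probe."""
    fused = alpha * min_max(d_rgbd) + (1.0 - alpha) * min_max(d_skel)
    ranks = np.argsort(fused, axis=1)          # ascending distance
    return fused, ranks

def rank1_accuracy(ranks):
    """Rank-1 accuracy, assuming gallery index i is the true match for probe i."""
    return float(np.mean(ranks[:, 0] == np.arange(ranks.shape[0])))
```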
APA, Harvard, Vancouver, ISO, and other styles
2

Li, Hengyu, Hang Liu, Ning Cao, Yan Peng, Shaorong Xie, Jun Luo, and Yu Sun. "Real-time RGB-D image stitching using multiple Kinects for improved field of view." International Journal of Advanced Robotic Systems 14, no. 2 (March 1, 2017): 172988141769556. http://dx.doi.org/10.1177/1729881417695560.

Full text
Abstract:
This article concerns the problems of a defective depth map and the limited field of view of Kinect-style RGB-D sensors. An anisotropic diffusion based hole-filling method is proposed to recover invalid depth data in the depth map. The field of view of the Kinect-style RGB-D sensor is extended by stitching depth and color images from several RGB-D sensors. By aligning the depth map with the color image, the registration data calculated by registering the color images can be used to stitch depth and color images into a depth and color panoramic image concurrently in real time. Experiments show that the proposed stitching method can generate an RGB-D panorama with no invalid depth data and little distortion in real time and can be extended to incorporate more RGB-D sensors to construct a panoramic RGB-D image with a field of view of up to 360°.
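A rough sketch of the registration-reuse idea, assuming the depth map is already aligned with its color image: a homography is estimated from the color pair and the same warp is applied to both color and depth. The planar-homography model and the simple side-by-side canvas are simplifying assumptions; the paper's anisotropic-diffusion hole filling is not reproduced here.

```python
import cv2
import numpy as np

def stitch_rgbd(color_a, color_b, depth_a, depth_b):
    """Estimate a homography from the color images and apply the same
    transform to both color and depth so the pair stays aligned."""
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(cv2.cvtColor(color_a, cv2.COLOR_BGR2GRAY), None)
    k2, d2 = orb.detectAndCompute(cv2.cvtColor(color_b, cv2.COLOR_BGR2GRAY), None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # maps B -> A frame

    h, w = color_a.shape[:2]
    size = (2 * w, h)                                      # crude canvas for a sketch
    pano_color = cv2.warpPerspective(color_b, H, size)
    pano_depth = cv2.warpPerspective(depth_b, H, size,
                                     flags=cv2.INTER_NEAREST)  # keep raw depth values
    pano_color[:, :w] = color_a
    pano_depth[:, :w] = depth_a
    return pano_color, pano_depth
```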
APA, Harvard, Vancouver, ISO, and other styles
3

Wu, Yan, Jiqian Li, and Jing Bai. "Multiple Classifiers-Based Feature Fusion for RGB-D Object Recognition." International Journal of Pattern Recognition and Artificial Intelligence 31, no. 05 (February 27, 2017): 1750014. http://dx.doi.org/10.1142/s0218001417500148.

Full text
Abstract:
RGB-D-based object recognition has been enthusiastically investigated in the past few years. RGB and depth images provide useful and complementary information. Fusing RGB and depth features can significantly increase the accuracy of object recognition. However, previous works simply take the depth image as the fourth channel of the RGB image and concatenate the RGB and depth features, ignoring the different power of RGB and depth information for different objects. In this paper, a new method which contains three different classifiers is proposed to fuse features extracted from the RGB image and depth image for RGB-D-based object recognition. Firstly, an RGB classifier and a depth classifier are trained by cross-validation to get the accuracy difference between RGB and depth features for each object. Then a variant RGB-D classifier is trained with different initialization parameters for each class according to the accuracy difference. The variant RGB-D classifier yields a more robust classification performance. The proposed method is evaluated on two benchmark RGB-D datasets. Compared with previous methods, ours achieves performance comparable to the state-of-the-art method.
APA, Harvard, Vancouver, ISO, and other styles
4

Kitzler, Florian, Norbert Barta, Reinhard W. Neugschwandtner, Andreas Gronauer, and Viktoria Motsch. "WE3DS: An RGB-D Image Dataset for Semantic Segmentation in Agriculture." Sensors 23, no. 5 (March 1, 2023): 2713. http://dx.doi.org/10.3390/s23052713.

Full text
Abstract:
Smart farming (SF) applications rely on robust and accurate computer vision systems. An important computer vision task in agriculture is semantic segmentation, which aims to classify each pixel of an image and can be used for selective weed removal. State-of-the-art implementations use convolutional neural networks (CNN) that are trained on large image datasets. In agriculture, publicly available RGB image datasets are scarce and often lack detailed ground-truth information. In contrast to agriculture, other research areas feature RGB-D datasets that combine color (RGB) with additional distance (D) information. Results in those areas show that including distance as an additional modality can improve model performance further. Therefore, we introduce WE3DS as the first RGB-D image dataset for multi-class plant species semantic segmentation in crop farming. It contains 2568 RGB-D images (color image and distance map) and corresponding hand-annotated ground-truth masks. Images were taken under natural light conditions using an RGB-D sensor consisting of two RGB cameras in a stereo setup. Further, we provide a benchmark for RGB-D semantic segmentation on the WE3DS dataset and compare it with a solely RGB-based model. Our trained models achieve up to 70.7% mean Intersection over Union (mIoU) for discriminating between soil, seven crop species, and ten weed species. Finally, our work confirms the finding that additional distance information improves segmentation quality.
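For reference, a small sketch of the mean Intersection over Union (mIoU) metric reported above, assuming integer label maps and 18 classes (soil, seven crops, ten weeds, as stated in the abstract); the handling of absent classes and of an optional ignore label is an assumption.

```python
import numpy as np

def mean_iou(pred, target, num_classes=18, ignore_index=None):
    """Mean Intersection over Union for semantic segmentation.
    pred, target: integer label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        p, t = (pred == c), (target == c)
        if ignore_index is not None:
            valid = target != ignore_index
            p, t = p & valid, t & valid
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue                        # class absent in both maps: skip it
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```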
APA, Harvard, Vancouver, ISO, and other styles
5

Zheng, Huiming, and Wei Gao. "End-to-End RGB-D Image Compression via Exploiting Channel-Modality Redundancy." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 7 (March 24, 2024): 7562–70. http://dx.doi.org/10.1609/aaai.v38i7.28588.

Full text
Abstract:
As a kind of 3D data, RGB-D images have been extensively used in object tracking, 3D reconstruction, remote sensing mapping, and other tasks. In the realm of computer vision, the significance of RGB-D images is progressively growing. However, existing learning-based image compression methods usually process RGB images and depth images separately, which cannot fully exploit the redundant information between the modalities, limiting further improvement of the rate-distortion performance. With the goal of overcoming this defect, in this paper we propose a learning-based dual-branch RGB-D image compression framework. Compared with the traditional RGB domain compression scheme, a YUV domain compression scheme is presented for spatial redundancy removal. In addition, Intra-Modality Attention (IMA) and Cross-Modality Attention (CMA) are introduced for modal redundancy removal. To benefit from cross-modal prior information, a Context Prediction Module (CPM) and a Context Fusion Module (CFM) are introduced in the conditional entropy model, which makes the context probability prediction more accurate. The experimental results demonstrate that our method outperforms existing image compression methods on two RGB-D image datasets. Compared with BPG, our proposed framework can achieve up to 15% bit rate saving for RGB images.
APA, Harvard, Vancouver, ISO, and other styles
6

Peroš, Josip, Rinaldo Paar, Vladimir Divić, and Boštjan Kovačić. "Fusion of Laser Scans and Image Data—RGB+D for Structural Health Monitoring of Engineering Structures." Applied Sciences 12, no. 22 (November 19, 2022): 11763. http://dx.doi.org/10.3390/app122211763.

Full text
Abstract:
A novel method for structural health monitoring (SHM) using RGB+D data has recently been proposed. RGB+D data are created by fusing image and laser scan data, where the D channel represents the distance, interpolated from laser scanner data. The RGB channels represent image data obtained by an image sensor integrated in the robotic total station (RTS) telescope, or on top of the telescope, i.e., an image assisted total station (IATS). Images can also be obtained by conventional cameras or cameras integrated with the RTS (different kinds of prototypes). The RGB+D image combines the advantages of the two measuring methods. Laser scans are used for distance changes in the line of sight, and image data are used for displacement determination in the two axes perpendicular to the viewing direction of the camera. Image feature detection and matching algorithms detect and match discrete points within RGB+D images obtained from different epochs. This way, the 3D coordinates of the points can be easily calculated from RGB+D images. In this study, the implementation of this method was proposed for measuring displacements and monitoring the behavior of structural elements under constant load in field conditions. For the precision analysis of the proposed method, displacements obtained from a numerical model in combination with measurements from a high-precision linear variable differential transformer (LVDT) sensor were used as a reference for the analysis of the displacements determined from RGB+D images. Based on the achieved results, we calculated that the precision of the image matching and fusion part of RGB+D in this study is ±1 mm when using the ORB algorithm. The ORB algorithm was determined to be the optimal algorithm for this study, with good computing performance, the lowest processing times, and the highest number of usable features detected. The calculated achievable precision for determining height displacement while monitoring the behavior of a wooden beam structural element under different loads is ±2.7 mm.
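A brief sketch of epoch-to-epoch ORB matching of the kind the study identified as optimal, using OpenCV; the ratio-test threshold and feature count are illustrative, and the conversion of pixel shifts to metric displacements via the D channel is left out.

```python
import cv2

def match_epochs(img_epoch0, img_epoch1, ratio=0.75):
    """Detect and match ORB keypoints between two measurement epochs;
    returns the good matches and their pixel shifts (dx, dy)."""
    orb = cv2.ORB_create(nfeatures=5000)
    k0, d0 = orb.detectAndCompute(img_epoch0, None)
    k1, d1 = orb.detectAndCompute(img_epoch1, None)
    knn = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(d0, d1, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    shifts = [(k1[m.trainIdx].pt[0] - k0[m.queryIdx].pt[0],
               k1[m.trainIdx].pt[1] - k0[m.queryIdx].pt[1]) for m in good]
    return good, shifts
```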
APA, Harvard, Vancouver, ISO, and other styles
7

Yan, Zhiqiang, Hongyuan Wang, Qianhao Ning, and Yinxi Lu. "Robust Image Matching Based on Image Feature and Depth Information Fusion." Machines 10, no. 6 (June 8, 2022): 456. http://dx.doi.org/10.3390/machines10060456.

Full text
Abstract:
In this paper, we propose a robust image feature extraction and fusion method to effectively fuse image feature and depth information and improve the registration accuracy of RGB-D images. The proposed method directly splices the image feature point descriptors with the corresponding point cloud feature descriptors to obtain the fusion descriptor of the feature points. The fusion feature descriptor is constructed based on the SIFT, SURF, and ORB feature descriptors and the PFH and FPFH point cloud feature descriptors. Furthermore, the registration performance based on fusion features is tested through the RGB-D datasets of YCB and KITTI. ORBPFH reduces the false-matching rate by 4.66~16.66%, and ORBFPFH reduces the false-matching rate by 9~20%. The experimental results show that the RGB-D robust feature extraction and fusion method proposed in this paper is suitable for the fusion of ORB with PFH and FPFH, which can improve feature representation and registration, representing a novel approach for RGB-D image matching.
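A minimal sketch of the descriptor-splicing step described above, assuming the ORB descriptor of an image keypoint and the FPFH descriptor of its corresponding 3D point have already been computed elsewhere (e.g., with OpenCV and a point-cloud library); the per-part L2 normalization is an added assumption to put both descriptors on a comparable scale.

```python
import numpy as np

def fuse_descriptors(img_desc, cloud_desc):
    """Splice an image feature descriptor (e.g., a 32-byte ORB descriptor)
    with the point-cloud descriptor of the corresponding 3D point
    (e.g., a 33-D FPFH descriptor) into a single fusion descriptor."""
    a = img_desc.astype(np.float32)
    b = cloud_desc.astype(np.float32)
    a /= (np.linalg.norm(a) + 1e-12)        # bring both parts to a common scale
    b /= (np.linalg.norm(b) + 1e-12)
    return np.concatenate([a, b])

# Matching can then use Euclidean distance on the fused descriptors,
# e.g. with a brute-force nearest-neighbour search plus a ratio test.
```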
APA, Harvard, Vancouver, ISO, and other styles
8

Yuan, Yuan, Zhitong Xiong, and Qi Wang. "ACM: Adaptive Cross-Modal Graph Convolutional Neural Networks for RGB-D Scene Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9176–84. http://dx.doi.org/10.1609/aaai.v33i01.33019176.

Full text
Abstract:
RGB image classification has achieved significant performance improvement with the resurgence of deep convolutional neural networks. However, mono-modal deep models for RGB images still have several limitations when applied to RGB-D scene recognition. 1) Images for scene classification usually contain more than one typical object with flexible spatial distribution, so the object-level local features should also be considered in addition to the global scene representation. 2) Multi-modal features in RGB-D scene classification are still under-utilized. Simply combining these modal-specific features suffers from the semantic gaps between different modalities. 3) Most existing methods neglect the complex relationships among multiple modality features. Considering these limitations, this paper proposes an adaptive cross-modal (ACM) feature learning framework based on graph convolutional neural networks for RGB-D scene recognition. In order to make better use of the modal-specific cues, this approach mines the intra-modality relationships among the selected local features from one modality. To leverage the multi-modal knowledge more effectively, the proposed approach models the inter-modality relationships between two modalities through the cross-modal graph (CMG). We evaluate the proposed method on two public RGB-D scene classification datasets: SUN-RGBD and NYUD V2, and the proposed method achieves state-of-the-art performance.
APA, Harvard, Vancouver, ISO, and other styles
9

Wang, Z., T. Li, L. Pan, and Z. Kang. "SCENE SEMANTIC SEGMENTATION FROM INDOOR RGB-D IMAGES USING ENCODE-DECODER FULLY CONVOLUTIONAL NETWORKS." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W7 (September 12, 2017): 397–404. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w7-397-2017.

Full text
Abstract:
With increasing attention to the indoor environment and the development of low-cost RGB-D sensors, indoor RGB-D images are easily acquired. However, scene semantic segmentation is still an open problem, which restricts indoor applications. The depth information can help to distinguish regions that are difficult to segment out from the RGB images of indoor scenes with similar color or texture. How to utilize the depth information is the key problem of semantic segmentation for RGB-D images. In this paper, we propose Encode-Decoder Fully Convolutional Networks for RGB-D image classification. We use Multiple Kernel Maximum Mean Discrepancy (MK-MMD) as a distance measure to find common and specific features of RGB and D images in the network and enhance classification performance automatically. To explore better ways of applying MMD, we designed two strategies: the first calculates MMD for each feature map, and the other calculates MMD for the whole batch of features. Based on the classification result, we use fully connected CRFs for the semantic segmentation. The experimental results show that our method can achieve good performance on indoor RGB-D image semantic segmentation.
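For intuition, a small sketch of a multi-kernel MMD between two feature batches (e.g., RGB-branch and D-branch activations), using a sum of RBF kernels; the kernel bandwidths and the biased estimator are illustrative choices, not the exact MK-MMD formulation used in the paper.

```python
import numpy as np

def mmd_rbf(x, y, sigmas=(1.0, 2.0, 4.0)):
    """Squared Maximum Mean Discrepancy between two feature batches x (n x d)
    and y (m x d), using a sum of RBF kernels as a simple multi-kernel form."""
    def k(a, b):
        d2 = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2 * a @ b.T
        return sum(np.exp(-d2 / (2 * s ** 2)) for s in sigmas)
    kxx, kyy, kxy = k(x, x), k(y, y), k(x, y)
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()
```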
APA, Harvard, Vancouver, ISO, and other styles
10

Kanda, Takuya, Kazuya Miyakawa, Jeonghwang Hayashi, Jun Ohya, Hiroyuki Ogata, Kenji Hashimoto, Xiao Sun, Takashi Matsuzawa, Hiroshi Naito, and Atsuo Takanishi. "Locating Mechanical Switches Using RGB-D Sensor Mounted on a Disaster Response Robot." Electronic Imaging 2020, no. 6 (January 26, 2020): 16–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.6.iriacv-016.

Full text
Abstract:
To achieve one of the tasks required for disaster response robots, this paper proposes a method for locating the points of 3D structured switches to be pressed by the robot in disaster sites, using RGB-D images acquired by a Kinect sensor attached to our disaster response robot. Our method consists of the following five steps: 1) Obtain RGB and depth images using an RGB-D sensor. 2) Detect the bounding box of the switch area from the RGB image using YOLOv3. 3) Generate 3D point cloud data of the target switch by combining the bounding box and the depth image. 4) Detect the center position of the switch button from the RGB image in the bounding box using a Convolutional Neural Network (CNN). 5) Estimate the center of the button's face in real space from the detection result in step 4) and the 3D point cloud data generated in step 3). In the experiment, the proposed method is applied to two types of 3D structured switch boxes to evaluate its effectiveness. The results show that our proposed method can locate the switch button accurately enough for the robot operation.
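A minimal sketch of step 5, back-projecting the detected button-center pixel and its depth to a 3D point with the pinhole camera model; the intrinsic parameters shown in the comment are generic Kinect-like placeholders, not calibrated values from the paper.

```python
import numpy as np

def pixel_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with metric depth into camera coordinates
    using the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Example with placeholder Kinect-like intrinsics (not calibrated values):
# center = pixel_to_3d(u=412, v=238, depth_m=0.85,
#                      fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```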
APA, Harvard, Vancouver, ISO, and other styles
11

Dai, Xinxin, Ran Zhao, Pengpeng Hu, and Adrian Munteanu. "KD-Net: Continuous-Keystroke-Dynamics-Based Human Identification from RGB-D Image Sequences." Sensors 23, no. 20 (October 10, 2023): 8370. http://dx.doi.org/10.3390/s23208370.

Full text
Abstract:
Keystroke dynamics is a soft biometric based on the assumption that humans always type in uniquely characteristic manners. Previous works mainly focused on analyzing the key press or release events. Unlike these methods, we explored a novel visual modality of keystroke dynamics for human identification using a single RGB-D sensor. In order to verify this idea, we created a dataset dubbed KD-MultiModal, which contains 243.2 K frames of RGB images and depth images, obtained by recording a video of hand typing with a single RGB-D sensor. The dataset comprises RGB-D image sequences of 20 subjects (10 males and 10 females) typing sentences, and each subject typed around 20 sentences. In the task, only the hand and keyboard region contributed to the person identification, so we also propose methods of extracting Regions of Interest (RoIs) for each type of data. Unlike the data of the key press or release, our dataset not only captures the velocity of pressing and releasing different keys and the typing style of specific keys or combinations of keys, but also contains rich information on the hand shape and posture. To verify the validity of our proposed data, we adopted deep neural networks to learn distinguishing features from different data representations, including RGB-KD-Net, D-KD-Net, and RGBD-KD-Net. Simultaneously, the sequence of point clouds also can be obtained from depth images given the intrinsic parameters of the RGB-D sensor, so we also studied the performance of human identification based on the point clouds. Extensive experimental results showed that our idea works and the performance of the proposed method based on RGB-D images is the best, which achieved 99.44% accuracy based on the unseen real-world data. To inspire more researchers and facilitate relevant studies, the proposed dataset will be publicly accessible together with the publication of this paper.
APA, Harvard, Vancouver, ISO, and other styles
12

Lv, Ying, and Wujie Zhou. "Hierarchical Multimodal Adaptive Fusion (HMAF) Network for Prediction of RGB-D Saliency." Computational Intelligence and Neuroscience 2020 (November 20, 2020): 1–9. http://dx.doi.org/10.1155/2020/8841681.

Full text
Abstract:
Visual saliency prediction for RGB-D images is more challenging than that for their RGB counterparts. Additionally, very few investigations have been undertaken concerning RGB-D-saliency prediction. The proposed study presents a method based on a hierarchical multimodal adaptive fusion (HMAF) network to facilitate end-to-end prediction of RGB-D saliency. In the proposed method, hierarchical (multilevel) multimodal features are first extracted from an RGB image and depth map using a VGG-16-based two-stream network. Subsequently, the most significant hierarchical features of the said RGB image and depth map are predicted using three two-input attention modules. Furthermore, adaptive fusion of saliencies concerning the above-mentioned fused saliency features of different levels (hierarchical fusion saliency features) can be accomplished using a three-input attention module to facilitate high-accuracy RGB-D visual saliency prediction. Comparisons based on the application of the proposed HMAF-based approach against those of other state-of-the-art techniques on two challenging RGB-D datasets demonstrate that the proposed method outperforms other competing approaches consistently by a considerable margin.
APA, Harvard, Vancouver, ISO, and other styles
13

Tang, Shengjun, Qing Zhu, Wu Chen, Walid Darwish, Bo Wu, Han Hu, and Min Chen. "ENHANCED RGB-D MAPPING METHOD FOR DETAILED 3D MODELING OF LARGE INDOOR ENVIRONMENTS." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences III-1 (June 2, 2016): 151–58. http://dx.doi.org/10.5194/isprsannals-iii-1-151-2016.

Full text
Abstract:
RGB-D sensors are novel sensing systems that capture RGB images along with pixel-wise depth information. Although they are widely used in various applications, RGB-D sensors have significant drawbacks with respect to 3D dense mapping of indoor environments. First, they only allow a measurement range with a limited distance (e.g., within 3 m) and a limited field of view. Second, the error of the depth measurement increases with increasing distance to the sensor. In this paper, we propose an enhanced RGB-D mapping method for detailed 3D modeling of large indoor environments by combining RGB image-based modeling and depth-based modeling. The scale ambiguity problem during the pose estimation with RGB image sequences can be resolved by integrating the information from the depth and visual information provided by the proposed system. A robust rigid-transformation recovery method is developed to register the RGB image-based and depth-based 3D models together. The proposed method is examined with two datasets collected in indoor environments for which the experimental results demonstrate the feasibility and robustness of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
15

Tu, Shuqin, Yueju Xue, Yun Liang, Ning Huang, and Xiao Zhang. "Review on RGB-D Image Classification." Laser & Optoelectronics Progress 53, no. 6 (2016): 060003. http://dx.doi.org/10.3788/lop53.060003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Lin, Wei-Yang, Chih-Fong Tsai, Pei-Chen Wu, and Bo-Rong Chen. "Image retargeting using RGB-D camera." Multimedia Tools and Applications 74, no. 9 (January 23, 2014): 3155–70. http://dx.doi.org/10.1007/s11042-013-1776-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Li, Shipeng, Di Li, Chunhua Zhang, Jiafu Wan, and Mingyou Xie. "RGB-D Image Processing Algorithm for Target Recognition and Pose Estimation of Visual Servo System." Sensors 20, no. 2 (January 12, 2020): 430. http://dx.doi.org/10.3390/s20020430.

Full text
Abstract:
This paper studies the control performance of visual servoing systems with planar and RGB-D cameras. Its contribution is to strengthen performance indicators of the visual servoing system, such as real-time response and accuracy, through rapid identification of target RGB-D images and precise measurement in the depth direction. Firstly, color images acquired by the RGB-D camera are segmented based on optimized normalized cuts. Next, the gray scale is restored according to the histogram feature of the target image. Then, the obtained 2D depth information and the enhanced gray image information are fused to complete the target pose estimation based on the Hausdorff distance, and the current image pose is matched with the target image pose. The end angle and the speed of the robot are calculated to complete a control cycle, and the process is iterated until the servo task is completed. Finally, the performance of the control system based on the proposed algorithm is tested for accuracy and real-time response under a position-based visual servoing system. The results demonstrate and validate that the RGB-D image processing algorithm proposed in this paper delivers this performance for the visual servoing system.
APA, Harvard, Vancouver, ISO, and other styles
18

B L, Sunil Kumar, and Sharmila Kumari M. "RGB-D FACE RECOGNITION USING LBP-DCT ALGORITHM." Applied Computer Science 17, no. 3 (September 30, 2021): 73–81. http://dx.doi.org/10.35784/acs-2021-22.

Full text
Abstract:
Face recognition is one of the applications in image processing that recognizes or verifies an individual's identity. 2D images are used to identify the face, but the problem is that this kind of image is very sensitive to changes in lighting and various angles of view. The images captured by 3D and stereo cameras can also be used for recognition, but fairly long processing times are needed. RGB-D images produced by the Kinect are used as a new alternative approach to 3D images. Such cameras cost less and can be used in any situation and any environment. This paper shows the performance of face recognition algorithms using RGB-D images. These algorithms calculate descriptors for the RGB and depth-map faces based on the local binary pattern. The images are also tested with a fusion of the LBP and DCT methods. The fusion of the LBP and DCT approaches produces a recognition rate of 97.5% in the experiments.
APA, Harvard, Vancouver, ISO, and other styles
19

Du, Qinsheng, Yingxu Bian, Jianyu Wu, Shiyan Zhang, and Jian Zhao. "Cross-Modal Adaptive Interaction Network for RGB-D Saliency Detection." Applied Sciences 14, no. 17 (August 23, 2024): 7440. http://dx.doi.org/10.3390/app14177440.

Full text
Abstract:
The salient object detection (SOD) task aims to automatically detect the most prominent areas observed by the human eye in an image. Since RGB images and depth images contain different information, how to effectively integrate cross-modal features in the RGB-D SOD task remains a major challenge. Therefore, this paper proposes a cross-modal adaptive interaction network (CMANet) for the RGB-D salient object detection task, which consists of a cross-modal feature integration module (CMF) and an adaptive feature fusion module (AFFM). These modules are designed to integrate and enhance multi-scale features from both modalities, improve the effect of integrating cross-modal complementary information of RGB and depth images, enhance feature information, and generate richer and more representative feature maps. Extensive experiments were conducted on four RGB-D datasets to verify the effectiveness of CMANet. Compared with 17 RGB-D SOD methods, our model accurately detects salient regions in images and achieves state-of-the-art performance across four evaluation metrics.
APA, Harvard, Vancouver, ISO, and other styles
20

Xu, Chi, Jun Zhou, Wendi Cai, Yunkai Jiang, Yongbo Li, and Yi Liu. "Robust 3D Hand Detection from a Single RGB-D Image in Unconstrained Environments." Sensors 20, no. 21 (November 7, 2020): 6360. http://dx.doi.org/10.3390/s20216360.

Full text
Abstract:
Three-dimensional hand detection from a single RGB-D image is an important technology which supports many useful applications. Practically, it is challenging to robustly detect human hands in unconstrained environments because the RGB-D channels can be affected by many uncontrollable factors, such as light changes. To tackle this problem, we propose a 3D hand detection approach which improves the robustness and accuracy by adaptively fusing the complementary features extracted from the RGB-D channels. Using the fused RGB-D feature, the 2D bounding boxes of hands are detected first, and then the 3D locations along the z-axis are estimated through a cascaded network. Furthermore, we present a challenging RGB-D hand detection dataset collected in unconstrained environments. Different from previous works which primarily rely on either the RGB or D channel, we adaptively fuse the RGB-D channels for hand detection. Specifically, evaluation results show that the D channel is crucial for hand detection in unconstrained environments. Our RGB-D fusion-based approach significantly improves the hand detection accuracy from 69.1 to 74.1 compared to one of the most state-of-the-art RGB-based hand detectors. The existing RGB- or D-based methods are unstable in unseen lighting conditions: in dark conditions, the accuracy of the RGB-based method significantly drops to 48.9, and in back-light conditions, the accuracy of the D-based method dramatically drops to 28.3. Compared with these methods, our RGB-D fusion-based approach is much more robust without accuracy degradation, and our detection accuracy is 62.5 and 65.9, respectively, in these two extreme lighting conditions.
APA, Harvard, Vancouver, ISO, and other styles
21

Aubry, Sophie, Sohaib Laraba, Joëlle Tilmanne, and Thierry Dutoit. "Action recognition based on 2D skeletons extracted from RGB videos." MATEC Web of Conferences 277 (2019): 02034. http://dx.doi.org/10.1051/matecconf/201927702034.

Full text
Abstract:
In this paper a methodology to recognize actions based on RGB videos is proposed which takes advantage of the recent breakthroughs made in deep learning. Following the development of Convolutional Neural Networks (CNNs), research was conducted on the transformation of skeletal motion data into 2D images. In this work, a solution is proposed requiring only the use of RGB videos instead of RGB-D videos. This work is based on multiple works studying the conversion of RGB-D data into 2D images. From a video stream (RGB images), a two-dimensional skeleton of 18 joints for each detected body is extracted with a DNN-based human pose estimator called OpenPose. The skeleton data are encoded into the red, green, and blue channels of images. Different ways of encoding motion data into images were studied. We successfully use state-of-the-art deep neural networks designed for image classification to recognize actions. Based on a study of the related works, we chose to use image classification models: SqueezeNet, AlexNet, DenseNet, ResNet, Inception, VGG, and retrained them to perform action recognition. For all the tests, the NTU RGB+D database is used. The highest accuracy is obtained with ResNet: 83.317% cross-subject and 88.780% cross-view, which outperforms most state-of-the-art results.
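As an illustration of encoding skeletal motion into an image, a small sketch that maps an OpenPose-style (x, y, confidence) sequence into the red, green, and blue channels, with joints along one axis and frames along the other; this is one plausible encoding, not necessarily the exact mapping evaluated in the paper.

```python
import numpy as np

def skeleton_sequence_to_image(seq):
    """seq: array of shape (T, 18, 3) holding (x, y, confidence) for 18
    OpenPose joints over T frames. Returns an (18, T, 3) uint8 image where
    the R/G/B channels encode normalized x, y, and confidence respectively."""
    seq = np.asarray(seq, dtype=np.float32)
    img = np.zeros((seq.shape[1], seq.shape[0], 3), dtype=np.uint8)
    for c in range(3):
        ch = seq[:, :, c].T                          # joints on rows, frames on columns
        lo, hi = ch.min(), ch.max()
        img[:, :, c] = np.uint8(255 * (ch - lo) / (hi - lo + 1e-12))
    return img
```

The resulting image can then be fed to a standard image classification CNN for action recognition.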
APA, Harvard, Vancouver, ISO, and other styles
22

Kostusiak, Aleksander, and Piotr Skrzypczyński. "Enhancing Visual Odometry with Estimated Scene Depth: Leveraging RGB-D Data with Deep Learning." Electronics 13, no. 14 (July 13, 2024): 2755. http://dx.doi.org/10.3390/electronics13142755.

Full text
Abstract:
Advances in visual odometry (VO) systems have benefited from the widespread use of affordable RGB-D cameras, improving indoor localization and mapping accuracy. However, older sensors like the Kinect v1 face challenges due to depth inaccuracies and incomplete data. This study compares indoor VO systems that use RGB-D images, exploring methods to enhance depth information. We examine conventional image inpainting techniques and a deep learning approach, utilizing newer depth data from devices like the Kinect v2. Our research highlights the importance of refining data from lower-quality sensors, which is crucial for cost-effective VO applications. By integrating deep learning models with richer context from RGB images and more comprehensive depth references, we demonstrate improved trajectory estimation compared to standard methods. This work advances budget-friendly RGB-D VO systems for indoor mobile robots, emphasizing deep learning’s role in leveraging connections between image appearance and depth data.
APA, Harvard, Vancouver, ISO, and other styles
23

Sun, Qingbo. "Research on RGB-D image recognition technology based on feature fusion and machine learning." Journal of Physics: Conference Series 2031, no. 1 (September 1, 2021): 012022. http://dx.doi.org/10.1088/1742-6596/2031/1/012022.

Full text
Abstract:
The three-dimensional RGB-D image contains not only the color and texture information of the two-dimensional image, but also the surface geometry information of the target. This article analyzes RGB-D image recognition methods, including stereo vision technology, structured light technology, etc. By studying the application of RGB-D image recognition technology in the context of feature fusion and machine learning, the aim is to improve the richness of image recognition content and provide a reference for the smooth development of follow-up work.
APA, Harvard, Vancouver, ISO, and other styles
24

Chen, Liang Chia, and Nguyen Van Thai. "Real-Time 3-D Mapping for Indoor Environments Using RGB-D Cameras." Advanced Materials Research 579 (October 2012): 435–44. http://dx.doi.org/10.4028/www.scientific.net/amr.579.435.

Full text
Abstract:
For three-dimensional (3-D) mapping, 3-D laser scanners and stereo camera systems have so far been used widely due to their high measurement range and accuracy. For stereo camera systems, establishing corresponding point pairs between two images is one crucial step for reconstructing depth information. However, mapping approaches using laser scanners are still seriously constrained by the need for accurate image registration and mapping. In recent years, time-of-flight (ToF) cameras have been used for mapping tasks, providing high frame rates while preserving a compact size, but they lack measurement precision and robustness. To address the current technological bottleneck, this article presents a 3-D mapping method which employs an RGB-D camera for 3-D data acquisition and then applies RGB-D features alignment (RGBD-FA) for data registration. Experimental results show the feasibility and robustness of applying the proposed approach for real-time 3-D mapping of large-scale indoor environments.
APA, Harvard, Vancouver, ISO, and other styles
25

Peng, M., W. Wan, Y. Xing, Y. Wang, Z. Liu, K. Di, Q. Zhao, B. Teng, and X. Mao. "INTEGRATING DEPTH AND IMAGE SEQUENCES FOR PLANETARY ROVER MAPPING USING RGB-D SENSOR." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-3 (April 30, 2018): 1369–74. http://dx.doi.org/10.5194/isprs-archives-xlii-3-1369-2018.

Full text
Abstract:
An RGB-D camera allows the capture of depth and color information at high data rates, and this makes it possible and beneficial to integrate depth and image sequences for planetary rover mapping. The proposed mapping method consists of three steps. First, the strict projection relationship among 3D space, depth data, and visual texture data is established based on the imaging principle of the RGB-D camera. Then, an extended bundle adjustment (BA) based SLAM method with integrated 2D and 3D measurements is applied to the image network for high-precision pose estimation. Next, as the interior and exterior orientation elements of the RGB image sequence are available, dense matching is completed with the CMPMVS tool. Finally, according to the registration parameters after ICP, the 3D scene from the RGB images can be registered well to the 3D scene from the depth images, and the fused point cloud can be obtained. An experiment was performed in an outdoor field to simulate the lunar surface. The experimental results demonstrated the feasibility of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
26

Zhao, Bohu, Lebao Li, and Haipeng Pan. "Non-Local Means Hole Repair Algorithm Based on Adaptive Block." Applied Sciences 14, no. 1 (December 24, 2023): 159. http://dx.doi.org/10.3390/app14010159.

Full text
Abstract:
RGB-D cameras provide depth and color information and are widely used in 3D reconstruction and computer vision. In the majority of existing RGB-D cameras, a considerable portion of depth values is often lost due to severe occlusion or limited camera coverage, thereby adversely impacting the precise localization and three-dimensional reconstruction of objects. In this paper, to address the issue of poor-quality depth images captured by RGB-D cameras, a depth image hole repair algorithm based on non-local means is first proposed, leveraging the structural similarities between grayscale and depth images. Second, considering the cumbersome parameter tuning associated with the non-local means hole repair method for determining the size of structural blocks, an intelligent block factor is introduced, which automatically determines the optimal search and repair block sizes for various hole sizes, resulting in an adaptive block-based non-local means algorithm for repairing depth image holes. Furthermore, the proposed algorithm's performance is evaluated using both the Middlebury stereo matching dataset and a self-constructed RGB-D dataset, with performance assessment carried out by comparing the algorithm against other methods using five metrics: RMSE, SSIM, PSNR, DE, and ALME. Finally, experimental results demonstrate that the parameter tuning complexity inherent in depth image hole repair is resolved, the holes are effectively filled, noise within the depth images is suppressed, and image quality is enhanced with elevated precision and accuracy.
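A simplified sketch of grayscale-guided, non-local-means-style hole filling with a fixed patch and search window; the paper's adaptive block factor, which selects the block sizes automatically, is not reproduced, and the parameter values here are illustrative.

```python
import numpy as np

def nlm_fill_depth(depth, gray, patch=3, search=10, h=10.0):
    """Fill zero-valued depth holes with a non-local-means-style weighted
    average of nearby valid depths, weighted by grayscale patch similarity."""
    filled = depth.astype(np.float64).copy()
    pad = patch // 2
    H, W = depth.shape
    for y, x in np.argwhere(depth == 0):
        if y < pad or y >= H - pad or x < pad or x >= W - pad:
            continue                                    # skip the image border
        ref = gray[y - pad:y + pad + 1, x - pad:x + pad + 1].astype(np.float64)
        y0, y1 = max(y - search, pad), min(y + search, H - pad - 1)
        x0, x1 = max(x - search, pad), min(x + search, W - pad - 1)
        num = den = 0.0
        for v in range(y0, y1 + 1):
            for u in range(x0, x1 + 1):
                if depth[v, u] == 0:
                    continue                            # only valid depths vote
                cand = gray[v - pad:v + pad + 1, u - pad:u + pad + 1].astype(np.float64)
                w = np.exp(-np.sum((ref - cand) ** 2) / (h * h))
                num += w * depth[v, u]
                den += w
        if den > 0:
            filled[y, x] = num / den
    return filled
```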
APA, Harvard, Vancouver, ISO, and other styles
27

Wang, Liang, and Zhiqiu Wu. "RGB-D SLAM with Manhattan Frame Estimation Using Orientation Relevance." Sensors 19, no. 5 (March 1, 2019): 1050. http://dx.doi.org/10.3390/s19051050.

Full text
Abstract:
Due to image noise, image blur, and inconsistency between depth data and the color image, the accuracy and robustness of the pairwise spatial transformation computed by matching extracted features of detected key points in existing sparse Red Green Blue-Depth (RGB-D) Simultaneous Localization And Mapping (SLAM) algorithms are poor. Considering that most indoor environments follow the Manhattan World assumption and that the Manhattan Frame can be used as a reference to compute the pairwise spatial transformation, a new RGB-D SLAM algorithm is proposed. It first performs Manhattan Frame Estimation using the introduced concept of orientation relevance. Then the pairwise spatial transformation between two RGB-D frames is computed with the Manhattan Frame Estimation. Finally, the Manhattan Frame Estimation using orientation relevance is incorporated into the RGB-D SLAM to improve its performance. Experimental results show that the proposed RGB-D SLAM algorithm has definite improvements in accuracy, robustness, and runtime.
APA, Harvard, Vancouver, ISO, and other styles
28

Chi, Chen Tung, Shih Chien Yang, and Yin Tien Wang. "Calibration of RGB-D Sensors for Robot SLAM." Applied Mechanics and Materials 479-480 (December 2013): 677–81. http://dx.doi.org/10.4028/www.scientific.net/amm.479-480.677.

Full text
Abstract:
This paper presents a calibration procedure for a Kinect RGB-D sensor and its application to robot simultaneous localization and mapping (SLAM). The calibration procedure consists of two stages: in the first stage, the RGB image is aligned with the depth image using bilinear interpolation; the distorted RGB image is further corrected in the second stage. The calibrated RGB-D sensor is used as the sensing device for robot navigation in an unknown environment. In SLAM tasks, speeded-up robust features (SURF) are detected from the RGB image and used as landmarks in the environment map. The depth image provides the stereo information of each landmark. Meanwhile, the robot estimates its own state and the landmark locations by means of the Extended Kalman Filter (EKF). EKF SLAM has been carried out in the paper, and the experimental results showed that the Kinect sensor could provide the mobile robot with reliable measurement information when navigating in an unknown environment.
APA, Harvard, Vancouver, ISO, and other styles
29

Jiang, Ming-xin, Chao Deng, Ming-min Zhang, Jing-song Shan, and Haiyan Zhang. "Multimodal Deep Feature Fusion (MMDFF) for RGB-D Tracking." Complexity 2018 (November 28, 2018): 1–8. http://dx.doi.org/10.1155/2018/5676095.

Full text
Abstract:
Visual tracking is still a challenging task due to occlusion, appearance changes, complex motion, etc. We propose a novel RGB-D tracker based on multimodal deep feature fusion (MMDFF) in this paper. The MMDFF model consists of four deep Convolutional Neural Networks (CNNs): a motion-specific CNN, an RGB-specific CNN, a depth-specific CNN, and an RGB-depth correlated CNN. The depth image is encoded into three channels which are sent into the depth-specific CNN to extract deep depth features. The optical flow image is calculated for every frame and then fed to the motion-specific CNN to learn deep motion features. Deep RGB, depth, and motion information can be effectively fused at multiple layers via the MMDFF model. Finally, the multimodal fused deep features are sent into the C-COT tracker to obtain the tracking result. For evaluation, experiments are conducted on two recent large-scale RGB-D datasets, and the results demonstrate that our proposed RGB-D tracking method achieves better performance than other state-of-the-art RGB-D trackers.
APA, Harvard, Vancouver, ISO, and other styles
30

Wang, Huiqun, Di Huang, Kui Jia, and Yunhong Wang. "Hierarchical Image Segmentation Ensemble for Objectness in RGB-D Images." IEEE Transactions on Circuits and Systems for Video Technology 29, no. 1 (January 2019): 93–103. http://dx.doi.org/10.1109/tcsvt.2017.2776220.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Jung, Geunho, Yong-Yuk Won, and Sang Min Yoon. "Computational Large Field-of-View RGB-D Integral Imaging System." Sensors 21, no. 21 (November 8, 2021): 7407. http://dx.doi.org/10.3390/s21217407.

Full text
Abstract:
The integral imaging system has received considerable research attention because it can be applied to real-time three-dimensional image displays with a continuous view angle without supplementary devices. Most previous approaches place a physical micro-lens array in front of the image, where each lens looks different depending on the viewing angle. A computational integral imaging system with virtual micro-lens arrays has been proposed in order to provide flexibility for users to change micro-lens arrays and focal length while reducing distortions due to physical mismatches with the lens arrays. However, computational integral imaging methods only represent part of the whole image when dealing with large-scale images, because the size of the virtual lens arrays is much smaller than the given images. As a result, the previous approaches produce sub-aperture images with a small field of view and need additional devices for depth information to apply to integral imaging pickup systems. In this paper, we present a single image-based computational RGB-D integral imaging pickup system for a large field of view in real time. The proposed system comprises three steps: deep learning-based automatic depth map estimation from an RGB input image without the help of an additional device, a hierarchical integral imaging system for a large field of view in real time, and post-processing for optimized visualization of the failed pickup area using an inpainting method. Quantitative and qualitative experimental results verify the proposed approach's robustness.
APA, Harvard, Vancouver, ISO, and other styles
32

Salazar, Isail, Said Pertuz, and Fabio Martínez. "Multi-modal RGB-D Image Segmentation from Appearance and Geometric Depth Maps." TecnoLógicas 23, no. 48 (May 15, 2020): 143–61. http://dx.doi.org/10.22430/22565337.1538.

Full text
Abstract:
Classical image segmentation algorithms exploit the detection of similarities and discontinuities of different visual cues to define and differentiate multiple regions of interest in images. However, due to the high variability and uncertainty of image data, producing accurate results is difficult. In other words, segmentation based just on color is often insufficient for a large percentage of real-life scenes. This work presents a novel multi-modal segmentation strategy that integrates depth and appearance cues from RGB-D images by building a hierarchical region-based representation, i.e., a multi-modal segmentation tree (MM-tree). For this purpose, RGB-D image pairs are represented in a complementary fashion by different segmentation maps. Based on color images, a color segmentation tree (C-tree) is created to obtain segmented and over-segmented maps. From depth images, two independent segmentation maps are derived by computing planar and 3D edge primitives. Then, an iterative region merging process can be used to locally group the previously obtained maps into the MM-tree. Finally, the top emerging MM-tree level coherently integrates the available information from depth and appearance maps. The experiments were conducted using the NYU-Depth V2 RGB-D dataset, which demonstrated the competitive results of our strategy compared to state-of-the-art segmentation methods. Specifically, using test images, our method reached average scores of 0.56 in Segmentation Covering and 2.13 in Variation of Information.
APA, Harvard, Vancouver, ISO, and other styles
33

Feng, Guanyuan, Lin Ma, and Xuezhi Tan. "Visual Map Construction Using RGB-D Sensors for Image-Based Localization in Indoor Environments." Journal of Sensors 2017 (2017): 1–18. http://dx.doi.org/10.1155/2017/8037607.

Full text
Abstract:
RGB-D sensors capture RGB images and depth images simultaneously, which makes it possible to acquire the depth information at pixel level. This paper focuses on the use of RGB-D sensors to construct a visual map which is an extended dense 3D map containing essential elements for image-based localization, such as poses of the database camera, visual features, and 3D structures of the building. Taking advantage of matched visual features and corresponding depth values, a novel local optimization algorithm is proposed to achieve point cloud registration and database camera pose estimation. Next, graph-based optimization is used to obtain the global consistency of the map. On the basis of the visual map, the image-based localization method is investigated, making use of the epipolar constraint. The performance of the visual map construction and the image-based localization are evaluated on typical indoor scenes. The simulation results show that the average position errors of the database camera and the query camera can be limited to within 0.2 meters and 0.9 meters, respectively.
APA, Harvard, Vancouver, ISO, and other styles
34

Zeng, Hui, Bin Yang, Xiuqing Wang, Jiwei Liu, and Dongmei Fu. "RGB-D Object Recognition Using Multi-Modal Deep Neural Network and DS Evidence Theory." Sensors 19, no. 3 (January 27, 2019): 529. http://dx.doi.org/10.3390/s19030529.

Full text
Abstract:
With the development of low-cost RGB-D (Red Green Blue-Depth) sensors, RGB-D object recognition has attracted more and more researchers' attention in recent years. The deep learning technique has become popular in the field of image analysis and has achieved competitive results. To make full use of the effective identification information in the RGB and depth images, we propose a multi-modal deep neural network and a DS (Dempster-Shafer) evidence theory based RGB-D object recognition method. First, the RGB and depth images are preprocessed and two convolutional neural networks are trained, respectively. Next, we perform multi-modal feature learning using the proposed quadruplet samples based objective function to fine-tune the network parameters. Then, two probability classification results are obtained using two sigmoid SVMs (Support Vector Machines) with the learned RGB and depth features. Finally, the DS evidence theory based decision fusion method is used for integrating the two classification results. Compared with other RGB-D object recognition methods, our proposed method adopts two fusion strategies: Multi-modal feature learning and DS decision fusion. Both the discriminative information of each modality and the correlation information between the two modalities are exploited. Extensive experimental results have validated the effectiveness of the proposed method.
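A minimal sketch of Dempster's rule of combination for fusing the two classifiers' outputs, treating each class posterior as a mass assignment over singleton classes only (no compound hypotheses), which is a simplification of full DS evidence theory.

```python
import numpy as np

def dempster_combine(m1, m2):
    """Combine two mass vectors over the same singleton classes with
    Dempster's rule. m1, m2: 1-D arrays that each sum to 1."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    joint = np.outer(m1, m2)
    agreement = np.diag(joint).copy()        # masses landing on the same class
    conflict = joint.sum() - agreement.sum() # masses landing on different classes
    if conflict >= 1.0:
        raise ValueError("total conflict: masses cannot be combined")
    return agreement / (1.0 - conflict)

# e.g. fusing RGB and depth SVM posteriors for a 3-class problem:
# fused = dempster_combine([0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
# predicted_class = int(np.argmax(fused))
```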
APA, Harvard, Vancouver, ISO, and other styles
35

Hacking, Chris, Nitesh Poona, Nicola Manzan, and Carlos Poblete-Echeverría. "Investigating 2-D and 3-D Proximal Remote Sensing Techniques for Vineyard Yield Estimation." Sensors 19, no. 17 (August 22, 2019): 3652. http://dx.doi.org/10.3390/s19173652.

Full text
Abstract:
Vineyard yield estimation provides the winegrower with insightful information regarding the expected yield, facilitating managerial decisions to achieve maximum quantity and quality and assisting the winery with logistics. The use of proximal remote sensing technology and techniques for yield estimation has produced limited success within viticulture. In this study, 2-D RGB and 3-D RGB-D (Kinect sensor) imagery were investigated for yield estimation in a vertical shoot positioned (VSP) vineyard. Three experiments were implemented, including two measurement levels and two canopy treatments. The RGB imagery (bunch- and plant-level) underwent image segmentation before the fruit area was estimated using a calibrated pixel area. RGB-D imagery captured at bunch-level (mesh) and plant-level (point cloud) was reconstructed for fruit volume estimation. The RGB and RGB-D measurements utilised cross-validation to determine fruit mass, which was subsequently used for yield estimation. Experiment one’s (laboratory conditions) bunch-level results achieved a high yield estimation agreement with RGB-D imagery (r2 = 0.950), which outperformed RGB imagery (r2 = 0.889). Both RGB and RGB-D performed similarly in experiment two (bunch-level), while RGB outperformed RGB-D in experiment three (plant-level). The RGB-D sensor (Kinect) is suited to ideal laboratory conditions, while the robust RGB methodology is suitable for both laboratory and in-situ yield estimation.
APA, Harvard, Vancouver, ISO, and other styles
36

Sudharshan Duth, P., and M. Mary Deepa. "Color detection in RGB-modeled images using MAT LAB." International Journal of Engineering & Technology 7, no. 2.31 (May 29, 2018): 29. http://dx.doi.org/10.14419/ijet.v7i2.31.13391.

Full text
Abstract:
This research work introduces a method of using color thresholds to identify colors in two-dimensional images in MATLAB, using the RGB color model to recognize the color chosen by the user in the picture. The methodology converts the 3-D RGB image into a grayscale image, subtracts the two images to obtain a 2-D black-and-white picture, filters noisy pixels using a median filter, labels connected components in the binary image, and uses the bounding box and its properties to calculate a metric for every labeled region. In addition, the color of each pixel is identified by examining its RGB value. The color detection algorithm is implemented using the MATLAB Image Processing Toolbox. The result of this implementation can be used in security applications such as spy robots, object tracking, color-based object isolation, and intrusion detection.
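The paper implements this pipeline in MATLAB; below is a hedged Python/OpenCV equivalent of the described steps (channel-grayscale subtraction, median filtering, thresholding, connected-component labelling, bounding boxes), shown for the red channel with illustrative thresholds.

```python
import cv2

def detect_red_regions(bgr, thresh=50, min_area=200):
    """Detect red regions by subtracting the grayscale image from the red
    channel, median-filtering, thresholding, and labelling connected
    components; returns the binary mask and bounding boxes (x, y, w, h)."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    red = bgr[:, :, 2]                                   # OpenCV stores BGR
    diff = cv2.subtract(red, gray)                       # red-dominance map
    diff = cv2.medianBlur(diff, 5)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    boxes = [tuple(stats[i, :4]) for i in range(1, n)    # skip background label 0
             if stats[i, cv2.CC_STAT_AREA] >= min_area]
    return mask, boxes
```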
APA, Harvard, Vancouver, ISO, and other styles
37

Heravi, Hamed, Roghaieh Aghaeifard, Ali Rahimpour Jounghani, Afshin Ebrahimi, and Masumeh Delgarmi. "EXTRACTING FEATURES OF THE HUMAN FACE FROM RGB-D IMAGES TO PLAN FACIAL SURGERIES." Biomedical Engineering: Applications, Basis and Communications 32, no. 06 (December 2020): 2050042. http://dx.doi.org/10.4015/s1016237220500428.

Full text
Abstract:
Biometric identification of the human face is a pervasive subject which deals with a wide range of disciplines such as image processing, computer vision, pattern recognition, artificial intelligence, and cognitive psychology. Extracting key face points for developing software and commercial devices for face surgery analysis is one of the most challenging fields in computer image and vision processing. Many studies have developed a variety of techniques to extract facial features from color and gray images. In recent years, using depth information has opened up new approaches for researchers in the field of image processing. Hence, in this study, a statistical method is proposed to extract key nose points from color-depth images (RGB-D) of the face front view. In this study, the Microsoft Kinect sensor is used to produce the face RGB-D images. To assess the capability of the proposed method, the algorithm is applied to 20 RGB-D face images from the database collected in the ICT lab of Sahand University of Technology, and promising results are achieved for extracting key points of the face. The results of this study indicate that using the available information in two different color-depth bands makes key points of the face more easily accessible and brings better results; we conclude that the proposed algorithm provides a promising outcome for extracting the positions of key points.
APA, Harvard, Vancouver, ISO, and other styles
38

Liu, Zhengyi, Tengfei Song, and Feng Xie. "RGB-D image saliency detection from 3D perspective." Multimedia Tools and Applications 78, no. 6 (July 31, 2018): 6787–804. http://dx.doi.org/10.1007/s11042-018-6319-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Na, Myung Hwan, Wan Hyun Cho, Sang Kyoon Kim, and In Seop Na. "Automatic Weight Prediction System for Korean Cattle Using Bayesian Ridge Algorithm on RGB-D Image." Electronics 11, no. 10 (May 23, 2022): 1663. http://dx.doi.org/10.3390/electronics11101663.

Full text
Abstract:
Weighing the Hanwoo (Korean cattle) is very important for Korean beef producers in order to sell the Hanwoo at the right time. Recently, research has been conducted on automatically predicting the weight of Hanwoo from images alone, following advances in deep learning and image recognition. In this paper, we propose a method for the automatic weight prediction of Hanwoo using the Bayesian ridge algorithm on RGB-D images. The proposed system consists of three parts: segmentation, feature extraction, and estimation of the weight of Korean cattle from a given RGB-D image. The first step is to segment the Hanwoo area from a given RGB-D image using depth information and color information, respectively, and then combine them to perform optimal segmentation. Additionally, we correct the posture using ellipse fitting on the segmented body image. The second step is to extract features for weight prediction from the segmented Hanwoo image. We extract three kinds of features: size, shape, and gradients. The third step is to find the optimal machine learning model by comparing eight well-known machine learning models. In this step, we compare each model with the aim of finding an efficient, lightweight model that can be used in an embedded system in the field. To evaluate the performance of the proposed weight prediction system, we collected 353 RGB-D images from livestock farms in Wonju, Gangwon-do, Korea. In the experimental results, random forest showed the best performance, and the Bayesian ridge model was the second best in MSE and the coefficient of determination. However, we suggest that the Bayesian ridge model is the optimal model in terms of time complexity and space complexity. Finally, it is expected that the proposed system will be used in a portable commercial device to determine the shipping time of Hanwoo on farms.
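A small sketch of the regression stage using scikit-learn's BayesianRidge, assuming the size, shape, and gradient features have already been extracted into a feature matrix; the train/test split and the evaluation metrics are illustrative, not the paper's protocol.

```python
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# X: (n_samples, n_features) size/shape/gradient features from the segmented
# cattle images; y: measured body weights in kg.
def train_weight_model(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = BayesianRidge()
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return model, mean_squared_error(y_te, pred), r2_score(y_te, pred)
```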
APA, Harvard, Vancouver, ISO, and other styles
40

Hong, Sungjin, and Myounggyu Kim. "A Framework for Human Body Parts Detection in RGB-D Image." Journal of Korea Multimedia Society 19, no. 12 (December 31, 2016): 1927–35. http://dx.doi.org/10.9717/kmms.2016.19.12.1927.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Kong, Yuqiu, He Wang, Lingwei Kong, Yang Liu, Cuili Yao, and Baocai Yin. "Absolute and Relative Depth-Induced Network for RGB-D Salient Object Detection." Sensors 23, no. 7 (March 30, 2023): 3611. http://dx.doi.org/10.3390/s23073611.

Full text
Abstract:
Detecting salient objects in complicated scenarios is a challenging problem. In addition to the semantic features from the RGB image, spatial information from the depth image provides sufficient cues about the object. Therefore, it is crucial to rationally integrate RGB and depth features for the RGB-D salient object detection task. Most existing RGB-D saliency detectors modulate RGB semantic features with absolute depth values, but they ignore the appearance contrast and structural knowledge indicated by the relative depth values between pixels. In this work, we propose a depth-induced network (DIN) for RGB-D salient object detection that takes full advantage of both absolute and relative depth information and further enforces in-depth fusion of the RGB-D cross-modalities. Specifically, an absolute depth-induced module (ADIM) is proposed to hierarchically integrate absolute depth values and RGB features, allowing appearance and structural information to interact in the encoding stage. A relative depth-induced module (RDIM) is designed to capture detailed saliency cues by exploring contrastive and structural information from relative depth values in the decoding stage. By combining the ADIM and RDIM, we can accurately locate salient objects with clear boundaries, even in complex scenes. The proposed DIN is a lightweight network whose model size is much smaller than that of state-of-the-art algorithms. Extensive experiments on six challenging benchmarks show that our method outperforms most existing RGB-D salient object detection models.
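A toy illustration of the intuition behind "relative depth", not the ADIM/RDIM network itself: the contrast between a pixel's depth and its local neighbourhood often outlines object boundaries better than the raw absolute depth value.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def relative_depth_contrast(depth, k=7):
    """Depth minus its k x k neighbourhood mean; large |values| mark depth discontinuities."""
    local_mean = uniform_filter(depth.astype(float), size=k, mode="nearest")
    return depth - local_mean

# A foreground block 200 mm nearer than the background produces a strong contrast ring.
depth = np.full((64, 64), 1200.0)
depth[20:44, 20:44] = 1000.0
print(np.abs(relative_depth_contrast(depth)).max())
```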
APA, Harvard, Vancouver, ISO, and other styles
42

Chen, Songnan, Mengxia Tang, Ruifang Dong, and Jiangming Kan. "Encoder–Decoder Structure Fusing Depth Information for Outdoor Semantic Segmentation." Applied Sciences 13, no. 17 (September 1, 2023): 9924. http://dx.doi.org/10.3390/app13179924.

Full text
Abstract:
The semantic segmentation of outdoor images is the cornerstone of scene understanding and plays a crucial role in the autonomous navigation of robots. Although RGB-D images can provide additional depth information for improving semantic segmentation performance, current state-of-the-art methods fuse ground-truth depth maps directly, which relies on highly developed and expensive depth sensors. To address this problem, we propose a self-calibrated RGB-D semantic segmentation neural network based on an improved residual network that does not rely on depth sensors: it fuses multi-modal information from depth maps predicted by depth estimation models with the RGB image to enhance scene understanding. First, we designed a novel convolutional neural network (CNN) with an encoder-decoder structure as our semantic segmentation model. The encoder is built on IResNet to extract the semantic features of the RGB image and the predicted depth map and then fuses them effectively with the self-calibration fusion structure; the decoder restores the resolution of the output features with a series of successive upsampling structures. Second, we present a feature pyramid attention mechanism to extract the fused information at multiple scales and obtain features with rich semantic content. Experimental results on the publicly available Cityscapes dataset and on collected forest scene images show that our model, trained with estimated depth information, achieves performance comparable to using ground-truth depth maps in improving segmentation accuracy, and even outperforms some competitive methods.
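A generic sketch of fusing RGB features with features from a predicted depth map; this is a simple gated fusion for illustration, not the paper's self-calibration fusion structure, and all names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(rgb_feat, depth_feat, gate_weights):
    """rgb_feat, depth_feat: (C, H, W) feature maps; gate_weights: (C,) learned scalars."""
    gate = sigmoid(gate_weights)[:, None, None]      # per-channel mixing factor in (0, 1)
    return rgb_feat + gate * depth_feat              # depth contributes where the gate opens

rgb_feat = np.random.rand(64, 32, 32)
depth_feat = np.random.rand(64, 32, 32)
fused = gated_fusion(rgb_feat, depth_feat, np.zeros(64))   # zero weights -> gate = 0.5
print(fused.shape)
```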
APA, Harvard, Vancouver, ISO, and other styles
43

Ouloul, M. I., Z. Moutakki, K. Afdel, and A. Amghar. "An Efficient Face Recognition Using SIFT Descriptor in RGB-D Images." International Journal of Electrical and Computer Engineering (IJECE) 5, no. 6 (December 1, 2015): 1227. http://dx.doi.org/10.11591/ijece.v5i6.pp1227-1233.

Full text
Abstract:
Automatic face recognition has undergone an important evolution in the last decade due to its wide use in security systems. Most facial recognition approaches use 2D images, but this type of image is very sensitive to illumination and lighting changes. Other approaches use 3D or stereo cameras, but these are rarely employed because they require relatively long processing times. A newer approach in this field is based on the RGB-D images produced by the Kinect; this type of camera costs less and can be used in almost any environment and under most circumstances. In this work we propose a new algorithm that combines the RGB image with the depth map, which is less sensitive to illumination changes. We obtained a recognition rate of 96.63% at rank 2.
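A hedged sketch of one way to combine SIFT descriptors from the RGB image with a co-registered depth map (not necessarily the fusion used in the paper): the depth value under each keypoint is appended to its 128-D descriptor.

```python
import cv2
import numpy as np

def rgbd_sift_descriptors(bgr, depth_mm):
    """bgr: uint8 colour image; depth_mm: depth map registered to it."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    keypoints, desc = sift.detectAndCompute(gray, None)
    if desc is None:
        return np.empty((0, 129), dtype=np.float32)
    # sample the depth map at each keypoint location
    depths = np.array([depth_mm[int(kp.pt[1]), int(kp.pt[0])] for kp in keypoints],
                      dtype=np.float32)
    return np.hstack([desc, depths[:, None]])        # 129-D descriptor per keypoint
```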
APA, Harvard, Vancouver, ISO, and other styles
44

Yan, Hailong, Wenqi Wu, Zhenghua Deng, Junjian Huang, Zhizhang Li, and Luting Zhang. "Image Inpainting for 3D Reconstruction Based on the Known Region Boundaries." Mathematics 10, no. 15 (August 3, 2022): 2761. http://dx.doi.org/10.3390/math10152761.

Full text
Abstract:
A point cloud is a collection of points expressed in the 3D coordinate system of a scene; the points generally represent the outer surface of an object. Point clouds are widely used in 3D reconstruction applications in various fields. When point cloud data are obtained from RGB-D images, any lost or damaged information in the RGB-D images makes the resulting point cloud hollow or too sparse, which hampers its subsequent use. Based on the boundary of the region to be repaired, we propose to inpaint the damaged image and, after a series of image preprocessing steps, synthesize complete point cloud data. Experiments show that our method effectively restores the lost pixel details in the target area and yields denser point cloud data after synthesis from the restored image.
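A minimal sketch under assumed names: repair missing pixels in a registered RGB-D pair before back-projecting to a point cloud. OpenCV's Telea inpainting stands in here for the paper's boundary-guided repair.

```python
import cv2
import numpy as np

def repair_rgbd(bgr, depth_mm):
    hole_mask = (depth_mm == 0).astype(np.uint8)          # 0 marks missing depth
    bgr_fixed = cv2.inpaint(bgr, hole_mask, 3, cv2.INPAINT_TELEA)
    depth_fixed = cv2.inpaint(depth_mm.astype(np.float32), hole_mask, 3,
                              cv2.INPAINT_TELEA)
    return bgr_fixed, depth_fixed

def back_project(depth_mm, fx, fy, cx, cy):
    """Turn a depth map into an Nx3 point cloud using pinhole intrinsics."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm / 1000.0                                  # metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```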
APA, Harvard, Vancouver, ISO, and other styles
45

Banchajarurat, Chanikan, Khwantri Saengprachatanarug, Nattpol Damrongplasit, and Chanat Ratanasumawong. "Volume estimation of cassava using consumer-grade RGB-D camera." E3S Web of Conferences 187 (2020): 02002. http://dx.doi.org/10.1051/e3sconf/202018702002.

Full text
Abstract:
Mismanagement during postharvest handling of cassava can degrade the quality of the product and considerably depreciate its selling price. This study examined the feasibility of using an RGB-depth camera to measure the quality of cassava roots in a non-destructive, fast, and cost-effective manner, focusing on a methodology for estimating the volume of cassava of the Kasetsart 50 variety. RGB-D images were collected from 60 cassava samples, each photographed from 6 different orientations. An image processing model and a point cloud model were used to obtain the volume from the depth images, and then the disk method and the box method were used to estimate the volume of the cassava under an ellipsoidal-shape assumption. Both estimation methods provided usable volumes in the range of 100-500 ml, with RMSE values of 5.91% and 4.02%, respectively. The estimated volume can be used to compute density and thus predict rotten cassava, improving the quality and efficiency of the cassava industry.
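A sketch of the disk method on a segmented depth image: the root is sliced along its major axis and each slice is treated as a circular disk whose diameter equals the slice width. The pixel-to-millimetre scale and variable names are assumptions.

```python
import numpy as np

def disk_method_volume(mask, mm_per_px):
    """mask: boolean image of the segmented root, major axis along the columns."""
    volume_mm3 = 0.0
    for col in range(mask.shape[1]):
        width_px = np.count_nonzero(mask[:, col])
        radius_mm = 0.5 * width_px * mm_per_px
        volume_mm3 += np.pi * radius_mm ** 2 * mm_per_px   # disk of thickness one pixel
    return volume_mm3 / 1000.0                             # millilitres

# Sanity check: a 100 mm long cylinder of radius 20 mm should give about 125.7 ml.
mask = np.zeros((200, 100), dtype=bool)
mask[80:120, :] = True                                      # 40 px wide at 1 mm/px
print(round(disk_method_volume(mask, 1.0), 1))
```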
APA, Harvard, Vancouver, ISO, and other styles
46

Zhang, Heng, Zhenqiang Wen, Yanli Liu, and Gang Xu. "Edge Detection from RGB-D Image Based on Structured Forests." Journal of Sensors 2016 (2016): 1–10. http://dx.doi.org/10.1155/2016/5328130.

Full text
Abstract:
This paper addresses a fundamental problem in computer vision: edge detection. We propose a new edge detector that uses structured random forests as the classifier and makes full use of the RGB-D image information provided by the Kinect. Before classification, an adaptive bilateral filter is applied to denoise the depth image, and 13 channels of information are computed from the RGB-D image as data sources. To train the random forest classifier, an approximation of the information gain is used, and all structured labels at a given node are mapped to a discrete set of labels with Principal Component Analysis (PCA). The NYUD2 dataset is used to train our structured random forests, and the resulting forest classifies the RGB-D information to extract image edges. In addition to the proposed methodology, quantitative comparisons with other algorithms are presented; the experimental results demonstrate significant improvements of our algorithm over the state of the art.
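A sketch of the pre-processing described above, with a plain bilateral filter standing in for the adaptive variant and two example feature channels; parameter values are illustrative and the structured-forest classifier itself is not shown.

```python
import cv2
import numpy as np

def preprocess_depth(depth_mm):
    depth_f = depth_mm.astype(np.float32)
    # bilateral filtering smooths noise while keeping depth discontinuities sharp
    return cv2.bilateralFilter(depth_f, d=9, sigmaColor=50, sigmaSpace=7)

def gradient_channels(gray):
    """Two typical feature channels: gradient magnitude and orientation."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    return cv2.magnitude(gx, gy), cv2.phase(gx, gy)
```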
APA, Harvard, Vancouver, ISO, and other styles
47

Roman-Rivera, Luis-Rogelio, Israel Sotelo-Rodríguez, Jesus Carlos Pedraza-Ortega, Marco Antonio Aceves-Fernandez, Juan Manuel Ramos-Arreguín, and Efrén Gorrostieta-Hurtado. "Reduced Calibration Strategy Using a Basketball for RGB-D Cameras." Mathematics 10, no. 12 (June 16, 2022): 2085. http://dx.doi.org/10.3390/math10122085.

Full text
Abstract:
RGB-D cameras produce depth and color information commonly used in 3D reconstruction and computer vision. Different cameras of the same model usually produce images with different calibration errors, and the color and depth layers usually require calibration to minimize alignment errors, adjust precision, and generally improve data quality. Standard calibration protocols for RGB-D cameras require a controlled environment in which operators capture many RGB and depth image pairs as input for calibration frameworks, making the protocol difficult to carry out without ideal conditions and an experienced operator. In this work, we propose a novel strategy that simplifies the calibration protocol by requiring fewer images than other methods. Our strategy uses an ordinary object, a basketball of known size, as a ground-truth sphere geometry during calibration. Our experiments show results comparable to a reference method for aligning the color and depth image layers, while requiring fewer images and tolerating non-ideal scene conditions.
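A sketch of the geometric core such a strategy relies on: a linear least-squares sphere fit to the depth points on the ball yields a centre and an estimated radius, which can be compared with the known basketball radius to judge depth error. Variable names and the synthetic data are assumptions.

```python
import numpy as np

def fit_sphere(points):
    """points: Nx3 array; returns (centre, radius) from an algebraic least-squares fit."""
    # Sphere equation rearranged as a linear system: |p|^2 = 2 p.c + (r^2 - |c|^2)
    A = np.hstack([2.0 * points, np.ones((points.shape[0], 1))])
    b = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre = sol[:3]
    radius = np.sqrt(sol[3] + centre @ centre)
    return centre, radius

# Synthetic check with a 0.12 m radius "basketball" centred at (0.1, -0.2, 1.5):
rng = np.random.default_rng(1)
dirs = rng.normal(size=(500, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
pts = np.array([0.1, -0.2, 1.5]) + 0.12 * dirs
print(fit_sphere(pts))
```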
APA, Harvard, Vancouver, ISO, and other styles
48

Cho, Junsu, Seungwon Kim, Chi-Min Oh, and Jeong-Min Park. "Auxiliary Task Graph Convolution Network: A Skeleton-Based Action Recognition for Practical Use." Applied Sciences 15, no. 1 (December 29, 2024): 198. https://doi.org/10.3390/app15010198.

Full text
Abstract:
Graph convolution networks (GCNs) have been extensively researched for action recognition on human skeletons estimated from video clips. However, their image sampling methods are not practical because they require the total video length in order to sample frames. In this study, we propose an Auxiliary Task Graph Convolution Network (AT-GCN) with low- and high-frame-rate pathways that supports a new sampling method. AT-GCN learns actions at a defined frame rate within a defined range using three losses: fuse, slow, and fast. The slow and fast losses are handled by two auxiliary tasks, while the mainstream handles the fuse loss. AT-GCN outperforms the original state-of-the-art model on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets while maintaining the same inference time. In top-1 accuracy, AT-GCN reaches 90.3% on the cross-subject and 95.2% on the cross-view benchmark of NTU RGB+D, 86.5% on the cross-subject and 87.6% on the cross-setup benchmark of NTU RGB+D 120, and 93.5% on NW-UCLA.
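A toy illustration of sampling at a defined frame rate over a defined range rather than from the total video length; the exact AT-GCN sampling rule is not reproduced here, and the window and stride are assumptions.

```python
def sample_frames(num_frames_seen, window=64, stride=2):
    """Return skeleton frame indices usable in a streaming setting, no video length needed."""
    start = max(0, num_frames_seen - window * stride)
    return list(range(start, num_frames_seen, stride))

print(sample_frames(30))    # early in the clip: [0, 2, ..., 28]
print(sample_frames(300))   # later: the most recent 64 strided frames
```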
APA, Harvard, Vancouver, ISO, and other styles
49

Zhou, Yang, Danqing Chen, Jun Wu, Mingyi Huang, and Yubin Weng. "Calibration of RGB-D Camera Using Depth Correction Model." Journal of Physics: Conference Series 2203, no. 1 (February 1, 2022): 012032. http://dx.doi.org/10.1088/1742-6596/2203/1/012032.

Full text
Abstract:
This paper proposes a calibration method for an RGB-D camera, particularly its depth camera. First, a checkerboard calibration board under an auxiliary infrared light source is used to collect calibration images. Then, the internal and external parameters of the depth camera are calculated with Zhang's calibration method, which improves the accuracy of the internal parameters. Next, a depth correction model is proposed to calibrate the distortion of the depth image directly, which is more intuitive and faster than the disparity distortion correction model. The method is simple, achieves high precision, and is suitable for most depth cameras.
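A sketch of the first two steps (intrinsics from checkerboard images via Zhang's method) using OpenCV; the paper's depth correction model itself is not reproduced, and the board size, square size, and file names are assumptions.

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)                                   # inner corners of the checkerboard
square = 0.025                                     # square size in metres (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points, size = [], [], None
for path in glob.glob("ir_calib_*.png"):           # IR images lit by the auxiliary source
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)
        size = gray.shape[::-1]

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points, size, None, None)
print("reprojection RMS:", rms)
print("intrinsic matrix:\n", K)
```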
APA, Harvard, Vancouver, ISO, and other styles
50

Rossi, L., C. I. De Gaetani, D. Pagliari, E. Realini, M. Reguzzoni, and L. Pinto. "COMPARISON BETWEEN RGB AND RGB-D CAMERAS FOR SUPPORTING LOW-COST GNSS URBAN NAVIGATION." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2 (May 30, 2018): 991–98. http://dx.doi.org/10.5194/isprs-archives-xlii-2-991-2018.

Full text
Abstract:
Pure GNSS navigation is often unreliable in urban areas because obstructions prevent correct reception of the satellite signal. Bridging GNSS outages, as well as reconstructing the vehicle attitude, can be achieved with complementary information such as visual data acquired by RGB-D or RGB cameras. In this work, the possibility of integrating low-cost GNSS and visual data by means of an extended Kalman filter is investigated, focusing on the comparison between RGB-D and RGB cameras. In particular, a Microsoft Kinect device (second generation) and a mirrorless Canon EOS M RGB camera were compared. The former is an interesting RGB-D camera because of its low cost, ease of use, and raw data accessibility; the latter was selected for the high quality of its images and for the possibility of mounting fixed focal length lenses at a lower weight and cost than a reflex camera. The designed extended Kalman filter takes as input the GNSS-only trajectory and the relative orientation between subsequent pairs of images. The filter differs according to the visual acquisition system, because RGB-D cameras acquire both RGB and depth data, which solves the scale problem that is typical of image-only solutions. The two systems and filtering approaches were assessed by ad hoc experimental tests, showing that using a Kinect device to support a u-blox low-cost receiver leads to a trajectory with decimetre accuracy, 15% better than the one obtained with the Canon EOS M camera.
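A highly simplified sketch of this kind of fusion: an extended Kalman filter whose prediction uses a camera-derived relative heading change and speed, and whose update uses GNSS positions when they are available. The state, noise values, and measurement model are assumptions, not the paper's filter.

```python
import numpy as np

def ekf_step(x, P, d_heading, speed, dt, gnss_xy=None,
             q=np.diag([0.05, 0.05, 0.01]), r=np.diag([0.5, 0.5])):
    # ----- predict: state x = [east, north, heading] -----
    e, n, th = x
    th_new = th + d_heading                              # relative rotation from vision
    x_pred = np.array([e + speed * dt * np.cos(th_new),
                       n + speed * dt * np.sin(th_new),
                       th_new])
    F = np.array([[1, 0, -speed * dt * np.sin(th_new)],
                  [0, 1,  speed * dt * np.cos(th_new)],
                  [0, 0, 1]])
    P_pred = F @ P @ F.T + q
    if gnss_xy is None:                                  # GNSS outage: dead-reckon on vision
        return x_pred, P_pred
    # ----- update with a GNSS position fix -----
    H = np.array([[1.0, 0, 0], [0, 1.0, 0]])
    y = np.asarray(gnss_xy) - H @ x_pred
    S = H @ P_pred @ H.T + r
    K = P_pred @ H.T @ np.linalg.inv(S)
    return x_pred + K @ y, (np.eye(3) - K @ H) @ P_pred
```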
APA, Harvard, Vancouver, ISO, and other styles