Journal articles on the topic 'Dense Vision Tasks'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Dense Vision Tasks.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Yao, Chao, Shuo Jin, Meiqin Liu, and Xiaojuan Ban. "Dense Residual Transformer for Image Denoising." Electronics 11, no. 3 (January 29, 2022): 418. http://dx.doi.org/10.3390/electronics11030418.

Abstract:
Image denoising is an important low-level computer vision task, which aims to reconstruct a noise-free and high-quality image from a noisy image. With the development of deep learning, convolutional neural networks (CNNs) have been gradually applied and have achieved great success in image denoising, image compression, image enhancement, etc. Recently, the Transformer has become a popular technique that is widely used to tackle computer vision tasks. However, few Transformer-based methods have been proposed for low-level vision tasks. In this paper, we propose an image denoising network structure based on the Transformer, named DenSformer. DenSformer consists of three modules, including a preprocessing module, a local-global feature extraction module, and a reconstruction module. Specifically, the local-global feature extraction module consists of several Sformer groups, each of which has several ETransformer layers and a convolution layer, together with a residual connection. These Sformer groups are densely skip-connected to fuse the features of different layers, and they jointly capture the local and global information from the given noisy images. We evaluate our model in comprehensive experiments. In synthetic noise removal, DenSformer outperforms other state-of-the-art methods by up to 0.06–0.28 dB on gray-scale images and 0.57–1.19 dB on color images. In real noise removal, DenSformer achieves comparable performance, while the number of parameters can be reduced by up to 40%. Experimental results show that our DenSformer achieves improvements over some state-of-the-art methods, for both synthetic and real noise data, in objective and subjective evaluations.
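As a rough illustration of the dense skip connections described in this abstract, the sketch below densely fuses the outputs of a few stand-in feature groups in PyTorch. It is not the authors' DenSformer: the group internals (a plain convolution here, in place of ETransformer layers) and names such as DenseGroups are placeholders.

# Minimal sketch of densely skip-connected feature groups (illustrative only;
# the real DenSformer uses ETransformer layers inside each group).
import torch
import torch.nn as nn

class DenseGroups(nn.Module):
    def __init__(self, channels: int, num_groups: int = 4):
        super().__init__()
        # Stand-in for an "Sformer group" (ETransformer layers + conv + residual).
        self.groups = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(num_groups)
        ])
        # 1x1 convs fuse the concatenation of all earlier outputs (dense skips).
        self.fuse = nn.ModuleList([
            nn.Conv2d(channels * (i + 1), channels, 1) for i in range(num_groups)
        ])

    def forward(self, x):
        feats = [x]
        for group, fuse in zip(self.groups, self.fuse):
            merged = fuse(torch.cat(feats, dim=1))   # reuse features of all earlier groups
            feats.append(group(merged) + merged)     # local residual connection
        return feats[-1]

if __name__ == "__main__":
    y = DenseGroups(channels=64)(torch.randn(1, 64, 32, 32))
    print(y.shape)  # torch.Size([1, 64, 32, 32])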
2

Zhang, Qian, Yeqi Liu, Chuanyang Gong, Yingyi Chen, and Huihui Yu. "Applications of Deep Learning for Dense Scenes Analysis in Agriculture: A Review." Sensors 20, no. 5 (March 10, 2020): 1520. http://dx.doi.org/10.3390/s20051520.

Abstract:
Deep Learning (DL) is a state-of-the-art machine learning technology that shows superior performance in computer vision, bioinformatics, natural language processing, and other areas. Especially as a modern image processing technology, DL has been successfully applied in various tasks, such as object detection, semantic segmentation, and scene analysis. However, as dense scenes become increasingly common in reality, their analysis becomes particularly challenging due to severe occlusions and the small size of objects. To overcome these problems, DL has recently been applied increasingly to dense scenes and has begun to be used in dense agricultural scenes. The purpose of this review is to explore the applications of DL for dense scene analysis in agriculture. To better elaborate on the topic, we first describe the types of dense scenes in agriculture, as well as the challenges. Next, we introduce various popular deep neural networks used in these dense scenes. Then, the applications of these structures in various agricultural tasks are comprehensively introduced in this review, including recognition and classification, detection, counting, and yield estimation. Finally, the surveyed DL applications, their limitations, and future work for the analysis of dense images in agriculture are summarized.
3

Gan, Zhe, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, and Zicheng Liu. "Playing Lottery Tickets with Vision and Language." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 652–60. http://dx.doi.org/10.1609/aaai.v36i1.19945.

Abstract:
Large-scale pre-training has recently revolutionized vision-and-language (VL) research. Models such as LXMERT and UNITER have significantly lifted the state of the art over a wide range of VL tasks. However, the large number of parameters in such models hinders their application in practice. In parallel, work on the lottery ticket hypothesis (LTH) has shown that deep neural networks contain small matching subnetworks that can achieve on par or even better performance than the dense networks when trained in isolation. In this work, we perform the first empirical study to assess whether such trainable subnetworks also exist in pre-trained VL models. We use UNITER as the main testbed (also test on LXMERT and ViLT), and consolidate 7 representative VL tasks for experiments, including visual question answering, visual commonsense reasoning, visual entailment, referring expression comprehension, image-text retrieval, GQA, and NLVR2. Through comprehensive analysis, we summarize our main findings as follows. (i) It is difficult to find subnetworks that strictly match the performance of the full model. However, we can find relaxed winning tickets at 50%-70% sparsity that maintain 99% of the full accuracy. (ii) Subnetworks found by task-specific pruning transfer reasonably well to the other tasks, while those found on the pre-training tasks at 60%/70% sparsity transfer universally, matching 98%/96% of the full accuracy on average over all the tasks. (iii) Besides UNITER, other models such as LXMERT and ViLT can also play lottery tickets. However, the highest sparsity we can achieve for ViLT is far lower than LXMERT and UNITER (30% vs. 70%). (iv) LTH also remains relevant when using other training methods (e.g., adversarial training).
4

Dinh, My-Tham, Deok-Jai Choi, and Guee-Sang Lee. "DenseTextPVT: Pyramid Vision Transformer with Deep Multi-Scale Feature Refinement Network for Dense Text Detection." Sensors 23, no. 13 (June 25, 2023): 5889. http://dx.doi.org/10.3390/s23135889.

Abstract:
Detecting dense text in scene images is a challenging task due to the high variability, complexity, and overlapping of text areas. To adequately distinguish text instances with high density in scenes, we propose an efficient approach called DenseTextPVT. We first generated high-resolution features at different levels to enable accurate dense text detection, which is essential for dense prediction tasks. Additionally, to enhance the feature representation, we designed the Deep Multi-scale Feature Refinement Network (DMFRN), which effectively detects texts of varying sizes, shapes, and fonts, including small-scale texts. DenseTextPVT, then, is inspired by Pixel Aggregation (PA) similarity vector algorithms to cluster text pixels into correct text kernels in the post-processing step. In this way, our proposed method enhances the precision of text detection and effectively reduces overlapping between text regions under dense adjacent text in natural images. The comprehensive experiments indicate the effectiveness of our method on the TotalText, CTW1500, and ICDAR-2015 benchmark datasets in comparison to existing methods.
5

Pan, Zizheng, Bohan Zhuang, Haoyu He, Jing Liu, and Jianfei Cai. "Less Is More: Pay Less Attention in Vision Transformers." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2035–43. http://dx.doi.org/10.1609/aaai.v36i2.20099.

Abstract:
Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works can be prohibitively expensive due to the quadratic complexity of self-attention over a long sequence of representations, especially for high-resolution dense prediction tasks. To this end, we present a novel Less attention vIsion Transformer (LIT), building upon the fact that the early self-attention layers in Transformers still focus on local patterns and bring minor benefits in recent hierarchical vision Transformers. Specifically, we propose a hierarchical Transformer where we use pure multi-layer perceptrons (MLPs) to encode rich local patterns in the early stages while applying self-attention modules to capture longer dependencies in deeper layers. Moreover, we further propose a learned deformable token merging module to adaptively fuse informative patches in a non-uniform manner. The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation, serving as a strong backbone for many vision tasks. Code is available at https://github.com/zip-group/LIT.
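A minimal sketch of the stated design, assuming PyTorch: cheap MLP blocks handle the early stages and self-attention appears only in later blocks. The block names, dimensions, and the absence of downsampling and deformable token merging between stages are simplifications, not the LIT implementation.

# Sketch of the "pay less attention" idea: MLP-only blocks early, attention later.
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                                 nn.GELU(), nn.Linear(4 * dim, dim))
    def forward(self, x):                 # x: (B, N, dim)
        return x + self.net(x)

class AttnBlock(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
    def forward(self, x):
        h = self.norm(x)
        return x + self.attn(h, h, h, need_weights=False)[0]

# Early stages use cheap MLP blocks; deeper stages use self-attention.
stages = nn.Sequential(MLPBlock(64), MLPBlock(64), AttnBlock(64), AttnBlock(64))
tokens = torch.randn(2, 196, 64)          # 14x14 patch tokens, dim 64
print(stages(tokens).shape)               # torch.Size([2, 196, 64])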
6

CASCO, CLARA, GIANLUCA CAMPANA, ALBA GRIECO, SILVANA MUSETTI, and SALVATORE PERRONE. "Hyper-vision in a patient with central and paracentral vision loss reflects cortical reorganization." Visual Neuroscience 20, no. 5 (September 2003): 501–10. http://dx.doi.org/10.1017/s0952523803205046.

Abstract:
SM, a 21-year-old female, presents an extensive central scotoma (30 deg) with dense absolute scotoma (visual acuity = 10/100) in the macular area (10 deg) due to Stargardt's disease. We provide behavioral evidence of cortical plastic reorganization since the patient could perform several visual tasks with her poor-vision eyes better than controls, although high spatial frequency sensitivity and visual acuity are severely impaired. Between 2.5-deg and 12-deg eccentricities, SM presented (1) normal acuity for crowded letters, provided stimulus size is above acuity thresholds for single letters; (2) a two-fold sensitivity increase (d-prime) with respect to controls in a simple search task; and (3) largely above-threshold performance in a lexical decision task carried out randomly by controls. SM's hyper-vision may reflect a long-term sensory gain specific for unimpaired low spatial-frequency mechanisms, which may result from modifications in response properties due to practice-dependent changes in excitatory/inhibitory intracortical connections.
7

Zhang, Xu, DeZhi Han, and Chin-Chen Chang. "RDMMFET: Representation of Dense Multimodality Fusion Encoder Based on Transformer." Mobile Information Systems 2021 (October 18, 2021): 1–9. http://dx.doi.org/10.1155/2021/2662064.

Abstract:
Visual question answering (VQA) is natural language question answering about visual images. A VQA model must produce answers to specific questions based on an understanding of the image, and the most important requirement is to understand the relationship between images and language. Therefore, this paper proposes a new model, the Representation of Dense Multimodality Fusion Encoder Based on Transformer (RDMMFET for short), which can learn the related knowledge between vision and language. The RDMMFET model consists of three parts: a dense language encoder, an image encoder, and a multimodality fusion encoder. In addition, we designed three types of pretraining tasks: a masked language model, a masked image model, and a multimodality fusion task. These pretraining tasks help the model understand the fine-grained alignment between text and image regions. Simulation results on the VQA v2.0 data set show that the RDMMFET model works better than previous models. Finally, we conducted detailed ablation studies on the RDMMFET model and provided the results of attention visualization, which show that the RDMMFET model can significantly improve the effectiveness of VQA.
8

Li, Bin, Haifeng Ye, Sihan Fu, Xiaojin Gong, and Zhiyu Xiang. "UnVELO: Unsupervised Vision-Enhanced LiDAR Odometry with Online Correction." Sensors 23, no. 8 (April 13, 2023): 3967. http://dx.doi.org/10.3390/s23083967.

Abstract:
Due to the complementary characteristics of visual and LiDAR information, these two modalities have been fused to facilitate many vision tasks. However, current studies of learning-based odometries mainly focus on either the visual or LiDAR modality, leaving visual–LiDAR odometries (VLOs) under-explored. This work proposes a new method to implement an unsupervised VLO, which adopts a LiDAR-dominant scheme to fuse the two modalities. We, therefore, refer to it as unsupervised vision-enhanced LiDAR odometry (UnVELO). It converts 3D LiDAR points into a dense vertex map via spherical projection and generates a vertex color map by colorizing each vertex with visual information. Further, a point-to-plane distance-based geometric loss and a photometric-error-based visual loss are, respectively, placed on locally planar regions and cluttered regions. Last, but not least, we designed an online pose-correction module to refine the pose predicted by the trained UnVELO during test time. In contrast to the vision-dominant fusion scheme adopted in most previous VLOs, our LiDAR-dominant method adopts the dense representations for both modalities, which facilitates the visual–LiDAR fusion. Besides, our method uses the accurate LiDAR measurements instead of the predicted noisy dense depth maps, which significantly improves the robustness to illumination variations, as well as the efficiency of the online pose correction. The experiments on the KITTI and DSEC datasets showed that our method outperformed previous two-frame-based learning methods. It was also competitive with hybrid methods that integrate a global optimization on multiple or all frames.
9

Liang, Junling, Heng Li, Fei Xu, Jianpin Chen, Meixuan Zhou, Liping Yin, Zhenzhen Zhai, and Xinyu Chai. "A Fast Deployable Instance Elimination Segmentation Algorithm Based on Watershed Transform for Dense Cereal Grain Images." Agriculture 12, no. 9 (September 16, 2022): 1486. http://dx.doi.org/10.3390/agriculture12091486.

Abstract:
Cereal grains are a vital part of the human diet. The appearance quality and size distribution of cereal grains play major roles as deciders or indicators of market acceptability, storage stability, and breeding. Computer vision is popular in completing quality assessment and size analysis tasks, in which an accurate instance segmentation is a key step to guaranteeing the smooth completion of tasks. This study proposes a fast deployable instance segmentation method based on a generative marker-based watershed segmentation algorithm, which combines two strategies (one strategy for optimizing kernel areas and another for comprehensive segmentation) to overcome the problems of over-segmentation and under-segmentation for images with dense and small targets. Results show that the average segmentation accuracy of our method reaches 98.73%, which is significantly higher than the marker-based watershed segmentation algorithm (82.98%). To further verify the engineering practicality of our method, we count the size distribution of segmented cereal grains. The results keep a high degree of consistency with the manually sketched ground truth. Moreover, our proposed algorithm framework can be used as a great reference in other segmentation tasks of dense targets.
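For reference, a generic marker-based watershed pipeline in OpenCV is sketched below; the input file name is hypothetical, and the paper's kernel-area optimization and comprehensive-segmentation strategies are not reproduced.

# Generic marker-based watershed for touching, grain-like objects (OpenCV baseline).
import cv2
import numpy as np

img = cv2.imread("grains.png")                      # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Sure foreground from the distance transform (one kernel area per grain).
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, cv2.THRESH_BINARY)
sure_fg = sure_fg.astype(np.uint8)
sure_bg = cv2.dilate(binary, np.ones((3, 3), np.uint8), iterations=3)
unknown = cv2.subtract(sure_bg, sure_fg)

# Label the markers and let watershed grow them to the grain boundaries.
_, markers = cv2.connectedComponents(sure_fg)
markers = markers + 1
markers[unknown == 255] = 0
markers = cv2.watershed(img, markers)               # boundaries are marked with -1
print("instances:", markers.max() - 1)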
10

Wang, Yaming, Minjie Wang, Wenqing Huang, Xiaoping Ye, and Mingfeng Jiang. "Deep Spatial-Temporal Neural Network for Dense Non-Rigid Structure from Motion." Mathematics 10, no. 20 (October 14, 2022): 3794. http://dx.doi.org/10.3390/math10203794.

Abstract:
Dense non-rigid structure from motion (NRSfM) has long been a challenge in computer vision because of the vast number of feature points. As neural networks develop rapidly, a novel solution is emerging. However, existing methods ignore the significance of spatial–temporal data and the strong capacity of neural networks for learning. This study proposes a deep spatial–temporal NRSfM framework (DST-NRSfM) and introduces a weighted spatial constraint to further optimize the 3D reconstruction results. Layer normalization layers are applied in dense NRSfM tasks to stop gradient disappearance and hasten neural network convergence. Our DST-NRSfM framework outperforms both classical approaches and recent advancements. It achieves state-of-the-art performance across commonly used synthetic and real benchmark datasets.
11

Wei, Guoqiang, Zhizheng Zhang, Cuiling Lan, Yan Lu, and Zhibo Chen. "Active Token Mixer." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 3 (June 26, 2023): 2759–67. http://dx.doi.org/10.1609/aaai.v37i3.25376.

Abstract:
The three existing dominant network families, i.e., CNNs, Transformers and MLPs, differ from each other mainly in the ways of fusing spatial contextual information, leaving designing more effective token-mixing mechanisms at the core of backbone architecture development. In this work, we propose an innovative token-mixer, dubbed Active Token Mixer (ATM), to actively incorporate contextual information from other tokens in the global scope into the given query token. This fundamental operator actively predicts where to capture useful contexts and learns how to fuse the captured contexts with the query token at channel level. In this way, the spatial range of token-mixing can be expanded to a global scope with limited computational complexity, where the way of token-mixing is reformed. We take ATMs as the primary operators and assemble them into a cascade architecture, dubbed ATMNet. Extensive experiments demonstrate that ATMNet is generally applicable and comprehensively surpasses different families of SOTA vision backbones by a clear margin on a broad range of vision tasks, including visual recognition and dense prediction tasks. Code is available at https://github.com/microsoft/ActiveMLP.
12

Tippetts, Beau, Dah Jye Lee, Kirt Lillywhite, and James K. Archibald. "Hardware-Efficient Design of Real-Time Profile Shape Matching Stereo Vision Algorithm on FPGA." International Journal of Reconfigurable Computing 2014 (2014): 1–12. http://dx.doi.org/10.1155/2014/945926.

Abstract:
A variety of platforms, such as micro-unmanned vehicles, are limited in the amount of computational hardware they can support due to weight and power constraints. An efficient stereo vision algorithm implemented on an FPGA would be able to minimize payload and power consumption in micro-unmanned vehicles, while providing 3D information and still leaving computational resources available for other processing tasks. This work presents a hardware design of the efficient profile shape matching stereo vision algorithm. Hardware resource usage is presented for the targeted micro-UV platform, Helio-copter, that uses the Xilinx Virtex 4 FX60 FPGA. Less than a fifth of the resources on this FPGA were used to produce dense disparity maps for image sizes up to 450 × 375, with the ability to scale up easily by increasing BRAM usage. A comparison is given with the accuracy, speed performance, and resource usage of a census transform-based stereo vision FPGA implementation by Jin et al. Results show that the profile shape matching algorithm is an efficient real-time stereo vision algorithm for hardware implementation on resource-limited systems such as micro-unmanned vehicles.
13

Xing, Shuli, Marely Lee, and Keun-kwang Lee. "Citrus Pests and Diseases Recognition Model Using Weakly Dense Connected Convolution Network." Sensors 19, no. 14 (July 19, 2019): 3195. http://dx.doi.org/10.3390/s19143195.

Abstract:
Pests and diseases can cause severe damage to citrus fruits. Farmers used to rely on experienced experts to recognize them, which is a time-consuming and costly process. With the popularity of image sensors and the development of computer vision technology, using convolutional neural network (CNN) models to identify pests and diseases has become a recent trend in the field of agriculture. However, many researchers use models pre-trained on ImageNet for different recognition tasks without considering the scale of their own datasets, resulting in a waste of computational resources. In this paper, a simple but effective CNN model was developed based on our image dataset. The proposed network was designed with parameter efficiency in mind. To achieve this goal, the complexity of the cross-channel operations was increased and the frequency of feature reuse was adapted to the network depth. Experimental results showed that Weakly DenseNet-16 achieved the highest classification accuracy with fewer parameters. Because this network is lightweight, it can be used on mobile devices.
14

Hackel, T., N. Savinov, L. Ladicky, J. D. Wegner, K. Schindler, and M. Pollefeys. "SEMANTIC3D.NET: A NEW LARGE-SCALE POINT CLOUD CLASSIFICATION BENCHMARK." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-1/W1 (May 30, 2017): 91–98. http://dx.doi.org/10.5194/isprs-annals-iv-1-w1-91-2017.

Abstract:
This paper presents a new 3D point cloud classification benchmark data set with over four billion manually labelled points, meant as input for data-hungry (deep) learning methods. We also discuss first submissions to the benchmark that use deep convolutional neural networks (CNNs) as a workhorse, which already show remarkable performance improvements over the state of the art. CNNs have become the de-facto standard for many tasks in computer vision and machine learning, like semantic segmentation or object detection in images, but have not yet led to a true breakthrough for 3D point cloud labelling tasks due to the lack of training data. With the massive data set presented in this paper, we aim at closing this data gap to help unleash the full potential of deep learning methods for 3D labelling tasks. Our semantic3D.net data set consists of dense point clouds acquired with static terrestrial laser scanners. It contains 8 semantic classes and covers a wide range of urban outdoor scenes: churches, streets, railroad tracks, squares, villages, soccer fields and castles. We describe our labelling interface and show that our data set provides denser and more complete point clouds, with a much higher overall number of labelled points, compared to those already available to the research community. We further provide baseline method descriptions and a comparison between methods submitted to our online system. We hope semantic3D.net will pave the way for deep learning methods in 3D point cloud labelling to learn richer, more general 3D representations, and first submissions after only a few months indicate that this might indeed be the case.
15

Cai, Pingping, Zhenyao Wu, Xinyi Wu, and Song Wang. "Parametric Surface Constrained Upsampler Network for Point Cloud." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 1 (June 26, 2023): 250–58. http://dx.doi.org/10.1609/aaai.v37i1.25097.

Abstract:
Designing a point cloud upsampler, which aims to generate a clean and dense point cloud given a sparse point representation, is a fundamental and challenging problem in computer vision. A line of attempts achieves this goal by establishing a point-to-point mapping function via deep neural networks. However, these approaches are prone to produce outlier points due to the lack of explicit surface-level constraints. To solve this problem, we introduce a novel surface regularizer into the upsampler network by forcing the neural network to learn the underlying parametric surface represented by bicubic functions and rotation functions, where the newly generated points are then constrained on the underlying surface. These designs are integrated into two different networks for two tasks that take advantage of upsampling layers -- point cloud upsampling and point cloud completion -- for evaluation. The state-of-the-art experimental results on both tasks demonstrate the effectiveness of the proposed method. The implementation code will be available at https://github.com/corecai163/PSCU.
16

S R, Sreela, and Sumam Mary Idicula. "Dense Model for Automatic Image Description Generation with Game Theoretic Optimization." Information 10, no. 11 (November 15, 2019): 354. http://dx.doi.org/10.3390/info10110354.

Abstract:
Due to the rapid growth of deep learning technologies, automatic image description generation is an interesting problem in computer vision and natural language generation. It helps to improve access to photo collections on social media and gives guidance for visually impaired people. Currently, deep neural networks play a vital role in computer vision and natural language processing tasks. The main objective of the work is to generate a grammatically correct description of the image using the semantics of the trained captions. An encoder-decoder framework using a deep neural system is used to implement the image description generation task. The encoder is an image parsing module, and the decoder is a surface realization module. The framework uses Densely connected convolutional neural networks (Densenet) for image encoding and Bidirectional Long Short Term Memory (BLSTM) for language modeling, and the outputs are given to a bidirectional LSTM in the caption generator, which is trained to optimize the log-likelihood of the target description of the image. Most existing image captioning works use RNN and LSTM for language modeling. RNNs are computationally expensive with limited memory. LSTM processes the inputs in only one direction. BLSTM, which avoids these problems of RNN and LSTM, is used in practice. In this work, the selection of the best combination of words in caption generation is made using beam search and game-theoretic search. The results show that the game-theoretic search outperforms beam search. The model was evaluated with the standard benchmark dataset Flickr8k. The Bilingual Evaluation Understudy (BLEU) score is taken as the evaluation measure of the system. A new evaluation measure called GCorrect was used to check the grammatical correctness of the description. The performance of the proposed model achieves greater improvements over previous methods on the Flickr8k dataset. The proposed model produces grammatically correct sentences for images with a GCorrect of 0.040625 and a BLEU score of 69.96%.
17

Cho, Hoonhee, and Kuk-Jin Yoon. "Event-Image Fusion Stereo Using Cross-Modality Feature Propagation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 454–62. http://dx.doi.org/10.1609/aaai.v36i1.19923.

Abstract:
Event cameras asynchronously output the polarity values of pixel-level log intensity alterations. They are robust against motion blur and can be adopted in challenging light conditions. Owing to these advantages, event cameras have been employed in various vision tasks such as depth estimation, visual odometry, and object detection. In particular, event cameras are effective in stereo depth estimation to find correspondence points between two cameras under challenging illumination conditions and/or fast motion. However, because event cameras provide spatially sparse event stream data, it is difficult to obtain a dense disparity map. Although it is possible to estimate disparity from event data at the edge of a structure where intensity changes are likely to occur, estimating the disparity in a region where event occurs rarely is challenging. In this study, we propose a deep network that combines the features of an image with the features of an event to generate a dense disparity map. The proposed network uses images to obtain spatially dense features that are lacking in events. In addition, we propose a spatial multi-scale correlation between two fused feature maps for an accurate disparity map. To validate our method, we conducted experiments using synthetic and real-world datasets.
18

Zhou, Wei, Ziheng Qian, Xinyuan Ni, Yujun Tang, Hanming Guo, and Songlin Zhuang. "Dense Convolutional Neural Network for Identification of Raman Spectra." Sensors 23, no. 17 (August 25, 2023): 7433. http://dx.doi.org/10.3390/s23177433.

Abstract:
The rapid development of cloud computing and deep learning has made intelligent modes of application widespread in various fields. The identification of Raman spectra can be realized in the cloud, thanks to its powerful computing, abundant spectral databases and advanced algorithms, which reduces the dependence on the performance of terminal instruments. However, the complexity of the detection environment can cause strong interference, which might significantly decrease the identification accuracy of algorithms. In this paper, a deep learning algorithm based on the Dense network is proposed to realize this vision. The proposed Dense convolutional neural network has a very deep structure of over 40 layers and plenty of parameters to adjust the weights of different wavebands. In the core Dense blocks of the network, each layer is connected to every other layer in a feed-forward fashion. This alleviates the gradient vanishing or explosion problems, strengthens feature propagation, encourages feature reuse and enhances training efficiency. The network's special architecture mitigates noise interference and ensures precise identification. The Dense network shows higher accuracy and robustness than other CNN-based algorithms. We set up a database of 1600 Raman spectra consisting of 32 different types of liquid chemicals, detected in different postures to serve as examples of interfered Raman spectra. Over 50 repeated training and testing sets, the Dense network achieves a weighted accuracy of 99.99%. We have also tested the RRUFF database, on which the Dense network performs well. The proposed approach advances cloud-enabled Raman spectra identification, offering improved accuracy and adaptability for diverse identification tasks.
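The layer-to-every-other-layer connectivity mentioned above can be illustrated with a small 1-D dense block in PyTorch. This is a sketch only; the channel counts, growth rate, and depth are arbitrary and far shallower than the paper's 40+ layer network.

# Minimal 1-D dense block in the DenseNet spirit: each layer receives the
# concatenated outputs of all earlier layers (feature reuse).
import torch
import torch.nn as nn

class DenseLayer1d(nn.Module):
    def __init__(self, in_ch, growth=12):
        super().__init__()
        self.conv = nn.Sequential(nn.BatchNorm1d(in_ch), nn.ReLU(inplace=True),
                                  nn.Conv1d(in_ch, growth, kernel_size=5, padding=2))
    def forward(self, x):
        return torch.cat([x, self.conv(x)], dim=1)   # keep earlier features

dense_block = nn.Sequential(*[DenseLayer1d(8 + 12 * i) for i in range(4)])
spectrum = torch.randn(2, 8, 1024)        # batch of 2 spectra, 8 channels, 1024 bins
print(dense_block(spectrum).shape)        # torch.Size([2, 56, 1024])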
19

Karadeniz, Ahmet Serdar, Mehmet Fatih Karadeniz, Gerhard Wilhelm Weber, and Ismail Husein. "IMPROVING CNN FEATURES FOR FACIAL EXPRESSION RECOGNITION." ZERO: Jurnal Sains, Matematika dan Terapan 3, no. 1 (November 11, 2019): 1. http://dx.doi.org/10.30829/zero.v3i1.5881.

Abstract:
<span class="fontstyle0">Abstract </span><span class="fontstyle2">Facial expression recognition is one of the challenging tasks in computer<br />vision. In this paper, we analyzed and improved the performances both<br />handcrafted features and deep features extracted by Convolutional Neural<br />Network (CNN). Eigenfaces, HOG, Dense-SIFT were used as handcrafted features.<br />Additionally, we developed features based on the distances between facial<br />landmarks and SIFT descriptors around the centroids of the facial landmarks,<br />leading to a better performance than Dense-SIFT. We achieved 68.34 % accuracy<br />with a CNN model trained from scratch. By combining CNN features with<br />handcrafted features, we achieved 69.54 % test accuracy.<br /></span><span class="fontstyle0">Key Word</span><span class="fontstyle3">: </span><span class="fontstyle2">Neural network, facial expression recognition, handcrafted features</span> <br /><br />
20

Ley, Pia, Davide Bottari, Bhamy Hariprasad Shenoy, Ramesh Kekunnaya, and Brigitte Roeder. "Restricted recovery of external remapping of tactile stimuli after restoring vision in a congenitally blind man." Seeing and Perceiving 25 (2012): 190. http://dx.doi.org/10.1163/187847612x648198.

Abstract:
People with surgically removed congenital dense bilateral cataracts offer a natural model of visual deprivation and reafferentation in humans to investigate sensitive periods of multisensory development, for example regarding the recruitment of external or anatomical frames of reference for spatial representation. Here we present a single case (HS; male; 33 years; right-handed), born with congenital dense bilateral cataracts. His lenses were removed at the age of two years, but he received optical aids only at age six. At time of testing, his visual acuity was 30% in the best eye. We performed two tasks, a tactile temporal order judgment task (TOJ) in which two tactile stimuli were presented successively to the index fingers located in the two hemifields, adopting a crossed and uncrossed hand posture. The participant judged as precisely as possible which side was stimulated first. Moreover, we used a crossmodal-congruency task in which a tactile stimulus and an irrelevant visual distracter were presented simultaneously but independently to one of four positions. The participant judged the location (index or thumb) of the tactile stimulus with hands crossed or uncrossed. Speed was emphasized. In contrast to sighted controls, HS did not show a decrement of TOJ performance with hands crossed. Moreover, while the congruency gain was equivalent to sighted controls with uncrossed hands, this effect was significantly reduced with hands crossed. Thus, an external remapping of tactile stimuli still develops after a long phase of visual deprivation. However, remapping seems to be less efficient and to only take place in the context of visual stimuli.
21

Park, Soya, Jonathan Bragg, Michael Chang, Kevin Larson, and Danielle Bragg. "Exploring Team-Sourced Hyperlinks to Address Navigation Challenges for Low-Vision Readers of Scientific Papers." Proceedings of the ACM on Human-Computer Interaction 6, CSCW2 (November 7, 2022): 1–23. http://dx.doi.org/10.1145/3555629.

Abstract:
Reading academic papers is a fundamental part of higher education and research, but navigating these information-dense texts can be challenging. In particular, low-vision readers using magnification encounter additional barriers to quickly skimming and visually locating information. In this work, we explored the design of interfaces to enable readers to: 1) navigate papers more easily, and 2) input the required navigation hooks that AI cannot currently automate. To explore this design space, we ran two exploratory studies. The first focused on current practices of low-vision paper readers, the challenges they encounter, and the interfaces they desire. During this study, low-vision participants were interviewed, and tried out four new paper navigation prototypes. Results from this study grounded the design of our end-to-end system prototype Ocean, which provides an accessible front-end for low-vision readers, and enables all readers to contribute to the backend by leaving traces of their reading paths for others to leverage. Our second study used this exploratory interface in a field study with groups of low-vision and sighted readers to probe the user experience of reading and creating traces. Our findings suggest that it may be possible for readers of all abilities to organically leave traces in papers, and that these traces can be used to facilitate navigation tasks, in particular for low-vision readers. Based on our findings, we present design considerations for creating future paper-reading tools that improve access, and organically source the required data from readers.
22

Wei, Shuangfeng, Shangxing Wang, Hao Li, Guangzu Liu, Tong Yang, and Changchang Liu. "A Semantic Information-Based Optimized vSLAM in Indoor Dynamic Environments." Applied Sciences 13, no. 15 (July 29, 2023): 8790. http://dx.doi.org/10.3390/app13158790.

Abstract:
In unknown environments, mobile robots can use visual-based Simultaneous Localization and Mapping (vSLAM) to complete positioning tasks while building sparse feature maps and dense maps. However, the traditional vSLAM works in the hypothetical environment of static scenes and rarely considers the dynamic objects existing in the actual scenes. In addition, it is difficult for the robot to perform high-level semantic tasks due to its inability to obtain semantic information from sparse feature maps and dense maps. In order to improve the ability of environment perception and accuracy of mapping for mobile robots in dynamic indoor environments, we propose a semantic information-based optimized vSLAM algorithm. The optimized vSLAM algorithm adds the modules of dynamic region detection and semantic segmentation to ORB-SLAM2. First, a dynamic region detection module is added to the vision odometry. The dynamic region of the image is detected by combining single response matrix and dense optical flow method to improve the accuracy of pose estimation in dynamic environment. Secondly, the semantic segmentation of images is implemented based on BiSeNet V2 network. For the over-segmentation problem in semantic segmentation, a region growth algorithm combining depth information is proposed to optimize the 3D segmentation. In the process of map building, semantic information and dynamic regions are used to remove dynamic objects and build an indoor map containing semantic information. The system not only can effectively remove the effect of dynamic objects on the pose estimation, but also use the semantic information of images to build indoor maps containing semantic information. The proposed algorithm is evaluated and analyzed in TUM RGB-D dataset and real dynamic scenes. The results show that the accuracy of our algorithm outperforms that of ORB-SLAM2 and DS-SLAM in dynamic scenarios.
23

Zhao, Yinuo, Kun Wu, Zhiyuan Xu, Zhengping Che, Qi Lu, Jian Tang, and Chi Harold Liu. "CADRE: A Cascade Deep Reinforcement Learning Framework for Vision-Based Autonomous Urban Driving." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 3 (June 28, 2022): 3481–89. http://dx.doi.org/10.1609/aaai.v36i3.20259.

Abstract:
Vision-based autonomous urban driving in dense traffic is quite challenging due to the complicated urban environment and the dynamics of the driving behaviors. Widely-applied methods either heavily rely on hand-crafted rules or learn from limited human experience, which makes them hard to generalize to rare but critical scenarios. In this paper, we present a novel CAscade Deep REinforcement learning framework, CADRE, to achieve model-free vision-based autonomous urban driving. In CADRE, to derive representative latent features from raw observations, we first offline train a Co-attention Perception Module (CoPM) that leverages the co-attention mechanism to learn the inter-relationships between the visual and control information from a pre-collected driving dataset. Cascaded by the frozen CoPM, we then present an efficient distributed proximal policy optimization framework to online learn the driving policy under the guidance of particularly designed reward functions. We perform a comprehensive empirical study with the CARLA NoCrash benchmark as well as specific obstacle avoidance scenarios in autonomous urban driving tasks. The experimental results well justify the effectiveness of CADRE and its superiority over the state-of-the-art by a wide margin.
24

Huang, X., R. Qin, and M. Chen. "DISPARITY REFINEMENT OF BUILDING EDGES USING ROBUSTLY MATCHED STRAIGHT LINES FOR STEREO MATCHING." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-1 (September 26, 2018): 77–84. http://dx.doi.org/10.5194/isprs-annals-iv-1-77-2018.

Abstract:
Stereo dense matching has already become one of the dominant tools in the 3D reconstruction of urban regions, due to its low cost and high flexibility in generating 3D points. However, the image-derived 3D points are often inaccurate around building edges, which limits their use in several vision tasks (e.g. building modelling). To generate 3D point clouds or digital surface models (DSM) with sharp boundaries, this paper integrates robustly matched lines to improve dense matching, and proposes a non-local disparity refinement of building edges through an iterative least squares plane adjustment approach. In our method, we first extract and match straight lines in images using epipolar constraints, then detect building edges from these straight lines by comparing matching results on both sides of the straight lines, and finally we develop a non-local disparity refinement method through an iterative least squares plane adjustment constrained by the matched straight lines to yield sharper and more accurate edges. Experiments conducted on both satellite and aerial data demonstrate that our proposed method is able to generate more accurate DSMs with sharper object boundaries.
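The core operation behind such a refinement is a least-squares plane fit to disparities; a toy NumPy sketch is given below, using synthetic data and omitting the straight-line constraints and iteration described in the paper.

# Toy least-squares fit of a disparity plane d = a*x + b*y + c.
import numpy as np

def fit_disparity_plane(px, py, disp):
    """Fit d = a*x + b*y + c to pixel coordinates and disparities."""
    A = np.column_stack([px, py, np.ones_like(px)])
    coeffs, *_ = np.linalg.lstsq(A, disp, rcond=None)
    return coeffs                                    # (a, b, c)

# Synthetic roof-plane disparities with a little noise.
rng = np.random.default_rng(0)
px, py = rng.uniform(0, 100, 200), rng.uniform(0, 100, 200)
disp = 0.05 * px - 0.02 * py + 30 + rng.normal(0, 0.1, 200)
a, b, c = fit_disparity_plane(px, py, disp)
print(round(a, 3), round(b, 3), round(c, 2))         # ~0.05 -0.02 30.0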
25

Deschaud, Jean-Emmanuel, David Duque, Jean Pierre Richa, Santiago Velasco-Forero, Beatriz Marcotegui, and François Goulette. "Paris-CARLA-3D: A Real and Synthetic Outdoor Point Cloud Dataset for Challenging Tasks in 3D Mapping." Remote Sensing 13, no. 22 (November 21, 2021): 4713. http://dx.doi.org/10.3390/rs13224713.

Abstract:
Paris-CARLA-3D is a dataset of several dense colored point clouds of outdoor environments built by a mobile LiDAR and camera system. The data are composed of two sets with synthetic data from the open source CARLA simulator (700 million points) and real data acquired in the city of Paris (60 million points), hence the name Paris-CARLA-3D. One of the advantages of this dataset is to have simulated the same LiDAR and camera platform in the open source CARLA simulator as the one used to produce the real data. In addition, manual annotation of the classes using the semantic tags of CARLA was performed on the real data, allowing the testing of transfer methods from the synthetic to the real data. The objective of this dataset is to provide a challenging dataset to evaluate and improve methods on difficult vision tasks for the 3D mapping of outdoor environments: semantic segmentation, instance segmentation, and scene completion. For each task, we describe the evaluation protocol as well as the experiments carried out to establish a baseline.
26

Lu, Nai Guang, Ming Li Dong, P. Sun, and J. W. Guo. "A Point Matching Method for Stereovision Measurement." Key Engineering Materials 381-382 (June 2008): 305–8. http://dx.doi.org/10.4028/www.scientific.net/kem.381-382.305.

Abstract:
Many vision tasks such as 3D measurement, scene reconstruction, object recognition, etc., rely on feature correspondence among images. This paper presents a point matching method for 3D surface measurement. The procedure of the method is as follows: (1) rectification for stereo image pairs; (2) computation of epipolar lines; (3) sequential matching in vertical direction; (4) sequential matching in horizontal direction. The fourth step is performed to deal with the ambiguity in dense areas where points have closer vertical coordinates. In the fourth step a threshold limit of vertical coordinate difference is designed to determine those points potential to cause ambiguity. This method was applied to the 3D surface measurement for an inflatable parabolic reflector with validity of point matching up to 100%. Experiment results show that this method is feasible in application of sparse point matching for continuous surface measurements.
27

Li, Yongbo, Yuanyuan Ma, Wendi Cai, Zhongzhao Xie, and Tao Zhao. "Complementary Convolution Residual Networks for Semantic Segmentation in Street Scenes with Deep Gaussian CRF." Journal of Advanced Computational Intelligence and Intelligent Informatics 25, no. 1 (January 20, 2021): 3–12. http://dx.doi.org/10.20965/jaciii.2021.p0003.

Abstract:
To understand surrounding scenes accurately, the semantic segmentation of images is vital in autonomous driving tasks such as navigation and route planning. Currently, convolutional neural networks (CNNs) are widely employed in semantic segmentation to perform precise prediction at the dense pixel level. A recent trend in network design is the stacking of small convolution kernels. In this work, small convolution kernels (3 × 3) are decomposed into complementary convolution kernels (1 × 3 + 3 × 1, 3 × 1 + 1 × 3); these complementary small convolution kernels perform better in the classification and localization tasks of semantic segmentation. Subsequently, a complementary convolution residual network (CCRN) is proposed to improve the speed and accuracy of semantic segmentation. To further locate object edges precisely, a coupled Gaussian conditional random field (G-CRF) is utilized for CCRN post-processing. The proposed approach achieved 81.8% and 73.1% mean Intersection-over-Union (mIoU) on the PASCAL VOC-2012 test set and the Cityscapes test set, respectively.
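A minimal PyTorch sketch of the complementary decomposition (1 × 3 followed by 3 × 1, and 3 × 1 followed by 1 × 3, with the two branches summed) is shown below; it illustrates the kernel decomposition only, not the full CCRN block or its residual structure, and the module name is a placeholder.

# Two asymmetric convolution branches standing in for one 3x3 kernel.
import torch
import torch.nn as nn

class ComplementaryConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch_a = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, 3), padding=(0, 1)),
            nn.Conv2d(out_ch, out_ch, (3, 1), padding=(1, 0)))
        self.branch_b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (3, 1), padding=(1, 0)),
            nn.Conv2d(out_ch, out_ch, (1, 3), padding=(0, 1)))
    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)   # complementary fusion

print(ComplementaryConv(64, 64)(torch.randn(1, 64, 128, 128)).shape)
# torch.Size([1, 64, 128, 128])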
28

Xia, Y., P. d’Angelo, J. Tian, and P. Reinartz. "DENSE MATCHING COMPARISON BETWEEN CLASSICAL AND DEEP LEARNING BASED ALGORITHMS FOR REMOTE SENSING DATA." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B2-2020 (August 12, 2020): 521–25. http://dx.doi.org/10.5194/isprs-archives-xliii-b2-2020-521-2020.

Abstract:
Deep learning and convolutional neural networks (CNNs) have achieved great success in image processing, thanks to their powerful ability to extract features and learn specific tasks. Many deep-learning-based algorithms have been developed for dense image matching, which is a hot topic in the computer vision community. These methods have been tested on close-range or street-view stereo data; however, they are not well studied on remote sensing datasets, including aerial and satellite data. As more high-quality datasets are collected by recent airborne and spaceborne sensors, it is necessary to compare the performance of these algorithms to classical dense matching algorithms on remote sensing data. In this paper, Guided Aggregation Net (GA-Net), which is among the most competitive algorithms on the KITTI 2015 benchmark (a street-view dataset), is tested and compared with Semi-Global Matching (SGM) on satellite and airborne data. GA-Net is an end-to-end neural network that starts from a stereo pair and directly outputs a disparity map indicating the scene's depth information. It is based on a differentiable approximation of SGM embedded into a neural network, performing well in ill-posed regions such as textureless areas, slanted surfaces, etc. The results demonstrate that GA-Net is capable of producing a smoother disparity map with fewer errors, particularly for across-track data acquired on different dates.
29

Liu, Qi, Shibiao Xu, Jun Xiao, and Ying Wang. "Sharp Feature-Preserving 3D Mesh Reconstruction from Point Clouds Based on Primitive Detection." Remote Sensing 15, no. 12 (June 16, 2023): 3155. http://dx.doi.org/10.3390/rs15123155.

Abstract:
High-fidelity mesh reconstruction from point clouds has long been a fundamental research topic in computer vision and computer graphics. Traditional methods require dense triangle meshes to achieve high fidelity, but excessively dense triangles may lead to unnecessary storage and computational burdens, while also struggling to capture clear, sharp, and continuous edges. This paper argues that the key to high-fidelity reconstruction lies in preserving sharp features. Therefore, we introduce a novel sharp-feature-preserving reconstruction framework based on primitive detection. It includes an improved deep-learning-based primitive detection module and two novel mesh splitting and selection modules that we propose. Our framework can accurately and reasonably segment primitive patches, fit meshes in each patch, and split overlapping meshes at the triangle level to ensure true sharpness while obtaining lightweight mesh models. Quantitative and visual experimental results demonstrate that our framework outperforms both the state-of-the-art learning-based primitive detection methods and traditional reconstruction methods. Moreover, our designed modules are plug-and-play, which not only apply to learning-based primitive detectors but also can be combined with other point cloud processing tasks such as edge extraction or random sample consensus (RANSAC) to achieve high-fidelity results.
30

Bello, R. W., E. S. Ikeremo, F. N. Otobo, D. A. Olubummo, and O. C. Enuma. "Cattle Segmentation and Contour Detection Based on Solo for Precision Livestock Husbandry." Journal of Applied Sciences and Environmental Management 26, no. 10 (October 31, 2022): 1713–20. http://dx.doi.org/10.4314/jasem.v26i10.15.

Abstract:
Segmenting objects such as a herd of cattle in natural, cluttered images is among the herculean dense prediction tasks in the application of computer vision to agriculture. To achieve the segmentation goal, we based the segmentation on the Segmenting Objects by Locations (SOLO) model, which is capable of exploiting contextual cues and segmenting individual cattle by their locations and sizes. Owing to its simple approach to instance segmentation with the use of instance categories, SOLO outperforms Mask R-CNN, which uses a detect-then-segment approach to predict a mask for each instance of cattle. The model is trained using synchronized stochastic gradient descent (SGD) on a GPU to achieve a mAP of 0.94, which is 0.02 higher than the result recorded by the Mask R-CNN model. By using the focal loss, the proposed approach achieved 32.23 ADE on cattle contour detection, outperforming Mask R-CNN.
31

Hu, Shiyong, Jia Yan, and Dexiang Deng. "Contextual Information Aided Generative Adversarial Network for Low-Light Image Enhancement." Electronics 11, no. 1 (December 23, 2021): 32. http://dx.doi.org/10.3390/electronics11010032.

Abstract:
Low-light image enhancement has been gradually becoming a hot research topic in recent years due to its wide usage as an important pre-processing step in computer vision tasks. Although numerous methods have achieved promising results, some of them still generate results with detail loss and local distortion. In this paper, we propose an improved generative adversarial network based on contextual information. Specifically, residual dense blocks are adopted in the generator to promote hierarchical feature interaction across multiple layers and enhance features at multiple depths in the network. Then, an attention module integrating multi-scale contextual information is introduced to refine and highlight discriminative features. A hybrid loss function containing perceptual and color component is utilized in the training phase to ensure the overall visual quality. Qualitative and quantitative experimental results on several benchmark datasets demonstrate that our model achieves relatively good results and has good generalization capacity compared to other state-of-the-art low-light enhancement algorithms.
32

Wieczorek, Grzegorz, Sheikh Badar ud din Tahir, Israr Akhter, and Jaroslaw Kurek. "Vehicle Detection and Recognition Approach in Multi-Scale Traffic Monitoring System via Graph-Based Data Optimization." Sensors 23, no. 3 (February 3, 2023): 1731. http://dx.doi.org/10.3390/s23031731.

Abstract:
Over the past few years, significant investments in smart traffic monitoring systems have been made. The most important step in machine learning is detecting and recognizing objects relative to vehicles. Due to variations in vision and different lighting conditions, the recognition and tracking of vehicles under varying extreme conditions has become one of the most challenging tasks. To deal with this, our proposed system presents an adaptive method for robustly recognizing several existing automobiles in dense traffic settings. Additionally, this research presents a broad framework for effective on-road vehicle recognition and detection. Furthermore, the proposed system focuses on challenges typically noticed in analyzing traffic scenes captured by in-vehicle cameras, such as consistent extraction of features. First, we performed frame conversion, background subtraction, and object shape optimization as preprocessing steps. Next, two important features (energy and deep optical flow) were extracted. The incorporation of energy and dense optical flow features in distance-adaptive window areas and subsequent processing over the fused features resulted in a greater capacity for discrimination. Next, a graph-mining-based approach was applied to select optimal features. Finally, the artificial neural network was adopted for detection and classification. The experimental results show significant performance in two benchmark datasets, including the LISA and KITTI 7 databases. The LISA dataset achieved a mean recognition rate of 93.75% on the LDB1 and LDB2 databases, whereas KITTI attained 82.85% accuracy on separate training of ANN.
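Dense optical flow of the kind used as a feature above can be computed with OpenCV's Farneback method; the sketch below uses placeholder frame file names and a simple cell-averaged magnitude descriptor, not the paper's energy features or graph-based feature selection.

# Dense per-pixel optical flow between two consecutive frames (Farneback).
import cv2
import numpy as np

prev = cv2.cvtColor(cv2.imread("frame_000.png"), cv2.COLOR_BGR2GRAY)   # placeholder files
curr = cv2.cvtColor(cv2.imread("frame_001.png"), cv2.COLOR_BGR2GRAY)

flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])

# Simple per-window motion descriptor: mean flow magnitude over 16x16 cells.
h, w = magnitude.shape
cells = magnitude[: h // 16 * 16, : w // 16 * 16].reshape(h // 16, 16, w // 16, 16)
descriptor = cells.mean(axis=(1, 3)).ravel()
print(descriptor.shape)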
33

Pérez, Javier, Mitch Bryson, Stefan B. Williams, and Pedro J. Sanz. "Recovering Depth from Still Images for Underwater Dehazing Using Deep Learning." Sensors 20, no. 16 (August 15, 2020): 4580. http://dx.doi.org/10.3390/s20164580.

Abstract:
Estimating depth from a single image is a challenging problem, but it is also interesting due to the large amount of applications, such as underwater image dehazing. In this paper, a new perspective is provided; by taking advantage of the underwater haze that may provide a strong cue to the depth of the scene, a neural network can be used to estimate it. Using this approach the depthmap can be used in a dehazing method to enhance the image and recover original colors, offering a better input to image recognition algorithms and, thus, improving the robot performance during vision-based tasks such as object detection and characterization of the seafloor. Experiments are conducted on different datasets that cover a wide variety of textures and conditions, while using a dense stereo depthmap as ground truth for training, validation and testing. The results show that the neural network outperforms other alternatives, such as the dark channel prior methods and it is able to accurately estimate depth from a single image after a training stage with depth information.
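For comparison, the classical dark channel prior cue mentioned above can be computed in a few lines; this is a generic sketch assuming an RGB image scaled to [0, 1], not the paper's neural network.

# Dark channel prior: per-pixel minimum over color channels and a local patch.
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image_rgb: np.ndarray, patch: int = 15) -> np.ndarray:
    """Darkest value over the three channels within each local patch."""
    min_rgb = image_rgb.min(axis=2)                  # darkest channel per pixel
    return minimum_filter(min_rgb, size=patch)       # darkest value in each patch

# Haze thickness (and hence a rough depth cue) grows with the dark channel value.
rng = np.random.default_rng(1)
hazy = rng.uniform(0.2, 1.0, size=(120, 160, 3))     # synthetic stand-in image
print(dark_channel(hazy).shape)                      # (120, 160)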
34

Zhang, Dongdong, Chunping Wang, and Qiang Fu. "CAFC-Net: A Critical and Align Feature Constructing Network for Oriented Ship Detection in Aerial Images." Computational Intelligence and Neuroscience 2022 (February 24, 2022): 1–11. http://dx.doi.org/10.1155/2022/3391391.

Abstract:
Ship detection is one of the fundamental tasks in computer vision. In recent years, methods based on convolutional neural networks have made great progress. However, the improvement of ship detection in aerial images is limited by large-scale variation, aspect ratio, and dense distribution. In this paper, a Critical and Align Feature Constructing Network (CAFC-Net), an end-to-end single-stage rotation detector, is proposed to improve ship detection accuracy. The framework is formed by three modules: a Biased Attention Module (BAM), a Feature Alignment Module (FAM), and a Distinctive Detection Module (DDM). Specifically, the BAM extracts biased critical features for classification and regression. With the extracted biased regression features, the FAM generates high-quality anchor boxes. Through a novel Alignment Convolution, convolutional features can be aligned according to the anchor boxes. The DDM produces orientation-sensitive features and reconstructs orientation-invariant features to alleviate the inconsistency between classification and localization accuracy. Extensive experiments on two remote sensing datasets, HRS2016 and a self-built ship dataset, show the state-of-the-art performance of our detector.
APA, Harvard, Vancouver, ISO, and other styles
35

Bhanushali, Darshan, Robert Relyea, Karan Manghi, Abhishek Vashist, Clark Hochgraf, Amlan Ganguly, Andres Kwasinski, Michael E. Kuhl, and Raymond Ptucha. "LiDAR-Camera Fusion for 3D Object Detection." Electronic Imaging 2020, no. 16 (January 26, 2020): 257–1. http://dx.doi.org/10.2352/issn.2470-1173.2020.16.avm-255.

Full text
Abstract:
The performance of autonomous agents in both commercial and consumer applications increases along with their situational awareness. Tasks such as obstacle avoidance, agent to agent interaction, and path planning are directly dependent upon their ability to convert sensor readings into scene understanding. Central to this is the ability to detect and recognize objects. Many object detection methodologies operate on a single modality such as vision or LiDAR. Camera-based object detection models benefit from an abundance of feature-rich information for classifying different types of objects. LiDAR-based object detection models use sparse point clouds, where each point contains accurate 3D position of object surfaces. Camera-based methods lack accurate object to lens distance measurements, while LiDAR-based methods lack dense feature-rich details. By utilizing information from both camera and LiDAR sensors, advanced object detection and identification is possible. In this work, we introduce a deep learning framework for fusing these modalities and produce a robust real-time 3D bounding box object detection network. We demonstrate qualitative and quantitative analysis of the proposed fusion model on the popular KITTI dataset.
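The fusion network itself cannot be reconstructed from the abstract; the sketch below only illustrates the standard preliminary step of projecting LiDAR points into the camera image with KITTI-style calibration so that point and pixel information can be associated. The calibration matrices are assumed to come from the dataset's calibration files.

```python
import numpy as np

def project_lidar_to_image(points_velo, Tr_velo_to_cam, R0_rect, P2):
    """Project Nx3 LiDAR points into image pixel coordinates (KITTI convention).

    Tr_velo_to_cam : 3x4 LiDAR-to-camera extrinsics (from the calibration file)
    R0_rect        : 3x3 rectifying rotation
    P2             : 3x4 camera projection matrix
    Returns pixel coordinates (u, v) and per-point depth for points in front of the camera.
    """
    n = points_velo.shape[0]
    pts_h = np.hstack([points_velo, np.ones((n, 1))])         # Nx4 homogeneous
    cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)                 # 3xN, rectified camera frame
    keep = cam[2, :] > 0.1                                     # keep points in front of the camera
    cam = cam[:, keep]
    img = P2 @ np.vstack([cam, np.ones((1, cam.shape[1]))])    # 3xM homogeneous pixel coords
    u, v = img[0] / img[2], img[1] / img[2]
    return u, v, cam[2]
```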
APA, Harvard, Vancouver, ISO, and other styles
36

Naik, Prof Shruti P., Vishal Lohbande, Shreyas Hambir, Rohit Korade, and Rahul Hatkar. "Fire Detection with Image Processing." International Journal for Research in Applied Science and Engineering Technology 11, no. 5 (May 31, 2023): 2123–28. http://dx.doi.org/10.22214/ijraset.2023.52073.

Full text
Abstract:
Convolutional neural networks (CNNs) have yielded state-of-the-art performance in image classification and other computer vision tasks. Their application in fire detection systems will substantially improve detection accuracy, which will eventually minimize fire disasters and reduce the ecological and social ramifications. However, the major concern with CNN-based fire detection systems is their implementation in real-world surveillance networks, due to their high memory and computational requirements for inference. In this paper, we propose an original, energy-friendly, and computationally efficient CNN architecture, inspired by the SqueezeNet architecture, for fire detection, localization, and semantic understanding of the scene of the fire. It uses smaller convolutional kernels and contains no dense, fully connected layers, which helps keep the computational requirements to a minimum. Despite its low computational needs, the experimental results demonstrate that our proposed solution achieves accuracies that are comparable to other, more complex models, mainly due to its increased depth. Moreover, this paper shows how a tradeoff can be reached between fire detection accuracy and efficiency, by considering the specific characteristics of the problem of interest and the variety of fire data.
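The abstract describes a SqueezeNet-inspired network with small kernels and no fully connected layers; the following Keras sketch shows a generic "fire" module and a fully convolutional classification head in that spirit. Layer counts and sizes are illustrative assumptions, not the authors' architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def fire_module(x, squeeze=16, expand=64):
    # Squeeze with 1x1 convolutions, then expand with parallel 1x1 and 3x3 branches.
    s = layers.Conv2D(squeeze, 1, activation="relu")(x)
    e1 = layers.Conv2D(expand, 1, activation="relu")(s)
    e3 = layers.Conv2D(expand, 3, padding="same", activation="relu")(s)
    return layers.Concatenate()([e1, e3])

inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(64, 3, strides=2, activation="relu")(inputs)
x = layers.MaxPooling2D(3, strides=2)(x)
x = fire_module(x, 16, 64)
x = fire_module(x, 32, 128)
x = layers.MaxPooling2D(3, strides=2)(x)
x = fire_module(x, 48, 192)
# No dense layers: a 1x1 convolution plus global average pooling produces the logits.
x = layers.Conv2D(2, 1)(x)                    # two classes: fire / no fire
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Softmax()(x)
model = models.Model(inputs, outputs)
```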
APA, Harvard, Vancouver, ISO, and other styles
37

Fennimore, Steven A., David C. Slaughter, Mark C. Siemens, Ramon G. Leon, and Mazin N. Saber. "Technology for Automation of Weed Control in Specialty Crops." Weed Technology 30, no. 4 (December 2016): 823–37. http://dx.doi.org/10.1614/wt-d-16-00070.1.

Full text
Abstract:
Specialty crops, like flowers, herbs, and vegetables, generally do not have an adequate spectrum of herbicide chemistries to control weeds and have been dependent on hand weeding to achieve commercially acceptable weed control. However, labor shortages have led to higher costs for hand weeding. There is a need to develop labor-saving technologies for weed control in specialty crops if production costs are to be contained. Machine vision technology, together with data processors, has been developed to enable commercial machines to recognize crop row patterns and control automated devices that perform tasks such as removal of intrarow weeds, as well as to thin crops to desired stands. The commercial machine vision systems depend upon a size difference between the crops and weeds and/or the regular crop row pattern to enable the system to recognize crop plants and control surrounding weeds. However, where weeds are large or the weed population is very dense, current machine vision systems cannot effectively differentiate weeds from crops. Commercially available automated weeders and thinners today depend upon cultivators or directed sprayers to control weeds. Weed control actuators on future models may use abrasion with sand blown in an air stream or heating with flaming devices to kill weeds. Future weed control strategies will likely require adaptation of the crops to automated weed removal equipment. One example would be changes in crop row patterns and spacing to facilitate cultivation in two directions. Chemical company consolidation continues to reduce the number of companies searching for new herbicides; increasing costs to develop new herbicides and price competition from existing products suggest that the downward trend in new herbicide development will continue. In contrast, automated weed removal equipment continues to improve and become more effective.
APA, Harvard, Vancouver, ISO, and other styles
38

Naik, Prof Shruti P., Shreyas Hambir, Vishal Lohbande, Rohit Korade, and Rahul Hatkar. "Fire Detection with Image Processing." International Journal for Research in Applied Science and Engineering Technology 11, no. 2 (February 28, 2023): 321–24. http://dx.doi.org/10.22214/ijraset.2023.49014.

Full text
Abstract:
Convolutional neural networks (CNNs) have yielded state-of-the-art performance in image classification and other computer vision tasks. Their application in fire detection systems will substantially improve detection accuracy, which will eventually minimize fire disasters and reduce the ecological and social ramifications. However, the major concern with CNN-based fire detection systems is their implementation in real-world surveillance networks, due to their high memory and computational requirements for inference. In this paper, we propose an original, energy-friendly, and computationally efficient CNN architecture, inspired by the SqueezeNet architecture, for fire detection, localization, and semantic understanding of the scene of the fire. It uses smaller convolutional kernels and contains no dense, fully connected layers, which helps keep the computational requirements to a minimum. Despite its low computational needs, the experimental results demonstrate that our proposed solution achieves accuracies that are comparable to other, more complex models, mainly due to its increased depth. Moreover, this paper shows how a tradeoff can be reached between fire detection accuracy and efficiency, by considering the specific characteristics of the problem of interest and the variety of fire data.
APA, Harvard, Vancouver, ISO, and other styles
39

Song, Kechen, Yiming Zhang, Yanqi Bao, Ying Zhao, and Yunhui Yan. "Self-Enhanced Mixed Attention Network for Three-Modal Images Few-Shot Semantic Segmentation." Sensors 23, no. 14 (July 22, 2023): 6612. http://dx.doi.org/10.3390/s23146612.

Full text
Abstract:
As an important computer vision technique, image segmentation has been widely used in various tasks. However, in some extreme cases, insufficient illumination can greatly degrade the performance of a model, so more and more fully supervised methods use multi-modal images as their input. Densely annotated large datasets are difficult to obtain, but few-shot methods can still achieve satisfactory results with only a few pixel-annotated samples. Therefore, we propose a Visible-Depth-Thermal (three-modal) few-shot semantic segmentation method. It utilizes the homogeneous information of three-modal images and the complementary information of the different modalities, which can improve the performance of few-shot segmentation tasks. We constructed a novel indoor dataset, VDT-2048-5i, for the three-modal few-shot semantic segmentation task. We also propose a Self-Enhanced Mixed Attention Network (SEMANet), which consists of a Self-Enhanced (SE) module and a Mixed Attention (MA) module. The SE module amplifies the differences between the different kinds of features and strengthens the weak connections of the foreground features. The MA module fuses the three-modal features to obtain a better representation. Compared with the most advanced previous methods, our model improves mIoU by 3.8% and 3.3% in the 1-shot and 5-shot settings, respectively, achieving state-of-the-art performance. In the future, we will address failure cases by obtaining more discriminative and robust feature representations, and explore achieving high performance with fewer parameters and lower computational costs.
APA, Harvard, Vancouver, ISO, and other styles
40

Ververas, Evangelos, and Stefanos Zafeiriou. "SliderGAN: Synthesizing Expressive Face Images by Sliding 3D Blendshape Parameters." International Journal of Computer Vision 128, no. 10-11 (June 11, 2020): 2629–50. http://dx.doi.org/10.1007/s11263-020-01338-7.

Full text
Abstract:
Image-to-image (i2i) translation is the dense regression problem of learning how to transform an input image into an output using aligned image pairs. Remarkable progress has been made in i2i translation with the advent of deep convolutional neural networks, particularly using the learning paradigm of generative adversarial networks (GANs). In the absence of paired images, i2i translation is tackled with one or multiple domain transformations (i.e., CycleGAN, StarGAN, etc.). In this paper, we study the problem of image-to-image translation under a set of continuous parameters that correspond to a model describing a physical process. In particular, we propose SliderGAN, which transforms an input face image into a new one according to the continuous values of a statistical blendshape model of facial motion. We show that it is possible to edit a facial image according to expression and speech blendshapes, using sliders that control the continuous values of the blendshape model. This provides much more flexibility in various tasks, including but not limited to face editing, expression transfer, and face neutralisation, compared to models based on discrete expressions or action units.
APA, Harvard, Vancouver, ISO, and other styles
41

Huang, S., F. Nex, Y. Lin, and M. Y. Yang. "SEMANTIC SEGMENTATION OF BUILDING IN AIRBORNE IMAGES." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W13 (June 4, 2019): 35–42. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w13-35-2019.

Full text
Abstract:
Buildings are a key component in the reconstruction of LoD3 city models. Compared to terrestrial views, airborne datasets have more occlusions at street level but can cover larger areas in urban environments. With the popularity of deep learning, many computer vision tasks can be solved in an easier and more efficient way. In this paper, we propose a method to apply deep neural networks to building façade segmentation. In particular, the FC-DenseNet and DeepLabV3+ algorithms are used to segment buildings from airborne images and obtain semantic information such as wall, roof, balcony, and opening areas. Patch-wise segmentation is used in the training and testing process in order to obtain information at the pixel level. Different typologies of input have been considered: besides the conventional 2D information (i.e., RGB images), we combined 2D information with 3D features extracted from dense image matching point clouds to improve segmentation performance. Results show that FC-DenseNet trained with 2D and 3D features achieves the best result, with an IoU of up to 64.41%, an increase of 5.13% over the same model trained without 3D features.
APA, Harvard, Vancouver, ISO, and other styles
42

Xiao, Feng, Haibin Wang, Yueqin Xu, and Ruiqing Zhang. "Fruit Detection and Recognition Based on Deep Learning for Automatic Harvesting: An Overview and Review." Agronomy 13, no. 6 (June 16, 2023): 1625. http://dx.doi.org/10.3390/agronomy13061625.

Full text
Abstract:
Continuing progress in machine learning (ML) has led to significant advancements in agricultural tasks. Due to its strong ability to extract high-dimensional features from fruit images, deep learning (DL) is widely used in fruit detection and automatic harvesting. Convolutional neural networks (CNN) in particular have demonstrated the ability to attain accuracy and speed levels comparable to those of humans in some fruit detection and automatic harvesting fields. This paper presents a comprehensive overview and review of fruit detection and recognition based on DL for automatic harvesting from 2018 up to now. We focus on the current challenges affecting fruit detection performance for automatic harvesting: the scarcity of high-quality fruit datasets, fruit detection of small targets, fruit detection in occluded and dense scenarios, fruit detection of multiple scales and multiple species, and lightweight fruit detection models. In response to these challenges, we propose feasible solutions and prospective future development trends. Future research should prioritize addressing these current challenges and improving the accuracy, speed, robustness, and generalization of fruit vision detection systems, while reducing the overall complexity and cost. This paper hopes to provide a reference for follow-up research in the field of fruit detection and recognition based on DL for automatic harvesting.
APA, Harvard, Vancouver, ISO, and other styles
43

Shchetinin, E. Yu. "ON AUTOMATIC DETECTION OF ANOMALIES IN ELECTROCARDIOGRAMMS WITH GENERATIVE MACHINE LEARNING." Vestnik komp'iuternykh i informatsionnykh tekhnologii, no. 216 (June 2022): 51–59. http://dx.doi.org/10.14489/vkit.2022.06.pp.051-059.

Full text
Abstract:
Anomaly detection is an important application area of artificial intelligence in large-scale data analysis, such as computer system security, fraud detection in bank transfers, the reliability of computer vision systems, and others. Anomaly detection is also a key task in the analysis of biomedical information, since instability in systems that recognize dangerous diseases from biomedical signals or MRI and CT images, for example, can lead to erroneous diagnoses. One of the main problems in machine learning and data analysis is correct data labeling. For anomaly detection, labeling is almost impossible due to both the unpredictability and the variety of anomalies. Therefore, one relevant approach to the problem is the use of unsupervised machine learning methods, since no preliminary labeling of the data into abnormal and normal samples is required. Popular methods for anomaly detection include the isolation forest algorithm, nonparametric statistics, cluster analysis, and others. However, at the present stage of development of data analysis, machine learning and deep learning methods are becoming increasingly effective. In this paper, a generative machine learning approach is proposed for anomaly detection. For this purpose, autoencoder models, which are representatives of unsupervised deep learning methods, have been developed. The autoencoder consists of an encoder, a hidden layer holding the input data representation (latent representation), and a decoder. High-dimensional input data are transformed by the encoder into low-dimensional hidden representations; the dimension of the hidden representations is smaller than that of the incoming data. The task of the decoder is to recover the input data: it takes the hidden representation as input and restores the original input, so that the autoencoder outputs the reconstructed image or signal. Computational experiments were carried out to test the proposed anomaly detection method on a set of electrocardiograms of patients with various heart diseases. The data set was created and balanced so that it contains 5000 electrocardiogram records, of which 58% are normal signals and 42% are abnormal. Each record corresponds to one complete ECG of a patient. To detect abnormal ECG signals, an autoencoder model based on deep neural networks is proposed. The model is implemented in Python using the Keras framework [10]. The encoder consists of five fully connected layers, Dense(128), Dense(64), Dense(32), Dense(16), and Dense(8), each with a ReLU activation function. The decoder consists of five fully connected layers, Dense(8), Dense(16), Dense(32), Dense(64), and Dense(128), with ReLU activation functions, and one fully connected Dense(140) layer with a sigmoid activation function. The loss function during signal reconstruction is given by the RMS error between the original signal and the signal processed by the neural network. The Adam optimization method and the MAE loss function were used during training, with a learning rate of 1E-04. A total of 500 training epochs were conducted with batch_size = 32. To compare the results with other methods, popular machine learning approaches such as SVM, logistic regression, and LGBM were used. The anomaly detection accuracy was 81.4% for LGBM and 78.47% for SVM, which supports the advantages of the proposed autoencoder model.
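Because the abstract spells out the layer sizes, the described autoencoder can be sketched fairly directly in Keras, as below: a Dense 128-64-32-16-8 encoder with ReLU, a mirrored decoder, a Dense(140) sigmoid output, Adam with learning rate 1e-4, and MAE loss. The anomaly-scoring threshold is an assumption, since the abstract does not state how it was chosen.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Encoder 128 -> 64 -> 32 -> 16 -> 8, mirrored decoder, 140-sample ECG output.
inputs = tf.keras.Input(shape=(140,))
x = inputs
for units in (128, 64, 32, 16, 8):
    x = layers.Dense(units, activation="relu")(x)
for units in (8, 16, 32, 64, 128):
    x = layers.Dense(units, activation="relu")(x)
outputs = layers.Dense(140, activation="sigmoid")(x)
autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss="mae")

# Train on normal ECG signals only, as is usual for reconstruction-based detection.
# x_normal: array of shape (num_records, 140), values scaled to [0, 1].
# autoencoder.fit(x_normal, x_normal, epochs=500, batch_size=32, validation_split=0.1)

def is_anomaly(signals, threshold):
    """Flag signals whose reconstruction error exceeds a threshold.
    The threshold choice (e.g. a percentile of errors on normal data) is assumed."""
    recon = autoencoder.predict(signals, verbose=0)
    errors = np.mean(np.abs(signals - recon), axis=1)
    return errors > threshold
```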
APA, Harvard, Vancouver, ISO, and other styles
44

Lemenkova, Polina, and Olivier Debeir. "Recognizing the Wadi Fluvial Structure and Stream Network in the Qena Bend of the Nile River, Egypt, on Landsat 8-9 OLI Images." Information 14, no. 4 (April 20, 2023): 249. http://dx.doi.org/10.3390/info14040249.

Full text
Abstract:
With methods for processing remote sensing data becoming widely available, the ability to quantify changes in spatial data and to evaluate the distribution of diverse landforms across target areas in datasets becomes increasingly important. One way to approach this problem is through satellite image processing. In this paper, we primarily focus on the methods of the unsupervised classification of the Landsat OLI/TIRS images covering the region of the Qena governorate in Upper Egypt. The Qena Bend of the Nile River presents a remarkable morphological feature in Upper Egypt, including a dense drainage network of wadi aquifer systems and plateaus largely dissected by numerous valleys of dry rivers. To identify the fluvial structure and stream network of the Wadi Qena region, this study addresses the problem of interpreting the relevant space-borne data using R, with an aim to visualize the land surface structures corresponding to various land cover types. To this effect, high-resolution 2D and 3D topographic and geologic maps were used for the analysis of the geomorphological setting of the Qena region. The information was extracted from the space-borne data for the comparative analysis of the distribution of wadi streams in the Qena Bend area over several years: 2013, 2015, 2016, 2019, 2022, and 2023. Six images were processed using computer vision methods made available by R libraries. The results of the k-means clustering of each scene retrieved from the multi-temporal images covering the Qena Bend of the Nile River were thus compared to visualize changes in landforms caused by the cumulative effects of geomorphological disasters and climate–environmental processes. The proposed method, tied together through the use of R scripts, runs effectively and performs favorably in computer vision tasks aimed at geospatial image processing and the analysis of remote sensing data.
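The paper performs the unsupervised classification with R libraries; purely for illustration, the sketch below shows the same k-means clustering step applied to a stacked multi-band image in Python with scikit-learn. The number of clusters is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_classify(image, n_clusters=6, random_state=0):
    """Unsupervised classification of a multi-band satellite scene.

    image : (H, W, bands) array, e.g. stacked Landsat OLI reflectance bands.
    Returns an (H, W) map of cluster labels approximating land-cover classes.
    """
    h, w, bands = image.shape
    pixels = image.reshape(-1, bands).astype(np.float32)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state).fit_predict(pixels)
    return labels.reshape(h, w)
```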
APA, Harvard, Vancouver, ISO, and other styles
45

Dua, Sakshi, Sethuraman Sambath Kumar, Yasser Albagory, Rajakumar Ramalingam, Ankur Dumka, Rajesh Singh, Mamoon Rashid, Anita Gehlot, Sultan S. Alshamrani, and Ahmed Saeed AlGhamdi. "Developing a Speech Recognition System for Recognizing Tonal Speech Signals Using a Convolutional Neural Network." Applied Sciences 12, no. 12 (June 19, 2022): 6223. http://dx.doi.org/10.3390/app12126223.

Full text
Abstract:
Deep learning-based machine learning models have shown significant results in speech recognition and numerous vision-related tasks. The performance of the present speech-to-text model relies upon the hyperparameters used in this research work. In this work, it is shown that convolutional neural networks (CNNs) can model raw and tonal speech signals, with performance on par with existing recognition systems. This study extends the role of the CNN-based approach to robust and uncommon (tonal) speech signals, using its own designed database for the target research. The main objective of this research work was to develop a speech-to-text recognition system to recognize the tonal speech signals of Gurbani hymns using a CNN. Further, a CNN model with six layers of 2D convolution and 2D max pooling and 256 dense-layer units (implemented with Google's TensorFlow) was used in this work, together with Praat for speech segmentation. Feature extraction was performed using the MFCC technique, which extracts standard speech features as well as features of the background music. Our study reveals that the CNN-based method for identifying tonal speech sentences and adding instrumental knowledge performs better than existing, conventional approaches. The experimental results demonstrate the significant performance of the present CNN architecture, providing an 89.15% accuracy rate and a 10.56% WER for continuous and extensive-vocabulary sentences of speech signals with different tones.
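As a rough illustration of the pipeline described (MFCC features fed to a small 2D convolutional network with a 256-unit dense layer), the following Python sketch uses librosa and Keras; the MFCC settings, input shape, and vocabulary size are placeholders rather than the paper's configuration.

```python
import librosa
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# MFCC extraction for one utterance (file name and parameters are illustrative).
signal, sr = librosa.load("hymn_segment.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)      # (40, time_frames)
mfcc = mfcc[np.newaxis, ..., np.newaxis]                     # (1, 40, T, 1) for the CNN

# Small 2D-conv network with a 256-unit dense layer, in the spirit of the abstract.
num_classes = 10                                             # placeholder vocabulary size
model = models.Sequential([
    layers.Input(shape=(40, mfcc.shape[2], 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```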
APA, Harvard, Vancouver, ISO, and other styles
46

Yan, Xu, Jiantao Gao, Jie Li, Ruimao Zhang, Zhen Li, Rui Huang, and Shuguang Cui. "Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 4 (May 18, 2021): 3101–9. http://dx.doi.org/10.1609/aaai.v35i4.16419.

Full text
Abstract:
LiDAR point cloud analysis is a core task for 3D computer vision, especially for autonomous driving. However, due to the severe sparsity and noise interference in the single sweep LiDAR point cloud, the accurate semantic segmentation is non-trivial to achieve. In this paper, we propose a novel sparse LiDAR point cloud semantic segmentation framework assisted by learned contextual shape priors. In practice, an initial semantic segmentation (SS) of a single sweep point cloud can be achieved by any appealing network and then flows into the semantic scene completion (SSC) module as the input. By merging multiple frames in the LiDAR sequence as supervision, the optimized SSC module has learned the contextual shape priors from sequential LiDAR data, completing the sparse single sweep point cloud to the dense one. Thus, it inherently improves SS optimization through fully end-to-end training. Besides, a Point-Voxel Interaction (PVI) module is proposed to further enhance the knowledge fusion between SS and SSC tasks, i.e., promoting the interaction of incomplete local geometry of point cloud and complete voxel-wise global structure. Furthermore, the auxiliary SSC and PVI modules can be discarded during inference without extra burden for SS. Extensive experiments confirm that our JS3C-Net achieves superior performance on both SemanticKITTI and SemanticPOSS benchmarks, i.e., 4% and 3% improvement correspondingly.
APA, Harvard, Vancouver, ISO, and other styles
47

Maurer, M., M. Hofer, F. Fraundorfer, and H. Bischof. "AUTOMATED INSPECTION OF POWER LINE CORRIDORS TO MEASURE VEGETATION UNDERCUT USING UAV-BASED IMAGES." ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences IV-2/W3 (August 18, 2017): 33–40. http://dx.doi.org/10.5194/isprs-annals-iv-2-w3-33-2017.

Full text
Abstract:
Power line corridor inspection is a time-consuming task that is mostly performed manually. As UAV development has made huge progress in recent years and photogrammetric computer vision systems have become well established, it is time to further automate inspection tasks. In this paper we present an automated processing pipeline to inspect vegetation undercuts of power line corridors. For this, the area of inspection is reconstructed, geo-referenced, and semantically segmented, and inter-class distance measurements are calculated. The presented pipeline automatically selects the proper 3D reconstruction method for wiry objects (power lines) on the one hand and solid objects (the surroundings) on the other. The automated selection is realized by performing pixel-wise semantic segmentation of the input images using a fully convolutional neural network. Thanks to the geo-referenced semantic 3D reconstructions, documentation of areas where maintenance work has to be performed is inherently included in the distance measurements and can be extracted easily. We evaluate the influence of the semantic segmentation on the 3D reconstruction and show that the automated semantic separation of the 3D reconstruction routine into wiry and dense objects improves the quality of the vegetation undercut inspection. We show the generalization of the semantic segmentation to datasets acquired using different acquisition routines and in different seasons.
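The reconstruction and segmentation stages are not reproduced here; the sketch below only illustrates the final inter-class distance measurement, finding for every power-line point its nearest vegetation point with a SciPy KD-tree and flagging clearance violations. The clearance threshold is an assumed value.

```python
import numpy as np
from scipy.spatial import cKDTree

def vegetation_undercut(power_line_pts, vegetation_pts, min_clearance=5.0):
    """Return indices of power-line points closer to vegetation than min_clearance.

    power_line_pts, vegetation_pts : Nx3 / Mx3 arrays of geo-referenced 3D points
    min_clearance                  : assumed clearance threshold in metres
    """
    tree = cKDTree(vegetation_pts)
    distances, _ = tree.query(power_line_pts, k=1)   # nearest vegetation point per line point
    return np.where(distances < min_clearance)[0], distances
```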
APA, Harvard, Vancouver, ISO, and other styles
48

Chen, Xiao, Mujiahui Yuan, Chenye Fan, Xingwu Chen, Yaan Li, and Haiyan Wang. "Research on an Underwater Object Detection Network Based on Dual-Branch Feature Extraction." Electronics 12, no. 16 (August 11, 2023): 3413. http://dx.doi.org/10.3390/electronics12163413.

Full text
Abstract:
Underwater object detection is challenging in computer vision research due to the complex underwater environment, poor image quality, and varying target scales, making it difficult for existing object detection networks to achieve high accuracy in underwater tasks. To address the issues of limited data and multi-scale targets in underwater detection, we propose a Dual-Branch Underwater Object Detection Network (DB-UODN) based on dual-branch feature extraction. In the feature extraction stage, we design a dual-branch structure by combining the You Only Look Once (YOLO) v7 backbone with the Enhanced Channel and Dilated Block (ECDB). It allows for the extraction and complementation of multi-scale features, which enable the model to learn both global and local information and enhance its perception of multi-scale features in underwater targets. Furthermore, we employ the DSPACSPC structure to replace the SPPCSPC structure in YOLOv7. The DSPACSPC structure utilizes atrous convolutions with different dilation rates to capture contextual information at various scales, compensating for potential information loss caused by pooling operations. Additionally, we utilize a dense connection structure to facilitate feature reuse and enhance the network’s representation and generalization capabilities. Experimental results demonstrate that the proposed DB-UODN outperforms the most commonly used object detection networks in underwater scenarios. On the URPC2020 dataset, the network achieves an average detection accuracy of 87.36%.
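The exact DSPACSPC layout is not given in the abstract; the block below is only a generic Keras sketch of the underlying idea of capturing context at several scales with parallel atrous (dilated) convolutions, not the authors' module.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dilated_context_block(x, filters=256, dilation_rates=(1, 3, 5)):
    """Parallel 3x3 convolutions with different dilation rates, fused by a 1x1 convolution."""
    branches = [
        layers.Conv2D(filters, 3, padding="same", dilation_rate=r, activation="relu")(x)
        for r in dilation_rates
    ]
    merged = layers.Concatenate()(branches)
    return layers.Conv2D(filters, 1, activation="relu")(merged)

# Example usage on a feature map produced by a backbone network.
feat = tf.keras.Input(shape=(20, 20, 512))
out = dilated_context_block(feat)
model = tf.keras.Model(feat, out)
```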
APA, Harvard, Vancouver, ISO, and other styles
49

Tsourounis, Dimitrios, Dimitris Kastaniotis, Christos Theoharatos, Andreas Kazantzidis, and George Economou. "SIFT-CNN: When Convolutional Neural Networks Meet Dense SIFT Descriptors for Image and Sequence Classification." Journal of Imaging 8, no. 10 (September 21, 2022): 256. http://dx.doi.org/10.3390/jimaging8100256.

Full text
Abstract:
Despite the success of hand-crafted features in computer vision for many years, nowadays they have been replaced by end-to-end learnable features extracted from deep convolutional neural networks (CNNs). Whilst CNNs can learn robust features directly from image pixels, they require large numbers of samples and extreme augmentations. On the contrary, hand-crafted features, like SIFT, exhibit several interesting properties, as they can provide local rotation invariance. In this work, a novel scheme combining the strengths of SIFT descriptors with CNNs, namely SIFT-CNN, is presented. Given a single-channel image, one SIFT descriptor is computed for every pixel, and thus every pixel is represented as an M-dimensional histogram, which ultimately results in an M-channel image. Thus, the SIFT image is generated from the SIFT descriptors for all the pixels in a single-channel image, while at the same time the original spatial size is preserved. Next, a CNN is trained to utilize these M-channel images as inputs by operating directly on the multiscale SIFT images with regular convolution processes. Since these images incorporate spatial relations between the histograms of the SIFT descriptors, the CNN is guided to learn features from the local gradient information of images that might otherwise be neglected. In this manner, the SIFT-CNN implicitly acquires a local rotation invariance property, which is desired for problems where local areas within the image can be rotated without affecting the overall classification result of the respective image. Such problems include indirect immunofluorescence (IIF) cell image classification, ground-based all-sky image-cloud classification, and human lip-reading classification. The results for the popular datasets related to the three aforementioned problems indicate that the proposed SIFT-CNN can improve performance and surpasses the corresponding CNNs trained directly on pixel values in various challenging tasks, due to its robustness to local rotations. Our findings highlight the importance of the input image representation for the overall efficiency of a data-driven system.
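A minimal OpenCV sketch of the core idea follows: computing one 128-dimensional SIFT descriptor per sampled pixel of a single-channel image and stacking the descriptors into a multi-channel "SIFT image". The stride and keypoint size are assumptions, and the downstream CNN is omitted.

```python
import cv2
import numpy as np

def sift_image(gray, step=1, kp_size=8.0):
    """Turn a single-channel image (H, W) into an (H/step, W/step, 128) SIFT image.

    step=1 reproduces the per-pixel case but is expensive, so a larger stride
    may be preferable in practice. Assumes OpenCV returns one descriptor per
    provided keypoint, in the same row-major order as the sampling grid.
    """
    sift = cv2.SIFT_create()
    ys = range(0, gray.shape[0], step)
    xs = range(0, gray.shape[1], step)
    keypoints = [cv2.KeyPoint(float(x), float(y), kp_size) for y in ys for x in xs]
    keypoints, desc = sift.compute(gray, keypoints)
    return desc.reshape(len(ys), len(xs), 128)

# Example usage: a strided SIFT image that can be fed to a CNN as input channels.
gray = cv2.imread("cell.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
m_channel = sift_image(gray, step=4)
```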
APA, Harvard, Vancouver, ISO, and other styles
50

Jiang, S., W. Yao, and M. Heurich. "DEAD WOOD DETECTION BASED ON SEMANTIC SEGMENTATION OF VHR AERIAL CIR IMAGERY USING OPTIMIZED FCN-DENSENET." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLII-2/W16 (September 17, 2019): 127–33. http://dx.doi.org/10.5194/isprs-archives-xlii-2-w16-127-2019.

Full text
Abstract:
The assessment of forests' health conditions is an important task for biodiversity, forest management, global environment monitoring, and carbon dynamics. Several research works have been proposed to evaluate the condition of a forest based on remote sensing technology. Among existing technologies, employing traditional machine learning approaches to detect dead wood in aerial colour-infrared (CIR) imagery is one of the major trends, due to its spectral capability to explicitly capture vegetation health conditions. However, complicated scenes with background noise have restricted the accuracy of existing approaches, as those detectors normally rely on hand-crafted features. Currently, deep neural networks are widely used in computer vision tasks and show that features learnt by the model itself perform much better than hand-crafted features. Semantic image segmentation is a pixel-level classification task, which is best suited to dead wood detection in very high resolution (VHR) imagery because it enables the model to identify and classify very dense and detailed components of the tree objects. In this paper, an optimized FCN-DenseNet is proposed to detect dead wood (i.e., standing dead trees and fallen trees) in a complicated temperate forest environment. Since dead trees appear at greatly different scales and sizes, several pooling procedures are employed to extract multi-scale features, and dense connections are employed to strengthen the links among the scales. Our proposed deep neural network is evaluated on VHR CIR imagery (GSD 10 cm) captured in a natural temperate forest in the Bavarian Forest National Park, Germany, which has undergone on-site bark beetle attack. The results show that the boundaries of dead trees can be accurately segmented and the classification is performed with high accuracy, even though only one labelled image of moderate size is used for training the deep neural network.
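The optimized FCN-DenseNet cannot be reconstructed from the abstract alone; the sketch below only shows the basic dense-connection pattern (each layer receives the concatenation of all previous feature maps) that the abstract builds on, in Keras with illustrative sizes.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=16):
    """DenseNet-style block: every layer sees the concatenation of all previous
    feature maps, which is the dense connectivity the abstract relies on."""
    features = [x]
    for _ in range(num_layers):
        h = layers.Concatenate()(features) if len(features) > 1 else features[0]
        h = layers.BatchNormalization()(h)
        h = layers.ReLU()(h)
        h = layers.Conv2D(growth_rate, 3, padding="same")(h)
        features.append(h)
    return layers.Concatenate()(features[1:])    # new feature maps produced by the block

# Example usage on an input image tensor.
inp = tf.keras.Input(shape=(256, 256, 3))
out = dense_block(layers.Conv2D(32, 3, padding="same")(inp))
model = tf.keras.Model(inp, out)
```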
APA, Harvard, Vancouver, ISO, and other styles