Journal articles on the topic 'Dilated convolution'

To see the other types of publications on this topic, follow the link: Dilated convolution.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'Dilated convolution.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Wang, Wei, Yiyang Hu, Ting Zou, Hongmei Liu, Jin Wang, and Xin Wang. "A New Image Classification Approach via Improved MobileNet Models with Local Receptive Field Expansion in Shallow Layers." Computational Intelligence and Neuroscience 2020 (August 1, 2020): 1–10. http://dx.doi.org/10.1155/2020/8817849.

Full text
Abstract:
Because deep neural networks (DNNs) are both memory-intensive and computation-intensive, they are difficult to apply to embedded systems with limited hardware resources. Therefore, DNN models need to be compressed and accelerated. By applying depthwise separable convolutions, MobileNet can decrease the number of parameters and computational complexity with less loss of classification precision. Based on MobileNet, three improved MobileNet models with local receptive field expansion in shallow layers, also called Dilated-MobileNet (Dilated Convolution MobileNet) models, are proposed, in which dilated convolutions are introduced into a specific convolutional layer of the MobileNet model. Without increasing the number of parameters, dilated convolutions are used to increase the receptive field of the convolution filters to obtain better classification accuracy. The experiments were performed on the Caltech-101, Caltech-256, and Tübingen Animals with Attributes datasets. The results show that Dilated-MobileNets can obtain up to 2% higher classification accuracy than MobileNet.
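To make the abstract's central claim concrete, the following is a minimal PyTorch sketch (not the authors' code; the class name, channel sizes, and dilation rate are illustrative assumptions) of a depthwise separable block whose depthwise stage is dilated: the receptive field grows, yet the parameter count is identical to the undilated block.

    import torch
    import torch.nn as nn

    class DilatedDepthwiseSeparable(nn.Module):
        """Depthwise separable convolution with a dilated depthwise stage.
        Dilation enlarges the receptive field of the 3x3 depthwise filter
        while leaving the number of weights unchanged."""
        def __init__(self, in_ch, out_ch, dilation=2):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                       padding=dilation, dilation=dilation,
                                       groups=in_ch, bias=False)
            self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
            self.bn = nn.BatchNorm2d(out_ch)
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.act(self.bn(self.pointwise(self.depthwise(x))))

    # Same number of weights whether the depthwise stage is dilated or not:
    plain = DilatedDepthwiseSeparable(32, 64, dilation=1)
    dilated = DilatedDepthwiseSeparable(32, 64, dilation=2)
    assert sum(p.numel() for p in plain.parameters()) == \
           sum(p.numel() for p in dilated.parameters())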
APA, Harvard, Vancouver, ISO, and other styles
2

Peng, Wenli, Shenglai Zhen, Xin Chen, Qianjing Xiong, and Benli Yu. "Study on convolutional recurrent neural networks for speech enhancement in fiber-optic microphones." Journal of Physics: Conference Series 2246, no. 1 (April 1, 2022): 012084. http://dx.doi.org/10.1088/1742-6596/2246/1/012084.

Full text
Abstract:
In this paper, several improved convolutional recurrent networks (CRN) are proposed, which can enhance speech with non-additive distortion captured by fiber-optic microphones. Our preliminary study shows that speech enhanced by the original CRN structure, which is based on amplitude spectrum estimation, is seriously distorted due to the loss of phase information. Therefore, we transform the network to run in the time domain and gain a 0.42 improvement in PESQ and a 0.03 improvement in STOI. In addition, we integrate dilated convolution into the CRN architecture and adopt three different types of bottleneck modules, namely long short-term memory (LSTM), gated recurrent units (GRU), and dilated convolutions. The experimental results show that the model with dilated convolution in the encoder-decoder and the model with dilated convolution at the bottleneck layer have the highest PESQ and STOI scores, respectively.
APA, Harvard, Vancouver, ISO, and other styles
3

Zhao, Feng, Junjie Zhang, Zhe Meng, and Hanqiang Liu. "Densely Connected Pyramidal Dilated Convolutional Network for Hyperspectral Image Classification." Remote Sensing 13, no. 17 (August 26, 2021): 3396. http://dx.doi.org/10.3390/rs13173396.

Full text
Abstract:
Recently, with the extensive application of deep learning techniques in the hyperspectral image (HSI) field, particularly the convolutional neural network (CNN), the research on HSI classification has stepped into a new stage. To avoid the problem that the receptive field of naive convolution is small, the dilated convolution has been introduced into the field of HSI classification. However, the dilated convolution usually generates blind spots in the receptive field, resulting in discontinuous spatial information. In order to solve the above problem, a densely connected pyramidal dilated convolutional network (PDCNet) is proposed in this paper. Firstly, a pyramidal dilated convolutional (PDC) layer that integrates different numbers of sub-dilated convolutional layers is proposed, where the dilation factor of the sub-dilated convolutions increases exponentially, achieving multi-scale receptive fields. Secondly, the number of sub-dilated convolutional layers increases in a pyramidal pattern with the depth of the network, thereby capturing more comprehensive hyperspectral information in the receptive field. Furthermore, a feature fusion mechanism combining pixel-by-pixel addition and channel stacking is adopted to extract more abstract spectral–spatial features. Finally, in order to reuse the features of the previous layers more effectively, dense connections are applied in densely pyramidal dilated convolutional (DPDC) blocks. Experiments on three well-known HSI datasets indicate that the PDCNet proposed in this paper has good classification performance compared with other popular models.
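As a rough illustration of the PDC idea described above (a sketch under assumed sizes, not the published PDCNet code), the sub-dilated branches below use exponentially growing dilation factors and are fused both by pixel-by-pixel addition and by channel stacking.

    import torch
    import torch.nn as nn

    class PyramidalDilatedLayer(nn.Module):
        """Several parallel sub-dilated convolutions with dilation factors
        1, 2, 4, ... fused by element-wise addition and channel stacking."""
        def __init__(self, channels, num_branches=3):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=2 ** i, dilation=2 ** i)
                for i in range(num_branches)
            ])
            self.fuse = nn.Conv2d(channels * num_branches, channels, kernel_size=1)

        def forward(self, x):
            outs = [branch(x) for branch in self.branches]
            added = sum(outs)                     # pixel-by-pixel addition
            stacked = torch.cat(outs, dim=1)      # channel stacking
            return added + self.fuse(stacked)

    x = torch.randn(2, 16, 32, 32)                # a small spatial-spectral feature map
    print(PyramidalDilatedLayer(16)(x).shape)     # torch.Size([2, 16, 32, 32])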
APA, Harvard, Vancouver, ISO, and other styles
4

Chim, Seyha, Jin-Gu Lee, and Ho-Hyun Park. "Dilated Skip Convolution for Facial Landmark Detection." Sensors 19, no. 24 (December 4, 2019): 5350. http://dx.doi.org/10.3390/s19245350.

Full text
Abstract:
Facial landmark detection has gained enormous interest for face-related applications due to its success in facial analysis tasks such as facial recognition, cartoon generation, face tracking and facial expression analysis. Many methods have been proposed and implemented to deal with the challenging problems of localizing facial landmarks from given images, including large appearance variations and partial occlusion. Studies have differed in the way they use the facial appearance and shape information of input images. In our work, we consider facial information within both global and local contexts. We aim to obtain local pixel-level accuracy for local-context information in the first stage and integrate this with knowledge of spatial relationships between each key point in a whole image for global-context information in the second stage. Thus, the pipeline of our architecture consists of two main components: (1) a local-context subnet, a deep network that generates detection heatmaps via fully convolutional DenseNets with additional kernel convolution filters, and (2) a dilated skip convolution subnet—a combination of dilated convolutions and skip-connection networks—that is in charge of robustly refining the local appearance heatmaps. Through this proposed architecture, we demonstrate that our approach achieves state-of-the-art performance on challenging datasets—including LFPW, HELEN, 300W and AFLW2000-3D—by leveraging fully convolutional DenseNets, skip-connections and a dilated convolution architecture without further post-processing.
APA, Harvard, Vancouver, ISO, and other styles
5

Song, Zhendong, Yupeng Ma, Fang Tan, and Xiaoyi Feng. "Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement." Applied Sciences 12, no. 7 (March 29, 2022): 3461. http://dx.doi.org/10.3390/app12073461.

Full text
Abstract:
In this paper, we propose a fully convolutional neural network based on recursive recurrent convolution for monaural speech enhancement in the time domain. The proposed network is an encoder-decoder structure using a series of hybrid dilated modules (HDM). The encoder creates low-dimensional features of a noisy input frame. In the HDM, the dilated convolution is used to expand the receptive field of the network model, while the standard convolution is used to compensate for the local information under-utilized by the dilated convolution. The decoder is used to reconstruct enhanced frames. The recursive recurrent convolutional network uses a GRU to address the problems of numerous training parameters and complex structure. State-of-the-art results are achieved on two commonly used speech datasets.
APA, Harvard, Vancouver, ISO, and other styles
6

Tang, Jingfan, Meijia Zhou, Pengfei Li, Min Zhang, and Ming Jiang. "Crowd Counting Based on Multiresolution Density Map and Parallel Dilated Convolution." Scientific Programming 2021 (January 20, 2021): 1–10. http://dx.doi.org/10.1155/2021/8831458.

Full text
Abstract:
Current crowd counting methods rely on a fully convolutional network to generate a density map and can achieve good performance. However, due to crowd occlusion and perspective distortion in the image, the directly generated density map usually neglects scale information and spatial context information. To solve this, we propose MDPDNet (Multiresolution Density maps and Parallel Dilated convolutions’ Network) to reduce the influence of occlusion and distortion on crowd estimation. This network is composed of two modules: (1) the parallel dilated convolution module (PDM), which combines three dilated convolutions in parallel to obtain deep features on a larger receptive field with fewer parameters while reducing the loss of multiscale information; (2) the multiresolution density map module (MDM), which contains three-branch networks for extracting spatial context information on three different low-resolution density maps as the feature input of the final crowd density map. Experiments show that MDPDNet achieved excellent results on three mainstream datasets (ShanghaiTech, UCF_CC_50, and UCF-QNRF).
APA, Harvard, Vancouver, ISO, and other styles
7

Zhang, Jianming, Chaoquan Lu, Jin Wang, Lei Wang, and Xiao-Guang Yue. "Concrete Cracks Detection Based on FCN with Dilated Convolution." Applied Sciences 9, no. 13 (July 1, 2019): 2686. http://dx.doi.org/10.3390/app9132686.

Full text
Abstract:
In civil engineering, the stability of concrete is of great significance to the safety of people’s lives and property, so it is necessary to detect concrete damage effectively. In this paper, we treat crack detection on a concrete surface as a semantic segmentation task that distinguishes background from crack at the pixel level. Inspired by Fully Convolutional Networks (FCN), we propose a fully convolutional network based on dilated convolution for concrete crack detection, which consists of an encoder and a decoder. Specifically, we first use a residual network to extract the feature maps of the input image, design dilated convolutions with different dilation rates to extract feature maps with different receptive fields, and fuse the extracted features from multiple branches. Then, we use stacked deconvolutions to up-sample the fused feature maps. Finally, we use the SoftMax function to classify the feature maps at the pixel level. In order to verify the validity of the model, we introduce the commonly used evaluation indicators of semantic segmentation: Pixel Accuracy (PA), Mean Pixel Accuracy (MPA), Mean Intersection over Union (MIoU), and Frequency Weighted Intersection over Union (FWIoU). The experimental results show that the proposed model converges faster and has better generalization performance on the test set by introducing dilated convolutions with different dilation rates and a multi-branch fusion strategy. Our model has a PA of 96.84%, MPA of 92.55%, MIoU of 86.05% and FWIoU of 94.22% on the test set, which is superior to other models.
APA, Harvard, Vancouver, ISO, and other styles
8

Cao, Ruifen, Xi Pei, Ning Ge, and Chunhou Zheng. "Clinical Target Volume Auto-Segmentation of Esophageal Cancer for Radiotherapy After Radical Surgery Based on Deep Learning." Technology in Cancer Research & Treatment 20 (January 1, 2021): 153303382110342. http://dx.doi.org/10.1177/15330338211034284.

Full text
Abstract:
Radiotherapy plays an important role in controlling the local recurrence of esophageal cancer after radical surgery. Segmentation of the clinical target volume is a key step in radiotherapy treatment planning, but it is time-consuming and operator-dependent. This paper introduces a deep dilated convolutional U-network to achieve fast and accurate clinical target volume auto-segmentation of esophageal cancer after radical surgery. The deep dilated convolutional U-network, which integrates the advantages of dilated convolution and the U-network, is an end-to-end architecture that enables rapid training and testing. A dilated convolution module for extracting multiscale context features containing the original information on fine texture and boundaries is integrated into the U-network architecture to avoid information loss due to down-sampling and improve the segmentation accuracy. In addition, batch normalization is added to the deep dilated convolutional U-network for fast and stable convergence. In the present study, the training and validation loss tended to be stable after 40 training epochs. This deep dilated convolutional U-network model was able to segment the clinical target volume with an overall mean Dice similarity coefficient of 86.7% and a 95% Hausdorff distance of 37.4 mm, indicating reasonable volume overlap of the auto-segmented and manual contours. The mean Cohen kappa coefficient was 0.863, indicating that the deep dilated convolutional U-network was robust. Comparisons with the U-network and attention U-network showed that the overall performance of the deep dilated convolutional U-network was best for the Dice similarity coefficient, 95% Hausdorff distance, and Cohen kappa coefficient. The test time for segmentation of the clinical target volume was approximately 25 seconds per patient. This deep dilated convolutional U-network could be applied in the clinical setting to save time in delineation and improve the consistency of contouring.
APA, Harvard, Vancouver, ISO, and other styles
9

Wang, Ran, Ruyu Shi, Xiong Hu, and Changqing Shen. "Remaining Useful Life Prediction of Rolling Bearings Based on Multiscale Convolutional Neural Network with Integrated Dilated Convolution Blocks." Shock and Vibration 2021 (January 25, 2021): 1–11. http://dx.doi.org/10.1155/2021/6616861.

Full text
Abstract:
Remaining useful life (RUL) prediction is necessary for guaranteeing machinery’s safe operation. Among deep learning architectures, the convolutional neural network (CNN) has shown notable achievements in RUL prediction because of its strong ability in representation learning. Features from different receptive fields extracted by convolution kernels of different sizes can provide complete information for prognosis. A single-size convolution kernel in a traditional CNN struggles to learn comprehensive information from complex signals, and the ability to learn local and global features synchronously is limited in a conventional CNN. Thus, a multiscale convolutional neural network (MS-CNN) is introduced to overcome these problems. Convolution filters with different dilation rates are integrated to form a dilated convolution block, which can learn features in different receptive fields. Then, several stacked integrated dilated convolution blocks at different depths are concatenated to extract local and global features. The effectiveness of the proposed method is verified on a bearing dataset prepared from the PRONOSTIA platform. The results show that the proposed MS-CNN has higher prediction accuracy than many other deep learning-based RUL methods.
APA, Harvard, Vancouver, ISO, and other styles
10

Madych, W. R. "Limits of Dilated Convolution Transforms." SIAM Journal on Mathematical Analysis 16, no. 3 (May 1985): 551–58. http://dx.doi.org/10.1137/0516041.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Zhang, Guokai, Xiao Liu, Dandan Zhu, Pengcheng He, Lipeng Liang, Ye Luo, and Jianwei Lu. "3D Spatial Pyramid Dilated Network for Pulmonary Nodule Classification." Symmetry 10, no. 9 (September 1, 2018): 376. http://dx.doi.org/10.3390/sym10090376.

Full text
Abstract:
Lung cancer mortality is currently the highest among all kinds of fatal cancers. With the help of computer-aided detection systems, timely detection of malignant pulmonary nodules at an early stage could efficiently improve the patient survival rate. However, the sizes of pulmonary nodules vary widely, and it is more difficult to detect small-diameter nodules. The traditional convolutional neural network uses pooling layers to reduce the resolution progressively, but this hampers the network’s ability to capture the tiny but vital features of the pulmonary nodules. To tackle this problem, we propose a novel 3D spatial pyramid dilated convolution network to classify the malignancy of pulmonary nodules. Instead of using pooling layers, we use 3D dilated convolution to learn the detailed characteristic information of the pulmonary nodules. Furthermore, we show that the fusion of multiple receptive fields from different dilated convolutions could further improve the classification performance of the model. Extensive experimental results demonstrate that our model achieves a better result with an accuracy of 88.6%, which outperforms other state-of-the-art methods.
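The trade-off the abstract describes, a dilated 3D convolution preserving resolution where pooling would halve it, can be sketched in a few lines of PyTorch (an illustration with assumed channel counts and dilation rate, not the authors' network).

    import torch
    import torch.nn as nn

    # A 3D dilated convolution keeps the full voxel resolution, whereas a
    # pooled branch halves it and can discard tiny nodule details.
    dilated3d = nn.Conv3d(1, 16, kernel_size=3, padding=2, dilation=2)
    pooled = nn.Sequential(nn.Conv3d(1, 16, kernel_size=3, padding=1),
                           nn.MaxPool3d(kernel_size=2))

    patch = torch.randn(1, 1, 32, 32, 32)      # one CT sub-volume
    print(dilated3d(patch).shape)              # torch.Size([1, 16, 32, 32, 32])
    print(pooled(patch).shape)                 # torch.Size([1, 16, 16, 16, 16])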
APA, Harvard, Vancouver, ISO, and other styles
12

Heo, Woon-Haeng, Hyemi Kim, and Oh-Wook Kwon. "Source Separation Using Dilated Time-Frequency DenseNet for Music Identification in Broadcast Contents." Applied Sciences 10, no. 5 (March 3, 2020): 1727. http://dx.doi.org/10.3390/app10051727.

Full text
Abstract:
We propose a source separation architecture using dilated time-frequency DenseNet for background music identification of broadcast content. We apply source separation techniques to the mixed signals of music and speech. For the source separation purpose, we propose a new architecture to add a time-frequency dilated convolution to the conventional DenseNet in order to effectively increase the receptive field in the source separation scheme. In addition, we apply different convolutions to each frequency band of the spectrogram in order to reflect the different frequency characteristics of the low- and high-frequency bands. To verify the performance of the proposed architecture, we perform singing-voice separation and music-identification experiments. As a result, we confirm that the proposed architecture produces the best performance in both experiments because it uses the dilated convolution to reflect wide contextual information.
APA, Harvard, Vancouver, ISO, and other styles
13

Contreras, Jonatan, Martine Ceberio, and Vladik Kreinovich. "Why Dilated Convolutional Neural Networks: A Proof of Their Optimality." Entropy 23, no. 6 (June 18, 2021): 767. http://dx.doi.org/10.3390/e23060767.

Full text
Abstract:
One of the most effective image processing techniques is the use of convolutional neural networks that use convolutional layers. In each such layer, the value of the layer’s output signal at each point is a combination of the layer’s input signals corresponding to several neighboring points. To improve the accuracy, researchers have developed a version of this technique, in which only data from some of the neighboring points is processed. It turns out that the most efficient case—called dilated convolution—is when we select the neighboring points whose differences in both coordinates are divisible by some constant ℓ. In this paper, we explain this empirical efficiency by proving that for all reasonable optimality criteria, dilated convolution is indeed better than possible alternatives.
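For reference, the standard textbook formulation of the construction discussed here (not reproduced from the cited paper) can be written as follows: a kernel of size K with dilation rate ℓ samples points whose coordinate differences are multiples of ℓ, so the receptive field grows with depth while the number of weights stays fixed.

    % 2D dilated convolution with dilation rate \ell:
    (F \ast_{\ell} k)(\mathbf{p}) = \sum_{\mathbf{s}} F(\mathbf{p} + \ell\,\mathbf{s})\, k(\mathbf{s}),
        \qquad \mathbf{s} \in \{-r, \dots, r\}^2
    % Effective kernel size of a K x K kernel with dilation \ell:
    K_{\mathrm{eff}} = K + (K - 1)(\ell - 1)
    % Receptive field after n stacked stride-1 layers with dilations \ell_1, \dots, \ell_n:
    R_n = R_{n-1} + (K - 1)\,\ell_n, \qquad R_0 = 1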
APA, Harvard, Vancouver, ISO, and other styles
14

Viriyasaranon, Thanaporn, Seung-Hoon Chae, and Jang-Hwan Choi. "MFA-net: Object detection for complex X-ray cargo and baggage security imagery." PLOS ONE 17, no. 9 (September 1, 2022): e0272961. http://dx.doi.org/10.1371/journal.pone.0272961.

Full text
Abstract:
Deep convolutional networks have been developed to detect prohibited items for automated inspection of X-ray screening systems in the transport security system. To our knowledge, the existing frameworks were developed to recognize threats using only baggage security X-ray scans. Therefore, the detection accuracy in other domains of security X-ray scans, such as cargo X-ray scans, cannot be ensured. We propose an object detection method for efficiently detecting contraband items in both cargo and baggage for X-ray security scans. The proposed network, MFA-net, consists of three plug-and-play modules, including the multiscale dilated convolutional module, fusion feature pyramid network, and auxiliary point detection head. First, the multiscale dilated convolutional module converts the standard convolution of the detector backbone to a conditional convolution by aggregating the features from multiple dilated convolutions using dynamic feature selection to overcome the object-scale variant issue. Second, the fusion feature pyramid network combines the proposed attention and fusion modules to enhance multiscale object recognition and alleviate the object occlusion problem. Third, the auxiliary point detection head adopts an auxiliary head to predict the new keypoints of the bounding box to emphasize the localizability without requiring further ground-truth information. We tested the performance of the MFA-net on two large-scale X-ray security image datasets from different domains: a Security Inspection X-ray (SIXray) dataset in the baggage domain and our dataset, named CargoX, in the cargo domain. Moreover, MFA-net outperformed state-of-the-art object detectors in both domains. Thus, adopting the proposed modules can further increase the detection capability of the current object detectors on X-ray security images.
APA, Harvard, Vancouver, ISO, and other styles
15

Hu, Yicheng, Shufang Tian, and Jia Ge. "Hybrid Convolutional Network Combining Multiscale 3D Depthwise Separable Convolution and CBAM Residual Dilated Convolution for Hyperspectral Image Classification." Remote Sensing 15, no. 19 (October 1, 2023): 4796. http://dx.doi.org/10.3390/rs15194796.

Full text
Abstract:
In recent years, convolutional neural networks (CNNs) have been increasingly leveraged for the classification of hyperspectral imagery, displaying notable advancements. To address the issues of insufficient spectral and spatial information extraction and high computational complexity in hyperspectral image classification, we introduce the MDRDNet, an integrated neural network model. This novel architecture is composed of two main components: a Multiscale 3D Depthwise Separable Convolutional Network and a CBAM-augmented Residual Dilated Convolutional Network. The first component employs depthwise separable convolutions in a 3D setting to efficiently capture spatial–spectral characteristics, thus substantially reducing the computational burden associated with 3D convolutions. Meanwhile, the second component enhances the network by integrating the Convolutional Block Attention Module (CBAM) with dilated convolutions via residual connections, effectively counteracting the issue of model degradation. We have empirically evaluated the MDRDNet’s performance by running comprehensive experiments on three publicly available datasets: Indian Pines, Pavia University, and Salinas. Our findings indicate that the overall accuracy of the MDRDNet on the three datasets reached 98.83%, 99.81%, and 99.99%, respectively, which is higher than the accuracy of existing models. Therefore, the MDRDNet proposed in this study can fully extract spatial–spectral joint information, providing a new approach to reducing the heavy computational burden of 3D convolutions.
APA, Harvard, Vancouver, ISO, and other styles
16

Wu, Junjie, Wen Liu, and Yoshihisa Maruyama. "Automated Road-Marking Segmentation via a Multiscale Attention-Based Dilated Convolutional Neural Network Using the Road Marking Dataset." Remote Sensing 14, no. 18 (September 9, 2022): 4508. http://dx.doi.org/10.3390/rs14184508.

Full text
Abstract:
Road markings, including road lanes and symbolic road markings, can convey abundant guidance information to autonomous driving cars. However, recent works have paid less attention to the recognition of symbolic road markings compared with road lanes. In this study, a road-marking-segmentation dataset named the RMD (Road Marking Dataset) is introduced to compensate for the lack of datasets and the limitations of the existing datasets. Furthermore, we propose a novel multiscale attention-based dilated convolutional neural network (MSA-DCNN) to tackle the proposed RMD. The proposed method employs multiscale attention to merge the weighting outputs of adjacent multiscale inputs, and dilated convolution to capture spatial-context information. The performance analysis shows that the proposed MSA-DCNN yields the best results by combining multiscale attention and dilated convolution. Additionally, the proposed method achieves an mIoU of 74.88%, which is a significant improvement over existing techniques.
APA, Harvard, Vancouver, ISO, and other styles
17

Ku, Tao, Qirui Yang, and Hao Zhang. "Multilevel feature fusion dilated convolutional network for semantic segmentation." International Journal of Advanced Robotic Systems 18, no. 2 (March 1, 2021): 172988142110076. http://dx.doi.org/10.1177/17298814211007665.

Full text
Abstract:
Recently, the convolutional neural network (CNN) has led to significant improvements in the field of computer vision, especially in the accuracy and speed of semantic segmentation tasks, which has greatly improved robot scene perception. In this article, we propose a multilevel feature fusion dilated convolution network (Refine-DeepLab). By improving the spatial pyramid pooling structure, we propose a multiscale hybrid dilated convolution module, which captures rich context information and effectively alleviates the contradiction between the receptive field size and the dilated convolution operation. At the same time, the high-level and low-level semantic information obtained through multi-level and multi-scale feature extraction can effectively improve the capture of global information and improve the performance of large-scale target segmentation. The encoder–decoder gradually recovers spatial information while capturing high-level semantic information, resulting in sharper object boundaries. Extensive experiments verify the effectiveness of our proposed Refine-DeepLab model. We evaluate our approach thoroughly on the PASCAL VOC 2012 data set without MS COCO data set pretraining and achieve a state-of-the-art result of 81.73% mean intersection-over-union on the validation set.
APA, Harvard, Vancouver, ISO, and other styles
18

Zhuang, Zilong, Huichun Lv, Jie Xu, Zizhao Huang, and Wei Qin. "A Deep Learning Method for Bearing Fault Diagnosis through Stacked Residual Dilated Convolutions." Applied Sciences 9, no. 9 (May 1, 2019): 1823. http://dx.doi.org/10.3390/app9091823.

Full text
Abstract:
Real-time monitoring and fault diagnosis of bearings are of great significance to improve production safety, prevent major accidents, and reduce production costs. However, there are three primary concerns in the current research, namely real-time performance, effectiveness, and generalization performance. In this paper, a deep learning method based on a stacked residual dilated convolutional neural network (SRDCNN) is proposed for real-time bearing fault diagnosis, which subtly combines dilated convolution, the input gate structure of the long short-term memory network (LSTM), and the residual network. In the SRDCNN model, the dilated convolution is used to exponentially increase the receptive field of the convolution kernel and extract features from samples with more points, alleviating the influence of randomness. The input gate structure of the LSTM can effectively remove noise and control the entry of information contained in the input sample. Meanwhile, the residual network is introduced to overcome the problem of vanishing gradients caused by the deeper structure of the neural network, hence improving the overall classification accuracy. The experimental results indicate that compared with three excellent models, the proposed SRDCNN model has higher denoising ability and better workload adaptability.
APA, Harvard, Vancouver, ISO, and other styles
19

Jin, Dawei, Ruizhi Kang, Hongjun Zhang, Wening Hao, and Gang Chen. "Improving Abstractive Summarization via Dilated Convolution." Journal of Physics: Conference Series 1616 (August 2020): 012078. http://dx.doi.org/10.1088/1742-6596/1616/1/012078.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Fang, Yuchun, Yifan Li, Xiaokang Tu, Taifeng Tan, and Xin Wang. "Face completion with Hybrid Dilated Convolution." Signal Processing: Image Communication 80 (February 2020): 115664. http://dx.doi.org/10.1016/j.image.2019.115664.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Zhao, Haixia, You Zhou, Tingting Bai, and Yuanzhong Chen. "A U-Net Based Multi-Scale Deformable Convolution Network for Seismic Random Noise Suppression." Remote Sensing 15, no. 18 (September 17, 2023): 4569. http://dx.doi.org/10.3390/rs15184569.

Full text
Abstract:
Seismic data processing plays a key role in the field of geophysics. The collected seismic data are inevitably contaminated by various types of noise, which makes the effective signals difficult to discriminate accurately. A fundamental issue is how to improve the signal-to-noise ratio of seismic data. Due to the complex characteristics of noise and signals, it is a challenge for a denoising model to suppress noise and recover weak signals. To suppress random noise in seismic data, we propose a multi-scale deformable convolution neural network denoising model based on U-Net, named MSDC-Unet. The MSDC-Unet mainly contains modules of deformable convolution and dilated convolution. The deformable convolution can change the shape of the convolution kernel to adapt to the shapes of different seismic signal features, while the dilated convolution with different dilation rates is used to extract feature information at different scales. Furthermore, we combine the Charbonnier loss and the structural similarity index measure (SSIM) to better characterize the geological structures of seismic data. Several examples of synthetic and field seismic data demonstrate that, compared with two traditional denoising methods and two deep convolutional neural network denoising models, the proposed method is effective in terms of both quantitative metrics and the visual quality of denoising.
APA, Harvard, Vancouver, ISO, and other styles
22

Hu, Guoping, Fangzheng Zhao, and Bingqi Liu. "Estimation of the Two-Dimensional Direction of Arrival for Low-Elevation and Non-Low-Elevation Targets Based on Dilated Convolutional Networks." Remote Sensing 15, no. 12 (June 14, 2023): 3117. http://dx.doi.org/10.3390/rs15123117.

Full text
Abstract:
This paper addresses the problem of two-dimensional direction-of-arrival (2D DOA) estimation for low-elevation and non-low-elevation targets using L-shaped uniform and sparse arrays by analyzing the features of the signal models and their mapping to the 2D DOA. This paper proposes a 2D DOA estimation algorithm based on a dilated convolutional network model, which consists of two components: a dilated convolutional autoencoder and a dilated convolutional neural network. If there are targets at low elevation, the dilated convolutional autoencoder suppresses the multipath signal and outputs a new signal covariance matrix as the input of the dilated convolutional neural network; in the absence of low-elevation targets, the neural network performs 2D DOA estimation directly. The algorithm employs 3D convolution to fully retain and extract features. The simulation experiments and the analysis of their results revealed that for both L-shaped uniform and L-shaped sparse arrays, the dilated convolutional autoencoder could effectively suppress the multipath signals without affecting the direct wave and non-low-elevation targets, whereas the dilated convolutional neural network could effectively achieve 2D DOA estimation with a matching rate and an effective ratio of pitch and azimuth angles close to 100% without the need for additional parameter matching. Under the condition of a low signal-to-noise ratio, the estimation accuracy of the proposed algorithm was significantly higher than that of traditional DOA estimation methods.
APA, Harvard, Vancouver, ISO, and other styles
23

Ma, Hao, Chao Chen, Qing Zhu, Haitao Yuan, Liming Chen, and Minglei Shu. "An ECG Signal Classification Method Based on Dilated Causal Convolution." Computational and Mathematical Methods in Medicine 2021 (February 2, 2021): 1–10. http://dx.doi.org/10.1155/2021/6627939.

Full text
Abstract:
The incidence of cardiovascular disease is increasing year by year and is showing a trend toward younger patients. At the same time, existing medical resources are strained, so the automatic detection of ECG signals becomes increasingly necessary. This paper proposes an automatic classification of ECG signals based on a dilated causal convolutional neural network. To solve the problem that the recurrent neural network framework cannot be easily accelerated by hardware, the dilated causal convolutional neural network is adopted. To retain the recurrent neural network's properties of equal input and output time steps and of not exposing future information, the network is constructed with fully convolutional networks and causal convolution. To reduce the network depth and prevent gradient explosion or vanishing gradients, the dilation factor is introduced into the model, and residual blocks are introduced into the model according to the shortcut connection idea. The effectiveness of the algorithm is verified on the MIT-BIH Atrial Fibrillation Database (MIT-BIH AFDB). In the experiments on the MIT-BIH AFDB database, the classification accuracy rate is 98.65%.
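A minimal PyTorch sketch of the ingredients named in the abstract, causal dilated 1D convolution plus a shortcut connection, is given below; the channel width, kernel size, and dilation schedule are assumptions for illustration, not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class CausalDilatedBlock(nn.Module):
        """Residual block of dilated causal 1D convolutions: left-padding
        keeps the output length equal to the input length and prevents any
        output sample from seeing future ECG samples."""
        def __init__(self, channels, kernel_size=3, dilation=1):
            super().__init__()
            self.pad = (kernel_size - 1) * dilation      # pad the past only
            self.conv1 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
            self.conv2 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
            self.act = nn.ReLU()

        def causal(self, conv, x):
            return conv(nn.functional.pad(x, (self.pad, 0)))

        def forward(self, x):
            y = self.act(self.causal(self.conv1, x))
            y = self.causal(self.conv2, y)
            return self.act(y + x)                       # shortcut connection

    # Stacking blocks with dilations 1, 2, 4, 8 widens the receptive field
    # without making the network excessively deep.
    net = nn.Sequential(*[CausalDilatedBlock(32, dilation=2 ** i) for i in range(4)])
    ecg = torch.randn(1, 32, 1000)
    print(net(ecg).shape)                                # torch.Size([1, 32, 1000])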
APA, Harvard, Vancouver, ISO, and other styles
24

Lin, Yingjie, and Jianning Wu. "A Novel Multichannel Dilated Convolution Neural Network for Human Activity Recognition." Mathematical Problems in Engineering 2020 (July 11, 2020): 1–10. http://dx.doi.org/10.1155/2020/5426532.

Full text
Abstract:
A novel multichannel dilated convolution neural network for improving the accuracy of human activity recognition is proposed. The proposed model utilizes a multichannel convolution structure with multiple kernels of various sizes to extract multiscale features from high-dimensional human activity data during the convolution operation, and it replaces the pooling layers used in traditional convolution with dilated convolution. Its advantage is that the dilated convolution can capture intrinsic sequence information by expanding the receptive field of the convolution kernel without increasing the parameter amount of the model, and the multichannel structure can then be employed to extract multiscale gait features by forming multiple convolution paths. An open human activity recognition dataset is used to evaluate the effectiveness of our proposed model. The experimental results showed that our model achieves an accuracy of 95.49%, with the time to identify a single sample being approximately 0.34 ms on a low-end machine. These results demonstrate that our model is an efficient real-time HAR model, which can extract representative features from sensor signals at low computational cost and is a promising tool for practical applications.
APA, Harvard, Vancouver, ISO, and other styles
25

Rahman, Takowa, Md Saiful Islam, and Jia Uddin. "MRI-Based Brain Tumor Classification Using a Dilated Parallel Deep Convolutional Neural Network." Digital 4, no. 3 (June 28, 2024): 529–54. http://dx.doi.org/10.3390/digital4030027.

Full text
Abstract:
Brain tumors are frequently classified with high accuracy using convolutional neural networks (CNNs) to better comprehend the spatial connections among pixels in complex pictures. Due to their tiny receptive fields, the majority of deep convolutional neural network (DCNN)-based techniques overfit and are unable to extract global context information from more significant regions. While dilated convolution retains data resolution at the output layer and increases the receptive field without adding computation, stacking several dilated convolutions has the drawback of producing a grid effect. This research suggests a dilated parallel deep convolutional neural network (PDCNN) architecture that preserves a wide receptive field in order to handle gridding artifacts and extract both coarse and fine features from the images. This article applies multiple preprocessing strategies to the input MRI images used to train the model. By contrasting various dilation rates, the global path uses a low dilation rate (2,1,1), while the local path uses a high dilation rate (4,2,1) for decremental even numbers to tackle gridding artifacts and to extract both coarse and fine features from the two parallel paths. Using three different types of MRI datasets, the suggested dilated PDCNN with the average ensemble method performs best. The accuracy achieved for the multiclass Kaggle dataset-III, Figshare dataset-II, and binary tumor identification dataset-I is 98.35%, 98.13%, and 98.67%, respectively. In comparison to state-of-the-art techniques, the suggested structure improves results by extracting both fine and coarse features, making it efficient.
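The gridding effect mentioned above can be made tangible with a small, self-contained Python check (my own illustration, not the paper's code) that lists which 1D input offsets reach a single output unit through a stack of 3-tap dilated convolutions.

    def touched_inputs(dilations, kernel_size=3):
        """Offsets of the 1D input samples that influence one output unit
        after a stack of dilated convolutions with the given dilation rates."""
        offsets = {0}
        for d in reversed(dilations):
            offsets = {o + d * k for o in offsets
                       for k in range(-(kernel_size // 2), kernel_size // 2 + 1)}
        return sorted(offsets)

    # A constant dilation rate leaves regularly spaced holes (gridding):
    print(touched_inputs([2, 2, 2]))   # only even offsets are ever sampled
    # Mixed rates, as in the parallel paths (4, 2, 1) and (2, 1, 1) described
    # above, cover the receptive field without gaps:
    print(touched_inputs([4, 2, 1]))   # contiguous offsets from -7 to 7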
APA, Harvard, Vancouver, ISO, and other styles
26

Khotimah, Wijayanti Nurul, Farid Boussaid, Ferdous Sohel, Lian Xu, David Edwards, Xiu Jin, and Mohammed Bennamoun. "SC-CAN: Spectral Convolution and Channel Attention Network for Wheat Stress Classification." Remote Sensing 14, no. 17 (August 30, 2022): 4288. http://dx.doi.org/10.3390/rs14174288.

Full text
Abstract:
Biotic and abiotic plant stress (e.g., frost, fungi, diseases) can significantly impact crop production. It is thus essential to detect such stress at an early stage before visual symptoms and damage become apparent. To this end, this paper proposes a novel deep learning method, called Spectral Convolution and Channel Attention Network (SC-CAN), which exploits the difference in spectral responses of healthy and stressed crops. The proposed SC-CAN method comprises two main modules: (i) a spectral convolution module, which consists of dilated causal convolutional layers stacked in a residual manner to capture the spectral features; (ii) a channel attention module, which consists of a global pooling layer and fully connected layers that compute inter-relationship between feature map channels before scaling them based on their importance level (attention score). Unlike standard convolution, which focuses on learning local features, the dilated convolution layers can learn both local and global features. These layers also have long receptive fields, making them suitable for capturing long dependency patterns in hyperspectral data. However, because not all feature maps produced by the dilated convolutional layers are important, we propose a channel attention module that weights the feature maps according to their importance level. We used SC-CAN to classify salt stress (i.e., abiotic stress) on four datasets (Chinese Spring (CS), Aegilops columnaris (co(CS)), Ae. speltoides auchery (sp(CS)), and Kharchia datasets) and Fusarium head blight disease (i.e., biotic stress) on Fusarium dataset. Reported experimental results show that the proposed method outperforms existing state-of-the-art techniques with an overall accuracy of 83.08%, 88.90%, 82.44%, 82.10%, and 82.78% on CS, co(CS), sp(CS), Kharchia, and Fusarium datasets, respectively.
APA, Harvard, Vancouver, ISO, and other styles
27

Ni, Jian, Rui Wang, and Jing Tang. "ADSSD: Improved Single-Shot Detector with Attention Mechanism and Dilated Convolution." Applied Sciences 13, no. 6 (March 22, 2023): 4038. http://dx.doi.org/10.3390/app13064038.

Full text
Abstract:
The detection of small objects is easily affected by background information, and a lack of context information makes detection difficult. Therefore, small object detection has become an extremely challenging task. To address the above problems, we propose a Single-Shot MultiBox Detector with an attention mechanism and dilated convolution (ADSSD). In the attention module, we strengthen the connection between information in space and channels while using cross-layer connections to accelerate training. In the multi-branch dilated convolution module, we combine three dilated convolutions with different dilation rates to obtain multi-scale context information and use hierarchical feature fusion to reduce the gridding effect. The results show that on the PASCAL VOC2007 and VOC2012 datasets, our 300 × 300 input ADSSD model reaches 78.4% mAP and 76.1% mAP, respectively. These results outperform those of SSD and other advanced detectors; the detection of some small objects is significantly improved. Moreover, the performance of the ADSSD in object detection affected by factors such as dense occlusion is better than that of the traditional SSD.
APA, Harvard, Vancouver, ISO, and other styles
28

Xu, Jiawei, Jie Wu, Yu Lei, and Yuxiang Gu. "Application of Pseudo-Three-Dimensional Residual Network to Classify the Stages of Moyamoya Disease." Brain Sciences 13, no. 5 (April 29, 2023): 742. http://dx.doi.org/10.3390/brainsci13050742.

Full text
Abstract:
It is essential to assess the condition of moyamoya disease (MMD) patients accurately and promptly to prevent MMD from endangering their lives. A Pseudo-Three-Dimensional Residual Network (P3D ResNet) was proposed to process spatial and temporal information and was applied to the identification of MMD stages. Digital Subtraction Angiography (DSA) sequences were split into mild, moderate, and severe stages in accordance with the progression of MMD, and divided into a training set, a validation set, and a test set at a ratio of 6:2:2 after data augmentation. The features of the DSA images were processed using decoupled three-dimensional (3D) convolution. To increase the receptive field and preserve the features of the vessels, decoupled 3D dilated convolutions, equivalent to a two-dimensional dilated convolution plus a one-dimensional dilated convolution, were utilized in the spatial and temporal domains, respectively. They were then coupled in serial, parallel, and serial–parallel modes to form P3D modules based on the structure of the residual unit. The three kinds of modules were placed in a proper sequence to create the complete P3D ResNet. The experimental results demonstrate that the accuracy of P3D ResNet can reach 95.78% with an appropriate number of parameters, making it easy to implement in a clinical setting.
APA, Harvard, Vancouver, ISO, and other styles
29

Wang, Yanjie, Shiyu Hu, Guodong Wang, Chenglizhao Chen, and Zhenkuan Pan. "Multi-scale dilated convolution of convolutional neural network for crowd counting." Multimedia Tools and Applications 79, no. 1-2 (October 17, 2019): 1057–73. http://dx.doi.org/10.1007/s11042-019-08208-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Wang, Yanjie, Guodong Wang, Chenglizhao Chen, and Zhenkuan Pan. "Multi-scale dilated convolution of convolutional neural network for image denoising." Multimedia Tools and Applications 78, no. 14 (February 23, 2019): 19945–60. http://dx.doi.org/10.1007/s11042-019-7377-y.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Deng, Feiyue, Yan Bi, Yongqiang Liu, and Shaopu Yang. "Deep-Learning-Based Remaining Useful Life Prediction Based on a Multi-Scale Dilated Convolution Network." Mathematics 9, no. 23 (November 26, 2021): 3035. http://dx.doi.org/10.3390/math9233035.

Full text
Abstract:
Remaining useful life (RUL) prediction of key components is an important influencing factor in making accurate maintenance decisions for mechanical systems. With the rapid development of deep learning (DL) techniques, research on RUL prediction based on data-driven models is increasingly widespread. Compared with conventional convolution neural networks (CNNs), multi-scale CNNs can extract feature information at different scales, which yields better performance in RUL prediction. However, the existing multi-scale CNNs employ multiple convolution kernels with different sizes to construct the network framework. There are two main shortcomings of this approach: (1) the convolution operation based on convolution kernels of multiple sizes requires enormous computation and has low operational efficiency, which severely restricts its application in practical engineering; (2) a convolutional layer with a large convolution kernel needs a large number of weight parameters, leading to a dramatic increase in the network training time and making it prone to overfitting in the case of small datasets. To address the above issues, a multi-scale dilated convolution network (MsDCN) is proposed for RUL prediction in this article. The MsDCN adopts a new multi-scale dilation convolution fusion unit (MsDCFU), in which the multi-scale network framework is composed of convolution operations with different dilation factors. This effectively expands the range of the receptive field (RF) of the convolution kernel without an additional computational burden. Moreover, the MsDCFU employs the depthwise separable convolution (DSC) to further improve the operational efficiency of the prognostics model. Finally, the proposed method was validated with the accelerated degradation test data of rolling element bearings (REBs). The experimental results demonstrate that the proposed MsDCN has a higher RUL prediction accuracy compared to some typical CNNs and better operational efficiency than the existing multi-scale CNNs based on different convolution kernel sizes.
APA, Harvard, Vancouver, ISO, and other styles
32

Ma, Tian, Xinlei Zhou, Jiayi Yang, Boyang Meng, Jiali Qian, Jiehui Zhang, and Gang Ge. "Dental Lesion Segmentation Using an Improved ICNet Network with Attention." Micromachines 13, no. 11 (November 7, 2022): 1920. http://dx.doi.org/10.3390/mi13111920.

Full text
Abstract:
Precise segmentation of tooth lesions is critical to the creation of an intelligent tooth lesion detection system. To address the problem that tooth lesions are similar to normal tooth tissues and difficult to segment, an improved segmentation method based on the image cascade network (ICNet) is proposed to segment various lesion types, such as calculus, gingivitis, and tartar. First, the ICNet network model is used to achieve real-time segmentation of lesions. Second, the Convolutional Block Attention Module (CBAM) is integrated into the ICNet network structure, and large-size convolutions in the spatial attention module are replaced with layered dilated convolutions to enhance the relevant features while suppressing useless features and to solve the problem of inaccurate lesion segmentation. Finally, part of the convolutions in the network model are replaced with asymmetric convolutions to reduce the computation added by the attention module. Experimental results show that compared with Fully Convolutional Networks (FCN), U-Net, SegNet, and other segmentation algorithms, our method significantly improves the segmentation results and processes images at a higher rate, which satisfies the real-time requirements of tooth lesion segmentation.
APA, Harvard, Vancouver, ISO, and other styles
33

Tran, Song-Toan, Thanh-Tuan Nguyen, Minh-Hai Le, Ching-Hwa Cheng, and Don-Gey Liu. "TDC-Unet: Triple Unet with Dilated Convolution for Medical Image Segmentation." International Journal of Pharma Medicine and Biological Sciences 11, no. 1 (January 2022): 1–7. http://dx.doi.org/10.18178/ijpmbs.11.1.1-7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
34

Park, Sangun, and Dong Eui Chang. "Multipath Lightweight Deep Network Using Randomly Selected Dilated Convolution." Sensors 21, no. 23 (November 26, 2021): 7862. http://dx.doi.org/10.3390/s21237862.

Full text
Abstract:
Robot vision is an essential research field that enables machines to perform various tasks by classifying/detecting/segmenting objects as humans do. The classification accuracy of machine learning algorithms already exceeds that of a well-trained human, and the results are rather saturated. Hence, in recent years, many studies have been conducted in the direction of reducing the weight of the model and applying it to mobile devices. For this purpose, we propose a multipath lightweight deep network using randomly selected dilated convolutions. The proposed network consists of two sets of multipath networks (minimum 2, maximum 8), where the output feature maps of one path are concatenated with the input feature maps of the other path so that the features are reusable and abundant. We also replace the 3×3 standard convolution of each path with a randomly selected dilated convolution, which has the effect of increasing the receptive field. The proposed network lowers the number of floating point operations (FLOPs) and parameters by more than 50% and the classification error by 0.8% as compared to the state-of-the-art. We show that the proposed network is efficient.
APA, Harvard, Vancouver, ISO, and other styles
35

Rhee, Jung-Soo. "CERTAIN RADIALLY DILATED CONVOLUTION AND ITS APPLICATION." Honam Mathematical Journal 32, no. 1 (March 25, 2010): 101–12. http://dx.doi.org/10.5831/hmj.2010.32.1.101.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Orhei, Ciprian, Victor Bogdan, Cosmin Bonchis, and Radu Vasiu. "Dilated Filters for Edge-Detection Algorithms." Applied Sciences 11, no. 22 (November 13, 2021): 10716. http://dx.doi.org/10.3390/app112210716.

Full text
Abstract:
Edges are a basic and fundamental feature in image processing that is used directly or indirectly in a huge number of applications. Inspired by the expansion of image resolution and processing power, dilated-convolution techniques have appeared. Dilated convolutions have achieved impressive results in machine learning, so we naturally discuss the idea of dilating the standard filters from several edge-detection algorithms. In this work, we investigated the research hypothesis that using dilated filters, rather than extended or classical ones, yields better edge maps. To test this hypothesis, we compared the results of the edge-detection algorithms using the proposed dilated filters with the original filters or custom variants. Experimental results confirm our statement that the dilation of filters has a positive impact on edge-detection algorithms, from simple to rather complex ones.
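The filter-dilation idea itself is simple to state: insert zeros between the taps of a classical kernel. A short NumPy sketch (an illustration of the general technique, not the authors' implementation) applied to a Sobel kernel:

    import numpy as np

    def dilate_filter(kernel, rate):
        """Insert (rate - 1) zeros between kernel taps, turning a k x k filter
        into an equivalent dilated filter of size (k - 1) * rate + 1."""
        k = kernel.shape[0]
        size = (k - 1) * rate + 1
        out = np.zeros((size, size), dtype=kernel.dtype)
        out[::rate, ::rate] = kernel
        return out

    sobel_x = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]])
    print(dilate_filter(sobel_x, 2))   # 5x5 dilated Sobel: same nine taps, wider span

The dilated kernel can then be applied with any standard 2D convolution routine in place of the original filter.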
APA, Harvard, Vancouver, ISO, and other styles
37

Heo, Woon-Haeng, Hyemi Kim, and Oh-Wook Kwon. "Integrating Dilated Convolution into DenseLSTM for Audio Source Separation." Applied Sciences 11, no. 2 (January 15, 2021): 789. http://dx.doi.org/10.3390/app11020789.

Full text
Abstract:
Herein, we propose a multi-scale multi-band dilated time-frequency densely connected convolutional network (DenseNet) with long short-term memory (LSTM) for audio source separation. Because the spectrogram of an acoustic signal can be thought of as an image as well as time series data, it is suitable for a convolutional recurrent neural network (CRNN) architecture. We improved the audio source separation performance by applying a dilated block with dilated convolution to the CRNN architecture. The dilated block has the role of effectively increasing the receptive field in the spectrogram. In addition, it was designed to account for the acoustic characteristic that the frequency axis and the time axis of the spectrogram are affected by independent factors such as pitch and speech rate. In speech enhancement experiments, we estimated the speech signal using various deep learning architectures from a signal in which music, noise, and speech were mixed. We conducted a subjective evaluation of the estimated speech signal. In addition, speech quality, intelligibility, separation, and speech recognition performance were also measured. In music signal separation, we estimated the music signal using several deep learning architectures from the mixture of the music and speech signal. After that, the separation performance and music identification accuracy were measured using the estimated music signal. Overall, the proposed architecture shows the best performance compared to other deep learning architectures not only in speech experiments but also in music experiments.
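The idea of dilating the frequency and time axes independently maps directly onto a 2D convolution with a per-axis dilation tuple; the following PyTorch sketch (rates and sizes are illustrative assumptions, not the published architecture) treats a spectrogram as a (channels, frequency, time) tensor.

    import torch
    import torch.nn as nn

    # Independent dilation rates for the frequency and time axes, since pitch
    # and speech rate vary independently.
    freq_dilated = nn.Conv2d(16, 16, kernel_size=3, dilation=(4, 1), padding=(4, 1))
    time_dilated = nn.Conv2d(16, 16, kernel_size=3, dilation=(1, 4), padding=(1, 4))

    spec = torch.randn(1, 16, 128, 200)   # batch, channels, frequency bins, frames
    print(freq_dilated(spec).shape)       # torch.Size([1, 16, 128, 200])
    print(time_dilated(spec).shape)       # torch.Size([1, 16, 128, 200])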
APA, Harvard, Vancouver, ISO, and other styles
38

Bian, Shengqin, Xinyu He, Zhengguang Xu, and Lixin Zhang. "Hybrid Dilated Convolution with Attention Mechanisms for Image Denoising." Electronics 12, no. 18 (September 6, 2023): 3770. http://dx.doi.org/10.3390/electronics12183770.

Full text
Abstract:
In the field of image denoising, convolutional neural networks (CNNs) have become increasingly popular due to their ability to learn effective feature representations from large amounts of data, and they are widely used to improve denoising performance. However, increasing network depth can weaken the influence of shallow layers on deep layers, especially for complex denoising tasks such as real denoising and blind denoising, where conventional networks fail to achieve high-quality results. To address this issue, this paper proposes a hybrid dilated convolution-based denoising network (AMDNet) that incorporates attention mechanisms. Specifically, AMDNet consists of four modules: the sparse module (SM), the feature fusion module (FFM), the attention guidance module (AGM), and the image residual module (IRM). The SM employs hybrid dilated convolution to extract local features, while the FFM is used to integrate global and local features. The AGM accurately extracts noise information hidden in complex backgrounds. Finally, the IRM reconstructs images in a residual manner to obtain high-quality results after denoising. AMDNet has the following features: (1) the sparse mechanism in hybrid dilated convolution enables better extraction of local features, enhancing the network’s ability to capture noise information; (2) the feature fusion module, through long-range connections, fully integrates global and local features, improving the performance of the model; (3) the attention module is ingeniously designed to precisely extract features in complex backgrounds. The experimental results demonstrate that AMDNet achieves outstanding performance on three tasks (Gaussian noise, real noise, and blind denoising).
APA, Harvard, Vancouver, ISO, and other styles
39

Yang, Hongbo, and Shi Qiu. "A Novel Dynamic Contextual Feature Fusion Model for Small Object Detection in Satellite Remote-Sensing Images." Information 15, no. 4 (April 18, 2024): 230. http://dx.doi.org/10.3390/info15040230.

Full text
Abstract:
Ground objects in satellite images pose unique challenges due to their low resolution, small pixel size, lack of texture features, and dense distribution. Detecting small objects in satellite remote-sensing images is a difficult task. We propose a new detector focusing on contextual information and multi-scale feature fusion. Inspired by the notion that surrounding context information can aid in identifying small objects, we propose a lightweight context convolution block based on dilated convolutions and integrate it into the convolutional neural network (CNN). We integrate dynamic convolution blocks during the feature fusion step to enhance the high-level feature upsampling. An attention mechanism is employed to focus on the salient features of objects. We have conducted a series of experiments to validate the effectiveness of our proposed model. Notably, the proposed model achieved a 3.5% mean average precision (mAP) improvement on the satellite object detection dataset. Another feature of our approach is its lightweight design. We employ group convolution to reduce the computational cost in the proposed contextual convolution module. Compared to the baseline model, our method reduces the number of parameters by 30% and the computational cost by 34%, while maintaining an FPS rate close to that of the baseline model. We also validate the detection results through a series of visualizations.
APA, Harvard, Vancouver, ISO, and other styles
40

Zhou, Bo, and Omer Saeed. "Comparative Analysis of Volleyball Serve Action Based on Human Posture Estimation." Mobile Information Systems 2022 (September 30, 2022): 1–11. http://dx.doi.org/10.1155/2022/4817463.

Full text
Abstract:
Serving is one of the most crucial techniques in volleyball: it does not require team interaction and is difficult for the opponent to interfere with immediately. This work proposes a feature migration module with a fixed offset, which can be regarded as a cross-channel approximation of dilated convolution; the article discusses why this cross-channel dilated convolution, despite using few parameters, performs no worse than standard dilated convolution. To address the high memory consumption of human pose estimation systems that use a random forest as the classifier, an improved random forest model is put forward. This model introduces a Poisson process and combines it with the depth data to build a filter before Bootstrap sampling. To optimize and reconstruct the training dataset, a portion of the feature sample points that do not contribute positively to subsequent classification is removed from the original training dataset, so that the reconstructed dataset is better suited to the random forest's repeated sampling and the unrepresentative samples produced by resampling have less impact. Experiments demonstrate the effectiveness of the optimized model, which significantly lowers the system's time and space complexity and increases its applicability.
APA, Harvard, Vancouver, ISO, and other styles
41

Pentapati, Hema Kumar, and Sridevi K. "A Systematic Approach of Advanced Dilated Convolution Network for Speaker Identification." International Journal of Electrical and Electronics Research 11, no. 1 (March 30, 2023): 25–30. http://dx.doi.org/10.37391/ijeer.110104.

Full text
Abstract:
Over the years, the speaker recognition field has faced various challenges in identifying speakers accurately, and the advent of deep learning algorithms has had a remarkable impact on speaker recognition approaches. This paper introduces a simple, novel architecture for an advanced dilated convolution network: the well-structured log-Mel spectrum is fed into the proposed dilated convolutional neural network, and the number of layers is reduced to 11. The network uses global average pooling to aggregate the outputs from all layers into the feature vector representation used for classification. Only 13 coefficients are extracted per frame of each speech sample. This dilated convolutional neural network achieves an accuracy of 90.97%, an equal error rate (EER) of 3.75%, and a training time of 207 seconds, outperforming existing systems on the LibriSpeech corpus.
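The sketch below illustrates the general recipe in this abstract: dilated 1-D convolutions over 13 log-Mel coefficients per frame, followed by global average pooling and a linear classifier. It is an assumed toy configuration, not the published 11-layer network.

```python
# Toy dilated-convolution speaker classifier over log-Mel features with
# global average pooling. Channel widths, dilation schedule, and the number
# of speakers are assumptions.
import torch
import torch.nn as nn

class DilatedSpeakerNet(nn.Module):
    def __init__(self, n_coeffs: int = 13, n_speakers: int = 40):
        super().__init__()
        chans, layers, in_ch = 64, [], n_coeffs
        for d in (1, 2, 4, 8):               # growing dilation along the time axis
            layers += [nn.Conv1d(in_ch, chans, kernel_size=3,
                                 padding=d, dilation=d),
                       nn.ReLU(inplace=True)]
            in_ch = chans
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool1d(1)  # global average pooling over time
        self.classifier = nn.Linear(chans, n_speakers)

    def forward(self, x):                    # x: (batch, n_coeffs, n_frames)
        h = self.pool(self.features(x)).squeeze(-1)
        return self.classifier(h)

if __name__ == "__main__":
    logits = DilatedSpeakerNet()(torch.randn(2, 13, 300))
    print(logits.shape)                      # torch.Size([2, 40])
```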
APA, Harvard, Vancouver, ISO, and other styles
42

Ji, Changpeng, Haofeng Yu, and Wei Dai. "Network Traffic Anomaly Detection Based on Spatiotemporal Feature Extraction and Channel Attention." Processes 12, no. 7 (July 7, 2024): 1418. http://dx.doi.org/10.3390/pr12071418.

Full text
Abstract:
To overcome the challenges of feature selection in traditional machine learning and enhance the accuracy of deep learning methods for anomaly traffic detection, we propose a novel method called DCGCANet. This model integrates dilated convolution, a GRU, and a Channel Attention Network, effectively combining dilated convolutional structures with GRUs to extract both temporal and spatial features for identifying anomalous patterns in network traffic. The one-dimensional dilated convolution (DC-1D) structure is designed to expand the receptive field, allowing for comprehensive traffic feature extraction while minimizing information loss typically caused by pooling operations. The DC structure captures spatial dependencies in the data, while the GRU processes time series data to capture dynamic traffic changes. Furthermore, the channel attention (CA) module assigns importance-based weights to features in different channels, enhancing the model’s representational capacity and improving its ability to detect abnormal traffic. DCGCANet achieved an accuracy rate of 99.6% on the CIC-IDS-2017 dataset, outperforming other algorithms. Additionally, the model attained precision, recall, and F1 score rates of 99%. The generalization capability of DCGCANet was validated on a subset of CIC-IDS-2017, demonstrating superior detection performance and robust generalization potential.
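The following is a hypothetical sketch of the "dilated 1-D convolution → GRU → channel attention" pipeline described above; the feature count, channel widths, and the squeeze-and-excitation form of the attention are assumptions, and it is not the published DCGCANet.

```python
# Sketch of a dilated Conv1d -> channel attention -> GRU pipeline for
# sequence classification (e.g., per-flow anomaly detection). All sizes assumed.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style weighting of feature channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                    # x: (batch, channels, time)
        w = self.fc(x.mean(dim=2))           # global average over time
        return x * w.unsqueeze(-1)

class DilatedConvGRU(nn.Module):
    def __init__(self, in_feats: int = 32, channels: int = 64, n_classes: int = 2):
        super().__init__()
        self.dc = nn.Sequential(
            nn.Conv1d(in_feats, channels, 3, padding=1, dilation=1), nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, 3, padding=2, dilation=2), nn.ReLU(inplace=True))
        self.ca = ChannelAttention(channels)
        self.gru = nn.GRU(channels, channels, batch_first=True)
        self.head = nn.Linear(channels, n_classes)

    def forward(self, x):                    # x: (batch, in_feats, time)
        h = self.ca(self.dc(x)).transpose(1, 2)   # -> (batch, time, channels)
        out, _ = self.gru(h)
        return self.head(out[:, -1])         # classify from the last time step

if __name__ == "__main__":
    print(DilatedConvGRU()(torch.randn(4, 32, 20)).shape)  # torch.Size([4, 2])
```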
APA, Harvard, Vancouver, ISO, and other styles
43

You, Jiangchuan, and Zhenhong Shang. "Solar Filament Detection Based on Improved DeepLab V3+." Publications of the Astronomical Society of the Pacific 134, no. 1036 (June 1, 2022): 064501. http://dx.doi.org/10.1088/1538-3873/ac6e07.

Full text
Abstract:
A novel solar filament detection method based on an improved DeepLab V3+ is proposed to address the low detection accuracy of small solar filaments in Hα full-disk solar images. First, the Xception structure of the backbone network is fine-tuned, and the low-level feature information of the filaments is added to the decoder module of the network to improve the utilization of the solar filament features. Second, the receptive field of dilated convolution is expanded, and the information utilization rate is increased via cascaded dilated convolution to improve the detection accuracy of the small solar filaments. In the decoder module, two depthwise separable convolutions are used instead of ordinary convolutions to reduce incomplete detections. Finally, a dense conditional random field is added to optimize the edges of the detection results. Experiments on a public data set comprising full-disk Hα images show that, compared with the original DeepLab V3+ algorithm, the proposed method improves the mean pixel accuracy, mean intersection over union, and F1-score by 1.86%, 1.95%, and 2.18%, respectively, which also demonstrates its superiority over other existing solar filament detection algorithms.
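A quick way to see why cascaded dilated convolutions help, as this abstract argues, is to compute the receptive field of a stride-1 cascade using the standard formula RF = 1 + sum_i d_i * (k_i - 1); the helper below does exactly that, and the dilation schedule in the example is an assumption, not the paper's configuration.

```python
# Back-of-the-envelope receptive field of a cascade of stride-1 dilated
# convolutions: cascading enlarges context without shrinking the feature map.
def cascaded_receptive_field(layers):
    """layers: iterable of (kernel_size, dilation) pairs, all with stride 1."""
    rf = 1
    for k, d in layers:
        rf += d * (k - 1)
    return rf

if __name__ == "__main__":
    # Three 3x3 convolutions with dilations 1, 2, 4 (an assumed schedule):
    print(cascaded_receptive_field([(3, 1), (3, 2), (3, 4)]))  # 15
    # The same three layers without dilation only reach 7:
    print(cascaded_receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
```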
APA, Harvard, Vancouver, ISO, and other styles
44

Guan, Xin, Yushan Zhao, Charles Okanda Nyatega, and Qiang Li. "Brain Tumor Segmentation Network with Multi-View Ensemble Discrimination and Kernel-Sharing Dilated Convolution." Brain Sciences 13, no. 4 (April 11, 2023): 650. http://dx.doi.org/10.3390/brainsci13040650.

Full text
Abstract:
Accurate segmentation of brain tumors from 3D magnetic resonance images (MRI) is critical for clinical decisions and surgical planning. Radiologists usually separate and analyze brain tumors by combining images of axial, coronal, and sagittal views. However, traditional convolutional neural network (CNN) models tend to use information from only a single view, or process the views one by one. Moreover, existing models adopt a multi-branch structure with different-size convolution kernels in parallel to adapt to various tumor sizes. However, the difference in the convolution kernels' parameters cannot precisely characterize the feature similarity of tumor lesion regions with various sizes, connectivity, and convexity. To address the above problems, we propose a hierarchical multi-view convolution method that decouples the standard 3D convolution into axial, coronal, and sagittal views to provide complementary-view features. Then, every pixel is classified by ensembling the discriminant results from the three views. Moreover, we propose a multi-branch kernel-sharing mechanism with different dilation rates to obtain parameter-consistent convolution kernels with different receptive fields. We use the BraTS2018 and BraTS2020 datasets for comparison experiments. The average Dice coefficients of the proposed network on the BraTS2020 dataset reach 78.16%, 89.52%, and 83.05% for the enhancing tumor (ET), whole tumor (WT), and tumor core (TC), respectively, while the number of parameters is only 0.5 M. Compared with the baseline network for brain tumor segmentation, accuracy is improved by 1.74%, 0.5%, and 2.19%, respectively.
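Kernel sharing across dilation rates can be sketched by applying one weight tensor at several dilations and fusing the responses, which keeps the branches parameter-consistent while giving them different receptive fields. The toy below uses 2D convolutions for brevity (the paper works on 3D MRI volumes), and the rates, channel counts, and fusion layer are assumptions.

```python
# Minimal kernel-sharing dilated convolution: the same 3x3 weight tensor is
# applied at several dilation rates and the branch outputs are fused by 1x1 conv.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelSharingDilatedConv(nn.Module):
    def __init__(self, in_ch: int = 32, out_ch: int = 32, rates=(1, 2, 3)):
        super().__init__()
        self.rates = rates
        self.weight = nn.Parameter(torch.empty(out_ch, in_ch, 3, 3))
        nn.init.kaiming_normal_(self.weight)
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        branches = [F.conv2d(x, self.weight, padding=r, dilation=r)
                    for r in self.rates]   # one shared kernel, many fields of view
        return self.fuse(torch.cat(branches, dim=1))

if __name__ == "__main__":
    y = KernelSharingDilatedConv()(torch.randn(1, 32, 48, 48))
    print(y.shape)                          # torch.Size([1, 32, 48, 48])
```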
APA, Harvard, Vancouver, ISO, and other styles
45

Ko, Tae-young, and Seung-ho Lee. "Novel Method of Semantic Segmentation Applicable to Augmented Reality." Sensors 20, no. 6 (March 20, 2020): 1737. http://dx.doi.org/10.3390/s20061737.

Full text
Abstract:
This paper proposes a novel method of semantic segmentation, consisting of a modified dilated residual network, an atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images while maintaining spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are fused into one channel using a 1 × 1 convolution. Backpropagation compares the semantic segmentation obtained through convolution from the final feature map with the ground truth provided by a database; losses can be reduced by applying backpropagation to the modified dilated residual network to update the weights. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases, achieving accuracies of 82.8% and 89.8% mean intersection over union (mIoU) and frame rates of 61 and 64.3 frames per second (fps), respectively. These results demonstrate the applicability of the proposed method for implementing natural AR applications at practical speeds, since the frame rate exceeds 60 fps.
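An atrous (dilated) pyramid pooling module of the general shape described above can be sketched as parallel convolutions with different dilation rates whose outputs are concatenated and fused by a 1 × 1 convolution; the rates and channel counts below are assumptions, not the paper's configuration.

```python
# Sketch of an atrous pyramid pooling module: parallel dilated branches,
# concatenation, and a 1x1 projection back to the working channel width.
import torch
import torch.nn as nn

class AtrousPyramidPooling(nn.Module):
    def __init__(self, in_ch: int = 256, out_ch: int = 256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3 if r > 1 else 1,
                      padding=r if r > 1 else 0, dilation=r, bias=False)
            for r in rates])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

if __name__ == "__main__":
    print(AtrousPyramidPooling()(torch.randn(1, 256, 32, 32)).shape)
    # torch.Size([1, 256, 32, 32])
```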
APA, Harvard, Vancouver, ISO, and other styles
46

Wang, Ji, Peiquan Xu, Leijun Li, and Feng Zhang. "DAssd-Net: A Lightweight Steel Surface Defect Detection Model Based on Multi-Branch Dilated Convolution Aggregation and Multi-Domain Perception Detection Head." Sensors 23, no. 12 (June 10, 2023): 5488. http://dx.doi.org/10.3390/s23125488.

Full text
Abstract:
During steel production, various defects often appear on the surface of the steel, such as cracks, pores, scars, and inclusions. These defects may seriously degrade steel quality or performance, so detecting them in a timely and accurate manner is of great technical significance. This paper proposes DAssd-Net, a lightweight model for steel surface defect detection based on multi-branch dilated convolution aggregation and a multi-domain perception detection head. First, a multi-branch Dilated Convolution Aggregation Module (DCAM) is proposed as a feature learning structure for the feature augmentation networks. Second, to better capture spatial (location) information and to suppress channel redundancy, we propose a Dilated Convolution and Channel Attention Fusion Module (DCM) and a Dilated Convolution and Spatial Attention Fusion Module (DSM) as feature enhancement modules for the regression and classification tasks in the detection head. Third, experiments and heat-map visualization show that DAssd-Net enlarges the model's receptive field while attending to target spatial locations and suppressing redundant channel features. DAssd-Net achieves 81.97% mAP on the NEU-DET dataset with a model size of only 18.7 MB. Compared with the latest YOLOv8 model, mAP is increased by 4.69% and the model size is reduced by 23.9 MB, which gives it the advantage of being lightweight.
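As a loose illustration (not DAssd-Net itself), the sketch below aggregates several dilated convolution branches and gates the result with a simple spatial attention map; the branch rates, channel widths, and the particular attention form are assumptions.

```python
# Multi-branch dilated convolution aggregation with a simple spatial
# attention gate and a residual connection. Illustrative only.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class DilatedAggregation(nn.Module):
    def __init__(self, channels: int = 64, rates=(1, 2, 3, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r, bias=False)
            for r in rates])
        self.fuse = nn.Conv2d(channels * len(rates), channels, kernel_size=1)
        self.attn = SpatialAttention()

    def forward(self, x):
        agg = self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
        return self.attn(agg) + x            # residual aggregation

if __name__ == "__main__":
    print(DilatedAggregation()(torch.randn(1, 64, 56, 56)).shape)
    # torch.Size([1, 64, 56, 56])
```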
APA, Harvard, Vancouver, ISO, and other styles
47

Chen, Jie, Qingshan Xu, Yingchao Guo, and Runfeng Chen. "Aircraft Landing Gear Retraction/Extension System Fault Diagnosis with 1-D Dilated Convolutional Neural Network." Sensors 22, no. 4 (February 10, 2022): 1367. http://dx.doi.org/10.3390/s22041367.

Full text
Abstract:
Faults in the landing gear retraction/extension (R/E) system can degrade an aircraft's maneuvering conditions, so identifying faults in the landing gear R/E system has become a key issue for ensuring take-off and landing safety. In this paper, we aim to solve this problem by proposing the 1-D dilated convolutional neural network (1-DDCNN). To address the limited feature extraction and inaccurate diagnosis of the traditional 1-DCNN with a single feature, the 1-DDCNN selects multiple feature parameters to realize feature integration, and its performance in feature extraction is explored. Importantly, by using padded dilated convolutions to multiply the receptive field of the convolution kernel, the 1-DDCNN can completely retain the feature information in the original signal. Experimental results demonstrate that the proposed method has high accuracy and robustness, which provides a novel idea for feature extraction and fault diagnosis of the landing gear R/E system.
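The length-preserving behaviour that the abstract attributes to padded dilated convolution can be shown with a small toy: with stride 1 and padding d·(k − 1)/2, a 1-D dilated convolution keeps the signal length while widening the receptive field. The channel counts and dilation rates below are assumed, and this is not the 1-DDCNN.

```python
# "Same"-padded 1-D dilated convolutions over several sensor channels:
# the output keeps the original signal length at every layer.
import torch
import torch.nn as nn

def padded_dilated_conv1d(in_ch, out_ch, kernel_size=3, dilation=1):
    # Same padding for odd kernels: length is preserved at stride 1.
    pad = dilation * (kernel_size - 1) // 2
    return nn.Conv1d(in_ch, out_ch, kernel_size, padding=pad, dilation=dilation)

if __name__ == "__main__":
    signals = torch.randn(1, 4, 1024)        # 4 feature parameters, 1024 samples
    net = nn.Sequential(
        padded_dilated_conv1d(4, 16, dilation=1), nn.ReLU(),
        padded_dilated_conv1d(16, 16, dilation=2), nn.ReLU(),
        padded_dilated_conv1d(16, 16, dilation=4), nn.ReLU())
    print(net(signals).shape)                # torch.Size([1, 16, 1024])
```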
APA, Harvard, Vancouver, ISO, and other styles
48

Chen, Guangsheng, Chao Li, Wei Wei, Weipeng Jing, Marcin Woźniak, Tomas Blažauskas, and Robertas Damaševičius. "Fully Convolutional Neural Network with Augmented Atrous Spatial Pyramid Pool and Fully Connected Fusion Path for High Resolution Remote Sensing Image Segmentation." Applied Sciences 9, no. 9 (May 1, 2019): 1816. http://dx.doi.org/10.3390/app9091816.

Full text
Abstract:
Recent developments in Convolutional Neural Networks (CNNs) have allowed for the achievement of solid advances in semantic segmentation of high-resolution remote sensing (HRRS) images. Nevertheless, the problems of poor classification of small objects and unclear boundaries caused by the characteristics of HRRS image data have not been fully considered by previous works. To tackle these challenging problems, we propose an improved semantic segmentation neural network, which adopts dilated convolution, a fully connected (FC) fusion path, and a pre-trained encoder for the semantic segmentation task of HRRS imagery. The network is built on the computationally efficient DeepLabv3 architecture, with added Augmented Atrous Spatial Pyramid Pool and FC Fusion Path layers. Dilated convolution enlarges the receptive field of feature points without decreasing the feature map resolution. The improved neural network architecture enhances HRRS image segmentation, reaching a classification accuracy of 91% and improving the precision of small-object recognition. The applicability of the improved model to the remote sensing image segmentation task is verified.
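The claim that dilated convolution enlarges the receptive field without decreasing the feature map resolution can be checked directly: a stride-2 convolution halves the spatial size, while a stride-1 convolution with dilation 2 keeps it. The channel counts in the shape check below are arbitrary.

```python
# Shape check: strided downsampling vs. stride-1 dilated convolution.
import torch
import torch.nn as nn

x = torch.randn(1, 64, 64, 64)
strided = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)
dilated = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=2, dilation=2)

print(strided(x).shape)   # torch.Size([1, 64, 32, 32])  -- resolution halved
print(dilated(x).shape)   # torch.Size([1, 64, 64, 64])  -- resolution preserved
```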
APA, Harvard, Vancouver, ISO, and other styles
49

Xu, Xuefeng, Tuo Chen, and Huanyi Chen. "Multi-scaled attentive style transfer based on dilated convolution." Applied and Computational Engineering 6, no. 1 (June 14, 2023): 998–1002. http://dx.doi.org/10.54254/2755-2721/6/20230693.

Full text
Abstract:
Image style transfer aims to apply artists' painting styles to various images. Many approaches pursue different goals, but the general tendency is to increase efficiency and enable arbitrary style inputs. The state-of-the-art method is the adaptive convolutional network, which expands the feature-mixing process into a layer-wise adjustment and surpasses previous work by producing results that are more aware of detailed structures. However, the encoding process that guides the entire stylized revision is unaware of multi-scale information. In our work, we design an improved version of the style feature encoding procedure: with the introduction of dilated convolutions at different rates, the output stylized image is better at spatial texture migration and color distribution determination. We also propose a hybrid objective that better measures the spatial dissimilarity between content and style features.
APA, Harvard, Vancouver, ISO, and other styles
50

Zhou, Yuepeng, Huiyou Chang, Yonghe Lu, and Xili Lu. "CDTNet: Improved Image Classification Method Using Standard, Dilated and Transposed Convolutions." Applied Sciences 12, no. 12 (June 12, 2022): 5984. http://dx.doi.org/10.3390/app12125984.

Full text
Abstract:
Convolutional neural networks (CNNs) have achieved great success in image classification tasks. In a convolutional operation, a larger input area can capture more context information. Stacking several convolutional layers enlarges the receptive field, but this increases the number of parameters. Most CNN models use pooling layers to extract important features, but pooling operations cause information loss. Transposed convolution can increase the spatial size of the feature maps to recover the lost low-resolution information. In this study, we use two branches with different dilation rates to obtain features of different sizes; the dilated convolutions capture richer information, and the outputs from the two channels are concatenated as input for the next block. The small feature maps of the top blocks are upsampled with transposed convolutions to increase their spatial size and recover low-resolution prediction maps. We evaluated the model on three image classification benchmark datasets (CIFAR-10, SVHN, and FMNIST) against four state-of-the-art models, namely VGG16, VGG19, ResNeXt, and DenseNet. The experimental results show that CDTNet achieved lower loss, higher accuracy, and faster convergence in both the training and test stages. The average test accuracy of CDTNet increased by at most 54.81% (on SVHN with VGG19) and by at least 1.28% (on FMNIST with VGG16), which shows that CDTNet has better performance, strong generalization ability, and fewer parameters.
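The two building blocks of this abstract — parallel branches with different dilation rates whose outputs are concatenated, and a transposed convolution that enlarges small feature maps — can be sketched as follows; this is an illustration of the ideas, not CDTNet, and all sizes and rates are assumptions.

```python
# Two-rate dilated branch block plus a transposed-convolution upsampling step.
import torch
import torch.nn as nn

class TwoRateBlock(nn.Module):
    def __init__(self, in_ch: int = 32, out_ch: int = 32, rates=(1, 2)):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, out_ch, 3, padding=rates[0], dilation=rates[0])
        self.b2 = nn.Conv2d(in_ch, out_ch, 3, padding=rates[1], dilation=rates[1])

    def forward(self, x):
        # Concatenate the two fields of view as input for the next block.
        return torch.cat([self.b1(x), self.b2(x)], dim=1)

if __name__ == "__main__":
    x = torch.randn(1, 32, 8, 8)
    feats = TwoRateBlock()(x)                          # (1, 64, 8, 8)
    up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
    print(feats.shape, up(feats).shape)                # ... (1, 32, 16, 16)
```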
APA, Harvard, Vancouver, ISO, and other styles