Journal articles on the topic 'Spectrogram'

Author: Grafiati

Published: 4 June 2021

Last updated: 15 June 2025

Consult the top 50 journal articles for your research on the topic 'Spectrogram.'

1

Lee, Sang-Hoon, Hyun-Wook Yoon, Hyeong-Rae Noh, Ji-Hoon Kim, and Seong-Whan Lee. "Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 14 (2021): 13198–206. http://dx.doi.org/10.1609/aaai.v35i14.17559.

Abstract:

While neural text-to-speech (TTS) systems based on generative adversarial networks (GANs) have significantly improved neural speech synthesis, no TTS system has yet learned to synthesize speech from text sequences using adversarial feedback alone. Because adversarial feedback alone is not sufficient to train the generator, current models still require a reconstruction loss computed directly between the ground-truth and generated mel-spectrograms. In this paper, we present Multi-SpectroGAN (MSG), which can train a multi-speaker model with only adversarial feedback by conditioning a conditional discriminator on a self-supervised hidden representation of the generator. This provides better guidance for generator training. Moreover, we propose adversarial style combination (ASC) for better generalization to unseen speaking styles and transcripts; ASC learns latent representations of combined style embeddings from multiple mel-spectrograms. Trained with ASC and feature matching, MSG synthesizes high-diversity mel-spectrograms by controlling and mixing individual speaking styles (e.g., duration, pitch, and energy). The results show that MSG synthesizes high-fidelity mel-spectrograms with almost the same naturalness MOS as the ground-truth mel-spectrograms.

2

Johnson, Alexander. "An integrated approach for teaching speech spectrogram analysis to engineering students." Journal of the Acoustical Society of America 152, no. 3 (2022): 1962–69. http://dx.doi.org/10.1121/10.0014172.

Abstract:

Spectrogram analysis is a vital skill for learning speech acoustics. Spectrograms are necessary for visualizing cause-effect relationships between speech articulator movements and the resulting sound. However, many interpretation techniques needed to read spectrograms are counterintuitive to engineering students who have been taught to use more rigid mathematical formulas. As a result, spectrogram reading is often challenging for students who have no prior background in acoustic phonetics. In this paper, a structured, inclusive framework for teaching spectrogram reading to students from engineering backgrounds is presented. Findings from implementing these teaching methods in undergraduate and graduate engineering courses at the University of California, Los Angeles are also reported.

3

Kim, Seong-Yoon, Hyun-Min Lee, Chae-Young Lim, and Hyun-Woo Kim. "Detection of Abnormal Symptoms Using Acoustic-Spectrogram-Based Deep Learning." Applied Sciences 15, no. 9 (2025): 4679. https://doi.org/10.3390/app15094679.

Abstract:

Acoustic data inherently contain a variety of information, including indicators of abnormal symptoms. In this study, we propose a method for detecting abnormal symptoms by converting acoustic data into spectrogram representations and applying a deep learning model. Spectrograms effectively capture the temporal and frequency characteristics of acoustic signals. In this work, we extract key features such as spectrograms, Mel-spectrograms, and MFCCs from raw acoustic data and use them as input for training a convolutional neural network. The proposed model is based on a custom ResNet architecture that incorporates Bottleneck Residual Blocks to improve training stability and computational efficiency. The experimental results show that the model trained with Mel-spectrogram data achieved the highest classification accuracy at 97.13%. The models trained with spectrogram and MFCC data achieved 95.22% and 93.78% accuracy, respectively. The superior performance of the Mel-spectrogram model is attributed to its ability to emphasize critical acoustic features through Mel-filter banks, which enhances learning performance. These findings demonstrate the effectiveness of spectrogram-based deep learning models in identifying latent patterns within acoustic data and detecting abnormal symptoms. Future research will focus on applying this approach to a wider range of acoustic domains and environments. The results of this study are expected to contribute to the development of disease surveillance systems by integrating acoustic data analysis with artificial intelligence techniques.
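
The Mel filter banks credited above can be illustrated with a short sketch. The following pure-Python code is not the authors' implementation; it assumes the common HTK-style formula mel = 2595 * log10(1 + f / 700) and triangular filters spaced evenly on that scale. The function names and parameters (`n_filters`, `n_fft`, `sample_rate`) are illustrative.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (assumed formula): 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: float) -> list[list[float]]:
    """Triangular filters equally spaced on the mel scale.

    Returns n_filters rows, each holding n_fft // 2 + 1 weights (one per STFT bin).
    """
    n_bins = n_fft // 2 + 1
    nyquist = sample_rate / 2.0
    # n_filters + 2 edge frequencies, evenly spaced in mel between 0 Hz and Nyquist
    mel_max = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for j in range(n_filters):
        left, center, right = edges_hz[j], edges_hz[j + 1], edges_hz[j + 2]
        row = []
        for f in bin_hz:
            if left < f <= center:      # rising slope of the triangle
                row.append((f - left) / (center - left))
            elif center < f < right:    # falling slope of the triangle
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def apply_filterbank(bank: list[list[float]], power_frame: list[float]) -> list[float]:
    """Mel-band energies for one power-spectrum frame of length n_fft // 2 + 1."""
    return [sum(w * p for w, p in zip(row, power_frame)) for row in bank]
```

Because the filters are evenly spaced in mel rather than in Hz, low-frequency filters are narrow and high-frequency filters are wide, so the resulting Mel spectrogram keeps fine detail in the lower part of the spectrum and summarizes the upper part coarsely.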

4

Basak, Gopal K., and Tridibesh Dutta. "Statistical Speaker Identification Based on Spectrogram Imaging." Calcutta Statistical Association Bulletin 59, no. 3-4 (2007): 253–63. http://dx.doi.org/10.1177/0008068320070309.

Abstract:

The paper addresses the problem of speaker identification based on spectrograms in the text-dependent case. Using spectrogram segmentation, the paper focuses mainly on understanding the complex patterns in frequency and amplitude in an individual's utterance of a given word. The features used to identify a speaker, based on an observed variable extracted from the spectrograms, rely on the distinct speaker effect and on the speaker's interaction effects with the particular word and with the frequency bands of the spectrogram. The performance of this novel approach on spectrogram samples collected from 40 speakers shows that the methodology can achieve a very high success rate for text-dependent speaker identification in a closed set of speakers. AMS (2000) Subject Classification: 62P99.

5

Han, Ying, Qiao Wang, Jianping Huang, et al. "Frequency Extraction of Global Constant Frequency Electromagnetic Disturbances from Electric Field VLF Data on CSES." Remote Sensing 15, no. 8 (2023): 2057. http://dx.doi.org/10.3390/rs15082057.

Abstract:

The electromagnetic data observed with the CSES (China Seismo-Electromagnetic Satellite, also known as Zhangheng-1 satellite) contain numerous spatial disturbances. These disturbances exhibit various shapes on the spectrogram, and constant frequency electromagnetic disturbances (CFEDs), such as artificially transmitted very-low-frequency (VLF) radio waves, power line harmonics, and interference from the satellite platform itself, appear as horizontal lines. To exploit this feature, we proposed an algorithm based on computer vision technology that automatically recognizes these lines on the spectrogram and extracts the frequencies from the CFEDs. First, the VLF waveform data collected with the CSES electric field detector (EFD) are converted into a time–frequency spectrogram using short-time Fourier Transform (STFT). Next, the CFED automatic recognition algorithm is used to identify horizontal lines on the spectrogram. The third step is to determine the line frequency range based on the proportional relationship between the frequency domain of the satellite’s VLF and the height of the time–frequency spectrogram. Finally, we used the CSES power spectrogram to confirm the presence of CFEDs in the line frequency range and extract their true frequencies. We statistically analyzed 1034 orbit time–frequency spectrograms and power spectrograms from 8 periods (5 days per period) and identified approximately 200 CFEDs. Among them, two CFEDs with strong signals persisted throughout an entire orbit. This study establishes a foundation for detecting anomalies due to artificial sources, particularly in the study of short-term strong earthquake prediction. Additionally, it contributes to research on other aspects of spatial electromagnetic interference and the suppression and cleaning of electromagnetic waves.
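
The idea that constant-frequency disturbances appear as horizontal lines, and that a row index maps proportionally to frequency, can be sketched with simple thresholding. This is a hypothetical stand-in, not the computer-vision algorithm applied to the CSES data: the `threshold_ratio` and `min_fraction` parameters, the median noise floor, and the list-of-frames layout are all assumptions chosen for illustration.

```python
from statistics import median


def find_constant_frequency_lines(spec: list[list[float]],
                                  threshold_ratio: float = 10.0,
                                  min_fraction: float = 0.9) -> list[int]:
    """Return frequency-bin indices that stay well above the noise floor over time.

    spec: power spectrogram as a list of time frames, each a list of per-bin powers.
    A bin is "on" in a frame when its power exceeds threshold_ratio times that
    frame's median power (a crude noise floor). A horizontal line must be "on"
    in at least min_fraction of all frames; short bursts are ignored.
    """
    n_frames = len(spec)
    n_bins = len(spec[0])
    on_counts = [0] * n_bins
    for frame in spec:
        floor = median(frame)
        for k, p in enumerate(frame):
            if p > threshold_ratio * floor:
                on_counts[k] += 1
    return [k for k, c in enumerate(on_counts) if c >= min_fraction * n_frames]


def bin_to_hz(k: int, n_bins: int, f_max: float) -> float:
    """Map bin (row) k to frequency, proportional to the spectrogram height.

    Assumes the n_bins rows span 0 Hz to f_max inclusive.
    """
    return k * f_max / (n_bins - 1)
```

For example, with 33 bins spanning a hypothetical 0 to 20 kHz range, a line detected at bin 8 maps to 8 * 20000 / 32 = 5000 Hz.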

6

You, Shingchern D., Kai-Rong Lin, and Chien-Hung Liu. "Estimating Classification Accuracy for Unlabeled Datasets Based on Block Scaling." International Journal of Engineering and Technology Innovation 13, no. 4 (2023): 313–27. http://dx.doi.org/10.46604/ijeti.2023.11975.

Abstract:

This paper proposes an approach called block scaling quality (BSQ) for estimating the prediction accuracy of a deep network model. The basic operation perturbs the input spectrogram by multiplying all values within a block by a scaling factor, which is set to 0 in the experiments. The ratio of the number of perturbed spectrograms whose predicted labels differ from that of the original spectrogram to the total number of perturbed spectrograms indicates how much of the spectrogram is crucial for the prediction. This ratio is therefore inversely correlated with the model's accuracy on the dataset. In experiments, the BSQ approach shows satisfactory estimation accuracy compared with various other approaches. When only the Jamendo and FMA datasets are used, the estimated accuracy has average errors of 4.9% and 1.8%, respectively. Moreover, the BSQ approach holds advantages over some of the competing approaches. Overall, it is a promising approach for estimating the accuracy of a deep network model.
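
A minimal sketch of the block-scaling ratio described above, assuming non-overlapping blocks and an arbitrary `predict` callable that maps a spectrogram (a list of rows) to a label. The tiling scheme, block size, and function name are assumptions, not details taken from the paper; the default `alpha = 0` mirrors the setting reported for the experiments.

```python
from typing import Callable, List

Spectrogram = List[List[float]]


def block_scaling_ratio(
    spec: Spectrogram,
    predict: Callable[[Spectrogram], object],
    block_h: int,
    block_w: int,
    alpha: float = 0.0,
) -> float:
    """Fraction of block-perturbed copies whose predicted label changes.

    Hypothetical sketch: the spectrogram is tiled into non-overlapping
    block_h x block_w blocks; for each block, a copy is made with that
    block multiplied by alpha (alpha = 0 zeroes it out) and re-classified.
    """
    original = predict(spec)
    rows, cols = len(spec), len(spec[0])
    changed = total = 0
    for r0 in range(0, rows, block_h):
        for c0 in range(0, cols, block_w):
            perturbed = [row[:] for row in spec]  # copy; leave the input intact
            for r in range(r0, min(r0 + block_h, rows)):
                for c in range(c0, min(c0 + block_w, cols)):
                    perturbed[r][c] *= alpha
            total += 1
            if predict(perturbed) != original:
                changed += 1
    return changed / total
```

A higher ratio means more blocks are critical to the prediction, which, according to the abstract, goes with lower accuracy on the dataset.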

7

Li, Hong Ping, and Hong Li. "Establish an Artificial Neural Networks Model to Make Quantitative Analysis about the Capillary Electrophoresis Spectrum." Advanced Materials Research 452-453 (January 2012): 1116–20. http://dx.doi.org/10.4028/www.scientific.net/amr.452-453.1116.

Abstract:

Overlapping capillary electrophoresis spectrograms were simulated by computer under different conditions, and the simulated spectrograms were processed to compose neural network training sets. Artificial neural networks, namely a radial basis function neural network model and a multi-layer perceptron neural network model, were then applied to the quantitative analysis of multiple components in the overlapping capillary electrophoresis spectrograms. The findings indicated that as the noise level of the spectrogram increased, the ability of the neural network models to quantitatively analyze the related components of the two kinds of overlapping spectrograms decreased. As the total dissociation degree of the spectrogram increased, the ability of the multi-layer perceptron model to quantitatively analyze the related components of the overlapping spectrum improved.

8

Li, Juan, Xueying Zhang, Lixia Huang, Fenglian Li, Shufei Duan, and Ying Sun. "Speech Emotion Recognition Using a Dual-Channel Complementary Spectrogram and the CNN-SSAE Neutral Network." Applied Sciences 12, no. 19 (2022): 9518. http://dx.doi.org/10.3390/app12199518.

Abstract:

In the context of artificial intelligence, achieving smooth communication between people and machines has become an important goal. The Mel spectrogram is a common representation in speech emotion recognition and focuses on the low-frequency part of speech. In contrast, the inverse Mel (IMel) spectrogram, which focuses on the high-frequency part, is proposed to analyze emotions more comprehensively. Because the convolutional neural network-stacked sparse autoencoder (CNN-SSAE) can extract deep optimized features, a Mel-IMel dual-channel complementary structure is proposed. In the first channel, a CNN extracts the low-frequency information of the Mel spectrogram; the other channel extracts the high-frequency information of the IMel spectrogram. This information is fed into an SSAE to reduce its dimensionality and obtain optimized features. Experimental results show that the highest recognition rates achieved on the EMO-DB, SAVEE, and RAVDESS datasets were 94.79%, 88.96%, and 83.18%, respectively. Combining the two spectrograms yielded a higher recognition rate than either spectrogram alone, which shows that the two are complementary. Placing the SSAE after the CNN to obtain optimized features improved the recognition rate further, which demonstrates the effectiveness of the CNN-SSAE network.
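
The abstract contrasts the Mel spectrogram, which resolves low frequencies finely, with an inverse Mel (IMel) spectrogram that emphasizes high frequencies. The paper's exact IMel definition is not given here; the sketch below assumes the flipped-filter-bank construction used for inverted MFCCs, in which each mel band edge f is mirrored to (Nyquist - f). The HTK-style mel formula and all function names are illustrative assumptions.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (assumed formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def tri_bank(edges_hz: list[float], bin_hz: list[float]) -> list[list[float]]:
    """One triangular filter per consecutive (left, center, right) edge triple."""
    bank = []
    for left, center, right in zip(edges_hz, edges_hz[1:], edges_hz[2:]):
        bank.append([
            (f - left) / (center - left) if left < f <= center
            else (right - f) / (right - center) if center < f < right
            else 0.0
            for f in bin_hz
        ])
    return bank


def mel_and_inverse_mel_banks(n_filters: int, n_fft: int, sample_rate: float):
    """Return (mel_bank, imel_bank), each n_filters x (n_fft // 2 + 1) weights."""
    nyquist = sample_rate / 2.0
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    mel_max = hz_to_mel(nyquist)
    mel_edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    # Mirror every edge about the band (f -> nyquist - f): the narrow, densely
    # packed filters that the mel scale places at low frequencies move to the top.
    imel_edges = sorted(nyquist - f for f in mel_edges)
    return tri_bank(mel_edges, bin_hz), tri_bank(imel_edges, bin_hz)
```

With the mirrored edges, the narrowest filters sit near the Nyquist frequency, so a spectrogram pooled with the second bank resolves the high-frequency region in more detail, which is the complementary behavior the dual-channel design relies on.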

9

Pethiyagoda, Ravindra, Scott W. McCue, and Timothy J. Moroney. "Spectrograms of ship wakes: identifying linear and nonlinear wave signals." Journal of Fluid Mechanics 811 (December 6, 2016): 189–209. http://dx.doi.org/10.1017/jfm.2016.753.

Abstract:

A spectrogram is a useful way of using short-time discrete Fourier transforms to visualise surface height measurements taken of ship wakes in real-world conditions. For a steadily moving ship that leaves behind small-amplitude waves, the spectrogram is known to have two clear linear components, a sliding-frequency mode caused by the divergent waves and a constant-frequency mode for the transverse waves. However, recent observations of high-speed ferry data have identified additional components of the spectrograms that are not yet explained. We use computer simulations of linear and nonlinear ship wave patterns and apply time–frequency analysis to generate spectrograms for an idealised ship. We clarify the role of the linear dispersion relation and ship speed on the two linear components. We use a simple weakly nonlinear theory to identify higher-order effects in a spectrogram and, while the high-speed ferry data are very noisy, we propose that certain additional features in the experimental data are caused by nonlinearity. Finally, we provide a possible explanation for a further discrepancy between the high-speed ferry spectrograms and linear theory by accounting for ship acceleration.

10

Godbole, Shubham, Vaishnavi Jadhav, and Gajanan Birajdar. "Indian Language Identification using Deep Learning." ITM Web of Conferences 32 (2020): 01010. http://dx.doi.org/10.1051/itmconf/20203201010.

Abstract:

Spoken language is the most common means of communication today. Efforts to build language identification (LID) systems for Indian languages have been very limited because of problems with speaker availability and language readability. However, the need for spoken language identification (SLID) in civil and defence applications is growing day by day. Feature extraction is a basic and important step in LID. An audio sample is converted into a spectrogram, a visual representation of the range of frequencies over time. Three kinds of spectrogram images were generated, namely the log spectrogram, the gammatonegram, and the IIR-CQT spectrogram, for audio samples from the standardized IIIT-H Indic Speech Database. These visual representations capture language-specific details and the nature of each language. The spectrogram images were then used as input to a CNN. The proposed methodology achieved a classification accuracy of 98.86%.

11

Samad, Salina Abdul, and Aqilah Baseri Huddin. "Improving spectrogram correlation filters with time-frequency reassignment for bio-acoustic signal classification." Indonesian Journal of Electrical Engineering and Computer Science 14, no. 1 (2019): 59–64. http://dx.doi.org/10.11591/ijeecs.v14.i1.pp59-64.

Abstract:

Spectrogram features have been used to automatically classify animals based on their vocalizations. Usually, features are extracted and used as inputs to classifiers to distinguish between species. In this paper, a classifier based on Correlation Filters (CFs) is employed in which the input features are the spectrogram images themselves. Spectrogram parameters are carefully selected based on the target dataset to obtain clear, distinctive images termed call-prints. An even better representation of the call-prints is obtained using spectrogram time-frequency (TF) reassignment. To demonstrate the application of the proposed technique, two species of frogs are classified based on their vocalization spectrograms; for each species, a correlation filter template is constructed from multiple call-prints using the Maximum Margin Correlation Filter (MMCF). The improved accuracy rate obtained with TF reassignment demonstrates that this is a viable method for bio-acoustic signal classification.

13

Tucker, Jeff, Kathleen E. Wage, John R. Buck, and Lora J. Van Uffelen. "Performance weighted blended spectrogram." Journal of the Acoustical Society of America 157, no. 3 (2025): 2106–16. https://doi.org/10.1121/10.0036216.

Abstract:

Spectrograms are used for time-frequency analysis and as preprocessing for signal classifiers and other algorithms. The conventional spectrogram is a tapered short-time Fourier transform, equivalent to a bank of bandpass filters. The taper defines filter-bank characteristics such as bandwidth and sidelobe levels. Although the conventional spectrogram uses minimal computational resources, its design requires a compromise between resolution and interference suppression. Adaptive spectrogram algorithms adjust the filter-bank based on incoming data, thereby allowing different bandwidth/sidelobe trade-offs at each frequency and time. Adaptation can simultaneously improve tonal resolution and reveal quiet sources but typically costs substantially more to implement. This paper presents an adaptive spectrogram designed for applications with limited computational resources, e.g., autonomous vehicles. The performance weighted blended (PWB) spectrogram combines the output of a set of conventional filter-banks designed with different tapers. By adapting its blend weights at each frequency and time, the new algorithm separates loud closely spaced tones and identifies quiet signals. Because it relies on conventional filter-banks, the PWB spectrogram requires significantly less computation than other adaptive algorithms that require expensive matrix computations. Analysis of underwater glider data demonstrates the algorithm's ability to reveal a quiet chirp signal in the presence of vehicle self-noise.
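
The conventional spectrogram that the PWB method builds on, a tapered short-time Fourier transform, can be sketched in a few lines. This illustrates only that baseline, not the PWB blending algorithm; the periodic Hann taper, the direct O(n_fft^2) DFT, and the parameter names are assumptions made for readability (in practice an FFT-based routine such as `scipy.signal.stft` would be used).

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann taper of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(x: list[float], n_fft: int, hop: int, taper: list[float]) -> list[list[float]]:
    """Power spectrogram |STFT|^2 of a real signal, one row per frame.

    Each frame of n_fft samples is multiplied by the taper and transformed
    with a direct DFT; only bins 0..n_fft//2 (non-negative frequencies) are kept.
    """
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + i] * taper[i] for i in range(n_fft)]
        row = []
        for k in range(n_fft // 2 + 1):
            X = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))
            row.append(abs(X) ** 2)
        frames.append(row)
    return frames
```

Passing a different `taper` changes the filter-bank bandwidth and sidelobe levels, which is the trade-off the abstract describes; PWB, as summarized above, adaptively blends the outputs of several such filter banks at each time and frequency.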

14

Franzoni, Valentina. "Cross-domain synergy: Leveraging image processing techniques for enhanced sound classification through spectrogram analysis using CNNs." Journal of Autonomous Intelligence 6, no. 3 (2023): 678. http://dx.doi.org/10.32629/jai.v6i3.678.

Abstract:

In this paper, an innovative approach to sound classification that applies image processing techniques to spectrogram representations of audio signals is reviewed. This study shows the effectiveness of incorporating well-established image processing methodologies, such as filtering, segmentation, and pattern recognition, to enhance feature extraction and classification performance for audio signals transformed into spectrograms. An overview is provided of the mathematical methods shared by image processing and spectrogram-based audio processing, focusing on the commonalities between the two domains in terms of underlying principles, techniques, and algorithms. In particular, the proposed methodology leverages convolutional neural networks (CNNs) to extract and classify time-frequency features from spectrograms, capitalizing on their hierarchical feature learning and robustness to translation and scale variations. Other deep-learning networks and advanced techniques are suggested during the analysis. We discuss the benefits and limitations of transforming audio signals into spectrograms, including human interpretability, compatibility with image processing techniques, and flexibility in time-frequency resolution. By bridging the gap between image processing and audio processing, spectrogram-based audio deep learning offers a deeper perspective on sound classification and fundamental insights that can serve as a foundation for interdisciplinary research and applications in both domains.

15

Yu, Youxin, Wenbo Zhu, Xiaoli Ma, et al. "Recognition of Sheep Feeding Behavior in Sheepfolds Using Fusion Spectrogram Depth Features and Acoustic Features." Animals 14, no. 22 (2024): 3267. http://dx.doi.org/10.3390/ani14223267.

Abstract:

In precision feeding, non-contact and pressure-free monitoring of sheep feeding behavior is crucial for health monitoring and for optimizing production management. Experimental conditions differ from real-world environments when acoustic sensors are used to identify sheep feeding behaviors, which leads to discrepancies and makes high-accuracy classification in complex production environments challenging. This study improves classification performance by integrating deep spectrogram features with acoustic characteristics associated with feeding behavior. We collected sound data in actual production environments, taking noise and complex surroundings into account. The method involved evaluating and selecting the optimal acoustic features, using a customized convolutional neural network (SheepVGG-Lite) to extract deep features from Short-Time Fourier Transform (STFT) spectrograms and Constant Q Transform (CQT) spectrograms, applying cross-spectrogram feature fusion, and assessing classification performance with a support vector machine (SVM). Results indicate that the fusion of cross-spectral features significantly improved classification performance, achieving a classification accuracy of 96.47%. These findings highlight the value of integrating acoustic features with deep spectrogram features for accurately recognizing sheep feeding behavior.

16

China Venkateswarlu, Guide: Dr S. "Speech Enhancement Using Spectrogram Denoising with Deep U-Net Architectures." International Journal of Scientific Research in Engineering and Management 9, no. 5 (2025): 1–9. https://doi.org/10.55041/ijsrem48929.

Abstract:

Acoustic noise significantly degrades speech quality and intelligibility in almost all applications, from telecommunications to voice assistants. In this paper, we address this problem by designing an efficient speech enhancement system based on deep learning. Our approach relies on spectrogram denoising, in which audio signals are represented as 2D magnitude spectrograms that preserve signal structure well and allow Convolutional Neural Networks (CNNs) to be applied directly. The backbone of our system is a U-Net, a strong deep convolutional autoencoder capable of approximating the noise model of noisy voice spectrograms. We carefully compiled a heterogeneous dataset by mixing clean English speech from SiSec and LibriSpeech with 10 environmental noise classes from ESC-50 and other sources, using data augmentation and randomized noise levels to encourage model generalization. We trained the U-Net with the Adam optimizer and Huber loss, reaching a training loss of 0.002129 and a validation loss of 0.002406. At prediction time, the trained U-Net accurately estimates the noise model, which is then subtracted from the noisy spectrogram. The denoised magnitude spectrogram is combined with the original phase, and the enhanced audio is reconstructed using an inverse Short Time Fourier Transform (ISTFT). Qualitative evaluations, including visual comparisons of time series and spectrograms as well as audio demonstrations, confirm that the system suppresses various noises and preserves speech fidelity, even at high noise levels. This project demonstrates a practical, scalable deep learning solution for substantially improving speech quality in noisy environments. Keywords: speech enhancement, deep learning, spectrogram denoising, U-Net, convolutional neural networks, noise reduction, audio processing.
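
The reconstruction step described above (subtract the estimated noise, then reuse the noisy phase) can be sketched for a single STFT frame. This is an illustration, not the authors' code: it assumes the subtraction happens on linear magnitudes, treats `noise_mag` as the output of some noise estimator such as the trained U-Net, and omits the final ISTFT.

```python
import cmath


def denoise_frame(noisy_stft: list[complex], noise_mag: list[float]) -> list[complex]:
    """Subtract an estimated noise magnitude and reattach the noisy phase.

    noisy_stft: complex STFT bins of one frame of the noisy signal.
    noise_mag:  estimated noise magnitude per bin (assumed linear scale).
    Magnitudes are floored at zero so the subtraction cannot go negative.
    """
    out = []
    for X, n in zip(noisy_stft, noise_mag):
        clean_mag = max(abs(X) - n, 0.0)
        # cmath.rect builds a complex number from magnitude and phase angle
        out.append(cmath.rect(clean_mag, cmath.phase(X)))
    return out
```

Reusing the noisy phase keeps the sketch simple, since only the magnitude is estimated; the enhanced frames would then be passed to an inverse STFT to recover the waveform.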

17

Li, Chunhui, Xin Xiang, Hu Mao, Rui Wang, and Yonglei Qi. "Anchor-Free SNR-Aware Signal Detector for Wideband Signal Detection Framework." Electronics 14, no. 11 (2025): 2260. https://doi.org/10.3390/electronics14112260.

Abstract:

The spectrogram-based wideband signal detection framework has garnered increasing attention in various wireless communication applications. However, the front-end spectrograms in existing methods suffer from visual and informational deficiencies. This paper proposes a novel multichannel enhanced spectrogram (MCE spectrogram) to address these issues. The MCE spectrogram leverages additional channels for both visual and informational enhancement, highlighting signal regions and features while integrating richer recognition information across channels, thereby significantly improving feature extraction efficiency. Moreover, the back-end networks in existing methods are typically transferred from original object detection networks. Wideband signal detection, however, exhibits task-specific characteristics, such as the inherent signal-to-noise ratio (SNR) attribute of the spectrogram and the large variations in shapes of signal bounding boxes. These characteristics lead to issues like inefficient task adaptation and anchor mismatch, resulting in suboptimal performance. To tackle these challenges, we propose an SNR-aware detection network that employs an anchor-free paradigm instead of anchors for signal detection. Additionally, to address the impact of the SNR attribute, we design a trainable gating module for efficient feature fusion and introduce an auxiliary task branch to enable the network to capture more discriminative feature representations under varying SNRs. Experimental results demonstrate the superiority of the MCE spectrogram compared to those utilized in existing methods and the state-of-the-art performance of our SNR-aware Net among comparable detection networks.

18

Hussein, Alia, Ahmed Talib Abdulameer, Ali Abdulkarim, Husniza Husni, and Dalia Al-Ubaidi. "Classification of Dyslexia Among School Students Using Deep Learning." Journal of Techniques 6, no. 1 (2024): 85–92. http://dx.doi.org/10.51173/jt.v6i1.1893.

Abstract:

Dyslexia is a common learning disorder that affects children’s reading and writing skills. Early identification of dyslexia is essential for providing appropriate interventions and support to affected children. Traditional methods of diagnosing dyslexia often rely on subjective assessments and the expertise of specialists, leading to delays and potential inaccuracies in diagnosis. This study proposes a novel approach for diagnosing dyslexic children using spectrogram analysis and convolutional neural networks (CNNs). Spectrograms are visual representations of audio signals that provide detailed frequency and intensity information. CNNs are powerful deep-learning models capable of extracting complex patterns from data. In this research, raw audio signals from dyslexic and non-dyslexic children are transformed into spectrogram images. These images are then used as input to a CNN model trained on a large dataset of dyslexic and non-dyslexic samples. The CNN learns to automatically extract discriminative features from the spectrogram images and classify them into dyslexic and non-dyslexic categories. The results demonstrate the effectiveness of the proposed approach in diagnosing dyslexic children: the CNN accurately identified dyslexic individuals based on spectrogram features, outperforming traditional diagnostic methods. Spectrograms and CNNs provide a more objective and efficient approach to dyslexia diagnosis, enabling earlier intervention and support for affected children. This research contributes to the field of dyslexia diagnosis by harnessing machine learning and audio analysis techniques, facilitating faster and more accurate identification of dyslexia in children and ultimately improving their educational outcomes and quality of life.

19

Smietanka, Lukasz, and Tomasz Maka. "Enhancing Embedded Space with Low–Level Features for Speech Emotion Recognition." Applied Sciences 15, no. 5 (2025): 2598. https://doi.org/10.3390/app15052598.

Abstract:

This work proposes an approach that uses a feature space combining the representation obtained through unsupervised learning with manually selected features describing the prosody of the utterances. In the experiments, we used two time-frequency representations (Mel and CQT spectrograms) and the EmoDB and RAVDESS databases. The results show that, compared with the typical CNN architecture, the proposed system improved classification accuracy for both representations: by 1.29% for CQT and 3.75% for the Mel spectrogram on EmoDB, and by 3.02% for CQT and 0.63% for the Mel spectrogram on RAVDESS. Additionally, for the best models trained on EmoDB, classification performance for happiness and disgust increased significantly, by around 14% using Mel spectrograms and by around 20% using CQT. For the models that achieved the highest results on the RAVDESS database, the most significant improvement was observed for the neutral state, around 16%, using the Mel spectrogram; for the CQT representation, the most significant improvement occurred for fear and surprise, around 9%. The average results over all prepared models also showed a positive impact of the method on the classification quality of most emotional states. For the EmoDB database, the highest average improvement was observed for happiness (14.6%); for other emotions, it ranged from 1.2% to 8.7%. The only exception was sadness, for which average classification quality decreased by 1% when using the Mel spectrogram. For the RAVDESS database, the most significant improvement also occurred for happiness (7.5%), while for other emotions it ranged from 0.2% to 7.1%, except for disgust and calm, whose classification deteriorated for the Mel spectrogram and the CQT representation, respectively.

20

Jenkins, William F., Peter Gerstoft, Chih-Chieh Chien, and Emma Ozanich. "Reducing dimensionality of spectrograms using convolutional autoencoders." Journal of the Acoustical Society of America 153, no. 3_supplement (2023): A178. http://dx.doi.org/10.1121/10.0018582.

Abstract:

Under the “curse of dimensionality,” distance-based algorithms, such as k-means or Gaussian mixture model clustering, can lose meaning and interpretability in high-dimensional space. Acoustic data, specifically spectrograms, are subject to such limitations due to their high dimensionality: for example, a spectrogram with 100 time- and 100 frequency-bins contains 10⁴ pixels, and its vectorized form constitutes a point in 10⁴-dimensional space. In this talk, we look at four papers that used autoencoding convolutional neural networks to extract salient features of real data. The convolutional autoencoder consists of an encoder which compresses spectrograms into a low-dimensional latent feature space, and a decoder which seeks to reconstruct the original spectrogram from the latent feature space. The error between the original spectrogram and reconstruction is used to train the network. Once trained, the salient features of the data are embedded in the latent space and algorithms can be applied to the lower-dimensional latent space. We demonstrate how lower-dimensional representations result in interpretable clustering of complex physical data, which can contribute to reducing errors in classification and clustering tasks and enable exploratory analysis of large data sets.
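
A minimal pure-Python sketch of the dimensionality argument above, not the autoencoder method the talk surveys: `vectorize` flattens a 100 × 100 spectrogram into a single 10,000-element vector, and the hypothetical helper `relative_contrast` illustrates, on uniform random points rather than real acoustic data, how the gap between the nearest and farthest distances shrinks as the dimension grows.

```python
import math
import random


def vectorize(spec):
    """Flatten a 2-D spectrogram (a list of frequency rows) into one feature vector."""
    return [value for row in spec for value in row]


def relative_contrast(dim, n_points=100, seed=0):
    """Return (max - min) / min of the distances from the origin to random points.

    Points are drawn uniformly from the unit hypercube. The ratio shrinks as
    `dim` grows, so the nearest and farthest points become hard to tell apart,
    which is what weakens distance-based clustering in high dimensions.
    """
    rng = random.Random(seed)
    dists = [
        math.sqrt(sum(rng.random() ** 2 for _ in range(dim)))
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# 100 frequency bins x 100 time bins -> one point in a 10,000-dimensional space.
spec = [[0.0] * 100 for _ in range(100)]
print(len(vectorize(spec)))  # 10000

for dim in (2, 100, 10_000):
    print(dim, round(relative_contrast(dim), 3))
```

The printed ratio should fall sharply from 2 to 10,000 dimensions; the random data and the use of distances from the origin are illustrative assumptions.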

21

Hajihashemi, Vahid, Abdorreza Alavi Gharahbagh, Narges Hajaboutalebi, Mohsen Zahraei, José J. M. Machado, and João Manuel R. S. Tavares. "A Feature-Reduction Scheme Based on a Two-Sample t-Test to Eliminate Useless Spectrogram Frequency Bands in Acoustic Event Detection Systems." Electronics 13, no. 11 (2024): 2064. http://dx.doi.org/10.3390/electronics13112064.

Abstract:

Acoustic event detection (AED) systems, combined with video surveillance systems, can enhance urban security and safety by automatically detecting incidents, supporting the smart city concept. AED systems mostly use mel spectrograms, a well-known effective acoustic feature. The spectrogram is a combination of frequency bands. A big challenge is that some spectrogram bands may be similar across different events and thus useless for AED. Removing useless bands reduces the input feature dimension and is highly desirable. This article proposes a mathematical feature analysis method to identify and eliminate ineffective spectrogram bands and improve the efficiency of AED systems. The proposed approach uses Student’s t-test to compare frequency bands of the spectrogram from different acoustic events. The similarity of each frequency band among events is calculated using a two-sample t-test, allowing distinct and similar frequency bands to be identified. Removing the similar bands speeds up training of the classifier by reducing the number of features and also enhances the system’s accuracy and efficiency. Based on the obtained results, the proposed method reduces the number of spectrogram bands by 26.3%. Between the bands selected using the training and test datasets, the results showed average differences of 7.77% in the Jaccard, 4.07% in the Dice, and 5.7% in the Hamming distance. These small values underscore the validity of the results obtained for the test dataset.
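
The band-screening idea can be sketched as follows, assuming two events whose spectrogram frames are given as lists of band energies. `student_t` computes a pooled-variance two-sample t statistic, and the hypothetical `similar_bands` flags bands whose |t| falls below an assumed cut-off `t_crit`. The paper's exact test settings, significance level, and handling of more than two event classes are not given in the abstract, so this is an illustration rather than the authors' procedure.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled == 0:
        # Constant samples: identical means give t = 0, otherwise an infinite t.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb))


def similar_bands(event_a, event_b, t_crit=2.0):
    """Indices of frequency bands whose energy does not differ between two events.

    event_a, event_b: lists of spectrogram frames, each a list of band energies.
    t_crit is a hypothetical cut-off; |t| of about 2 corresponds roughly to a
    two-sided 5% level when each event has a few dozen frames or more.
    """
    result = []
    for band in range(len(event_a[0])):
        a = [frame[band] for frame in event_a]
        b = [frame[band] for frame in event_b]
        if abs(student_t(a, b)) < t_crit:
            result.append(band)
    return result


# Band 0 separates the two events; band 1 has the same mean in both.
siren = [[9.0, 1.0], [10.0, 2.0], [11.0, 1.5], [10.5, 1.2]]
glass = [[2.0, 1.1], [3.0, 1.9], [2.5, 1.4], [1.5, 1.3]]
print(similar_bands(siren, glass))  # [1]
```

Bands returned by `similar_bands` are the candidates for removal; with more event classes, one option (again an assumption) is to keep a band if any pair of events differs on it.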

22

Mahmoudi, Omayma, Naoufal El Allali, and Mouncef Filali Bouami. "AMSVT: audio Mel-spectrogram vision transformer for spoken Arabic digit recognition." Indonesian Journal of Electrical Engineering and Computer Science 35, no. 2 (2024): 1013–21. http://dx.doi.org/10.11591/ijeecs.v35.i2.pp1013-1021.

Abstract:

This work presents a novel model to recognize spoken digits in the Arabic language. Owing to the tremendous success of transformer-based models in natural language processing (NLP), several attempts have been made to extend transformer-based designs to other domains, such as vision and audio. Our approach extracts Mel-spectrogram features and feeds them into the proposed audio Mel-spectrogram vision transformer (AMSVT) for training. The signal processing community has become interested in these models because of the successful use of vision transformers (ViT) in several computer vision applications: signals are frequently represented as spectrograms (for example, Mel-spectrograms), which can be given directly as input to vision transformers. Our model outperformed a group of convolutional neural network (CNN)-based and recurrent neural network (RNN)-based models in terms of accuracy and time.

24

Zhang, Kuiyuan, Zhongyun Hua, Rushi Lan, Yifang Guo, Yushu Zhang, and Guoai Xu. "Multi-View Collaborative Learning Network for Speech Deepfake Detection." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 1 (2025): 1075–83. https://doi.org/10.1609/aaai.v39i1.32094.

Abstract:

As deep learning techniques advance rapidly, deepfake speech synthesized through text-to-speech or voice conversion networks is becoming increasingly realistic, posing significant challenges for detection and raising potential threats to social security. This growing realism has prompted extensive research in speech deepfake detection. However, current detection methods primarily focus on extracting features from either the raw waveform or the spectrogram, often overlooking the valuable correspondences between these two modalities that could enhance the detection of previously unseen types of deepfakes. In this work, we propose a multi-view collaborative learning network for speech deepfake detection, which jointly learns robust speech representations from both raw waveforms and spectrograms. Specifically, we first design a Dual-Branch Contrastive Learning (DBCL) framework for learning different view features. DBCL consists of two branches that learn representations from the raw waveform or the spectrogram and utilizes contrastive learning to enhance inter- and inner-view correlations. Additionally, we introduce a Waveform-Spectrogram Fusion Module (WSFM) to exchange multi-view information for collaborative learning. In the feature learning process, WSFM converts features between views and merges them adaptively using waveform-spectrogram cross-attention. The final detection is conducted based on the concatenation of the waveform and spectrogram features. We conduct extensive experiments on four benchmark deepfake speech detection datasets, and the experimental results demonstrate that our method can achieve better detection performance than current state-of-the-art detection methods.

25

Choi, Byung-Moon, Ji Yeon Yim, Hangsik Shin, and Gyujeong Noh. "Novel Analgesic Index for Postoperative Pain Assessment Based on a Photoplethysmographic Spectrogram and Convolutional Neural Network: Observational Study." Journal of Medical Internet Research 23, no. 2 (2021): e23920. http://dx.doi.org/10.2196/23920.

Abstract:

Background: Although commercially available analgesic indices based on biosignal processing have been used to quantify nociception during general anesthesia, their performance is low in conscious patients. Therefore, there is a need to develop a new analgesic index with improved performance to quantify postoperative pain in conscious patients. Objective: This study aimed to develop a new analgesic index using photoplethysmogram (PPG) spectrograms and a convolutional neural network (CNN) to objectively assess pain in conscious patients. Methods: PPGs were obtained from a group of surgical patients for 6 minutes both in the absence (preoperatively) and in the presence (postoperatively) of pain. Then, the PPG data of the latter 5 minutes were used for analysis. Based on the PPGs and a CNN, we developed a spectrogram–CNN index for pain assessment. The area under the curve (AUC) of the receiver-operating characteristic curve was measured to evaluate the performance of the 2 indices. Results: PPGs from 100 patients were used to develop the spectrogram–CNN index. When there was pain, the mean (95% CI) spectrogram–CNN index value increased significantly (baseline: 28.5 [24.2-30.7] versus recovery area: 65.7 [60.5-68.3]; P<.01). The AUC and balanced accuracy were 0.76 and 71.4%, respectively. The spectrogram–CNN index cutoff value for detecting pain was 48, with a sensitivity of 68.3% and specificity of 73.8%. Conclusions: Although there were limitations to the study design, we confirmed that the spectrogram–CNN index can efficiently detect postoperative pain in conscious patients. Further studies are required to assess the spectrogram–CNN index’s feasibility and prevent overfitting to various populations, including patients under general anesthesia. Trial Registration: Clinical Research Information Service KCT0002080; https://cris.nih.go.kr/cris/search/search_result_st01.jsp?seq=6638

26

Ferreira, Diogo R., Tiago A. Martins, and Paulo Rodrigues. "Explainable deep learning for the analysis of MHD spectrograms in nuclear fusion." Machine Learning: Science and Technology 3, no. 1 (2021): 015015. http://dx.doi.org/10.1088/2632-2153/ac44aa.

Abstract:

In the nuclear fusion community, there are many specialized techniques to analyze the data coming from a variety of diagnostics. One such technique is the use of spectrograms to analyze the magnetohydrodynamic (MHD) behavior of fusion plasmas. Physicists look at the spectrogram to identify the oscillation modes of the plasma and to study instabilities that may lead to plasma disruptions. A major cause of disruptions arises when an oscillation mode interacts with the wall, stops rotating, and becomes a locked mode. In this work, we use deep learning to predict the occurrence of locked modes from MHD spectrograms. In particular, we use a convolutional neural network with class activation mapping to pinpoint the exact behavior that the model considers responsible for the locked mode. Surprisingly, we find that, in general, the model’s explanation agrees quite well with the physical interpretation of the behavior observed in the spectrogram.

27

He, Yuan, Xinyu Li, Runlong Li, Jianping Wang, and Xiaojun Jing. "A Deep-Learning Method for Radar Micro-Doppler Spectrogram Restoration." Sensors 20, no. 17 (2020): 5007. http://dx.doi.org/10.3390/s20175007.

Abstract:

Radio frequency interference, which makes it difficult to produce high-quality radar spectrograms, is a major issue for micro-Doppler-based human activity recognition (HAR). In this paper, we propose a deep-learning-based method to detect and cut out the interference in spectrograms and then restore the spectrograms in the cut-out region. First, a fully convolutional neural network (FCN) is employed to detect and remove the interference. Then, a coarse-to-fine generative adversarial network (GAN) is proposed to restore the part of the spectrogram affected by the interference. Simulated motion capture (MOCAP) spectrograms and measured radar spectrograms with interference are used to verify the proposed method. Experimental results from both qualitative and quantitative perspectives show that the proposed method can mitigate the interference and restore high-quality radar spectrograms. Comparison experiments further demonstrate the efficiency of the proposed approach.

28

Kwon, Daehyun, Hanbit Kang, Dongwoo Lee, and Yoon-Chul Kim. "Deep learning-based prediction of atrial fibrillation from polar transformed time-frequency electrocardiogram." PLOS ONE 20, no. 3 (2025): e0317630. https://doi.org/10.1371/journal.pone.0317630.

Abstract:

Portable and wearable electrocardiogram (ECG) devices are increasingly utilized in healthcare for monitoring heart rhythms and detecting cardiac arrhythmias or other heart conditions. The integration of ECG signal visualization with AI-based abnormality detection empowers users to independently and confidently assess their physiological signals. In this study, we investigated a novel method for visualizing ECG signals using polar transformations of short-time Fourier transform (STFT) spectrograms and evaluated the performance of deep convolutional neural networks (CNNs) in predicting atrial fibrillation from these polar transformed spectrograms. The ECG data, which are available from the PhysioNet/CinC Challenge 2017, were categorized into four classes: normal sinus rhythm, atrial fibrillation, other rhythms, and noise. Preprocessing steps included ECG signal filtering, STFT-based spectrogram generation, and reverse polar transformation to generate final polar spectrogram images. These images were used as inputs for deep CNN models, where three pre-trained deep CNNs were used for comparisons. The results demonstrated that deep learning-based predictions using polar transformed spectrograms were comparable to existing methods. Furthermore, the polar transformed images offer a compact and intuitive representation of rhythm characteristics in ECG recordings, highlighting their potential for wearable applications.

29

Fudholi, Dzikri Rahadian, Muhammad Auzan, and Novia Arum Sari. "Spectrogram Window Comparison: Cough Sound Recognition using Convolutional Neural Network." IJCCS (Indonesian Journal of Computing and Cybernetics Systems) 16, no. 3 (2022): 261. http://dx.doi.org/10.22146/ijccs.75697.

Abstract:

Cough is one of the most common symptoms of disease, especially respiratory disease, and quick cough detection could be key during the current COVID-19 pandemic. Good cough recognition relies on non-intrusive tools, such as a mobile phone microphone, that do not interfere with daily activities the way stick-on sensors do. For sound-only detection, this work uses a convolutional neural network (CNN), currently one of the best deep learning methods. However, a CNN expects image input, whereas sound input is one-dimensional rather than two-dimensional, so an extra step is needed to convert the sound data into image data in the form of a spectrogram. Building a spectrogram raises the question of the best size. This research compares spectrogram sizes, called the Spectrogram Window, by their performance. Windows of 4 seconds achieve the highest F1-score, 92.9%. Therefore, a window of around 4 seconds will perform better for sound recognition problems.
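
To make the notion of a spectrogram window concrete, the sketch below splits a recording into consecutive fixed-duration segments, each of which would become one spectrogram image. The helper name `split_into_windows`, the 8 kHz sample rate, and zero-padding of the final segment are assumptions for illustration, not details taken from the paper.

```python
def split_into_windows(samples, sample_rate, window_s=4.0):
    """Split a 1-D signal into consecutive fixed-duration windows.

    Each window would later be converted into one spectrogram image.
    The final partial window is zero-padded so all images share one size.
    """
    win = int(round(window_s * sample_rate))
    windows = []
    for start in range(0, len(samples), win):
        chunk = list(samples[start:start + win])
        chunk.extend([0.0] * (win - len(chunk)))
        windows.append(chunk)
    return windows


# 10 s of audio at an assumed 8 kHz -> three 4-second windows, the last one padded.
audio = [0.1] * (10 * 8000)
wins = split_into_windows(audio, 8000)
print(len(wins), len(wins[0]))  # 3 32000
```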

30

Liu, Haohe, Xubo Liu, Qiuqiang Kong, Wenwu Wang, and Mark D. Plumbley. "Learning Temporal Resolution in Spectrogram for Audio Classification." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 12 (2024): 13873–81. http://dx.doi.org/10.1609/aaai.v38i12.29294.

Abstract:

The audio spectrogram is a time-frequency representation that has been widely used for audio classification. One of the key attributes of the audio spectrogram is the temporal resolution, which depends on the hop size used in the Short-Time Fourier Transform (STFT). Previous works generally assume the hop size should be a constant value (e.g., 10 ms). However, a fixed temporal resolution is not always optimal for different types of sound. The temporal resolution affects not only classification accuracy but also computational cost. This paper proposes a novel method, DiffRes, that enables differentiable temporal resolution modeling for audio classification. Given a spectrogram calculated with a fixed hop size, DiffRes merges non-essential time frames while preserving important frames. DiffRes acts as a "drop-in" module between an audio spectrogram and a classifier and can be jointly optimized with the classification task. We evaluate DiffRes on five audio classification tasks, using mel-spectrograms as the acoustic features, followed by off-the-shelf classifier backbones. Compared with previous methods using the fixed temporal resolution, the DiffRes-based method can achieve the equivalent or better classification accuracy with at least 25% computational cost reduction. We further show that DiffRes can improve classification accuracy by increasing the temporal resolution of input acoustic features, without adding to the computational cost.
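
The link between hop size and temporal resolution is simple frame arithmetic. This sketch assumes a 16 kHz sample rate, a 25 ms analysis window, and an STFT without edge padding; libraries that pad the signal (for example, librosa's default `center=True`) return slightly more frames. It does not implement DiffRes, only the fixed-hop baseline that DiffRes starts from.

```python
def num_frames(n_samples, n_fft, hop):
    """Number of STFT frames without padding: 1 + floor((N - n_fft) / hop)."""
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop


sr = 16_000                  # assumed sample rate (Hz)
n_fft = sr * 25 // 1000      # 25 ms analysis window = 400 samples
for hop_ms in (10, 20):
    hop = sr * hop_ms // 1000
    frames = num_frames(sr, n_fft, hop)
    print(f"{hop_ms} ms hop -> {frames} frames per second of audio")
# 10 ms hop -> 98 frames per second of audio
# 20 ms hop -> 49 frames per second of audio
```

Doubling the hop from 10 ms to 20 ms halves the number of frames, and the classifier's cost along the time axis shrinks roughly in proportion; DiffRes instead learns which frames to merge.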

31

Jiashen, Li, and Zhang Xianwu. "Extracting speech spectrogram of speech signal based on generalized S-transform." PLOS ONE 20, no. 1 (2025): e0317362. https://doi.org/10.1371/journal.pone.0317362.

Abstract:

In speech signal processing, time-frequency analysis is commonly employed to extract the spectrogram of speech signals. While many algorithms achieve this with high-quality results, they often lack the flexibility to adjust the resolution of the extracted spectrograms. However, applications such as speech recognition and speech separation frequently require spectrograms of varying resolutions, so an algorithm's flexibility in providing different resolutions is crucial for these applications. This paper introduces the generalized S-transform and explains its fundamental theory and algorithmic implementation. By adjusting parameters, the proposed method flexibly produces spectrograms with different resolutions, offering a novel and effective approach to obtaining speech signal spectrograms. The algorithm enhances the traditional Stockwell transform (S-transform) by incorporating a low-pass filtering function and introducing two adjustable parameters. These parameters modify the Gaussian window function of the basic S-transform, resulting in a generalized S-transform with customizable time-frequency resolution. Finally, this paper presents simulation experiments using both synthesized signals and real speech data, comparing the generalized S-transform with several commonly used spectrogram extraction algorithms. The experiments demonstrate that the generalized S-transform is feasible and effective, particularly when combined with the generalized fundamental frequency profile. The results indicate that this method is viable and effective for obtaining spectrograms of speech signals and has potential applications in speech feature extraction and speech recognition. The clean speech dataset used in the experiments comes partly from a downloadable database and partly from a recorded speech set.

32

Lalla, Abderraouf, Andrea Albini, Paolo Di Barba, and Maria Evelina Mognaschi. "Spectrogram Inversion for Reconstruction of Electric Currents at Industrial Frequencies: A Deep Learning Approach." Sensors 24, no. 6 (2024): 1798. http://dx.doi.org/10.3390/s24061798.

Abstract:

In this paper, we present a deep learning approach for identifying current intensity and frequency. The reconstruction is based on measurements of the magnetic field generated by the current flowing in a conductor. Magnetic field data are collected using a magnetic probe capable of generating a spectrogram, representing the spectrum of frequencies of the magnetic field over time. These spectrograms are saved as images characterized by color density proportional to the induction field value at a given frequency. The proposed deep learning approach utilizes a convolutional neural network (CNN) with the spectrogram image as input and the current or frequency value as output. One advantage of this approach is that current estimation is achieved contactless, using a simple magnetic field probe positioned close to the conductor.

33

Horn, Skyler, and Hynek Boril. "Gender classification from speech using convolutional networks augmented with synthetic spectrograms." Journal of the Acoustical Society of America 150, no. 4 (2021): A358. http://dx.doi.org/10.1121/10.0008585.

Abstract:

Automatic gender classification from speech is an integral component of human-computer interfaces. Gender information is utilized in user authentication, speech recognizers, and human-centered intelligent agents. This study focuses on gender classification from speech spectrograms using AlexNet-inspired 2D convolutional neural networks (CNNs) trained on real samples augmented with synthetic spectrograms. A generative adversarial network (GAN) is trained to produce synthetic male/female-like speech spectrograms. In limited-training-data experiments on LibriSpeech, augmenting a training set of 200 real samples with 800 synthetic samples reduces the classifier's equal error rate from 23.7% to 1.0%. To further test the ‘quality’ of the generated samples, in a subsequent experiment the real training samples are progressively replaced (rather than augmented) with synthetic samples at ratios from 0 (all original samples preserved) to 1 (all original samples replaced by synthetic ones). Depending on the system setup, substituting 50% to 90% of the original samples with synthetic ones is found to have minimal impact on classifier performance. Finally, viewing the input CNN layers as filters that select salient spectrogram features, the learned convolutional kernels and filter outputs are studied to understand which spectrogram areas receive prominent attention in the classifier.

34

Oh, Myeonggeun, and Yong-Hoon Kim. "Statistical Approach to Spectrogram Analysis for Radio-Frequency Interference Detection and Mitigation in an L-Band Microwave Radiometer." Sensors 19, no. 2 (2019): 306. http://dx.doi.org/10.3390/s19020306.

Abstract:

For the elimination of radio-frequency interference (RFI) in a passive microwave radiometer, the threshold level is generally calculated from the mean value and standard deviation. However, the presence of RFI raises this threshold level, which can cause a serious error in the retrieved brightness temperature. In this paper, we propose a method to detect and mitigate RFI contamination using a threshold level derived from statistical criteria based on a spectrogram technique. Mean and skewness spectrograms are created from a brightness temperature spectrogram by sliding a 2-D window, in order to identify the symmetric distribution expected of a natural thermal emission signal. From the bins of the mean spectrogram that remain after removing the RFI-flagged bins in the skewness spectrogram (for data captured at 0.1-s intervals), symmetric distributions are constructed by mirroring the left side of the distribution about a varying reference position. Simultaneously, kurtosis is repeatedly calculated from these bins for each symmetric distribution, and the retrieved brightness temperature is the one whose kurtosis is closest to three. The performance is evaluated using experimental data; with a window of 100 × 100 time–frequency bins, the maximum error and root-mean-square error (RMSE) in the retrieved brightness temperature were found to be less than approximately 3 K and 1.7 K, respectively, depending on the RFI levels and cases.
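
The statistics used above can be illustrated on a single window of brightness-temperature samples: natural thermal emission is expected to be near-Gaussian, with skewness near 0 and (non-excess) kurtosis near 3, while RFI adds a one-sided tail. This is a minimal sketch, assuming simulated data and hypothetical tolerances in `looks_thermal`; it does not reproduce the paper's sliding 2-D window, mean and skewness spectrograms, or mirrored-distribution retrieval.

```python
import random


def skew_kurtosis(values):
    """Population skewness and (non-excess) kurtosis; a Gaussian gives (0, 3)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def looks_thermal(window, skew_tol=0.5, kurt_tol=0.5):
    """True if the window's samples look Gaussian (hypothetical tolerances)."""
    skew, kurt = skew_kurtosis(window)
    return abs(skew) <= skew_tol and abs(kurt - 3.0) <= kurt_tol


rng = random.Random(0)
clean = [rng.gauss(300.0, 2.0) for _ in range(20_000)]   # simulated brightness temperatures (K)
contaminated = clean[:19_000] + [340.0] * 1_000          # 5% of bins hit by simulated RFI
print(looks_thermal(clean), looks_thermal(contaminated))  # True False
```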

35

Ozturk, Ender, Fatih Erden, and Ismail Guvenc. "RF-based low-SNR classification of UAVs using convolutional neural networks." ITU Journal on Future and Evolving Technologies 2, no. 5 (2021): 39–52. http://dx.doi.org/10.52953/qjgh3217.

Abstract:

Unmanned Aerial Vehicles (UAVs), or drones, which can be considered coverage extenders for the Internet of Everything (IoE), have drawn considerable attention recently. The proliferation of drones will raise public privacy and security concerns. This paper investigates the classification of drones from Radio Frequency (RF) fingerprints in the low Signal-to-Noise Ratio (SNR) regime. We use Convolutional Neural Networks (CNNs) trained with both RF time-series images and spectrograms of the RF signals from 15 different off-the-shelf drone controllers. When using time-series signal images, the CNN extracts features from the signal transient and envelope. As the SNR decreases, this approach fails dramatically because the information in the transient is lost in the noise and the envelope is heavily distorted. In contrast to the time-series representation of the RF signals, spectrograms make it possible to focus only on the desired frequency interval, i.e., the 2.4 GHz ISM band, and filter out any signal components outside this band. These advantages provide a notable performance improvement over methods based on time-series signals. To further increase the classification accuracy of the spectrogram-based CNN, we denoise the spectrogram images by truncating them to a limited spectral density interval. By creating a single model using spectrogram images of noisy signals and tuning the CNN model parameters, we achieve classification accuracy ranging from 92% to 100% over an SNR range from -10 dB to 30 dB, which, to the best of our knowledge, significantly outperforms existing approaches.
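
One way to read "truncating them to a limited spectral density interval" is to clip each spectrogram cell in dB to a fixed range and rescale it for use as an image, so the noise floor collapses to a single value. The sketch below assumes that interpretation and an arbitrary -100 to -40 dB interval; the paper's actual limits and procedure are not stated in the abstract, and `truncate_and_scale` is a hypothetical helper.

```python
def truncate_and_scale(spec_db, floor_db, ceil_db):
    """Clip each cell (dB) to [floor_db, ceil_db], then rescale to [0, 1] for a CNN image.

    Cells at or below floor_db (mostly noise at low SNR) all map to 0.
    """
    span = ceil_db - floor_db
    return [[(min(max(v, floor_db), ceil_db) - floor_db) / span for v in row]
            for row in spec_db]


example = [[-130.0, -100.0, -70.0, -40.0, -10.0]]
print(truncate_and_scale(example, -100.0, -40.0))  # [[0.0, 0.0, 0.5, 1.0, 1.0]]
```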

36

Huh, Jiung, Huan Pham Van, Soonyoung Han, Hae-Jin Choi, and Seung-Kyum Choi. "A Data-Driven Approach for the Diagnosis of Mechanical Systems Using Trained Subtracted Signal Spectrograms." Sensors 19, no. 5 (2019): 1055. http://dx.doi.org/10.3390/s19051055.

Abstract:

Toward the prognostics and health management of mechanical systems, we propose and validate a novel, effective, data-driven fault diagnosis method. In this method, we develop a trained subtracted spectrogram, the so-called critical information map (CIM), which identifies the difference between the signal spectrograms of the normal and abnormal states. We believe this diagnosis process can be implemented autonomously, so that an engineer can use it without expert knowledge of signal processing or mechanical analysis. First, the CIM method applies sequential, autonomous procedures of time synchronization, time-frequency conversion, and spectral subtraction to the raw signal. Second, the subtracted spectrogram is trained into a CIM for a specific mechanical system failure by finding the optimal parameters and abstracted information of the spectrogram. Finally, the health status of a system can be monitored accurately, automatically, and in a timely manner by comparing the CIM with an acquired signal map. The effectiveness of the proposed method is validated on a diagnosis problem for a six-degree-of-freedom industrial robot, a non-stationary system with only a small training dataset.
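
A minimal sketch of the spectral-subtraction step and a comparison against a new measurement, assuming the spectrograms are already time-synchronized and stored as equal-sized grids. `spectral_subtract` and the score in `overlap_score` are hypothetical stand-ins: the abstract does not say how the CIM is trained or how the comparison is scored.

```python
def spectral_subtract(signal_map, reference_map):
    """Cell-wise subtraction clipped at zero: keeps energy the signal adds to the reference."""
    return [[max(s - r, 0.0) for s, r in zip(row_s, row_r)]
            for row_s, row_r in zip(signal_map, reference_map)]


def overlap_score(cim, acquired_map):
    """Share of the CIM's energy that is also present in the acquired map (0 to 1)."""
    total = sum(sum(row) for row in cim)
    shared = sum(min(c, a) for row_c, row_a in zip(cim, acquired_map)
                 for c, a in zip(row_c, row_a))
    return shared / total if total else 0.0


normal = [[1.0, 1.0], [1.0, 1.0]]   # averaged spectrogram of the healthy system
faulty = [[1.0, 3.0], [1.0, 1.5]]   # averaged spectrogram under a known fault
cim = spectral_subtract(faulty, normal)                    # [[0.0, 2.0], [0.0, 0.5]]
new = spectral_subtract([[1.0, 2.9], [1.1, 1.0]], normal)  # newly acquired map
print(round(overlap_score(cim, new), 2))                   # 0.76
```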

37

Yegnanarayana, B., and Vishala Pannala. "Processing group delay spectrograms for study of formant and harmonic contours in speech signals." Journal of the Acoustical Society of America 156, no. 4 (2024): 2422–33. http://dx.doi.org/10.1121/10.0032364.

Abstract:

This paper deals with the study of formant and harmonic contours by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. A recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband equivalent GD spectrogram, derived using modified single frequency filtering (SFF) analysis of speech signals. Harmonic frequency contours can be observed in the display of the peaks of the instantaneous narrowband equivalent GD spectrogram, also derived using modified SFF analysis. For synthetic speech signals, the observed formant contours match the ground-truth formant contours from which the signals are derived. For natural speech signals, the observed formant contours approximately match the given ground-truth contours, mostly in voiced regions. The results are illustrated for several randomly selected utterances from the TIMIT database. While this study helps to observe formant contours in the display, automatic extraction of the formant frequencies needs further processing, requiring logic to eliminate spurious points without forcing the number of formants.
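
Group delay can be computed without unwrapping phase by using the standard DFT identity τ(k) = Re(Y[k] · conj(X[k])) / |X[k]|², where X is the DFT of x[n] and Y is the DFT of n·x[n]. The sketch below uses that textbook identity with a naive O(N²) DFT; it is not the modified single frequency filtering (SFF) analysis the paper uses, and the `eps` guard for near-zero |X[k]| is an assumption.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def group_delay(x, eps=1e-12):
    """Group delay in samples at each DFT bin, with no phase unwrapping.

    tau[k] = Re(Y[k] * conj(X[k])) / |X[k]|^2, X = DFT{x[n]}, Y = DFT{n * x[n]}.
    eps keeps bins where |X[k]| is almost zero from dividing by zero.
    """
    X = dft(x)
    Y = dft([t * v for t, v in enumerate(x)])
    return [(Yk * Xk.conjugate()).real / max(abs(Xk) ** 2, eps) for Xk, Yk in zip(X, Y)]


# An impulse delayed by 3 samples has a group delay of 3 at every frequency.
x = [0.0] * 16
x[3] = 1.0
print([round(tau, 6) for tau in group_delay(x)])  # sixteen values, all 3.0
```

A pure delay of d samples has a group delay of d at every frequency, which makes a convenient sanity check.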

38

Rawat, Priyanshu, Madhvan Bajaj, Satvik Vats, and Vikrant Sharma. "A comprehensive study based on MFCC and spectrogram for audio classification." Journal of Information and Optimization Sciences 44, no. 6 (2023): 1057–74. http://dx.doi.org/10.47974/jios-1431.

Abstract:

Music classification is a music information retrieval (MIR) task that determines the category of a piece of music computationally. In recent years, deep neural networks have proven effective in numerous classification tasks, including music genre categorisation. In this paper, we conduct a comparative study of two music classification techniques. The first technique takes the audio’s spectrogram image and predicts the music’s genre from it, using a CNN model trained on spectrograms. The second approach computes Mel-Frequency Cepstral Coefficient (MFCC) features and uses them to classify the music with an ANN. This paper aims to compare the two algorithms closely on different audio signals and to examine their performance reports to determine which is better for music genre classification.

39

Lv, Dan, Yan Zhang, Danjv Lv, Jing Lu, Yixing Fu, and Zhun Li. "Combining CBAM and Iterative Shrinkage-Thresholding Algorithm for Compressive Sensing of Bird Images." Applied Sciences 14, no. 19 (2024): 8680. http://dx.doi.org/10.3390/app14198680.

Abstract:

Bird research contributes to understanding species diversity, ecosystem functions, and the maintenance of biodiversity. By analyzing bird images and audio, we can monitor bird distribution, abundance, and behavior to better understand the health of ecosystems. However, bird images and audio involve vast amounts of data. Compressive sensing can address this challenge, improving the efficiency of data transmission and storage and saving bandwidth. Compressive sensing is a technique that exploits the sparsity of signals to recover the original data from a small number of linear measurements. This paper introduces CBAM_ISTA-Net+, a deep neural network based on the Iterative Shrinkage-Thresholding Algorithm (ISTA) and a Convolutional Block Attention Module (CBAM), for the compressive reconstruction of bird images, audio Mel spectrograms, and wavelet transform spectrograms. Using 45 bird species as research subjects, including 20 for bird images, 15 for audio-generated Mel spectrograms, and 10 for audio wavelet transform (WT) spectrograms, the experimental results show that CBAM_ISTA-Net+ achieves a higher peak signal-to-noise ratio (PSNR) at different compression ratios. At a compression ratio of 50%, the average PSNR on the three datasets reaches 33.62 dB, 55.76 dB, and 38.59 dB, respectively, while both the Mel spectrogram and wavelet transform spectrogram datasets exceed 30 dB at compression ratios of 25–50%. These results highlight the effectiveness of CBAM_ISTA-Net+ in maintaining high reconstruction quality even under significant compression, demonstrating its potential as a valuable tool for efficient data management in ecological research.

40

Jiang, Hao, Jianqing Jiang, and Guoshao Su. "Rock Crack Types Identification by Machine Learning on the Sound Signal." Applied Sciences 13, no. 13 (2023): 7654. http://dx.doi.org/10.3390/app13137654.

Abstract:

Sound signals generated during rock failure contain useful information about crack development. A sound-signal-based method for identifying crack types is proposed. In this method, the sound signals of tensile cracks (from the Brazilian splitting test) and of shear cracks (from the direct shear test) are collected to build the training samples. The spectrogram is used to characterize the sound signal and serves as the model input. Because only a small number of sound-signal spectrograms could be obtained in our experiments, a pre-trained ResNet-18 is used as a feature extractor to obtain deep features of the spectrograms, addressing the small-sample problem. Gaussian process classification (GPC) is employed to build the recognition model and to classify crack types from the extracted deep features. To verify the proposed method, the tensile and shear crack development processes during the biaxial test are identified. The results show that the proposed method is feasible. Moreover, the method is used to investigate tensile and shear crack development during the rockburst process. The results are consistent with previous research, further confirming the accuracy and rationality of the method.

41

Trufanov, N. N., D. V. Churikov, and O. V. Kravchenko. "Selection of window functions for predicting the frequency pattern of vibrations of the technological process using an artificial neural network." Journal of Physics: Conference Series 2091, no. 1 (2021): 012074. http://dx.doi.org/10.1088/1742-6596/2091/1/012074.

Abstract:

The frequency pattern of the process is investigated by analyzing spectrograms constructed using the windowed Fourier transform. The set of window functions comprises a rectangular window, a membership-function window, and windows based on atomic functions. A window function is selected according to whether it satisfies the condition of improved time localization and energy concentration in the central part of the window. The resulting spectrograms are fed to the input of an artificial neural network to obtain a forecast. Varying the shape of the window functions allows us to analyze the proposed spectrogram prediction model.
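To make the effect of the window shape concrete, a minimal short-time Fourier transform spectrogram is sketched below. It uses a naive DFT and compares a rectangular window with a Hann window. The Hann window is a stand-in for a generic tapered window and is not one of the membership-function or atomic-function windows studied in the paper. The `leakage` measure (fraction of a frame's power outside the peak bin ±1) and all function names are illustrative assumptions.

```python
import cmath
import math

def rectangular(n):
    return [1.0] * n

def hann(n):
    # Periodic Hann window: tapers the frame edges toward zero
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def spectrogram(signal, frame_len, hop, window):
    """Power spectrogram |STFT|^2 as a list of frames, each with frame_len // 2 + 1 bins."""
    w = window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + i] * w[i] for i in range(frame_len)]
        # Naive DFT over non-negative frequencies (O(N^2), for illustration only)
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                      for i in range(frame_len))
            row.append(abs(acc) ** 2)
        frames.append(row)
    return frames

def leakage(row, peak_bin, half_width=1):
    """Fraction of a frame's power falling outside peak_bin +/- half_width."""
    total = sum(row)
    main = sum(row[max(peak_bin - half_width, 0):peak_bin + half_width + 1])
    return 1.0 - main / total
```

For a tone that falls between two DFT bins, for example 10.5 cycles per 64-sample frame, the rectangular window spreads noticeably more power into distant bins than the Hann window. This is the energy-concentration property the window selection criterion refers to.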

42

Dwijayanti, Suci, Alvio Yunita Putri, and Bhakti Yudho Suprapto. "Speaker Identification Using a Convolutional Neural Network." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 6, no. 1 (2022): 140–45. http://dx.doi.org/10.29207/resti.v6i1.3795.

Abstract:

Speech, a mode of communication between humans and machines, has various applications, including biometric systems that identify people who have access to secure systems. Feature extraction is an important factor in achieving high accuracy in speech recognition. Therefore, we used a spectrogram, a pictorial representation of speech in terms of raw features, to identify speakers. These features were fed into a convolutional neural network (CNN), and a CNN-visual geometry group (CNN-VGG) architecture was used to recognize the speakers. We used 780 primary data samples from 78 speakers, each of whom uttered a number in Bahasa Indonesia. The proposed architecture, CNN-VGG-f, uses a learning rate of 0.001, a batch size of 256, and 100 epochs. The results indicate that this architecture can generate a suitable model for speaker identification. A spectrogram was used to determine the best features for identifying the speakers. The proposed method exhibited an accuracy of 98.78%, which is significantly higher than the accuracies of the method using Mel-frequency cepstral coefficients (MFCCs; 34.62%) and the combination of MFCCs and deltas (26.92%). Overall, CNN-VGG-f with the spectrogram can identify 77 of the 78 speakers in the samples, validating the usefulness of combining spectrograms and CNNs in speech recognition applications.

43

Cameron, J., A. Crosby, C. Paszkowski, and E. Bayne. "Visual spectrogram scanning paired with an observation–confirmation occupancy model improves the efficiency and accuracy of bioacoustic anuran data." Canadian Journal of Zoology 98, no. 11 (2020): 733–42. http://dx.doi.org/10.1139/cjz-2020-0103.

Abstract:

Passive acoustic monitoring using autonomous recording units has improved the collection of anuran amphibian call survey data. One challenge of this approach is the time required to process the audio data. Our objective was to develop a more efficient method of processing and analyzing acoustic data through visual spectrogram scanning and the application of an observation–confirmation occupancy model. We compared detection rates between standard listening to recordings and visual scanning of spectrogram images generated with different spectrogram parameters. Relative to listening, we found that 1 min spectrograms in two 30 s frames yield the best trade-off between time efficiency and accuracy. A standard occupancy model applied to visual scanning data underestimated occupancy relative to listening data for three species and overestimated it for one species. The observation–confirmation model used a subset of listening data to improve the estimates of detection probability from visual scanning and therefore reduced bias in occupancy estimates compared with using visual scanning data alone. Overall, the combination of the visual scanning method and the observation–confirmation model allowed us to maintain the accuracy of occupancy estimates while greatly increasing the efficiency of anuran data processing. These methods are widely applicable and can increase sample size and precision for acoustic monitoring programs using autonomous recording units.

44

Zhu, Yuefan, and Xiaoying Liu. "A Lightweight CNN for Wind Turbine Blade Defect Detection Based on Spectrograms." Machines 11, no. 1 (2023): 99. http://dx.doi.org/10.3390/machines11010099.

Abstract:

Because wind turbines are exposed to harsh working environments and variable weather conditions, condition monitoring of wind turbine blades is critical to prevent unscheduled downtime and losses. Since common convolutional neural networks are difficult to deploy on embedded devices, a lightweight spectrogram-based convolutional neural network for wind turbine blades (WTBMobileNet) is proposed, reducing computation and model size while maintaining high accuracy. Compared with baseline models, WTBMobileNet without data augmentation achieves an accuracy of 97.05% with 0.315 million parameters and a computational cost of 0.423 giga floating point operations (GFLOPs); it is 9.4 times smaller and requires 2.7 times less computation than the best-performing model, with only a 1.68% decrease in accuracy. Next, the impact of different data augmentation methods is analyzed. With augmentation, WTBMobileNet achieves an accuracy of 98.1%, and the accuracy of each category is above 95%. Furthermore, the interpretability and transparency of WTBMobileNet are demonstrated through class activation mapping for reliable deployment. Finally, WTBMobileNet is explored for drone image classification and spectrogram object detection, achieving an accuracy of 89.55% and an mAP@[0.5, 0.95] of 70.7%, respectively. This shows that WTBMobileNet not only performs well in spectrogram classification but also has good application potential in drone image classification and spectrogram object detection.

45

Heim, Olga, Dennis M. Heim, Lara Marggraf, et al. "Variant maps for bat echolocation call identification algorithms." Bioacoustics 29, no. 5 (2020): 557–71. https://doi.org/10.5281/zenodo.13456968.

Abstract:

Automated ultrasonic recordings are widely used in basic and applied research to detect the presence of bats. Often, algorithms for the automated identification of species are based on a preprocessing of acoustic information that involves the generation of spectrograms. Even though this approach is technically advanced, recent surveys highlight substantially high failure rates in identifying species correctly, which calls for improved processes. Here, we tested an entirely new method: the transformation of ultrasonic recordings into variant maps. To compare this method with a spectrogram-based method, we used a database of 160 echolocation calls from eight European bat species, including species of the genus Myotis that are inherently difficult to separate based on echolocation calls. For non-Myotis species, both methods led to a 100% correct identification rate, while for Myotis species the use of variant maps led to a lower identification rate of 85.3%, compared with 91.1% achieved with a spectrogram-based method. However, combining both methods could achieve an identification rate of 94.1% for Myotis species. This result suggests combining our approach with spectrogram-based techniques to improve the automated identification of species based on acoustic information.

47

Gu, Lianglian, Guangzhi Di, Danju Lv, et al. "A Multi-Scale Feature Fusion Hybrid Convolution Attention Model for Birdsong Recognition." Applied Sciences 15, no. 8 (2025): 4595. https://doi.org/10.3390/app15084595.

Abstract:

Birdsong is a valuable indicator of biodiversity and has ecological significance. Although feature extraction has demonstrated satisfactory performance in classification, single-scale feature extraction methods may not fully capture the complexity of birdsong, potentially leading to suboptimal classification outcomes. Integrating multi-scale feature extraction and fusion enables a model to better handle scale variations, thereby enhancing its adaptability across different scales. To this end, we propose a multi-scale hybrid convolutional attention mechanism model (MUSCA). This method combines depthwise separable convolution and traditional convolution for feature extraction and incorporates self-attention and spatial attention mechanisms to refine spatial and channel features, thereby improving the effectiveness of multi-scale feature extraction. To further enhance multi-scale feature fusion, a layer-by-layer alignment feature fusion method is developed to establish deeper correlations, thereby improving classification accuracy and robustness. Using this method, we identified 20 bird species from three types of spectrogram (wavelet spectrogram, log-Mel spectrogram, and log-spectrogram), with recognition rates of 93.79%, 96.97%, and 95.44%, respectively, which are 3.26%, 1.88%, and 3.09% higher than those of the ResNet-18 model. The results indicate that the proposed MUSCA method is competitive with recent state-of-the-art methods.

48

Léonard, François. "Phase spectrogram and frequency spectrogram as new diagnostic tools." Mechanical Systems and Signal Processing 21, no. 1 (2007): 125–37. http://dx.doi.org/10.1016/j.ymssp.2005.08.011.

49

Liao, Ying. "Analysis of Rehabilitation Occupational Therapy Techniques Based on Instrumental Music Chinese Tonal Language Spectrogram Analysis." Occupational Therapy International 2022 (October 3, 2022): 1–12. http://dx.doi.org/10.1155/2022/1064441.

Abstract:

This paper provides an in-depth analysis of timbre-speech spectrograms in instrumental music, designs a model for analyzing rehabilitation occupational therapy techniques based on this spectrogram analysis, and tests the models for comparison. Starting from the mechanism of human articulation, this paper models the process of human expression as a time-varying linear system consisting of excitation, vocal tract, and radiation models. The system’s overall architecture is designed according to the characteristics of Chinese speech and everyday speech rehabilitation theory (HSL theory). Phonetic length training was realized through a dual judgment based on a temporal threshold and short-time average energy. Tone and clear-tone training were achieved using linear predictive coding (LPC) and the autocorrelation function. Isolated-word speech recognition was achieved with the DTW technique by extracting Mel-scale Frequency Cepstral Coefficient (MFCC) parameters from the speech signals. The system designs a training scene for each training module according to the extracted speech parameters, combines the multimedia speech spectrogram motion situation with the speech parameters, presents the training content as a speech spectrogram, and evaluates the training results through human-machine interaction, in order to stimulate interest in rehabilitation therapy and realize speech rehabilitation training for patients. Analysis of the pre- and post-test data showed that the p-values of all three groups were <0.05, indicating significant differences. In addition, all subjects’ behavioral data changed during the treatment. After summarizing the data, it was therefore concluded that the music therapy technique could improve the patients’ active gaze communication ability, verbal command ability, and active question-answering ability, i.e., the hypothesis of this experiment is valid. Overall, the technique of timbre-speech spectrogram analysis in instrumental music is believed to achieve a rehabilitative effect to a certain extent.
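The short-time average energy and autocorrelation steps named in this abstract are standard speech-processing building blocks. A minimal sketch of both follows. It assumes fixed-length frames, a user-chosen energy threshold, and a pitch search range of 50 to 400 Hz. These parameters and the function names are illustrative and are not taken from the paper, and the LPC and DTW components are omitted.

```python
import math

def short_time_energy(signal, frame_len, hop):
    """Average energy of each frame."""
    return [sum(s * s for s in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, hop)]

def voiced_duration(signal, sample_rate, frame_len, hop, threshold):
    """Approximate duration (s) of the frames whose average energy exceeds threshold."""
    energies = short_time_energy(signal, frame_len, hop)
    return sum(1 for e in energies if e > threshold) * hop / sample_rate

def autocorrelation_pitch(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    n = len(frame)
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag
```

For a 200 Hz tone sampled at 8 kHz, the autocorrelation peaks at a lag of 40 samples, giving an estimate of 200 Hz. The unnormalized sum slightly favours shorter lags, which helps avoid picking a multiple of the true period. Real speech would also need voicing decisions and smoothing across frames.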

50

Lopes, Marilia, Raymundo Cassani, and Tiago H. Falk. "Using CNN Saliency Maps and EEG Modulation Spectra for Improved and More Interpretable Machine Learning-Based Alzheimer’s Disease Diagnosis." Computational Intelligence and Neuroscience 2023 (February 8, 2023): 1–17. http://dx.doi.org/10.1155/2023/3198066.

Abstract:

Biomarkers based on resting-state electroencephalography (EEG) signals have emerged as a promising tool in the study of Alzheimer’s disease (AD). Recently, a state-of-the-art biomarker was found based on visual inspection of power modulation spectrograms, in which three “patches” or regions of the modulation spectrogram were proposed and used for AD diagnostics. Here, we propose the use of deep neural networks, in particular convolutional neural networks (CNNs) combined with saliency maps, trained on power modulation spectrogram inputs to find optimal patches in a data-driven manner. Experiments are conducted on EEG data collected from fifty-four participants, including 20 healthy controls, 19 patients with mild AD, and 15 moderate-to-severe AD patients. Five classification tasks are explored, including the three-class problem, early-stage detection (control vs. mild-AD), and severity level detection (mild vs. moderate-to-severe). Experimental results show that the proposed biomarkers outperform the state-of-the-art benchmark across all five tasks and reveal complementary modulation spectrogram regions not previously identified via visual inspection. Lastly, experiments are conducted on the proposed biomarkers to test their sensitivity to age, as this is a known confound in AD characterization. Across all five tasks, none of the proposed biomarkers showed a significant relationship with age, further highlighting their usefulness for automated AD diagnostics.
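The inputs in this study are power modulation spectrograms. One common way to compute such a representation is a two-stage transform: a short-time power spectrum, followed by a second spectrum of each frequency bin's power trajectory over time. A sketch under simplifying assumptions (a single channel, rectangular frames, naive DFTs, and hypothetical function names) is given below. It is not the authors' preprocessing pipeline: EEG-specific filtering, windowing, and band definitions are omitted.

```python
import cmath
import math

def dft_power(x):
    """|DFT|^2 over non-negative frequencies (naive O(N^2), for illustration)."""
    n = len(x)
    return [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))) ** 2
            for k in range(n // 2 + 1)]

def modulation_spectrogram(signal, frame_len, hop):
    """Rows: acoustic-frequency bins; columns: modulation-frequency bins."""
    # Stage 1: short-time power spectrum (rectangular frames for brevity)
    frames = [dft_power(signal[s:s + frame_len])
              for s in range(0, len(signal) - frame_len + 1, hop)]
    result = []
    for k in range(frame_len // 2 + 1):
        # Stage 2: spectrum of bin k's power trajectory over time, mean removed
        track = [f[k] for f in frames]
        mean = sum(track) / len(track)
        result.append(dft_power([v - mean for v in track]))
    return result
```

As a check, take a carrier in bin 4 of 16-sample frames whose amplitude is modulated with a 128-sample period. The envelope then spans 8 frames, so over 64 frames the modulation energy of row 4 concentrates in modulation bin 8.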
