Journal articles on the topic 'Spectrogram'

Consult the top 50 journal articles for your research on the topic 'Spectrogram.'

1

Lee, Sang-Hoon, Hyun-Wook Yoon, Hyeong-Rae Noh, Ji-Hoon Kim, and Seong-Whan Lee. "Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 14 (2021): 13198–206. http://dx.doi.org/10.1609/aaai.v35i14.17559.

Abstract:
While generative adversarial networks (GANs) based neural text-to-speech (TTS) systems have shown significant improvement in neural speech synthesis, there is no TTS system to learn to synthesize speech from text sequences with only adversarial feedback. Because adversarial feedback alone is not sufficient to train the generator, current models still require the reconstruction loss compared with the ground-truth and the generated mel-spectrogram directly. In this paper, we present Multi-SpectroGAN (MSG), which can train the multi-speaker model with only the adversarial feedback by conditioning
2

Johnson, Alexander. "An integrated approach for teaching speech spectrogram analysis to engineering students." Journal of the Acoustical Society of America 152, no. 3 (2022): 1962–69. http://dx.doi.org/10.1121/10.0014172.

Abstract:
Spectrogram analysis is a vital skill for learning speech acoustics. Spectrograms are necessary for visualizing cause-effect relationships between speech articulator movements and the resulting sound produced. However, many interpretation techniques needed to read spectrograms are counterintuitive to engineering students who have been taught to use more rigid mathematical formulas. As a result, spectrogram reading is often challenging for these students who do not have prior background in acoustic phonetics. In this paper, a structured, inclusive framework for teaching spectrogram reading to s
3

Kim, Seong-Yoon, Hyun-Min Lee, Chae-Young Lim, and Hyun-Woo Kim. "Detection of Abnormal Symptoms Using Acoustic-Spectrogram-Based Deep Learning." Applied Sciences 15, no. 9 (2025): 4679. https://doi.org/10.3390/app15094679.

Abstract:
Acoustic data inherently contain a variety of information, including indicators of abnormal symptoms. In this study, we propose a method for detecting abnormal symptoms by converting acoustic data into spectrogram representations and applying a deep learning model. Spectrograms effectively capture the temporal and frequency characteristics of acoustic signals. In this work, we extract key features such as spectrograms, Mel-spectrograms, and MFCCs from raw acoustic data and use them as input for training a convolutional neural network. The proposed model is based on a custom ResNet architecture
4

Basak, Gopal K., and Tridibesh Dutta. "Statistical Speaker Identification Based on Spectrogram Imaging." Calcutta Statistical Association Bulletin 59, no. 3-4 (2007): 253–63. http://dx.doi.org/10.1177/0008068320070309.

Abstract:
The paper addresses the problem of speaker identification based on spectrograms in the text dependent case. Using spectrogram segmentation, this paper, mainly, focusses on understanding the complex patterns in frequency and amplitude in an utterance of a given word by an individual. The features used for identifying a speaker based on an observed variable extracted from the spectrograms, rely on the distinct speaker effect, his/her interaction effect with the particular word and with the frequency bands of the spectrogram. Performance of this novel approach on spectrogram samples, co
5

Han, Ying, Qiao Wang, Jianping Huang, et al. "Frequency Extraction of Global Constant Frequency Electromagnetic Disturbances from Electric Field VLF Data on CSES." Remote Sensing 15, no. 8 (2023): 2057. http://dx.doi.org/10.3390/rs15082057.

Abstract:
The electromagnetic data observed with the CSES (China Seismo-Electromagnetic Satellite, also known as Zhangheng-1 satellite) contain numerous spatial disturbances. These disturbances exhibit various shapes on the spectrogram, and constant frequency electromagnetic disturbances (CFEDs), such as artificially transmitted very-low-frequency (VLF) radio waves, power line harmonics, and interference from the satellite platform itself, appear as horizontal lines. To exploit this feature, we proposed an algorithm based on computer vision technology that automatically recognizes these lines on the spe
6

You, Shingchern D., Kai-Rong Lin, and Chien-Hung Liu. "Estimating Classification Accuracy for Unlabeled Datasets Based on Block Scaling." International Journal of Engineering and Technology Innovation 13, no. 4 (2023): 313–27. http://dx.doi.org/10.46604/ijeti.2023.11975.

Abstract:
This paper proposes an approach called block scaling quality (BSQ) for estimating the prediction accuracy of a deep network model. The basic operation perturbs the input spectrogram by multiplying all values within a block by a scaling factor, which is equal to 0 in the experiments. The ratio of perturbed spectrograms that have different prediction labels than the original spectrogram to the total number of perturbed spectrograms indicates how much of the spectrogram is crucial for the prediction. Thus, this ratio is inversely correlated with the accuracy of the dataset. The BSQ approach demonstrates satisfac
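The block-scaling operation described in the abstract of entry 6 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `block_scaling_ratio`, the non-overlapping tiling of blocks, the 2x2 block size, and the toy classifier are all hypothetical. Only the multiply-a-block-by-0 step and the changed-label ratio come from the abstract.

```python
import numpy as np

def block_scaling_ratio(spec, predict, block=(2, 2), scale=0.0):
    """Fraction of block-perturbed spectrograms whose predicted label changes.

    Assumption: blocks tile the spectrogram without overlap. The abstract
    only states that all values within a block are multiplied by a factor
    that equals 0 in the experiments.
    """
    base_label = predict(spec)
    block_h, block_w = block
    changed = 0
    total = 0
    for r in range(0, spec.shape[0], block_h):
        for c in range(0, spec.shape[1], block_w):
            perturbed = spec.copy()
            perturbed[r:r + block_h, c:c + block_w] *= scale
            changed += int(predict(perturbed) != base_label)
            total += 1
    return changed / total

def toy_predict(s):
    """Hypothetical classifier: label 1 if the top-left corner has energy."""
    return int(s[:2, :2].sum() > 3.5)

spec = np.ones((4, 4))          # stand-in for a real spectrogram
ratio = block_scaling_ratio(spec, toy_predict)
```

In this toy case only one of the four blocks affects the prediction, so the ratio is 0.25.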
7

Li, Hong Ping, and Hong Li. "Establish an Artificial Neural Networks Model to Make Quantitative Analysis about the Capillary Electrophoresis Spectrum." Advanced Materials Research 452-453 (January 2012): 1116–20. http://dx.doi.org/10.4028/www.scientific.net/amr.452-453.1116.

Abstract:
Simulating the overlapping capillary electrophoresis spectrogram under the dissimilar conditions by the computer system, Choosing the overlapping capillary electrophoresis spectrogram simulated under the different conditions, processing the data to compose a neural network training regulations, Applying the artificial neural networks method to make a quantitative analysis about the multi-component in the overlapping capillary electrophoresis spectrogram, Using: Radial direction primary function neural network model and multi-layered perceptron neural network model. The findings indicated that
8

Li, Juan, Xueying Zhang, Lixia Huang, Fenglian Li, Shufei Duan, and Ying Sun. "Speech Emotion Recognition Using a Dual-Channel Complementary Spectrogram and the CNN-SSAE Neutral Network." Applied Sciences 12, no. 19 (2022): 9518. http://dx.doi.org/10.3390/app12199518.

Abstract:
In the background of artificial intelligence, the realization of smooth communication between people and machines has become the goal pursued by people. Mel spectrograms is a common method used in speech emotion recognition, focusing on the low-frequency part of speech. In contrast, the inverse Mel (IMel) spectrogram, which focuses on the high-frequency part, is proposed to comprehensively analyze emotions. Because the convolutional neural network-stacked sparse autoencoder (CNN-SSAE) can extract deep optimized features, the Mel-IMel dual-channel complementary structure is proposed. In the fir
9

Pethiyagoda, Ravindra, Scott W. McCue, and Timothy J. Moroney. "Spectrograms of ship wakes: identifying linear and nonlinear wave signals." Journal of Fluid Mechanics 811 (December 6, 2016): 189–209. http://dx.doi.org/10.1017/jfm.2016.753.

Abstract:
A spectrogram is a useful way of using short-time discrete Fourier transforms to visualise surface height measurements taken of ship wakes in real-world conditions. For a steadily moving ship that leaves behind small-amplitude waves, the spectrogram is known to have two clear linear components, a sliding-frequency mode caused by the divergent waves and a constant-frequency mode for the transverse waves. However, recent observations of high-speed ferry data have identified additional components of the spectrograms that are not yet explained. We use computer simulations of linear and nonlinear s
10

Godbole, Shubham, Vaishnavi Jadhav, and Gajanan Birajdar. "Indian Language Identification using Deep Learning." ITM Web of Conferences 32 (2020): 01010. http://dx.doi.org/10.1051/itmconf/20203201010.

Abstract:
Spoken language is the most regular method of correspondence in this day and age. Endeavours to create language recognizable proof frameworks for Indian dialects have been very restricted because of the issue of speaker accessibility and language readability. However, the necessity of SLID is expanding for common and safeguard applications day by day. Feature extraction is a basic and important procedure performed in LID. A sound example is changed over into a spectrogram visual portrayal which describes a range of frequencies in regard with time. Three such spectrogram visuals were generated
11

Samad, Salina Abdul, and Aqilah Baseri Huddin. "Improving spectrogram correlation filters with time-frequency reassignment for bio-acoustic signal classification." Indonesian Journal of Electrical Engineering and Computer Science 14, no. 1 (2019): 59–64. http://dx.doi.org/10.11591/ijeecs.v14.i1.pp59-64.

Abstract:
Spectrogram features have been used to automatically classify animals based on their vocalization. Usually, features are extracted and used as inputs to classifiers to distinguish between species. In this paper, a classifier based on Correlation Filters (CFs) is employed where the input features are the spectrogram image themselves. Spectrogram parameters are carefully selected based on the target dataset in order to obtain clear distinguishing images termed as call-prints. An even better representation of the call-prints is obtained using spectrogram Time-Frequency (TF) reassignment.
12

Samad, Salina Abdul, and Aqilah Baseri Huddin. "Improving spectrogram correlation filters with time-frequency reassignment for bio-acoustic signal classification." Indonesian Journal of Electrical Engineering and Computer Science 14, no. 1 (2019): 59–64. https://doi.org/10.11591/ijeecs.v14.i1.pp59-64.

Abstract:
Spectrogram features have been used to automatically classify animals based on their vocalization. Usually features are extracted and used as inputs to classifiers to distinguish between species. In this paper, a classifier based on Correlation Filters (CFs) is employed where the input features are the spectrogram image themselves. Spectrogram parameters are carefully selected based on the target dataset in order to obtain clear distinguishing images termed as call-prints. An even better representations of the call-prints are obtained using spectrogram Time-Frequency (TF) reassignment. To demo
13

Tucker, Jeff, Kathleen E. Wage, John R. Buck, and Lora J. Van Uffelen. "Performance weighted blended spectrogram." Journal of the Acoustical Society of America 157, no. 3 (2025): 2106–16. https://doi.org/10.1121/10.0036216.

Abstract:
Spectrograms are used for time-frequency analysis and as preprocessing for signal classifiers and other algorithms. The conventional spectrogram is a tapered short-time Fourier transform, equivalent to a bank of bandpass filters. The taper defines filter-bank characteristics such as bandwidth and sidelobe levels. Although the conventional spectrogram uses minimal computational resources, its design requires a compromise between resolution and interference suppression. Adaptive spectrogram algorithms adjust the filter-bank based on incoming data, thereby allowing different bandwidth/sidelobe tr
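The abstract of entry 13 describes the conventional spectrogram as a tapered short-time Fourier transform. A minimal sketch of that idea is given here for orientation. The Hann taper, the frame length of 256, the hop of 128, the 8 kHz sample rate, and the function name `spectrogram` are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def spectrogram(x, frame_len=256, hop=128):
    """Squared-magnitude STFT of x using a Hann taper (an illustrative choice)."""
    taper = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the taper to each.
    frames = np.stack([x[i * hop : i * hop + frame_len] * taper
                       for i in range(n_frames)])
    # One row per time step, one column per frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

fs = 8000                              # sample rate in Hz (assumed)
t = np.arange(fs) / fs                 # one second of samples
x = np.sin(2 * np.pi * 1000 * t)       # 1 kHz test tone
S = spectrogram(x)
peak_bin = int(np.argmax(S.mean(axis=0)))
peak_hz = peak_bin * fs / 256          # bin index converted to Hz
```

Swapping `np.hanning` for another taper changes the main-lobe width and sidelobe levels of each frequency bin, which is the resolution versus interference trade-off the abstract mentions.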
14

Franzoni, Valentina. "Cross-domain synergy: Leveraging image processing techniques for enhanced sound classification through spectrogram analysis using CNNs." Journal of Autonomous Intelligence 6, no. 3 (2023): 678. http://dx.doi.org/10.32629/jai.v6i3.678.

Abstract:
In this paper, the innovative approach to sound classification by exploiting the potential of image processing techniques applied to spectrogram representations of audio signals is reviewed. This study shows the effectiveness of incorporating well-established image processing methodologies, such as filtering, segmentation, and pattern recognition, to enhance the feature extraction and classification performance of audio signals when transformed into spectrograms. An overview is provided of the mathematical methods shared by both image and spectrogram-based audio processing, focusing o
15

Yu, Youxin, Wenbo Zhu, Xiaoli Ma, et al. "Recognition of Sheep Feeding Behavior in Sheepfolds Using Fusion Spectrogram Depth Features and Acoustic Features." Animals 14, no. 22 (2024): 3267. http://dx.doi.org/10.3390/ani14223267.

Abstract:
In precision feeding, non-contact and pressure-free monitoring of sheep feeding behavior is crucial for health monitoring and optimizing production management. The experimental conditions and real-world environments differ when using acoustic sensors to identify sheep feeding behaviors, leading to discrepancies and consequently posing challenges for achieving high-accuracy classification in complex production environments. This study enhances the classification performance by integrating the deep spectrogram features and acoustic characteristics associated with feeding behavior. We conducted t
16

China Venkateswarlu, Guide: Dr S. "Speech Enhancement Using Spectrogram Denoising with Deep U-Net Architectures." International Journal of Scientific Research in Engineering and Management 09, no. 05 (2025): 1–9. https://doi.org/10.55041/ijsrem48929.

Abstract:
Acoustic noise significantly degrades speech quality and intelligibility in almost all applications, ranging from telecommunications to voice assistants. In this paper, we address this problem by designing an efficient speech enhancement system based on deep learning. Our approach relies on spectrogram denoising, wherein audio signals are represented as 2D magnitude spectrograms that well maintain signal structure and enable direct application of Convolutional Neural Networks (CNNs). The backbone of our system is a U-Net model, which is a strong deep convolutional autoencoder capab
17

Li, Chunhui, Xin Xiang, Hu Mao, Rui Wang, and Yonglei Qi. "Anchor-Free SNR-Aware Signal Detector for Wideband Signal Detection Framework." Electronics 14, no. 11 (2025): 2260. https://doi.org/10.3390/electronics14112260.

Abstract:
The spectrogram-based wideband signal detection framework has garnered increasing attention in various wireless communication applications. However, the front-end spectrograms in existing methods suffer from visual and informational deficiencies. This paper proposes a novel multichannel enhanced spectrogram (MCE spectrogram) to address these issues. The MCE spectrogram leverages additional channels for both visual and informational enhancement, highlighting signal regions and features while integrating richer recognition information across channels, thereby significantly improving feature extr
18

Hussein, Alia, Ahmed Talib Abdulameer, Ali Abdulkarim, Husniza Husni, and Dalia Al-Ubaidi. "Classification of Dyslexia Among School Students Using Deep Learning." Journal of Techniques 6, no. 1 (2024): 85–92. http://dx.doi.org/10.51173/jt.v6i1.1893.

Abstract:
Dyslexia is a common learning disorder that affects children’s reading and writing skills. Early identification of Dyslexia is essential for providing appropriate interventions and support to affected children. Traditional methods of diagnosing Dyslexia often rely on subjective assessments and the expertise of specialists, leading to delays and potential inaccuracies in diagnosis. This study proposes a novel approach for diagnosing dyslexic children using spectrogram analysis and convolutional neural networks (CNNs). Spectrograms are visual representations of audio signals that provide detaile
19

Smietanka, Lukasz, and Tomasz Maka. "Enhancing Embedded Space with Low–Level Features for Speech Emotion Recognition." Applied Sciences 15, no. 5 (2025): 2598. https://doi.org/10.3390/app15052598.

Abstract:
This work proposes an approach that uses a feature space by combining the representation obtained in the unsupervised learning process and manually selected features defining the prosody of the utterances. In the experiments, we used two time-frequency representations (Mel and CQT spectrograms) and EmoDB and RAVDESS databases. As the results show, the proposed system improved the classification accuracy of both representations: 1.29% for CQT and 3.75% for Mel spectrogram compared to the typical CNN architecture for the EmoDB dataset and 3.02% for CQT and 0.63% for Mel spectrogram in the case o
20

Jenkins, William F., Peter Gerstoft, Chih-Chieh Chien, and Emma Ozanich. "Reducing dimensionality of spectrograms using convolutional autoencoders." Journal of the Acoustical Society of America 153, no. 3_supplement (2023): A178. http://dx.doi.org/10.1121/10.0018582.

Abstract:
Under the “curse of dimensionality,” distance-based algorithms, such as k-means or Gaussian mixture model clustering, can lose meaning and interpretability in high-dimensional space. Acoustic data, specifically spectrograms, are subject to such limitations due to their high dimensionality: for example, a spectrogram with 100 time- and 100 frequency-bins contains 10^4 pixels, and its vectorized form constitutes a point in 10^4-dimensional space. In this talk, we look at four papers that used autoencoding convolutional neural networks to extract salient features of real data. The convolutional aut
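The dimensionality argument in the abstract of entry 20 can be illustrated with a small numerical sketch. It flattens a 100 by 100 spectrogram into a vector and then compares how much Euclidean distances from a query point vary in 2 dimensions versus in that flattened dimension. The uniform random data, the sample size of 200, and the helper name `relative_contrast` are assumptions for illustration and do not come from the talk.

```python
import numpy as np

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, dim))   # uniform random points (assumed data)
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

n_time, n_freq = 100, 100
dim = np.zeros((n_time, n_freq)).reshape(-1).size   # length of the vectorized form
low = relative_contrast(2)       # distances differ widely in 2 dimensions
high = relative_contrast(dim)    # distances bunch together in high dimension
```

A small relative contrast means the nearest and farthest points are almost equally far away, which is one way distance-based clustering loses meaning in high-dimensional space.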
21

Hajihashemi, Vahid, Abdorreza Alavi Gharahbagh, Narges Hajaboutalebi, Mohsen Zahraei, José J. M. Machado, and João Manuel R. S. Tavares. "A Feature-Reduction Scheme Based on a Two-Sample t-Test to Eliminate Useless Spectrogram Frequency Bands in Acoustic Event Detection Systems." Electronics 13, no. 11 (2024): 2064. http://dx.doi.org/10.3390/electronics13112064.

Abstract:
Acoustic event detection (AED) systems, combined with video surveillance systems, can enhance urban security and safety by automatically detecting incidents, supporting the smart city concept. AED systems mostly use mel spectrograms as a well-known effective acoustic feature. The spectrogram is a combination of frequency bands. A big challenge is that some of the spectrogram bands may be similar in different events and be useless in AED. Removing useless bands reduces the input feature dimension and is highly desirable. This article proposes a mathematical feature analysis method to identify a
22

Mahmoudi, Omayma, Naoufal El Allali, and Mouncef Filali Bouami. "AMSVT: audio Mel-spectrogram vision transformer for spoken Arabic digit recognition." Indonesian Journal of Electrical Engineering and Computer Science 35, no. 2 (2024): 1013–21. http://dx.doi.org/10.11591/ijeecs.v35.i2.pp1013-1021.

Abstract:
This work presents a novel model to recognize spoken digits in the Arabic language. Due to the transformer-based models' tremendous success in natural language processing (NLP), several attempts have been made to extend transformer-based designs to other domains, such as vision and audio. However, our approach consists of extracting and inputting Mel-spectrogram features into our model of the proposed audio Mel-spectrogram vision transformer (AMSVT) for training. The signal processing community has been interested in these models due to the successful use of vision transformers (ViT) in severa
23

Mahmoudi, Omayma, Naoufal El Allali, and Mouncef Filali Bouami. "AMSVT: audio Mel-spectrogram vision transformer for spoken Arabic digit recognition." Indonesian Journal of Electrical Engineering and Computer Science 35, no. 2 (2024): 1013–21. https://doi.org/10.11591/ijeecs.v35.i2.pp1013-1021.

Abstract:
This work presents a novel model to recognize spoken digits in the Arabic language. Due to the transformer-based models' tremendous success in natural language processing (NLP), several attempts have been made to extend transformer-based designs to other domains, such as vision and audio. However, our approach consists of extracting and inputting Mel-spectrogram features into our model of the proposed audio Mel-spectrogram vision transformer (AMSVT) for training. The signal processing community has been interested in these models due to the successful use of vision transformers (ViT) in severa
24

Zhang, Kuiyuan, Zhongyun Hua, Rushi Lan, Yifang Guo, Yushu Zhang, and Guoai Xu. "Multi-View Collaborative Learning Network for Speech Deepfake Detection." Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 1 (2025): 1075–83. https://doi.org/10.1609/aaai.v39i1.32094.

Abstract:
As deep learning techniques advance rapidly, deepfake speech synthesized through text-to-speech or voice conversion networks is becoming increasingly realistic, posing significant challenges for detection and raising potential threats to social security. This growing realism has prompted extensive research in speech deepfake detection. However, current detection methods primarily focus on extracting features from either the raw waveform or the spectrogram, often overlooking the valuable correspondences between these two modalities that could enhance the detection of previously unseen types of
25

Choi, Byung-Moon, Ji Yeon Yim, Hangsik Shin, and Gyujeong Noh. "Novel Analgesic Index for Postoperative Pain Assessment Based on a Photoplethysmographic Spectrogram and Convolutional Neural Network: Observational Study." Journal of Medical Internet Research 23, no. 2 (2021): e23920. http://dx.doi.org/10.2196/23920.

Abstract:
Background: Although commercially available analgesic indices based on biosignal processing have been used to quantify nociception during general anesthesia, their performance is low in conscious patients. Therefore, there is a need to develop a new analgesic index with improved performance to quantify postoperative pain in conscious patients. Objective: This study aimed to develop a new analgesic index using photoplethysmogram (PPG) spectrograms and a convolutional neural network (CNN) to objectively assess pain in conscious patients. Methods: PPGs were obtained from a group of surgical patients
26

Ferreira, Diogo R., Tiago A. Martins, and Paulo Rodrigues. "Explainable deep learning for the analysis of MHD spectrograms in nuclear fusion." Machine Learning: Science and Technology 3, no. 1 (2021): 015015. http://dx.doi.org/10.1088/2632-2153/ac44aa.

Abstract:
In the nuclear fusion community, there are many specialized techniques to analyze the data coming from a variety of diagnostics. One of such techniques is the use of spectrograms to analyze the magnetohydrodynamic (MHD) behavior of fusion plasmas. Physicists look at the spectrogram to identify the oscillation modes of the plasma, and to study instabilities that may lead to plasma disruptions. One of the major causes of disruptions occurs when an oscillation mode interacts with the wall, stops rotating, and becomes a locked mode. In this work, we use deep learning to predict the occurr
27

He, Yuan, Xinyu Li, Runlong Li, Jianping Wang, and Xiaojun Jing. "A Deep-Learning Method for Radar Micro-Doppler Spectrogram Restoration." Sensors 20, no. 17 (2020): 5007. http://dx.doi.org/10.3390/s20175007.

Abstract:
Radio frequency interference, which makes it difficult to produce high-quality radar spectrograms, is a major issue for micro-Doppler-based human activity recognition (HAR). In this paper, we propose a deep-learning-based method to detect and cut out the interference in spectrograms. Then, we restore the spectrograms in the cut-out region. First, a fully convolutional neural network (FCN) is employed to detect and remove the interference. Then, a coarse-to-fine generative adversarial network (GAN) is proposed to restore the part of the spectrogram that is affected by the interferences. The sim
28

Kwon, Daehyun, Hanbit Kang, Dongwoo Lee, and Yoon-Chul Kim. "Deep learning-based prediction of atrial fibrillation from polar transformed time-frequency electrocardiogram." PLOS ONE 20, no. 3 (2025): e0317630. https://doi.org/10.1371/journal.pone.0317630.

Abstract:
Portable and wearable electrocardiogram (ECG) devices are increasingly utilized in healthcare for monitoring heart rhythms and detecting cardiac arrhythmias or other heart conditions. The integration of ECG signal visualization with AI-based abnormality detection empowers users to independently and confidently assess their physiological signals. In this study, we investigated a novel method for visualizing ECG signals using polar transformations of short-time Fourier transform (STFT) spectrograms and evaluated the performance of deep convolutional neural networks (CNNs) in predicting atrial fi
29

Fudholi, Dzikri Rahadian, Muhammad Auzan, and Novia Arum Sari. "Spectrogram Window Comparison: Cough Sound Recognition using Convolutional Neural Network." IJCCS (Indonesian Journal of Computing and Cybernetics Systems) 16, no. 3 (2022): 261. http://dx.doi.org/10.22146/ijccs.75697.

Abstract:
Cough is one of the most common symptoms of diseases, especially respiratory diseases. Quick cough detection can be the key to the current pandemic of COVID-19. Good cough recognition is the one that uses non-intrusive tools such as a mobile phone microphone that does not disable human activities like stick sensors. To do sound-only detection, Deep Learning current best method Convolutional Neural Network (CNN) is used. However, CNN needs image input while sound input differs (one dimension rather than two). An extra process is needed, converting sound data to image data using a spectrogram. W
30

Liu, Haohe, Xubo Liu, Qiuqiang Kong, Wenwu Wang, and Mark D. Plumbley. "Learning Temporal Resolution in Spectrogram for Audio Classification." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 12 (2024): 13873–81. http://dx.doi.org/10.1609/aaai.v38i12.29294.

Abstract:
The audio spectrogram is a time-frequency representation that has been widely used for audio classification. One of the key attributes of the audio spectrogram is the temporal resolution, which depends on the hop size used in the Short-Time Fourier Transform (STFT). Previous works generally assume the hop size should be a constant value (e.g., 10 ms). However, a fixed temporal resolution is not always optimal for different types of sound. The temporal resolution affects not only classification accuracy but also computational cost. This paper proposes a novel method, DiffRes, that enables diffe
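The abstract of entry 30 notes that the temporal resolution of a spectrogram depends on the STFT hop size and that it also affects computational cost. The short sketch below shows how the number of frames for one second of audio scales with the hop. The 16 kHz sample rate, the 25 ms window, and the helper name `n_frames` are assumptions for illustration. Only the 10 ms hop is mentioned in the abstract, as a typical constant value.

```python
def n_frames(n_samples, win, hop):
    """Number of full frames when no padding is applied (an assumed convention)."""
    return 1 + (n_samples - win) // hop

fs = 16000                      # sample rate in Hz (assumed)
win = 25 * fs // 1000           # 25 ms analysis window in samples (assumed)

# Frames produced for one second of audio at three hop sizes, in milliseconds.
frames = {hop_ms: n_frames(fs, win, hop_ms * fs // 1000)
          for hop_ms in (5, 10, 20)}
```

Halving the hop roughly doubles the number of frames a downstream classifier has to process, which is the cost side of the trade-off.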
31

Jiashen, Li, and Zhang Xianwu. "Extracting speech spectrogram of speech signal based on generalized S-transform." PLOS ONE 20, no. 1 (2025): e0317362. https://doi.org/10.1371/journal.pone.0317362.

Abstract:
In speech signal processing, time-frequency analysis is commonly employed to extract the spectrogram of speech signals. While many algorithms exist to achieve this with high-quality results, they often lack the flexibility to adjust the resolution of the extracted spectrograms. However, applications such as speech recognition and speech separation frequently require spectrograms of varying resolutions. The flexibility of an algorithm in providing different resolutions is crucial for these applications. This paper introduces the generalized S-transform, and explains its fundamental theory and a
32

Lalla, Abderraouf, Andrea Albini, Paolo Di Barba, and Maria Evelina Mognaschi. "Spectrogram Inversion for Reconstruction of Electric Currents at Industrial Frequencies: A Deep Learning Approach." Sensors 24, no. 6 (2024): 1798. http://dx.doi.org/10.3390/s24061798.

Abstract:
In this paper, we present a deep learning approach for identifying current intensity and frequency. The reconstruction is based on measurements of the magnetic field generated by the current flowing in a conductor. Magnetic field data are collected using a magnetic probe capable of generating a spectrogram, representing the spectrum of frequencies of the magnetic field over time. These spectrograms are saved as images characterized by color density proportional to the induction field value at a given frequency. The proposed deep learning approach utilizes a convolutional neural network (CNN) w
33

Horn, Skyler, and Hynek Boril. "Gender classification from speech using convolutional networks augmented with synthetic spectrograms." Journal of the Acoustical Society of America 150, no. 4 (2021): A358. http://dx.doi.org/10.1121/10.0008585.

Abstract:
Automatic gender classification from speech is an integral component of human-computer interfaces. Gender information is utilized in user authentication, speech recognizers, or human-centered intelligent agents. This study focuses on gender classification from speech spectrograms using AlexNet-inspired 2D convolutional neural networks (CNN) trained on real samples augmented with synthetic spectrograms. A generative adversarial network (GAN) is trained to produce synthetic male/female-like speech spectrograms. In limited training data experiments on LibriSpeech, augmenting a training set of 200
34

Oh, Myeonggeun, and Yong-Hoon Kim. "Statistical Approach to Spectrogram Analysis for Radio-Frequency Interference Detection and Mitigation in an L-Band Microwave Radiometer." Sensors 19, no. 2 (2019): 306. http://dx.doi.org/10.3390/s19020306.

Abstract:
For the elimination of radio-frequency interference (RFI) in a passive microwave radiometer, the threshold level is generally calculated from the mean value and standard deviation. However, a serious problem that can arise is an error in the retrieved brightness temperature from a higher threshold level owing to the presence of RFI. In this paper, we propose a method to detect and mitigate RFI contamination using the threshold level from statistical criteria based on a spectrogram technique. Mean and skewness spectrograms are created from a brightness temperature spectrogram by shifting the 2-
35

Ozturk, Ender, Fatih Erden, and Ismail Guvenc. "RF-based low-SNR classification of UAVs using convolutional neural networks." ITU Journal on Future and Evolving Technologies 2, no. 5 (2021): 39–52. http://dx.doi.org/10.52953/qjgh3217.

Abstract:
Unmanned Aerial Vehicles (UAVs), or drones, which can be considered as a coverage extender for Internet of Everything (IoE), have drawn high attention recently. The proliferation of drones will raise privacy and security concerns in public. This paper investigates the problem of classification of drones from Radio Frequency (RF) fingerprints at the low Signal-to-Noise Ratio (SNR) regime. We use Convolutional Neural Networks (CNNs) trained with both RF time-series images and the spectrograms of 15 different off-the-shelf drone controller RF signals. When using time-series signal images, the CNN
36

Huh, Jiung, Huan Pham Van, Soonyoung Han, Hae-Jin Choi, and Seung-Kyum Choi. "A Data-Driven Approach for the Diagnosis of Mechanical Systems Using Trained Subtracted Signal Spectrograms." Sensors 19, no. 5 (2019): 1055. http://dx.doi.org/10.3390/s19051055.

Abstract:
Toward the prognostics and health management of mechanical systems, we propose and validate a novel, effective, data-driven fault diagnosis method. In this method, we develop a trained subtracted spectrogram, the so-called critical information map (CIM), identifying the difference between the signal spectrograms of normal and abnormal status. We believe this diagnosis process may be implemented in an autonomous manner so that an engineer employs it without expert knowledge in signal processing or mechanical analyses. Firstly, the CIM method applies sequential and autonomous procedures of time-sy
37

Yegnanarayana, B., and Vishala Pannala. "Processing group delay spectrograms for study of formant and harmonic contours in speech signals." Journal of the Acoustical Society of America 156, no. 4 (2024): 2422–33. http://dx.doi.org/10.1121/10.0032364.

Abstract:
This paper deals with the study of formant and harmonic contours by processing the group delay (GD) spectrograms of speech signals. The GD spectrum is the negative derivative of the phase spectrum with respect to frequency. A recent study shows that the GD spectrogram can be obtained without phase wrapping. Formant frequency contours can be observed in the display of the peaks of the instantaneous wideband equivalent GD spectrogram, derived using the modified single frequency filtering (SFF) analysis of speech signals. Harmonic frequency contours can be observed in the display of the peaks of the in
38

Rawat, Priyanshu, Madhvan Bajaj, Satvik Vats, and Vikrant Sharma. "A comprehensive study based on MFCC and spectrogram for audio classification." Journal of Information and Optimization Sciences 44, no. 6 (2023): 1057–74. http://dx.doi.org/10.47974/jios-1431.

Abstract:
Music Assortment is a music information retrieval (MIR) function to decide music connotation computationally. In recent years, deep neural networks have been proven to be effective in numerous classification tasks, including music genre categorisation. In this paper, we employ a comparative study between the two different music classification techniques. The first technique uses the audio’s spectrogram image and computes the music’s genre based on its spectrogram, using the CNN model trained on the spectrograms. The second approach computes the MFCC’s (Mel-Frequency Cepstral Coefficients) musi
39

Lv, Dan, Yan Zhang, Danjv Lv, Jing Lu, Yixing Fu, and Zhun Li. "Combining CBAM and Iterative Shrinkage-Thresholding Algorithm for Compressive Sensing of Bird Images." Applied Sciences 14, no. 19 (2024): 8680. http://dx.doi.org/10.3390/app14198680.

Abstract:
Bird research contributes to understanding species diversity, ecosystem functions, and the maintenance of biodiversity. By analyzing bird images and the audio of birds, we can monitor bird distribution, abundance, and behavior to better understand the health of ecosystems. However, bird images and audio involve a vast amount of data. Compressive sensing can overcome this challenge by improving the efficiency of data transmission and storage and by saving bandwidth. Compressive sensing is a technique that uses the sparsity of signals to recover original data from a small number of linear m
40

Jiang, Hao, Jianqing Jiang, and Guoshao Su. "Rock Crack Types Identification by Machine Learning on the Sound Signal." Applied Sciences 13, no. 13 (2023): 7654. http://dx.doi.org/10.3390/app13137654.

Abstract:
Sound signals generated during rock failure contain useful information about crack development. A sound-signal-based identification method for crack types is proposed. In this method, the sound signals of tensile cracks, using the Brazilian splitting test, and those of shear cracks, using the direct shear test, are collected to establish the training samples. The spectrogram is used to characterize the sound signal and is taken as the input. To solve the small sample problem, since only a small number of sound signal spectrograms can be obtained in our experimental test, pre-trained ResNet-18 i
41

Trufanov, N. N., D. V. Churikov, and O. V. Kravchenko. "Selection of window functions for predicting the frequency pattern of vibrations of the technological process using an artificial neural network." Journal of Physics: Conference Series 2091, no. 1 (2021): 012074. http://dx.doi.org/10.1088/1742-6596/2091/1/012074.

Abstract:
The frequency pattern of the process is investigated by analyzing spectrograms constructed using the window Fourier transform. A set of window functions consists of a rectangular, membership, and windows based on atomic functions. The fulfillment of the condition for improving the time localization and energy concentration in the central part of the window allows one to select a window function. The resulting spectrograms are fed to the input of an artificial neural network to obtain a forecast. Varying the shape of the window functions allows us to analyze the proposed spectrogram pr
42

Dwijayanti, Suci, Alvio Yunita Putri, and Bhakti Yudho Suprapto. "Speaker Identification Using a Convolutional Neural Network." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 6, no. 1 (2022): 140–45. http://dx.doi.org/10.29207/resti.v6i1.3795.

Abstract:
Speech, a mode of communication between humans and machines, has various applications, including biometric systems for identifying people who have access to secure systems. Feature extraction is an important factor in speech recognition with high accuracy. Therefore, we implemented a spectrogram, which is a pictorial representation of speech in terms of raw features, to identify speakers. These features were inputted into a convolutional neural network (CNN), and a CNN-visual geometry group (CNN-VGG) architecture was used to recognize the speakers. We used 780 primary data from 78 speakers, and ea
43

Cameron, J., A. Crosby, C. Paszkowski, and E. Bayne. "Visual spectrogram scanning paired with an observation–confirmation occupancy model improves the efficiency and accuracy of bioacoustic anuran data." Canadian Journal of Zoology 98, no. 11 (2020): 733–42. http://dx.doi.org/10.1139/cjz-2020-0103.

Abstract:
Passive acoustic monitoring using autonomous recording units has improved anuran amphibian call survey data collection. A challenge associated with this approach is the time required for audio data processing. Our objective was to develop a more efficient method of processing and analyzing acoustic data through visual spectrogram scanning and the application of an observation–confirmation occupancy model. We compared detection rates between methods of standard recording listening and visually scanning spectrogram images using different spectrogram parameters. Relative to listening, we found th
44

Zhu, Yuefan, and Xiaoying Liu. "A Lightweight CNN for Wind Turbine Blade Defect Detection Based on Spectrograms." Machines 11, no. 1 (2023): 99. http://dx.doi.org/10.3390/machines11010099.

Abstract:
Since wind turbines are exposed to harsh working environments and variable weather conditions, wind turbine blade condition monitoring is critical to prevent unscheduled downtime and loss. Because common convolutional neural networks are difficult to use in embedded devices, a lightweight convolutional neural network for wind turbine blades (WTBMobileNet) based on spectrograms is proposed, reducing computation and size while maintaining high accuracy. Compared to baseline models, WTBMobileNet without data augmentation has an accuracy of 97.05%, a parameter count of 0.315 million, and a computation of 0.
45

Gu, Lianglian, Guangzhi Di, Danju Lv, et al. "A Multi-Scale Feature Fusion Hybrid Convolution Attention Model for Birdsong Recognition." Applied Sciences 15, no. 8 (2025): 4595. https://doi.org/10.3390/app15084595.

Abstract:
Birdsong is a valuable indicator of rich biodiversity and ecological significance. Although feature extraction has demonstrated satisfactory performance in classification, single-scale feature extraction methods may not fully capture the complexity of birdsong, potentially leading to suboptimal classification outcomes. The integration of multi-scale feature extraction and fusion enables the model to better handle scale variations, thereby enhancing its adaptability across different scales. To address this issue, we propose a multi-scale hybrid convolutional attention mechanism model (MUSCA). T
46

Heim, Olga, Dennis M. Heim, Lara Marggraf, et al. "Variant maps for bat echolocation call identification algorithms." Bioacoustics 29, no. 5 (2020): 557–71. https://doi.org/10.5281/zenodo.13456968.

Abstract:
(Uploaded by Plazi for the Bat Literature Project) Automated ultrasonic recordings are widely used in basic and applied research to detect the presence of bats. Often, algorithms for the automated identification of species are based on a preprocessing of acoustic information that involves the generation of spectrograms. Even though this approach is technically advanced, recent surveys highlight substantially high failure rates to identify species correctly, which urges for improved processes. Here, we tested an entirely new method, in particular, the transformation of ultrasonic recordings int