To see the other types of publications on this topic, follow the link: Spectrogram.

Journal articles on the topic 'Spectrogram'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 journal articles for your research on the topic 'Spectrogram.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Lee, Sang-Hoon, Hyun-Wook Yoon, Hyeong-Rae Noh, Ji-Hoon Kim, and Seong-Whan Lee. "Multi-SpectroGAN: High-Diversity and High-Fidelity Spectrogram Generation with Adversarial Style Combination for Speech Synthesis." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 14 (May 18, 2021): 13198–206. http://dx.doi.org/10.1609/aaai.v35i14.17559.

Abstract:
While generative adversarial network (GAN)-based neural text-to-speech (TTS) systems have shown significant improvement in neural speech synthesis, there is no TTS system that learns to synthesize speech from text sequences with only adversarial feedback. Because adversarial feedback alone is not sufficient to train the generator, current models still require a reconstruction loss computed directly between the ground-truth and the generated mel-spectrogram. In this paper, we present Multi-SpectroGAN (MSG), which can train a multi-speaker model with only adversarial feedback by conditioning a self-supervised hidden representation of the generator on a conditional discriminator. This leads to better guidance for generator training. Moreover, we also propose adversarial style combination (ASC) for better generalization to unseen speaking styles and transcripts, which can learn latent representations of combined style embeddings from multiple mel-spectrograms. Trained with ASC and feature matching, MSG synthesizes high-diversity mel-spectrograms by controlling and mixing individual speaking styles (e.g., duration, pitch, and energy). The results show that MSG synthesizes a high-fidelity mel-spectrogram with almost the same naturalness MOS score as the ground-truth mel-spectrogram.
2

Johnson, Alexander. "An integrated approach for teaching speech spectrogram analysis to engineering students." Journal of the Acoustical Society of America 152, no. 3 (September 2022): 1962–69. http://dx.doi.org/10.1121/10.0014172.

Abstract:
Spectrogram analysis is a vital skill for learning speech acoustics. Spectrograms are necessary for visualizing cause-effect relationships between speech articulator movements and the resulting sound produced. However, many interpretation techniques needed to read spectrograms are counterintuitive to engineering students who have been taught to use more rigid mathematical formulas. As a result, spectrogram reading is often challenging for these students, who do not have a prior background in acoustic phonetics. In this paper, a structured, inclusive framework for teaching spectrogram reading to students of engineering backgrounds is presented. Findings from the implementation of these teaching methods in undergraduate and graduate engineering courses at the University of California, Los Angeles are also presented.
3

Basak, Gopal K., and Tridibesh Dutta. "Statistical Speaker Identification Based on Spectrogram Imaging." Calcutta Statistical Association Bulletin 59, no. 3-4 (September 2007): 253–63. http://dx.doi.org/10.1177/0008068320070309.

Abstract:
The paper addresses the problem of speaker identification based on spectrograms in the text-dependent case. Using spectrogram segmentation, this paper mainly focuses on understanding the complex patterns in frequency and amplitude in an utterance of a given word by an individual. The features used for identifying a speaker, based on an observed variable extracted from the spectrograms, rely on the distinct speaker effect and his/her interaction effects with the particular word and with the frequency bands of the spectrogram. Performance of this novel approach on spectrogram samples collected from 40 speakers shows that this methodology can be effectively used to produce a very high success rate in a closed set of speakers for text-dependent speaker identification. AMS (2000) Subject Classification: 62P99.
4

Han, Ying, Qiao Wang, Jianping Huang, Jing Yuan, Zhong Li, Yali Wang, Haijun Liu, and Xuhui Shen. "Frequency Extraction of Global Constant Frequency Electromagnetic Disturbances from Electric Field VLF Data on CSES." Remote Sensing 15, no. 8 (April 13, 2023): 2057. http://dx.doi.org/10.3390/rs15082057.

Abstract:
The electromagnetic data observed with the CSES (China Seismo-Electromagnetic Satellite, also known as Zhangheng-1 satellite) contain numerous spatial disturbances. These disturbances exhibit various shapes on the spectrogram, and constant frequency electromagnetic disturbances (CFEDs), such as artificially transmitted very-low-frequency (VLF) radio waves, power line harmonics, and interference from the satellite platform itself, appear as horizontal lines. To exploit this feature, we proposed an algorithm based on computer vision technology that automatically recognizes these lines on the spectrogram and extracts the frequencies from the CFEDs. First, the VLF waveform data collected with the CSES electric field detector (EFD) are converted into a time–frequency spectrogram using short-time Fourier Transform (STFT). Next, the CFED automatic recognition algorithm is used to identify horizontal lines on the spectrogram. The third step is to determine the line frequency range based on the proportional relationship between the frequency domain of the satellite’s VLF and the height of the time–frequency spectrogram. Finally, we used the CSES power spectrogram to confirm the presence of CFEDs in the line frequency range and extract their true frequencies. We statistically analyzed 1034 orbit time–frequency spectrograms and power spectrograms from 8 periods (5 days per period) and identified approximately 200 CFEDs. Among them, two CFEDs with strong signals persisted throughout an entire orbit. This study establishes a foundation for detecting anomalies due to artificial sources, particularly in the study of short-term strong earthquake prediction. Additionally, it contributes to research on other aspects of spatial electromagnetic interference and the suppression and cleaning of electromagnetic waves.
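The recognition step described here (an STFT time–frequency spectrogram followed by a search for persistent horizontal lines) can be illustrated compactly. The following Python sketch is not the authors' pipeline: the window length, the median-based noise floor, and the persistence threshold are illustrative assumptions.

# Hedged sketch: flag constant-frequency disturbances as spectrogram rows
# that stay well above the per-frame noise floor in nearly all frames.
import numpy as np
from scipy.signal import stft

def find_constant_frequency_lines(waveform, fs, persistence=0.9, snr_db=6.0):
    f, t, Z = stft(waveform, fs=fs, nperseg=1024, noverlap=512)
    power_db = 20 * np.log10(np.abs(Z) + 1e-12)
    floor = np.median(power_db, axis=0, keepdims=True)  # per-frame noise floor
    active = power_db > floor + snr_db                  # bins well above the floor
    frac_active = active.mean(axis=1)                   # persistence of each row
    return f[frac_active >= persistence]                # candidate CFED frequencies (Hz)

# Example: an 18 kHz tone buried in noise is recovered as a horizontal line.
fs = 51200
tt = np.arange(fs * 4) / fs
x = 0.05 * np.sin(2 * np.pi * 18000 * tt) + 0.01 * np.random.randn(tt.size)
print(find_constant_frequency_lines(x, fs))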
5

Shingchern D. You, Kai-Rong Lin, and Chien-Hung Liu. "Estimating Classification Accuracy for Unlabeled Datasets Based on Block Scaling." International Journal of Engineering and Technology Innovation 13, no. 4 (September 28, 2023): 313–27. http://dx.doi.org/10.46604/ijeti.2023.11975.

Abstract:
This paper proposes an approach called block scaling quality (BSQ) for estimating the prediction accuracy of a deep network model. The basic operation perturbs the input spectrogram by multiplying all values within a block by a scaling factor, which is set to 0 in the experiments. The ratio of perturbed spectrograms whose prediction labels differ from that of the original spectrogram to the total number of perturbed spectrograms indicates how much of the spectrogram is crucial for the prediction. Thus, this ratio is inversely correlated with the accuracy of the dataset. The BSQ approach demonstrates satisfactory estimation accuracy in experiments when compared with various other approaches. When using only the Jamendo and FMA datasets, the estimation accuracy has an average error of 4.9% and 1.8%, respectively. Moreover, the BSQ approach holds advantages over some of the comparison counterparts. Overall, it presents a promising approach for estimating the accuracy of a deep network model.
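The BSQ statistic itself is simple to state in code. The sketch below assumes a callable model that maps a spectrogram of shape (frequency, time) to a predicted class label, and a block size chosen freely for illustration; the abstract does not specify the paper's exact block geometry.

# Illustrative sketch of block scaling: zero out one block at a time and
# count how often the predicted label flips relative to the original input.
import numpy as np

def block_scaling_quality(model, spec, block=(16, 16), scale=0.0):
    base_label = model(spec)
    flips, total = 0, 0
    for i in range(0, spec.shape[0], block[0]):
        for j in range(0, spec.shape[1], block[1]):
            perturbed = spec.copy()
            perturbed[i:i + block[0], j:j + block[1]] *= scale
            flips += int(model(perturbed) != base_label)
            total += 1
    # A high flip ratio means much of the spectrogram is crucial for the
    # prediction, which correlates inversely with expected accuracy.
    return flips / total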
6

Li, Hong Ping, and Hong Li. "Establish an Artificial Neural Networks Model to Make Quantitative Analysis about the Capillary Electrophoresis Spectrum." Advanced Materials Research 452-453 (January 2012): 1116–20. http://dx.doi.org/10.4028/www.scientific.net/amr.452-453.1116.

Abstract:
Overlapping capillary electrophoresis spectrograms were simulated by computer under different conditions, and the spectrograms simulated under the different conditions were processed into a neural-network training set. The artificial neural network method was then applied to the quantitative analysis of the multiple components in the overlapping capillary electrophoresis spectrograms, using two models: a radial basis function neural network and a multilayer perceptron neural network. The findings indicate that, as the noise level of the capillary electrophoresis spectrogram increases, the ability of both neural network models to quantitatively analyze the related components of the overlapping spectrograms declines. As the total dissociation degree of the capillary electrophoresis spectrogram increases, the ability of the multilayer perceptron neural network model to quantitatively analyze the related components of the overlapping spectrum improves.
7

Li, Juan, Xueying Zhang, Lixia Huang, Fenglian Li, Shufei Duan, and Ying Sun. "Speech Emotion Recognition Using a Dual-Channel Complementary Spectrogram and the CNN-SSAE Neural Network." Applied Sciences 12, no. 19 (September 22, 2022): 9518. http://dx.doi.org/10.3390/app12199518.

Abstract:
Against the background of artificial intelligence, realizing smooth communication between people and machines has become a widely pursued goal. The Mel spectrogram is a common representation used in speech emotion recognition, focusing on the low-frequency part of speech. In contrast, the inverse Mel (IMel) spectrogram, which focuses on the high-frequency part, is proposed to analyze emotions comprehensively. Because the convolutional neural network-stacked sparse autoencoder (CNN-SSAE) can extract deep optimized features, a Mel-IMel dual-channel complementary structure is proposed. In the first channel, a CNN is used to extract the low-frequency information of the Mel spectrogram. The other channel extracts the high-frequency information of the IMel spectrogram. This information is passed to an SSAE to reduce the number of dimensions and obtain the optimized information. Experimental results show that the highest recognition rates achieved on the EMO-DB, SAVEE, and RAVDESS datasets were 94.79%, 88.96%, and 83.18%, respectively. The recognition rate of the two spectrograms together was higher than that of either single spectrogram, which proves that the two spectrograms are complementary. The SSAE following the CNN obtained the optimized information, and the recognition rate was further improved, which proves the effectiveness of the CNN-SSAE network.
8

Pethiyagoda, Ravindra, Scott W. McCue, and Timothy J. Moroney. "Spectrograms of ship wakes: identifying linear and nonlinear wave signals." Journal of Fluid Mechanics 811 (December 6, 2016): 189–209. http://dx.doi.org/10.1017/jfm.2016.753.

Abstract:
A spectrogram is a useful way of using short-time discrete Fourier transforms to visualise surface height measurements taken of ship wakes in real-world conditions. For a steadily moving ship that leaves behind small-amplitude waves, the spectrogram is known to have two clear linear components, a sliding-frequency mode caused by the divergent waves and a constant-frequency mode for the transverse waves. However, recent observations of high-speed ferry data have identified additional components of the spectrograms that are not yet explained. We use computer simulations of linear and nonlinear ship wave patterns and apply time–frequency analysis to generate spectrograms for an idealised ship. We clarify the role of the linear dispersion relation and ship speed on the two linear components. We use a simple weakly nonlinear theory to identify higher-order effects in a spectrogram and, while the high-speed ferry data are very noisy, we propose that certain additional features in the experimental data are caused by nonlinearity. Finally, we provide a possible explanation for a further discrepancy between the high-speed ferry spectrograms and linear theory by accounting for ship acceleration.
9

Godbole, Shubham, Vaishnavi Jadhav, and Gajanan Birajdar. "Indian Language Identification using Deep Learning." ITM Web of Conferences 32 (2020): 01010. http://dx.doi.org/10.1051/itmconf/20203201010.

Abstract:
Spoken language is the most common mode of communication today. Efforts to create language identification systems for Indian languages have been quite limited because of the problems of speaker availability and language readability. However, the need for spoken language identification (SLID) is expanding day by day for civil and defence applications. Feature extraction is a basic and important procedure in LID. An audio sample is converted into a spectrogram, a visual representation that characterizes the spectrum of frequencies with respect to time. Three such spectrogram visuals were generated, namely the log spectrogram, gammatonegram, and IIR-CQT spectrogram, for audio samples from the standardized IIIT-H Indic Speech Database. These visual representations depict language-specific details and the nature of each language. The spectrogram images were then used as input to a CNN. A classification accuracy of 98.86% was obtained using the proposed methodology.
10

Samad, Salina Abdul, and Aqilah Baseri Huddin. "Improving spectrogram correlation filters with time-frequency reassignment for bio-acoustic signal classification." Indonesian Journal of Electrical Engineering and Computer Science 14, no. 1 (April 1, 2019): 59. http://dx.doi.org/10.11591/ijeecs.v14.i1.pp59-64.

Abstract:
Spectrogram features have been used to automatically classify animals based on their vocalization. Usually, features are extracted and used as inputs to classifiers to distinguish between species. In this paper, a classifier based on Correlation Filters (CFs) is employed where the input features are the spectrogram images themselves. Spectrogram parameters are carefully selected based on the target dataset in order to obtain clearly distinguishing images termed call-prints. An even better representation of the call-prints is obtained using spectrogram Time-Frequency (TF) reassignment. To demonstrate the application of the proposed technique, two species of frogs are classified based on their vocalization spectrograms, where for each species a correlation filter template is constructed from multiple call-prints using the Maximum Margin Correlation Filter (MMCF). The improved accuracy rate obtained with TF reassignment demonstrates that this is a viable method for bio-acoustic signal classification.
11

Franzoni, Valentina. "Cross-domain synergy: Leveraging image processing techniques for enhanced sound classification through spectrogram analysis using CNNs." Journal of Autonomous Intelligence 6, no. 3 (August 28, 2023): 678. http://dx.doi.org/10.32629/jai.v6i3.678.

Abstract:
In this paper, an innovative approach to sound classification that exploits the potential of image processing techniques applied to spectrogram representations of audio signals is reviewed. This study shows the effectiveness of incorporating well-established image processing methodologies, such as filtering, segmentation, and pattern recognition, to enhance the feature extraction and classification performance of audio signals when transformed into spectrograms. An overview is provided of the mathematical methods shared by both image and spectrogram-based audio processing, focusing on the commonalities between the two domains in terms of the underlying principles, techniques, and algorithms. The proposed methodology leverages in particular the power of convolutional neural networks (CNNs) to extract and classify time-frequency features from spectrograms, capitalizing on the advantages of their hierarchical feature learning and robustness to translation and scale variations. Other deep-learning networks and advanced techniques are suggested during the analysis. We discuss the benefits and limitations of transforming audio signals into spectrograms, including human interpretability, compatibility with image processing techniques, and flexibility in time-frequency resolution. By bridging the gap between image processing and audio processing, spectrogram-based audio deep learning gives a deeper perspective on sound classification, offering fundamental insights that serve as a foundation for interdisciplinary research and applications in both domains.
12

Alia Hussein, Ahmed Talib Abdulameer, Ali Abdulkarim, Husniza Husni, and Dalia Al-Ubaidi. "Classification of Dyslexia Among School Students Using Deep Learning." Journal of Techniques 6, no. 1 (March 31, 2024): 85–92. http://dx.doi.org/10.51173/jt.v6i1.1893.

Abstract:
Dyslexia is a common learning disorder that affects children's reading and writing skills. Early identification of dyslexia is essential for providing appropriate interventions and support to affected children. Traditional methods of diagnosing dyslexia often rely on subjective assessments and the expertise of specialists, leading to delays and potential inaccuracies in diagnosis. This study proposes a novel approach for diagnosing dyslexic children using spectrogram analysis and convolutional neural networks (CNNs). Spectrograms are visual representations of audio signals that provide detailed frequency and intensity information. CNNs are powerful deep-learning models capable of extracting complex patterns from data. In this research, raw audio signals from dyslexic and non-dyslexic children are transformed into spectrogram images. These images are then used as input for a CNN model trained on a large dataset of dyslexic and non-dyslexic samples. The CNN learns to automatically extract discriminative features from the spectrogram images and classify them into dyslexic and non-dyslexic categories. The study's results demonstrate the proposed approach's effectiveness in diagnosing dyslexic children. The CNN accurately identified dyslexic individuals based on the spectrogram features, outperforming traditional diagnostic methods. Spectrograms and CNNs provide a more objective and efficient approach to dyslexia diagnosis, enabling earlier intervention and support for affected children. This research contributes to the field of dyslexia diagnosis by harnessing the power of machine learning and audio analysis techniques, facilitating faster and more accurate identification of dyslexia in children and ultimately improving their educational outcomes and quality of life.
13

Jenkins, William F., Peter Gerstoft, Chih-Chieh Chien, and Emma Ozanich. "Reducing dimensionality of spectrograms using convolutional autoencoders." Journal of the Acoustical Society of America 153, no. 3_supplement (March 1, 2023): A178. http://dx.doi.org/10.1121/10.0018582.

Abstract:
Under the “curse of dimensionality,” distance-based algorithms, such as k-means or Gaussian mixture model clustering, can lose meaning and interpretability in high-dimensional space. Acoustic data, specifically spectrograms, are subject to such limitations due to their high dimensionality: for example, a spectrogram with 100 time- and 100 frequency-bins contains 10⁴ pixels, and its vectorized form constitutes a point in 10⁴-dimensional space. In this talk, we look at four papers that used autoencoding convolutional neural networks to extract salient features of real data. The convolutional autoencoder consists of an encoder, which compresses spectrograms into a low-dimensional latent feature space, and a decoder, which seeks to reconstruct the original spectrogram from the latent feature space. The error between the original spectrogram and the reconstruction is used to train the network. Once trained, the salient features of the data are embedded in the latent space and algorithms can be applied to the lower-dimensional latent space. We demonstrate how lower-dimensional representations result in interpretable clustering of complex physical data, which can contribute to reducing errors in classification and clustering tasks and enable exploratory analysis of large data sets.
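A minimal convolutional autoencoder of the kind described can be sketched in PyTorch as follows; the 96×96 input, layer sizes, and 32-dimensional latent space are illustrative assumptions, not the architectures of the four papers discussed.

import torch
import torch.nn as nn

class SpectrogramAE(nn.Module):
    def __init__(self, latent_dim=32):
        super().__init__()
        # Encoder: 1x96x96 spectrogram -> low-dimensional latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 16x48x48
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32x24x24
            nn.Flatten(),
            nn.Linear(32 * 24 * 24, latent_dim),
        )
        # Decoder: latent vector -> reconstructed spectrogram.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 24 * 24), nn.ReLU(),
            nn.Unflatten(1, (32, 24, 24)),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),    # 16x48x48
            nn.ConvTranspose2d(16, 1, 2, stride=2),                # 1x96x96
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

# Train on reconstruction error; afterwards, cluster the latent vectors z.
model = SpectrogramAE()
specs = torch.randn(8, 1, 96, 96)            # stand-in batch of spectrograms
recon, z = model(specs)
loss = nn.functional.mse_loss(recon, specs)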
14

Hajihashemi, Vahid, Abdorreza Alavi Gharahbagh, Narges Hajaboutalebi, Mohsen Zahraei, José J. M. Machado, and João Manuel R. S. Tavares. "A Feature-Reduction Scheme Based on a Two-Sample t-Test to Eliminate Useless Spectrogram Frequency Bands in Acoustic Event Detection Systems." Electronics 13, no. 11 (May 25, 2024): 2064. http://dx.doi.org/10.3390/electronics13112064.

Abstract:
Acoustic event detection (AED) systems, combined with video surveillance systems, can enhance urban security and safety by automatically detecting incidents, supporting the smart city concept. AED systems mostly use mel spectrograms as a well-known effective acoustic feature. The spectrogram is a combination of frequency bands. A big challenge is that some of the spectrogram bands may be similar in different events and be useless in AED. Removing useless bands reduces the input feature dimension and is highly desirable. This article proposes a mathematical feature analysis method to identify and eliminate ineffective spectrogram bands and improve AED systems’ efficiency. The proposed approach uses a Student’s t-test to compare frequency bands of the spectrogram from different acoustic events. The similarity between each frequency band among events is calculated using a two-sample t-test, allowing the identification of distinct and similar frequency bands. Removing these bands accelerates the training speed of the used classifier by reducing the number of features, and also enhances the system’s accuracy and efficiency. Based on the obtained results, the proposed method reduces the spectrogram bands by 26.3%. The results showed an average difference of 7.77% in the Jaccard, 4.07% in the Dice, and 5.7% in the Hamming distance between selected bands using train and test datasets. These small values underscore the validity of the obtained results for the test dataset.
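The band-screening idea maps naturally onto one two-sample t-test per band and per pair of event classes. The following sketch uses SciPy's ttest_ind; the per-clip band-energy summary and the significance level are assumptions made for illustration.

import numpy as np
from itertools import combinations
from scipy.stats import ttest_ind

def select_bands(specs_by_class, alpha=0.05):
    # specs_by_class: dict mapping class -> array (n_clips, n_bands, n_frames)
    n_bands = next(iter(specs_by_class.values())).shape[1]
    keep = np.zeros(n_bands, dtype=bool)
    for b in range(n_bands):
        for c1, c2 in combinations(specs_by_class, 2):
            x = specs_by_class[c1][:, b, :].mean(axis=1)  # band energy per clip
            y = specs_by_class[c2][:, b, :].mean(axis=1)
            if ttest_ind(x, y, equal_var=False).pvalue < alpha:
                keep[b] = True      # band separates at least one class pair
                break
    return keep                     # False entries are candidates for removal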
15

Choi, Byung-Moon, Ji Yeon Yim, Hangsik Shin, and Gyujeong Noh. "Novel Analgesic Index for Postoperative Pain Assessment Based on a Photoplethysmographic Spectrogram and Convolutional Neural Network: Observational Study." Journal of Medical Internet Research 23, no. 2 (February 3, 2021): e23920. http://dx.doi.org/10.2196/23920.

Abstract:
Background: Although commercially available analgesic indices based on biosignal processing have been used to quantify nociception during general anesthesia, their performance is low in conscious patients. Therefore, there is a need to develop a new analgesic index with improved performance to quantify postoperative pain in conscious patients. Objective: This study aimed to develop a new analgesic index using photoplethysmogram (PPG) spectrograms and a convolutional neural network (CNN) to objectively assess pain in conscious patients. Methods: PPGs were obtained from a group of surgical patients for 6 minutes both in the absence (preoperatively) and in the presence (postoperatively) of pain. Then, the PPG data of the latter 5 minutes were used for analysis. Based on the PPGs and a CNN, we developed a spectrogram–CNN index for pain assessment. The area under the curve (AUC) of the receiver-operating characteristic curve was measured to evaluate the performance of the 2 indices. Results: PPGs from 100 patients were used to develop the spectrogram–CNN index. When there was pain, the mean (95% CI) spectrogram–CNN index value increased significantly: baseline 28.5 (24.2-30.7) versus recovery area 65.7 (60.5-68.3); P<.01. The AUC and balanced accuracy were 0.76 and 71.4%, respectively. The spectrogram–CNN index cutoff value for detecting pain was 48, with a sensitivity of 68.3% and specificity of 73.8%. Conclusions: Although there were limitations to the study design, we confirmed that the spectrogram–CNN index can efficiently detect postoperative pain in conscious patients. Further studies are required to assess the spectrogram–CNN index's feasibility and prevent overfitting to various populations, including patients under general anesthesia. Trial Registration: Clinical Research Information Service KCT0002080; https://cris.nih.go.kr/cris/search/search_result_st01.jsp?seq=6638
16

Ferreira, Diogo R., Tiago A. Martins, and Paulo Rodrigues. "Explainable deep learning for the analysis of MHD spectrograms in nuclear fusion." Machine Learning: Science and Technology 3, no. 1 (December 30, 2021): 015015. http://dx.doi.org/10.1088/2632-2153/ac44aa.

Abstract:
In the nuclear fusion community, there are many specialized techniques to analyze the data coming from a variety of diagnostics. One such technique is the use of spectrograms to analyze the magnetohydrodynamic (MHD) behavior of fusion plasmas. Physicists look at the spectrogram to identify the oscillation modes of the plasma and to study instabilities that may lead to plasma disruptions. One of the major causes of disruptions occurs when an oscillation mode interacts with the wall, stops rotating, and becomes a locked mode. In this work, we use deep learning to predict the occurrence of locked modes from MHD spectrograms. In particular, we use a convolutional neural network with class activation mapping to pinpoint the exact behavior that the model thinks is responsible for the locked mode. Surprisingly, we find that, in general, the model explanation agrees quite well with the physical interpretation of the behavior observed in the spectrogram.
17

He, Yuan, Xinyu Li, Runlong Li, Jianping Wang, and Xiaojun Jing. "A Deep-Learning Method for Radar Micro-Doppler Spectrogram Restoration." Sensors 20, no. 17 (September 3, 2020): 5007. http://dx.doi.org/10.3390/s20175007.

Abstract:
Radio frequency interference, which makes it difficult to produce high-quality radar spectrograms, is a major issue for micro-Doppler-based human activity recognition (HAR). In this paper, we propose a deep-learning-based method to detect and cut out the interference in spectrograms. Then, we restore the spectrograms in the cut-out region. First, a fully convolutional neural network (FCN) is employed to detect and remove the interference. Then, a coarse-to-fine generative adversarial network (GAN) is proposed to restore the part of the spectrogram that is affected by the interferences. The simulated motion capture (MOCAP) spectrograms and the measured radar spectrograms with interference are used to verify the proposed method. Experimental results from both qualitative and quantitative perspectives show that the proposed method can mitigate the interference and restore high-quality radar spectrograms. Furthermore, the comparison experiments also demonstrate the efficiency of the proposed approach.
18

Fudholi, Dzikri Rahadian, Muhammad Auzan, and Novia Arum Sari. "Spectrogram Window Comparison: Cough Sound Recognition using Convolutional Neural Network." IJCCS (Indonesian Journal of Computing and Cybernetics Systems) 16, no. 3 (July 31, 2022): 261. http://dx.doi.org/10.22146/ijccs.75697.

Abstract:
Coughing is one of the most common symptoms of disease, especially respiratory disease. Quick cough detection can be key during the current COVID-19 pandemic. Good cough recognition uses non-intrusive tools, such as a mobile phone microphone, rather than attached sensors that hinder human activities. For sound-only detection, the current best deep learning method, the Convolutional Neural Network (CNN), is used. However, a CNN needs image input, while sound input differs (one dimension rather than two), so an extra step is needed to convert sound data to image data using a spectrogram. When building a spectrogram, there is a question of the best size. This research compares spectrogram sizes, called spectrogram windows, by their recognition performance. The result is that 4-second windows have the highest F1-score, at 92.9%. Therefore, a window of around 4 seconds performs better for this sound recognition problem.
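The windowing step under comparison can be sketched as follows, here with librosa and illustrative parameters (16 kHz audio, non-overlapping segments, 64 mel bands); the paper's exact settings are not stated in the abstract.

import librosa

def spectrogram_windows(path, window_s=4.0):
    y, sr = librosa.load(path, sr=16000)
    seg_len = int(window_s * sr)              # the parameter being tuned
    segments = [y[i:i + seg_len]
                for i in range(0, len(y) - seg_len + 1, seg_len)]
    # One dB-scaled mel spectrogram per segment, ready to be fed to the CNN.
    return [librosa.power_to_db(
                librosa.feature.melspectrogram(y=seg, sr=sr, n_mels=64))
            for seg in segments]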
19

Liu, Haohe, Xubo Liu, Qiuqiang Kong, Wenwu Wang, and Mark D. Plumbley. "Learning Temporal Resolution in Spectrogram for Audio Classification." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 12 (March 24, 2024): 13873–81. http://dx.doi.org/10.1609/aaai.v38i12.29294.

Abstract:
The audio spectrogram is a time-frequency representation that has been widely used for audio classification. One of the key attributes of the audio spectrogram is the temporal resolution, which depends on the hop size used in the Short-Time Fourier Transform (STFT). Previous works generally assume the hop size should be a constant value (e.g., 10 ms). However, a fixed temporal resolution is not always optimal for different types of sound. The temporal resolution affects not only classification accuracy but also computational cost. This paper proposes a novel method, DiffRes, that enables differentiable temporal resolution modeling for audio classification. Given a spectrogram calculated with a fixed hop size, DiffRes merges non-essential time frames while preserving important frames. DiffRes acts as a "drop-in" module between an audio spectrogram and a classifier and can be jointly optimized with the classification task. We evaluate DiffRes on five audio classification tasks, using mel-spectrograms as the acoustic features, followed by off-the-shelf classifier backbones. Compared with previous methods using the fixed temporal resolution, the DiffRes-based method can achieve the equivalent or better classification accuracy with at least 25% computational cost reduction. We further show that DiffRes can improve classification accuracy by increasing the temporal resolution of input acoustic features, without adding to the computational cost.
20

Lalla, Abderraouf, Andrea Albini, Paolo Di Barba, and Maria Evelina Mognaschi. "Spectrogram Inversion for Reconstruction of Electric Currents at Industrial Frequencies: A Deep Learning Approach." Sensors 24, no. 6 (March 11, 2024): 1798. http://dx.doi.org/10.3390/s24061798.

Abstract:
In this paper, we present a deep learning approach for identifying current intensity and frequency. The reconstruction is based on measurements of the magnetic field generated by the current flowing in a conductor. Magnetic field data are collected using a magnetic probe capable of generating a spectrogram, representing the spectrum of frequencies of the magnetic field over time. These spectrograms are saved as images characterized by color density proportional to the induction field value at a given frequency. The proposed deep learning approach utilizes a convolutional neural network (CNN) with the spectrogram image as input and the current or frequency value as output. One advantage of this approach is that current estimation is achieved contactless, using a simple magnetic field probe positioned close to the conductor.
21

Horn, Skyler, and Hynek Boril. "Gender classification from speech using convolutional networks augmented with synthetic spectrograms." Journal of the Acoustical Society of America 150, no. 4 (October 2021): A358. http://dx.doi.org/10.1121/10.0008585.

Abstract:
Automatic gender classification from speech is an integral component of human-computer interfaces. Gender information is utilized in user authentication, speech recognizers, or human-centered intelligent agents. This study focuses on gender classification from speech spectrograms using AlexNet-inspired 2D convolutional neural networks (CNN) trained on real samples augmented with synthetic spectrograms. A generative adversarial network (GAN) is trained to produce synthetic male/female-like speech spectrograms. In limited training data experiments on LibriSpeech, augmenting a training set of 200 real samples by 800 synthetic samples reduces equal error rate of the classifier from 23.7% to 1.0%. To further test the ‘quality’ of the generated samples, in a subsequent experiment, the real training samples are progressively replaced (rather than augmented) with synthetic samples at various ratios from 0 (all original samples preserved) to 1 (all original samples replaced by synthetic ones). Depending on the system setup, substituting between 50% to 90% of the original samples with the synthetic ones is found to have a minimal impact on the classifier performance. Finally, viewing the input CNN layers as filters that select salient spectrogram features, the learned convolutional kernels and filter outputs are studied to understand which spectrogram areas receive a prominent attention in the classifier.
22

Oh, Myeonggeun, and Yong-Hoon Kim. "Statistical Approach to Spectrogram Analysis for Radio-Frequency Interference Detection and Mitigation in an L-Band Microwave Radiometer." Sensors 19, no. 2 (January 14, 2019): 306. http://dx.doi.org/10.3390/s19020306.

Abstract:
For the elimination of radio-frequency interference (RFI) in a passive microwave radiometer, the threshold level is generally calculated from the mean value and standard deviation. However, a serious problem that can arise is an error in the retrieved brightness temperature from a higher threshold level owing to the presence of RFI. In this paper, we propose a method to detect and mitigate RFI contamination using a threshold level derived from statistical criteria based on a spectrogram technique. Mean and skewness spectrograms are created from a brightness temperature spectrogram by shifting a 2-D window to discriminate the form of the symmetric distribution as a natural thermal emission signal. From the remaining bins of the mean spectrogram after elimination of the RFI-flagged bins in the skewness spectrogram for data captured at 0.1-s intervals, two distribution sides are identically created from the left side of the distribution by changing the standard position of the distribution. Simultaneously, kurtosis calculations from these bins for each symmetric distribution are repeatedly performed to determine the retrieved brightness temperature corresponding to the kurtosis value closest to three. The performance is evaluated using experimental data, and the maximum error and root-mean-square error (RMSE) in the retrieved brightness temperature are observed to be less than approximately 3 K and 1.7 K, respectively, for a window with a size of 100 × 100 time–frequency bins according to the RFI levels and cases.
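A simplified version of the statistical screening can be written down directly: natural thermal emission is near-Gaussian (skewness ≈ 0, Pearson kurtosis ≈ 3), so windows of the brightness-temperature spectrogram that deviate are flagged as RFI. The window size and skewness tolerance below are illustrative assumptions.

import numpy as np
from scipy.stats import skew, kurtosis

def flag_rfi(tb_spectrogram, win=100, skew_tol=0.3):
    n_t, n_f = tb_spectrogram.shape
    flags = np.zeros((n_t, n_f), dtype=bool)
    for i in range(0, n_t - win + 1, win):
        for j in range(0, n_f - win + 1, win):
            block = tb_spectrogram[i:i + win, j:j + win].ravel()
            if abs(skew(block)) > skew_tol:   # asymmetric -> likely contaminated
                flags[i:i + win, j:j + win] = True
    return flags

# For clean, Gaussian-like blocks, kurtosis(block, fisher=False) stays close
# to 3, which is the criterion the authors use to pick the retrieved value.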
23

Ender Ozturk, Fatih Erden, and Ismail Guvenc. "RF-based low-SNR classification of UAVs using convolutional neural networks." ITU Journal on Future and Evolving Technologies 2, no. 5 (July 23, 2021): 39–52. http://dx.doi.org/10.52953/qjgh3217.

Abstract:
Unmanned Aerial Vehicles (UAVs), or drones, which can be considered a coverage extender for the Internet of Everything (IoE), have drawn high attention recently. The proliferation of drones will raise privacy and security concerns in public. This paper investigates the problem of classifying drones from Radio Frequency (RF) fingerprints in the low Signal-to-Noise Ratio (SNR) regime. We use Convolutional Neural Networks (CNNs) trained with both RF time-series images and the spectrograms of 15 different off-the-shelf drone controller RF signals. When using time-series signal images, the CNN extracts features from the signal transient and envelope. As the SNR decreases, this approach fails dramatically because the information in the transient is lost in the noise and the envelope is heavily distorted. In contrast to the time-series representation of the RF signals, with spectrograms it is possible to focus only on the desired frequency interval, i.e., the 2.4 GHz ISM band, and filter out any other signal component outside of this band. These advantages provide a notable performance improvement over the time-series-based methods. To further increase the classification accuracy of the spectrogram-based CNN, we denoise the spectrogram images by truncating them to a limited spectral density interval. Creating a single model using spectrogram images of noisy signals and tuning the CNN model parameters, we achieve a classification accuracy varying from 92% to 100% for an SNR range from -10 dB to 30 dB, which, to the best of our knowledge, significantly outperforms existing approaches.
24

Huh, Jiung, Huan Pham Van, Soonyoung Han, Hae-Jin Choi, and Seung-Kyum Choi. "A Data-Driven Approach for the Diagnosis of Mechanical Systems Using Trained Subtracted Signal Spectrograms." Sensors 19, no. 5 (March 1, 2019): 1055. http://dx.doi.org/10.3390/s19051055.

Abstract:
Toward the prognostics and health management of mechanical systems, we propose and validate a novel, effective, data-driven fault diagnosis method. In this method, we develop a trained subtracted spectrogram, the so-called critical information map (CIM), identifying the difference between the signal spectrograms of normal and abnormal status. We believe this diagnosis process may be implemented in an autonomous manner so that an engineer can employ it without expert knowledge in signal processing or mechanical analysis. Firstly, the CIM method applies sequential and autonomous procedures of time synchronization, time-frequency conversion, and spectral subtraction to the raw signal. Secondly, the subtracted spectrogram is trained to be a CIM for a specific mechanical system failure by finding the optimal parameters and abstracted information of the spectrogram. Finally, the status of a system's health can be monitored accurately by comparing the CIM with an acquired signal map in an automated and timely manner. The effectiveness of the proposed method is successfully validated on a diagnosis problem for a six-degree-of-freedom industrial robot, which is the diagnosis of a non-stationary system with a small number of training datasets.
25

Rawat, Priyanshu, Madhvan Bajaj, Satvik Vats, and Vikrant Sharma. "A comprehensive study based on MFCC and spectrogram for audio classification." Journal of Information and Optimization Sciences 44, no. 6 (2023): 1057–74. http://dx.doi.org/10.47974/jios-1431.

Abstract:
Music classification is a music information retrieval (MIR) task for determining the genre of music computationally. In recent years, deep neural networks have been proven to be effective in numerous classification tasks, including music genre categorisation. In this paper, we present a comparative study of two different music classification techniques. The first technique uses the audio's spectrogram image and determines the music's genre from the spectrogram, using a CNN model trained on the spectrograms. The second approach computes MFCC (Mel-Frequency Cepstral Coefficient) musical features and utilises them to classify the music using an ANN. This paper aims to study the two algorithms closely against different audio signals and compare the performance of the above-mentioned techniques to see which of them is better for music genre classification.
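The two competing feature types are straightforward to extract. The snippet below uses librosa with illustrative parameter choices (128 mel bands, 20 MFCCs averaged over time) that are not necessarily those of the paper.

import numpy as np
import librosa

y, sr = librosa.load(librosa.example("trumpet"))

# 1) Spectrogram image for the CNN branch.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)        # shape: (128, n_frames)

# 2) MFCC features for the ANN branch: averaging the per-frame coefficients
#    over time gives one fixed-length vector per clip.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)   # shape: (20, n_frames)
mfcc_vector = mfcc.mean(axis=1)                      # shape: (20,)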
26

Jiang, Hao, Jianqing Jiang, and Guoshao Su. "Rock Crack Types Identification by Machine Learning on the Sound Signal." Applied Sciences 13, no. 13 (June 28, 2023): 7654. http://dx.doi.org/10.3390/app13137654.

Abstract:
Sound signals generated during rock failure contain useful information about crack development. A sound-signal-based identification method for crack types is proposed. In this method, the sound signals of tensile cracks, using the Brazilian splitting test, and those of shear cracks, using the direct shear test, are collected to establish the training samples. The spectrogram is used to characterize the sound signal and is taken as the input. To solve the small sample problem, since only a small amount of sound signal spectrogram can be obtained in our experimental test, pre-trained ResNet-18 is used as a feature extractor to acquire deep characteristics of sound signal spectrograms. Gaussian process classification (GPC) is employed to establish the recognizing model and to classify crack types using the extracted deep characteristics of spectrograms. To verify the proposed method, the tensile and shear crack development processes during the biaxial test are identified. The results show that the proposed method is feasible. Moreover, this method is used to investigate the tensile and shear crack development during the rockburst process. The obtained results are consistent with previous research results, further confirming the accuracy and rationality of this method.
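The small-sample pipeline (a frozen pretrained ResNet-18 as feature extractor feeding a Gaussian process classifier) can be sketched as follows; the 224×224 input size, the random stand-in tensors, and the torchvision/scikit-learn pairing are assumed implementation choices.

import torch
import torchvision.models as models
from sklearn.gaussian_process import GaussianProcessClassifier

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()    # drop the ImageNet head, keep 512-d features
resnet.eval()

def extract_features(spectrogram_batch):   # (N, 3, 224, 224) spectrogram images
    with torch.no_grad():
        return resnet(spectrogram_batch).numpy()

# Hypothetical tensors standing in for the tensile/shear training spectrograms.
X = extract_features(torch.randn(16, 3, 224, 224))
y = [0] * 8 + [1] * 8              # 0 = tensile crack, 1 = shear crack
clf = GaussianProcessClassifier().fit(X, y)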
27

Léonard, François. "Phase spectrogram and frequency spectrogram as new diagnostic tools." Mechanical Systems and Signal Processing 21, no. 1 (January 2007): 125–37. http://dx.doi.org/10.1016/j.ymssp.2005.08.011.

28

Trufanov, N. N., D. V. Churikov, and O. V. Kravchenko. "Selection of window functions for predicting the frequency pattern of vibrations of the technological process using an artificial neural network." Journal of Physics: Conference Series 2091, no. 1 (November 1, 2021): 012074. http://dx.doi.org/10.1088/1742-6596/2091/1/012074.

Abstract:
The frequency pattern of the process is investigated by analyzing spectrograms constructed with the windowed Fourier transform. The set of window functions consists of a rectangular window, a membership window, and windows based on atomic functions. Fulfilling the condition of improved time localization and energy concentration in the central part of the window allows one to select a window function. The resulting spectrograms are fed to the input of an artificial neural network to obtain a forecast. Varying the shape of the window functions allows us to analyze the proposed spectrogram prediction model.
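The effect of the window choice on the resulting spectrogram can be demonstrated with the windows available in SciPy (atomic-function windows are not built in, so a rectangular and a tapered window stand in here); the leakage away from the tones reflects each window's sidelobe level.

import numpy as np
from scipy.signal import get_window, spectrogram

fs = 1000
t = np.arange(fs * 2) / fs
x = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

for name in ("boxcar", "hann"):                    # rectangular vs. tapered
    w = get_window(name, 256)
    f, tt, Sxx = spectrogram(x, fs=fs, window=w, noverlap=128)
    # Peak-to-median ratio as a crude measure of concentration vs. leakage.
    print(name, 10 * np.log10(Sxx.max() / np.median(Sxx)))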
29

Dwijayanti, Suci, Alvio Yunita Putri, and Bhakti Yudho Suprapto. "Speaker Identification Using a Convolutional Neural Network." Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 6, no. 1 (February 27, 2022): 140–45. http://dx.doi.org/10.29207/resti.v6i1.3795.

Abstract:
Speech, a mode of communication between humans and machines, has various applications, including biometric systems for identifying people who have access to secure systems. Feature extraction is an important factor in speech recognition with high accuracy. Therefore, we implemented a spectrogram, which is a pictorial representation of speech in terms of raw features, to identify speakers. These features were inputted into a convolutional neural network (CNN), and a CNN-visual geometry group (CNN-VGG) architecture was used to recognize the speakers. We used 780 primary data samples from 78 speakers, each of whom uttered a number in Bahasa Indonesia. The proposed architecture, CNN-VGG-f, has a learning rate of 0.001, a batch size of 256, and 100 epochs. The results indicate that this architecture can generate a suitable model for speaker identification. A spectrogram was used to determine the best features for identifying the speakers. The proposed method exhibited an accuracy of 98.78%, which is significantly higher than the accuracies of the method involving Mel-frequency cepstral coefficients (MFCCs; 34.62%) and the combination of MFCCs and deltas (26.92%). Overall, CNN-VGG-f with the spectrogram can identify 77 speakers from the samples, validating the usefulness of the combination of spectrograms and CNNs in speech recognition applications.
30

Cameron, J., A. Crosby, C. Paszkowski, and E. Bayne. "Visual spectrogram scanning paired with an observation–confirmation occupancy model improves the efficiency and accuracy of bioacoustic anuran data." Canadian Journal of Zoology 98, no. 11 (November 2020): 733–42. http://dx.doi.org/10.1139/cjz-2020-0103.

Abstract:
Passive acoustic monitoring using autonomous recording units has improved anuran amphibian call survey data collection. A challenge associated with this approach is the time required for audio data processing. Our objective was to develop a more efficient method of processing and analyzing acoustic data through visual spectrogram scanning and the application of an observation–confirmation occupancy model. We compared detection rates between methods of standard recording listening and visually scanning spectrogram images using different spectrogram parameters. Relative to listening, we found that 1 min spectrograms in two 30 s frames yield the best time efficiency–accuracy trade-off. A standard occupancy model applied to visual scanning data underestimated occupancy estimates relative to listening data for three species and overestimated occupancy for one species. The observation–confirmation model used a subset of listening data to improve the estimates of detection probability from visual scanning and therefore reduced bias in occupancy estimates when compared with using visual scanning data alone. Overall, the combination of the visual scanning method and the observation–confirmation model allowed us to maintain the accuracy of occupancy estimates while greatly increasing the efficiency of anuran data processing. These methods are widely applicable and can increase sample size and precision for acoustic monitoring programs using autonomous recording units.
31

Zhu, Yuefan, and Xiaoying Liu. "A Lightweight CNN for Wind Turbine Blade Defect Detection Based on Spectrograms." Machines 11, no. 1 (January 11, 2023): 99. http://dx.doi.org/10.3390/machines11010099.

Abstract:
Since wind turbines are exposed to harsh working environments and variable weather conditions, wind turbine blade condition monitoring is critical to prevent unscheduled downtime and loss. Recognizing that common convolutional neural networks are difficult to use in embedded devices, a lightweight convolutional neural network for wind turbine blades (WTBMobileNet) based on spectrograms is proposed, reducing computation and size while retaining high accuracy. Compared to baseline models, WTBMobileNet without data augmentation has an accuracy of 97.05%, 0.315 million parameters, and a computational cost of 0.423 giga floating-point operations (GFLOPs), making it 9.4 times smaller and 2.7 times cheaper to compute than the best-performing model, with only a 1.68% decrease in accuracy. Then, the impact of different data augmentations is analyzed. WTBMobileNet with augmentation has an accuracy of 98.1%, and the accuracy of each category is above 95%. Furthermore, the interpretability and transparency of WTBMobileNet are demonstrated through class activation mapping for reliable deployment. Finally, WTBMobileNet is explored for drone image classification and spectrogram object detection, whose accuracy and mAP@[0.5, 0.95] are 89.55% and 70.7%, respectively. This proves that WTBMobileNet not only performs well in spectrogram classification, but also has good application potential in drone image classification and spectrogram object detection.
32

Liao, Ying. "Analysis of Rehabilitation Occupational Therapy Techniques Based on Instrumental Music Chinese Tonal Language Spectrogram Analysis." Occupational Therapy International 2022 (October 3, 2022): 1–12. http://dx.doi.org/10.1155/2022/1064441.

Abstract:
This paper provides an in-depth analysis of timbre-speech spectrograms in instrumental music, designs a model of rehabilitation occupational therapy techniques based on the analysis of timbre-speech spectrograms in instrumental music, and tests and compares the models. Starting from the mechanism of human articulation, the process of human expression is modeled as a time-varying linear system consisting of excitation, vocal tract, and radiation models. The system's overall architecture is designed according to the characteristics of Chinese speech and everyday speech rehabilitation theory (HSL theory). Phonetic length training was realized through the dual judgment of a temporal threshold and the short-time average energy. Tone and clear-tone training were achieved with the linear predictive coding (LPC) technique and the autocorrelation function. Using the DTW technique, isolated-word speech recognition was achieved by extracting Mel-scale frequency cepstral coefficient (MFCC) parameters of the speech signals. The system designs corresponding training scenes for each training module according to the extracted speech parameters, combines animated multimedia speech spectrograms with the speech parameters, finally presents the training content as a speech spectrogram, and evaluates the training results through human-machine interaction to stimulate interest in rehabilitation therapy and realize speech rehabilitation training for patients. After analyzing the pre- and post-test data, it was found that the p-values of all three groups were <0.05, which was judged to be a significant difference. All subjects also changed their behavioral data during the treatment. It was therefore concluded that the music therapy technique could improve the patients' active gaze communication ability, verbal command ability, and active question-answering ability, i.e., the hypothesis of this experiment is valid. It is therefore believed that the technique of timbre-speech spectrogram analysis in instrumental music can achieve the effect of rehabilitation therapy to a certain extent.
33

Lopes, Marilia, Raymundo Cassani, and Tiago H. Falk. "Using CNN Saliency Maps and EEG Modulation Spectra for Improved and More Interpretable Machine Learning-Based Alzheimer’s Disease Diagnosis." Computational Intelligence and Neuroscience 2023 (February 8, 2023): 1–17. http://dx.doi.org/10.1155/2023/3198066.

Abstract:
Biomarkers based on resting-state electroencephalography (EEG) signals have emerged as a promising tool in the study of Alzheimer’s disease (AD). Recently, a state-of-the-art biomarker was found based on visual inspection of power modulation spectrograms where three “patches” or regions from the modulation spectrogram were proposed and used for AD diagnostics. Here, we propose the use of deep neural networks, in particular convolutional neural networks (CNNs) combined with saliency maps, trained on power modulation spectrogram inputs to find optimal patches in a data-driven manner. Experiments are conducted on EEG data collected from fifty-four participants, including 20 healthy controls, 19 patients with mild AD, and 15 moderate-to-severe AD patients. Five classification tasks are explored, including the three-class problem, early-stage detection (control vs. mild-AD), and severity level detection (mild vs. moderate-to-severe). Experimental results show the proposed biomarkers outperform the state-of-the-art benchmark across all five tasks, as well as finding complementary modulation spectrogram regions not previously seen via visual inspection. Lastly, experiments are conducted on the proposed biomarkers to test their sensitivity to age, as this is a known confound in AD characterization. Across all five tasks, none of the proposed biomarkers showed a significant relationship with age, thus further highlighting their usefulness for automated AD diagnostics.
34

Bruni, Vittoria, Michela Tartaglione, and Domenico Vitulano. "A Fast and Robust Spectrogram Reassignment Method." Mathematics 7, no. 4 (April 19, 2019): 358. http://dx.doi.org/10.3390/math7040358.

Abstract:
The improvement of the readability of time-frequency transforms is an important topic in the field of fast-oscillating signal processing. The reassignment method is often used due to its adaptivity to different transforms and nice formal properties. However, it strongly depends on the selection of the analysis window, and it requires the computation of the same transform using three different but well-defined windows. The aim of this work is to provide a simple method for spectrogram reassignment, named FIRST (Fast Iterative and Robust Reassignment Thinning), with comparable or better precision than the classical reassignment method, a reduced computational effort, and near independence of the adopted analysis window. To this aim, the time-frequency evolution of a multicomponent signal is formally derived and, based on this law, only a subset of time-frequency points is used to improve spectrogram readability. Those points are the ones least influenced by interfering components. Preliminary results show that the proposed method can reassign spectrograms more accurately than the classical method in the case of interfering signal components, with a significant gain in terms of required computational effort.
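For context, the classical method referred to here moves each spectrogram value |S_w(t,ω)|² to reassigned coordinates computed from three STFTs of the signal: one with the analysis window w, one with the time-weighted window t·w(t), and one with the derivative window w′(t). Up to sign and normalization conventions, the reassignment operators are

\[
\hat{t}(t,\omega) = t + \Re\!\left\{\frac{S_{tw}(t,\omega)\,\overline{S_{w}(t,\omega)}}{\lvert S_{w}(t,\omega)\rvert^{2}}\right\},
\qquad
\hat{\omega}(t,\omega) = \omega - \Im\!\left\{\frac{S_{w'}(t,\omega)\,\overline{S_{w}(t,\omega)}}{\lvert S_{w}(t,\omega)\rvert^{2}}\right\},
\]

and it is this dependence on three transforms, each tied to the chosen window, that FIRST seeks to avoid.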
35

Dhakne, Dr Amol, Vaishnav M. Kuduvan, Aniket Palhade, Tarun Kanjwani, and Rushikesh Kshirsagar. "Bird Species Identification using Audio Signal Processing and Neural Networks." International Journal for Research in Applied Science and Engineering Technology 10, no. 5 (May 31, 2022): 4002–5. http://dx.doi.org/10.22214/ijraset.2022.43309.

Abstract:
In this work, automatic bird species recognition systems were developed, and their identification methods were investigated. Automatically identifying bird calls without physical intervention has been a large and tedious endeavor for studies in various subfields of taxonomy and ornithology. This task uses a two-step identification process. In the first phase, an ideal dataset was created containing recordings of the different bird species. Next, each sound clip was subjected to various sound pre-processing techniques such as pre-emphasis, framing, silence removal, and reconstruction. A spectrogram was generated for each reconstructed sound clip. The second step used a neural network with the spectrogram as input. A convolutional neural network (CNN) classifies the sound clips and recognizes bird species based on the input characteristics. A real-time implementation model of the above system was also designed and executed. Keywords: Bird species identification, bird sound, sound pre-processing techniques, Convolutional Neural Network, Spectrograms
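The pre-processing chain named here (pre-emphasis, framing, silence removal, reconstruction) can be sketched in a few lines of NumPy; the pre-emphasis coefficient, frame length, and energy gate below are common defaults, not values from the paper.

import numpy as np

def preprocess(signal, frame_len=400, alpha=0.97, silence_db=-35.0):
    # Pre-emphasis: boost high frequencies with a first-order difference.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing into non-overlapping frames.
    n = len(emphasized) // frame_len
    frames = emphasized[:n * frame_len].reshape(n, frame_len)
    # Silence removal: keep frames whose energy clears the gate.
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    voiced = frames[energy_db > silence_db]
    # Reconstruction: concatenate the retained frames into one clip.
    return voiced.reshape(-1)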
36

Ding, Congzhang, Yong Jia, Guolong Cui, Chuan Chen, Xiaoling Zhong, and Yong Guo. "Continuous Human Activity Recognition through Parallelism LSTM with Multi-Frequency Spectrograms." Remote Sensing 13, no. 21 (October 23, 2021): 4264. http://dx.doi.org/10.3390/rs13214264.

Abstract:
In real-life environments, radar-based human activity recognition (HAR) is dedicated to recognizing and classifying a sequence of activities rather than individual activities, thereby drawing more attention in practical applications of security surveillance, health care, and human–computer interaction. This paper proposes a parallelism long short-term memory (LSTM) framework with the input of multi-frequency spectrograms to implement continuous HAR. Specifically, frequency-division short-time Fourier transformation (STFT) is performed on the data stream of continuous activities collected by a stepped-frequency continuous-wave (SFCW) radar, generating spectrograms of multiple frequencies which introduce different scattering properties and frequency resolutions. In the designed parallelism LSTM framework, multiple parallel LSTM sub-networks are trained separately to extract different temporal features from the spectrogram of each frequency and produce corresponding classification probabilities. At the decision level, the probabilities of activity classification from these sub-networks are fused by addition as the recognition output. To validate the proposed method, an experimental data set is collected by using an SFCW radar to monitor 11 participants who continuously perform six activities in sequence with three different transitions and random durations. The validation results demonstrate that the average accuracies of the designed parallelism unidirectional LSTM (Uni-LSTM) and bidirectional LSTM (Bi-LSTM) based on five frequency spectrograms are 85.41% and 96.15%, respectively, outperforming traditional Uni-LSTM and Bi-LSTM networks with only a single-frequency spectrogram by at least 5.35% and 6.33%, respectively. Additionally, the recognition accuracy of the parallelism LSTM network reveals an upward trend as the number of multi-frequency spectrograms (namely the number of LSTM sub-networks) increases, and tends to be stable when the number reaches 4.
APA, Harvard, Vancouver, ISO, and other styles
37

Dusek, Daniel. "Decomposition of Non-Stationary Signals Based on the Cochlea Function Principle." Solid State Phenomena 147-149 (January 2009): 594–99. http://dx.doi.org/10.4028/www.scientific.net/ssp.147-149.594.

Full text
Abstract:
This paper deals with the possibility of using the functional principle of the cochlea to decompose arbitrary non-stationary signals. A mathematical model based on an array of resonators is described. The array of resonators is driven by a non-stationary signal composed of different frequency components. Spectrograms calculated for different values of the resonators' viscous damping are the results of this work, and these results are compared with the spectrogram obtained from the Short-Time Fourier Transform (STFT).
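A resonator-array decomposition of this kind can be sketched with a bank of damped second-order systems driven by the same signal; a minimal version, assuming SciPy's linear-system simulation, with the resonator frequencies, damping ratio, and test signal as illustrative assumptions:

```python
import numpy as np
from scipy.signal import lsim, hilbert, TransferFunction

fs = 8000
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * (100 + 800 * t) * t)  # test signal

freqs = np.geomspace(100, 2000, 32)   # resonator centre frequencies (Hz)
zeta = 0.05                           # viscous damping ratio

rows = []
for f0 in freqs:
    w0 = 2 * np.pi * f0
    sys = TransferFunction([w0 ** 2], [1, 2 * zeta * w0, w0 ** 2])
    _, y, _ = lsim(sys, U=x, T=t)     # resonator response to the signal
    rows.append(np.abs(hilbert(y)))   # response envelope
cochleagram = np.array(rows)          # (n_resonators, n_samples) spectrogram-like map
```

Varying `zeta` reproduces the paper's experiment of recomputing the spectrogram for different viscous damping values.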
APA, Harvard, Vancouver, ISO, and other styles
38

Wang, Jie, Linhuang Yan, Qiaohe Yang, and Minmin Yuan. "Speech enhancement based on perceptually motivated guided spectrogram filtering." Journal of Intelligent & Fuzzy Systems 40, no. 3 (March 2, 2021): 5443–54. http://dx.doi.org/10.3233/jifs-202278.

Full text
Abstract:
In this paper, a single-channel speech enhancement algorithm is proposed that uses guided spectrogram filtering based on the masking properties of the human auditory system, treating the speech spectrogram as an image. Guided filtering is capable of sharpening details and estimating unwanted textures or background noise in the noisy speech spectrogram. If we consider the noisy spectrogram as a degraded image, we can estimate the spectrogram of the clean speech signal using guided filtering after subtracting the noise components. Combined with the masking properties of the human auditory system, the proposed algorithm adaptively adjusts and reduces the residual noise of the enhanced speech spectrogram according to the corresponding masking threshold. Because the filtering output is a local linear transform of the guidance spectrogram, the sliding local mask window can be implemented efficiently via a box filter with O(N) computational complexity. Experimental results show that the proposed algorithm effectively suppresses noise in different noisy environments and thus greatly improves speech quality and speech intelligibility.
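The box-filter formulation referred to here is the classic guided-filter recipe, which is straightforward to sketch on a spectrogram treated as an image. The window radius and regularization epsilon below are illustrative assumptions, and the masking-threshold adaptation from the paper is omitted:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=4, eps=1e-3):
    size = 2 * radius + 1
    mean_I = uniform_filter(guide, size)          # box filters: O(N) in T-F bins
    mean_p = uniform_filter(src, size)
    corr_Ip = uniform_filter(guide * src, size)
    corr_II = uniform_filter(guide * guide, size)
    var_I = corr_II - mean_I ** 2
    cov_Ip = corr_Ip - mean_I * mean_p
    a = cov_Ip / (var_I + eps)                    # local linear coefficients
    b = mean_p - a * mean_I
    return uniform_filter(a, size) * guide + uniform_filter(b, size)

noisy = np.abs(np.random.randn(257, 200))         # stand-in noisy magnitude spectrogram
enhanced = guided_filter(noisy, noisy)            # self-guided, edge-preserving smoothing
```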
APA, Harvard, Vancouver, ISO, and other styles
39

Ladefoged, Peter. "Answers to spectrograms in 20.1." Journal of the International Phonetic Association 20, no. 2 (December 1990): 19–20. http://dx.doi.org/10.1017/s0025100300004199.

Full text
Abstract:
In the previous issue of the Journal, on page 52, there were reproductions of two spectrograms that readers were invited to interpret. Both had been tested by presenting them to a group of about half a dozen experienced spectrogram readers. This group discussed the reproductions, arguing for different interpretations until they had reached a consensus on what phrases must have been spoken. The following are some comments that readers might wish to consider, based on the group's procedure in interpreting the first phrase, which is reproduced again below. Remember that the spectrograms contain English sentences spoken in a British accent close to the RP standard.
APA, Harvard, Vancouver, ISO, and other styles
40

Nanni, Loris, Andrea Rigo, Alessandra Lumini, and Sheryl Brahnam. "Spectrogram Classification Using Dissimilarity Space." Applied Sciences 10, no. 12 (June 17, 2020): 4176. http://dx.doi.org/10.3390/app10124176.

Full text
Abstract:
In this work, we combine a Siamese neural network and different clustering techniques to generate a dissimilarity space that is then used to train a support vector machine (SVM) for automated animal audio classification. The animal audio datasets used are (i) bird and (ii) cat sounds, both freely available. We exploit different clustering methods to reduce the spectrograms in the dataset to a number of centroids that are used to generate the dissimilarity space through the Siamese network. Once computed, we use the dissimilarity space to generate a vector space representation of each pattern, which is then fed into the SVM to classify a spectrogram by its dissimilarity vector. Our study shows that the proposed approach based on the dissimilarity space performs well on both classification problems without ad hoc optimization of the clustering methods. Moreover, the results show that the fusion of CNN-based approaches applied to the animal audio classification problem works better than the stand-alone CNNs.
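The dissimilarity-space idea can be sketched with off-the-shelf components: reduce the training spectrograms to k centroids, represent each spectrogram by its vector of distances to those centroids, and classify the vectors with an SVM. In this minimal sketch a plain Euclidean distance stands in for the learned Siamese similarity, and the data shapes are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64 * 32)).astype(np.float32)  # flattened spectrograms
y = rng.integers(0, 2, size=200)                        # two audio classes

centroids = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X).cluster_centers_

def dissimilarity(batch):
    # distance of each pattern to each centroid -> dissimilarity vector
    return np.linalg.norm(batch[:, None, :] - centroids[None], axis=-1)

clf = SVC().fit(dissimilarity(X), y)
pred = clf.predict(dissimilarity(X[:5]))
```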
APA, Harvard, Vancouver, ISO, and other styles
41

Feng, Sheng, Xiaoqiang Hua, and Xiaoqian Zhu. "Matrix Information Geometry for Spectral-Based SPD Matrix Signal Detection with Dimensionality Reduction." Entropy 22, no. 9 (August 20, 2020): 914. http://dx.doi.org/10.3390/e22090914.

Full text
Abstract:
In this paper, a novel signal detector based on matrix information geometric dimensionality reduction (DR) is proposed, inspired by spectrogram processing. By short-time Fourier transform (STFT), the received data are represented as a 2-D high-precision spectrogram, from which the presence of a signal can be judged. Previous similar studies extracted insufficient information from these spectrograms, resulting in unsatisfactory detection performance, especially for complex signal detection tasks at low signal-to-noise ratio (SNR). To this end, we use a global descriptor to extract abundant features, then exploit the advantages of the matrix information geometry technique by constructing the high-dimensional features as symmetric positive definite (SPD) matrices. In this setting, the signal detection task becomes a binary classification problem on an SPD manifold. Promoting the discrimination of heterogeneous samples through an information geometric DR technique dedicated to the SPD manifold, the proposed detector achieves satisfactory detection performance in low-SNR cases on K-distribution simulations and real-life sea clutter data, and can be widely applied in the field of signal detection.
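The SPD construction step can be sketched simply: summarize spectrogram-derived features as a covariance (SPD) matrix and compare samples with a Riemannian-motivated metric such as the log-Euclidean distance. The global descriptor and geometric dimensionality reduction from the paper are not reproduced in this sketch, and the feature sizes are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import logm

def spd_from_spectrogram(spec, eps=1e-6):
    feats = spec - spec.mean(axis=1, keepdims=True)   # (n_features, n_frames)
    cov = feats @ feats.T / feats.shape[1]            # sample covariance
    return cov + eps * np.eye(cov.shape[0])           # regularise to strict SPD

def log_euclidean_distance(A, B):
    return np.linalg.norm(logm(A) - logm(B), ord="fro")

spec_a = np.abs(np.random.randn(16, 100))             # stand-in spectrogram features
spec_b = np.abs(np.random.randn(16, 100))
d = log_euclidean_distance(spd_from_spectrogram(spec_a), spd_from_spectrogram(spec_b))
```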
APA, Harvard, Vancouver, ISO, and other styles
42

Wang, Xuebing, Junbao Zhang, and Mengxue Yang. "SonicGest: Ultrasonic Gesture Recognition System Combined with GAN on Smartphones." Journal of Sensors 2023 (June 15, 2023): 1–15. http://dx.doi.org/10.1155/2023/6813911.

Full text
Abstract:
In recent years, with the widespread popularity of mobile devices, gesture recognition as a means of human-computer interaction has received more and more attention. However, existing gesture recognition methods have limitations, such as requiring additional hardware, invading user privacy, and making data collection difficult. To address these issues, we propose SonicGest, a recognition system that uses acoustic signals to sense in-air gestures. The system needs only the built-in speaker and microphone of a smartphone, with no additional hardware and no privacy disclosure. SonicGest transforms the acoustic Doppler signatures caused by gesture movements into a spectrogram, uses spectrogram enhancement techniques to remove noise interference, and then builds a convolutional neural network (CNN) classification model to recognize different gestures. To overcome the difficulty of data collection, we use the Wasserstein distance with a gradient penalty to optimize the loss function of the generative adversarial network (GAN), generating high-quality spectrograms to expand the dataset. Experimental results show that SonicGest achieves a recognition accuracy of 98.9% over ten kinds of gestures.
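The gradient-penalty term used to stabilize such a spectrogram-generating GAN pushes the critic's gradient norm toward 1 at random interpolates between real and fake samples. A minimal PyTorch sketch follows; the toy critic, spectrogram shapes, and penalty weight are illustrative assumptions.

```python
import torch

def gradient_penalty(critic, real, fake):
    alpha = torch.rand(real.size(0), 1, 1, 1)           # per-sample mixing weights
    mixed = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    score = critic(mixed).sum()
    grads, = torch.autograd.grad(score, mixed, create_graph=True)
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

critic = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64 * 64, 1))
real = torch.randn(8, 1, 64, 64)                        # real spectrogram batch
fake = torch.randn(8, 1, 64, 64)                        # generator output
loss_gp = 10.0 * gradient_penalty(critic, real, fake)   # lambda = 10 is the usual choice
```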
APA, Harvard, Vancouver, ISO, and other styles
43

Trapanotto, Martino, Loris Nanni, Sheryl Brahnam, and Xiang Guo. "Convolutional Neural Networks for the Identification of African Lions from Individual Vocalizations." Journal of Imaging 8, no. 4 (April 1, 2022): 96. http://dx.doi.org/10.3390/jimaging8040096.

Full text
Abstract:
The classification of vocal individuality for passive acoustic monitoring (PAM) and census of animals is becoming an increasingly popular area of research. Nearly all studies in this field of inquiry have relied on classic audio representations and classifiers, such as Support Vector Machines (SVMs) trained on spectrograms or Mel-Frequency Cepstral Coefficients (MFCCs). In contrast, most current bioacoustic species classification exploits the power of deep learners and more cutting-edge audio representations. A significant reason for avoiding deep learning in vocal identity classification is the tiny sample size of the collections of labeled individual vocalizations. As is well known, deep learners require large datasets to avoid overfitting. One way to handle small datasets with deep learning methods is to use transfer learning. In this work, we evaluate the performance of three pretrained CNNs (VGG16, ResNet50, and AlexNet) on a small, publicly available lion roar dataset containing approximately 150 samples taken from five male lions. Each of these networks is retrained on eight representations of the samples: MFCCs, spectrogram, and Mel spectrogram, along with several new ones, such as VGGish and Stockwell, and those based on the recently proposed LM spectrogram. The performance of these networks, both individually and in ensembles, is analyzed and corroborated using the Equal Error Rate and shown to surpass previous classification attempts on this dataset; the best single network achieved over 95% accuracy and the best ensembles over 98% accuracy. The contributions this study makes to the field of individual vocal classification include demonstrating that it is valuable and possible, with caution, to use transfer learning with single pretrained CNNs on the small datasets available for this problem domain. We also make a contribution to bioacoustics generally by offering a comparison of the performance of many state-of-the-art audio representations, including, for the first time, the LM spectrogram and Stockwell representations. All source code for this study is available on GitHub.
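The transfer-learning setup described, a pretrained CNN with its final layer replaced for the five individuals, can be sketched in a few lines. ResNet50 is used here as one of the three networks named; the frozen-backbone strategy and the torchvision weights string are assumptions about a reasonable configuration and installed version, not the authors' exact recipe.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V1")   # ImageNet-pretrained backbone
for p in model.parameters():
    p.requires_grad = False                        # freeze: ~150 samples only
model.fc = nn.Linear(model.fc.in_features, 5)      # five individual lions

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
spec_batch = torch.randn(4, 3, 224, 224)           # spectrograms rendered as RGB images
logits = model(spec_batch)
```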
APA, Harvard, Vancouver, ISO, and other styles
44

Park, Dongsuk, Seungeui Lee, SeongUk Park, and Nojun Kwak. "Radar-Spectrogram-Based UAV Classification Using Convolutional Neural Networks." Sensors 21, no. 1 (December 31, 2020): 210. http://dx.doi.org/10.3390/s21010210.

Full text
Abstract:
With the upsurge in the use of Unmanned Aerial Vehicles (UAVs) in various fields, detecting and identifying them in real time are becoming important topics. However, the identification of UAVs is difficult due to characteristics such as low altitude, slow speed, and small radar cross-section (LSS). With the existing deterministic approach, the algorithm becomes complex and requires a large number of computations, making it unsuitable for real-time systems. Hence, effective alternatives enabling real-time identification of these new threats are needed. Deep-learning-based classification models learn features from data by themselves and have shown outstanding performance in computer vision tasks. In this paper, we propose a deep-learning-based classification model that learns the micro-Doppler signatures (MDS) of targets represented in radar spectrogram images. To enable this, we first recorded five LSS targets (three types of UAVs and two types of human activities) with a frequency-modulated continuous-wave (FMCW) radar in various scenarios. Then, we converted the signals into spectrogram images by short-time Fourier transform (STFT). After data refinement and augmentation, we built our own radar spectrogram dataset. Second, we analyzed the characteristics of the radar spectrogram dataset with the ResNet-18 model and designed the ResNet-SP model, with less computation and higher accuracy and stability, based on ResNet-18. The results show that the proposed ResNet-SP has a training time of 242 s and an accuracy of 83.39%, superior to ResNet-18, which takes 640 s to train and reaches 79.88% accuracy.
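The signal-to-image step, STFT of a complex radar return converted to a dB-scaled micro-Doppler spectrogram image, can be sketched as follows; the simulated return and all STFT parameters are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft

fs = 1000
t = np.arange(0, 2.0, 1 / fs)
iq = np.exp(1j * 2 * np.pi * 50 * np.sin(2 * np.pi * 1.5 * t))  # toy micro-Doppler return

f, tt, Z = stft(iq, fs=fs, nperseg=128, noverlap=112, return_onesided=False)
spec_db = 20 * np.log10(np.abs(Z) + 1e-12)                      # dB scale
img = (spec_db - spec_db.min()) / (spec_db.max() - spec_db.min())  # [0, 1] image for a CNN
```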
APA, Harvard, Vancouver, ISO, and other styles
45

Król, Andrzej, and Tomasz Szymczyk. "Comparative analysis of the quality of recorded sound in the function of different recording formats." Journal of Computer Sciences Institute 24 (September 30, 2022): 189–94. http://dx.doi.org/10.35784/jcsi.2934.

Full text
Abstract:
In this article, the quality of the following encoders is analyzed: MP3, AAC, WMA, and Ogg Vorbis. An original graphic method was used to carry out the quantitative research. It consists of comparing the number of pixels (representing data) between the spectrogram of a WAV file and the spectrograms of files compressed with different codecs and bit rates. It is shown that the Ogg Vorbis encoder retains the most data from the uncompressed WAV sample at all tested bit rates (128 kbit/s, 160 kbit/s, and 320 kbit/s).
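One plausible reading of this graphic method is to binarize the spectrograms of the reference WAV and of each decoded compressed file with the same threshold and compare the retained-pixel counts. A minimal sketch under that assumption follows; the file names and threshold are hypothetical.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

def pixel_count(path, thresh_db=-60.0):
    sr, y = wavfile.read(path)
    if y.ndim > 1:
        y = y.mean(axis=1)                        # mix down to mono
    _, _, Z = stft(y.astype(np.float64), fs=sr, nperseg=2048)
    db = 20 * np.log10(np.abs(Z) / (np.abs(Z).max() + 1e-12) + 1e-12)
    return int((db > thresh_db).sum())            # pixels carrying data

ref = pixel_count("original.wav")                 # hypothetical file names
ogg = pixel_count("decoded_ogg_320.wav")
retained = ogg / ref                              # fraction of data retained by the codec
```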
APA, Harvard, Vancouver, ISO, and other styles
46

Mergu, Rohini R., and Shantanu K. Dixit. "Multi-Resolution Speech Spectrogram." International Journal of Computer Applications 15, no. 4 (February 28, 2011): 28–32. http://dx.doi.org/10.5120/1937-2587.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Svetlakov, Mikhail, Ilya Kovalev, Anton Konev, Evgeny Kostyuchenko, and Artur Mitsel. "Representation Learning for EEG-Based Biometrics Using Hilbert–Huang Transform." Computers 11, no. 3 (March 20, 2022): 47. http://dx.doi.org/10.3390/computers11030047.

Full text
Abstract:
A promising approach to overcoming the various shortcomings of password systems is biometric authentication, in particular the use of electroencephalogram (EEG) data. In this paper, we propose a subject-independent learning method for EEG-based biometrics using Hilbert spectrograms of the data. The proposed neural network architecture treats the spectrogram as a collection of one-dimensional series and applies one-dimensional dilated convolutions over them, with a multi-similarity loss used as the loss function for subject-independent learning. The architecture was tested on the publicly available PhysioNet EEG Motor Movement/Imagery Dataset (PEEGMIMDB), achieving a 14.63% Equal Error Rate (EER). The main advantages of the proposed approach are its subject independence and its suitability for interpretation via the created spectrograms and the integrated gradients method.
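The architectural idea, treating each spectrogram row as a one-dimensional series and stacking one-dimensional dilated convolutions across time to produce a metric-learning embedding, can be sketched as follows; channel counts, dilation rates, and the embedding size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DilatedRowEncoder(nn.Module):
    def __init__(self, n_rows, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_rows, 64, 3, dilation=1, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, 3, dilation=2, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, 3, dilation=4, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.proj = nn.Linear(64, emb_dim)

    def forward(self, x):                            # x: (B, n_rows, n_frames)
        return self.proj(self.net(x).squeeze(-1))    # embedding for the metric loss

emb = DilatedRowEncoder(n_rows=32)(torch.randn(8, 32, 160))  # -> (8, 64)
```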
APA, Harvard, Vancouver, ISO, and other styles
48

Franzoni, Valentina, Giulio Biondi, and Alfredo Milani. "Emotional sounds of crowds: spectrogram-based analysis using deep learning." Multimedia Tools and Applications 79, no. 47-48 (August 17, 2020): 36063–75. http://dx.doi.org/10.1007/s11042-020-09428-x.

Full text
Abstract:
Crowds express emotions as a collective individual, which is evident from the sounds a crowd produces in particular events, e.g., collective booing, laughing, or cheering at sports matches, movies, theaters, concerts, political demonstrations, and riots. A critical question concerning the innovative concept of crowd emotions is whether the emotional content of crowd sounds can be characterized by frequency-amplitude features, using analysis techniques similar to those applied to individual voices, where deep learning classification is applied to spectrogram images derived from sound transformations. In this work, we present a technique based on the generation of sound spectrograms from fragments of fixed length, extracted from original audio clips recorded at high-attendance events where the crowd acts as a collective individual. Transfer learning techniques are used on a convolutional neural network pre-trained on low-level features using the well-known ImageNet dataset of visual knowledge. The original sound clips are filtered and normalized in amplitude for correct spectrogram generation, on which we fine-tune the domain-specific features. Experiments with the final trained convolutional neural network show the promising performance of the proposed model in classifying the emotions of the crowd.
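The clip-preparation step, splitting a long recording into fixed-length fragments and normalizing each in amplitude before spectrogram generation, can be sketched in a few lines; the fragment length is an illustrative assumption.

```python
import numpy as np

def fragments(y, sr, seconds=2.0):
    n = int(sr * seconds)
    pieces = [y[i:i + n] for i in range(0, len(y) - n + 1, n)]
    return [p / (np.abs(p).max() + 1e-12) for p in pieces]   # peak-normalise each fragment

clip = np.random.randn(10 * 16000)         # stand-in for a crowd recording
parts = fragments(clip, sr=16000)          # each fragment is ready for spectrogram + CNN
```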
APA, Harvard, Vancouver, ISO, and other styles
49

Appenzeller, I., B. Wolf, and O. Stahl. "An extended nebulosity surrounding the S Dor variable R 127." Symposium - International Astronomical Union 122 (1987): 429–30. http://dx.doi.org/10.1017/s0074180900156876.

Full text
Abstract:
Using the CASPEC echelle spectrograph of the European Southern Observatory, La Silla, Chile, we obtained new high-resolution spectrograms of the LMC S Dor variable R 127 in the blue and red spectral ranges. The red spectrogram, which contains the [N II] 6548 and 6583 and the [S II] 6717 and 6731 lines, shows the presence of a well-resolved extended gaseous nebula around R 127 (see Figures 1 and 2). The nebula (which is also detected at the Balmer lines) shows blueshifted and redshifted emission projected on the position of the stellar continuum, and no wavelength shift at the maximum (East-West) distance from the star. Hence, the nebulosity appears to be an expanding shell, reminiscent of the nebula around the galactic extreme supergiant AG Car. The angular diameter (or East-West extension) of the nebula around R 127 is of the order of 4″, corresponding to ≈1 pc at the distance of the LMC. The expansion velocity of the R 127 nebula is found to be 28 km s⁻¹ from our spectrograms. Hence, assuming a constant expansion velocity, we derive a kinematic age of the R 127 nebula of ≈2 × 10⁴ years. This corresponds closely to the expected lifetime of the S Dor evolutionary phase. A more detailed description of our results will be published in the proceedings of the Workshop on "Instabilities in Luminous Early Type Stars" (C. de Loore, H. Lamers, eds.), Lunteren 1986.
APA, Harvard, Vancouver, ISO, and other styles
50

Mersy, Gabriel. "Efficient Robust Music Genre Classification with Depthwise Separable Convolutions and Source Separation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 15972–73. http://dx.doi.org/10.1609/aaai.v35i18.17982.

Full text
Abstract:
Given recent advances in deep music source separation, a feature representation method is proposed that combines source separation with a state-of-the-art representation learning technique that is suitably repurposed for computer audition (i.e. machine listening). A depthwise separable convolutional neural network is trained on a challenging electronic dance music (EDM) data set and its performance is compared to convolutional neural networks operating on both source separated and standard spectrograms. It is shown that source separation improves classification performance in a limited-data setting compared to the standard single spectrogram approach.
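A depthwise separable convolution block of the kind this abstract builds on replaces a standard convolution with a per-channel (depthwise) spatial convolution followed by a 1x1 pointwise convolution, cutting parameters and multiply-adds. A minimal PyTorch sketch, with sizes as illustrative assumptions:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel, padding=kernel // 2,
                                   groups=in_ch)      # one spatial filter per channel
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)  # mix channels at 1x1

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

block = DepthwiseSeparableConv(32, 64)
out = block(torch.randn(1, 32, 128, 128))             # spectrogram feature map
```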
APA, Harvard, Vancouver, ISO, and other styles