Journal articles on the topic 'Speech Activity Detection (SAD)'

Consult the top 50 journal articles for your research on the topic 'Speech Activity Detection (SAD).'

1

Kaur, Sukhvinder, and J. S. Sohal. "Speech Activity Detection and its Evaluation in Speaker Diarization System." International Journal of Computers & Technology 16, no. 1 (2017): 7567–72. http://dx.doi.org/10.24297/ijct.v16i1.5893.

Abstract:
In speaker diarization, speech/voice activity detection is performed to separate speech, non-speech, and silent frames. The zero-crossing rate and root-mean-square value of audio frames have been used to select training data for silent, speech, and non-speech models. The trained models are used by two classifiers, a Gaussian mixture model (GMM) and an artificial neural network (ANN), to classify the speech and non-speech frames of an audio clip. The results of the ANN and GMM classifiers are compared with a receiver operating characteristic (ROC) curve and a detection error tradeoff (DET) graph. …
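
The entry above selects training frames with two classic features, zero-crossing rate and RMS energy. A minimal sketch of those frame-level features (the frame length and hop below are illustrative assumptions, not values from the paper):

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Compute zero-crossing rate and RMS energy per frame.

    A sketch of the two frame-level features described in the abstract;
    silent frames show near-zero RMS, noisy/speech frames show higher
    ZCR and energy.
    """
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # fraction of adjacent sample pairs whose sign changes
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
        rms = np.sqrt(np.mean(frame ** 2))
        feats.append((zcr, rms))
    return np.array(feats)

# Illustration: 50 ms of silence followed by 50 ms of noise (16 kHz assumed).
rng = np.random.default_rng(0)
sig = np.concatenate([np.zeros(800), rng.standard_normal(800) * 0.1])
f = frame_features(sig)
```

A classifier such as the GMM or ANN of the paper would then be trained on these (zcr, rms) pairs.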
2

Gimeno, Pablo, Dayana Ribas, Alfonso Ortega, Antonio Miguel, and Eduardo Lleida. "Unsupervised Adaptation of Deep Speech Activity Detection Models to Unseen Domains." Applied Sciences 12, no. 4 (2022): 1832. http://dx.doi.org/10.3390/app12041832.

Abstract:
Speech Activity Detection (SAD) aims to accurately classify audio fragments containing human speech. Current state-of-the-art systems for the SAD task are mainly based on deep learning solutions. These applications usually show a significant drop in performance when test data differ from training data because of domain shift. Furthermore, machine learning algorithms require large amounts of labelled data, which may be hard to obtain in real applications. Considering both ideas, in this paper we evaluate three unsupervised domain adaptation techniques applied to the SAD task. …
3

Dutta, Satwik, Prasanna Kothalkar, Johanna Rudolph, et al. "Advancing speech activity detection for automatic speech assessment of pre-school children prompted speech using COMBO-SAD." Journal of the Acoustical Society of America 148, no. 4 (2020): 2469–67. http://dx.doi.org/10.1121/1.5146831.

4

Alghifari, Muhammad Fahreza, Teddy Surya Gunawan, Mimi Aminah binti Wan Nordin, Syed Asif Ahmad Qadri, Mira Kartiwi, and Zuriati Janin. "On the use of voice activity detection in speech emotion recognition." Bulletin of Electrical Engineering and Informatics 8, no. 4 (2019): 1324–32. https://doi.org/10.11591/eei.v8i4.1646.

Abstract:
Emotion recognition through speech has many potential applications; however, the challenge lies in achieving high recognition accuracy with limited resources or under interference such as noise. In this paper we explore the possibility of improving speech emotion recognition by utilizing the voice activity detection (VAD) concept. The emotional voice data from the Berlin Emotion Database (EMO-DB) and a custom-made database, the LQ Audio Dataset, are first preprocessed by VAD before feature extraction. The features are then passed to a deep neural network for classification. …
5

Mahalakshmi, P. "A Review on Voice Activity Detection and Mel-Frequency Cepstral Coefficients for Speaker Recognition (Trend Analysis)." Asian Journal of Pharmaceutical and Clinical Research 9, no. 9 (2016): 360. http://dx.doi.org/10.22159/ajpcr.2016.v9s3.14352.

Abstract:
Objective: The objective of this review article is to give a complete review of the various techniques used for speech recognition purposes over two decades. Methods: Voice activity detection (VAD) and speech activity detection (SAD) techniques, which are used to distinguish voiced from unvoiced signals, are discussed, along with the Mel-frequency cepstral coefficient (MFCC) technique, which detects specific features. Results: The review shows that research on MFCC has been dominant in signal processing in comparison to VAD and other existing techniques. Conclusion: …
6

Sethuram, V., Ande Prasad, and R. Rajeswara Rao. "Metaheuristic adapted convolutional neural network for Telugu speaker diarization." Intelligent Decision Technologies 15, no. 4 (2022): 561–77. http://dx.doi.org/10.3233/idt-211005.

Abstract:
In speech technology, a pivotal role is played by the speaker diarization mechanism. In general, speaker diarization is the mechanism of partitioning an input audio stream into homogeneous segments based on the identity of the speakers. Speaker diarization can improve the readability of an automatic transcription, since it is good at recognizing speaker turns in the audio stream and often provides the true speaker identity. In this research work, a novel speaker diarization approach is introduced with three major phases: feature extraction, Speech Activity Detection (SAD), and …
7

Khalil, Driss, Amrutha Prasad, Petr Motlicek, et al. "An Automatic Speaker Clustering Pipeline for the Air Traffic Communication Domain." Aerospace 10, no. 10 (2023): 876. http://dx.doi.org/10.3390/aerospace10100876.

Abstract:
In air traffic management (ATM), voice communications are critical for ensuring the safe and efficient operation of aircraft. The pertinent voice communications, between the air traffic controller (ATCo) and the pilot, are usually transmitted in a single channel, which poses a challenge when developing automatic systems for air traffic management. Speaker clustering, identifying and grouping utterances of the same speaker among different speakers, is one of those challenges when applying speech processing algorithms. We propose a pipeline that deploys (i) speech activity detection (SAD) to identify speech segments, (ii) …
8

Ghosh, Prolay. "An Improved Convolutional Neural Network for Speech Detection." Journal of Information Systems Engineering and Management 10, no. 3 (2025): 621–30. https://doi.org/10.52783/jisem.v10i3.5951.

Abstract:
The detection of emotions from speech is the aim of this paper. Speech conveying anger, joy, and fear has a very high and wide pitch range, whereas speech conveying sad and tired emotions has a very low pitch. Speech emotion detection technology can recognize human emotions to help machines better understand the intentions of a user and improve human-computer interaction. A classification model, a Convolutional Neural Network (CNN) based mainly on the Mel-frequency cepstral coefficient (MFCC) feature, is presented here to detect emotion. …
9

Zhao, Hui, Yu Tai Wang, and Xing Hai Yang. "Emotion Detection System Based on Speech and Facial Signals." Advanced Materials Research 459 (January 2012): 483–87. http://dx.doi.org/10.4028/www.scientific.net/amr.459.483.

Abstract:
This paper introduces the present status of speech emotion detection. In order to improve on the emotion recognition rate of a single mode, a bimodal fusion method based on speech and facial expression is proposed. First, we establish an emotional database that includes speech and facial expressions. For the different emotions calm, happy, surprise, anger, and sad, we extract ten speech parameters and use the PCA method to detect the speech emotion. Then we analyze bimodal emotion detection that fuses facial expression information. The experimental results show that the emotion recognition rate with bimodal …
10

Rajdeep, Bhoomi, Hardik B. Patel, and Sailesh Iyer. "Human Emotion Identification from Speech using Neural Network." International Journal of Computers 16 (November 10, 2022): 87–103. http://dx.doi.org/10.46300/9108.2022.16.15.

Abstract:
Detection of mood and behavior by voice analysis helps identify the speaker's mood from voice frequency. Here, we aim to present a device for detecting moods, such as happy and sad, and behavior, using machine learning and artificial intelligence based on voice analysis. Using this device, the user's mood is detected; the device detects the frequency with a trained model and algorithm. The algorithm is well trained to capture the frequency cues that help identify whether the speaker's mood is happy or sad, along with the behavior. …
11

Shreya, S., P. Likitha, G. Sai Charan, and Shruti Bhargava Choubey. "Speech Emotion Detection Through Live Calls." International Journal for Research in Applied Science and Engineering Technology 11, no. 5 (2023): 691–95. http://dx.doi.org/10.22214/ijraset.2023.51575.

Abstract:
Speech emotion recognition is a popular study area right now, with the goal of enhancing human-machine interaction. Most of the research being done in this field classifies emotions into different groups by extracting discriminatory features. Most of the work done nowadays concerns verbal expressions used for lexical analysis and emotion recognition. In our project, emotions are categorized into the following categories: angry, calm, fearful, happy, and sad. Speech Emotion Recognition, often known as SER, is a technology that takes advantage of the fact that tone and pitch …
12

Singla, Chaitanya, and Sukhdev Singh. "PEMO: A New Validated Dataset for Punjabi Speech Emotion Detection." International Journal on Recent and Innovation Trends in Computing and Communication 10, no. 10 (2022): 52–58. http://dx.doi.org/10.17762/ijritcc.v10i10.5734.

Abstract:
This research work presents a new validated dataset for Punjabi, the Punjabi Emotional Speech Database (PEMO), which has been developed to assess the ability of both computers and humans to recognize emotions in speech. PEMO includes speech samples from about 60 speakers, with an age range between 20 and 45 years, for four fundamental emotions: anger, sad, happy, and neutral. To create the data, Punjabi films were retrieved from multimedia websites such as YouTube. The movies were processed and transformed into utterances with the PRAAT software. …
13

Pallavi, E. S. "Speech Emotion Recognition Based on Machine Learning." International Journal of Scientific Research in Engineering and Management 8, no. 5 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem33995.

Abstract:
Speech is the most effective means of communication, and recognizing the emotions in speech is a crucial task. In this paper we use an Artificial Neural Network to recognize the emotions in speech; hence, providing an efficient and accurate technique for speech-based emotion recognition is also an important task. This study is focused on seven basic human emotions (angry, disgust, fear, happy, neutral, surprise, sad). The training and validation accuracy, as well as the loss, can be seen in graphs produced while training on the dataset, and from these a confusion matrix for the model is created. …
14

Apeksha, G. "Speech Emotion Recognition Using ANN." International Journal of Scientific Research in Engineering and Management 8, no. 5 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem32584.

Abstract:
Speech is the most effective means of communication, and recognizing the emotions in speech is a crucial task. In this paper we use an Artificial Neural Network to recognize the emotions in speech; hence, providing an efficient and accurate technique for speech-based emotion recognition is also an important task. This study is focused on seven basic human emotions (angry, disgust, fear, happy, neutral, surprise, sad). The training and validation accuracy, as well as the loss, can be seen in graphs produced while training on the dataset, and from these a confusion matrix for the model is created. …
15

Apeksha, G. "Speech Emotion Recognition Using Machine Learning." International Journal of Scientific Research in Engineering and Management 8, no. 5 (2024): 1–5. http://dx.doi.org/10.55041/ijsrem32388.

Abstract:
Speech is the most effective means of communication, and recognizing the emotions in speech is a crucial task. In this paper we use an Artificial Neural Network to recognize the emotions in speech; hence, providing an efficient and accurate technique for speech-based emotion recognition is also an important task. This study is focused on seven basic human emotions (angry, disgust, fear, happy, neutral, surprise, sad). The training and validation accuracy, as well as the loss, can be seen in graphs produced while training on the dataset, and from these a confusion matrix for the model is created. …
16

Maryamah, Maryamah, Nicholas Juan Kalvin Pradiptamurty, Hafiyyah Khayyiroh Shafro, Mohammad Sihabudin Al Qurtubi, Giovanny Alberta Tambahjong, and Qothrotunnidha' Almaulidiyah. "Speech Emotion Recognition (SER) dengan Metode Bidirectional LSTM." PROSIDING SEMINAR NASIONAL SAINS DATA 3, no. 1 (2023): 153–61. http://dx.doi.org/10.33005/senada.v3i1.105.

Abstract:
Emotions are a part of being human, a form of response to experienced events. Emotion analysis from speech, known as speech emotion recognition (SER), is a field many researchers are interested in, because voice recognition systems can assist in criminal investigations, in monitoring and detecting potentially dangerous events, and in supporting the health care system. Therefore, this study proposes SER using the bidirectional long short-term memory (Bi-LSTM) model approach. The dataset used was scraped from the YouTube platform, manually labeled, and then feature extraction was performed. …
17

Heo, Youngjun, and Sunggu Lee. "Supervised Contrastive Learning for Voice Activity Detection." Electronics 12, no. 3 (2023): 705. http://dx.doi.org/10.3390/electronics12030705.

Abstract:
The noise robustness of voice activity detection (VAD) tasks, which are used to identify the human speech portions of a continuous audio signal, is important for subsequent downstream applications such as keyword spotting and automatic speech recognition. Although various aspects of VAD have recently been studied by researchers, a proper training strategy for VAD has not received sufficient attention. Thus, a training strategy for VAD using supervised contrastive learning is proposed for the first time in this paper. The proposed method is used in conjunction with audio-specific data augmentation …
18

Kadam, Sonal. "Depression Detection Using Machine Learning." International Journal of Scientific Research in Engineering and Management 9, no. 4 (2025): 1–9. https://doi.org/10.55041/ijsrem45504.

Abstract:
According to the World Health Organization, by 2030 depression will be the second leading cause of disability. Depression is a state of mental illness characterized by long-lasting feelings of sadness or despair. Most patients with depression do not complain that they are depressed. If a person is sad for a really long time, then that person can be considered depressed; such a person needs a proper diagnosis with the help of a psychiatrist. But these people do not like to visit a psychiatrist to check whether they are depressed, because they fear people's judgement …
19

Gelly, Gregory, and Jean-Luc Gauvain. "Optimization of RNN-Based Speech Activity Detection." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, no. 3 (2018): 646–56. http://dx.doi.org/10.1109/taslp.2017.2769220.

20

Koh, Min‐sung, and Margaret Mortz. "Improved voice activity detection of noisy speech." Journal of the Acoustical Society of America 107, no. 5 (2000): 2907–8. http://dx.doi.org/10.1121/1.428823.

21

Quan, Changqin, Bin Zhang, Xiao Sun, and Fuji Ren. "A combined cepstral distance method for emotional speech recognition." International Journal of Advanced Robotic Systems 14, no. 4 (2017): 172988141771983. http://dx.doi.org/10.1177/1729881417719836.

Abstract:
Affective computing is not only the direction of reform in artificial intelligence but also an exemplification of advanced intelligent machines. Emotion is the biggest difference between human and machine; if a machine behaves with emotion, it will be accepted by more people. Voice is the most natural manner of daily communication and can be easily understood and accepted. The recognition of emotional voice is an important field of artificial intelligence. However, in the recognition of emotions, there is often the phenomenon that two emotions are particularly vulnerable to …
22

Vanita G. Kshirsagar. "Echoes of the Mind: A CNN Approach for Early Mental Health Prediction." Journal of Information Systems Engineering and Management 10, no. 31s (2025): 777–86. https://doi.org/10.52783/jisem.v10i31s.5132.

Abstract:
Introduction: Depression often goes undiagnosed due to the absence of objective and accessible detection methods. This paper focuses on developing an audio-based system that analyzes speech patterns, tone, and sentiment to predict early signs of depression, enabling timely intervention. Objectives: To design a machine learning model that detects depression using speech features such as pitch, tone, and rhythm, and to improve early mental health diagnosis by leveraging audio-based sentiment analysis. Methods: Speech signals will be processed using feature extraction techniques like MFCCs …
23

Koctúrová, Marianna, and Jozef Juhár. "Neural Network Architecture for EEG Based Speech Activity Detection." Acta Electrotechnica et Informatica 21, no. 4 (2021): 9–13. http://dx.doi.org/10.2478/aei-2021-0002.

Abstract:
In this paper, research focused on speech activity detection using brain EEG signals is presented. In addition to speech stimulation of brain activity, an innovative approach based on the simultaneous stimulation of the brain by visual stimuli, such as reading and color naming, has been used. In designing the solution, classification using two types of artificial neural networks was proposed: a shallow feed-forward neural network and a deep convolutional neural network. Experimental classification results demonstrated an F1 score of 79.50% for speech detection using the shallow neural network and 84.39% …
24

Dash, Debadatta, Paul Ferrari, Satwik Dutta, and Jun Wang. "NeuroVAD: Real-Time Voice Activity Detection from Non-Invasive Neuromagnetic Signals." Sensors 20, no. 8 (2020): 2248. http://dx.doi.org/10.3390/s20082248.

Abstract:
Neural speech decoding-driven brain-computer interface (BCI), or speech-BCI, is a novel paradigm for exploring communication restoration for locked-in (fully paralyzed but aware) patients. Speech-BCIs aim to map a direct transformation from neural signals to text or speech, which has the potential for a higher communication rate than current BCIs. Although recent progress has demonstrated the potential of speech-BCIs from either invasive or non-invasive neural signals, the majority of the systems developed so far still assume knowing the onset and offset of the speech utterances …
25

Balabanova, Tatiana N., Diana I. Gaivoronskayа, and Anna N. Doborovich. "Using neural network technologies in determining the emotional state of a person in oral communication." Research Result. Theoretical and Applied Linguistics 10, no. 4 (2024): 17–39. https://doi.org/10.18413/2313-8912-2024-10-4-0-2.

Abstract:
Human oral speech often has an emotional connotation; this is because emotions and our mood influence the physiology of the vocal tract and, as a result, speech. When a person is happy, worried, sad, or angry, it is reflected in various characteristics of the voice, the pace of speech, and its intonation. Assessing a person's emotional state through speech can therefore have a beneficial effect on various areas of life, for example medicine, psychology, criminology, marketing, and education. In medicine, the use of emotion assessment from speech can help in the diagnosis and …
26

Sheriff, Alimi, and Yussuff I. O. Abayomi. "Voice Activity Detection Using Weighted K-Means Thresholding Algorithm." International Journal of Innovative Technology and Exploring Engineering 14, no. 4 (2025): 1–7. https://doi.org/10.35940/ijitee.d1051.14040325.

Abstract:
Voice activity detection (VAD) separates speech segments from silent segments of an audio signal, and it is valuable for many speech-processing applications, such as speech recognition and speaker verification, because it helps improve performance and system efficiency. In this study, K-means, a clustering algorithm, was extended to a thresholding algorithm termed K-means weighted thresholding and was used to discriminate voiced/speech segments from silent segments in audio or speech signals. The voice signal was fragmented into frames of 2048 samples, and …
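
The entry above derives a speech/silence threshold from a K-means clustering of the signal. As a generic illustration (not the weighted variant of the cited paper, whose details are not given here), a two-cluster 1-D k-means over frame energies yields a decision threshold at the midpoint of the two centroids:

```python
import numpy as np

def kmeans_energy_threshold(frame_energies, iters=20):
    """Two-cluster 1-D k-means on frame energies; the decision
    threshold is the midpoint of the two converged centroids.

    A sketch of thresholding via k-means clustering; illustrative
    only, not the K-means weighted thresholding of the cited paper.
    """
    e = np.asarray(frame_energies, dtype=float)
    lo, hi = e.min(), e.max()                       # centroids start at the extremes
    for _ in range(iters):
        assign = np.abs(e - lo) <= np.abs(e - hi)   # True -> low-energy cluster
        if assign.all() or (~assign).all():
            break
        lo, hi = e[assign].mean(), e[~assign].mean()
    return (lo + hi) / 2.0

# Toy frame energies: three speech frames among silent ones.
energies = [0.01, 0.02, 0.015, 0.9, 1.1, 0.95, 0.012]
thr = kmeans_energy_threshold(energies)
speech = [x > thr for x in energies]
```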
28

Ouyang, Qianhe. "Speech emotion detection based on MFCC and CNN-LSTM architecture." Applied and Computational Engineering 5, no. 1 (2023): 243–49. http://dx.doi.org/10.54254/2755-2721/5/20230570.

Abstract:
Emotion detection techniques have been applied to multiple cases, mainly based on facial image features and vocal audio features; the latter remains difficult, not only because of the complexity of speech audio processing but also because of the difficulty of extracting appropriate features. Parts of the SAVEE and RAVDESS datasets were selected and combined into one dataset containing seven common emotions (happy, neutral, sad, anger, disgust, fear, and surprise) and thousands of samples. Based on the Librosa package, this paper processes the initial audio input into waveplot and spectrum …
29

Faghani, Maral, Hamidreza Rezaee-Dehsorkh, Nassim Ravanshad, and Hamed Aminzadeh. "Ultra-Low-Power Voice Activity Detection System Using Level-Crossing Sampling." Electronics 12, no. 4 (2023): 795. http://dx.doi.org/10.3390/electronics12040795.

Abstract:
This paper presents an ultra-low-power voice activity detection (VAD) system to discriminate speech from non-speech parts of audio signals. The proposed VAD system uses level-crossing sampling for voice activity detection. The useless samples in the non-speech parts of the signal are eliminated due to the activity-dependent nature of this sampling scheme. A 40 ms moving window with a 30 ms overlap is exploited as a feature-extraction block, within which the output samples of the level-crossing analog-to-digital converter (LC-ADC) are counted as the feature. The only variable used to distinguish …
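
The feature in the entry above is simply a count of level-crossing events per window. A minimal software sketch of that idea (the quantization step and window/hop sizes below are illustrative assumptions, not the paper's hardware parameters):

```python
import numpy as np

def level_crossing_counts(signal, delta=0.05, win=320, hop=80):
    """Count level-crossing events per overlapping window.

    A level-crossing sampler emits an event whenever the signal moves
    one quantization step `delta` away from the last sampled level;
    active speech produces many events, silence almost none. Sketch
    only: window/hop here are not the paper's 40 ms / 30 ms values.
    """
    last = signal[0]
    events = np.zeros(len(signal), dtype=int)
    for i, x in enumerate(signal[1:], start=1):
        while abs(x - last) >= delta:        # one event per level step crossed
            last += delta * np.sign(x - last)
            events[i] += 1
    return [int(events[s:s + win].sum())
            for s in range(0, len(signal) - win + 1, hop)]

# Silence followed by a 0.5-amplitude tone: only the tone windows fire.
sig = np.concatenate([np.zeros(640),
                      0.5 * np.sin(2 * np.pi * 5 * np.arange(640) / 640)])
counts = level_crossing_counts(sig)
```

Thresholding these counts then gives the speech/non-speech decision.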
30

Hu, Da Li, Liang Zhong Yi, Zheng Pei, and Bing Luo. "Voice Activity Detection with Decision Trees in Noisy Environments." Applied Mechanics and Materials 128-129 (October 2011): 749–52. http://dx.doi.org/10.4028/www.scientific.net/amm.128-129.749.

Abstract:
An improved scheme based on the double-thresholds method in noisy environments is proposed for robust endpoint detection. First, the distribution of the zero-crossing rate (ZCR) of the preprocessed signal is taken into account, and the speech signal is divided into different parts to obtain appropriate thresholds with decision trees on the basis of the ZCR distribution. Finally, the double-thresholds method, weighting the energy and ZCR by their respective importance, is applied in the corresponding situation to determine whether the input segment is speech or non-speech. Simulation results …
31

Singh, Jagjeet, Lakshmi Babu Saheer, and Oliver Faust. "Speech Emotion Recognition Using Attention Model." International Journal of Environmental Research and Public Health 20, no. 6 (2023): 5140. http://dx.doi.org/10.3390/ijerph20065140.

Abstract:
Speech emotion recognition is an important research topic that can help to maintain and improve public health and contribute towards the ongoing progress of healthcare technology. There have been several advancements in the field of speech emotion recognition systems, including the use of deep learning models and new acoustic and temporal features. This paper proposes a self-attention-based deep learning model created by combining a two-dimensional Convolutional Neural Network (CNN) and a long short-term memory (LSTM) network. This research builds on the existing literature to identify …
32

Li, Qiang, Hong En Xie, and Qiu Ju Zheng. "The Voice Activity Detection Algorithm Based on Spectral Entropy and High-Order Statistics." Applied Mechanics and Materials 624 (August 2014): 495–99. http://dx.doi.org/10.4028/www.scientific.net/amm.624.495.

Abstract:
Voice activity detection is one of the key technologies of variable-rate speech coding, and the development of speech coding technology requires higher detection performance. Based on an analysis of the basic definitions and properties of spectral entropy and high-order statistics, this article proposes a voice activity detection algorithm that combines spectral entropy with high-order statistics. The algorithm can effectively detect speech and non-speech segments, and it obtains reasonable results in complex background-noise environments.
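
The spectral-entropy feature named in the entry above exploits the fact that speech spectra are peaky (low entropy) while broadband noise is flat (high entropy). A minimal sketch of that feature alone (the higher-order-statistics part of the algorithm is not reproduced here):

```python
import numpy as np

def spectral_entropy(frame, eps=1e-12):
    """Normalized spectral entropy of one frame, in [0, 1].

    Treats the power spectrum as a probability mass function and
    computes its Shannon entropy, normalized by the maximum possible
    entropy. Low values indicate a peaky (speech-like) spectrum.
    """
    spec = np.abs(np.fft.rfft(frame)) ** 2
    p = spec / (spec.sum() + eps)          # spectrum as a pmf
    h = -np.sum(p * np.log(p + eps))
    return h / np.log(len(p))              # normalize by log of bin count

# A pure tone has a peaky spectrum; white noise has a nearly flat one.
rng = np.random.default_rng(1)
t = np.arange(512) / 8000.0
tone = np.sin(2 * np.pi * 200 * t)
noise = rng.standard_normal(512)
```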
33

Alexander, N., D. Nakahara, N. Ozaki, et al. "Striatal dopamine release and metabolism in sinoaortic-denervated rats by in vivo microdialysis." American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 254, no. 2 (1988): R396—R399. http://dx.doi.org/10.1152/ajpregu.1988.254.2.r396.

Abstract:
The purpose of this study was to provide new evidence favoring the hypothesis that cardiovascular information from arterial baroreceptors is integrated with the nigrostriatal system, which contributes to the regulation of motor activity. Samples of extracellular striatal dopamine (DA) and its metabolites, dihydroxyphenylacetic acid (DOPAC) and homovanillic acid (HVA), were collected by the technique of in vivo microdialysis and analyzed by high-performance liquid chromatography with electrochemical detection. Rats were prepared with a guide tube placed in the caudate-putamen for subsequent insertion of …
34

Mihalache, Serban, and Dragos Burileanu. "Using Voice Activity Detection and Deep Neural Networks with Hybrid Speech Feature Extraction for Deceptive Speech Detection." Sensors 22, no. 3 (2022): 1228. http://dx.doi.org/10.3390/s22031228.

Abstract:
In this work, we first propose a deep neural network (DNN) system for the automatic detection of speech in audio signals, otherwise known as voice activity detection (VAD). Several DNN types were investigated, including multilayer perceptrons (MLPs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs), with the best performance obtained for the latter. Additional postprocessing techniques, i.e., hysteretic thresholding, minimum-duration filtering, and bilateral extension, were employed to boost performance. The systems were trained and tested using several …
35

Potamitis, I., and E. Fishler. "Speech activity detection of moving speaker using microphone arrays." Electronics Letters 39, no. 16 (2003): 1223. http://dx.doi.org/10.1049/el:20030726.

36

Mattys, Sven L., and Jamie H. Clark. "Lexical activity in speech processing: evidence from pause detection." Journal of Memory and Language 47, no. 3 (2002): 343–59. http://dx.doi.org/10.1016/s0749-596x(02)00037-2.

37

Mondal, Sujoy, and Abhirup Das Barman. "Speech activity detection using time-frequency auditory spectral pattern." Applied Acoustics 167 (October 2020): 107403. http://dx.doi.org/10.1016/j.apacoust.2020.107403.

38

Siddiqui, Maria Andleeb, Najmi Ghani Haider, Waseemullah Nazir, and Syed Muhammad Nabeel Mustafa. "Hybrid model for speech emotion recognition of normal and autistic children (SERNAC)." Mehran University Research Journal of Engineering and Technology 43, no. 2 (2024): 20. http://dx.doi.org/10.22581/muet1982.2779.

Abstract:
Since the last decade, autism spectrum disorder (ASD) has been used as a general term to describe a wide range of conditions, including autistic syndrome, Asperger's disorder, and pervasive developmental disability. The disorder manifests as a decreased ability to share emotions and a greater difficulty understanding others' feelings, leading to increased social communication difficulties. To assist patients with ASD, we propose a concept that incorporates speech emotion detection technologies, which are widely used in the field of human-computer interaction (particularly with youngsters). An algorithm …
39

Nadira Mohammad Yosi, Aqila Nur, Khairul Azami Sidek, Hamwira Sakti Yaacob, Marini Othman, and Ahmad Zamani Jusoh. "Emotion recognition using electroencephalogram signal." Indonesian Journal of Electrical Engineering and Computer Science 15, no. 2 (2019): 786. http://dx.doi.org/10.11591/ijeecs.v15.i2.pp786-793.

Full text
Abstract:
Emotions play an essential role in human life and are not consciously controlled. Some emotions can be easily expressed through facial expressions, speech, behavior, and gesture, but some cannot. This study investigates emotion recognition using the electroencephalogram (EEG) signal. Undoubtedly, EEG signals can detect human brain activity accurately with a high-resolution data acquisition device as compared to other biological signals. Changes in the human brain's electrical activity occur very quickly, thus a high-resolution device is required to determine the emot
APA, Harvard, Vancouver, ISO, and other styles
40

Rekik, Ouahbi, and Mustapha Djeddou. "Homogeneity Test Based Voice Activity Detection." AL-Lisaniyyat 20, no. 1 (2014): 77–85. https://doi.org/10.61850/allj.v20i1.506.

Full text
Abstract:
In this paper, a new approach for voice activity detection (VAD) is proposed. The technique is based on a homogeneity test of two autoregressive (AR) processes; each one models a speech window, and the test involves the measure of a defined distance. The homogeneity test is formulated as a hypothesis test with a threshold derived analytically according to a user-defined false-alarm probability. Results using the Aurora database show the effectiveness of the proposed technique compared to other methods and standards
APA, Harvard, Vancouver, ISO, and other styles
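To make the homogeneity-test idea in the abstract above concrete, here is a minimal Python sketch: two windows are each fitted with an AR model, and a frame is declared speech when its model departs from a noise reference by more than a threshold. The least-squares AR fit, the Euclidean coefficient distance, and the fixed threshold are all illustrative assumptions, not the paper's actual statistic or its analytically derived threshold.

```python
import numpy as np

def ar_coeffs(frame, order=4):
    """Least-squares AR(order) fit: predict x[n] from the previous `order` samples."""
    X = np.column_stack([frame[order - k - 1: len(frame) - k - 1] for k in range(order)])
    y = frame[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def homogeneity_distance(frame_a, frame_b, order=4):
    """Distance between the AR models of two windows (a stand-in for the
    paper's homogeneity-test statistic)."""
    return np.linalg.norm(ar_coeffs(frame_a, order) - ar_coeffs(frame_b, order))

def is_speech(frame, noise_ref, threshold=0.5, order=4):
    # Declare speech when the frame's AR model departs from the noise reference.
    return homogeneity_distance(frame, noise_ref, order) > threshold
```

A tonal (voiced-like) window has strongly structured AR coefficients, so its distance to a white-noise reference is large, while two noise windows stay close.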
41

Ji, Chang Peng, Mo Gao, and Jie Yang. "Voice Activity Detection Based on Multiple Statistical Models." Advanced Materials Research 181-182 (January 2011): 765–69. http://dx.doi.org/10.4028/www.scientific.net/amr.181-182.765.

Full text
Abstract:
One of the key issues in practical speech processing is achieving robust voice activity detection (VAD) against background noise. Most statistical model-based approaches employ the Gaussian assumption in the discrete Fourier transform (DFT) domain, which, however, deviates from real observations. For a class of VAD algorithms based on Gaussian and Laplacian models, we incorporate the complex Laplacian probability density function into our analysis of statistical properties. Since the statistical characteristics of the speech signal are differently affected by the no
APA, Harvard, Vancouver, ISO, and other styles
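A toy version of the statistical-model VAD described above can be written as a per-bin log-likelihood ratio between two Laplacian magnitude models, p(x|b) = exp(-|x|/b)/(2b), deciding speech when the mean ratio is positive. The scale values and the hard decision rule are illustrative assumptions; the paper's actual multiple-model formulation is more elaborate.

```python
import numpy as np

def laplacian_llr(dft_coeffs, noise_scale, speech_scale):
    """Mean per-bin log-likelihood ratio under Laplacian magnitude models:
    log p(x|speech) - log p(x|noise) = |x|*(1/b_n - 1/b_s) + log(b_n/b_s)."""
    x = np.abs(dft_coeffs)
    llr = x * (1.0 / noise_scale - 1.0 / speech_scale) + np.log(noise_scale / speech_scale)
    return llr.mean()

def is_speech(dft_coeffs, noise_scale=1.0, speech_scale=4.0):
    # Frame-level decision: speech wins when the average LLR is positive.
    return laplacian_llr(dft_coeffs, noise_scale, speech_scale) > 0.0
```

Large DFT magnitudes favor the wide (speech) Laplacian; small magnitudes favor the narrow (noise) one.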
42

Zhan, Wen Lian, and Jing Fang Wang. "Voice Activity Detection Based on Nonlinear Processing Techniques." Applied Mechanics and Materials 198-199 (September 2012): 1560–66. http://dx.doi.org/10.4028/www.scientific.net/amm.198-199.1560.

Full text
Abstract:
The Hilbert-Huang transform is a complete local time-frequency method developed in recent years for the analysis of nonlinear, non-stationary signals; the recurrence plot is a reconstruction method for the recursive nonlinear dynamic behavior of a time series. This paper combines Hilbert-Huang transform empirical mode decomposition (EMD) with the recurrence plot (RP) method into a new voice activity detection algorithm. Firstly, speech and noise are decomposed by EMD into intrinsic mode functions (IMFs) with multi-scale features on different time scales, on which time-scale filtering and nonlin
APA, Harvard, Vancouver, ISO, and other styles
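The recurrence-plot side of the pipeline above can be illustrated with a recurrence-rate feature: delay-embed the signal and count the fraction of state-space pairs closer than a radius eps. Periodic (voiced) signals recur far more often than noise. The embedding dimension, delay, and radius below are illustrative assumptions, and this sketch omits the EMD stage entirely.

```python
import numpy as np

def recurrence_rate(series, dim=3, tau=2, eps=0.5):
    """Recurrence rate of a delay-embedded series: fraction of point pairs
    in the reconstructed state space closer than eps."""
    n = len(series) - (dim - 1) * tau
    # Delay embedding: rows are points (x[t], x[t+tau], ..., x[t+(dim-1)*tau]).
    emb = np.column_stack([series[i * tau: i * tau + n] for i in range(dim)])
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    return float((d < eps).mean())
```

A sinusoid traces a closed curve in the embedding space, so many pairs fall within eps; white noise scatters its points and recurs rarely.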
43

Zhang, Yan, Zhen-min Tang, Yan-ping Li, and Yang Luo. "A Hierarchical Framework Approach for Voice Activity Detection and Speech Enhancement." Scientific World Journal 2014 (2014): 1–8. http://dx.doi.org/10.1155/2014/723643.

Full text
Abstract:
Accurate and effective voice activity detection (VAD) is a fundamental step for robust speech or speaker recognition. In this study, we propose a hierarchical framework approach for VAD and speech enhancement. The modified Wiener filter (MWF) approach is utilized for noise reduction in the speech enhancement block. In the feature selection and voting block, several discriminating features are employed in a voting paradigm for reliability and discriminative power. The effectiveness of the proposed approach is compared and evaluated against other VAD techniques using two well-k
APA, Harvard, Vancouver, ISO, and other styles
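For readers unfamiliar with the Wiener-filter step mentioned above, a bare-bones per-bin gain can be computed as G = SNR/(1 + SNR) with a simple maximum-likelihood a-priori SNR estimate. This is a generic sketch with an assumed gain floor, not the paper's modified Wiener filter (MWF).

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=1e-3):
    """Per-bin Wiener gain from noisy and noise power spectra.
    `floor` keeps gains strictly positive to limit musical noise."""
    # ML a-priori SNR estimate: (noisy/noise - 1), clipped at zero.
    snr = np.maximum(noisy_psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    gain = snr / (1.0 + snr)
    return np.maximum(gain, floor)
```

High-SNR bins get gains near one and are passed through; noise-only bins are attenuated down to the floor.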
44

Colmenarez, Antonio, and Andreas Kellner. "Speech activity detection using acoustic and facial characteristics in an automatic speech recognition system." Journal of the Acoustical Society of America 124, no. 6 (2008): 3373. http://dx.doi.org/10.1121/1.3047458.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Park, Yun-sik, and Sang-min Lee. "Speech enhancement through voice activity detection using speech absence probability based on Teager energy." Journal of Central South University 20, no. 2 (2013): 424–32. http://dx.doi.org/10.1007/s11771-013-1503-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
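The Teager energy underlying the speech-absence-probability method cited above has a simple discrete form, Psi[x](n) = x(n)^2 - x(n-1)*x(n+1), which for a pure tone A*sin(w*n) equals A^2*sin(w)^2 exactly, tracking both amplitude and frequency. The helper below is a generic sketch of the operator, not the paper's full estimator.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1).
    Returns len(x) - 2 values (the two boundary samples are dropped)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```

Because the operator rises with both amplitude and frequency, voiced frames stand out against low-level noise, which is what makes it attractive for speech-absence probability estimation.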
46

Taubakabyl, N. M. "CONVOLUTIONAL NEURAL NETWORKS IN DETECTING SPEECH ACTIVITY IN A STREAM." Bulletin of Shakarim University. Technical Sciences 1, no. 4(16) (2024): 33–40. https://doi.org/10.53360/2788-7995-2024-4(16)-5.

Full text
Abstract:
The research presented in this article focuses on the development of a system for detecting speech activity in audio streams using convolutional neural networks (CNNs). Speech activity detection plays a crucial role in many modern applications, such as voice-activated assistants, real-time communication platforms, and automated transcription services. The study synthesizes findings from nine key studies, demonstrating the effectiveness of CNNs in handling complex audio data, isolating speech signals from noise, and improving overall detection accuracy. The research emphasizes the architectural
APA, Harvard, Vancouver, ISO, and other styles
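The core building block of a CNN-based streaming VAD such as the one surveyed above is a 1-D convolution over the feature sequence. Here is a dependency-free NumPy sketch of one such layer (the layer shape and activation are generic assumptions, not the article's architecture):

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution over a feature sequence followed by ReLU:
    out[c, t] = relu(sum_k kernels[c, k] * x[t + k] + bias[c])."""
    k = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, k)  # (T-k+1, k)
    return np.maximum(windows @ kernels.T + bias, 0.0).T       # (C, T-k+1)
```

A real streaming VAD would stack several such layers and end with a per-frame sigmoid; frameworks like PyTorch provide the same operation as `torch.nn.Conv1d`.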
47

Venkateswarlu, Dr S. China. "Speech Emotion Recognition using Machine Learning." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 09, no. 05 (2025): 1–9. https://doi.org/10.55041/ijsrem48705.

Full text
Abstract:
Speech signals are considered the most effective means of communication between human beings. Many researchers have developed methods and systems to identify emotions from speech signals. Here, various features of speech are used to classify emotions. Features like pitch, tone, and intensity are essential for classification. A large number of datasets are available for speech emotion recognition. Firstly, features are extracted from the emotional speech, and then the other important part is the classification of emotions based on the speech. Hence, different classif
APA, Harvard, Vancouver, ISO, and other styles
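Of the features the abstract above names, pitch is the most algorithmically involved; a common baseline is autocorrelation peak-picking. The helper below is a hypothetical illustration of that baseline (the frame size, search band, and function name are assumptions); production SER front-ends typically use more robust trackers such as YIN.

```python
import numpy as np

def estimate_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Autocorrelation pitch estimate (Hz) for a voiced frame:
    pick the lag with the strongest autocorrelation in [sr/fmax, sr/fmin]."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag
```

For a clean 200 Hz tone sampled at 16 kHz, the strongest in-band lag is one period (80 samples), giving 200 Hz back.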
48

Yeh, Yu-Tseng, Chia-Chi Chang, and Jeih-Weih Hung. "Empirical Analysis of Learning Improvements in Personal Voice Activity Detection Frameworks." Electronics 14, no. 12 (2025): 2372. https://doi.org/10.3390/electronics14122372.

Full text
Abstract:
Personal Voice Activity Detection (PVAD) has emerged as a critical technology for enabling speaker-specific detection in multi-speaker environments, surpassing the limitations of conventional Voice Activity Detection (VAD) systems that merely distinguish speech from non-speech. PVAD systems are essential for applications such as personalized voice assistants and robust speech recognition, where accurately identifying a target speaker’s voice amidst background speech and noise is crucial for both user experience and computational efficiency. Despite significant progress, PVAD frameworks still f
APA, Harvard, Vancouver, ISO, and other styles
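The PVAD idea summarized above, detecting only the target speaker's speech, reduces in its simplest form to gating a generic VAD score with a speaker-similarity score. The rule below is a toy sketch: the embeddings stand in for d-vectors, and both thresholds are assumed values, not anything from the paper's frameworks.

```python
import numpy as np

def pvad_decision(frame_emb, target_emb, vad_prob, sim_thresh=0.7, vad_thresh=0.5):
    """Toy PVAD rule: the frame counts as target speech only if the generic
    VAD fires AND the frame embedding matches the enrolled speaker embedding
    (cosine similarity). Embeddings are hypothetical d-vector stand-ins."""
    cos = frame_emb @ target_emb / (np.linalg.norm(frame_emb) * np.linalg.norm(target_emb))
    return bool(vad_prob > vad_thresh and cos > sim_thresh)
```

Real PVAD systems learn this fusion jointly rather than thresholding two independent scores, which is one of the learning improvements the paper analyzes.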
49

Puri, Tanvi, Mukesh Soni, Gaurav Dhiman, Osamah Ibrahim Khalaf, Malik alazzam, and Ihtiram Raza Khan. "Detection of Emotion of Speech for RAVDESS Audio Using Hybrid Convolution Neural Network." Journal of Healthcare Engineering 2022 (February 27, 2022): 1–9. http://dx.doi.org/10.1155/2022/8472947.

Full text
Abstract:
Every human being has emotions about the things related to them. For every customer, those emotions can help a customer representative understand their requirements, so speech emotion recognition plays an important role in human interaction. An intelligent system can help improve this performance, for which we design a convolutional neural network (CNN) that can classify emotions into different categories, such as positive, negative, or more specific ones. In this paper, we use the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) audio records. The L
APA, Harvard, Vancouver, ISO, and other styles
50

Ting, Liu, and Luo Xinwei. "An improved voice activity detection method based on spectral features and neural network." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 263, no. 2 (2021): 4570–80. http://dx.doi.org/10.3397/in-2021-2747.

Full text
Abstract:
The recognition accuracy of speech and noise signals degrades greatly at low signal-to-noise ratios. A neural network with parameters obtained from a training set can achieve good results on the existing data but performs poorly on samples with different environmental noises. The proposed method first extracts features based on the physical characteristics of the speech signal, which are robust. It takes 3-second segments as samples, judges whether there is a speech component in the data at low signal-to-noise ratios, and gives a decision tag for each segment. If a reason
APA, Harvard, Vancouver, ISO, and other styles
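A representative example of the robust spectral features the abstract above refers to is spectral flatness: the ratio of the geometric to the arithmetic mean of the power spectrum, near one for noise-like frames and near zero for tonal or voiced ones. This is a generic feature sketch, not necessarily one of the paper's chosen features.

```python
import numpy as np

def spectral_flatness(frame, eps=1e-12):
    """Spectral flatness of a frame: geometric mean / arithmetic mean of the
    power spectrum. `eps` guards the log against empty bins."""
    psd = np.abs(np.fft.rfft(frame)) ** 2 + eps
    return float(np.exp(np.mean(np.log(psd))) / np.mean(psd))
```

A single-tone frame concentrates its power in one bin, driving the geometric mean (and thus the flatness) toward zero, while white noise keeps the two means comparable.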