
Journal articles on the topic 'Speech Activity Detection (SAD)'

Consult the top 50 journal articles for your research on the topic 'Speech Activity Detection (SAD).'


1

Kaur, Sukhvinder, and J. S. Sohal. "Speech Activity Detection and its Evaluation in Speaker Diarization System." International Journal of Computers & Technology 16, no. 1 (March 13, 2017): 7567–72. http://dx.doi.org/10.24297/ijct.v16i1.5893.

Abstract:
In speaker diarization, speech/voice activity detection is performed to separate speech, non-speech, and silent frames. The zero crossing rate and root mean square value of the frames of audio clips have been used to select training data for the silent, speech, and non-speech models. The trained models are used by two classifiers, a Gaussian mixture model (GMM) and an artificial neural network (ANN), to classify the speech and non-speech frames of an audio clip. The results of the ANN and GMM classifiers are compared using receiver operating characteristic (ROC) curves and detection error tradeoff (DET) graphs. It is concluded that the neural-network-based SAD performs comparatively better than the Gaussian-mixture-model-based SAD.
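As a rough illustration of the frame features this abstract describes (not the authors' implementation; the frame length and hop size are assumed values), the per-frame zero crossing rate and RMS energy can be computed as:

```python
import math

def frame_features(signal, frame_len=160, hop=80):
    """Per-frame zero crossing rate (ZCR) and root-mean-square (RMS)
    energy, the two features used to bootstrap training data for the
    silence / speech / non-speech models."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # ZCR: fraction of adjacent sample pairs that change sign.
        zcr = sum(1 for a, b in zip(frame, frame[1:])
                  if (a >= 0) != (b >= 0)) / (frame_len - 1)
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        feats.append((zcr, rms))
    return feats
```

Roughly, silent frames show low RMS, voiced speech shows high RMS with low ZCR, and noise-like frames show high ZCR; the abstract's GMM and ANN classifiers are then trained on data selected this way.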
2

Dutta, Satwik, Prasanna Kothalkar, Johanna Rudolph, Christine Dollaghan, Jennifer McGlothlin, Thomas Campbell, and John H. Hansen. "Advancing speech activity detection for automatic speech assessment of pre-school children prompted speech using COMBO-SAD." Journal of the Acoustical Society of America 148, no. 4 (October 2020): 2469–67. http://dx.doi.org/10.1121/1.5146831.

3

Mahalakshmi, P. "A Review on Voice Activity Detection and Mel-Frequency Cepstral Coefficients for Speaker Recognition (Trend Analysis)." Asian Journal of Pharmaceutical and Clinical Research 9, no. 9 (December 1, 2016): 360. http://dx.doi.org/10.22159/ajpcr.2016.v9s3.14352.

Abstract:
Objective: The objective of this review article is to give a complete review of the various techniques that have been used for speech recognition purposes over two decades.
Methods: Voice activity detection (VAD) and speech activity detection (SAD) techniques, which distinguish voiced from unvoiced signals, are discussed, along with the mel-frequency cepstral coefficient (MFCC) technique, which detects specific features.
Results: The review shows that research on MFCC has been dominant in signal processing in comparison to VAD and other existing techniques.
Conclusion: Speaker recognition techniques used previously and those in current research are compared, and the better technique is identified through a review of the literature spanning more than two decades.
Keywords: Cepstral analysis, Mel-frequency cepstral coefficients, signal processing, speaker recognition, voice activity detection.
4

Zhao, Hui, Yu Tai Wang, and Xing Hai Yang. "Emotion Detection System Based on Speech and Facial Signals." Advanced Materials Research 459 (January 2012): 483–87. http://dx.doi.org/10.4028/www.scientific.net/amr.459.483.

Abstract:
This paper introduces the present status of speech emotion detection. In order to improve on the emotion recognition rate of a single modality, a bimodal fusion method based on speech and facial expression is proposed. First, we establish an emotional database comprising speech and facial expressions. For the emotions calm, happy, surprise, anger, and sad, we extract ten speech parameters and use the PCA method to detect the speech emotion. We then analyze bimodal emotion detection that fuses facial expression information. The experimental results show that the emotion recognition rate with bimodal fusion is about 6 percentage points higher than the recognition rate with speech prosodic features alone.
5

Gelly, Gregory, and Jean-Luc Gauvain. "Optimization of RNN-Based Speech Activity Detection." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, no. 3 (March 2018): 646–56. http://dx.doi.org/10.1109/taslp.2017.2769220.

6

Koh, Min‐sung, and Margaret Mortz. "Improved voice activity detection of noisy speech." Journal of the Acoustical Society of America 107, no. 5 (May 2000): 2907–8. http://dx.doi.org/10.1121/1.428823.

7

Quan, Changqin, Bin Zhang, Xiao Sun, and Fuji Ren. "A combined cepstral distance method for emotional speech recognition." International Journal of Advanced Robotic Systems 14, no. 4 (July 1, 2017): 172988141771983. http://dx.doi.org/10.1177/1729881417719836.

Abstract:
Affective computing is not only a direction of reform in artificial intelligence but also an exemplification of advanced intelligent machines. Emotion is the biggest difference between human and machine; if a machine behaves with emotion, it will be accepted by more people. Voice is the most natural manner of daily communication and the most easily understood and accepted. The recognition of emotional voice is an important field of artificial intelligence. However, in the recognition of emotions, two emotions are often particularly vulnerable to confusion. This article presents a combined cepstral distance method in two-group multi-class emotion classification for emotional speech recognition. Cepstral distance combined with speech energy is widely used for speech-signal endpoint detection in speech recognition. In this work, the cepstral distance is used to measure the similarity between frames in emotional signals and in neutral signals. These features are input to a directed acyclic graph support vector machine for classification. Finally, a two-group classification strategy is adopted to resolve confusion in multi-emotion recognition. In the experiments, a Chinese Mandarin emotion database is used, and a large training set (1134 + 378 utterances) ensures a powerful modelling capability for predicting emotion. The experimental results show that the cepstral distance increases the recognition rate of the emotion "sad" and can balance the recognition results while eliminating overfitting. For the German-language Berlin emotional speech database, the recognition rate between "sad" and "boring", which are very difficult to distinguish, is up to 95.45%.
8

Dash, Debadatta, Paul Ferrari, Satwik Dutta, and Jun Wang. "NeuroVAD: Real-Time Voice Activity Detection from Non-Invasive Neuromagnetic Signals." Sensors 20, no. 8 (April 16, 2020): 2248. http://dx.doi.org/10.3390/s20082248.

Abstract:
Neural speech decoding-driven brain-computer interfaces (BCIs), or speech-BCIs, are a novel paradigm for exploring communication restoration for locked-in (fully paralyzed but aware) patients. Speech-BCIs aim to map a direct transformation from neural signals to text or speech, which has the potential for a higher communication rate than current BCIs. Although recent progress has demonstrated the potential of speech-BCIs using either invasive or non-invasive neural signals, the majority of the systems developed so far still assume knowledge of the onset and offset of the speech utterances within the continuous neural recordings. This lack of real-time voice/speech activity detection (VAD) is an obstacle for future applications of neural speech decoding in which BCI users can hold a continuous conversation with other speakers. To address this issue, in this study we attempted to detect voice/speech activity automatically and directly from neural signals recorded using magnetoencephalography (MEG). First, we classified whole segments of pre-speech, speech, and post-speech in the neural signals using a support vector machine (SVM). Second, for continuous prediction, we used a long short-term memory recurrent neural network (LSTM-RNN) to decode the voice activity at each time point efficiently via its sequential pattern-learning mechanism. Experimental results demonstrated the possibility of real-time VAD directly from non-invasive neural signals with about 88% accuracy.
9

Mattys, Sven L., and Jamie H. Clark. "Lexical activity in speech processing: evidence from pause detection." Journal of Memory and Language 47, no. 3 (October 2002): 343–59. http://dx.doi.org/10.1016/s0749-596x(02)00037-2.

10

Potamitis, I., and E. Fishler. "Speech activity detection of moving speaker using microphone arrays." Electronics Letters 39, no. 16 (2003): 1223. http://dx.doi.org/10.1049/el:20030726.

11

Mondal, Sujoy, and Abhirup Das Barman. "Speech activity detection using time-frequency auditory spectral pattern." Applied Acoustics 167 (October 2020): 107403. http://dx.doi.org/10.1016/j.apacoust.2020.107403.

12

Hu, Da Li, Liang Zhong Yi, Zheng Pei, and Bing Luo. "Voice Activity Detection with Decision Trees in Noisy Environments." Applied Mechanics and Materials 128-129 (October 2011): 749–52. http://dx.doi.org/10.4028/www.scientific.net/amm.128-129.749.

Abstract:
An improved scheme based on the double-threshold method is proposed for robust endpoint detection in noisy environments. First, the distribution of the zero crossing rate (ZCR) of the preprocessed signal is taken into account, and the speech signal is divided into different parts to obtain appropriate thresholds with decision trees on the basis of the ZCR distribution. Finally, the double-threshold method, weighting the energy and ZCR according to their importance in the corresponding situation, determines whether the input segment is speech or non-speech. Simulation results indicate that the proposed method with decision trees obtains more accurate results than the traditional double-threshold method.
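A minimal version of the double-threshold idea can be sketched as follows (energy-only: the ZCR weighting and decision-tree threshold selection described in the abstract are omitted, and all threshold values are assumptions):

```python
def double_threshold_endpoints(energies, t_low, t_high, min_len=3):
    """Classic double-threshold endpoint detection (sketch): a frame
    whose energy exceeds t_high seeds a speech segment, which is then
    extended outward in both directions while frames stay above t_low.
    Returns a list of (start, end) frame-index pairs (inclusive)."""
    segments = []
    i, n = 0, len(energies)
    while i < n:
        if energies[i] > t_high:
            start = i
            while start > 0 and energies[start - 1] > t_low:
                start -= 1
            end = i
            while end + 1 < n and energies[end + 1] > t_low:
                end += 1
            # Drop segments shorter than min_len and avoid overlaps.
            if end - start + 1 >= min_len and (not segments or start > segments[-1][1]):
                segments.append((start, end))
            i = end + 1
        else:
            i += 1
    return segments
```

The high threshold keeps noise bursts from seeding false segments, while the low threshold recovers the weak frames at speech onsets and offsets.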
13

Colmenarez, Antonio, and Andreas Kellner. "Speech activity detection using acoustic and facial characteristics in an automatic speech recognition system." Journal of the Acoustical Society of America 124, no. 6 (2008): 3373. http://dx.doi.org/10.1121/1.3047458.

14

Park, Yun-sik, and Sang-min Lee. "Speech enhancement through voice activity detection using speech absence probability based on Teager energy." Journal of Central South University 20, no. 2 (February 2013): 424–32. http://dx.doi.org/10.1007/s11771-013-1503-1.

15

Centanni, T. M., A. M. Sloan, A. C. Reed, C. T. Engineer, R. L. Rennaker, and M. P. Kilgard. "Detection and identification of speech sounds using cortical activity patterns." Neuroscience 258 (January 2014): 292–306. http://dx.doi.org/10.1016/j.neuroscience.2013.11.030.

16

Bellur, Ashwin, and Mounya Elhilali. "Feedback-Driven Sensory Mapping Adaptation for Robust Speech Activity Detection." IEEE/ACM Transactions on Audio, Speech, and Language Processing 25, no. 3 (March 2017): 481–92. http://dx.doi.org/10.1109/taslp.2016.2639322.

17

Ramírez, Javier, José C. Segura, Carmen Benítez, Ángel de la Torre, and Antonio Rubio. "Efficient voice activity detection algorithms using long-term speech information." Speech Communication 42, no. 3-4 (April 2004): 271–87. http://dx.doi.org/10.1016/j.specom.2003.10.002.

18

Benyassine, Adil. "Usage of voice activity detection for efficient coding of speech." Journal of the Acoustical Society of America 104, no. 1 (July 1998): 29. http://dx.doi.org/10.1121/1.424041.

19

Jyothish Lal, G., E. A. Gopalakrishnan, and D. Govind. "Glottal Activity Detection from the Speech Signal Using Multifractal Analysis." Circuits, Systems, and Signal Processing 39, no. 4 (September 13, 2019): 2118–50. http://dx.doi.org/10.1007/s00034-019-01253-4.

20

Ji, Chang Peng, Mo Gao, and Jie Yang. "Voice Activity Detection Based on Multiple Statistical Models." Advanced Materials Research 181-182 (January 2011): 765–69. http://dx.doi.org/10.4028/www.scientific.net/amr.181-182.765.

Abstract:
One of the key issues in practical speech processing is achieving robust voice activity detection (VAD) against background noise. Most statistical model-based approaches employ a Gaussian assumption in the discrete Fourier transform (DFT) domain, which, however, deviates from real observations. For a class of VAD algorithms based on Gaussian and Laplacian models, we incorporate a complex Laplacian probability density function into our analysis of statistical properties. Since the statistical characteristics of the speech signal are affected differently by noise types and levels, to cope with time-varying environments our approach aims to find an appropriate statistical model adaptively, in an online fashion. The performance of the proposed VAD approaches in stationary noise environments is evaluated with the aid of an objective measure.
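For the Gaussian half of the model class discussed in this abstract, a common concrete form is a per-bin likelihood ratio test in the DFT domain (Sohn-style). The sketch below assumes the noise and speech spectral variances are already known, whereas a real detector must estimate and adapt them online:

```python
import math

def log_likelihood_ratio(power_spec, noise_var, speech_var):
    """Average per-bin log-likelihood ratio for a Gaussian DFT-domain
    VAD (sketch). For each bin k:
        gamma_k = |X_k|^2 / lambda_N  (a posteriori SNR)
        xi_k    = lambda_S / lambda_N (a priori SNR)
        log Lambda_k = gamma_k * xi_k / (1 + xi_k) - log(1 + xi_k)"""
    total = 0.0
    for p, ln, ls in zip(power_spec, noise_var, speech_var):
        gamma = p / ln
        xi = ls / ln
        total += gamma * xi / (1.0 + xi) - math.log(1.0 + xi)
    return total / len(power_spec)

def is_speech(power_spec, noise_var, speech_var, eta=0.0):
    """Decide 'speech' when the average log ratio exceeds threshold eta."""
    return log_likelihood_ratio(power_spec, noise_var, speech_var) > eta
```

Swapping the Gaussian densities for complex Laplacian ones, as the abstract proposes, changes the per-bin ratio formula but leaves this decision structure intact.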
21

PARK, Yun-Sik, and Sangmin LEE. "Voice Activity Detection Using Global Speech Absence Probability Based on Teager Energy for Speech Enhancement." IEICE Transactions on Information and Systems E95.D, no. 10 (2012): 2568–71. http://dx.doi.org/10.1587/transinf.e95.d.2568.

22

Veisi, H., and H. Sameti. "Hidden-Markov-model-based voice activity detector with high speech detection rate for speech enhancement." IET Signal Processing 6, no. 1 (2012): 54. http://dx.doi.org/10.1049/iet-spr.2010.0282.

23

Zhan, Wen Lian, and Jing Fang Wang. "Voice Activity Detection Based on Nonlinear Processing Techniques." Applied Mechanics and Materials 198-199 (September 2012): 1560–66. http://dx.doi.org/10.4028/www.scientific.net/amm.198-199.1560.

Abstract:
The Hilbert-Huang transform is a complete local time-frequency method developed in recent years for analyzing nonlinear, non-stationary signals, and the recurrence plot is a method for reconstructing the nonlinear dynamic behavior of a time series. This paper proposes a new voice activity detection algorithm combining the empirical mode decomposition (EMD) of the Hilbert-Huang transform with the recurrence plot (RP) method. First, exploiting the different multi-scale features of speech and noise under EMD, the intrinsic mode functions (IMFs) are filtered on a time scale; the nonlinear dynamic behavior is then characterized with the recurrence plot method, and recurrence quantification analysis of the statistical uncertainty is used for endpoint detection. Simulation results show that the method has strong capabilities for non-stationary dynamic analysis and, in low-SNR environments, extracts the start and end points of the speech signal more accurately and more robustly than traditional methods.
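The recurrence-plot ingredient can be illustrated with a scalar recurrence matrix; full recurrence quantification analysis uses delay-embedded state vectors and measures diagonal-line structure, so this is a simplified sketch rather than the paper's method:

```python
def recurrence_matrix(series, eps):
    """Binary recurrence plot of a 1-D series (sketch):
    R[i][j] = 1 when |x_i - x_j| < eps. Quasi-periodic (voiced) signals
    produce diagonal line structures; noise produces scattered points."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) < eps else 0
             for j in range(n)] for i in range(n)]

def recurrence_rate(R):
    """Fraction of recurrent points, the most basic recurrence
    quantification measure usable for an endpoint-detection statistic."""
    n = len(R)
    return sum(sum(row) for row in R) / (n * n)
```

In a detector along the abstract's lines, such statistics would be computed per analysis window over a selected IMF rather than over the raw waveform.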
24

Alexander, N., D. Nakahara, N. Ozaki, N. Kaneda, T. Sasaoka, N. Iwata, and T. Nagatsu. "Striatal dopamine release and metabolism in sinoaortic-denervated rats by in vivo microdialysis." American Journal of Physiology-Regulatory, Integrative and Comparative Physiology 254, no. 2 (February 1, 1988): R396—R399. http://dx.doi.org/10.1152/ajpregu.1988.254.2.r396.

Abstract:
The purpose of this study was to provide new evidence favoring the hypothesis that cardiovascular information from arterial baroreceptors is integrated with the nigrostriatal system that contributes to regulation of motor activity. Samples of extracellular striatal dopamine (DA) and its metabolites, dihydroxyphenylacetic acid (DOPAC) and homovanillic acid (HVA), were collected by the technique of in vivo microdialysis and analyzed by high-performance liquid chromatography-electron capture detection. Rats were prepared with a guide tube placed in the caudate-putamen for subsequent insertion of microdialysis probes. During the 1st wk after sinoaortic denervation (SAD) or sham operation (SO), a microdialysis probe was inserted and perfused with Ringer solution at the rate of 2 microliter/min in the freely moving rats. Samples were collected every 20 min before and after injection of pargyline, 100 mg/kg ip. The results showed that SAD rats have approximately 50% less extracellular striatal DA, DOPAC, and HVA than SO rats (P less than 0.01). After blockade of monoamine oxidase activity with pargyline, striatal DA accumulated three times faster in SO than SAD rats suggesting DA synthesis is reduced in SAD rats. These data provide further evidence that the arterial baroreceptor system affects dopaminergic metabolism in the nigrostriatal system possibly as a means for integration of cardiovascular and motor activity.
25

Zhang, Yan, Zhen-min Tang, Yan-ping Li, and Yang Luo. "A Hierarchical Framework Approach for Voice Activity Detection and Speech Enhancement." Scientific World Journal 2014 (2014): 1–8. http://dx.doi.org/10.1155/2014/723643.

Abstract:
Accurate and effective voice activity detection (VAD) is a fundamental step for robust speech or speaker recognition. In this study, we propose a hierarchical framework approach for VAD and speech enhancement. A modified Wiener filter (MWF) approach is utilized for noise reduction in the speech enhancement block. For the feature selection and voting block, several discriminating features are employed in a voting paradigm chosen for reliability and discriminative power. The effectiveness of the proposed approach is compared and evaluated against other VAD techniques using two well-known databases, the TIMIT and NOISEX-92 databases. Experimental results show that the proposed method performs well under a variety of noisy conditions.
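For context on the noise-reduction block, the textbook (unmodified) Wiener suppression rule is sketched below; the paper's modified Wiener filter differs in details the abstract does not give, and the gain floor here is an assumed value:

```python
def wiener_gains(power_spec, noise_est, gain_floor=0.1):
    """Textbook Wiener suppression rule (sketch): estimate the a priori
    SNR xi per bin by spectral subtraction, then apply the gain
    G = xi / (1 + xi), clamped to a floor to limit musical noise."""
    gains = []
    for p, ln in zip(power_spec, noise_est):
        xi = max(p / ln - 1.0, 0.0)  # crude a priori SNR estimate
        gains.append(max(xi / (1.0 + xi), gain_floor))
    return gains

def enhance(power_spec, noise_est):
    """Apply the squared gains to the noisy power spectrum."""
    g = wiener_gains(power_spec, noise_est)
    return [gi * gi * p for gi, p in zip(g, power_spec)]
```

Practical systems smooth the SNR estimate across frames (e.g. decision-directed updating) instead of using the instantaneous estimate shown here.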
26

Li, Qiang, Hong En Xie, and Qiu Ju Zheng. "The Voice Activity Detection Algorithm Based on Spectral Entropy and High-Order Statistics." Applied Mechanics and Materials 624 (August 2014): 495–99. http://dx.doi.org/10.4028/www.scientific.net/amm.624.495.

Abstract:
Voice activity detection is one of the key technologies of variable-rate speech coding, and the development of speech coding technology demands ever higher detection performance. Based on an analysis of the basic definitions and properties of spectral entropy and high-order statistics, this article proposes a voice activity detection algorithm that combines the two. The algorithm effectively detects speech and non-speech segments and obtains reasonable results in complex background-noise environments.
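The spectral-entropy half of the feature pair can be sketched as follows (a naive O(N²) DFT is used so the example stays self-contained; the high-order-statistics features are not shown):

```python
import cmath
import math

def spectral_entropy(frame):
    """Normalized spectral entropy of one frame (sketch): naive DFT,
    power spectrum normalized into a probability mass, then Shannon
    entropy divided by log(N/2). Near 1 for flat (noise-like) spectra,
    much lower for peaky voiced spectra. Assumes len(frame) >= 4."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power) or 1.0
    probs = [p / total for p in power]
    ent = -sum(p * math.log(p) for p in probs if p > 0)
    return ent / math.log(half)
```

A detector along the abstract's lines would threshold this entropy per frame, using the high-order statistics as a complementary cue in non-Gaussian noise.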
27

Hoffman, M. W., Z. Li, and D. Khataniar. "GSC-based spatial voice activity detection for enhanced speech coding in the presence of competing speech." IEEE Transactions on Speech and Audio Processing 9, no. 2 (2001): 175–78. http://dx.doi.org/10.1109/89.902284.

28

Nadira Mohammad Yosi, Aqila Nur, Khairul Azami Sidek, Hamwira Sakti Yaacob, Marini Othman, and Ahmad Zamani Jusoh. "Emotion recognition using electroencephalogram signal." Indonesian Journal of Electrical Engineering and Computer Science 15, no. 2 (August 1, 2019): 786. http://dx.doi.org/10.11591/ijeecs.v15.i2.pp786-793.

Abstract:
Emotions play an essential role in human life and are not consciously controlled. Some emotions can be easily expressed through facial expressions, speech, behavior, and gesture, but some cannot. This study investigates emotion recognition using the electroencephalogram (EEG) signal. EEG signals can detect human brain activity accurately with a high-resolution data acquisition device, as compared to other biological signals. Changes in the brain's electrical activity occur very quickly, so a high-resolution device is required to determine the emotion precisely. In this study, we demonstrate the strength and reliability of EEG signals as an emotion recognition mechanism for four different emotions: happy, sad, fear, and calm. Data from six subjects were collected using a BrainMarker EXG device consisting of 19 channels. The pre-processing stage used a second-order low-pass Butterworth filter to remove unwanted signals. Two frequency bands, alpha and beta, were then extracted from the signals. Finally, the samples were classified using an MLP neural network. Classification accuracy of up to 91% is achieved, and the average accuracies for calm, fear, happy, and sad are 83.5%, 87.3%, 85.83%, and 87.6%, respectively. As a proof of concept, this study proposes a system for recognizing the four emotional states of happy, sad, fear, and calm using the EEG signal.
29

Tachioka, Yuuki, Toshiyuki Hanazawa, Tomohiro Narita, and Jun Ishii. "Voice Activity Detection Using Density Ratio Estimation of Speech and Noise." IEEJ Transactions on Electronics, Information and Systems 133, no. 8 (2013): 1549–55. http://dx.doi.org/10.1541/ieejeiss.133.1549.

30

Kaushik, Lakshmish, Abhijeet Sangwan, and John H. L. Hansen. "Speech Activity Detection in Naturalistic Audio Environments: Fearless Steps Apollo Corpus." IEEE Signal Processing Letters 25, no. 9 (September 2018): 1290–94. http://dx.doi.org/10.1109/lsp.2018.2841653.

31

Sadjadi, Seyed Omid, and John H. L. Hansen. "Unsupervised Speech Activity Detection Using Voicing Measures and Perceptual Spectral Flux." IEEE Signal Processing Letters 20, no. 3 (March 2013): 197–200. http://dx.doi.org/10.1109/lsp.2013.2237903.

32

Song, Taeyup, Kyungsun Lee, Sung Soo Kim, Jae-Won Lee, and Hanseok Ko. "Visual Voice Activity Detection and Adaptive Threshold Estimation for Speech Recognition." Journal of the Acoustical Society of Korea 34, no. 4 (July 31, 2015): 321–27. http://dx.doi.org/10.7776/ask.2015.34.4.321.

33

Carlin, Michael A., and Mounya Elhilali. "A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields." IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, no. 12 (December 2015): 2422–33. http://dx.doi.org/10.1109/taslp.2015.2481179.

34

Liu, Qingju, Andrew J. Aubrey, and Wenwu Wang. "Interference Reduction in Reverberant Speech Separation With Visual Voice Activity Detection." IEEE Transactions on Multimedia 16, no. 6 (October 2014): 1610–23. http://dx.doi.org/10.1109/tmm.2014.2322824.

35

Pannala, Vishala, and B. Yegnanarayana. "A neural network approach for speech activity detection for Apollo corpus." Computer Speech & Language 65 (January 2021): 101137. http://dx.doi.org/10.1016/j.csl.2020.101137.

36

Ting, Liu, and Luo Xinwei. "An improved voice activity detection method based on spectral features and neural network." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 263, no. 2 (August 1, 2021): 4570–80. http://dx.doi.org/10.3397/in-2021-2747.

Abstract:
The recognition accuracy for speech and noise signals suffers greatly at low signal-to-noise ratios. A neural network with parameters obtained from a training set can achieve good results on existing data but performs poorly on samples with different environmental noises. The proposed method first extracts features based on the physical characteristics of the speech signal, which are highly robust. It takes 3-second segments as samples, judges whether the data contain a speech component under low signal-to-noise ratios, and assigns a decision tag; if a trajectory resembling that of speech is found, the 3-second segment is judged to contain speech. Dynamic double-threshold processing is then used for preliminary detection, and a global double threshold is obtained by K-means clustering. Finally, the detection results are obtained by sequential decision. The method has the advantages of low complexity, strong robustness, and adaptability to multiple languages. Experimental results show that its performance is better than that of traditional methods at various signal-to-noise ratios, with good adaptability across languages.
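The K-means step for deriving a global threshold can be sketched as the 1-D two-cluster case; the abstract does not specify the initialization or the final threshold rule, so both are assumptions here:

```python
def kmeans_threshold(energies, iters=20):
    """Global threshold via 1-D two-means (sketch): cluster frame
    energies into 'low' and 'high' groups, initialized at the min and
    max, and return the midpoint of the converged centroids as a
    global decision threshold."""
    lo, hi = min(energies), max(energies)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        low_pts = [e for e in energies if e <= mid]
        high_pts = [e for e in energies if e > mid]
        if not low_pts or not high_pts:
            break  # degenerate split; keep current centroids
        lo = sum(low_pts) / len(low_pts)
        hi = sum(high_pts) / len(high_pts)
    return (lo + hi) / 2.0
```

A double-threshold variant could place the low and high thresholds at, say, fixed offsets between the two centroids instead of at the single midpoint.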
37

Rudramurthy, M. S., V. Kamakshi Prasad, and R. Kumaraswamy. "Voice Activity Detection Algorithm Using Zero Frequency Filter Assisted Peaking Resonator and Empirical Mode Decomposition." Journal of Intelligent Systems 22, no. 3 (September 1, 2013): 269–82. http://dx.doi.org/10.1515/jisys-2013-0036.

Abstract:
In this article, a new adaptive data-driven strategy for voice activity detection (VAD) using empirical mode decomposition (EMD) is proposed. Speech data are decomposed using an a posteriori, adaptive, data-driven EMD in the time domain to yield a set of physically meaningful intrinsic mode functions (IMFs). Each IMF preserves the nonlinear and non-stationary properties of the speech utterance. Among the set of IMFs, the IMF that predominantly contains source information, called the characteristic IMF (CIMF), can be identified and extracted by designing a zero-frequency-filter-assisted peaking resonator. The detected CIMF is used to compute energy with short-term processing, and, by choosing a proper threshold, voiced regions in speech utterances are detected from the frame energy. The proposed framework has been studied on both clean and noisy speech utterances (0-dB white noise) and shows encouraging results for white noise at signal-to-noise ratios down to 0 dB.
38

Sztahó, Dávid, and Klára Vicsi. "Speech activity detection and automatic prosodic processing unit segmentation for emotion recognition." Intelligent Decision Technologies 8, no. 4 (June 27, 2014): 315–24. http://dx.doi.org/10.3233/idt-140199.

39

Bykov, M. M., V. V. Kovtun, and O. O. Maksimov. "Speech activity detection for the automated speaker recognition system of critical use." Journal of Engineering Sciences 4, no. 1 (2017): H 14—H 20. http://dx.doi.org/10.21272/jes.2017.4(1).h3.

40

Shafiee, Soheil, Farshad Almasganj, Bahram Vazirnezhad, and Ayyoob Jafari. "A two-stage speech activity detection system considering fractal aspects of prosody." Pattern Recognition Letters 31, no. 9 (July 2010): 936–48. http://dx.doi.org/10.1016/j.patrec.2009.12.014.

41

Fernando, Tharindu, Sridha Sridharan, Mitchell McLaren, Darshana Priyasad, Simon Denman, and Clinton Fookes. "Temporarily-Aware Context Modeling Using Generative Adversarial Networks for Speech Activity Detection." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 1159–69. http://dx.doi.org/10.1109/taslp.2020.2982297.

42

Tao, Fei, and Carlos Busso. "End-to-end audiovisual speech activity detection with bimodal recurrent neural models." Speech Communication 113 (October 2019): 25–35. http://dx.doi.org/10.1016/j.specom.2019.07.003.

43

Syed, Waheeduddin Q., and Hsiao-Chun Wu. "Speech Waveform Compression Using Robust Adaptive Voice Activity Detection for Nonstationary Noise." EURASIP Journal on Audio, Speech, and Music Processing 2008, no. 1 (2008): 639839. http://dx.doi.org/10.1186/1687-4722-2008-639839.

44

Li, Jie, and Datao You. "Enhanced Speech Based Jointly Statistical Probability Distribution Function for Voice Activity Detection." Chinese Journal of Electronics 26, no. 2 (March 1, 2017): 325–30. http://dx.doi.org/10.1049/cje.2017.01.001.

45

Syed, Waheeduddin Q., and Hsiao-Chun Wu. "Speech Waveform Compression Using Robust Adaptive Voice Activity Detection for Nonstationary Noise." EURASIP Journal on Audio, Speech, and Music Processing 2008 (2008): 1–8. http://dx.doi.org/10.1155/2008/639839.

46

Sholokhov, Alexey, Md Sahidullah, and Tomi Kinnunen. "Semi-supervised speech activity detection with an application to automatic speaker verification." Computer Speech & Language 47 (January 2018): 132–56. http://dx.doi.org/10.1016/j.csl.2017.07.005.

47

Rudramurthy, M. S., Nilabh Kumar Pathak, V. Kamakshi Prasad, and R. Kumaraswamy. "Speaker Identification Using Empirical Mode Decomposition-Based Voice Activity Detection Algorithm under Realistic Conditions." Journal of Intelligent Systems 23, no. 4 (December 1, 2014): 405–21. http://dx.doi.org/10.1515/jisys-2013-0089.

Abstract:
Speaker recognition (SR) under mismatched conditions is a challenging task. The speech signal is nonlinear and nonstationary, and therefore difficult to analyze under realistic conditions; moreover, in real conditions, the nature of the noise present in the speech data is not known a priori. In such cases, the performance of speaker identification (SI) or speaker verification (SV) degrades considerably. Any SR system uses a voice activity detector (VAD) as the front-end subsystem of the whole system, and the performance of most VADs deteriorates under degraded or realistic conditions where noise plays a major role. Recently, speech data analysis and processing using Norden E. Huang's empirical mode decomposition (EMD) combined with the Hilbert transform, commonly referred to as the Hilbert-Huang transform (HHT), has become an emerging trend. EMD is an a posteriori, adaptive, time-domain data analysis tool that is widely accepted by the research community, and its use for speech recognition and SR tasks has been increasing. EMD-based VAD has become an important adaptive subsystem of the SR system that largely mitigates the mismatch between the training and testing phases. We have previously developed a VAD algorithm using a zero-frequency-filter-assisted peaking resonator (ZFFPR) and EMD. In this article, the efficacy of this EMD-based VAD algorithm is studied at the front end of a text-independent, language-independent SI task, using speakers' data collected in three languages at five different locations (home, street, laboratory, college campus, and restaurant) under realistic conditions with an EDIROL-R09 HR 24-bit wav/MP3 recorder. The performance of the proposed SI task is compared against that with a traditional energy-based VAD in terms of percentage identification rate.
In both cases, the widely accepted mel-frequency cepstral coefficients are computed by frame processing (20-ms frame size, 10-ms frame shift) of the voiced speech regions extracted by the respective VAD techniques, and are used as feature vectors for speaker modeling with the popular Gaussian mixture models. The experimental results show that the SI task with the ZFFPR-and-EMD-based VAD at its front end performs better than the SI task with a short-term energy-based VAD at its front end, and the results are encouraging.
48

Gacka, Ewa, and Monika Kaźmierczak. "Speech screening examinations as an example of activity in the field of speech‑language therapy." Logopaedica Lodziensia, no. 1 (December 30, 2017): 31–42. http://dx.doi.org/10.18778/2544-7238.01.04.

Abstract:
Speech-language prevention consists in the prevention and early detection of speech disorders so that adequate therapeutic action can limit or eliminate the negative effects of language-related abnormalities. As part of speech-language prophylaxis, screening examinations are carried out to identify persons requiring in-depth speech diagnosis. The article presents the results of speech screening of pupils in grades I–III at a primary school in Łódź, together with some practical conclusions drawn from the research.
49

Pek, Kimhuoch, Takayuki Arai, and Noboru Kanedera. "Voice activity detection in noise using modulation spectrum of speech: Investigation of speech frequency and modulation frequency ranges." Acoustical Science and Technology 33, no. 1 (2012): 33–44. http://dx.doi.org/10.1250/ast.33.33.

50

Zhou, Bin, Jing Liu, and Zheng Pei. "Noise-Robust Voice Activity Detector Based on Four States-Based HMM." Applied Mechanics and Materials 411-414 (September 2013): 743–48. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.743.

Abstract:
Voice activity detection (VAD) is increasingly essential in noisy environments for providing accurate performance in speech recognition. In this paper, we provide a method based on a left-right hidden Markov model (HMM) to identify the start and end of speech. The method builds two models, for non-speech and speech, instead of the existing two states; formally, each model may include several states. We also analyze other features, such as pitch index, pitch magnitude, and the fractal dimension of speech and non-speech, and compare the VAD results of the proposed algorithm with those of a two-state HMM. Experiments show that the proposed method performs better than two-state HMMs in VAD, especially in low signal-to-noise ratio (SNR) environments.
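Decoding with speech/non-speech HMMs of this kind reduces to Viterbi search over the state sequence. The sketch below uses placeholder observation log-likelihoods and transition probabilities rather than the paper's multi-state models and pitch/fractal features:

```python
import math

def viterbi_vad(obs_loglik, log_trans, log_init):
    """Viterbi decoding over a small state set, e.g. {non-speech,
    speech} (sketch). obs_loglik[t][s] is the log-likelihood of frame t
    under state s; log_trans[p][s] is the log transition probability
    from state p to state s. Returns the most likely state sequence."""
    n_states = len(log_init)
    delta = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(obs_loglik)):
        new_delta, ptr = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda p: delta[p] + log_trans[p][s])
            ptr.append(best)
            new_delta.append(delta[best] + log_trans[best][s]
                             + obs_loglik[t][s])
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state.
    path = [max(range(n_states), key=lambda s: delta[s])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Sticky self-transition probabilities (e.g. 0.9) give the hangover behavior that keeps the detector from toggling on brief energy dips, which is where HMM-based VAD gains over frame-independent thresholding.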