Academic literature on the topic 'Speech Activity Detection (SAD)'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Speech Activity Detection (SAD).'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Speech Activity Detection (SAD)"

1

Kaur, Sukhvinder, and J. S. Sohal. "Speech Activity Detection and its Evaluation in Speaker Diarization System." International Journal of Computers & Technology 16, no. 1 (March 13, 2017): 7567–72. http://dx.doi.org/10.24297/ijct.v16i1.5893.

Full text
Abstract:
In speaker diarization, speech/voice activity detection is performed to separate speech, non-speech, and silent frames. The zero-crossing rate and root-mean-square value of frames of audio clips have been used to select training data for silent, speech, and non-speech models. The trained models are used by two classifiers, a Gaussian mixture model (GMM) and an artificial neural network (ANN), to classify the speech and non-speech frames of an audio clip. The results of the ANN and GMM classifiers are compared by a Receiver Operating Characteristic (ROC) curve and a Detection Error Tradeoff (DET) graph. It is concluded that the neural-network-based SAD performs comparatively better than the Gaussian-mixture-model-based SAD.
APA, Harvard, Vancouver, ISO, and other styles
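The frame-level features this paper builds on (zero-crossing rate and RMS energy) are simple to compute; a generic NumPy sketch, with hypothetical frame and hop sizes not taken from the article:

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=160):
    """Compute zero-crossing rate and RMS energy per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    zcr, rms = [], []
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        # Zero-crossing rate: fraction of adjacent samples with a sign change
        zcr.append(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        # Root-mean-square energy of the frame
        rms.append(np.sqrt(np.mean(frame ** 2)))
    return np.array(zcr), np.array(rms)

# Hypothetical check: silence has zero RMS; a low-frequency tone has low ZCR.
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 200 * t)   # voiced-like: periodic, few sign changes
silence = np.zeros(16000)
zcr_t, rms_t = frame_features(tone)
zcr_s, rms_s = frame_features(silence)
```

Thresholding such features per frame is the classic starting point for selecting silent, speech, and non-speech training data as the abstract describes.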
2

Dutta, Satwik, Prasanna Kothalkar, Johanna Rudolph, Christine Dollaghan, Jennifer McGlothlin, Thomas Campbell, and John H. Hansen. "Advancing speech activity detection for automatic speech assessment of pre-school children prompted speech using COMBO-SAD." Journal of the Acoustical Society of America 148, no. 4 (October 2020): 2469–67. http://dx.doi.org/10.1121/1.5146831.

Full text
3

Mahalakshmi, P. "A Review on Voice Activity Detection and Mel-Frequency Cepstral Coefficients for Speaker Recognition (Trend Analysis)." Asian Journal of Pharmaceutical and Clinical Research 9, no. 9 (December 1, 2016): 360. http://dx.doi.org/10.22159/ajpcr.2016.v9s3.14352.

Full text
Abstract:
Objective: The objective of this review article is to give a complete review of various techniques that have been used for speech recognition purposes over two decades. Methods: VAD (Voice Activity Detection) and SAD (Speech Activity Detection) techniques, which are used to distinguish voiced from unvoiced signals, are discussed, as is the MFCC (Mel-Frequency Cepstral Coefficient) technique, which detects specific features. Results: The review results show that research in MFCC has been dominant in signal processing in comparison to VAD and other existing techniques. Conclusion: A comparison of different speaker recognition techniques, both those used previously and those in current research, is discussed, and a clear idea of the better technique was identified through a review of the literature spanning more than two decades. Keywords: cepstral analysis, Mel-frequency cepstral coefficients, signal processing, speaker recognition, voice activity detection.
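Since the review centers on MFCC features, a minimal single-frame MFCC computation can be sketched in NumPy (filter count, coefficient count, and sample rate are illustrative choices, not taken from the article):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sr=16000, n_filters=26, n_coeff=13):
    """Minimal MFCC for one frame: power spectrum -> triangular mel
    filterbank -> log -> DCT-II, keeping the first n_coeff coefficients."""
    n = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hanning(n))) ** 2
    # Filterbank center frequencies equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, len(power)))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    energies = np.log(fbank @ power + 1e-10)
    # DCT-II of the log filterbank energies
    k = np.arange(n_coeff)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n_filters) + 1) / (2 * n_filters))
    return (energies[None, :] * basis).sum(axis=1)

# Hypothetical usage on a 512-sample frame of a 440 Hz tone
frame = np.sin(2 * np.pi * 440 * np.arange(512) / 16000.0)
coeffs = mfcc(frame)
```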
4

Zhao, Hui, Yu Tai Wang, and Xing Hai Yang. "Emotion Detection System Based on Speech and Facial Signals." Advanced Materials Research 459 (January 2012): 483–87. http://dx.doi.org/10.4028/www.scientific.net/amr.459.483.

Full text
Abstract:
This paper introduces the present status of speech emotion detection. In order to improve on the emotion recognition rate of a single modality, a bimodal fusion method based on speech and facial expression is proposed. First, we establish an emotional database including speech and facial expressions. For the emotions calm, happy, surprise, anger, and sad, we extract ten speech parameters and use the PCA method to detect the speech emotion. Then we analyze bimodal emotion detection that fuses facial expression information. The experimental results show that the emotion recognition rate with bimodal fusion is about 6 percentage points higher than the recognition rate with only speech prosodic features.
5

Gelly, Gregory, and Jean-Luc Gauvain. "Optimization of RNN-Based Speech Activity Detection." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, no. 3 (March 2018): 646–56. http://dx.doi.org/10.1109/taslp.2017.2769220.

Full text
6

Koh, Min‐sung, and Margaret Mortz. "Improved voice activity detection of noisy speech." Journal of the Acoustical Society of America 107, no. 5 (May 2000): 2907–8. http://dx.doi.org/10.1121/1.428823.

Full text
7

Quan, Changqin, Bin Zhang, Xiao Sun, and Fuji Ren. "A combined cepstral distance method for emotional speech recognition." International Journal of Advanced Robotic Systems 14, no. 4 (July 1, 2017): 172988141771983. http://dx.doi.org/10.1177/1729881417719836.

Full text
Abstract:
Affective computing is not only a direction of reform in artificial intelligence but also an exemplification of advanced intelligent machines. Emotion is the biggest difference between human and machine; if a machine behaves with emotion, it will be accepted by more people. Voice is the most natural manner of daily communication and can be easily understood and accepted. The recognition of emotional voice is an important field of artificial intelligence. In emotion recognition, however, there often exists the phenomenon that two emotions are particularly vulnerable to confusion. This article presents a combined cepstral distance method in two-group multi-class emotion classification for emotional speech recognition. Cepstral distance combined with speech energy is widely used for speech signal endpoint detection in speech recognition. In this work, the use of cepstral distance aims to measure the similarity between frames in emotional signals and in neutral signals. These features are input to directed acyclic graph support vector machine classification. Finally, a two-group classification strategy is adopted to resolve confusion in multi-emotion recognition. In the experiments, a Chinese Mandarin emotion database is used, and a large training set (1134 + 378 utterances) ensures a powerful modelling capability for predicting emotion. The experimental results show that cepstral distance increases the recognition rate of the emotion 'sad' and can balance the recognition results while eliminating overfitting. For the German-language Berlin emotional speech database, the recognition rate between 'sad' and 'boring', which are very difficult to distinguish, is up to 95.45%.
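The cepstral distance feature described in the abstract can be illustrated with a generic NumPy sketch (frame construction and the number of coefficients kept are hypothetical choices, not taken from the article):

```python
import numpy as np

def real_cepstrum(frame, n_coeff=13, eps=1e-10):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame)) + eps  # eps avoids log(0)
    cep = np.fft.irfft(np.log(spectrum))
    return cep[:n_coeff]

def cepstral_distance(frame_a, frame_b, n_coeff=13):
    """Euclidean distance between truncated cepstra of two frames."""
    return np.linalg.norm(real_cepstrum(frame_a, n_coeff) -
                          real_cepstrum(frame_b, n_coeff))

# Hypothetical check: a frame is much closer, in cepstral terms, to a
# circularly shifted copy of itself than to noise with a flat spectrum.
rng = np.random.default_rng(0)
t = np.arange(512) / 16000.0
voiced = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)
noise = rng.standard_normal(512)
d_same = cepstral_distance(voiced, np.roll(voiced, 5))
d_diff = cepstral_distance(voiced, noise)
```

Because a circular shift changes only the phase, the magnitude spectra (and hence the cepstra) of `voiced` and its shifted copy are identical, while the noise frame is far away; this is the similarity measure the abstract uses between emotional and neutral frames.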
8

Dash, Debadatta, Paul Ferrari, Satwik Dutta, and Jun Wang. "NeuroVAD: Real-Time Voice Activity Detection from Non-Invasive Neuromagnetic Signals." Sensors 20, no. 8 (April 16, 2020): 2248. http://dx.doi.org/10.3390/s20082248.

Full text
Abstract:
Neural speech decoding-driven brain-computer interface (BCI) or speech-BCI is a novel paradigm for exploring communication restoration for locked-in (fully paralyzed but aware) patients. Speech-BCIs aim to map a direct transformation from neural signals to text or speech, which has the potential for a higher communication rate than the current BCIs. Although recent progress has demonstrated the potential of speech-BCIs from either invasive or non-invasive neural signals, the majority of the systems developed so far still assume knowing the onset and offset of the speech utterances within the continuous neural recordings. This lack of real-time voice/speech activity detection (VAD) is a current obstacle for future applications of neural speech decoding wherein BCI users can have a continuous conversation with other speakers. To address this issue, in this study, we attempted to automatically detect the voice/speech activity directly from the neural signals recorded using magnetoencephalography (MEG). First, we classified the whole segments of pre-speech, speech, and post-speech in the neural signals using a support vector machine (SVM). Second, for continuous prediction, we used a long short-term memory-recurrent neural network (LSTM-RNN) to efficiently decode the voice activity at each time point via its sequential pattern-learning mechanism. Experimental results demonstrated the possibility of real-time VAD directly from the non-invasive neural signals with about 88% accuracy.
9

Mattys, Sven L., and Jamie H. Clark. "Lexical activity in speech processing: evidence from pause detection." Journal of Memory and Language 47, no. 3 (October 2002): 343–59. http://dx.doi.org/10.1016/s0749-596x(02)00037-2.

Full text
10

Potamitis, I., and E. Fishler. "Speech activity detection of moving speaker using microphone arrays." Electronics Letters 39, no. 16 (2003): 1223. http://dx.doi.org/10.1049/el:20030726.

Full text

Dissertations / Theses on the topic "Speech Activity Detection (SAD)"

1

Näslund, Anton, and Charlie Jeansson. "Robust Speech Activity Detection and Direction of Arrival Using Convolutional Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-297756.

Full text
Abstract:
Social robots are becoming more and more common in our everyday lives. In the field of conversational robotics, development is moving towards socially engaging robots with human-like conversation. This project looked into one of the technical aspects of recognizing speech, namely speech activity detection (SAD). The presented solution uses a convolutional neural network (CNN) based system to detect speech in a forward azimuth area. The project used a dataset from FestVox, called CMU Arctic, complemented by adding recorded noises. A library called Pyroomacoustics was used to simulate a real-world setup in order to create a robust system. A simplified version was built; this model only detected speech activity, and an accuracy of 95% was reached. The finished model resulted in an accuracy of 93%. It was compared with a similar project, a voice activity detection (VAD) algorithm, WebRTC with beamforming, as no previously published solutions to our problem were found. Our solution proved higher in accuracy in both cases, compared to the accuracy WebRTC achieved on our dataset.
Bachelor's thesis project in electrical engineering 2020, KTH, Stockholm
2

Wejdelind, Marcus, and Nils Wägmark. "Multi-speaker Speech Activity Detection From Video." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-297701.

Full text
Abstract:
A conversational robot will in many cases have to deal with multi-party spoken interaction in which one or more people could be speaking simultaneously. To do this, the robot must be able to identify the speakers in order to attend to them. Our project has approached this problem from a visual point of view, where a Convolutional Neural Network (CNN) was implemented and trained using video stream input containing one or more faces from an already existing data set (AVA-Speech). The goal for the network has been, for each face and at each point in time, to detect the probability of that person speaking. Our best result using an added Optical Flow function was 0.753, while we reached 0.781 using another pre-processing method on the data. These numbers corresponded surprisingly well with the existing scientific literature in the area, where 0.77 proved to be an appropriate benchmark level.
Bachelor's thesis project in electrical engineering 2020, KTH, Stockholm
3

Murrin, Paul. "Objective measurement of voice activity detectors." Thesis, University of York, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.325647.

Full text
4

Laverty, Stephen William. "Detection of Nonstationary Noise and Improved Voice Activity Detection in an Automotive Hands-free Environment." Link to electronic thesis, 2005. http://www.wpi.edu/Pubs/ETD/Available/etd-051105-110646/.

Full text
5

Minotto, Vicente Peruffo. "Audiovisual voice activity detection and localization of simultaneous speech sources." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/77231.

Full text
Abstract:
Given the tendency of creating interfaces between human and machines that increasingly allow simple ways of interaction, it is only natural that research effort is put into techniques that seek to simulate the most conventional mean of communication humans use: the speech. In the human auditory system, voice is automatically processed by the brain in an effortless and effective way, also commonly aided by visual cues, such as mouth movement and location of the speakers. This processing done by the brain includes two important components that speech-based communication require: Voice Activity Detection (VAD) and Sound Source Localization (SSL). Consequently, VAD and SSL also serve as mandatory preprocessing tools for high-end Human Computer Interface (HCI) applications in a computing environment, as the case of automatic speech recognition and speaker identification. However, VAD and SSL are still challenging problems when dealing with realistic acoustic scenarios, particularly in the presence of noise, reverberation and multiple simultaneous speakers. In this work we propose some approaches for tackling these problems using audiovisual information, both for the single source and the competing sources scenario, exploiting distinct ways of fusing the audio and video modalities. Our work also employs a microphone array for the audio processing, which allows the spatial information of the acoustic signals to be explored through the state-of-the-art method Steered Response Power (SRP). As an additional consequence, a very fast GPU version of the SRP is developed, so that real-time processing is achieved. Our experiments show an average accuracy of 95% when performing VAD of up to three simultaneous speakers and an average error of 10cm when locating such speakers.
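The microphone-array localization the thesis builds on (SRP) rests on generalized cross-correlation between channel pairs. A minimal GCC-PHAT time-delay estimate between two microphones can be sketched as follows (this is a generic illustration, not the thesis's GPU implementation; signal lengths and the 8-sample delay are made up):

```python
import numpy as np

def gcc_phat(x, y):
    """GCC-PHAT: cross-correlation with phase-transform weighting,
    the pairwise building block of SRP-style source localization.
    Returns the delay of x relative to y, in samples."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12           # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n)
    max_lag = len(x) - 1
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))  # center zero lag
    return int(np.argmax(cc)) - max_lag

# Hypothetical check: the same source arrives 8 samples later at mic_a.
rng = np.random.default_rng(1)
src = rng.standard_normal(1024)
mic_a = np.concatenate((np.zeros(8), src))[:1024]
mic_b = src
delay = gcc_phat(mic_a, mic_b)
```

SRP then sums such correlations over all microphone pairs for each candidate source position and picks the position with the highest steered response.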
6

Ent, Petr. "Voice Activity Detection." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2009. http://www.nusl.cz/ntk/nusl-235483.

Full text
Abstract:
This thesis deals with the use of support vector machines in voice activity detection. In the first part, various kinds of features, their extraction, and their processing are examined, and the optimal combination giving the best results is found. The second part presents the voice activity detection system itself and the tuning of its parameters. Finally, the results are compared with two other systems based on different principles. The ERT broadcast news database was used for testing and tuning. The comparison between systems was then performed on the database from the NIST06 Rich Transcription evaluations.
7

Cho, Yong Duk. "Speech detection, enhancement and compression for voice communications." Thesis, University of Surrey, 2001. http://epubs.surrey.ac.uk/842991/.

Full text
Abstract:
Speech signal processing for voice communications can be characterised in terms of silence compression, noise reduction, and speech compression. The limit in the channel bandwidth of voice communication systems requires efficient compression of speech and silence signals while retaining the voice quality. Silence compression by means of both voice activity detection (VAD) and comfort noise generation could present transparent speech quality while substantially lowering the transmission bit-rate, since pause regions between talk spurts do not include any voice information. Thus, this thesis proposes smoothed likelihood ratio-based VAD, designed on the basis of a behavioural analysis and improvement of a statistical model-based voice activity detector. Input speech could exhibit noisy signals, which could make the voice communication fatiguing and less intelligible. This task can be alleviated by noise reduction as a preprocessor for speech coding. Noise characteristics in speech enhancement are adapted typically during the pause regions classified by a voice activity detector. However, VAD errors could lead to over- or under-estimation of the noise statistics. Thus, this thesis proposes mixed decision-based noise adaptation based on an integration of soft and hard decision-based methods, defined with the speech presence uncertainty and VAD result, respectively. At low bit-rate speech coding, the sinusoidal model has been widely applied because of its good nature exploiting the phase redundancy of speech signals. Its performance, however, can be severely smeared by mis-estimation of the pitch. Thus, this thesis proposes a robust pitch estimation technique based on the autocorrelation of spectral amplitudes. Another important parameter in sinusoidal speech coding is the spectral magnitude of the LP-residual signal.
It is, however, not easy to directly quantise the magnitudes because the dimensions of the spectral vectors are variable from frame to frame depending on the pitch. To alleviate this problem, this thesis proposes mel-scale-based dimension conversion, which converts the spectral vectors to a fixed dimension based on mel-scale warping. A predictive coding scheme is also employed in order to exploit the inter-frame redundancy between the spectral vectors. Experimental results show that each proposed technique is suitable for enhancing speech quality for voice communications. Furthermore, an improved speech coder incorporating the proposed techniques is developed. The vocoder gives speech quality comparable to TIA/EIA IS-127 for noisy speech whilst operating at lower than half the bit-rate of the reference coder. Key words: voice activity detection, speech enhancement, pitch, spectral magnitude quantisation, low bit-rate coding.
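The pitch estimator described above is based on the autocorrelation of spectral amplitudes. A rough, generic sketch of that idea (not the thesis's actual algorithm; frame size, sample rate, and search range are hypothetical):

```python
import numpy as np

def pitch_from_spectral_autocorr(frame, sr=16000, fmin=60, fmax=400):
    """Estimate pitch by autocorrelating the magnitude spectrum.

    A harmonic signal has spectral peaks spaced every F0 Hz, so the
    autocorrelation of the magnitude spectrum peaks at a lag
    corresponding to F0."""
    n = len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n)))
    ac = np.correlate(mag, mag, mode="full")[len(mag) - 1:]  # lags >= 0
    hz_per_bin = sr / n
    lo, hi = int(fmin / hz_per_bin), int(fmax / hz_per_bin)
    lag = lo + int(np.argmax(ac[lo:hi]))     # restrict to a plausible F0 range
    return lag * hz_per_bin

# Hypothetical check on a synthetic harmonic signal with F0 = 200 Hz
sr = 16000
t = np.arange(2048) / sr
harmonic = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))
f0 = pitch_from_spectral_autocorr(harmonic, sr)
```

Working on spectral amplitudes rather than the waveform makes the estimate insensitive to phase, which is one motivation for this family of pitch estimators.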
8

Doukas, Nikolaos. "Voice activity detection using energy based measures and source separation." Thesis, Imperial College London, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245220.

Full text
9

Sinclair, Mark. "Speech segmentation and speaker diarisation for transcription and translation." Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/20970.

Full text
Abstract:
This dissertation outlines work related to Speech Segmentation – segmenting an audio recording into regions of speech and non-speech, and Speaker Diarization – further segmenting those regions into those pertaining to homogeneous speakers. Knowing not only what was said but also who said it and when has many useful applications. As well as providing a richer level of transcription for speech, we will show how such knowledge can improve Automatic Speech Recognition (ASR) system performance and can also benefit downstream Natural Language Processing (NLP) tasks such as machine translation and punctuation restoration. While segmentation and diarization may appear to be relatively simple tasks to describe, in practice we find that they are very challenging and are, in general, ill-defined problems. Therefore, we first provide a formalisation of each of the problems as the sub-division of speech within acoustic space and time. Here, we see that the task can become very difficult when we want to partition this domain into our target classes of speakers, whilst avoiding other classes that reside in the same space, such as phonemes. We present a theoretical framework for describing and discussing the tasks as well as introducing existing state-of-the-art methods and research. Current Speaker Diarization systems are notoriously sensitive to hyper-parameters and lack robustness across datasets. Therefore, we present a method which uses a series of oracle experiments to expose the limitations of current systems and to which system components these limitations can be attributed. We also demonstrate how Diarization Error Rate (DER), the dominant error metric in the literature, is not a comprehensive or reliable indicator of overall performance or of error propagation to subsequent downstream tasks. These results inform our subsequent research. We find that, as a precursor to Speaker Diarization, the task of Speech Segmentation is a crucial first step in the system chain.
Current methods typically do not account for the inherent structure of spoken discourse. As such, we explored a novel method which exploits an utterance-duration prior in order to better model the segment distribution of speech. We show how this method improves not only segmentation, but also the performance of subsequent speech recognition, machine translation and speaker diarization systems. Typical ASR transcriptions do not include punctuation and the task of enriching transcriptions with this information is known as ‘punctuation restoration’. The benefit is not only improved readability but also better compatibility with NLP systems that expect sentence-like units such as in conventional machine translation. We show how segmentation and diarization are related tasks that are able to contribute acoustic information that complements existing linguistically-based punctuation approaches. There is a growing demand for speech technology applications in the broadcast media domain. This domain presents many new challenges including diverse noise and recording conditions. We show that the capacity of existing GMM-HMM based speech segmentation systems is limited for such scenarios and present a Deep Neural Network (DNN) based method which offers a more robust speech segmentation method resulting in improved speech recognition performance for a television broadcast dataset. Ultimately, we are able to show that the speech segmentation is an inherently ill-defined problem for which the solution is highly dependent on the downstream task that it is intended for.
10

Thorell, Hampus. "Voice Activity Detection in the Tiger Platform." Thesis, Linköping University, Department of Electrical Engineering, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-6586.

Full text
Abstract:

Sectra Communications AB has developed a terminal for encrypted communication called the Tiger platform. During voice communication, delays have sometimes been experienced, resulting in conversational complications.

A solution to this problem, as proposed by Sectra, would be to introduce voice activity detection, that is, a separation of the speech parts and non-speech parts of the input signal, to the Tiger platform. By only transferring the speech parts to the receiver, the bandwidth needed should be dramatically decreased. A lower required bandwidth implies that the delays should slowly disappear. The problem is then to come up with a method that manages to distinguish the speech parts of the input signal. Fortunately, a lot of theory exists on the subject, and numerous voice activity detection methods are available today.

In this thesis, the theory of voice activity detection has been studied. A review of voice activity detectors that exist on the market today, followed by an evaluation of some of these, was performed in order to select a suitable candidate for the Tiger platform. This evaluation later became the foundation for the selection of a voice activity detector for implementation.

Finally, the implementation of the chosen voice activity detector, including a comfort noise generator, was done on the platform. This implementation was based on the special requirements of the platform. Tests of the implementation in office environments show that possible delays are steadily reduced during periods of speech inactivity, while the active speech quality is preserved.


Book chapters on the topic "Speech Activity Detection (SAD)"

1

Alam, Tanvirul, and Akib Khan. "Lightweight CNN for Robust Voice Activity Detection." In Speech and Computer, 1–12. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60276-5_1.

Full text
2

Solé-Casals, Jordi, Pere Martí-Puig, Ramon Reig-Bolaño, and Vladimir Zaiats. "Score Function for Voice Activity Detection." In Advances in Nonlinear Speech Processing, 76–83. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-11509-7_10.

Full text
3

Pertilä, Pasi, Alessio Brutti, Piergiorgio Svaizer, and Maurizio Omologo. "Multichannel Source Activity Detection, Localization, and Tracking." In Audio Source Separation and Speech Enhancement, 47–64. Chichester, UK: John Wiley & Sons Ltd, 2018. http://dx.doi.org/10.1002/9781119279860.ch4.

Full text
4

Málek, Jiří, and Jindřich Žďánský. "Voice-Activity and Overlapped Speech Detection Using x-Vectors." In Text, Speech, and Dialogue, 366–76. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58323-1_40.

Full text
5

Huang, Zhongqiang, and Mary P. Harper. "Speech Activity Detection on Multichannels of Meeting Recordings." In Machine Learning for Multimodal Interaction, 415–27. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11677482_35.

Full text
6

Zelinka, Jan. "Deep Learning and Online Speech Activity Detection for Czech Radio Broadcasting." In Text, Speech, and Dialogue, 428–35. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00794-2_46.

Full text
7

Chu, Stephen M., Etienne Marcheret, and Gerasimos Potamianos. "Automatic Speech Recognition and Speech Activity Detection in the CHIL Smart Room." In Machine Learning for Multimodal Interaction, 332–43. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11677482_29.

Full text
8

Górriz, J. M., C. G. Puntonet, J. Ramírez, and J. C. Segura. "Bispectrum Estimators for Voice Activity Detection and Speech Recognition." In Nonlinear Analyses and Algorithms for Speech Processing, 174–85. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11613107_15.

Full text
9

Macho, Dušan, Climent Nadeu, and Andrey Temko. "Robust Speech Activity Detection in Interactive Smart-Room Environments." In Machine Learning for Multimodal Interaction, 236–47. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11965152_21.

Full text
10

Honarmandi Shandiz, Amin, and László Tóth. "Voice Activity Detection for Ultrasound-Based Silent Speech Interfaces Using Convolutional Neural Networks." In Text, Speech, and Dialogue, 499–510. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-83527-9_43.

Full text

Conference papers on the topic "Speech Activity Detection (SAD)"

1

Abdulla, Waleed H., Zhou Guan, and Hou Chi Sou. "Noise robust speech activity detection." In 2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). IEEE, 2009. http://dx.doi.org/10.1109/isspit.2009.5407509.

Full text
2

Matic, A., V. Osmani, and O. Mayora. "Speech activity detection using accelerometer." In 2012 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2012. http://dx.doi.org/10.1109/embc.2012.6346377.

Full text
3

Tsai, TJ, and Nelson Morgan. "Speech activity detection: An economics approach." In ICASSP 2013 - 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2013. http://dx.doi.org/10.1109/icassp.2013.6638987.

Full text
4

Khoury, Elie, and Matt Garland. "I-Vectors for speech activity detection." In Odyssey 2016. ISCA, 2016. http://dx.doi.org/10.21437/odyssey.2016-48.

Full text
5

Punnoose, A. K. "New Features for Speech Activity Detection." In SMM19, Workshop on Speech, Music and Mind 2019. ISCA, 2019. http://dx.doi.org/10.21437/smm.2019-6.

Full text
6

Laskowski, Kornel, Qin Jin, and Tanja Schultz. "Crosscorrelation-based multispeaker speech activity detection." In Interspeech 2004. ISCA, 2004. http://dx.doi.org/10.21437/interspeech.2004-350.

Full text
7

Sarfjoo, Seyyed Saeed, Srikanth Madikeri, and Petr Motlicek. "Speech Activity Detection Based on Multilingual Speech Recognition System." In Interspeech 2021. ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-1058.

Full text
8

Harsha, B. V. "A noise robust speech activity detection algorithm." In Proceedings of 2004 International Symposium on Intelligent Multimedia, Video and Speech Processing, 2004. IEEE, 2004. http://dx.doi.org/10.1109/isimp.2004.1434065.

Full text
9

Shahsavari, Sajad, Hossein Sameti, and Hossein Hadian. "Speech activity detection using deep neural networks." In 2017 Iranian Conference on Electrical Engineering (ICEE). IEEE, 2017. http://dx.doi.org/10.1109/iraniancee.2017.7985293.

Full text
10

Heese, Florian, Markus Niermann, and Peter Vary. "Speech-codebook based soft Voice Activity Detection." In ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015. http://dx.doi.org/10.1109/icassp.2015.7178789.

Full text