Academic literature on the topic 'Automatic speech recognition'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Automatic speech recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Automatic speech recognition"

1. Fried, Louis. "Automatic Speech Recognition." Information Systems Management 13, no. 1 (January 1996): 29–37. http://dx.doi.org/10.1080/10580539608906969.

2. Chigier, Benjamin. "Automatic Speech Recognition." Journal of the Acoustical Society of America 103, no. 1 (January 1998): 19. http://dx.doi.org/10.1121/1.423151.

3. Hovell, Simon Alexander. "Automatic Speech Recognition." Journal of the Acoustical Society of America 107, no. 5 (2000): 2325. http://dx.doi.org/10.1121/1.428610.

4. Espy‐Wilson, Carol. "Automatic Speech Recognition." Journal of the Acoustical Society of America 117, no. 4 (April 2005): 2403. http://dx.doi.org/10.1121/1.4786105.

5. Merrill, John W. "Automatic Speech Recognition." Journal of the Acoustical Society of America 121, no. 1 (2007): 29. http://dx.doi.org/10.1121/1.2434334.

6. Rao, P. V. S., and K. K. Paliwal. "Automatic Speech Recognition." Sadhana 9, no. 2 (September 1986): 85–120. http://dx.doi.org/10.1007/bf02747521.

7. Sayem, Asm. "Speech Analysis for Alphabets in Bangla Language: Automatic Speech Recognition." International Journal of Engineering Research 3, no. 2 (February 1, 2014): 88–93. http://dx.doi.org/10.17950/ijer/v3s2/211.

8. Carlson, Gloria Stevens, and Jared Bernstein. "Automatic Speech Recognition of Impaired Speech." International Journal of Rehabilitation Research 11, no. 4 (December 1988): 396–97. http://dx.doi.org/10.1097/00004356-198812000-00013.

9. Sagisaka, Yoshinori. "Automatic Speech Recognition Models." Kodo Keiryogaku (The Japanese Journal of Behaviormetrics) 22, no. 1 (1995): 40–47. http://dx.doi.org/10.2333/jbhmk.22.40.

10. Receveur, Simon, Robin Weiss, and Tim Fingscheidt. "Turbo Automatic Speech Recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 24, no. 5 (May 2016): 846–62. http://dx.doi.org/10.1109/taslp.2016.2520364.

Dissertations / Theses on the topic "Automatic speech recognition"

1. Alcaraz Meseguer, Noelia. "Speech Analysis for Automatic Speech Recognition." Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9092.
Abstract:

The classical front end analysis in speech recognition is a spectral analysis which parametrizes the speech signal into feature vectors; the most popular set of them is the Mel Frequency Cepstral Coefficients (MFCC). They are based on a standard power spectrum estimate which is first subjected to a log-based transform of the frequency axis (mel-frequency scale), and then decorrelated by using a modified discrete cosine transform. Following a focused introduction on speech production, perception and analysis, this paper gives a study of the implementation of a speech generative model, whereby the speech is synthesized and recovered back from its MFCC representations. The work has been developed in two steps: first, the computation of the MFCC vectors from the source speech files by using HTK Software; and second, the implementation of the generative model itself, which represents the conversion chain from HTK-generated MFCC vectors to speech reconstruction. In order to assess the quality of the speech coding into feature vectors and to evaluate the generative model, the spectral distance between the original speech signal and the one produced from the MFCC vectors has been computed. For that, spectral models based on Linear Prediction Coding (LPC) analysis have been used. During the implementation of the generative model, results have been obtained in terms of the reconstruction of the spectral representation and the quality of the synthesized speech.
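For readers who want to connect this abstract's description to code, here is a minimal sketch of the MFCC pipeline it outlines (power spectrum, mel-scale triangular filterbank, log, discrete cosine transform). It assumes NumPy/SciPy; the frame length, filter count and coefficient count are illustrative defaults, not the HTK configuration used in the thesis.

import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sr=16000, n_fft=512, n_filters=26, n_ceps=13):
    """MFCCs for one windowed frame of speech (illustrative parameters)."""
    # Power spectrum estimate of the frame.
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft
    # Triangular filterbank with centers spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    # Log filterbank energies, then DCT to decorrelate (cepstral coefficients).
    log_e = np.log(fbank @ power + 1e-10)
    return dct(log_e, type=2, norm='ortho')[:n_ceps]

frame = np.hamming(400) * np.random.randn(400)  # stand-in for 25 ms of speech
print(mfcc(frame).shape)  # (13,)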

2. Gabriel, Naveen. "Automatic Speech Recognition in Somali." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166216.
Abstract:
The field of speech recognition has left the research stage over the last decade and found its way into the public market; today, speech recognition software is ubiquitous around us. An automatic speech recognizer understands human speech and represents it as text. Most current speech recognition software employs variants of deep neural networks. Before the deep learning era, the hybrid of hidden Markov model and Gaussian mixture model (HMM-GMM) was a popular statistical approach to speech recognition. In this thesis, automatic speech recognition using HMM-GMM was trained on Somali data consisting of voice recordings and their transcriptions. HMM-GMM is a hybrid system whose framework is composed of an acoustic model and a language model. The acoustic model represents the time-variant aspect of the speech signal, and the language model determines how probable the observed sequence of words is. The thesis begins with background on speech recognition, and a literature survey covers some of the work that has been done in this field. It evaluates how different language models and discounting methods affect the performance of speech recognition systems. Log scores were also calculated for the top 5 predicted sentences, along with confidence measures of the predicted sentences. The model was trained on 4.5 hours of voiced data and its corresponding transcription, and evaluated on 3 minutes of test data. The performance of the trained model on the test set was good, given that the data was devoid of background noise and lacked variability. Performance is measured using word error rate (WER) and sentence error rate (SER), and the implemented model is also compared with the results of other research work. The thesis further discusses why the log and confidence scores of a sentence might not be a good way to measure the performance of the resulting model, the shortcomings of the HMM-GMM model, how the existing model can be improved, and alternatives to solve the problem.
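For context, the word error rate (WER) cited in this abstract is a length-normalized word-level edit distance between the reference and the hypothesis. A minimal sketch follows; the function and the example strings are illustrative, not taken from the thesis.

def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words = 0.33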
3. Al-Shareef, Sarah. "Conversational Arabic Automatic Speech Recognition." Thesis, University of Sheffield, 2015. http://etheses.whiterose.ac.uk/10145/.
Abstract:
Colloquial Arabic (CA) is the set of spoken variants of modern Arabic that exist in the form of regional dialects and are generally considered to be mother tongues in those regions. CA has limited textual resources because it exists only as a spoken language, without a standardised written form. Normally the modern standard Arabic (MSA) writing convention is employed, which has limitations in phonetically representing CA. Without phonetic dictionaries the pronunciation of CA words is ambiguous and can only be obtained through word and/or sentence context. Moreover, CA inherits the MSA complex word structure, where words can be created by attaching affixes to a word. In automatic speech recognition (ASR), commonly used approaches to model acoustic, pronunciation and word variability are language independent. However, one can observe significant differences in performance between English and CA, with the latter yielding up to three times higher error rates. This thesis investigates the main reasons for the under-performance of CA ASR systems. The work focuses on two directions: first, the impact of limited lexical coverage and insufficient training data for written CA on language modelling is investigated; second, better models for the acoustics and pronunciations are obtained by learning to transfer between written and spoken forms. Several original contributions result from each direction. Using data-driven classes from decomposed text is shown to reduce the out-of-vocabulary rate; a novel colloquialisation system to import additional data is introduced; automatic diacritisation to restore the missing short vowels was found to yield good performance; and a new acoustic set for describing CA was defined. The proposed methods improved ASR performance in terms of word error rate in a CA conversational telephone speech task.
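As an aside on the out-of-vocabulary (OOV) rate this abstract mentions, it can be computed as the fraction of test tokens missing from the recognition lexicon; decomposing words into smaller units raises lexical coverage and lowers it. A minimal sketch with hypothetical tokens, not the thesis's data:

def oov_rate(lexicon, test_tokens):
    """Fraction of test tokens not covered by the recognition lexicon."""
    vocab = set(lexicon)
    missing = sum(1 for tok in test_tokens if tok not in vocab)
    return missing / len(test_tokens)

# Decomposed units (stems, affixes) shrink the effective vocabulary,
# so more test tokens are covered and the OOV rate drops.
lexicon = ["kitab", "al", "ha", "wa"]        # hypothetical decomposed units
test = ["al", "kitab", "ha", "wa", "qalam"]  # "qalam" is unseen
print(oov_rate(lexicon, test))               # 0.2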
4. Jalalvand, Shahab. "Automatic Speech Recognition Quality Estimation." Doctoral thesis, Università degli studi di Trento, 2017. https://hdl.handle.net/11572/368743.
Abstract:
Evaluation of automatic speech recognition (ASR) systems is difficult and costly, since it requires manual transcriptions. This evaluation is usually done by computing the word error rate (WER), the most popular metric in the ASR community. Such computation is doable only if manual references are available, which in real-life applications is too rigid a condition. A reference-free metric for evaluating ASR performance is the confidence measure provided by the ASR decoder. However, the confidence measure is not always available, especially in commercial ASR systems, and even when available it is usually biased towards the decoder. From this perspective, the confidence measure is not suitable for comparison purposes, for example between two ASR systems. These issues motivate the need for an automatic quality estimation system for ASR outputs. This thesis explores ASR quality estimation (ASR QE) from different perspectives, including feature engineering, learning algorithms and applications. From the feature engineering perspective, a wide range of features extractable from the input signal and the output transcription are studied. These features represent the quality of the recognition from different aspects and are divided into four groups: signal, textual, hybrid and word-based features. From the learning point of view, we address two main approaches: i) QE via regression, suitable for the single-hypothesis scenario; ii) QE via machine-learned ranking (MLR), suitable for the multiple-hypotheses scenario. In the former, a regression model is used to predict the WER of each single hypothesis created through a single automatic transcription channel. In the latter, a ranking model is used to predict the order of multiple hypotheses with respect to their quality. Multiple hypotheses are mainly generated by several ASR systems or several recording microphones. From the application point of view, we introduce two applications in which ASR QE yields salient improvements in terms of WER: i) QE-informed data selection for acoustic model adaptation; ii) QE-informed system combination. In the former, we exploit single-hypothesis ASR QE methods to select the best adaptation data for upgrading the acoustic model. In the latter, we exploit multiple-hypotheses ASR QE methods to rank and combine the automatic transcriptions in a supervised manner. The experiments are mostly conducted on the CHiME-3 English dataset, which consists of Wall Street Journal utterances recorded by multiple distant microphones in noisy environments. The results show that QE-informed acoustic model adaptation leads to 1.8% absolute WER reduction and QE-informed system combination leads to 1.7% absolute WER reduction on the CHiME-3 task. The outcomes of this thesis are packaged in an open-source toolkit named TranscRater (transcription rating toolkit, https://github.com/hlt-mt/TranscRater), which has been developed based on the aforementioned studies. TranscRater can be used to extract informative features, train QE models and predict the quality of reference-less recognitions in a variety of ASR tasks.
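A minimal sketch of the single-hypothesis "QE via regression" idea described above, assuming scikit-learn and random placeholder features; the thesis's actual signal/textual/hybrid/word-based features and the TranscRater pipeline are not reproduced here.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Placeholder feature vectors for 200 automatic transcriptions,
# standing in for the signal, textual, hybrid and word-based features.
X = rng.normal(size=(200, 12))
y = rng.uniform(0.0, 1.0, size=200)  # true WER of each hypothesis (labels)

# The regression model predicts the WER of an unseen hypothesis,
# giving a reference-free quality score.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[:150], y[:150])
predicted_wer = model.predict(X[150:])
print(predicted_wer[:3])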
5. Jalalvand, Shahab. "Automatic Speech Recognition Quality Estimation." Doctoral thesis, University of Trento, 2017. http://eprints-phd.biblio.unitn.it/2058/1/PhD_Thesis.pdf.
6. Wang, Peidong. "Robust Automatic Speech Recognition By Integrating Speech Separation." Doctoral dissertation, The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1619099401042668.
7. Seward, Alexander. "Efficient Methods for Automatic Speech Recognition." Doctoral thesis, KTH, Tal, musik och hörsel, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3675.
Abstract:
This thesis presents work in the area of automatic speech recognition (ASR). The thesis focuses on methods for increasing the efficiency of speech recognition systems and on techniques for efficient representation of different types of knowledge in the decoding process. In this work, several decoding algorithms and recognition systems have been developed, aimed at various recognition tasks. The thesis presents the KTH large vocabulary speech recognition system. The system was developed for online (live) recognition with large vocabularies and complex language models. The system utilizes weighted transducer theory for efficient representation of different knowledge sources, with the purpose of optimizing the recognition process. A search algorithm for efficient processing of hidden Markov models (HMMs) is presented. The algorithm is an alternative to the classical Viterbi algorithm for fast computation of shortest paths in HMMs. It is part of a larger decoding strategy aimed at reducing the overall computational complexity in ASR. In this approach, all HMM computations are completely decoupled from the rest of the decoding process. This enables the use of larger vocabularies and more complex language models without an increase in HMM-related computations. Ace is another speech recognition system developed within this work. It is a platform aimed at facilitating the development of speech recognizers and new decoding methods. A real-time system for low-latency online speech transcription is also presented. The system was developed within a project with the goal of improving the possibilities for hard-of-hearing people to use conventional telephony by providing speech-synchronized multimodal feedback. This work addresses several additional requirements implied by this special recognition task.
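For reference, the classical Viterbi recursion that the abstract's search algorithm is an alternative to; a minimal log-domain NumPy sketch over a toy two-state HMM (the matrices are illustrative, not from the thesis).

import numpy as np

def viterbi(log_init, log_trans, log_emit, obs):
    """Most likely state path through an HMM (a shortest path in -log space)."""
    delta = log_init + log_emit[:, obs[0]]   # best score ending in each state
    back = []
    for t in range(1, len(obs)):
        scores = delta[:, None] + log_trans  # every predecessor-to-state transition
        back.append(scores.argmax(axis=0))   # best predecessor for each state
        delta = scores.max(axis=0) + log_emit[:, obs[t]]
    # Trace back the best path from the best final state.
    path = [int(delta.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return path[::-1]

log_init = np.log([0.6, 0.4])
log_trans = np.log([[0.7, 0.3], [0.4, 0.6]])
log_emit = np.log([[0.9, 0.1], [0.2, 0.8]])
print(viterbi(log_init, log_trans, log_emit, [0, 1, 1]))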
8. Vipperla, Ravichander. "Automatic Speech Recognition for Ageing Voices." Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5725.
Abstract:
With ageing, human voices undergo several changes, typically characterised by increased hoarseness, breathiness, changes in articulatory patterns and a slower speaking rate. The focus of this thesis is to understand the impact of ageing on Automatic Speech Recognition (ASR) performance and to improve ASR accuracy for older voices. Baseline results on three corpora indicate that word error rates (WER) for older adults are significantly higher than those of younger adults, and that the decrease in accuracy is larger for male speakers than for females. Acoustic parameters such as jitter and shimmer, which measure glottal source disfluencies, were found to be significantly higher for older adults. However, the hypothesis that these changes explain the differences in WER for the two age groups is proven incorrect: experiments with artificial introduction of glottal source disfluencies into speech from younger adults do not show a significant impact on WER. Changes in fundamental frequency, observed quite often in older voices, have a marginal impact on ASR accuracy. Analysis of phoneme errors between younger and older speakers shows that certain phonemes, especially lower vowels, are more affected by ageing, although these changes vary across speakers. Another factor strongly associated with ageing voices is a decrease in the rate of speech. Experiments analysing the impact of slower speaking rate on ASR accuracy indicate that insertion errors increase when decoding slower speech with models trained on relatively faster speech. We then propose a way to characterise speakers in acoustic space based on speaker adaptation transforms and observe that speakers (especially males) can be segregated with reasonable accuracy based on age. Inspired by this, we investigate supervised hierarchical acoustic models based on gender and age, which achieve significant improvements in word accuracy over the baseline. The idea is then extended to unsupervised hierarchical models, which also outperform the baseline models by a good margin. Finally, we hypothesize that ASR accuracy can be improved by augmenting the adaptation data with speech from the acoustically closest speakers, and propose a strategy to select the augmentation speakers. Experimental results on two corpora indicate that the hypothesis holds true only when the amount of available adaptation data is limited to a few seconds. The efficacy of such a speaker selection strategy is analysed for both younger and older adults.
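Jitter and shimmer, the glottal source measures discussed in this abstract, quantify cycle-to-cycle variability of the pitch period and of the cycle amplitude. A minimal sketch using their common relative ("local") definitions; the input arrays are illustrative, not the thesis's data.

import numpy as np

def jitter(periods):
    """Local jitter: mean absolute difference between consecutive pitch
    periods, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer(amplitudes):
    """Local shimmer: the same measure applied to the peak amplitudes of the cycles."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Pitch periods (seconds) and cycle amplitudes from a voiced stretch.
print(jitter([0.0100, 0.0103, 0.0098, 0.0101]))
print(shimmer([0.80, 0.78, 0.83, 0.79]))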
9. Guzy, Julius Jonathan. "Automatic Speech Recognition: A Refutation Approach." Thesis, De Montfort University, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.254196.
10. Deterding, David Henry. "Speaker Normalisation for Automatic Speech Recognition." Thesis, University of Cambridge, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.359822.

Books on the topic "Automatic speech recognition"

1. Yu, Dong, and Li Deng. Automatic Speech Recognition. London: Springer London, 2015. http://dx.doi.org/10.1007/978-1-4471-5779-3.

2. Lee, Kai-Fu. Automatic Speech Recognition. Boston, MA: Springer US, 1989. http://dx.doi.org/10.1007/978-1-4615-3650-5.

3. Woelfel, Matthias. Distant Speech Recognition. Chichester, West Sussex, U.K.: Wiley, 2009.

4. Junqua, Jean-Claude, and Jean-Paul Haton. Robustness in Automatic Speech Recognition. Boston, MA: Springer US, 1996. http://dx.doi.org/10.1007/978-1-4613-1297-0.

5. Lee, Chin-Hui, Frank K. Soong, and Kuldip K. Paliwal, eds. Automatic Speech and Speaker Recognition. Boston, MA: Springer US, 1996. http://dx.doi.org/10.1007/978-1-4613-1367-0.

6. Keshet, Joseph, and Samy Bengio, eds. Automatic Speech and Speaker Recognition. Chichester, UK: John Wiley & Sons, Ltd, 2009. http://dx.doi.org/10.1002/9780470742044.

7. Huang, X. D. Hidden Markov Models for Speech Recognition. Edinburgh: Edinburgh University Press, 1990.

8. Markowitz, Judith A. Using Speech Recognition. Upper Saddle River, N.J.: Prentice Hall PTR, 1996.

9. Ainsworth, W. A. Speech Recognition by Machine. London, U.K.: P. Peregrinus on behalf of the Institution of Electrical Engineers, 1988.

10. Ainsworth, W. A. Speech Recognition by Machine. London: Peregrinus on behalf of the Institution of Electrical Engineers, 1987.

Book chapters on the topic "Automatic speech recognition"

1. Kurematsu, Akira, and Tsuyoshi Morimoto. "Speech Recognition." In Automatic Speech Translation, 9–41. London: CRC Press, 2023. http://dx.doi.org/10.1201/9780429333385-2.

2. Lu, Xugang, Sheng Li, and Masakiyo Fujimoto. "Automatic Speech Recognition." In SpringerBriefs in Computer Science, 21–38. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-15-0595-9_2.

3. Owens, F. J. "Automatic Speech Recognition." In Signal Processing of Speech, 138–73. London: Macmillan Education UK, 1993. http://dx.doi.org/10.1007/978-1-349-22599-6_7.

4. Schäuble, Peter. "Automatic Speech Recognition." In Multimedia Information Retrieval, 61–120. Boston, MA: Springer US, 1997. http://dx.doi.org/10.1007/978-1-4615-6163-7_4.

5. Soltau, Hagen, George Saon, Lidia Mangu, Hong-Kwang Kuo, Brian Kingsbury, Stephen Chu, and Fadi Biadsy. "Automatic Speech Recognition." In Natural Language Processing of Semitic Languages, 409–59. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-642-45358-8_13.

6. Chowdhary, K. R. "Automatic Speech Recognition." In Fundamentals of Artificial Intelligence, 651–68. New Delhi: Springer India, 2020. http://dx.doi.org/10.1007/978-81-322-3972-7_20.

7. Gruhn, Rainer E., Wolfgang Minker, and Satoshi Nakamura. "Automatic Speech Recognition." In Signals and Communication Technology, 5–17. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-19586-0_2.

8. Kamath, Uday, John Liu, and James Whitaker. "Automatic Speech Recognition." In Deep Learning for NLP and Speech Recognition, 369–404. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-14596-5_8.

9. Weik, Martin H. "automatic speech recognition." In Computer Science and Communications Dictionary, 88. Boston, MA: Springer US, 2000. http://dx.doi.org/10.1007/1-4020-0613-6_1147.

10. Potamianos, Gerasimos, Lori Lamel, Matthias Wölfel, Jing Huang, Etienne Marcheret, Claude Barras, Xuan Zhu, et al. "Automatic Speech Recognition." In Computers in the Human Interaction Loop, 43–59. London: Springer London, 2009. http://dx.doi.org/10.1007/978-1-84882-054-8_6.

Conference papers on the topic "Automatic speech recognition"

1. O'Shaughnessy, Douglas. "Automatic Speech Recognition." In 2015 Chilean Conference on Electrical, Electronics Engineering, Information and Communication Technologies (CHILECON). IEEE, 2015. http://dx.doi.org/10.1109/chilecon.2015.7400411.

2. Glasser, Abraham. "Automatic Speech Recognition Services." In CHI '19: CHI Conference on Human Factors in Computing Systems. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3290607.3308461.

3. Catariov, Alexandru. "Automatic Speech Recognition Systems." In Chisinau - DL tentative, edited by Andrei M. Andriesh and Veacheslav L. Perju. SPIE, 2005. http://dx.doi.org/10.1117/12.612047.

4. Paulik, M., S. Stuker, C. Fugen, T. Schultz, T. Schaaf, and A. Waibel. "Speech Translation Enhanced Automatic Speech Recognition." In IEEE Workshop on Automatic Speech Recognition and Understanding, 2005. IEEE, 2005. http://dx.doi.org/10.1109/asru.2005.1566488.

5. Ahmed, Basem H. A., and Ayman S. Ghabayen. "Arabic Automatic Speech Recognition Enhancement." In 2017 Palestinian International Conference on Information and Communication Technology (PICICT). IEEE, 2017. http://dx.doi.org/10.1109/picict.2017.12.

6. Adi, Derry Pramono, Agustinus Bimo Gumelar, and Ralin Pramasuri Arta Meisa. "Interlanguage of Automatic Speech Recognition." In 2019 International Seminar on Application for Technology of Information and Communication (iSemantic). IEEE, 2019. http://dx.doi.org/10.1109/isemantic.2019.8884310.

7. Anoop, C. S., and A. G. Ramakrishnan. "Automatic Speech Recognition for Sanskrit." In 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT). IEEE, 2019. http://dx.doi.org/10.1109/icicict46008.2019.8993283.

8. Munteanu, Cosmin, Gerald Penn, Ron Baecker, and Yuecheng Zhang. "Automatic Speech Recognition for Webcasts." In the 8th international conference. New York, New York, USA: ACM Press, 2006. http://dx.doi.org/10.1145/1180995.1181005.

9. Potamianos, Alexandros, Shrikanth Narayanan, and Sungbok Lee. "Automatic Speech Recognition for Children." In 5th European Conference on Speech Communication and Technology (Eurospeech 1997). ISCA, 1997. http://dx.doi.org/10.21437/eurospeech.1997-623.

10. Chen, C. Julian. "Speech Recognition with Automatic Punctuation." In 6th European Conference on Speech Communication and Technology (Eurospeech 1999). ISCA, 1999. http://dx.doi.org/10.21437/eurospeech.1999-115.

Reports on the topic "Automatic speech recognition"

1. Clements, Mark A., John H. Hansen, Kathleen E. Cummings, and Sungjae Lim. Automatic Recognition of Speech in Stressful Environments. Fort Belvoir, VA: Defense Technical Information Center, August 1991. http://dx.doi.org/10.21236/ada242917.

2. Brown, Peter F. The Acoustic-Modeling Problem in Automatic Speech Recognition. Fort Belvoir, VA: Defense Technical Information Center, December 1987. http://dx.doi.org/10.21236/ada188529.

3. Vergyri, Dimitra, and Katrin Kirchhoff. Automatic Diacritization of Arabic for Acoustic Modeling in Speech Recognition. Fort Belvoir, VA: Defense Technical Information Center, January 2004. http://dx.doi.org/10.21236/ada457846.

4. Bass, James D. Advancing Noise Robust Automatic Speech Recognition for Command and Control Applications. Fort Belvoir, VA: Defense Technical Information Center, March 2006. http://dx.doi.org/10.21236/ada461436.

5. Stevenson, G. Analysis of Pre-Trained Deep Neural Networks for Large-Vocabulary Automatic Speech Recognition. Office of Scientific and Technical Information (OSTI), July 2016. http://dx.doi.org/10.2172/1289367.
6. Fatehifar, Mohsen, Josef Schlittenlacher, David Wong, and Kevin Munro. Applications of Automatic Speech Recognition and Text-to-Speech Models to Detect Hearing Loss: A Scoping Review Protocol. INPLASY - International Platform of Registered Systematic Review and Meta-analysis Protocols, January 2023. http://dx.doi.org/10.37766/inplasy2023.1.0029.
Abstract:
Review question / Objective: This scoping review aims to identify published methods that have used automatic speech recognition or text-to-speech technologies to detect hearing loss, and to report on their accuracy and limitations. Condition being studied: Hearing enables us to communicate with the surrounding world. According to reports by the World Health Organization, 1.5 billion people suffer from some degree of hearing loss, of whom 430 million require medical attention. It is estimated that by 2050, 1 in every 4 people will experience some sort of hearing disability. Hearing loss can significantly impact people's ability to communicate and makes social interactions a challenge. In addition, it can result in anxiety, isolation, depression, hindrance of learning, and a decrease in general quality of life. A hearing assessment is usually done in hospitals and clinics with special equipment and trained staff. However, these services are not always available in less developed countries. Even in developed countries, like the UK, access to these facilities can be a challenge in rural areas. Moreover, during a crisis like the Covid-19 pandemic, accessing the required healthcare can become dangerous and challenging even in large cities.
7. Oran, D. Requirements for Distributed Control of Automatic Speech Recognition (ASR), Speaker Identification/Speaker Verification (SI/SV), and Text-to-Speech (TTS) Resources. RFC Editor, December 2005. http://dx.doi.org/10.17487/rfc4313.
8. Tao, Yang, Amos Mizrach, Victor Alchanatis, Nachshon Shamir, and Tom Porter. Automated Imaging Broiler Chick Sexing for Gender-Specific and Efficient Production. United States Department of Agriculture, December 2014. http://dx.doi.org/10.32747/2014.7594391.bard.
Abstract:
Extending the previous two years of research results (Mizrach et al., 2012; Tao, 2011, 2012), the third year's efforts in both Maryland and Israel were directed towards the engineering of the system. The activities included robust chick handling and conveyor system development, optical system improvement, online dynamic motion imaging of chicks, multi-image-sequence optimal feather extraction and detection, and pattern recognition. Mechanical system engineering: The third model of the mechanical chick handling system with a high-speed imaging system was built, as shown in Fig. 1. This system has improved chick holding cups and motion mechanisms that enable chicks to open their wings through the view section. The mechanical system has achieved a speed of 4 chicks per second, which exceeds the design spec of 3 chicks per second. In the center of the conveyor, a high-speed camera with a UV-sensitive optical system, shown in Fig. 2, was installed that captures chick images at multiple frames (45 images, system selectable) as the chick passes through the view area. Through intensive discussions and efforts, the PIs of Maryland and ARO created a protocol of joint hardware and software that uses sequential images of the chick in its fall motion to capture opening wings and extract the optimal opening positions. This approach enables reliable feather feature extraction in dynamic motion and pattern recognition. Improving chick wing deployment: The mechanical system for chick conveying, and especially the section that causes chicks to deploy their wings wide open under the fast video camera and the UV light, was investigated throughout the third study year. As a natural behavior, chicks tend to deploy their wings as a means of balancing their body when a sudden change in vertical movement is applied. In the previous two years, this was achieved by causing the chicks to move in free fall, under earth gravity (g), along a short vertical distance. The chicks always tended to deploy their wings, but not always in a wide, horizontally open position, which is required in order to get a successful image under the video camera. In addition, the cells with chicks bumped suddenly at the end of the free-fall path, which caused the chicks' legs to collapse inside the cells and the image of the wing to become blurred. To improve the movement and prevent the chicks' legs from collapsing, a slowing-down mechanism was designed and tested. This was done by installing a plastic block, printed with a predesigned variable slope (Fig. 3), at the end of the path of the falling cells (Fig. 4). The cells move down at a variable velocity according to the block slope and reach zero velocity at the end of the path. The slope was designed so that the deceleration becomes 0.8g instead of the free-fall gravity (g) present without the block. The tests showed better deployment and wider chick wing opening, as well as better balance along the movement. Design of additional block slope sizes is under investigation: slopes that create accelerations of 0.7g, 0.9g, and variable accelerations are being designed to improve the movement path and images.
9. Issues in Data Processing and Relevant Population Selection. OSAC Speaker Recognition Subcommittee, November 2022. http://dx.doi.org/10.29325/osac.tg.0006.
Abstract:
In Forensic Automatic Speaker Recognition (FASR), forensic examiners typically compare audio recordings of a speaker whose identity is in question with recordings of known speakers to assist investigators and triers of fact in a legal proceeding. The performance of automated speaker recognition (SR) systems used for this purpose depends largely on the characteristics of the speech samples being compared. Examiners must understand the requirements of the specific systems in use as well as the audio characteristics that impact system performance. Mismatch conditions between the known and questioned data samples are of particular importance, but the need for, and impact of, audio pre-processing must also be understood. The data selected for use in a relevant population can also be critical to the performance of the system. This document describes issues that arise in the processing of case data and in the selection of a relevant population for purposes of conducting an examination using a human-supervised automatic speaker recognition approach in a forensic context. The document is intended to comply with the Organization of Scientific Area Committees (OSAC) for Forensic Science Technical Guidance Document requirements.