Academic literature on the topic 'Automatic speech recognition – Statistical methods'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Automatic speech recognition – Statistical methods.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Journal articles on the topic "Automatic speech recognition – Statistical methods"

1

Boyer, A., J. Di Martino, P. Divoux, J. P. Haton, J. F. Mari, and K. Smaili. "Statistical methods in multi-speaker automatic speech recognition." Applied Stochastic Models and Data Analysis 6, no. 3 (September 1990): 143–55. http://dx.doi.org/10.1002/asm.3150060302.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Kłosowski, Piotr. "A Rule-Based Grapheme-to-Phoneme Conversion System." Applied Sciences 12, no. 5 (March 7, 2022): 2758. http://dx.doi.org/10.3390/app12052758.

Full text
Abstract:
This article presents a rule-based grapheme-to-phoneme conversion method and algorithm for Polish. It should be noted that the fundamental grapheme-to-phoneme conversion rules have been developed by Maria Steffen-Batóg and presented in her set of monographs dedicated to the automatic grapheme-to-phoneme conversion of texts in Polish. The author used previously developed rules and independently developed the grapheme-to-phoneme conversion algorithm. The algorithm has been implemented as a software application called TransFon, which allows the user to convert any text in Polish orthography to corresponding strings of phonemes, in phonemic transcription. Using TransFon, a phonemic Polish language corpus was created out of an orthographic corpus. The phonemic language corpus allows statistical analysis of the Polish language, as well as the development of phoneme- and word-based language models for automatic speech recognition using statistical methods. The developed phonemic language corpus opens up further opportunities for research to improve automatic speech recognition in Polish. The development of statistical methods for speech recognition and language modelling requires access to large language corpora, including phonemic corpora. The method presented here enables the creation of such corpora.
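A rule-based grapheme-to-phoneme converter of this kind can be sketched as a longest-match rewrite over an ordered rule list. The rules below are a tiny illustrative sample with informal phoneme symbols; they are not the Steffen-Batóg rule set used by TransFon:

```python
# Illustrative longest-match grapheme-to-phoneme rules for Polish.
# Digraph rules are listed before single letters so that the first
# matching rule is always the longest one. Hypothetical sample only.
RULES = [
    ("sz", "S"),    # digraphs first
    ("cz", "tS"),
    ("rz", "Z"),
    ("ch", "x"),
    ("a", "a"), ("e", "e"), ("o", "o"), ("u", "u"), ("y", "I"),
    ("m", "m"), ("s", "s"), ("t", "t"), ("k", "k"),
    ("w", "v"), ("z", "z"), ("c", "ts"), ("h", "x"), ("r", "r"),
]

def g2p(word):
    """Convert an orthographic word to a list of phoneme symbols."""
    phonemes, i = [], 0
    while i < len(word):
        for graph, phon in RULES:           # ordered: longest match wins
            if word.startswith(graph, i):
                phonemes.append(phon)
                i += len(graph)
                break
        else:
            i += 1                          # skip letters not in the sample
    return phonemes
```

Applied to a whole orthographic corpus, a converter like this yields the phonemic corpus the abstract describes.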
APA, Harvard, Vancouver, ISO, and other styles
3

Toth, Laszlo, Ildiko Hoffmann, Gabor Gosztolya, Veronika Vincze, Greta Szatloczki, Zoltan Banreti, Magdolna Pakaski, and Janos Kalman. "A Speech Recognition-based Solution for the Automatic Detection of Mild Cognitive Impairment from Spontaneous Speech." Current Alzheimer Research 15, no. 2 (January 3, 2018): 130–38. http://dx.doi.org/10.2174/1567205014666171121114930.

Full text
Abstract:
Background: Even today the reliable diagnosis of the prodromal stages of Alzheimer's disease (AD) remains a great challenge. Our research focuses on the earliest detectable indicators of cognitive decline in mild cognitive impairment (MCI). Since the presence of language impairment has been reported even in the mild stage of AD, the aim of this study is to develop a sensitive neuropsychological screening method based on the analysis of spontaneous speech production during a memory task. In the future, this can form the basis of Internet-based interactive screening software for the recognition of MCI. Methods: Participants were 38 healthy controls and 48 clinically diagnosed MCI patients. Spontaneous speech was provoked by asking the patients to recall the content of 2 short black and white films (one direct, one delayed), and by answering one question. Acoustic parameters (hesitation ratio, speech tempo, length and number of silent and filled pauses, length of utterance) were extracted from the recorded speech signals, first manually (using the Praat software), and then automatically, with an automatic speech recognition (ASR) based tool. First, the extracted parameters were statistically analyzed. Then we applied machine learning algorithms to see whether the MCI and the control groups can be discriminated automatically based on the acoustic features. Results: The statistical analysis showed significant differences for most of the acoustic parameters (speech tempo, articulation rate, silent pause, hesitation ratio, length of utterance, pause-per-utterance ratio). The most significant differences between the two groups were found in the speech tempo in the delayed recall task, and in the number of pauses in the question-answering task. The fully automated version of the analysis process, that is, using the ASR-based features in combination with machine learning, was able to separate the two classes with an F1-score of 78.8%.
Conclusion: The temporal analysis of spontaneous speech can be exploited in implementing a new, automatic detection-based tool for screening MCI for the community.
APA, Harvard, Vancouver, ISO, and other styles
4

Gellatly, Andrew W., and Thomas A. Dingus. "Speech Recognition and Automotive Applications: Using Speech to Perform in-Vehicle Tasks." Proceedings of the Human Factors and Ergonomics Society Annual Meeting 42, no. 17 (October 1998): 1247–51. http://dx.doi.org/10.1177/154193129804201715.

Full text
Abstract:
An experiment was conducted to investigate the effects of automatic speech recognition (ASR) system design, driver input-modality, and driver age on driving performance during in-vehicle task execution and in-vehicle task usability. Results showed that ASR system design (i.e., recognition accuracy and recognition error type) and driver input-modality (i.e., manual or speech) significantly affected certain dependent measures. However, the differences found were small, suggesting that less than ideal ASR system design/performance can be considered for use in automobiles without substantially improving or degrading driving performance. Several of the speech-input conditions tested were statistically similar, as determined by the dependent measures, to current manual-input methods used to perform identical in-vehicle tasks. Further research is warranted to determine how extended exposure to, and use of, ASR systems affects driving performance, in-vehicle task usability, and driver opinion compared with conventional manual-input methods. In addition, the research should investigate whether prolonged exposure to, and use of, ASR systems results in significant improvements compared to the current research findings.
APA, Harvard, Vancouver, ISO, and other styles
5

Seman, Noraini, and Ahmad Firdaus Norazam. "Hybrid methods of Brandt's generalised likelihood ratio and short-term energy for Malay word speech segmentation." Indonesian Journal of Electrical Engineering and Computer Science 16, no. 1 (October 1, 2019): 283–91. http://dx.doi.org/10.11591/ijeecs.v16.i1.pp283-291.

Full text
Abstract:
Speech segmentation is an important part of speech recognition, synthesis and coding. A statistical approach that detects segmentation points by computing the spectral distortion of the signal, without prior knowledge of the acoustic information, has proved to give good matches and few omissions, but many insertions. In this study the segmentation is done both manually and automatically using Malay words in traditional Malay poetry. This study proposes a hybrid method of Brandt's generalized likelihood ratio (GLR) and a short-term energy algorithm. Brandt's algorithm estimates abrupt changes in energy to determine the segmentation points. A total of five Pantun are used in read mode, spoken by one male student in a noise-free room. Experiments are conducted to assess the accuracy, insertions, and omissions of the segmentation points. Experimental results show on average 80% accuracy with a 0.2-second time tolerance for automatic segmentation, with the algorithm having no knowledge of the acoustic characteristics.
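The short-term energy half of such a hybrid can be sketched in a few lines: frame the signal, compute per-frame energy, and mark a boundary wherever the energy crosses a threshold. This is a minimal illustration of energy-based boundary detection, not the Brandt's-GLR hybrid evaluated in the paper; the frame length, hop, and threshold below are arbitrary choices:

```python
import numpy as np

def short_term_energy(signal, frame_len=400, hop=160):
    """Per-frame energy: sum of squared samples in each sliding window."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.array([np.sum(signal[i*hop:i*hop+frame_len]**2) for i in range(n)])

def energy_boundaries(energy, threshold):
    """Candidate segmentation points: frames where the energy contour
    crosses the threshold (a speech/silence edge)."""
    above = energy > threshold
    return [i for i in range(1, len(above)) if above[i] != above[i - 1]]
```

On a toy signal of silence, a burst, and silence again, the two reported boundaries bracket the burst; a GLR test would then refine such candidates.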
APA, Harvard, Vancouver, ISO, and other styles
6

Cabral, Frederico Soares, Hidekazu Fukai, and Satoshi Tamura. "Feature Extraction Methods Proposed for Speech Recognition Are Effective on Road Condition Monitoring Using Smartphone Inertial Sensors." Sensors 19, no. 16 (August 9, 2019): 3481. http://dx.doi.org/10.3390/s19163481.

Full text
Abstract:
The objective of our project is to develop an automatic survey system for road condition monitoring using smartphone devices. One of the main tasks of our project is the classification of paved and unpaved roads. Since, in practice, recordings will be collected with various types of vehicle suspension systems and at various speeds, we use the multiple sensors found in smartphones and state-of-the-art machine learning techniques for signal processing. Although it usually receives little attention, the feature extraction step strongly affects the classification results. Therefore, we have to carefully choose not only the classification method but also the feature extraction method and its parameters. Simple statistics-based features are most commonly used to extract road surface information from acceleration data. In this study, we evaluated mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction coefficients (PLP) as a feature extraction step to improve the accuracy of paved and unpaved road classification. Although both MFCC and PLP were developed in the human speech recognition field, we found that modified MFCC and PLP can be used to improve on the commonly used statistical method.
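MFCC extraction, whether applied to speech or to sensor traces as here, follows the same textbook pipeline: framing, windowing, power spectrum, mel filterbank, log, and a DCT. A compact NumPy sketch with generic parameters (not the paper's modified variants):

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    imel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    """Frame -> window -> power spectrum -> mel energies -> log -> DCT-II."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i*hop:i*hop+frame_len] for i in range(n)])
    frames = frames * np.hamming(frame_len)
    n_fft = 512
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fb_energy = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    log_e = np.log(fb_energy)
    # DCT-II decorrelates the log filterbank energies into cepstra
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * k + 1) / (2 * n_filters)))
    return log_e @ dct.T
```

For accelerometer data the sample rate and frame sizes would be much smaller, but the pipeline is unchanged.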
APA, Harvard, Vancouver, ISO, and other styles
7

Hai, Yanfei. "Computer-aided teaching mode of oral English intelligent learning based on speech recognition and network assistance." Journal of Intelligent & Fuzzy Systems 39, no. 4 (October 21, 2020): 5749–60. http://dx.doi.org/10.3233/jifs-189052.

Full text
Abstract:
The purpose of this paper is to use English-specific syllables and prosodic features in spoken speech data for spoken English recognition, and to explore effective methods for the design and application of English speech detection and automatic recognition systems. The proposed method combines an SVM_FF-based classifier, an SVM_IER-based classifier, and a syllable classifier trained on specific syllables. Compared with methods based on combinations of other phonological characteristics, such as speech rate, intensity, formant and energy statistics, and pronunciation rate, it obtains a better recognition rate. In addition, this study conducts simulation experiments on the proposed recognition and identification method based on specific syllables and prosodic features and analyzes the experimental results. The results show that the recognition performance of the spoken English recognition system constructed in this study is significantly better than that of the traditional model.
APA, Harvard, Vancouver, ISO, and other styles
8

Markovnikov, Nikita, and Irina Kipyatkova. "Encoder-decoder models for recognition of Russian speech." Information and Control Systems, no. 4 (October 4, 2019): 45–53. http://dx.doi.org/10.31799/1684-8853-2019-4-45-53.

Full text
Abstract:
Problem: Classical systems of automatic speech recognition are traditionally built using an acoustic model based on hidden Markov models and a statistical language model. Such systems demonstrate high recognition accuracy, but consist of several independent complex parts, which can cause problems when building models. Recently, an end-to-end recognition method has spread, using deep artificial neural networks. This approach makes it easy to implement models using just one neural network. End-to-end models often demonstrate better performance in terms of speed and accuracy of speech recognition. Purpose: Implementation of end-to-end models for the recognition of continuous Russian speech, their adjustment and comparison with hybrid base models in terms of recognition accuracy and computational characteristics, such as the speed of learning and decoding. Methods: Creating an encoder-decoder model of speech recognition using an attention mechanism; applying techniques of stabilization and regularization of neural networks; augmentation of data for training; using parts of words as an output of a neural network. Results: An encoder-decoder model was obtained using an attention mechanism for recognizing continuous Russian speech without extracting features or using a language model. As elements of the output sequence, we used parts of words from the training set. The resulting model could not surpass the basic hybrid models, but surpassed the other baseline end-to-end models, both in recognition accuracy and in decoding/learning speed. The word recognition error was 24.17% and the decoding speed was 0.3 of real time, which is 6% faster than the baseline end-to-end model and 46% faster than the basic hybrid model. We showed that end-to-end models can work without language models for the Russian language, while demonstrating a higher decoding speed than hybrid models. The resulting model was trained on raw data without extracting any features.
We found that for the Russian language the hybrid type of attention mechanism gives the best result compared to location-based or context-based attention mechanisms. Practical relevance: The resulting models require less memory and less speech decoding time than the traditional hybrid models. That fact can allow them to be used locally on mobile devices without using calculations on remote servers.
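The attention step at the heart of such an encoder-decoder can be sketched generically: score each encoder frame against the current decoder state, then take a softmax-weighted sum as the context vector. This is plain dot-product (content-based) attention for illustration, not the hybrid mechanism the authors found best for Russian:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))       # shift for numerical stability
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Content-based attention: weight each encoder frame by its
    similarity to the current decoder state, return the context."""
    scores = encoder_states @ decoder_state   # one score per frame
    weights = softmax(scores)
    context = weights @ encoder_states        # weighted sum of frames
    return context, weights
```

Location-based and hybrid variants additionally condition the scores on the attention weights from the previous decoding step.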
APA, Harvard, Vancouver, ISO, and other styles
9

Afli, Haithem, Loïc Barrault, and Holger Schwenk. "Building and using multimodal comparable corpora for machine translation." Natural Language Engineering 22, no. 4 (June 15, 2016): 603–25. http://dx.doi.org/10.1017/s1351324916000152.

Full text
Abstract:
In recent decades, statistical approaches have significantly advanced the development of machine translation systems. However, the applicability of these methods directly depends on the availability of very large quantities of parallel data. Recent works have demonstrated that a comparable corpus can compensate for the shortage of parallel corpora. In this paper, we propose an alternative to comparable corpora containing text documents as resources for extracting parallel data: a multimodal comparable corpus with audio documents in the source language and text documents in the target language, built from the Euronews and TED web sites. The audio is transcribed by an automatic speech recognition system, and translated with a baseline statistical machine translation system. We then use information retrieval in a large text corpus in the target language in order to extract parallel sentences/phrases. We evaluate the quality of the extracted data on an English to French translation task and show significant improvements over a state-of-the-art baseline.
APA, Harvard, Vancouver, ISO, and other styles
10

Kozlova, A. T. "Temporal Characteristics of Prosody in Imperative Utterances and the Phenomenon of Emphatic Length in the English Language." Bulletin of Kemerovo State University, no. 3 (October 27, 2018): 192–96. http://dx.doi.org/10.21603/2078-8975-2018-3-192-196.

Full text
Abstract:
The paper focuses on one of the most effective factors of linguistic manipulation, i.e. the imperative utterance. The subject of the study was direct contact appeals, whose structures corresponded to the literary norms of the English language. The research determined and described the temporal component of imperative prosody. The author employed electro-acoustic, mathematical and statistical methods. The phonetic experiment revealed four prosodic structures, as well as their inter-structural and inter-style levels, the degree of temporal fluctuation and the phenomenon of emphatic length, the latter being recognized as the basic temporal feature of imperative prosody. Temporal variation in a phrase and its functional segments in different prosodic structures and in certain extra-linguistic conditions convincingly demonstrates the set of absolute and inter-style markers of this prosodic subsystem. In practice, the results of the present research can be applied in teaching communicatively oriented utterances and in designing algorithms for automatic speech recognition and synthesis.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Automatic speech recognition – Statistical methods"

1

Wu, Jian, and 武健. "Discriminative speaker adaptation and environmental robustness in automatic speech recognition." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31246138.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

黃伯光 and Pak-kwong Wong. "Statistical language models for Chinese recognition: speech and character." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1998. http://hub.hku.hk/bib/B31239456.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Chan, Oscar. "Prosodic features for a maximum entropy language model." University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0244.

Full text
Abstract:
A statistical language model attempts to characterise the patterns present in a natural language as a probability distribution defined over word sequences. Typically, they are trained using word co-occurrence statistics from a large sample of text. In some language modelling applications, such as automatic speech recognition (ASR), the availability of acoustic data provides an additional source of knowledge. This contains, amongst other things, the melodic and rhythmic aspects of speech referred to as prosody. Although prosody has been found to be an important factor in human speech recognition, its use in ASR has been limited. The goal of this research is to investigate how prosodic information can be employed to improve the language modelling component of a continuous speech recognition system. Because prosodic features are largely suprasegmental, operating over units larger than the phonetic segment, the language model is an appropriate place to incorporate such information. The prosodic features and standard language model features are combined under the maximum entropy framework, which provides an elegant solution to modelling information obtained from multiple, differing knowledge sources. We derive features for the model based on perceptually transcribed Tones and Break Indices (ToBI) labels, and analyse their contribution to the word recognition task. While ToBI has a solid foundation in linguistic theory, the need for human transcribers conflicts with the statistical model's requirement for a large quantity of training data. We therefore also examine the applicability of features which can be automatically extracted from the speech signal. We develop representations of an utterance's prosodic context using fundamental frequency, energy and duration features, which can be directly incorporated into the model without the need for manual labelling. 
Dimensionality reduction techniques are also explored with the aim of reducing the computational costs associated with training a maximum entropy model. Experiments on a prosodically transcribed corpus show that small but statistically significant reductions to perplexity and word error rates can be obtained by using both manually transcribed and automatically extracted features.
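Under the maximum entropy framework described above, each knowledge source (n-gram features, prosodic features) contributes a weighted feature value, and the model exponentiates and normalises the weighted sum to get a distribution over the vocabulary. A minimal sketch of computing p(w | h), with made-up feature values and weights:

```python
import numpy as np

def maxent_probs(feature_matrix, lambdas):
    """p(w | h) = exp(sum_k lambda_k * f_k(h, w)) / Z(h).

    feature_matrix: (V, K) array of feature values f_k(h, w), one row
    per vocabulary word; lambdas: (K,) trained weights, one per feature
    (e.g. an n-gram feature and a prosodic feature)."""
    scores = feature_matrix @ lambdas
    scores -= scores.max()          # stabilise the exponentials
    p = np.exp(scores)
    return p / p.sum()              # divide by the partition function Z(h)
```

Training fits the lambdas (e.g. by iterative scaling or gradient methods) so the model's expected feature values match those observed in data; this sketch only covers the resulting probability computation.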
APA, Harvard, Vancouver, ISO, and other styles
4

Fu, Qiang. "A generalization of the minimum classification error (MCE) training method for speech recognition and detection." Diss., Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/22705.

Full text
Abstract:
The model training algorithm is a critical component in statistical pattern recognition approaches based on Bayes decision theory. Conventional applications of the Bayes decision theory usually assume uniform error cost and result in a ubiquitous use of the maximum a posteriori (MAP) decision policy and the paradigm of distribution estimation as standard practice in the design of a statistical pattern recognition system. The minimum classification error (MCE) training method is proposed to overcome some substantial limitations of the conventional distribution estimation methods. In this thesis, three aspects of the MCE method are generalized. First, an optimal classifier/recognizer design framework is constructed, aiming at minimizing non-uniform error cost. A generalized training criterion named weighted MCE is proposed for pattern and speech recognition tasks with non-uniform error cost. Second, the MCE method for speech recognition tasks requires appropriate management of multiple recognition hypotheses for each data segment. A modified version of the MCE method with a new approach to selecting and organizing recognition hypotheses is proposed for continuous phoneme recognition. Third, the minimum verification error (MVE) method for detection-based automatic speech recognition (ASR) is studied. The MVE method can be viewed as a special version of the MCE method which aims at minimizing detection/verification errors. We present many experiments on pattern recognition and speech recognition tasks to justify the effectiveness of our generalizations.
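The core of standard MCE training is a smoothed misclassification measure: the correct class's discriminant score is compared against a soft maximum over the competing scores, and a sigmoid turns the difference into a differentiable loss. A sketch of that measure with illustrative smoothing constants (eta, gamma), not the thesis's weighted generalization:

```python
import numpy as np

def mce_loss(g_correct, g_competitors, eta=2.0, gamma=1.0):
    """Smoothed MCE loss for one training token.

    g_correct: discriminant score of the correct class;
    g_competitors: scores of competing classes. As eta grows, the soft
    maximum approaches the single best competitor."""
    g_comp = (1.0 / eta) * np.log(np.mean(np.exp(eta * np.asarray(g_competitors))))
    d = -g_correct + g_comp            # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d))   # sigmoid-smoothed 0/1 loss
```

Minimizing the sum of this loss over training tokens (e.g. by gradient descent on the model parameters) directly targets classification errors rather than distribution fit; the weighted-MCE criterion additionally scales each token's loss by its error cost.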
APA, Harvard, Vancouver, ISO, and other styles
5

Seward, Alexander. "Efficient Methods for Automatic Speech Recognition." Doctoral thesis, KTH, Tal, musik och hörsel, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3675.

Full text
Abstract:
This thesis presents work in the area of automatic speech recognition (ASR). The thesis focuses on methods for increasing the efficiency of speech recognition systems and on techniques for efficient representation of different types of knowledge in the decoding process. In this work, several decoding algorithms and recognition systems have been developed, aimed at various recognition tasks. The thesis presents the KTH large vocabulary speech recognition system. The system was developed for online (live) recognition with large vocabularies and complex language models. The system utilizes weighted transducer theory for efficient representation of different knowledge sources, with the purpose of optimizing the recognition process. A search algorithm for efficient processing of hidden Markov models (HMMs) is presented. The algorithm is an alternative to the classical Viterbi algorithm for fast computation of shortest paths in HMMs. It is part of a larger decoding strategy aimed at reducing the overall computational complexity in ASR. In this approach, all HMM computations are completely decoupled from the rest of the decoding process. This enables the use of larger vocabularies and more complex language models without an increase of HMM-related computations. Ace is another speech recognition system developed within this work. It is a platform aimed at facilitating the development of speech recognizers and new decoding methods. A real-time system for low-latency online speech transcription is also presented. The system was developed within a project with the goal of improving the possibilities for hard-of-hearing people to use conventional telephony by providing speech-synchronized multimodal feedback. This work addresses several additional requirements implied by this special recognition task.
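The classical Viterbi algorithm that this work offers an alternative to computes the most likely state path through an HMM by dynamic programming in the log domain. For reference, a standard NumPy implementation (not the thesis's decoupled decoding strategy):

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most likely state sequence for a discrete-emission HMM.

    log_pi: (S,) log initial probabilities; log_A: (S, S) log transition
    matrix; log_B: (S, O) log emission matrix; obs: observation indices."""
    S, T = len(log_pi), len(obs)
    delta = log_pi + log_B[:, obs[0]]        # best log-score ending in each state
    psi = np.zeros((T, S), dtype=int)        # backpointers
    for t in range(1, T):
        cand = delta[:, None] + log_A        # cand[i, j]: come from i, go to j
        psi[t] = cand.argmax(axis=0)
        delta = cand.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta.argmax())]             # trace back from the best end state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```

The thesis's point is that computations like this can be decoupled from the rest of the decoder, so vocabulary and language-model growth do not increase the HMM-level work.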
APA, Harvard, Vancouver, ISO, and other styles
6

Clarkson, P. R. "Adaptation of statistical language models for automatic speech recognition." Thesis, University of Cambridge, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597745.

Full text
Abstract:
Statistical language models encode linguistic information in such a way as to be useful to systems which process human language. Such systems include those for optical character recognition and machine translation. Currently, however, the most common application of language modelling is in automatic speech recognition, and it is this that forms the focus of this thesis. Most current speech recognition systems are dedicated to one specific task (for example, the recognition of broadcast news), and thus use a language model which has been trained on text which is appropriate to that task. If, however, one wants to perform recognition on more general language, then creating an appropriate language model is far from straightforward. A task-specific language model will often perform very badly on language from a different domain, whereas a model trained on text from many diverse styles of language might perform better in general, but will not be especially well suited to any particular domain. Thus the idea of an adaptive language model whose parameters automatically adjust to the current style of language is an appealing one. In this thesis, two adaptive language models are investigated. The first is a mixture-based model. The training text is partitioned according to the style of text, and a separate language model is constructed for each component. Each component is assigned a weighting according to its performance at modelling the observed text, and a final language model is constructed as the weighted sum of each of the mixture components. The second approach is based on a cache of recent words. Previous work has shown that words that have occurred recently have a higher probability of occurring in the immediate future than would be predicted by a standard trigram language model.
This thesis investigates the hypothesis that more recent words should be considered more significant within the cache by implementing a cache in which a word's recurrence probability decays exponentially over time. The problem of how to predict the effect of a particular language model on speech recognition accuracy is also addressed in this thesis. The results presented here, as well as those of other recent research, suggest that perplexity, the most commonly used method of evaluating language models, is not as well correlated with word error rate as was once thought. This thesis investigates the connection between a language model's perplexity and its effect on speech recognition performance, and describes the development of alternative measures of a language model's quality which are better correlated with word error rate. Finally, it is shown how the recognition performance which is achieved using mixture-based language models can be improved by optimising the mixture weights with respect to these new measures.
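The exponentially decaying cache described above can be sketched directly: each word in the recent history contributes probability mass that decays with its distance from the current position, and the cache distribution is interpolated with the static model. The decay and interpolation weights below are illustrative, not the thesis's tuned values:

```python
def cache_prob(word, history, base_prob, decay=0.99, cache_weight=0.2):
    """P(word) = (1 - w) * P_static(word | h) + w * P_cache(word).

    A word seen d positions ago contributes decay**d to its cache mass,
    so recent occurrences count for more than older ones."""
    weights, total = {}, 0.0
    for dist, w in enumerate(reversed(history), start=1):
        contrib = decay ** dist
        weights[w] = weights.get(w, 0.0) + contrib
        total += contrib
    p_cache = weights.get(word, 0.0) / total if total else 0.0
    # linear interpolation with the static (e.g. trigram) probability
    return (1 - cache_weight) * base_prob + cache_weight * p_cache
```

With decay = 1.0 this reduces to a plain (undecayed) cache model; the thesis's question is whether decay < 1.0 helps.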
APA, Harvard, Vancouver, ISO, and other styles
7

Wei, Yi. "Statistical methods on automatic aircraft recognition in aerial images." Thesis, University of Strathclyde, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.248947.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Wong, Pak-kwong. "Statistical language models for Chinese recognition : speech and character /." Hong Kong : University of Hong Kong, 1998. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20158725.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

McGreevy, Michael. "Statistical language modelling for large vocabulary speech recognition." Thesis, Queensland University of Technology, 2006. https://eprints.qut.edu.au/16444/1/Michael_McGreevy_Thesis.pdf.

Full text
Abstract:
The move towards larger vocabulary Automatic Speech Recognition (ASR) systems places greater demands on language models. In a large vocabulary system, acoustic confusion is greater, thus there is more reliance placed on the language model for disambiguation. In addition to this, ASR systems are increasingly being deployed in situations where the speaker is not conscious of their interaction with the system, such as in recorded meetings and surveillance scenarios. This results in more natural speech, which contains many false starts and disfluencies. In this thesis we investigate a novel approach to the modelling of speech corrections. We propose a syntactic model of speech corrections, and seek to determine if this model can improve on the performance of standard language modelling approaches when applied to conversational speech. We investigate a number of related variations to our basic approach and compare these approaches against the class-based N-gram. We also investigate the modelling of styles of speech. Specifically, we investigate whether the incorporation of prior knowledge about sentence types can improve the performance of language models. We propose a sentence mixture model based on word-class N-grams, in which the sentence mixture models and the word-class membership probabilities are jointly trained. We compare this approach with word-based sentence mixture models.
APA, Harvard, Vancouver, ISO, and other styles
10

McGreevy, Michael. "Statistical language modelling for large vocabulary speech recognition." Queensland University of Technology, 2006. http://eprints.qut.edu.au/16444/.

Full text
Abstract:
The move towards larger vocabulary Automatic Speech Recognition (ASR) systems places greater demands on language models. In a large vocabulary system, acoustic confusion is greater, thus there is more reliance placed on the language model for disambiguation. In addition to this, ASR systems are increasingly being deployed in situations where the speaker is not conscious of their interaction with the system, such as in recorded meetings and surveillance scenarios. This results in more natural speech, which contains many false starts and disfluencies. In this thesis we investigate a novel approach to the modelling of speech corrections. We propose a syntactic model of speech corrections, and seek to determine if this model can improve on the performance of standard language modelling approaches when applied to conversational speech. We investigate a number of related variations to our basic approach and compare these approaches against the class-based N-gram. We also investigate the modelling of styles of speech. Specifically, we investigate whether the incorporation of prior knowledge about sentence types can improve the performance of language models. We propose a sentence mixture model based on word-class N-grams, in which the sentence mixture models and the word-class membership probabilities are jointly trained. We compare this approach with word-based sentence mixture models.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Automatic speech recognition – Statistical methods"

1

Jelinek, Frederick. Statistical Methods for Speech Recognition. Cambridge, Mass.: MIT Press, 1997.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Alumäe, Tanel. Methods for Estonian large vocabulary speech recognition. [Tallinn]: Tallinn University of Technology Press, 2006.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Keshet, Joseph, and Samy Bengio, eds. Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods. Hoboken, NJ: J. Wiley & Sons, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Minker, Wolfgang. Incorporating Knowledge Sources into Statistical Speech Recognition. Boston, MA: Springer Science+Business Media, LLC, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Jelinek, Frederick. Statistical Methods for Speech Recognition. MIT Press, 2022.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Keshet, Joseph, and Samy Bengio. Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods. Wiley & Sons, Limited, John, 2009.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7. Keshet, Joseph, and Samy Bengio. Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods. John Wiley & Sons, Incorporated, 2009.
8. Fourcin, A., ed. Speech Input and Output Assessment: Multilingual Methods and Standards. Chichester, West Sussex, England: E. Horwood, 1989.
9. Müller, Christian. Speaker Classification I: Fundamentals, Features, and Methods. Springer London, Limited, 2007.
10. Lamel, Lori, and Jean-Luc Gauvain. "Speech Recognition." Edited by Ruslan Mitkov. Oxford University Press, 2012. http://dx.doi.org/10.1093/oxfordhb/9780199276349.013.0016.

Abstract:
Speech recognition is concerned with converting the speech waveform, an acoustic signal, into a sequence of words. Today's approaches are based on statistical modelling of the speech signal. This article provides an overview of the main topics addressed in speech recognition: acoustic-phonetic modelling, lexical representation, language modelling, decoding, and model adaptation. Language models are used in speech recognition to estimate the probability of word sequences. The main components of a generic speech recognition system are the main knowledge sources, feature analysis, the acoustic and language models, which are estimated in a training phase, and the decoder. The focus of this article is on methods used in state-of-the-art speaker-independent, large-vocabulary continuous speech recognition (LVCSR). Primary application areas for such technology are dictation, spoken language dialogue, and transcription for information archival and retrieval systems. Finally, the article discusses issues and directions of future research.
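The statistical formulation surveyed in this chapter combines the two trained models at decoding time: the recogniser picks the word sequence W maximising P(A|W)·P(W), the acoustic likelihood weighted by the language model prior. A minimal sketch in log space, with toy hypothesis scores invented purely for illustration:

```python
# Toy log-probabilities (invented for illustration): the acoustic model scores
# how well each hypothesis matches the signal A; the language model scores
# how plausible the word sequence is on its own.
hypotheses = {
    "recognize speech":   {"log_p_acoustic": -12.1, "log_p_lm": -4.2},
    "wreck a nice beach": {"log_p_acoustic": -11.8, "log_p_lm": -7.9},
}

def decode(hyps):
    """Bayes decision rule: W* = argmax_W P(A|W) * P(W), in log space."""
    return max(hyps, key=lambda w: hyps[w]["log_p_acoustic"] + hyps[w]["log_p_lm"])

print(decode(hypotheses))
```

Here the acoustically slightly better hypothesis loses because the language model strongly prefers the other word sequence, which is exactly the disambiguation role the overview assigns to language models.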

Book chapters on the topic "Automatic speech recognition – Statistical methods"

1. de Mori, Renato. "Statistical Methods for Automatic Speech Recognition." In Speech Processing, Recognition and Artificial Neural Networks, 165–89. London: Springer London, 1999. http://dx.doi.org/10.1007/978-1-4471-0845-0_7.
2. Juang, B. H., Wu Chou, and C. H. Lee. "Statistical and Discriminative Methods for Speech Recognition." In Speech Recognition and Coding, 41–55. Berlin, Heidelberg: Springer Berlin Heidelberg, 1995. http://dx.doi.org/10.1007/978-3-642-57745-1_4.
3. Pironkov, Gueorgui, Sean U. N. Wood, Stéphane Dupont, and Thierry Dutoit. "Investigating a Hybrid Learning Approach for Robust Automatic Speech Recognition." In Statistical Language and Speech Processing, 67–78. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-00810-9_7.
4. Mari, Jean-François, and René Schott. "Some Applications in Speech Recognition." In Probabilistic and Statistical Methods in Computer Science, 153–76. Boston, MA: Springer US, 2001. http://dx.doi.org/10.1007/978-1-4757-6280-8_4.
5. Fuhrmann, Ferdinand, Anna Maly, Christina Leitner, and Franz Graf. "Three Experiments on the Application of Automatic Speech Recognition in Industrial Environments." In Statistical Language and Speech Processing, 109–18. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-68456-7_9.
6. Juang, B. H., W. Chou, and C. H. Lee. "Statistical and Discriminative Methods for Speech Recognition." In The Kluwer International Series in Engineering and Computer Science, 109–32. Boston, MA: Springer US, 1996. http://dx.doi.org/10.1007/978-1-4613-1367-0_5.
7. Hodžić, Migdat, and Tarik Namas. "Automatic SAR Target Recognition and Pose Estimation. Part 2. Statistical Methods for Target Recognition." In Lecture Notes in Networks and Systems, 901–30. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-71321-2_77.
8. Coleti, Thiago Adriano, Marcelo Morandini, and Fátima de Lourdes dos Santos Nunes. "Analyzing Face and Speech Recognition to Create Automatic Information for Usability Evaluation." In Human-Computer Interaction. Human-Centred Design Approaches, Methods, Tools, and Environments, 184–92. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-39232-0_21.
9. Korenevsky, M. L., Yu N. Matveev, and A. V. Yakovlev. "Investigation and Development of Methods for Improving Robustness of Automatic Speech Recognition Algorithms in Complex Acoustic Environments." In Proceedings of the Scientific-Practical Conference "Research and Development - 2016", 11–20. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-62870-7_2.
10. Emiliani, U., P. Podini, and F. Sani. "Combined Application of Neural Network and Artificial Intelligence Methods to Automatic Speech Recognition in a Continuous Utterance." In Artificial Neural Nets and Genetic Algorithms, 269–74. Vienna: Springer Vienna, 1993. http://dx.doi.org/10.1007/978-3-7091-7533-0_40.

Conference papers on the topic "Automatic speech recognition – Statistical methods"

1. Drygajlo, Andrzej, Didier Meuwly, and Anil Alexander. "Statistical methods and Bayesian interpretation of evidence in forensic automatic speaker recognition." In 8th European Conference on Speech Communication and Technology (Eurospeech 2003). ISCA, 2003. http://dx.doi.org/10.21437/eurospeech.2003-297.
2. Xiao, Xiaoqiang, Jasha Droppo, and Alex Acero. "Information retrieval methods for automatic speech recognition." In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2010. http://dx.doi.org/10.1109/icassp.2010.5495229.
3. Kozierski, Piotr, Talar Sadalla, Szymon Drgas, and Adam Dabrowski. "Allophones in automatic whispery speech recognition." In 2016 21st International Conference on Methods and Models in Automation and Robotics (MMAR). IEEE, 2016. http://dx.doi.org/10.1109/mmar.2016.7575241.
4. Furui, Sadaoki. "Robust methods in automatic speech recognition and understanding." In 8th European Conference on Speech Communication and Technology (Eurospeech 2003). ISCA, 2003. http://dx.doi.org/10.21437/eurospeech.2003-575.
5. Lehning, Michael. "Statistical methods for the automatic labelling of German prosody." In 4th European Conference on Speech Communication and Technology (Eurospeech 1995). ISCA, 1995. http://dx.doi.org/10.21437/eurospeech.1995-500.
6. Prodeus, Arkadiy, and Kateryna Kukharicheva. "Training of automatic speech recognition system on noised speech." In 2016 4th International Conference on Methods and Systems of Navigation and Motion Control (MSNMC). IEEE, 2016. http://dx.doi.org/10.1109/msnmc.2016.7783147.
7. Schatzmann, Jost, Blaise Thomson, and Steve Young. "Error simulation for training statistical dialogue systems." In 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU). IEEE, 2007. http://dx.doi.org/10.1109/asru.2007.4430167.
8. Paliwal, Kuldip K., James G. Lyons, Stephen So, Anthony P. Stark, and Kamil K. Wojcicki. "Comparative evaluation of speech enhancement methods for robust automatic speech recognition." In 2010 4th International Conference on Signal Processing and Communication Systems (ICSPCS 2010). IEEE, 2010. http://dx.doi.org/10.1109/icspcs.2010.5709761.
9. Ney, H. "One decade of statistical machine translation: 1996-2005." In IEEE Workshop on Automatic Speech Recognition and Understanding, 2005. IEEE, 2005. http://dx.doi.org/10.1109/asru.2005.1566466.
10. Tsujii, Junichi. "Combining statistical models with symbolic grammar in parsing." In 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU). IEEE, 2007. http://dx.doi.org/10.1109/asru.2007.4430140.

Reports on the topic "Automatic speech recognition – Statistical methods"

1. Fatehifar, Mohsen, Josef Schlittenlacher, David Wong, and Kevin Munro. Applications of Automatic Speech Recognition and Text-to-Speech Models to Detect Hearing Loss: A Scoping Review Protocol. INPLASY - International Platform of Registered Systematic Review and Meta-analysis Protocols, January 2023. http://dx.doi.org/10.37766/inplasy2023.1.0029.

Abstract:
Review question / Objective: This scoping review aims to identify published methods that have used automatic speech recognition or text-to-speech technologies to detect hearing loss, and to report on their accuracy and limitations. Condition being studied: Hearing enables us to communicate with the surrounding world. According to reports by the World Health Organization, 1.5 billion people suffer from some degree of hearing loss, of whom 430 million require medical attention. It is estimated that by 2050, 1 in every 4 people will experience some form of hearing disability. Hearing loss can significantly impact people's ability to communicate and makes social interaction a challenge. In addition, it can result in anxiety, isolation, depression, hindrance of learning, and a decrease in general quality of life. Hearing assessment is usually done in hospitals and clinics with special equipment and trained staff. However, these services are not always available in less developed countries. Even in developed countries, such as the UK, access to these facilities can be a challenge in rural areas. Moreover, during a crisis like the Covid-19 pandemic, accessing the required healthcare can become dangerous and challenging even in large cities.