Theses on the topic "Speech recognition"
Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 theses for your research on the topic "Speech recognition".
Next to each source in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Chuchilina, L. M. and I. E. Yeskov. "Speech recognition". Thesis, Видавництво СумДУ, 2008. http://essuir.sumdu.edu.ua/handle/123456789/15995.
Alcaraz, Meseguer Noelia. "Speech Analysis for Automatic Speech Recognition". Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9092.
The classical front-end analysis in speech recognition is a spectral analysis that parametrizes the speech signal into feature vectors; the most popular set is the Mel-Frequency Cepstral Coefficients (MFCC). They are based on a standard power-spectrum estimate, which is first subjected to a log-based transform of the frequency axis (mel-frequency scale) and then decorrelated using a modified discrete cosine transform. Following a focused introduction to speech production, perception, and analysis, this paper presents a study of the implementation of a speech generative model, whereby speech is synthesized and recovered from its MFCC representation. The work was developed in two steps: first, the computation of the MFCC vectors from the source speech files using the HTK software; and second, the implementation of the generative model itself, which represents the conversion chain from HTK-generated MFCC vectors back to speech. To assess the quality of the speech coding into feature vectors and to evaluate the generative model, the spectral distance between the original speech signal and the one produced from the MFCC vectors was computed, using spectral models based on Linear Prediction Coding (LPC) analysis. During the implementation of the generative model, results were obtained on the reconstruction of the spectral representation and the quality of the synthesized speech.
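The MFCC pipeline this abstract describes (framing, power spectrum, mel-scale warping, log compression, DCT decorrelation) can be sketched as follows; the sample rate, frame sizes, and filter counts are illustrative defaults, not the HTK configuration used in the thesis:

```python
import numpy as np

def hz_to_mel(f):
    # Mel scale: mel = 2595 * log10(1 + f/700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Compute MFCC vectors from a mono signal (illustrative, not HTK-exact)."""
    # 1. Frame the signal and apply a Hamming window
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # 2. Power spectrum (rfft zero-pads frames to n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 3. Triangular filters equally spaced on the mel axis
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    # 4. Log filterbank energies, then decorrelate with a DCT-II
    logmel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T

ceps = mfcc(np.random.randn(16000))   # one second of noise at 16 kHz
```

Inverting this chain, as the thesis does, requires undoing the DCT and log and then re-imposing the lost phase and fine spectral structure, which is why the reconstruction quality is evaluated with LPC-based spectral distances.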
Kleinschmidt, Tristan Friedrich. "Robust speech recognition using speech enhancement". Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/31895/1/Tristan_Kleinschmidt_Thesis.pdf.
Eriksson, Mattias. "Speech recognition availability". Thesis, Linköping University, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2651.
This project investigates the importance of availability in the scope of dictation programs. Speech recognition technology for dictation has not reached the general public, and that may well be a result of poor availability in today's technical solutions.
I constructed a persona, Johanna, who personifies the target user. I also developed a solution that streams audio to a speech recognition server and sends back the interpreted text. Johanna affirmed that the solution was successful in theory.
I then recruited test users who tried the solution in practice. Half of them report that their usage has increased, and will continue to increase, thanks to the new level of availability.
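The client-server streaming design described above can be sketched with a toy length-prefixed protocol; everything here (the protocol, the fake recognizer standing in for a real recognition server, all names and ports) is invented for illustration:

```python
import socket
import threading

def run_fake_recognizer(srv):
    """Toy stand-in for the speech recognition server: reads length-prefixed
    audio chunks and replies with a placeholder transcript per chunk."""
    conn, _ = srv.accept()
    with conn:
        while True:
            header = conn.recv(4)
            if not header:
                break
            n = int.from_bytes(header, "big")
            chunk = b""
            while len(chunk) < n:
                chunk += conn.recv(n - len(chunk))
            conn.sendall(("text for %d bytes\n" % n).encode())

def stream_audio(chunks, host, port):
    """Client side: stream audio chunks, collect the interpreted text."""
    with socket.create_connection((host, port)) as cli:
        replies = []
        for chunk in chunks:
            cli.sendall(len(chunk).to_bytes(4, "big") + chunk)
            replies.append(cli.recv(1024).decode().strip())
    return replies

srv = socket.socket()
srv.bind(("127.0.0.1", 0))          # OS-assigned free port
srv.listen(1)
port = srv.getsockname()[1]
threading.Thread(target=run_fake_recognizer, args=(srv,), daemon=True).start()
replies = stream_audio([b"\x00" * 320, b"\x01" * 640], "127.0.0.1", port)
```

Because the client waits for each reply before sending the next chunk, the request-response pairs stay aligned; a production dictation client would instead pipeline audio and receive partial hypotheses asynchronously.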
Uebler, Ulla. "Multilingual speech recognition /". Berlin : Logos Verlag, 2000. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=009117880&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.
Wang, Yonglian. "Speech Recognition under Stress". Available to subscribers only, 2009. http://proquest.umi.com/pqdweb?did=1968468151&sid=9&Fmt=2&clientId=1509&RQT=309&VName=PQD.
Lucas, Adrian Edward. "Acoustic level speech recognition". Thesis, University of Surrey, 1991. http://epubs.surrey.ac.uk/2819/.
Žmolíková, Kateřina. "Far-Field Speech Recognition". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255331.
Sun, Felix (Felix W.). "Speech Representation Models for Speech Synthesis and Multimodal Speech Recognition". Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/106378.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 59-63).
The field of speech recognition has seen steady advances over the last two decades, leading to the accurate, real-time recognition systems available on mobile phones today. In this thesis, I apply speech modeling techniques developed for recognition to two other speech problems: speech synthesis and multimodal speech recognition with images. In both problems, there is a need to learn a relationship between speech sounds and another source of information. For speech synthesis, I show that using a neural network acoustic model results in a synthesizer that is more tolerant of noisy training data than previous work. For multimodal recognition, I show how information from images can be effectively integrated into the recognition search framework, resulting in improved accuracy when image data is available.
by Felix Sun.
M. Eng.
Miyajima, C., D. Negi, Y. Ninomiya, M. Sano, K. Mori, K. Itou, K. Takeda and Y. Suenaga. "Audio-Visual Speech Database for Bimodal Speech Recognition". INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2005. http://hdl.handle.net/2237/10460.
Itakura, Fumitada, Tetsuya Shinde, Kiyoshi Tatara, Taisuke Ito, Ikuya Yokoo, Shigeki Matsubara, Kazuya Takeda and Nobuo Kawaguchi. "CIAIR speech corpus for real world speech recognition". The oriental chapter of COCOSDA (The International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques), 2002. http://hdl.handle.net/2237/15462.
Wang, Peidong. "Robust Automatic Speech Recognition By Integrating Speech Separation". The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1619099401042668.
Al-Otaibi, Abdulhadi S. "Arabic speech processing : syllabic segmentation and speech recognition". Thesis, Aston University, 1988. http://publications.aston.ac.uk/8064/.
Tran, Thao and Nathalie Tkauc. "Face recognition and speech recognition for access control". Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-39776.
Dewey, John K. "Speech recognition of foreign accent". Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1994. http://handle.dtic.mil/100.2/ADA282979.
Stemmer, Georg. "Modeling variability in speech recognition /". Berlin : Logos-Verl, 2005. http://deposit.ddb.de/cgi-bin/dokserv?id=2659313&prov=M&dok_var=1&dok_ext=htm.
Mustafa, M. K. "On-device mobile speech recognition". Thesis, Nottingham Trent University, 2016. http://irep.ntu.ac.uk/id/eprint/28044/.
Haque, Serajul. "Perceptual features for speech recognition". University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0187.
Nilsson, Tobias. "Speech Recognition Software and Vidispine". Thesis, Umeå universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-71428.
Thompson, J. "Speech variability in speaker recognition". Thesis, Swansea University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.639230.
Leventis, Constantinos P. "Speech recognition application in C.I.C". Thesis, Monterey, California. Naval Postgraduate School, 1991. http://hdl.handle.net/10945/26786.
Milner, Benjamin Peter. "Speech recognition in adverse environments". Thesis, University of East Anglia, 1994. https://ueaeprints.uea.ac.uk/2907/.
Long, Christopher J. "Wavelet methods in speech recognition". Thesis, Loughborough University, 1999. https://dspace.lboro.ac.uk/2134/14108.
Stewart, Darryl William. "Syllable based continuous speech recognition". Thesis, Queen's University Belfast, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.325993.
Luettin, Juergen. "Visual speech and speaker recognition". Thesis, University of Sheffield, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.264432.
Jafri, Afshan. "Morphology-based Arabic speech recognition". Thesis, University of Essex, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.429298.
Texto completoSANTOS, DEBORA ANDREA DE OLIVEIRA. "SPEECH RECOGNITION IN NOISE ENVIRONMENT". PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2001. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=1987@1.
This work presents a comparative study of three techniques for improving speech recognition rates in adverse environments, namely: Cepstral Mean Normalization (CMN), Spectral Subtraction, and Maximum Likelihood Linear Regression (MLLR). They are implemented in two ways: separately and in pairs. The tests are carried out on a simple system: recognition of isolated words (digits from zero to nine, and the word "half"), speaker-dependent mode, continuous hidden Markov models, and speech feature vectors with twelve cepstral coefficients derived from linear predictive analysis. Three types of noise are considered (white Gaussian, voice babble, and factory noise) at nine different signal-to-noise ratios. Experimental results demonstrate that applying the robust-recognition techniques separately is generally worthwhile: across all signal-to-noise conditions, when the recognition accuracy is not improved, it matches the accuracy obtained when no robustness method is applied. Comparing the isolated and simultaneous applications of the techniques shows that the latter is not always more attractive than the former; this depends on the pair of techniques. The use of noisy models is also considered. Although it gives better results, it is not feasible in practical situations. Among the implemented techniques, MLLR gives results closest to those obtained with noisy models, followed by CMN and, last, by Spectral Subtraction. Although the latter two are beaten by the first in terms of recognition accuracy, their advantages are simplicity and generality. The use of simultaneous techniques reveals that the pair Spectral Subtraction and MLLR performs best, because it improves on the individual use of both methods; this does not happen with other combinations of techniques.
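Two of the three techniques compared in this abstract are simple enough to sketch directly. Below is a minimal illustration of Cepstral Mean Normalization and power-domain Spectral Subtraction; the over-subtraction factor and spectral floor are hypothetical tuning parameters, and MLLR, which re-estimates model means through a regression transform, is omitted for brevity:

```python
import numpy as np

def cepstral_mean_normalization(ceps):
    """Subtract each coefficient's per-utterance mean; stationary channel
    effects are additive in the cepstral domain, so this removes them."""
    return ceps - ceps.mean(axis=0, keepdims=True)

def spectral_subtraction(power, noise_power, alpha=1.0, floor=0.01):
    """Subtract a noise power estimate from each frame's power spectrum,
    flooring the result to limit negative energies and musical noise."""
    clean = power - alpha * noise_power
    return np.maximum(clean, floor * power)

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 12)) + 3.0   # cepstra with a channel offset
normed = cepstral_mean_normalization(feats)

frames = rng.random((100, 129)) + 0.5      # noisy power spectra
noise = np.full(129, 0.5)                  # noise estimate, e.g. from silence
cleaned = spectral_subtraction(frames, noise)
```

CMN's simplicity and model-independence are what the abstract credits it for; spectral subtraction likewise needs only a noise estimate, while MLLR requires adapting the recognizer's models.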
Ragni, Anton. "Discriminative models for speech recognition". Thesis, University of Cambridge, 2014. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.707926.
Melnikoff, Stephen Jonathan. "Speech recognition in programmable logic". Thesis, University of Birmingham, 2003. http://etheses.bham.ac.uk//id/eprint/16/.
Price, Michael Ph. D. (Michael R.). Massachusetts Institute of Technology. "Energy-scalable speech recognition circuits". Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/106090.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 135-141).
As people become more comfortable with speaking to machines, the applications of speech interfaces will diversify and include a wider range of devices, such as wearables, appliances, and robots. Automatic speech recognition (ASR) is a key component of these interfaces that is computationally intensive. This thesis shows how we designed special-purpose integrated circuits to bring local ASR capabilities to electronic devices with a small size and power footprint. This thesis adopts a holistic, system-driven approach to ASR hardware design. We identify external memory bandwidth as the main driver of system power consumption and select algorithms and architectures to minimize it. We evaluate three acoustic modeling approaches, Gaussian mixture models (GMMs), subspace GMMs (SGMMs), and deep neural networks (DNNs), and identify tradeoffs between memory bandwidth and recognition accuracy. DNNs offer the best tradeoffs for our application; we describe a SIMD DNN architecture using parameter quantization and sparse weight matrices to save bandwidth. We also present a hidden Markov model (HMM) search architecture using a weighted finite-state transducer (WFST) representation. Enhancements to the search architecture, including WFST compression and caching, predictive beam width control, and a word lattice, reduce memory bandwidth to 10 MB/s or less, despite having just 414 kB of on-chip SRAM. The resulting system runs in real time with accuracy comparable to a software recognizer using the same models. We provide infrastructure for deploying recognizers trained with open-source tools (Kaldi) on the hardware platform. We investigate voice activity detection (VAD) as a wake-up mechanism and conclude that an accurate and robust algorithm is necessary to minimize system power, even if it results in larger area and power for the VAD itself.
We design fixed-point digital implementations of three VAD algorithms and explore their performance on two synthetic tasks with SNRs from -5 to 30 dB. The best algorithm uses modulation frequency features with an NN classifier, requiring just 8.9 kB of parameters. Throughout this work we emphasize energy scalability, or the ability to save energy when high accuracy or complex models are not required. Our architecture exploits scalability from many sources: model hyperparameters, runtime parameters such as beam width, and voltage/frequency scaling. We demonstrate these concepts with results from five ASR tasks, with vocabularies ranging from 11 words to 145,000 words.
by Michael Price.
Ph. D.
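The parameter quantization mentioned in the abstract above, storing network weights at reduced precision to cut external memory bandwidth, can be illustrated with a uniform symmetric 8-bit scheme; this is a simplified sketch, not the quantizer actually implemented on the chip:

```python
import numpy as np

def quantize(w, bits=8):
    """Uniform symmetric quantization of a weight matrix to signed ints."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)   # map max|w| -> 127
    return np.round(w / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 64)).astype(np.float32) * 0.1
q, scale = quantize(w)
w_hat = dequantize(q, scale)
# 8-bit storage cuts weight traffic 4x versus float32,
# at a worst-case error of half a quantization step
```

Since each weight is fetched from external memory on every frame, halving or quartering its width reduces the bandwidth term the thesis identifies as the dominant power cost.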
Yoder, Benjamin W. (Benjamin Wesley) 1977. "Spontaneous speech recognition using HMMs". Thesis, Massachusetts Institute of Technology, 2001. http://hdl.handle.net/1721.1/36108.
Includes bibliographical references (leaf 63).
This thesis describes a speech recognition system that was built to support spontaneous speech understanding. The system is composed of (1) a front-end acoustic analyzer which computes Mel-frequency cepstral coefficients, (2) acoustic models of context-dependent phonemes (triphones), (3) a back-off bigram statistical language model, and (4) a beam search decoder based on the Viterbi algorithm. The context-dependent acoustic models resulted in 67.9% phoneme recognition accuracy on the standard TIMIT speech database. Spontaneous speech was collected using a "Wizard of Oz" simulation of a simple spatial manipulation game. Naive subjects were instructed to manipulate blocks on a computer screen in order to solve a series of geometric puzzles using only spoken commands. A hidden human operator performed actions in response to each spoken command. The speech from thirteen subjects formed the corpus for the speech recognition results reported here. Using a task-specific bigram statistical language model and context-dependent acoustic models, the system achieved a word recognition accuracy of 67.6%. The recognizer operated using a vocabulary of 523 words. The recognition task had a word perplexity of 36.
by Benjamin W. Yoder.
M.Eng.
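The Viterbi decoding at the core of the system described above finds the most likely HMM state sequence from per-frame emission log-likelihoods; the following is a minimal exact-Viterbi sketch in the log domain, without the beam pruning the thesis's decoder uses:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most likely HMM state path. log_pi: (S,) initial log-probs;
    log_A: (S, S) transition log-probs (row = previous state);
    log_B: (T, S) per-frame emission log-likelihoods."""
    T, S = log_B.shape
    delta = log_pi + log_B[0]             # best score ending in each state
    back = np.zeros((T, S), dtype=int)    # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A   # (prev, next) pairs
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(S)] + log_B[t]
    # Trace the best path backwards from the best final state
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 2-state example: emissions favour state 0, then a switch to state 1
log_pi = np.array([0.0, -1e9])
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_B = np.array([[0.0, -5.0], [0.0, -5.0], [-5.0, 0.0]])
print(viterbi(log_pi, log_A, log_B))   # [0, 0, 1]
```

A beam search keeps only hypotheses whose score is within a fixed margin of the best one at each frame, trading exactness for tractability at large vocabularies like the 523-word task here.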
Higgins, Irina. "Computational neuroscience of speech recognition". Thesis, University of Oxford, 2015. https://ora.ox.ac.uk/objects/uuid:daa8d096-6534-4174-b63e-cc4161291c90.
Gabriel, Naveen. "Automatic Speech Recognition in Somali". Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166216.
McDermott, Erik. "Discriminative training for speech recognition /". Electronic version of summary, 1997. http://www.wul.waseda.ac.jp/gakui/gaiyo/2460.pdf.
Klautau, Aldebaro. "Speech recognition using discriminative classifiers /". Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2003. http://wwwlib.umi.com/cr/ucsd/fullcit?p3091208.
Al-Shareef, Sarah. "Conversational Arabic Automatic Speech Recognition". Thesis, University of Sheffield, 2015. http://etheses.whiterose.ac.uk/10145/.
Thambiratnam, David P. "Speech recognition in adverse environments". Thesis, Queensland University of Technology, 1999. https://eprints.qut.edu.au/36099/1/36099_Thambiratnam_1999.pdf.
Jalalvand, Shahab. "Automatic Speech Recognition Quality Estimation". Doctoral thesis, Università degli studi di Trento, 2017. https://hdl.handle.net/11572/368743.
Jalalvand, Shahab. "Automatic Speech Recognition Quality Estimation". Doctoral thesis, University of Trento, 2017. http://eprints-phd.biblio.unitn.it/2058/1/PhD_Thesis.pdf.
Chua, W. W. "Speech recognition predictability of a Cantonese speech intelligibility index". Click to view the E-thesis via HKUTO, 2004. http://sunzi.lib.hku.hk/hkuto/record/B30509737.
Evans, N. W. D. "Spectral subtraction for speech enhancement and automatic speech recognition". Thesis, Swansea University, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.636935.
Chua, W. W. and 蔡蕙慧. "Speech recognition predictability of a Cantonese speech intelligibility index". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30509737.
Jett, Brandi. "The role of coarticulation in speech-on-speech recognition". Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case1554498179209764.
Isaacs, Dale. "A comparison of the network speech recognition and distributed speech recognition systems and their effect on speech enabling mobile devices". Master's thesis, University of Cape Town, 2010. http://hdl.handle.net/11427/11232.
Over the past 10 years there has been an exponential increase in the number of mobile subscribers worldwide. Market research has shown that the number of mobile subscribers rose to 4.3 billion toward the end of Q1 2009. The unprecedented development of the telecommunication industry over the last decade has brought about the need for ubiquitous access to a host of different information resources and services. Today, speech remains the best medium of communication between people, and it is conceivable that speech-enabling mobile devices will allow users who only have mobile devices to access all the information now available over the World Wide Web.
Schramm, Hauke. "Modeling spontaneous speech variability for large vocabulary continuous speech recognition". [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=97968479X.
Lebart, Katia. "Speech dereverberation applied to automatic speech recognition and hearing aids". Thesis, University of Sussex, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.285064.
Mwanyoha, Sadiki Pili 1974. "A speech recognition module for speech-to-text language translation". Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/9862.
Includes bibliographical references (leaves 47-48).
by Sadiki Pili Mwanyoha.
S.B. and M.Eng.
LEBART, KATIA. "Speech dereverberation applied to automatic speech recognition and hearing aids". Rennes 1, 1999. http://www.theses.fr/1999REN10033.
Söderberg, Hampus. "Engaging Speech UI's - How to address a speech recognition interface". Thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20591.
Johnston, Samuel John Charles and Samuel John Charles Johnston. "An Approach to Automatic and Human Speech Recognition Using Ear-Recorded Speech". Diss., The University of Arizona, 2017. http://hdl.handle.net/10150/625626.