Dissertations / Theses on the topic 'Speech'
Consult the top 50 dissertations / theses for your research on the topic 'Speech.'
Sun, Felix (Felix W.). "Speech Representation Models for Speech Synthesis and Multimodal Speech Recognition." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/106378.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 59-63).
The field of speech recognition has seen steady advances over the last two decades, leading to the accurate, real-time recognition systems available on mobile phones today. In this thesis, I apply speech modeling techniques developed for recognition to two other speech problems: speech synthesis and multimodal speech recognition with images. In both problems, there is a need to learn a relationship between speech sounds and another source of information. For speech synthesis, I show that using a neural network acoustic model results in a synthesizer that is more tolerant of noisy training data than previous work. For multimodal recognition, I show how information from images can be effectively integrated into the recognition search framework, resulting in improved accuracy when image data is available.
by Felix Sun.
M. Eng.
Alcaraz, Meseguer Noelia. "Speech Analysis for Automatic Speech Recognition." Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9092.
The classical front-end analysis in speech recognition is a spectral analysis which parametrizes the speech signal into feature vectors; the most popular set is the Mel Frequency Cepstral Coefficients (MFCC). They are based on a standard power spectrum estimate which is first subjected to a log-based transform of the frequency axis (mel-frequency scale), and then decorrelated using a modified discrete cosine transform. Following a focused introduction on speech production, perception and analysis, this paper studies the implementation of a speech generative model, whereby the speech is synthesized and recovered back from its MFCC representations. The work has been developed in two steps: first, the computation of the MFCC vectors from the source speech files using HTK software; and second, the implementation of the generative model itself, which represents the conversion chain from HTK-generated MFCC vectors to speech reconstruction. To assess the quality of the speech coding into feature vectors and to evaluate the generative model, the spectral distance between the original speech signal and the one produced from the MFCC vectors has been computed, using spectral models based on Linear Prediction Coding (LPC) analysis. During the implementation of the generative model, results have been obtained in terms of the reconstruction of the spectral representation and the quality of the synthesized speech.
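The MFCC analysis chain this abstract describes (framing, windowing, power spectrum, mel filterbank, log compression, DCT decorrelation) can be sketched in a few lines of NumPy. This is a generic illustration, not the HTK implementation the thesis used; all parameter values (25 ms frames, 26 filters, 13 coefficients) are common defaults rather than the thesis's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, n_filters=26, n_ceps=13):
    """Toy MFCC extractor: frame -> window -> power spectrum ->
    mel filterbank -> log -> DCT-II."""
    # Frame into 25 ms windows with a 10 ms hop, apply Hamming window
    frame_len, hop = int(0.025 * sr), int(0.010 * sr)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum estimate per frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filters equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, then DCT-II to decorrelate
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1)
                 / (2 * n_filters))
    return log_energy @ dct.T   # shape: (n_frames, n_ceps)
```

One second of 16 kHz audio yields 98 frames of 13 coefficients each; a real front end would typically add energy, delta and delta-delta features on top.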
Kleinschmidt, Tristan Friedrich. "Robust speech recognition using speech enhancement." Thesis, Queensland University of Technology, 2010. https://eprints.qut.edu.au/31895/1/Tristan_Kleinschmidt_Thesis.pdf.
Blank, Sarah Catrin. "Speech comprehension, speech production and recovery of propositional speech following aphasic stroke." Thesis, Imperial College London, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.407772.
Price, Moneca C. "Interactions between speech coders and disordered speech." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ28640.pdf.
Chong, Fong Loong. "Objective speech quality measurement for Chinese speech." Thesis, University of Canterbury. Computer Science and Software Engineering, 2005. http://hdl.handle.net/10092/9607.
Stedmon, Alexander Winstan. "Putting speech in, taking speech out : human factors in the use of speech interfaces." Thesis, University of Nottingham, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.420342.
Miyajima, C., D. Negi, Y. Ninomiya, M. Sano, K. Mori, K. Itou, K. Takeda, and Y. Suenaga. "Audio-Visual Speech Database for Bimodal Speech Recognition." INTELLIGENT MEDIA INTEGRATION NAGOYA UNIVERSITY / COE, 2005. http://hdl.handle.net/2237/10460.
Tang, Lihong. "Nonsensical speech : speech acts in postsocialist Chinese culture /." Thesis, Connect to this title online; UW restricted, 2008. http://hdl.handle.net/1773/6662.
Itakura, Fumitada, Tetsuya Shinde, Kiyoshi Tatara, Taisuke Ito, Ikuya Yokoo, Shigeki Matsubara, Kazuya Takeda, and Nobuo Kawaguchi. "CIAIR speech corpus for real world speech recognition." The oriental chapter of COCOSDA (The International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques), 2002. http://hdl.handle.net/2237/15462.
Wang, Peidong. "Robust Automatic Speech Recognition By Integrating Speech Separation." The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1619099401042668.
Limbu, Sireesh Haang. "Direct Speech to Speech Translation Using Machine Learning." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-439141.
Hu, Ke. "Speech Segregation in Background Noise and Competing Speech." The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339018952.
Al-Otaibi, Abdulhadi S. "Arabic speech processing : syllabic segmentation and speech recognition." Thesis, Aston University, 1988. http://publications.aston.ac.uk/8064/.
Smith, Peter Wilfred Hesling. "Speech act theory, discourse structure and indirect speech." Thesis, University of Leeds, 1991. http://etheses.whiterose.ac.uk/734/.
Tran, Viet Anh. "Silent communication : whispered speech-to-clear speech conversion." Grenoble INPG, 2010. http://www.theses.fr/2010INPG0006.
Full textIn recent years, advances in wireless communication technology have led to the widespread use of cellular phones. Because of noisy environmental conditions and competing surrounding conversations, users tend to speak loudly. As a consequence, private policies and public legislation tend to restrain the use of cellular phone in public places. Silent speech which can only be heard by a limited set of listeners close to the speaker is an attractive solution to this problem if it can effectively be used for quiet and private communication. The motivation of this research thesis was to investigate ways of improving the naturalness and the intelligibility of synthetic speech obtained from the conversion of silent or whispered speech. A Non-audible murmur (NAM) condenser microphone, together with signal-based Gaussian Mixture Model (GMM) mapping, were chosen because promising results were already obtained with this sensor and this approach, and because the size of the NAM sensor is well adapted to mobile communication technology. Several improvements to the speech conversion obtained with this sensor were considered. A first set of improvement concerns characteristics of the voiced source. One of the features missing in whispered or silent speech with respect to loud or modal speech is F0, which is crucial in conveying linguistic (question vs. Statement, syntactic grouping, etc. ) as well as paralinguistic (attitudes, emotions) information. The proposed estimation of voicing and F0 for converted speech by separate predictors improves both predictions. The naturalness of the converted speech was then further improved by extending the context window of the input feature from phoneme size to syllable size and using a Linear Discriminant Analysis (LDA) instead of a Principal Component Analysis (PCA) for the dimension reduction of input feature vector. 
The positive influence of this new approach on the quality of the converted speech was confirmed by perceptual tests. Another approach investigated in this thesis consisted in integrating visual information as a complement to the acoustic information in both input and output data. Lip movements, which significantly contribute to the intelligibility of visual speech in face-to-face human interaction, were captured with an accurate lip motion capture system using the 3D positions of coloured beads glued on the speaker's face. The visual parameters are represented by 5 components related to the rotation of the jaw, lip rounding, upper- and lower-lip vertical movements, and movements of the throat associated with the underlying movements of the larynx and hyoid bone. Including these visual features in the input data significantly improved the quality of the converted speech, in terms of F0 and spectral features. In addition, the audio output was replaced by an audio-visual output. Subjective perceptual tests confirmed that including the visual modality in the input data, the output data, or both improves the intelligibility of the whispered speech conversion. Finally, we investigated a technique using a phonetic pivot, combining Hidden Markov Model (HMM)-based speech recognition and HMM-based speech synthesis to convert whispered speech to audible speech, in order to compare the performance of the two state-of-the-art approaches. Audiovisual features were used in the input data and audiovisual speech was produced as output. The objective performance of the HMM-based system was inferior to that of the direct signal-to-signal system based on a GMM. A few interpretations of this result are proposed, together with future lines of research.
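The "direct signal-to-signal" baseline this abstract refers to, a joint-density GMM that maps source (e.g. whispered/NAM) spectral features to target (audible) features via a minimum-mean-square-error estimate, can be sketched as follows. This is a generic illustration of the technique, not the thesis's code; the class name, dimensions, and data are invented for the example.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

class GMMMapper:
    """Joint-density GMM mapping: fit p(x, y) on paired source/target
    feature vectors, then convert new x by the posterior-weighted
    conditional means E[y | x, m] (the MMSE estimate)."""

    def __init__(self, n_components=4):
        self.gmm = GaussianMixture(n_components=n_components,
                                   covariance_type="full", random_state=0)

    def fit(self, X, Y):
        self.dx = X.shape[1]
        self.gmm.fit(np.hstack([X, Y]))      # model the joint density
        return self

    def convert(self, X):
        dx, means, covs = self.dx, self.gmm.means_, self.gmm.covariances_
        # Posterior p(m | x) under the marginal GMM over the source part
        post = np.stack([w * multivariate_normal.pdf(X, means[m, :dx],
                                                     covs[m, :dx, :dx])
                         for m, w in enumerate(self.gmm.weights_)], axis=1)
        post /= post.sum(axis=1, keepdims=True)
        # MMSE estimate: weighted sum of per-component linear regressions
        out = np.zeros((len(X), means.shape[1] - dx))
        for m in range(len(means)):
            A = covs[m, dx:, :dx] @ np.linalg.inv(covs[m, :dx, :dx])
            out += post[:, [m]] * (means[m, dx:] + (X - means[m, :dx]) @ A.T)
        return out
```

Each mixture component contributes a local linear regression from source to target space; the posterior weighting blends them smoothly, which is what makes this mapping more flexible than a single global linear transform.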
Chuchilina, L. M., and I. E. Yeskov. "Speech recognition." Thesis, Видавництво СумДУ, 2008. http://essuir.sumdu.edu.ua/handle/123456789/15995.
Windchy, Eli. "Keynote Speech." Digital Commons @ East Tennessee State University, 2018. https://dc.etsu.edu/dcseug/2018/schedule/9.
Chua, W. W. "Speech recognition predictability of a Cantonese speech intelligibility index." Click to view the E-thesis via HKUTO, 2004. http://sunzi.lib.hku.hk/hkuto/record/B30509737.
Overton, Katherine. "Perceptual Differences in Natural Speech and Personalized Synthetic Speech." Scholar Commons, 2017. http://scholarcommons.usf.edu/etd/6921.
Mailend, Marja-Liisa, and Marja-Liisa Mailend. "Speech Motor Planning in Apraxia of Speech and Aphasia." Diss., The University of Arizona, 2017. http://hdl.handle.net/10150/625882.
Mak, Cheuk-yan Charin. "Effects of speech and noise on Cantonese speech intelligibility." Click to view the E-thesis via HKUTO, 2006. http://sunzi.lib.hku.hk/hkuto/record/B37989790.
Evans, N. W. D. "Spectral subtraction for speech enhancement and automatic speech recognition." Thesis, Swansea University, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.636935.
Chua, W. W., and 蔡蕙慧. "Speech recognition predictability of a Cantonese speech intelligibility index." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30509737.
Mak, Cheuk-yan Charin, and 麥芍欣. "Effects of speech and noise on Cantonese speech intelligibility." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B37989790.
Le, Cornu Thomas. "Reconstruction of intelligible audio speech from visual speech information." Thesis, University of East Anglia, 2016. https://ueaeprints.uea.ac.uk/67012/.
Jett, Brandi. "The role of coarticulation in speech-on-speech recognition." Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case1554498179209764.
Bi, Ning. "Speech conversion and its application to alaryngeal speech enhancement." Diss., The University of Arizona, 1995. http://hdl.handle.net/10150/187290.
Gordon, Jane S. "Use of synthetic speech in tests of speech discrimination." PDXScholar, 1985. https://pdxscholar.library.pdx.edu/open_access_etds/3443.
Mukherjee, Sankar. "Sensorimotor processes in speech listening and speech-based interaction." Doctoral thesis, Università degli studi di Genova, 2019. http://hdl.handle.net/11567/941827.
Kong, Jessica Lynn. "The Effect Of Mean Fundamental Frequency Normalization Of Masker Speech For A Speech-In-Speech Recognition Task." Case Western Reserve University School of Graduate Studies / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=case1588949121900459.
Schramm, Hauke. "Modeling spontaneous speech variability for large vocabulary continuous speech recognition." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=97968479X.
Lidstone, Jane Stephanie May. "Private speech and inner speech in typical and atypical development." Thesis, Durham University, 2010. http://etheses.dur.ac.uk/526/.
Howard, John Graham. "Temporal aspects of auditory-visual speech and non-speech perception." Thesis, University of Reading, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.553127.
Simm, William Alexander. "Dysarthric speech measures for use in evidence-based speech therapy." Thesis, Lancaster University, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.531724.
Lebart, Katia. "Speech dereverberation applied to automatic speech recognition and hearing aids." Thesis, University of Sussex, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.285064.
Alghamdi, Najwa. "Visual speech enhancement and its application in speech perception training." Thesis, University of Sheffield, 2017. http://etheses.whiterose.ac.uk/19667/.
Mwanyoha, Sadiki Pili 1974. "A speech recognition module for speech-to-text language translation." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/9862.
Full textIncludes bibliographical references (leaves 47-48).
by Sadiki Pili Mwanyoha.
S.B. and M.Eng.
Moers-Prinz, Donata [Verfasser]. "Fast Speech in Unit Selection Speech Synthesis / Donata Moers-Prinz." Bielefeld : Universitätsbibliothek Bielefeld, 2020. http://d-nb.info/1219215201/34.
Lebart, Katia. "Speech dereverberation applied to automatic speech recognition and hearing aids." Rennes 1, 1999. http://www.theses.fr/1999REN10033.
Söderberg, Hampus. "Engaging Speech UI's - How to address a speech recognition interface." Thesis, Malmö högskola, Fakulteten för teknik och samhälle (TS), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20591.
Shuster, Linda Irene. "Speech perception and speech production : between and within modal adaptation /." The Ohio State University, 1986. http://rave.ohiolink.edu/etdc/view?acc_num=osu148726754698296.
Kim, Hyo-Jong. "Stephen's speech missiological implications of Stephen's speech in Luke-Acts /." Online full text .pdf document, available to Fuller patrons only, 1999. http://www.tren.com.
Vescovi, Federico <1993>. "Understanding Speech Acts: Towards the Automated Detection of Speech Acts." Master's Degree Thesis, Università Ca' Foscari Venezia, 2019. http://hdl.handle.net/10579/15644.
Eriksson, Mattias. "Speech recognition availability." Thesis, Linköping University, Department of Computer and Information Science, 2004. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2651.
Full textThis project investigates the importance of availability in the scope of dictation programs. Using speech recognition technology for dictating has not reached the public, and that may very well be a result of poor availability in today’s technical solutions.
I have constructed a persona character, Johanna, who personalizes the target user. I have also developed a solution that streams audio into a speech recognition server and sends back interpreted text. Johanna affirmed that the solution was successful in theory.
I then recruited test users who tried out the solution in practice. Half of them report that their usage has increased, and will continue to increase, thanks to the new level of availability.
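The architecture this abstract describes, streaming audio chunks to a speech recognition server and receiving interpreted text back, might be sketched roughly as below. The wire format (length-prefixed chunks with a zero-length end marker), the function names, and the toy loopback server are all invented for illustration; the thesis's actual server and protocol are not specified here.

```python
import socket
import threading

def _recv_exact(conn, n):
    """Read exactly n bytes from a socket (recv may return fewer)."""
    buf = b""
    while len(buf) < n:
        part = conn.recv(n - len(buf))
        if not part:
            break
        buf += part
    return buf

def toy_recognition_server():
    """Stand-in for a recognition server: accepts one client, reads
    length-prefixed audio chunks until a zero-length marker, then
    replies with a dummy 'transcript'. Returns the bound port."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)

    def handle():
        conn, _ = srv.accept()
        with conn:
            total = 0
            while True:
                n = int.from_bytes(_recv_exact(conn, 4), "big")
                if n == 0:
                    break
                total += len(_recv_exact(conn, n))
            conn.sendall(f"transcript for {total} audio bytes".encode())
        srv.close()

    threading.Thread(target=handle, daemon=True).start()
    return srv.getsockname()[1]

def stream_and_transcribe(audio, port, chunk_size=1024):
    """Stream audio bytes to the server in chunks; return its reply."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        for i in range(0, len(audio), chunk_size):
            chunk = audio[i:i + chunk_size]
            s.sendall(len(chunk).to_bytes(4, "big") + chunk)
        s.sendall((0).to_bytes(4, "big"))  # end-of-stream marker
        reply = b""
        while True:
            part = s.recv(4096)
            if not part:
                break
            reply += part
        return reply.decode()
```

Chunked streaming is what gives the availability the abstract is after: the client can start sending while the user is still speaking, rather than recording the whole utterance first.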
Øygarden, Jon. "Norwegian Speech Audiometry." Doctoral thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for språk- og kommunikasjonsstudier, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-5409.
Nilsson, Mattias. "Entropy and Speech." Doctoral thesis, Stockholm : Sound and Image Processing Laboratory, School of Electrical Engineering, Royal Institute of Technology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3990.
Janardhanan, Deepa. "Wideband speech enhancement." Aachen Shaker, 2008. http://d-nb.info/989298310/04.
Donovan, R. E. "Trainable speech synthesis." Thesis, University of Cambridge, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.598598.
Oliver, Richard George. "Malocclusion and speech." Thesis, Cardiff University, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.390247.