Dissertations / Theses on the topic 'Signal processing; Voice recognition'
Consult the top 50 dissertations / theses for your research on the topic 'Signal processing; Voice recognition.'
Nayfeh, Taysir H. "Multi-signal processing for voice recognition in noisy environments." Thesis, This resource online, 1991. http://scholar.lib.vt.edu/theses/available/etd-10222009-125021/.
Fredrickson, Steven Eric. "Neural networks for speaker identification." Thesis, University of Oxford, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294364.
Little, M. A. "Biomechanically informed nonlinear speech signal processing." Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:6f5b84fb-ab0b-42e1-9ac2-5f6acc9c5b80.
Regnier, Lise. "Localization, Characterization and Recognition of Singing Voices." PhD thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00687475.
Adami, Andre Gustavo. "Sistema de reconhecimento de locutor utilizando redes neurais artificiais." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 1997. http://hdl.handle.net/10183/18277.
This work applies recent technologies from the research domain of Intelligent Computing (IC), together with traditional Digital Signal Processing (DSP), to a specific voice processing application: speaker recognition. Many security control applications can be supported by speaker recognition technology, both for identification and for verification of speakers. The speaker recognition process can be divided into two main phases: extraction of basic characteristics from the voice signal, and classification. In the extraction phase, one goal was to apply recent advances in DSP theory to the problem. The fundamental frequency and the formant frequencies were used as parameters to identify the speaker. The fundamental frequency was obtained through autocorrelation, and the formant frequencies were obtained through the Fourier transform. These parameters were extracted from the portion of speech where the vocal tract presents a coarticulation between two voiced sounds, in order to capture the characteristics of the vocal apparatus as it changes. The Multi-Layer Perceptron (MLP) ANN architecture was investigated in conjunction with the backpropagation learning algorithm, using the main characteristics extracted from the voice signal as input parameters. The output of the MLP, previously trained with the speakers' features, indicates the authenticity of the signal. Tests were performed with 10 different male speakers aged 18 to 24. The results are very promising. This work also presents an approach to implementing a speaker recognition system that applies conventional methods to the speaker classification process: Dynamic Time Warping (DTW) and Vector Quantization (VQ).
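The abstract above mentions obtaining the fundamental frequency through autocorrelation. A minimal Python sketch of that general idea follows. It is not the thesis's implementation: the function name `estimate_f0`, the 60 to 400 Hz search range, and the synthetic 200 Hz test tone are all assumptions made for illustration.

```python
import math

def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Picks the lag with the largest autocorrelation inside the
    lag range implied by the (assumed) f_min..f_max search band.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    # Remove the DC offset so it does not dominate the correlation.
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation at this lag.
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return sample_rate / best_lag

# Synthetic stand-in for a voiced frame: a pure 200 Hz tone at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
f0 = estimate_f0(frame, sample_rate)
```

Real speech is far less clean than a pure tone, so practical systems usually add windowing, normalisation, and a voicing decision before trusting the peak.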
Stolfi, Rumiko Oishi. "Sintese e reconhecimento da fala humana." [s.n.], 2006. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276267.
Dissertation (professional master's) - Universidade Estadual de Campinas, Instituto de Computação
Abstract: The goal of this dissertation is to review the main concepts and methods involved in the synthesis, processing, and recognition of human speech by computer. These technologies have many applications, which have increased substantially in recent years with the spread of portable communication equipment (mobile phones, laptops, palmtops) and universal access to the Internet. The first part of this work is a review of fundamental concepts of signal processing, including the Fourier transform, power spectrum and spectrogram, filters, signal digitization, and Nyquist's theorem. The second part describes the main characteristics of human speech, the mechanisms involved in its production and perception, and the concept of the phone (linguistic unit of sound). In this part we also briefly describe the main techniques used for orthographic-phonetic conversion, for speech synthesis from a phonetic description, and for the recognition of natural speech. The third part describes a practical project we developed to consolidate the knowledge acquired in our Master's studies: a program that generates Japanese popular songs from a textual description of the lyrics and music, using the concatenative synthesis method. At the end of this dissertation, we also list some available software products (free and commercial) for speech synthesis and speech recognition.
Master's
Computer Engineering
Master of Computer Science
Clotworthy, Christopher John. "A study of automated voice recognition." Thesis, Queen's University Belfast, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.356909.
Wells, Ian. "Digital signal processing architectures for speech recognition." Thesis, University of the West of England, Bristol, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294705.
Aggoun, Amar. "DPCM video signal/image processing." Thesis, University of Nottingham, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.335792.
Morris, Robert W. "Enhancement and recognition of whispered speech." Diss., Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-04082004-180338/unrestricted/morris%5frobert%5fw%5f200312%5fphd.pdf.
Rex, James Alexander. "Microphone signal processing for speech recognition in cars." Thesis, University of Southampton, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.326728.
Shah, Afnan Arafat. "Improving automatic speech recognition transcription through signal processing." Thesis, University of Southampton, 2017. https://eprints.soton.ac.uk/418970/.
Doukas, Nikolaos. "Voice activity detection using energy based measures and source separation." Thesis, Imperial College London, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245220.
Oddiraju, Swetha. "Improving performance for adaptive filtering with voice applications." Diss., Columbia, Mo. : University of Missouri-Columbia, 2007. http://hdl.handle.net/10355/6271.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on September 29, 2008). Includes bibliographical references.
Hanna, Salim Alia. "Digital signal processing algorithms for speech coding and recognition." Thesis, Imperial College London, 1987. http://hdl.handle.net/10044/1/46268.
YOUSSIF, ROSHDY S. "HYBRID INTELLIGENT SYSTEMS FOR PATTERN RECOGNITION AND SIGNAL PROCESSING." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1085714219.
Nosa, Ogbewi. "Signal Processing and pattern recognition algorithm for monitoring Parkinson’s disease." Thesis, Högskolan Dalarna, Datateknik, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:du-2376.
Wilson, Shawn C. "Voice recognition systems : assessment of implementation aboard U.S. naval ships." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Mar%5FWilson.pdf.
Thesis advisor(s): Michael T. McMaster, Kenneth J. Hagan. Includes bibliographical references (p. 47-49). Also available online.
Wu, Ping. "Kohonen self-organising neural networks in speech signal processing." Thesis, University of Reading, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.386985.
Johnson, Joanna. "The effectiveness of voice recognition technology as used by persons with disabilities." Online version, 1998. http://www.uwstout.edu/lib/thesis/1998/1998johnsonj.pdf.
Watkins, L. R. "Optical fibre communications : signal processing to accommodate system impairments." Thesis, Bangor University, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.279143.
Smith, Philip F. "Surface evaluation by the signal processing of ultrasonic pulses." Thesis, University of Aberdeen, 1990. http://digitool.abdn.ac.uk/R?func=search-advanced-go&find_code1=WSN&request1=AAIU024863.
Wang, Yuanxun. "Radar signature prediction and feature extraction using advanced signal processing techniques /." Digital version accessible at:, 1999. http://wwwlib.umi.com/cr/utexas/main.
SANTOS, JÚNIOR Gutemberg Gonçalves dos. "Redução de ruído para sistemas de reconhecimento de voz utilizando subespaços vetoriais." Universidade Federal de Campina Grande, 2009. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/1508.
The establishment of a speech-based communication interface between humans and computers has been pursued since the beginning of the computer era. Several advances over the last six decades have made commercial use of speech recognition applications possible. However, factors such as noise, reverberation, and distortion degrade the performance of these systems, reducing their recognition rate in adverse environments. The study of techniques that reduce the impact of these problems is therefore of great value and has gained prominence in recent decades. The work presented in this dissertation aims to reduce the problems caused by the noise typical of automotive environments, making the speech recognition systems used there more robust. Non-critical features of a car, that is, features that do not put the user's life at risk, such as music players and air conditioning, can then be controlled through voice commands. The proposed system is based on a speech signal preprocessing step using the signal subspace method. The performance of this method is directly related to the dimensions (rows × columns) of the matrices that represent the input signal. With this in mind, the ULLV decomposition, although it only approximates the signal subspace method, was used because it has lower computational complexity than traditional methods based on the SVD. The open-source speech recognizer Julius, which offers high performance, was chosen for the case study. A noisy speech database with 44800 samples was generated to model the automotive environment. Finally, the robustness of the system was evaluated and compared with a traditional noise reduction method, spectral subtraction.
Osanlou, Ardeshir. "Soft computing and fractal geometry in signal processing and pattern recognition." Thesis, De Montfort University, 2000. http://hdl.handle.net/2086/4242.
Elvira, Jose M. "Neural networks for speech and speaker recognition." Thesis, Staffordshire University, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.262314.
El, Malki Karim. "A novel approach to high quality voice using echo cancellation and silence detection." Thesis, University of Sheffield, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.286579.
Hartley, David Andrew. "Image correlation using digital signal processors." Thesis, Liverpool John Moores University, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.304465.
Jalalinajafabadi, Farideh. "Computerised GRBAS assessement of voice quality." Thesis, University of Manchester, 2016. https://www.research.manchester.ac.uk/portal/en/theses/computerised-grbas-assessement-of-voice-quality(7efd3263-b109-4137-87cf-b9559c61730b).html.
DeVilliers, Edward Michael. "Implementing voice recognition and natural language processing in the NPSNET networked virtual environment." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1996. http://handle.dtic.mil/100.2/ADA320340.
Thesis advisor(s): Nelson D. Ludlow, John S. Falby. "September 1996." Includes bibliographical references (p. 171-175). Also available online.
Calitz, Wietsche Roets. "Independent formant and pitch control applied to singing voice." Thesis, Stellenbosch : University of Stellenbosch, 2004. http://hdl.handle.net/10019.1/16267.
A singing voice can be manipulated artificially by means of a digital computer to create new melodies or to correct existing ones. When the fundamental frequency of an audio signal that represents a human voice is changed by simple algorithms, the formants of the voice tend to move to new frequency locations, making it sound unnatural. The main purpose of this work is to design a technique by which the pitch and formants of a singing voice can be controlled independently.
Yiu, Siu Fung. "Recursive state-space approach to Ground Probing Radar signal processing." Thesis, Lancaster University, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.278379.
Zhu, Yong. "Digital signal and image processing techniques for ultrasonic nondestructive evaluation." Thesis, City University London, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.336431.
Ansourian, Megeurditch N. "Digital signal processing for the analysis of fetal breathing movements." Thesis, University of Edinburgh, 1989. http://hdl.handle.net/1842/13595.
Chan, Arthur Yu Chung. "Robust speech recognition against unknown short-time noise /." View Abstract or Full-Text, 2002. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202002%20CHAN.
Includes bibliographical references (leaves 119-125). Also available in electronic version. Access restricted to campus users.
Smith, Quentin D. "Multichannel Digital Signal Processor Based Red/Black Keyset." International Foundation for Telemetering, 1992. http://hdl.handle.net/10150/611927.
This paper addresses a method to provide both secure and non-secure voice communications to a DS-1 network from a common keyset. In order to comply with both the electrical isolation requirements and the operational security issues regarding voice communications, an all-digital approach to the keyset was developed based upon the AD2101 DSP. Protocols that are handled by the keyset include: Multiple PTT modes, hot mike, telephone access, priority override, direct access, indirect access, paging, and monitor only. Special features that are addressed include: independent channel by channel assignment of access protocols, headset assignment, speaker assignment, and PTT assignment. Multiple microprocessors are used to implement the foregoing as well as down-loadable configurations, remote keyset control and monitoring, and composite audio outputs. Partitioning of the digital design provides RED to BLACK channel isolation and RED channel to AC power isolation of greater than 107 dB.
Nylén, Helmer. "Detecting Signal Corruptions in Voice Recordings for Speech Therapy." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291429.
When a patient's voice is recorded for analysis in speech therapy, the recording quality can be affected by various signal problems, such as background noise or clipping. However, the equipment and expertise needed to detect small disturbances are not always available at smaller clinics. This study therefore investigates several machine learning algorithms for automatically detecting selected problems in speech recordings, including infrasound and random muting of the signal. Five algorithms are analysed: support vector machine, convolutional neural network, long short-term memory (LSTM), Gaussian mixture model-based hidden Markov model, and generator-based hidden Markov model. A tool for creating datasets of degraded recordings is developed in order to test the algorithms. We examine separately the cases in which recordings are allowed to contain one problem or several problems at once, and mainly use a type of cepstral coefficient, MFCCs, as features. For each type of problem we also investigate ways to improve accuracy, for example by filtering out irrelevant parts of the signal with a voice activity detector, changing the feature parameters, or using an ensemble of classifiers. The experiments show that machine learning is a reasonable approach to this problem, as the balanced accuracy exceeds 75% for all tested corruptions. The part of the study that focused on single-problem recordings gave no results indicating that one algorithm was clearly better than the others, but in the multi-problem case the LSTM generally outperformed the other algorithms. Notably, it reached over 95% balanced accuracy on both white noise and infrasound. Because the algorithms were trained only on spoken English sentences, the tool currently has limited practical usefulness. However, it is easy to extend these experiments to other types of recordings, signal problems, features, or algorithms.
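The abstract above reports its results as balanced accuracy. As a reminder of what that metric measures, here is a small Python sketch using the common definition (the mean of per-class recall). The helper name `balanced_accuracy` and the toy labels are assumptions for illustration and do not come from the thesis.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    recalls = []
    for c in sorted(set(y_true)):
        # Indices of all examples whose true label is class c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical toy data: 8 clean recordings, 2 corrupted ones,
# and a detector that labels everything "clean".
y_true = ["clean"] * 8 + ["corrupt"] * 2
y_pred = ["clean"] * 10

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

In this toy case plain accuracy is 0.8 while balanced accuracy is 0.5, which is why the balanced figure is the more informative one when corrupted recordings are rare.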
Vemulapalli, Smita. "Audio-video based handwritten mathematical content recognition." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45958.
Loscos, Àlex. "Spectral processing of the singing voice." Doctoral thesis, Universitat Pompeu Fabra, 2007. http://hdl.handle.net/10803/7542.
This dissertation is centered on the digital processing of the singing voice, more concretely on the analysis, transformation and synthesis of this type of voice in the spectral domain, with special emphasis on those techniques relevant for music applications.
The thesis presents new formulations and procedures for both describing and transforming those attributes of the singing voice that can be regarded as voice specific. The thesis includes, among others, algorithms for rough and growl analysis and transformation, breathiness estimation and emulation, pitch detection and modification, nasality identification, voice to melody conversion, voice beat onset detection, singing voice morphing, and voice to instrument transformation; some of these are illustrated with concrete applications.
Barton, Antony James. "Signal processing techniques for data reduction and event recognition in cough counting." Thesis, University of Manchester, 2013. https://www.research.manchester.ac.uk/portal/en/theses/signal-processing-techniques-for-data-reduction-and-event-recognition-in-cough-counting(dc73495a-35b0-4d17-a6f8-cc2f88008659).html.
Schelinski, Stefanie. "Mechanisms of Voice Processing: Evidence from Autism Spectrum Disorder." Doctoral thesis, Humboldt-Universität zu Berlin, 2018. http://dx.doi.org/10.18452/19091.
The correct perception of information carried by the voice is a key requirement for successful human communication. Hearing another person’s voice provides information about who is speaking (voice identity), what is said (vocal speech) and the emotional state of a person (vocal emotion). Autism spectrum disorder (ASD) is associated with impaired voice identity and vocal emotion perception, while the perception of vocal speech is relatively intact. However, the underlying mechanisms of these voice perception impairments are unclear. For example, it is unclear at which processing stage voice perception difficulties occur, i.e. whether they are apperceptive or associative in nature, or whether impairments in voice identity processing in ASD are associated with dysfunction of voice-sensitive brain regions. Within the scope of my dissertation we systematically investigated voice perception and its impairments in adults with high-functioning ASD and typically developing matched controls (matched pairwise on age, gender, and intellectual abilities). In the first two studies we characterised the behavioural and neuronal profile of voice identity recognition in ASD using two functional magnetic resonance imaging (fMRI) experiments and a comprehensive behavioural test battery. In the third study we investigated the underlying behavioural mechanisms of impaired vocal emotion recognition in ASD. Our results inform models of human communication and advance our understanding of basic mechanisms which might contribute to core symptoms in ASD, such as difficulties in communication. For example, our results converge to support the view that in ASD, difficulties in perceiving and integrating lower-level sensory features, i.e. acoustic characteristics of the voice, might critically contribute to difficulties in higher-level social cognition, i.e. voice identity and vocal emotion recognition.
Sukittanon, Somsak. "Modulation scale analysis : theory and application for nonstationary signal classification /." Thesis, Connect to this title online; UW restricted, 2004. http://hdl.handle.net/1773/5875.
Bakheet, Mohammed. "Improving Speech Recognition for Arabic language Using Low Amounts of Labeled Data." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-176437.
Kwok, Kwok Sai. "Algorithms for image segmentation and their applications to video signal processing." Thesis, Imperial College London, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.244298.
Birkenes, Øystein. "A Framework for Speech Recognition using Logistic Regression." Doctoral thesis, Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-1599.
Although discriminative approaches like the support vector machine or logistic regression have had great success in many pattern recognition applications, they have achieved only limited success in speech recognition. Two of the difficulties often encountered are that 1) speech signals typically have variable lengths, and 2) speech recognition is a sequence labeling problem, where each spoken utterance corresponds to a sequence of words or phones.
In this thesis, we present a framework for automatic speech recognition using logistic regression. We solve the difficulty of variable length speech signals by including a mapping in the logistic regression framework that transforms each speech signal into a fixed-dimensional vector. The mapping is defined either explicitly with a set of hidden Markov models (HMMs) for the use in penalized logistic regression (PLR), or implicitly through a sequence kernel to be used with kernel logistic regression (KLR). Unlike previous work that has used HMMs in combination with a discriminative classification approach, we jointly optimize the logistic regression parameters and the HMM parameters using a penalized likelihood criterion.
Experiments show that joint optimization improves the recognition accuracy significantly. The sequence kernel we present is motivated by the dynamic time warping (DTW) distance between two feature vector sequences. Instead of considering only the optimal alignment path, we sum up the contributions from all alignment paths. Preliminary experiments with the sequence kernel show promising results.
A two-step approach is used for handling the sequence labeling problem. In the first step, a set of HMMs is used to generate an N-best list of sentence hypotheses for a spoken utterance. In the second step, these sentence hypotheses are rescored using logistic regression on the segments in the N-best list. A garbage class is introduced in the logistic regression framework in order to get reliable probability estimates for the segments in the N-best lists. We present results on both a connected digit recognition task and a continuous phone recognition task.
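The abstract above contrasts the DTW distance, which scores only the optimal alignment path, with a sequence kernel that sums contributions from all alignment paths. The Python sketch below illustrates that contrast on scalar sequences. It is an assumption-laden illustration, not the thesis's kernel: the local similarity `exp(-gamma * |a - b|)`, the parameter `gamma`, and both function names are chosen here for the example.

```python
import math

def dtw_distance(a, b):
    """Classic DTW: total cost of the single best alignment path."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Keep only the cheapest of the three predecessor cells.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

def all_paths_kernel(a, b, gamma=1.0):
    """Similarity that sums over every alignment path instead of taking the best.

    Each path contributes the product of its local similarities
    exp(-gamma * |a_i - b_j|) (an assumed local similarity).
    """
    k = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    k[0][0] = 1.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = math.exp(-gamma * abs(a[i - 1] - b[j - 1]))
            # Add up all three predecessor cells rather than choosing one.
            k[i][j] = local * (k[i - 1][j] + k[i][j - 1] + k[i - 1][j - 1])
    return k[len(a)][len(b)]

same = all_paths_kernel([1, 2, 3], [1, 2, 3])
different = all_paths_kernel([1, 2, 3], [5, 6, 7])
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

The only structural difference between the two recursions is `min` versus a sum over predecessors, which is what makes the second one account for every alignment path. Real feature sequences would use vectors (for example MFCC frames) and a vector distance in place of the absolute difference.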
Faubel, Friedrich [author], and Dietrich [academic supervisor] Klakow. "Statistical signal processing techniques for robust speech recognition / Friedrich Faubel. Supervisor: Dietrich Klakow." Saarbrücken : Saarländische Universitäts- und Landesbibliothek, 2016. http://d-nb.info/1090875703/34.
Meyer, Georg. "Models of neurons in the ventral cochlear nucleus : signal processing and speech recognition." Thesis, Keele University, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.334715.
Gooch, Richard M. "Machine learning techniques for signal processing, pattern recognition and knowledge extraction from examples." Thesis, University of Bristol, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294898.
健紘, 大田, and Kenko Ota. "Studies in signal processing for robust speech recognition in noisy and reverberant environments." Thesis, https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB10268908/?lang=0, 2008. https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB10268908/?lang=0.
Chai, Xiaoyong. "Sensor-based multiple-goal recognition /." View abstract or full-text, 2005. http://library.ust.hk/cgi/db/thesis.pl?COMP%202005%20CHAI.