Doctoral dissertations on the topic "Automatic speech recognition – Statistical methods"
Browse the 36 best doctoral dissertations on the topic "Automatic speech recognition – Statistical methods".
Wu, Jian, and 武健. "Discriminative speaker adaptation and environmental robustness in automatic speech recognition". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B31246138.
黃伯光 and Pak-kwong Wong. "Statistical language models for Chinese recognition: speech and character". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1998. http://hub.hku.hk/bib/B31239456.
Chan, Oscar. "Prosodic features for a maximum entropy language model". University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0244.
Fu, Qiang. "A generalization of the minimum classification error (MCE) training method for speech recognition and detection". Diss., Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/22705.
Seward, Alexander. "Efficient Methods for Automatic Speech Recognition". Doctoral thesis, KTH, Tal, musik och hörsel, 2003. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-3675.
Clarkson, P. R. "Adaptation of statistical language models for automatic speech recognition". Thesis, University of Cambridge, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.597745.
Wei, Yi. "Statistical methods on automatic aircraft recognition in aerial images". Thesis, University of Strathclyde, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.248947.
Wong, Pak-kwong. "Statistical language models for Chinese recognition : speech and character /". Hong Kong : University of Hong Kong, 1998. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20158725.
McGreevy, Michael. "Statistical language modelling for large vocabulary speech recognition". Thesis, Queensland University of Technology, 2006. https://eprints.qut.edu.au/16444/1/Michael_McGreevy_Thesis.pdf.
McGreevy, Michael. "Statistical language modelling for large vocabulary speech recognition". Queensland University of Technology, 2006. http://eprints.qut.edu.au/16444/.
Doulaty, Bashkand Mortaza. "Methods for addressing data diversity in automatic speech recognition". Thesis, University of Sheffield, 2017. http://etheses.whiterose.ac.uk/17096/.
Gayvert, Robert T. "A statistical approach to formant tracking /". Online version of thesis, 1988. http://hdl.handle.net/1850/10499.
Salvi, Giampiero. "Mining Speech Sounds : Machine Learning Methods for Automatic Speech Recognition and Analysis". Doctoral thesis, Stockholm : KTH School of Computer Science and Communication, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4111.
Whittaker, Edward William Daniel. "Statistical language modelling for automatic speech recognition of Russian and English". Thesis, University of Cambridge, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.621936.
Singh-Miller, Natasha 1981. "Neighborhood analysis methods in acoustic modeling for automatic speech recognition". Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/62450.
Pełny tekst źródłaCataloged from PDF version of thesis.
Includes bibliographical references (p. 121-134).
This thesis investigates the problem of using nearest-neighbor based non-parametric methods for performing multi-class class-conditional probability estimation. The methods developed are applied to the problem of acoustic modeling for speech recognition. Neighborhood components analysis (NCA) (Goldberger et al. [2005]) serves as the departure point for this study. NCA is a non-parametric method that can be seen as providing two things: (1) low-dimensional linear projections of the feature space that allow nearest-neighbor algorithms to perform well, and (2) nearest-neighbor based class-conditional probability estimates. First, NCA is used to perform dimensionality reduction on acoustic vectors, a commonly addressed problem in speech recognition. NCA is shown to perform competitively with another commonly employed dimensionality reduction technique in speech known as heteroscedastic linear discriminant analysis (HLDA) (Kumar [1997]). Second, a nearest neighbor-based model related to NCA is created to provide a class-conditional estimate that is sensitive to the possible underlying relationship between the acoustic-phonetic labels. An embedding of the labels is learned that can be used to estimate the similarity or confusability between labels. This embedding is related to the concept of error-correcting output codes (ECOC) and therefore the proposed model is referred to as NCA-ECOC. The estimates provided by this method along with nearest neighbor information is shown to provide improvements in speech recognition performance (2.5% relative reduction in word error rate). Third, a model for calculating class-conditional probability estimates is proposed that generalizes GMM, NCA, and kernel density approaches. This model, called locally-adaptive neighborhood components analysis, LA-NCA, learns different low-dimensional projections for different parts of the space. 
The model exploits the fact that in different parts of the space, different directions may be important for discrimination between the classes. This model is computationally intensive and prone to over-fitting, so methods for sub-selecting the neighbors used to provide the class-conditional estimates are explored. The estimates provided by LA-NCA are shown to give significant gains in speech recognition performance (7-8% relative reduction in word error rate) as well as in phonetic classification.
by Natasha Singh-Miller.
Ph.D.
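The NCA objective referred to in the abstract above has a compact form: each training point softly picks a neighbor with probability proportional to exp(-d²) in the projected space, and the learned projection maximizes the expected fraction of same-class picks. Below is a minimal numpy sketch of that objective only (the gradient-based optimization over A is omitted, and the two clusters and candidate projections are synthetic illustrations, not data from the thesis):

```python
import numpy as np

def nca_objective(A, X, y):
    """Expected leave-one-out accuracy of a stochastic nearest neighbor
    under the linear projection A; this is the quantity NCA maximizes."""
    Z = X @ A.T                                   # project all points
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                  # a point never picks itself
    m = (-d2).max(axis=1, keepdims=True)          # for numerical stability
    p = np.exp(-d2 - m)
    p /= p.sum(axis=1, keepdims=True)             # soft-neighbor probabilities
    same = y[:, None] == y[None, :]
    return (p * same).sum(axis=1).mean()          # mass on same-class neighbors

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)),
               rng.normal(0.0, 0.5, (20, 2)) + [8.0, 0.0]])
y = np.repeat([0, 1], 20)
good = nca_objective(np.array([[1.0, 0.0]]), X, y)  # class-separating axis
bad = nca_objective(np.array([[0.0, 1.0]]), X, y)   # noise-only axis
```

A full NCA implementation would ascend the gradient of this objective with respect to A; here the two fixed projections simply show that the objective rewards class-separating directions.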
Delmege, James W. "CLASS : a study of methods for coarse phonetic classification /". Online version of thesis, 1988. http://hdl.handle.net/1850/10449.
Dambreville, Samuel. "Statistical and geometric methods for shape-driven segmentation and tracking". Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/22707.
Pełny tekst źródłaCommittee Chair: Allen Tannenbaum; Committee Member: Anthony Yezzi; Committee Member: Marc Niethammer; Committee Member: Patricio Vela; Committee Member: Yucel Altunbasak.
Ravindran, Sourabh. "Physiologically Motivated Methods For Audio Pattern Classification". Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14066.
Melin, Håkan. "Automatic speaker verification on site and by telephone: methods, applications and assessment". Doctoral thesis, KTH, Tal, musik och hörsel, TMH, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4242.
Yaman, Sibel. "A multi-objective programming perspective to statistical learning problems". Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26470.
Committee Chair: Chin-Hui Lee; Committee Member: Anthony Yezzi; Committee Member: Evans Harrell; Committee Member: Fred Juang; Committee Member: James H. McClellan. Part of the SMARTech Electronic Thesis and Dissertation Collection.
Berry, Jeffrey James. "Machine Learning Methods for Articulatory Data". Diss., The University of Arizona, 2012. http://hdl.handle.net/10150/223348.
Khodai-Joopari, Mehrdad. "Forensic speaker analysis and identification by computer : a Bayesian approach anchored in the cepstral domain". Awarded by: University of New South Wales - Australian Defence Force Academy, School of Information Technology and Electrical Engineering, 2007. http://handle.unsw.edu.au/1959.4/38715.
Yoshino, Koichiro. "Spoken Dialogue System for Information Navigation based on Statistical Learning of Semantic and Dialogue Structure". 京都大学 (Kyoto University), 2014. http://hdl.handle.net/2433/192214.
Zamora Martínez, Francisco Julián. "Aportaciones al modelado conexionista de lenguaje y su aplicación al reconocimiento de secuencias y traducción automática". Doctoral thesis, Universitat Politècnica de València, 2012. http://hdl.handle.net/10251/18066.
Zamora Martínez, FJ. (2012). Aportaciones al modelado conexionista de lenguaje y su aplicación al reconocimiento de secuencias y traducción automática [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18066
Le, Hai Son. "Continuous space models with neural networks in natural language processing". Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00776704.
Bezůšek, Marek. "Objektivizace Testu 3F - dysartrický profil pomocí akustické analýzy". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2021. http://www.nusl.cz/ntk/nusl-442568.
Sodré, Bruno Ribeiro. "Reconhecimento de padrões aplicados à identificação de patologias de laringe". Universidade Tecnológica Federal do Paraná, 2016. http://repositorio.utfpr.edu.br/jspui/handle/1/2013.
Pełny tekst źródłaDiseases that affect the larynx have been considerably increased in recent years due to the condition of nowadays society where there have been unhealthy habits like smoking, alcohol and tobacco and an increased vocal abuse, perhaps due to the increase in noise pollution, especially in large urban cities. Currently the exam performed by per-oral endoscopy (aimed to identify laryngeal pathologies) have been videolaryngoscopy and videostroboscopy, both invasive and often uncomfortable to the patient. Seeking to improve the comfort of the patients who need to undergo through these procedures, this study aims to identify acoustic patterns that can be applied to the identification of laryngeal pathologies in order to creating a new non-invasive larynx assessment method. Here two different configurations of neural networks were used. The first one was generated from 524.287 combinations of 19 acoustic measurements to classify voices into normal or from a diseased larynx, and achieved an max accuracy of 99.5% (96.99±2.08%). Using 3 and 6 rotated measurements (obtained from the principal components analysis method), the accuracy was 93.98±0.24% and 94.07±0.29%, respectively. With 6 rotated measurements from a previouly standardization of the 19 acoustic measurements, the accuracy was 97.88±1.53%. The second one, to classify 23 different voice types (including normal voices), showed better accuracy in identifying hiperfunctioned larynxes and normal voices, with 58.23±18.98% and 52.15±18.31%, respectively. The worst accuracy was obtained from vocal fatigues, with 0.57±1.99%. Excluding normal voices of the analysis, hyperfunctioned voices remained the most easily identifiable (with an accuracy of 57.3±19.55%) followed by anterior-posterior constriction (with 18.14±11.45%), and the most difficult condition to be identified remained vocal fatigue (with 0.7±2.14%). 
Re-sampling the neural networks input vectors, it was obtained accuracies of 25.88±10.15%, 21.47±7.58%, and 18.44±6.57% from such networks with 20, 30, and 40 hidden layer neurons, respectively. For comparison, classification using support vector machine produced an accuracy of 67±6.2%. Thus, it was shown that the acoustic measurements need to be improved to achieve better results of classification among the studied laryngeal pathologies. Even so, it was found that is possible to discriminate normal from dysphonic speakers.
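The "rotated measurements" mentioned in the abstract above are principal-component projections of the standardized acoustic measures. A minimal sketch of that preprocessing step follows; the 5-column matrix with one correlated pair is a synthetic stand-in for the study's table of 19 acoustic measurements:

```python
import numpy as np

def pca_rotate(X, k):
    """Standardize each column, then rotate onto the top-k principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = np.cov(Z, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    return Z @ vecs[:, ::-1][:, :k]       # project onto the leading directions

# synthetic stand-in data: columns 0 and 1 are strongly correlated,
# so the first principal component should absorb their shared variance
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=500)

R = pca_rotate(X, 5)
```

Keeping only the first few columns of `R` (3 or 6 in the study) gives the reduced feature vectors that were fed to the neural networks.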
Goecke, Roland. "A stereo vision lip tracking algorithm and subsequent statistical analyses of the audio-video correlation in Australian English". Phd thesis, 2004. http://hdl.handle.net/1885/149999.
Ghane, Parisa. "Silent speech recognition in EEG-based brain computer interface". Thesis, 2015. http://hdl.handle.net/1805/9886.
A Brain Computer Interface (BCI) is a hardware and software system that establishes direct communication between the human brain and the environment. In a BCI system, brain messages pass through wires and external computers instead of the normal pathway of nerves and muscles. The general workflow in all BCIs is to measure brain activity, process it, and convert it into an output readable by a computer. The measurement of electrical activity in different parts of the brain is called electroencephalography (EEG). Many sensor technologies exist, with varying numbers of electrodes for recording brain activity along the scalp; each electrode captures a weighted sum of the activity of all neurons in the area around it. Establishing a BCI system requires placing a set of electrodes on the scalp and a tool to send the signals to a computer, where a system is trained to find the important information, extract it from the raw signal, and use it to recognize the user's intention; finally, a control signal is generated according to the application. This thesis describes the step-by-step training and testing of a BCI system intended for a person who has lost the ability to speak through an accident or surgery but still has healthy brain tissue. The goal is to establish an algorithm that recognizes different vowels from EEG signals. It uses a bandpass filter to remove noise and artifacts, the periodogram for feature extraction, and a Support Vector Machine (SVM) for classification.
May, Avner. "Kernel Approximation Methods for Speech Recognition". Thesis, 2018. https://doi.org/10.7916/D8D80P9T.
Chandrasekaran, Aravind. "Efficient methods for rapid UBM training (RUT) for robust speaker verification /". 2008. http://proquest.umi.com/pqdweb?did=1650508671&sid=2&Fmt=2&clientId=10361&RQT=309&VName=PQD.
Wang, Qi. "Nonlinear noise compensation in feature domain for speech recognition with numerical methods /". 2004. http://wwwlib.umi.com/cr/yorku/fullcit?pMQ99403.
Typescript. Includes bibliographical references (leaves 60-65). Also available on the Internet.
"The statistical evaluation of minutiae-based automatic fingerprint verification systems". Thesis, 2006. http://library.cuhk.edu.hk/record=b6074180.
Chen, Jiansheng.
"November 2006."
Adviser: Yiu-Sang Moon.
Source: Dissertation Abstracts International, Volume: 68-08, Section: B, page: 5343.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2006.
Includes bibliographical references (p. 110-122).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts in English and Chinese.
School code: 1307.
"Robust methods for Chinese spoken document retrieval". 2003. http://library.cuhk.edu.hk/record=b5896122.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaves 158-169).
Abstracts in English and Chinese.
Abstract --- p.2
Acknowledgements --- p.6
Chapter 1 --- Introduction --- p.23
Chapter 1.1 --- Spoken Document Retrieval --- p.24
Chapter 1.2 --- The Chinese Language and Chinese Spoken Documents --- p.28
Chapter 1.3 --- Motivation --- p.33
Chapter 1.3.1 --- Assisting the User in Query Formation --- p.34
Chapter 1.4 --- Goals --- p.34
Chapter 1.5 --- Thesis Organization --- p.35
Chapter 2 --- Multimedia Repository --- p.37
Chapter 2.1 --- The Cantonese Corpus --- p.37
Chapter 2.1.1 --- The RealMedia Collection --- p.39
Chapter 2.1.2 --- The MPEG-1 Collection --- p.40
Chapter 2.2 --- The Multimedia Markup Language --- p.42
Chapter 2.3 --- Chapter Summary --- p.44
Chapter 3 --- Monolingual Retrieval Task --- p.45
Chapter 3.1 --- Properties of Cantonese Video Archive --- p.45
Chapter 3.2 --- Automatic Speech Transcription --- p.46
Chapter 3.2.1 --- Transcription of Cantonese Spoken Documents --- p.47
Chapter 3.2.2 --- Indexing Units --- p.48
Chapter 3.3 --- Known-Item Retrieval Task --- p.49
Chapter 3.3.1 --- Evaluation - Average Inverse Rank --- p.50
Chapter 3.4 --- Retrieval Model --- p.51
Chapter 3.5 --- Experimental Results --- p.52
Chapter 3.6 --- Chapter Summary --- p.53
Chapter 4 --- The Use of Audio and Video Information for Monolingual Spoken Document Retrieval --- p.55
Chapter 4.1 --- Video-based Segmentation --- p.56
Chapter 4.1.1 --- Metric Computation --- p.57
Chapter 4.1.2 --- Shot Boundary Detection --- p.58
Chapter 4.1.3 --- Shot Transition Detection --- p.67
Chapter 4.2 --- Audio-based Segmentation --- p.69
Chapter 4.2.1 --- Gaussian Mixture Models --- p.69
Chapter 4.2.2 --- Transition Detection --- p.70
Chapter 4.3 --- Performance Evaluation --- p.72
Chapter 4.3.1 --- Automatic Story Segmentation --- p.72
Chapter 4.3.2 --- Video-based Segmentation Algorithm --- p.73
Chapter 4.3.3 --- Audio-based Segmentation Algorithm --- p.74
Chapter 4.4 --- Fusion of Video- and Audio-based Segmentation --- p.75
Chapter 4.5 --- Retrieval Performance --- p.76
Chapter 4.6 --- Chapter Summary --- p.78
Chapter 5 --- Document Expansion for Monolingual Spoken Document Retrieval --- p.79
Chapter 5.1 --- Document Expansion using Selected Field Speech Segments --- p.81
Chapter 5.1.1 --- Annotations from MmML --- p.81
Chapter 5.1.2 --- Selection of Cantonese Field Speech --- p.83
Chapter 5.1.3 --- Re-weighting Different Retrieval Units --- p.84
Chapter 5.1.4 --- Retrieval Performance with Document Expansion using Selected Field Speech --- p.84
Chapter 5.2 --- Document Expansion using N-best Recognition Hypotheses --- p.87
Chapter 5.2.1 --- Re-weighting Different Retrieval Units --- p.90
Chapter 5.2.2 --- Retrieval Performance with Document Expansion using N-best Recognition Hypotheses --- p.90
Chapter 5.3 --- Document Expansion using Selected Field Speech and N-best Recognition Hypotheses --- p.92
Chapter 5.3.1 --- Re-weighting Different Retrieval Units --- p.92
Chapter 5.3.2 --- Retrieval Performance with Different Indexed Units --- p.93
Chapter 5.4 --- Chapter Summary --- p.94
Chapter 6 --- Query Expansion for Cross-language Spoken Document Retrieval --- p.97
Chapter 6.1 --- The TDT-2 Corpus --- p.99
Chapter 6.1.1 --- English Textual Queries --- p.100
Chapter 6.1.2 --- Mandarin Spoken Documents --- p.101
Chapter 6.2 --- Query Processing --- p.101
Chapter 6.2.1 --- Query Weighting --- p.101
Chapter 6.2.2 --- Bigram Formation --- p.102
Chapter 6.3 --- Cross-language Retrieval Task --- p.103
Chapter 6.3.1 --- Indexing Units --- p.104
Chapter 6.3.2 --- Retrieval Model --- p.104
Chapter 6.3.3 --- Performance Measure --- p.105
Chapter 6.4 --- Relevance Feedback --- p.106
Chapter 6.4.1 --- Pseudo-Relevance Feedback --- p.107
Chapter 6.5 --- Retrieval Performance --- p.107
Chapter 6.6 --- Chapter Summary --- p.109
Chapter 7 --- Conclusions and Future Work --- p.111
Chapter 7.1 --- Future Work --- p.114
Chapter A --- XML Schema for Multimedia Markup Language --- p.117
Chapter B --- Example of Multimedia Markup Language --- p.128
Chapter C --- Significance Tests --- p.135
Chapter C.1 --- Selection of Cantonese Field Speech Segments --- p.135
Chapter C.2 --- Fusion of Video- and Audio-based Segmentation --- p.137
Chapter C.3 --- Document Expansion with Reporter Speech --- p.137
Chapter C.4 --- Document Expansion with N-best Recognition Hypotheses --- p.140
Chapter C.5 --- Document Expansion with Reporter Speech and N-best Recognition Hypotheses --- p.140
Chapter C.6 --- Query Expansion with Pseudo Relevance Feedback --- p.142
Chapter D --- Topic Descriptions of TDT-2 Corpus --- p.145
Chapter E --- Speech Recognition Output from Dragon in CLSDR Task --- p.148
Chapter F --- Parameters Estimation --- p.152
Chapter F.1 --- "Estimating the Number of Relevant Documents, Nr" --- p.152
Chapter F.2 --- "Estimating the Number of Terms Added from Relevant Documents, Nrt , to Original Query" --- p.153
Chapter F.3 --- "Estimating the Number of Non-relevant Documents, Nn , from the Bottom-scoring Retrieval List" --- p.153
Chapter F.4 --- "Estimating the Number of Terms, Selected from Non-relevant Documents (Nnt), to be Removed from Original Query" --- p.154
Chapter G --- Abbreviations --- p.155
Bibliography --- p.158
Τσιλφίδης, Αλέξανδρος. "Signal processing methods for enhancing speech and music signals in reverberant environments". Thesis, 2011. http://nemertes.lis.upatras.gr/jspui/handle/10889/4710.
The dissertation consists of nine chapters, two appendices, and the bibliography; it is written in English and includes a summary in Greek. In this dissertation, digital signal processing methods are developed for removing reverberation from speech and music signals. The proposed algorithms cover a wide range of applications, focusing first on blind dereverberation of single-channel signals. Targeting more specific usage scenarios, binaural algorithms are also proposed, as well as techniques that presuppose an acoustic measurement. The algorithms concentrate on suppressing late reverberation, which is particularly harmful to the quality of speech and music signals and reduces speech intelligibility. Because it also significantly alters signal statistics, late reverberation substantially degrades the performance of automatic speech recognition systems and of other speech and music processing algorithms. The proposed algorithms can therefore be used either as stand-alone techniques for enhancing audio quality or as pre-processing stages in other applications. The main dereverberation method proposed in the dissertation is based on perceptual modeling and employs a state-of-the-art psychoacoustic model. With this model, the signal regions where reverberation is audible, i.e., not masked by the stronger clean signal, are estimated. This estimate leads to selective processing in which suppression is applied only in those regions, through novel hybrid gain functions based on objective and subjective distortion measures. Extensive objective and subjective experiments show that the proposed technique yields high-quality dereverberated estimates regardless of room size.
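The selective-suppression idea described above, attenuating only the time-frequency regions where late reverberation is audible while leaving masked regions untouched, can be illustrated with a toy gain rule. This is not the dissertation's psychoacoustic model: the exponential-decay reverberation estimate, the 0.1 masking threshold, and the gain floor below are all invented placeholders:

```python
import numpy as np

def late_reverb_gain(power, decay=0.7, floor=0.1):
    """Per-frame spectral gains for a (frames, bins) matrix of STFT power.
    Late reverberation is modeled as exponentially decaying energy carried
    over from past frames; bins where that estimate is inaudible next to the
    direct sound (a crude masking test) are passed through unchanged."""
    gains = np.ones_like(power)
    reverb = np.zeros(power.shape[1])
    for i in range(power.shape[0]):
        if i:
            reverb = decay * (reverb + power[i - 1])   # toy decay model
        audible = reverb > 0.1 * power[i]              # toy masking threshold
        g = np.clip(1.0 - reverb / np.maximum(power[i], 1e-12), floor, 1.0)
        gains[i] = np.where(audible, g, 1.0)
    return gains

# a loud frame followed by quiet tail frames: the tail is dominated by
# decaying energy from the burst, so it is strongly attenuated
power = np.array([[1.0, 1.0], [0.01, 0.01], [0.01, 0.01]])
gains = late_reverb_gain(power)
```

Multiplying the complex STFT by these gains and inverting would give the processed signal; the dissertation's method replaces both the reverberation estimate and the audibility test with perceptually grounded models.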
Medeiros, Henrique Rodrigues Barbosa de. "Automatic detection of disfluencies in a corpus of university lectures". Master's thesis, 2014. http://hdl.handle.net/10071/8683.
This thesis addresses the identification of disfluent sequences and their structural regions. The experiments described here are based on segmentation and prosodic information computed from a corpus of university lectures in European Portuguese, containing about 32 hours of speech with roughly 7.7% disfluencies. The feature set used proved discriminative in identifying the regions involved in the production of disfluencies. The best results concern the detection of the interregnum, followed by the detection of the interruption point. Several machine learning methods were tested, with Classification and Regression Trees generally achieving the best results. The most informative features for identifying and distinguishing disfluent regions include word duration ratios, the confidence score of the current word, ratios involving silences, and pitch and energy slopes. Features such as the number of phones and syllables per word proved more useful for identifying the interregnum, while pitch and energy were the most suitable for identifying the interruption point. Experiments focusing on filled-pause detection were also carried out. For now, only material from forced alignment was used in these experiments, since the automatic recognition system is not well adapted to this domain. This study represents a new step toward the automatic detection of filled pauses for European Portuguese using prosodic resources. Future work will extend this study to automatic transcripts and also address other domains, exploring richer sets of linguistic features.
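The abstract's best-performing setup, classification and regression trees over prosodic features such as duration ratios, silence ratios, word confidence, and pitch/energy slopes, can be sketched with scikit-learn on synthetic data. The five features and the class-conditional shifts below are invented for illustration; only the feature types echo the abstract:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n = 400

# hypothetical prosodic features per word: duration ratio, pitch slope,
# energy slope, following-silence ratio, word confidence
def sample(disfluent):
    base = rng.normal(size=5)
    if disfluent:  # disfluent words: longer, flatter pitch/energy,
                   # more following silence, lower confidence (invented shifts)
        base += [1.5, -1.0, -1.0, 1.5, -1.5]
    return base

X = np.array([sample(i % 2 == 1) for i in range(n)])
y = np.arange(n) % 2          # alternating fluent/disfluent labels

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[:300], y[:300])
acc = (tree.predict(X[300:]) == y[300:]).mean()
```

A shallow tree is enough here because the synthetic shifts are large; on real lecture data, features are noisier and region-specific models (interregnum vs. interruption point) are needed, as the abstract notes.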