Dissertations: „Acoustic analysis of speech“

1

John, Jeeva. „Acoustic Analysis of Speech of Persons with Autistic Spectrum Disorders“. Bowling Green State University / OhioLINK, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1206329066.

2

Nulsen, Susan. „Combining acoustic analysis and phonotactic analysis to improve automatic speech recognition“. University of Canberra. Information Sciences & Engineering, 1998. http://erl.canberra.edu.au./public/adt-AUC20060825.131042.

Annotation:

This thesis addresses the problem of automatic speech recognition, specifically, how to transform an acoustic waveform into a string of words or phonemes. A preliminary chapter gives linguistic information potentially useful in automatic speech recognition. This is followed by a description of the Wave Analysis Laboratory (WAL), a rule-based system which detects features in speech and was designed as the acoustic front end of a speech recognition system. Temporal reasoning as used in WAL rules is examined. The use of WAL in recognizing one particular class of speech sounds, the nasal consonants, is described in detail. The remainder of the thesis looks at the statistical analysis of samples of spontaneous speech. An orthographic transcription of a large sample of spontaneous speech is automatically translated into phonemes. Tables of the frequencies of word-initial and word-final phoneme clusters are constructed to illustrate some of the phonotactic constraints of the language. Statistical data is used to assign phonemes to phonotactic classes. These classes are unlike the acoustic classes, although there is a general distinction between the vowels, the consonants and the word boundary. A way of measuring the phonetic balance of a sample of speech is described. This can be used as a means of ranking potential test samples in terms of how well they represent the language. A phoneme n-gram model is used to measure the entropy of the language. The broad acoustic encoding output from WAL is used with this language model to reconstruct a small test sample. "Branching", a simpler alternative to perplexity, is introduced and found to give similar results to perplexity. Finally, the drop in branching is calculated as knowledge of various sets of acoustic classes is considered.
In the work described in this thesis the main contributions made to automatic speech recognition and the study of speech are in the development of the Wave Analysis Laboratory and in the analysis of speech from a phonotactic point of view. The phoneme cluster frequencies provide new information on spoken language, as do the phonotactic classes. The measures of phonetic balance and branching provide additional tools for use in the development of speech recognition systems.
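
As a rough illustration of how a phoneme n-gram model can be used to measure entropy and perplexity, the sketch below fits a bigram model with add-one smoothing and scores a test sample. The function names, the `#` word-boundary symbol, and the smoothing choice are assumptions made for this example and are not taken from the thesis; the thesis's "branching" measure is not defined in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def bigram_entropy(train, test, boundary="#"):
    """Cross-entropy (bits per symbol) of `test` under a bigram model of `train`.

    Each word is a sequence of phoneme symbols. Add-one smoothing is an
    assumption of this sketch; test symbols are expected to occur in `train`.
    """
    vocab = {boundary}
    pair_counts = Counter()     # counts of (previous, current) symbol pairs
    context_counts = Counter()  # counts of the previous symbol alone
    for word in train:
        seq = [boundary] + list(word) + [boundary]
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    v = len(vocab)
    total_bits = 0.0
    n = 0
    for word in test:
        seq = [boundary] + list(word) + [boundary]
        for prev, cur in zip(seq, seq[1:]):
            # Smoothed conditional probability P(cur | prev)
            p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
            total_bits -= math.log2(p)
            n += 1
    return total_bits / n


def perplexity(entropy_bits):
    """Perplexity is two raised to the per-symbol entropy in bits."""
    return 2.0 ** entropy_bits


if __name__ == "__main__":
    training_words = [["k", "a", "t"], ["t", "a", "k"], ["a", "t"]]
    held_out_words = [["k", "a", "t"]]
    h = bigram_entropy(training_words, held_out_words)
    print(f"entropy = {h:.3f} bits/phoneme, perplexity = {perplexity(h):.3f}")
```

Lower perplexity means the model finds the test phoneme sequence more predictable, which is the sense in which acoustic class knowledge can be said to reduce uncertainty.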

3

Brock, James L. „Acoustic classification using independent component analysis“. 2006. https://ritdml.rit.edu/dspace/handle/1850/2067.

4

Singh-Miller, Natasha 1981. „Neighborhood analysis methods in acoustic modeling for automatic speech recognition“. Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/62450.

Annotation:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 121-134).
This thesis investigates the problem of using nearest-neighbor based non-parametric methods for performing multi-class class-conditional probability estimation. The methods developed are applied to the problem of acoustic modeling for speech recognition. Neighborhood components analysis (NCA) (Goldberger et al. [2005]) serves as the departure point for this study. NCA is a non-parametric method that can be seen as providing two things: (1) low-dimensional linear projections of the feature space that allow nearest-neighbor algorithms to perform well, and (2) nearest-neighbor based class-conditional probability estimates. First, NCA is used to perform dimensionality reduction on acoustic vectors, a commonly addressed problem in speech recognition. NCA is shown to perform competitively with another commonly employed dimensionality reduction technique in speech known as heteroscedastic linear discriminant analysis (HLDA) (Kumar [1997]). Second, a nearest neighbor-based model related to NCA is created to provide a class-conditional estimate that is sensitive to the possible underlying relationship between the acoustic-phonetic labels. An embedding of the labels is learned that can be used to estimate the similarity or confusability between labels. This embedding is related to the concept of error-correcting output codes (ECOC) and therefore the proposed model is referred to as NCA-ECOC. The estimates provided by this method, along with nearest-neighbor information, are shown to provide improvements in speech recognition performance (2.5% relative reduction in word error rate). Third, a model for calculating class-conditional probability estimates is proposed that generalizes GMM, NCA, and kernel density approaches. This model, called locally-adaptive neighborhood components analysis (LA-NCA), learns different low-dimensional projections for different parts of the space.
The model exploits the fact that in different parts of the space different directions may be important for discrimination between the classes. This model is computationally intensive and prone to over-fitting, so methods for sub-selecting the neighbors used for providing the class-conditional estimates are explored. The estimates provided by LA-NCA are shown to give significant gains in speech recognition performance (7-8% relative reduction in word error rate) as well as phonetic classification.
by Natasha Singh-Miller.
Ph.D.
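
To make the nearest-neighbor probability estimate in the abstract above more concrete, here is a minimal sketch of the soft-neighbor scoring used in NCA as described by Goldberger et al.: points are projected by a linear map, each training point receives a weight that decays exponentially with squared distance to the query, and the weights are summed per class and normalized. The function names and the toy data are hypothetical, and the projection matrix `A` is simply given rather than learned; the thesis's own estimators (NCA-ECOC, LA-NCA) are not reproduced here.

```python
import math


def project(A, x):
    """Apply the linear projection A (a list of rows) to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def nca_class_scores(A, train_x, train_y, query):
    """Soft nearest-neighbor class scores for `query` in the projected space.

    Weight of training point j: exp(-||A q - A x_j||^2), normalized over all j.
    The score for a class is the sum of the weights of its training points.
    """
    q = project(A, query)
    dists = []
    for x, y in zip(train_x, train_y):
        z = project(A, x)
        d2 = sum((qi - zi) ** 2 for qi, zi in zip(q, z))
        dists.append((d2, y))

    # Subtracting the smallest distance keeps exp() from underflowing to zero
    # for every neighbor; it cancels out in the normalization.
    d_min = min(d2 for d2, _ in dists)
    weights = {}
    for d2, y in dists:
        weights[y] = weights.get(y, 0.0) + math.exp(-(d2 - d_min))

    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical 2-D features projected onto their first coordinate only.
    A = [[1.0, 0.0]]
    train_x = [[0.0, 0.0], [0.0, 5.0], [2.0, 0.0]]
    train_y = ["a", "a", "b"]
    print(nca_class_scores(A, train_x, train_y, [0.0, 9.0]))
```

In this toy case the projection discards the second coordinate, so both "a" points coincide with the query after projection and dominate the score.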

5

Williams, A. Lynn. „Phonologic and Acoustic Analyses of Final Consonant Omission“. Digital Commons @ East Tennessee State University, 1998. https://dc.etsu.edu/etsu-works/2008.

Annotation:

Acoustic analyses have recently been brought to bear on the phonological error pattern of final consonant omission. The results from such acoustic analyses have generally supported the correctness of the phonological analyses. The purpose of this report is to present seemingly conflicting results from a generative phonological analysis and an acoustic analysis of one misarticulating child who omitted word-final obstruents. The apparent conflict is resolved in terms of two possible explanations with differing treatment implications.

6

Lee, Matthew E. „Acoustic Models for the Analysis and Synthesis of the Singing Voice“. Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/6859.

Annotation:

Throughout our history, the singing voice has been a fundamental tool for musical expression. While analysis and digital synthesis techniques have been developed for normal speech, few models and techniques have been focused on the singing voice. The central theme of this research is the development of models aimed at the characterization and synthesis of the singing voice. First, a spectral model is presented in which asymmetric generalized Gaussian functions are used to represent the formant structure of a singing voice in a flexible manner. Efficient methods for searching the parameter space are investigated and challenges associated with smooth parameter trajectories are discussed. Next a model for glottal characterization is introduced by first presenting an analysis of the relationship between measurable spectral qualities of the glottal waveform and perceptually relevant time-domain parameters. A mathematical derivation of this relationship is presented and is extended as a method for parameter estimation. These concepts are then used to outline a procedure for modifying glottal textures and qualities in the frequency domain. By combining these models with the Analysis-by-Synthesis/Overlap-Add sinusoidal model, the spectral and glottal models are shown to be capable of characterizing the singing voice according to traits such as level of training and registration. An application is presented in which these parameterizations are used to implement a system for singing voice enhancement. Subjective listening tests were conducted in which listeners showed an overall preference for outputs produced by the proposed enhancement system over both unmodified voices and voices enhanced with competitive methods.

7

Ng, So-sum. „Acoustic analysis of contour tones produced by Cantonese dysarthric speakers“. University of Hong Kong, 2001. http://sunzi.lib.hku.hk/hkuto/record/B36208024.

Annotation:

Thesis (B.Sc)--University of Hong Kong, 2001.
"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, May 4, 2001." Also available in print.

8

Srinivasan, Nandini. „Acoustic Analysis of English Vowels by Young Spanish-English Bilingual Language Learners“. Thesis, The George Washington University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10815722.

Annotation:

Several studies across various languages have shown that monolingual listeners perceive significant differences between the speech of monolinguals and bilinguals. However, these differences may not always affect the phoneme category as identified by the listener or the speaker; differences may often be found between tokens corresponding to unique phonological categories and, as such, be more easily detectable through acoustic analysis. We hypothesized that unshared English vowels produced by young Spanish-English bilinguals would have measurably different formant values and duration than the same vowels produced by young English monolinguals because of Spanish influence on English phonology. We did not find significant differences in formant values between the two groups, but we found that Spanish-English bilinguals produced certain vowels with longer duration than English monolinguals. Our findings add to the ever-growing body of literature on bilingual language acquisition and the perception of accentedness.

9

Odlozinski, Lisa M. „An acoustic analysis of speech rate control procedures in Parkinson's disease“. Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape17/PQDD_0004/MQ30738.pdf.

10

Cao, Ying Alisa 1979. „Analysis of acoustic cues for identifying consonant /ð/ in continuous speech“. Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/87279.

11

Badenhorst, Jacob Andreas Cornelius. „Data sufficiency analysis for automatic speech recognition“. Thesis, North-West University, 2009. http://hdl.handle.net/10394/3994.

Annotation:

The languages spoken in developing countries are diverse and most are currently under-resourced from an automatic speech recognition (ASR) perspective. In South Africa alone, 10 of the 11 official languages belong to this category. Given the potential for future applications of speech-based information systems such as spoken dialog systems (SDSs) in these countries, the design of minimal ASR audio corpora is an important research area. Specifically, current ASR systems utilise acoustic models to represent acoustic variability, and effective ASR corpus design aims to optimise the amount of relevant variation within training data while minimising the size of the corpus. Therefore an investigation of the effect that different amounts and types of training data have on these models is needed. In this dissertation, specific consideration is given to the data sufficiency principles that apply to the training of acoustic models. The investigation of this task led to the following main achievements: 1) We define a new stability measurement protocol that provides the capability to view the variability of ASR training data. 2) This protocol allows for the investigation of the effect that various acoustic model complexities and ASR normalisation techniques have on ASR training data requirements. Specific trends with regard to the data requirements for different phone categories and how these are affected by various modelling strategies are observed. 3) Based on this analysis, acoustic distances between phones are estimated across language borders, paving the way for further research in cross-language data sharing. Finally, the knowledge obtained from these experiments is applied to perform a data sufficiency analysis of a new speech recognition corpus of South African languages: the Lwazi ASR corpus.
The findings correlate well with initial phone recognition results and yield insight into the sufficient number of speakers required for the development of minimal telephone ASR corpora.
Thesis (M. Ing. (Computer and Electronical Engineering))--North-West University, Potchefstroom Campus, 2009.

12

Daly, Nancy Ann. „Acoustic-phonetic and linguistic analyses of spontaneous speech : implications for speech understanding“. Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/12009.

Annotation:

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.
Includes bibliographical references (leaves 142-149).
by Nancy Ann Daly.
Ph.D.

13

Dempster, Gavin John. „A large-scale analysis of the acoustic-phonetic markers of speaker sex“. Thesis, University of Sheffield, 1996. http://etheses.whiterose.ac.uk/10188/.

Annotation:

The research for this thesis lies within the field of speaker characterisation through the acoustic-phonetic analysis of speech. The thesis consists of two parts: 1. An investigation of the acoustic-phonetic differences between the speech of women and men; 2. An examination of the practicalities of automating the investigation to analyse a large speech database. The acoustic-phonetic markers of speaker sex examined here are the fundamental frequency, the formant frequencies, and the relative amplitude of the first harmonic. The aims of the investigation were, firstly, to establish to what extent these markers differentiate between the sexes, and secondly, to examine the extent of between- and within-speaker deviation from the female and male norms, or average values for each sex. These points were investigated by an automated acoustic-phonetic analysis of the TIMIT database, involving a data set of almost 16,000 segments of speech. An automated method was developed to enable the signal processing and statistical analysis of a data set of this size. The problems to be encountered in the analysis of a highly variable data source (i.e. the acoustic speech waveform) are addressed.

14

Higa, Rodrigo Hitoshi. „Influence of different orthodontic upper retainers in speech: analysis by laypersons and acoustic analysis“. Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/25/25144/tde-02102018-221945/.

Annotation:

Introduction: The aim of this study was to evaluate the influence of different upper retainers on speech, through perceptual auditory analysis by laypersons and acoustic analysis. Methods: Eighteen volunteers were selected to use four types of upper retainers: conventional Wrap-Around (CWA), modified horseshoe Wrap-Around (HWA), modified anterior hole Wrap-Around (AHWA) and vacuum-formed (VF). Each retainer was used for 21 days, with intervals of 7 days without use between them. Speech was evaluated in recordings of vocal excerpts made before installation of the retainers (T0), immediately after the installation of each retainer (T1), and 21 days after installation (T2). The perceptual auditory analysis by laypersons was performed using a 100 mm visual analogue scale, while the acoustic analysis consisted of evaluating the mean diadochokinesia (DDK) rate and the formant frequencies F1 and F2 of the fricative consonants. One-way ANOVA and two-way ANOVA were used. Results: In the perceptual auditory analysis by laypersons, the values worsened from T0 to T1 for all retainers, but only for CWA and VF were the values statistically lower. At T2 the values increased, but for VF the value remained statistically lower than at T0, while for AHWA the difference between T0 and T2 was practically null. There were no changes in DDK values. For the formant frequencies, in general there was a difference from T0 to T1 and little difference from T0 to T2. In the comparison among the devices, CWA presented greater changes in the F1 formants of some consonants, AHWA presented lower values, and the other devices showed intermediate values. Conclusions: In both types of analysis (subjective and objective), speech changed after the installation of each retainer and improved after 21 days of use.
Laypersons judged the speech changes to be larger with VF and smaller with AHWA. In the acoustic analysis, the changes were greater for CWA and smaller for AHWA.
Introduction: The aim of this study was to evaluate the influence of different upper retainers on speech, through perceptual auditory analysis by laypersons and acoustic analysis. Methods: Eighteen volunteers were selected to use four types of upper retainers: conventional Wrap-Around plate (WAC), horseshoe-shaped modified Wrap-Around (WAF), modified Wrap-Around with an anterior hole (WAO) and clear thermoplastic retainer (CTT). Each was used for 21 days, with intervals of 7 days without use between them. Speech was evaluated in recordings of vocal excerpts made before installation of the retainers (T0), immediately after the installation of each retainer (T1), and after 21 days of use (T2). The perceptual auditory analysis by laypersons was performed using a 100 mm visual analogue scale, while the acoustic analysis consisted of evaluating the mean diadochokinesis (DDC) rate and the frequencies of formants F1 and F2 of the fricative consonants. One-way ANOVA and two-way ANOVA tests were used. Results: In the perceptual auditory analysis by laypersons, the values worsened from T0 to T1 for all retainers, but only for WAC and CTT were the values statistically lower. At T2 the values increased again, but for CTT the value was still statistically lower than at T0, while for WAO the difference between T0 and T2 was practically null. There were no changes in DDC values. For the formants, in general there was a difference from T0 to T1 and little difference from T0 to T2. In the comparison among the appliances, WAC presented greater changes in the F1 formants of some consonants, WAO presented lower values, and the other devices presented intermediate values.
Conclusions: In both types of analysis (subjective and objective), speech changed after the installation of each retainer and improved after 21 days of use. Laypersons judged the speech changes to be larger with CTT and smaller with WAO. In the acoustic analysis, the values were worse for WAC, while the changes were smaller for WAO.

15

Jackson, Philip J. B. „Characterisation of plosive, fricative and aspiration components in speech production“. Thesis, University of Southampton, 2000. https://eprints.soton.ac.uk/254111/.

Annotation:

This thesis is a study of the production of human speech sounds by acoustic modelling and signal analysis. It concentrates on sounds that are not produced by voicing (although that may be present), namely plosives, fricatives and aspiration, which all contain noise generated by flow turbulence. It combines the application of advanced speech analysis techniques with acoustic flow-duct modelling of the vocal tract, and draws on dynamic magnetic resonance image (dMRI) data of the pharyngeal and oral cavities, to relate the sounds to physical shapes. Having superimposed vocal-tract outlines on three sagittal dMRI slices of an adult male subject, a simple description of the vocal tract suitable for acoustic modelling was derived through a sequence of transformations. The vocal-tract acoustics program VOAC, which relaxes many of the assumptions of conventional plane-wave models, incorporates the effects of net flow into a one-dimensional model (viz., flow separation, increase of entropy, and changes to resonances), as well as wall vibration and cylindrical wavefronts. It was used for synthesis by computing transfer functions from sound sources specified within the tract to the far field. Being generated by a variety of aero-acoustic mechanisms, unvoiced sounds are somewhat varied in nature. Through analysis that was informed by acoustic modelling, resonance and anti-resonance frequencies of ensemble-averaged plosive spectra were examined for the same subject, and their trajectories observed during release. The anti-resonance frequencies were used to compute the place of occlusion. In vowels and voiced fricatives, voicing obscures the aspiration and frication components. So, a method was devised to separate the voiced and unvoiced parts of a speech signal, the pitch-scaled harmonic filter (PSHF), which was tested extensively on synthetic signals. 
Based on a harmonic model of voicing, it outputs harmonic and anharmonic signals appropriate for subsequent analysis as time series or as power spectra. By applying the PSHF to sustained voiced fricatives, we found that, not only does voicing modulate the production of frication noise, but that the timing of pulsation cannot be explained by acoustic propagation alone. In addition to classical investigation of voiceless speech sounds, VOAC and the PSHF demonstrated their practical value in helping further to characterise plosion, frication and aspiration noise. For the future, we discuss developing VOAC within an articulatory synthesiser, investigating the observed flow-acoustic mechanism in a dynamic physical model of voiced frication, and applying the PSHF more widely in the field of speech research.

16

Jones, Catherine Jacquelynn Julia. „Queclaratives in Xhosa : an acoustic and perceptual analysis“. Thesis, Stellenbosch : Stellenbosch University, 2001. http://hdl.handle.net/10019.1/52426.

Annotation:

Thesis (PhD)--University of Stellenbosch, 2001.
ENGLISH ABSTRACT:
Key words: acoustic speech analysis, speech synthesis, speech perception, copulative queclarative, linguistics, psycholinguistics, human language technology.
This study investigates the notion of interrogativity in Xhosa as expressed in the form of queclaratives. Queclaratives, or statements which are question-like in function, have been studied in many languages of the world. Unfortunately, with regard to the Bantu languages, studies relating to interrogativity in general have largely been impressionistic in nature. This research comprised two aspects of analysis: an acoustic and a perceptual analysis of data. The reason for this approach is that, without this combination, the results could have been considered suspect and lacking in authenticity. The acoustic analysis was conducted on 858 words in statement and queclarative pairs. Significant parameters were extracted and these were then statistically analyzed. The results revealed that duration on the penultimate vowel, pitch on the penultimate vowel and the overall raised pitch of queclaratives as opposed to statements were indeed the acoustically significant parameters differentiating statements from queclaratives. However, as is well known, there is no one-to-one relationship between the acoustic signal and its perception and, therefore, it is imperative that such findings also be perceptually tested. The perceptual testing of these parameters was conducted in an attempt to establish whether they were perceptually significant and also at what point in the utterance listeners could differentiate between queclaratives and statements. The next step was the compilation of carefully designed perception tests on the acoustically significant parameters. Two experiments were compiled using stimuli that were manipulations of the original signal of one of the selected informant's utterances.
These tests were administered on multimedia computers in the Language Laboratory at the University of Stellenbosch using 64 subjects for the first experiment and 63 for the second. The results of the perception tests showed that duration and pitch on the penultimate syllable are perceptually highly significant in differentiating statements from queclaratives. However the results also indicated very early recognition of the different forms with minimal speech segments from which the penultimate vowels were absent altogether. This then suggests that the perceptual judgements made earlier in the utterance may be either reinforced or overridden by the duration and pitch on the penultimate vowel. These results have assisted in the validation of some impressionistic claims made within the Bantu and other languages, while refuting others. However, as this corpus of data has included research on copulative queclaratives, it appeals for further research on this subject using any other linguistic markers. The results have also been evaluated in terms of their possible contribution to the related disciplines of psycholinguistics, linguistics and human language technologies. In so doing, the thesis makes an urgent appeal to researchers to pursue this experimental approach to language research. Another appeal is made for an awareness campaign as to the importance of this approach in harnessing the power of language for the development of language and society as a whole. The fertility of the South African society lies in its richness of multilingualism and the necessity for the improvement of the dissemination of information to all people of all languages and the improvement of communication between people in general, including those less fortunate in terms of literacy skills.
AFRIKAANS SUMMARY:
Key words: acoustic speech analysis, speech synthesis, speech perception, copulative queclarative, linguistics, psycholinguistics, language and speech technology.
This project investigates the nature of question formation in Xhosa with respect to the queclarative form. Queclaratives, or statements that can also function as questions, have already been studied for many of the world's languages. In general, however, studies of question formation in the African languages have been largely impressionistic in nature. This research project consisted of two components of analysis, namely an acoustic analysis of the data and a series of perceptual experiments. Without the combination of the two types of analysis, the results of the research would have been less credible. The acoustic analysis was carried out on 858 word pairs consisting of statements and queclaratives. The data were analysed statistically and the relevant parameters were extracted. The results indicated that the duration and pitch of the penultimate vowel, as well as the register of the whole word, are important parameters in distinguishing statements from queclaratives. Since it is well known that there is no one-to-one relationship between the acoustic signal and its perception, it is also necessary to carry out a perceptual experiment. The perceptual tests were designed to determine which acoustic parameters are also perceptually relevant and to find the earliest syllable at which listeners can already distinguish between the two forms. The next step was to construct stimuli for the perception tests that would indeed deliver the above results. Stimuli were created by manipulating the speech data of one speaker. The perception tests were then carried out on multimedia computers in the Language Laboratory of the University of Stellenbosch.
The results of the perception tests showed that the duration and pitch of the penultimate syllable are also perceptually important in distinguishing between the different forms. It was also clear that subjects could tell the forms apart from minimal amounts of information in which the penultimate and final syllables were entirely absent. This indicates that listeners make perceptual decisions very early in the word, but that these decisions can be either reinforced or reversed by the duration and pitch of the penultimate syllable. The results of the research supported certain impressionistic claims regarding African languages, while other claims were shown to be incorrect. One of the important findings was that the impressionistic view that statements have a falling intonation contour and questions a rising intonation contour over the course of the utterance is an oversimplification. This work was done on single-word copulative queclaratives and lends itself to being extended to phrases and sentences in future research. Furthermore, the research results were related to allied disciplines such as psycholinguistics, linguistics, and language and speech technology. A plea was made for an awareness campaign to emphasise the importance of this type of research for harnessing the potential of language for the development of Southern African languages and communities. The richness of our society lies in its multilingualism, which presents particular challenges: to improve the dissemination of information to all people of all languages and to improve communication between people in general, but also specifically for those with lower levels of literacy.

17

Keerio, Ayaz. „Acoustic analysis of Sindhi speech : a pre-curser for an ASR system“. Thesis, University of Sussex, 2011. http://sro.sussex.ac.uk/id/eprint/6325/.

Annotation:

The functional and formative properties of speech sounds are usually referred to as acoustic-phonetics in linguistics. This research aims to demonstrate acoustic-phonetic features of the elemental sounds of Sindhi, which is a branch of the Indo-European family of languages mainly spoken in the Sindh province of Pakistan and in some parts of India. In addition to the available articulatory-phonetic knowledge, acoustic-phonetic knowledge has been classified for the identification and classification of Sindhi language sounds. Determining the acoustic features of the language sounds helps to bring together the sounds with similar acoustic characteristics under the name of one natural class of meaningful phonemes. The obtained acoustic features and corresponding statistical results for a particular natural class of phonemes provide a clear understanding of the meaningful phonemes of Sindhi and also help to eliminate redundant sounds present in the inventory. At present Sindhi includes nine redundant, three interchanging, three substituting, and three confused pairs of consonant sounds. Some of the unique acoustic-phonetic features of Sindhi highlighted in this study are the acoustic features of the large number of contrastive voiced implosives of Sindhi and the acoustic impact of the language's flexibility in terms of the insertion and digestion of short vowels in the utterance. In addition, the issue of the presence of the affricate class of sounds and the diphthongs in Sindhi is addressed. The compilation of the meaningful language phoneme set by learning their acoustic-phonetic features serves one of the major goals of this study, because twelve such sounds of Sindhi are studied that are not yet part of the language alphabet.
The main acoustic features learned for the phonological structures of Sindhi are the fundamental frequency, the formants, and the duration, along with the analysis of the obtained acoustic waveforms, the formant tracks and the computer-generated spectrograms. The impetus for doing such research comes from the fact that detailed knowledge of the sound characteristics of the language elements has a broad variety of applications, from developing accurate synthetic speech production systems to modeling robust speaker-independent speech recognizers. The major research achievements and contributions this study provides in the field include: the compilation and classification of the elemental sounds of Sindhi; comprehensive measurement of the acoustic features of the language sounds, suitable to be incorporated into the design of a Sindhi ASR system; understanding of the dialect-specific acoustic variation of the elemental sounds of Sindhi; a speech database comprising voice samples of native Sindhi speakers; identification of the language's redundant, substituting and interchanging pairs of sounds; and identification of the language's sounds that can potentially lead to segmentation and recognition errors in a Sindhi ASR system design. The research achievements of this study create the fundamental building blocks for future work to design a state-of-the-art prototype: a gender- and environment-independent, continuous and conversational ASR system for Sindhi.

18

Ofuka, Etsuko. „Acoustic and perceptual analyses of politeness in Japanese speech“. Thesis, University of Leeds, 1996. http://etheses.whiterose.ac.uk/1036/.

Annotation:

In order to examine potential acoustic cues for politeness in Japanese speech, F0 and temporal aspects of polite and casual utterances of two question sentences spoken by six male native speakers were acoustically analysed. The analysis showed that the F0 movement of the final part of utterances and the speech rate of the utterance were used in consistently different ways in the two speaking styles (i.e., 'polite' and 'casual') across all the speakers. Perceptual experiments with listeners using a rating scale method confirmed that these acoustic variables, which were manipulated using digital resynthesis, had an impact on politeness perception. It was shown that the duration and F0 direction of the final vowel of utterances were so influential that the overall impression of utterance politeness was changed. An experiment which used speech rate variations of a polite utterance showed the important role of this variable in perceived politeness. Politeness ratings showed an inverted-U shape as a function of speech rate, but differed according to particular speakers. The speech rate of listeners was found to affect their utterance rate preference; listeners clearly preferred rates close to their own, i.e., rates they perceived as 'natural' or comfortable. A final experiment, using speech rate variations of a polite utterance as stimuli and a two-alternative forced-choice procedure, showed a very high correlation between perceived politeness scores and naturalness scores. This suggests the importance of listener characteristics in politeness research.

19

TAKEDA, Kazuya, Norihide KITAOKA and Makoto SAKAI. „Acoustic Feature Transformation Based on Discriminant Analysis Preserving Local Structure for Speech Recognition“. Institute of Electronics, Information and Communication Engineers, 2010. http://hdl.handle.net/2237/14969.

20

Bezuidenhout, Hannelie. „An electroglottographic and acoustic analysis of glottal activity during speech initiation in stuttering“. Pretoria : [s.n.], 2006. http://upetd.up.ac.za/thesis/available/etd-09252008-142958.

21

Tyson, Na'im R. „Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech“. The Ohio State University, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=osu1339695879.

22

Nissen, Shawn L. „An Acoustic Analysis of Voiceless Obstruents Produced by Adults and Typically Developing Children“. The Ohio State University, 2003. http://rave.ohiolink.edu/etdc/view?acc_num=osu1041225568.

23

Saikachi, Yoko. „Development, perceptual evaluation, and acoustic analysis of amplitude-based F0 control in Electrolarynx speech“. Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/54667.

Annotation:

Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2009.
"September 2009." Cataloged from PDF version of thesis.
Includes bibliographical references (p. 120-126).
An Electrolarynx (EL) is a battery-powered device that produces a sound that can be used to acoustically excite the vocal tract as a substitute for laryngeal voice production. ELs provide laryngectomy patients with the basic capability to communicate, but current EL devices produce a mechanical speech quality which has been largely attributed to the lack of natural fundamental frequency (F0) variation. In order to improve the quality of EL speech, the present study aimed to develop and evaluate an automatic F0 control scheme, in which F0 was modulated based on variations in the root-mean-squared (RMS) amplitude of the EL speech signal. Recordings of declarative sentences produced by two male subjects before and after total laryngectomy were used to develop procedures for calculating F0 contours for EL speech, and perceptual experiments and acoustic analyses were conducted to examine the impact of F0 modulation on the quality and prosodic function of the EL speech. The results of perceptual experiments showed that modulating the F0 of EL speech using a linear relationship between amplitude and frequency made it significantly more natural sounding than EL speech with constant F0, but also revealed some limitations in terms of communicating linguistic contrasts (distinction between question vs. statement and location of contrastive stress). Results are interpreted in relation to the acoustic characteristics of F0 modified EL speech and discussed in terms of their clinical implications and suggestion for improved algorithms of F0 control in EL speech.
by Yoko Saikachi.
Ph.D.
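
The abstract above describes modulating F0 through a linear relationship between RMS amplitude and frequency. The following is a minimal sketch of that idea, not the thesis's actual procedure: the function names, the non-overlapping framing, and the 80-120 Hz target range are all illustrative assumptions, since the abstract does not state the coefficients used.

```python
import math


def frame_rms(samples, frame_len):
    """RMS amplitude of consecutive, non-overlapping frames (assumed framing)."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rms


def amplitude_to_f0(rms_values, f0_low=80.0, f0_high=120.0):
    """Map each RMS value linearly onto [f0_low, f0_high] in Hz.

    The range limits are hypothetical; the lowest RMS maps to f0_low
    and the highest RMS maps to f0_high.
    """
    lo, hi = min(rms_values), max(rms_values)
    if hi == lo:
        # Flat amplitude: fall back to a constant mid-range F0.
        return [(f0_low + f0_high) / 2.0] * len(rms_values)
    slope = (f0_high - f0_low) / (hi - lo)
    return [f0_low + slope * (r - lo) for r in rms_values]


if __name__ == "__main__":
    signal = [0.0] * 4 + [1.0] * 4 + [0.5] * 4
    print(amplitude_to_f0(frame_rms(signal, 4)))
```

A real system would presumably smooth the RMS contour before mapping it, because frame-to-frame amplitude fluctuations would otherwise translate directly into F0 fluctuations.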

24

Butt, Abdul Haleem. „Speech Assessment for the Classification of Hypokinetic Dysthria in Parkinson Disease“. Thesis, Högskolan Dalarna, Datateknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:du-10041.

Annotation:

The aim of this thesis is to investigate computerized voice assessment methods for distinguishing normal from dysarthric speech signals. The proposed system introduces computerized assessment methods that combine signal processing and artificial intelligence techniques. Each subject read the sentences used for the measurement of inter-stress intervals (ISI), and these recordings were processed to compare normal and impaired voices. A band-pass filter was used to preprocess the speech samples. Speech segmentation is performed using signal energy and spectral centroid to separate voiced and unvoiced regions of the speech signal. Acoustic features are extracted from the LPC model and from the speech segments of each audio signal to find anomalies. The speech features assessed for classification are energy entropy, zero crossing rate (ZCR), spectral centroid, mean fundamental frequency (Meanf0), jitter (RAP), jitter (PPQ), and shimmer (APQ). Naïve Bayes (NB) was used for speech classification. For speech test-1 and test-2, NB classified healthy and impaired speech samples with accuracies of 72% and 80% respectively. For speech test-3, NB achieved 64% correct classification. The results indicate that speech impairment in PD patients can be classified on the basis of the clinical rating scale.
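
Two of the features named in the abstract above, zero crossing rate and jitter (RAP), can be sketched in a few lines. This is an illustrative sketch rather than the thesis's implementation: the function names are hypothetical, the RAP formula follows the commonly used three-point relative average perturbation definition, and the glottal periods are assumed to have been extracted beforehand.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def jitter_rap(periods):
    """Relative average perturbation of a sequence of glottal periods.

    For each interior period, take the absolute difference from the mean
    of itself and its two neighbours, average those differences, and
    divide by the mean period.
    """
    if len(periods) < 3:
        raise ValueError("RAP needs at least three periods")
    deviations = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3.0)
        for i in range(1, len(periods) - 1)
    ]
    mean_deviation = sum(deviations) / len(deviations)
    mean_period = sum(periods) / len(periods)
    return mean_deviation / mean_period


if __name__ == "__main__":
    print(zero_crossing_rate([1.0, -1.0, 1.0, -1.0]))
    print(jitter_rap([0.010, 0.011, 0.010]))
```

Feature values like these would then form the input vector for a classifier such as Naïve Bayes.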

25

Ng, Yuk-sim Cherry. „Perceptual and acoustic analysis of dysarthric dysphonia: direct magnitude estimation versus interval scaling“. University of Hong Kong, 2002. http://sunzi.lib.hku.hk/hkuto/record/B36208425.

Annotation:

Thesis (B.Sc)--University of Hong Kong, 2002.
"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, May 10, 2002." Also available in print.

26

Li, Yee-key Nicole, and 李依祺. „Acoustic and perceptual analysis of modal and falsetto registers in females with dysphonia“. Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B26653278.

27

Li, Qiang. „Acoustic Analysis of Intonation in Persons with Parkinson's Disease Receiving Transcranial Magnetic Stimulation and Intensive Voice Therapy“. Thesis, University of Louisiana at Lafayette, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10843550.

Annotation:

Intonation is one of the prosodic features manifested acoustically in the fundamental frequency (F0). Intonation abnormality is common and prominent in the speech of persons with Parkinson's disease (PD). Intensive speech therapies such as Lee Silverman Voice Treatment (LSVT LOUD®) have been demonstrated to be effective for increasing vocal intensity and F0 variability, but no prior studies have examined linguistic features of intonation before and after treatment in PD. Additionally, transcranial magnetic stimulation (TMS) has been demonstrated to be an appropriate adjuvant to a primary treatment. It is reasonable to hypothesize that intonation abnormality will improve after the combined modality treatment of LSVT LOUD® and TMS. To examine this hypothesis, the current research acoustically investigated five intonational features (F0 declination, resetting, emphasis, terminal fall, and syntactic pre-junctural fall) in twenty PD participants receiving LSVT LOUD® alone or combined with TMS delivered to the left or right cerebral hemisphere. The primary experiment was designed and carried out by Shalini Narayana and colleagues in their project funded by the Michael J. Fox Foundation for Parkinson's Research. They collected and provided the recorded reading samples for the current study.

F0 changes in each of the five intonational features were measured before and after the combined modality treatment and at a two-month follow-up, then analyzed statistically. The results revealed that F0 declination, emphasis, and terminal fall changed significantly from pre- to post-treatment, and the changes in declination and terminal fall were maintained at the follow-up evaluations.

The observed changes in intonation were attributed to LSVT alone, which caused large changes in F0 magnitude. F0 resetting and syntactic pre-junctural fall did not change significantly following treatment, probably because these intonational features need very precise fine motor control of the intrinsic laryngeal muscles to make small-range, rapid F0 adjustments, and this control was not improved by LSVT in the present PD participants. Difficulties with syntactic processing previously reported in PD may have contributed to the lack of improvement in resetting and pre-junctural fall, since these F0 features are used to mark syntactic boundaries within utterances. Incorporating linguistic intonation into speech intervention for speakers with PD is suggested as a topic for future clinical research.

28

TAKEDA, Kazuya, Seiichi NAKAGAWA, Yuya HATTORI, Norihide KITAOKA and Makoto SAKAI. „Evaluation of Combinational Use of Discriminant Analysis-Based Acoustic Feature Transformation and Discriminative Training“. Institute of Electronics, Information and Communication Engineers, 2010. http://hdl.handle.net/2237/14968.

29

Bianchi, Michelle. „Effects of clear speech and linguistic experience on acoustic characteristics of vowel production“. [Tampa, Fla.] : University of South Florida, 2007. http://purl.fcla.edu/usf/dc/et/SFE0002084.

30

Talkar, Tanya. „Design of tool for analysis of speech development disorders using landmarks and other acoustic cues“. Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/113098.

Annotation:

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (page 71).
Non-word repetition tasks have been used to diagnose children with various developmental difficulties with phonology, but these productions have not been phonetically analyzed to reveal the nature of the modifications produced by children diagnosed with SLI, autism spectrum disorder or dyslexia compared to those produced by typically-developing children. In this thesis, we compared the modification of predicted acoustic cues to distinctive features of manner, place and voicing for just under 30 children (ages 5-12), for the CN-Rep word inventory, in an extension of the earlier analysis in Levy et al. 2014. Feature cues, including abrupt acoustic landmarks (Stevens 2002) and other acoustic feature cues, were hand-labeled and analysis of factors that may influence feature cue modifications included position in the word, position in the syllable, word length measured in syllables, lexical stress, and manner type. Results suggest specific patterns of modification in specific contexts for specific clinical populations. These findings set the foundation for understanding how phonetic variation in speech arises in both typical and clinical populations, and for using this knowledge to develop tools to aid in more accurate and insightful diagnosis as well as improved intervention methods.
by Tanya Talkar.
M. Eng.

31

Celaya, Marissa. „Speech Adaptation to Electropalatography in Children's Productions of /s/ and /ʃ/“. BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/4103.

Annotation:

Previous research has investigated adults' ability to adapt their speech when an electropalatographic (EPG) pseudopalate is placed in the oral cavity; however, less is known about how younger speakers who are continuing to develop their motor speech abilities might adapt their speech to the presence of the device. This study examined the effect of an EPG pseudopalate on elementary school-aged children's ability to produce the fricatives /s/ and /ʃ/. Audio recordings of six children were collected at eight time intervals, including before placement of the pseudopalate, at 30-minute increments for two hours with the pseudopalate in place, immediately following removal of the pseudopalate, and 30 minutes after removal. An acoustic analysis was completed looking at consonant duration, spectral mean, spectral variance, and relative intensity. Disturbance of speech patterns from the presence of the pseudopalate was noted for most of the acoustic measures, most noticeably for the relative intensity of both /s/ and /ʃ/, as well as for the spectral mean and spectral variance of /ʃ/. Although there was a relatively high amount of variability among and within speakers, signs of adaptation were apparent after only 30 minutes for some participants. For some acoustic measures, however, full adaptation often did not occur until the pseudopalate was removed. Although future research is needed, it is hoped that this study will provide a greater understanding of children's ability to adapt to the EPG pseudopalate.

32

Smith, Megan Marie. „The Sound of the Snow Queen: An Acoustic Analysis of Vowel Clarity in "Let it Go"“. Ohio University Honors Tutorial College / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ouhonors1461277153.

33

Torgerson, Richard Christen. „A Comparison of Beijing and Taiwan Mandarin Tone Register: An Acoustic Analysis of Three Native Speech Styles“. Diss., 2005. http://contentdm.lib.byu.edu/ETD/image/etd1003.pdf.

34

Chan, Carlos Chun Ming. „Speaker model adaptation in automatic speech recognition“. Thesis, Robert Gordon University, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.339307.

35

Schulz, Henrik. „Large vocabulary continuous speech recognition for the transcription of Catalan broadcast news and conversations : towards analysis and modelling of acoustic reduction in spontaneous speech“. Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/405985.

Annotation:

The transcription of spontaneous speech still poses a challenge to state-of-the-art methods for automatic speech recognition. The present thesis describes the comprehensive development of a large vocabulary continuous speech recognition system for the transcription of Catalan broadcast news and conversations and evolves towards novel approaches for the analysis and modelling of acoustic reduction in spontaneous speech. It initially focuses on various conventional methods for acoustic analysis, acoustic and language modelling, and hypothesis search. Improvements over the original single-pass baseline system are mainly attained by interpolation of individually estimated language models that emphasises domain and speaking style, linear discriminating projection of acoustic observations that improves the phonetic class separability, speaker normalisation of the acoustic observations, speaker adaptive training, and acoustic model adaptation in a multi-pass system approach. The analysis of acoustic reduction initially focuses on context-independent, vowel- and consonant-specific spectral and temporal properties whose parameters display statistically significant differences between the phoneme prototypes in spontaneous speech and their canonical realisations in planned speech. The introduction of the feature space analysis provides the general means to reveal these differences in conventional acoustic observations for automatic speech recognition. It displays statistically significant differences context-independently but also in a syllable context between adjacent phonemes, suggesting particular reduction patterns. The analysis furthermore challenges the often suggested coherence between the co-occurring reduction of spectral and temporal properties. The modelling of acoustic reduction first focuses on segment-conditioned discriminating variables, variability class dependent models, and variability class specific adaptation of the original acoustic model.
As discriminating variables, it introduces phoneme rate as a means to analyse temporal properties and feature space reduction ratio as a means to analyse the reduction of spectral properties in the conventional feature space for large vocabulary continuous speech recognition. These variables are clustered and determine the classes for segment-conditioned variability class dependent models and their scoring during the hypothesis search in recognition. Neither approach displays a significant performance improvement. Furthermore, the modelling advances towards segment constituent predictability dependent models that introduce predictability as a discriminating variable for variability class dependent models, relying on the fundamental coherence between predictability and acoustic reduction that is suggested by the principle of least effort and the redundancy theory. It thereby focuses on word and phoneme predictability. This approach displays no significant performance improvement. Planned speech apparently runs counter to the principle of least effort. Thus, a prior segment-conditioned analysis of acoustic reduction may indicate a segment's average degree of reduction, while its within-segment variation may indicate whether it exhibits sufficient relaxation of the speaking style to adopt the principle of least effort. Thus, segments exhibiting small within-segment variation may be modelled separately from those of large within-segment variation, whereas modelling the latter by word, syllable, or phoneme predictability dependent models may provide a research perspective.

36

Kocjancic, Tanja. „Ultrasound and acoustic analysis of lingual movement in teenagers with childhood apraxia of speech, control adults and typically developing children“. Thesis, Queen Margaret University, 2010. https://eresearch.qmu.ac.uk/handle/20.500.12289/7448.

Annotation:

Childhood apraxia of speech (CAS) is a neurological motor speech disorder affecting spatiotemporal planning of speech movements. Speech characteristics of CAS are still not well defined and the main aim of this thesis was to reveal them by analysing acoustic and articulatory data obtained by ultrasound imaging. Ultrasound recording provided temporal and articulatory measurement of duration of syllables and segments, amount and rate of tongue movement over the syllables and observation of the patterns of tongue movement. Data was provided by three teenagers with CAS and two control groups, one of ten typically developing children and the other of ten adults. Results showed that, as a group, speakers with CAS differed from the adults but not from the typically developing children in syllable duration and in rate of tongue movement. They did not differ from either of the control groups in amount of tongue movement. Individually, speakers with CAS showed similar or even greater consistency on these features than the control speakers but displayed different abilities to adapt them to changes in the syllable structure. While all three adapted syllable duration and rate of tongue movement in the adult-like way, only two showed mature adaptation of segment durations and of the amount of tongue movement. Observing patterns of tongue movement showed that speakers with CAS produce different patterns than speakers in the control groups but are at the same time, like adults, very stable in their articulations. Also, speakers with CAS may move their tongues less in the oral space than speakers in the control groups. The differences between the control groups were similar to those found in previous studies. The results provide support for the validity of the methods used, new information about CAS and a promising direction for future research in differential diagnostic and therapy procedures.

37

Yang, Lening. „Computer modelling of speech intelligibility in underground stations“. Thesis, London South Bank University, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245130.

Annotation:

The aim of this study is to develop a ray tracing computer model for predicting speech intelligibility in underground stations. There are four parts to the study: correctly modelling the sound field in underground stations; developing a mathematical model for predicting speech intelligibility using a ray tracing computer model; using the model to investigate ways of improving speech intelligibility in underground stations; and using the model to analyze the sound field in long enclosures with multiple source systems. Four computer models have been developed for investigating acoustic parameters in different conditions or different types of space. The models have been validated with scale model measurements, and the predictions have also been compared with classical room acoustics. Three new contributions to the ray tracing computer model have been developed in this project: the reverberation time tail compensation, the exact representation of curved surfaces, and diffraction effects. A mathematical model for predicting speech transmission index in long enclosures using the ray tracing method has been developed. The model has been shown to be more accurate and efficient by comparison with scale model measurements and measurements made in a real underground station. The model has been used to investigate ways of improving speech intelligibility in different noise levels and with different source spacing. Finally, the quasi-diffuse sound field theory for long enclosures with multiple source systems has been developed and justified as an approximation method for a quick investigation of speech intelligibility in underground stations.

38

Freij, G. J. „Enhanced sequential adaptive linear prediction for speech encoding“. Thesis, University of Liverpool, 1985. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.356268.

39

Beet, S. W. „Digital processing of speech produced in hyperbaric helium“. Thesis, University of Liverpool, 1985. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.356244.

40

Crozier, Philip Mark. „Enhancement techniques for noise affected telephone quality speech“. Thesis, University of Liverpool, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.321115.

41

Rex, James Alexander. „Microphone signal processing for speech recognition in cars“. Thesis, University of Southampton, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.326728.

42

Thomas, T. J. „An articulatory model of speech production including turbulence“. Thesis, University of Cambridge, 1985. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.333125.

43

Narayanan, Arun. „Computational auditory scene analysis and robust automatic speech recognition“. The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1401460288.

44

Chen, Jitong. „On Generalization of Supervised Speech Separation“. The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492038295603502.

45

Wang, Yuxuan. „Supervised Speech Separation Using Deep Neural Networks“. The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1426366690.

46

Ghaidan, Khaldoon A. „A study of the application of modern techniques to speech waveform analysis“. Thesis, Loughborough University, 1986. https://dspace.lboro.ac.uk/2134/28015.

Annotation:

Spectrograms are perhaps the most commonly used method for studying the characteristics of speech waveforms. Producing a spectrogram can conveniently be divided into two parts, the analysis and the display, and this thesis describes a study of both these aspects.
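
The analysis half of spectrogram production mentioned in the abstract above can be illustrated with a short-time Fourier transform. The sketch below is a generic textbook version, not the method studied in the thesis: the Hann window, the direct DFT, and the names `hann` and `spectrogram` are assumptions chosen for clarity, and the display half is left out.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len, hop):
    """Magnitude spectra of overlapping, Hann-windowed frames.

    Returns one row per frame; each row holds frame_len // 2 + 1
    magnitudes, from DC up to the Nyquist bin. A direct DFT is used
    for readability; an FFT would be used in practice.
    """
    window = hann(frame_len)
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            row.append(abs(acc))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # A sine that completes exactly 4 cycles per 32-sample frame.
    tone = [math.sin(2.0 * math.pi * 4 * n / 32) for n in range(64)]
    spec = spectrogram(tone, frame_len=32, hop=16)
    print(len(spec), "frames; peak bin of first frame:",
          max(range(len(spec[0])), key=spec[0].__getitem__))
```

Frame length trades frequency resolution against time resolution, which is one reason the analysis and display stages are usually tuned together.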

47

Lee, Sang Jun. „Comparative analysis of speech intelligibility in church acoustics using computer modeling“. [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0000866.

48

Raeesy, Zeynabalsadat. „Automatic analysis of magnetic resonance images of speech articulation“. Thesis, University of Oxford, 2013. http://ora.ox.ac.uk/objects/uuid:ffa6d290-6920-4204-8d65-e4f2f09278c5.

Annotation:

Magnetic resonance imaging (MRI) technology has made it possible to capture the dynamics of speech production at fine temporal and spatial resolutions, generating substantial quantities of images to be analysed. Manual processing of large MRI databases is labour-intensive and time-consuming. Hence, to study articulation on a large scale, techniques for automatic feature extraction are needed. This thesis investigates approaches for automatic information extraction from an MRI database of dynamic articulation. We first study the articulation by observing the pixel intensity variations in image sequences. The correspondence between acoustic segments and images is established by forced alignment of speech signals recorded during the articulation. We obtain speaker-specific typical phoneme articulations that represent general articulatory configurations in running speech. Articulation dynamics are parametrised by measuring the magnitude of change in intensities over time. We demonstrate a direct correlation between the dynamics of articulation thus measured and the energy of the generated acoustic signals. For more sophisticated applications, a parametric description of vocal tract shape is desired. We investigate different shape extraction techniques and present a framework that can automatically identify and extract the vocal tract shapes. The framework incorporates shape prior information and intensity features in recognising and delineating the shape. As demonstrated through extensive assessments, the framework is a promising tool for automatic identification of vocal tract boundaries in large MRI databases. The segmentation framework proposed in this thesis is, to the best of our knowledge, novel in the field of speech production. The methods investigated in this thesis facilitate automatic information extraction from images, either for studying the dynamics of articulation or for vocal tract shape modelling. This thesis advances the state of the art by bringing new perspectives to the study of articulation and by introducing a segmentation framework that is automatic, does not require extensive initialisation, and reports a minimum number of failures.

49

Perkins, Rosalie. „PHONETIC AND ACOUSTIC ANALYSES OF TWO NEW CASES OF FOREIGN ACCENT SYNDROME“. Master's thesis, University of Central Florida, 2007. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4183.

Annotation:

This study presents detailed phonetic and acoustic analyses of the speech characteristics of two new cases of Foreign Accent Syndrome (FAS). Participants include a 48-year-old female who began speaking with an "Eastern European" accent following a traumatic brain injury, and a 45-year-old male who presented with a "British" accent following a subcortical cerebral vascular accident (CVA). Identical samples of the participants' pre- and post-morbid speech were obtained, thus affording a new level of control in the study of Foreign Accent Syndrome. The speech tasks consisted of oral readings of the Grandfather Passage and of 18 real words composed of the stop consonants /p/, /t/, /k/, /b/, /d/, /g/ combined with the peripheral vowels /i/, /a/ and /u/ and ending in a voiceless stop. Computer-based acoustic measures included: 1) voice onset time (VOT), 2) vowel durations, 3) whole word durations, 4) first, second and third formant frequencies, and 5) fundamental frequency. Formant frequencies were measured at three points in the vowel duration, a) 20%, b) 50%, and c) 80%, to assess differences in vowel 'onglides' and 'offglides'. The phonetic analysis provided perceptual identification of the major phonetic features associated with the foreign quality of the participants' FAS speech, while acoustic measures allowed precise quantification of these features. Results indicated evidence of backing of consonant and vowel productions for both participants. The implications for future research and clinical applications are also considered.
M.A.
Department of Communication Sciences and Disorders
Health and Public Affairs
Comm Sciences & Disorders MA

50

Panico, Adriana Campos Balieiro. „Julgamento do comportamento vocal de jornalistas em diferentes estilos de notícias e seus correlatos acústicos“. Universidade de São Paulo, 2005. http://www.teses.usp.br/teses/disponiveis/59/59134/tde-03042006-164315/.

Annotation:

This study investigated the identification of different delivery styles in TV newscasts, classified as neutral, serious and relaxed, and their acoustic correlates. Experienced presenters of both sexes, who appear regularly in network newscasts, recorded a text with the same semantic content three times, in the three delivery styles. The audio from these recordings was stored on CD in wave format for acoustic analysis of the samples, assessing the parameters of frequency, intensity and duration with the Dr. Speech 4.0 software. On another CD, in audio format, the samples were randomized and judged by thirty subjects, whose task was to identify the styles. The acoustic parameters that differed significantly among the styles were mean Fo, maximum Fo, Fo variation and speech time. The viewers were able to identify the different styles. The samples were then separated by style onto three CDs in audio format and presented to lay listeners using the paired comparison method, to be judged with respect to each delivery style. These results were submitted to multidimensional scaling (MDS) to determine the dimensions along which the different styles were located. Two dimensions were determined for each style. In the relaxed style, the first dimension had no significantly correlated acoustic parameter. In the second, regardless of gender, the number of semitones was significant; for female voices, minimum Fo, Fo variation and the number of semitones were significant; for male voices, no correlated parameter was significant. In the neutral style, two parameters were significantly correlated with the first dimension regardless of gender: minimum Fo and the number of semitones; for female voices, Fo variation was significantly correlated; for male voices, maximum Io. In dimension two, no parameter was significantly correlated. In the serious style, the first dimension was significantly correlated with speech time, only for male voices. Dimension two showed the following significantly correlated parameters: regardless of gender, mean Fo, minimum Fo and minimum Io; for female voices, minimum Io; for male voices, no acoustic parameter was significantly correlated. Based on these results, possibilities for intervention with individuals who use communication professionally are discussed.
