Dissertations / Theses on the topic 'ID-speech'

To see the other types of publications on this topic, follow the link: ID-speech.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the top 50 dissertations / theses for your research on the topic 'ID-speech.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

GENOVESE, GIULIANA. "L'infant-directed speech nella lingua italiana: caratteristiche lessicali, sintattiche, prosodiche e relazione con lo sviluppo linguistico." Doctoral thesis, Università degli Studi di Milano-Bicocca, 2019. http://hdl.handle.net/10281/241109.

Full text
Abstract:
This research explores the features of infant-directed speech in Italian during the first year of life and their effects on language acquisition, from preverbal precursors to lexical and syntactic skills. The theoretical framework assumes that language acquisition has social foundations. The first part comprises two studies describing the lexical, syntactic and prosodic properties of this special register. The second part includes two studies on the quality and effects of linguistic input on language development, considering both a preverbal precursor and lexical and syntactic abilities in the second year of life; this part also identifies predictors of language learning, examining both the characteristics of the input and the contribution of infants' early communication skills. The first study is a longitudinal investigation that describes, through both global and specific measures, the lexical and syntactic characteristics of Italian infant-directed speech. It reveals a simplified but not simple register, with a period of maximum simplification in the second half of the first year. The second study, also longitudinal, examines the prosodic properties of infant-directed speech and the prosodic characterization of utterances with different pragmatic functions. The results show that prosody in Italian infant-directed speech is generally emphasized in the preverbal period but, surprisingly, only moderately so; moreover, prosody changes over the first year following a pattern that differs from that observed in other non-tonal languages. Utterances with different pragmatic functions also show distinctive prosodic characteristics, underscoring the highly informative role of prosody. The third study longitudinally investigates the antecedents of language development, assessing the contribution of infants' early communication skills and the role of the input, whose quality and stability are examined; the literature still reports conflicting results on these issues. The data indicate that language development in the second year reflects early communicative abilities and appears to be fostered by input that is rich, redundant and syntactically articulated but lexically repetitive. Finally, the fourth study uses an experimental design to analyze the possible effects of infant-directed song, hypothesizing that it facilitates phonetic discrimination, a preverbal precursor of language development, more than speech does. The literature highlights how the typical prosody of this register supports the identification of linguistic units in the speech stream, but the role of infant-directed song has been largely neglected, especially as regards the development of linguistic prerequisites. The main results show that the facilitating role of song emerges at the end of the first year, when a developmental change occurs in the ability to discriminate native and non-native phonemes. Benefits of greater exposure to music and song in the preverbal stage were also found, for both phonetic discrimination and subsequent lexical development.
APA, Harvard, Vancouver, ISO, and other styles
2

Mustafa, M. K. "On-device mobile speech recognition." Thesis, Nottingham Trent University, 2016. http://irep.ntu.ac.uk/id/eprint/28044/.

Full text
Abstract:
Despite many years of research, speech recognition remains an active area of research in Artificial Intelligence. Currently, the most common commercial application of this technology on mobile devices uses a wireless client–server approach to meet the computational and memory demands of the speech recognition process. Unfortunately, such an approach is unlikely to remain viable when fully applied over the approximately 7.22 billion mobile phones currently in circulation. In this thesis we present an on-device speech recognition system. Such a system has the potential to completely eliminate the wireless client–server bottleneck. For the voice activity detection (VAD) part of this work, this thesis presents two novel algorithms used to detect speech activity within an audio signal. The first algorithm is based on the Log Linear Predictive Cepstral Coefficients Residual Signal (LLPCCRS). These LLPCCRS feature vectors were classified into voice-signal and non-voice-signal segments using a modified K-means clustering algorithm. This VAD algorithm is shown to provide better performance than a conventional energy-based frame analysis approach. The second algorithm is based on the Linear Predictive Cepstral Coefficients (LPC). This algorithm uses the frames within the speech signal with the minimum and maximum standard deviation as candidates for a linear cross-correlation against the rest of the frames within the audio signal. The cross-correlated frames are then classified using the same modified K-means clustering algorithm, yielding one cluster for speech frames and another for non-speech frames. This novel application of the linear cross-correlation technique to linear predictive cepstral coefficient feature vectors provides a fast computation method suitable for the mobile platform, as shown by the results presented in this thesis. The speech recognition part of this thesis presents two novel neural network approaches to mobile speech recognition. Firstly, a recurrent neural network architecture is developed to accommodate the output of the VAD stage. Specifically, an Echo State Network (ESN) is used for phoneme-level recognition. The drawbacks and advantages of this method are explained further within the thesis. Secondly, a dynamic Multi-Layer Perceptron approach is developed. This builds on the drawbacks of the ESN and provides a dynamic way of handling speech signal length variability within its architecture. This novel dynamic Multi-Layer Perceptron uses both the Linear Predictive Cepstral Coefficients (LPC) and the Mel Frequency Cepstral Coefficients (MFCC) as input features. A speaker-dependent approach is presented using the Center for Spoken Language Understanding (CSLU) database. The results show a very distinct behaviour from conventional speech recognition approaches, because the LPC shows performance figures very close to the MFCC. A speaker-independent system, using the standard TIMIT dataset, is then implemented on the dynamic MLP for further confirmation; in this mode of operation the MFCC outperforms the LPC. Finally, all the results, with emphasis on computation time, are compared for both novel neural network approaches directly against a conventional hidden Markov model on the CSLU and TIMIT standard datasets.
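The frame-clustering stage described in this abstract can be illustrated with a minimal sketch (Python with scikit-learn; the `features` and `energies` arrays, and the use of mean frame energy to decide which cluster is speech, are illustrative assumptions rather than the author's implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

def vad_by_clustering(features, energies):
    """Split frames into speech / non-speech by 2-means clustering.

    features : (n_frames, n_coeffs) cepstral feature vectors (e.g. LPCC-based)
    energies : (n_frames,) frame energies, used only to decide which
               cluster is the speech one (assumed to be the louder cluster).
    """
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    speech_cluster = int(energies[labels == 1].mean() > energies[labels == 0].mean())
    return labels == speech_cluster   # boolean mask of speech frames
```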
APA, Harvard, Vancouver, ISO, and other styles
3

Melnikoff, Stephen Jonathan. "Speech recognition in programmable logic." Thesis, University of Birmingham, 2003. http://etheses.bham.ac.uk//id/eprint/16/.

Full text
Abstract:
Speech recognition is a computationally demanding task, especially the decoding part, which converts pre-processed speech data into words or sub-word units, and which incorporates Viterbi decoding and Gaussian distribution calculations. In this thesis, this part of the recognition process is implemented in programmable logic, specifically, on a field-programmable gate array (FPGA). Relevant background material about speech recognition is presented, along with a critical review of previous hardware implementations. Designs for a decoder suitable for implementation in hardware are then described. These include details of how multiple speech files can be processed in parallel, and an original implementation of an algorithm for summing Gaussian mixture components in the log domain. These designs are then implemented on an FPGA. An assessment is made as to how appropriate it is to use hardware for speech recognition. It is concluded that while certain parts of the recognition algorithm are not well suited to this medium, much of it is, and so an efficient implementation is possible. Also presented is an original analysis of the requirements of speech recognition for hardware and software, which relates the parameters that dictate the complexity of the system to processing speed and bandwidth. The FPGA implementations are compared to equivalent software, written for that purpose. For a contemporary FPGA and processor, the FPGA outperforms the software by an order of magnitude.
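The log-domain summation of Gaussian mixture components mentioned above rests on the log-sum-exp identity; a minimal software sketch of that identity (not the FPGA design itself) is:

```python
import numpy as np

def log_sum_gaussians(log_weights, log_densities):
    """Sum mixture components in the log domain.

    Uses log(sum_k exp(a_k)) = m + log(sum_k exp(a_k - m)) with m = max a_k,
    which avoids underflow when individual component densities are tiny.
    """
    a = log_weights + log_densities     # a_k = log(w_k) + log N_k(x)
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))
```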
APA, Harvard, Vancouver, ISO, and other styles
4

Safavi, Saeid. "Speaker characterization using adult and children's speech." Thesis, University of Birmingham, 2015. http://etheses.bham.ac.uk//id/eprint/6029/.

Full text
Abstract:
Speech signals contain important information about a speaker, such as age, gender, language, accent, and emotional/psychological state. Automatic recognition of these characteristics has a wide range of commercial, medical and forensic applications, such as interactive voice response systems, service customization, natural human-machine interaction, recognizing the type of pathology of speakers, and directing the forensic investigation process. Many such applications depend on reliable systems using short speech segments without regard to the spoken text (text-independent). All these applications are also applicable to children's speech. This research aims to develop accurate methods and tools to identify different characteristics of speakers. Our experiments cover speaker recognition, gender recognition, age-group classification, and accent identification. However, similar approaches and techniques can be applied to identify other characteristics such as emotional/psychological state. The main focus of this research is on detecting these characteristics from children's speech, which has previously been reported to be more challenging than from adults'. Furthermore, the impact of different frequency bands on the performance of several recognition systems is studied, and the performance obtained using children's speech is compared with the corresponding results from experiments using adults' speech. Speaker characterization is performed by fitting a probability density function to acoustic features extracted from the speech signals. Since the distribution of acoustic features is complex, Gaussian mixture models (GMM) are applied. Due to lack of data, parametric model adaptation methods have been applied to adapt the universal background model (UBM) to the characteristics of utterances. An effective approach involves adapting the UBM to speech signals using the Maximum-A-Posteriori (MAP) scheme. Then, the Gaussian means of the adapted GMM are concatenated to form a Gaussian mean supervector for a given utterance. Finally, a classification or regression algorithm is used to identify the speaker characteristics. While effective, Gaussian mean supervectors are of a high dimensionality, resulting in high computational cost and difficulty in obtaining a robust model in the context of limited data. In the field of speaker recognition, recent advances using the i-vector framework have increased the classification accuracy. This framework, which provides a compact representation of an utterance in the form of a low-dimensional feature vector, applies a simple factor analysis on GMM means.
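As a rough illustration of the MAP adaptation and supervector construction described above, here is a generic Reynolds-style sketch (the array shapes, the posterior computation being done elsewhere, and the relevance factor of 16 are assumptions, not the thesis's code):

```python
import numpy as np

def map_adapt_means(ubm_means, posteriors, frames, relevance=16.0):
    """Relevance-MAP adaptation of UBM component means.

    ubm_means  : (K, D) UBM Gaussian means
    posteriors : (T, K) per-frame component posteriors for one utterance
    frames     : (T, D) acoustic feature vectors of that utterance
    Returns the adapted means and their concatenation (the supervector).
    """
    n_k = posteriors.sum(axis=0)                                   # soft counts
    e_k = posteriors.T @ frames / np.maximum(n_k[:, None], 1e-10)  # data means
    alpha = (n_k / (n_k + relevance))[:, None]                     # adaptation weight
    adapted = alpha * e_k + (1.0 - alpha) * ubm_means
    return adapted, adapted.ravel()        # supervector feeds a classifier/regressor
```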
APA, Harvard, Vancouver, ISO, and other styles
5

Tang, Andrea. "Narration and speech and thought presentation in comics." Thesis, University of Huddersfield, 2016. http://eprints.hud.ac.uk/id/eprint/27960/.

Full text
Abstract:
The purpose of this study was to test the application of two linguistic models of narration and one linguistic model of speech and thought presentation on comic texts: Fowler's (1986) internal and external narration types, Simpson's (1993) narrative categories from his 'modal grammar of point of view' and Leech and Short's (1981) speech and thought presentation scales. These three linguistic models of narration and speech and thought presentation, originally designed and used for the analysis of prose texts, were applied to comics, a multimodal medium that tells stories through a combination of both words and images. Through examples from comics, I demonstrate in this thesis that Fowler's (1986) basic distinction between internal and external narration types and Simpson's (1993) narrative categories (categories A, B(N) and B(R) narration) can be identified in both visual and textual forms in the pictures and the words of comics. I also demonstrate the potential application of Leech and Short's (1981) speech and thought presentation scales to comics by identifying instances of the scales' categories (NPV/NPT, NPSA/NPTA, DS/DT and FDS/FDT) from comics, but not all of the speech and thought presentation categories existed in my comic data (there was no evidence of IS/IT and the categorisation of FIS/FIT was debatable). In addition, I identified other types of discourse that occurred in comics which were not accounted for by Leech and Short's (1981) speech and thought presentation categories: internally and externally-located DS and DT (DS and DT that are presented within (internally) or outside of (externally) the scenes that they originate from), narrator-influenced forms of DS and DT (where narrator interference seems to occur in DS and DT), visual presentations of speech and thought (where speech and thought are represented by pictorial or symbolic content in balloons) and non-verbal balloons (where no speech or thought is being presented, but states of mind and emphasized pauses or silence are represented by punctuation marks and other symbols in speech balloons).
APA, Harvard, Vancouver, ISO, and other styles
6

Dalby, Jonathan Marler. "Phonetic structure of fast speech in American English." Bloomington : Reproduced by the Indiana University Linguistics Club, 1986. http://books.google.com/books?id=6MpWAAAAMAAJ.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Shen, Ao. "The selective use of gaze in automatic speech recognition." Thesis, University of Birmingham, 2014. http://etheses.bham.ac.uk//id/eprint/5202/.

Full text
Abstract:
The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared to laboratory assessments. As a major source of interference, acoustic noise affects speech intelligibility during the ASR process. It causes two main problems. The first is contamination of the speech signal. The second is the speakers' vocal and non-vocal behavioural changes. These phenomena create a mismatch between the ASR training and recognition conditions, which leads to considerable performance degradation. To improve noise-robustness, exploiting prior knowledge of the acoustic noise in speech enhancement, feature extraction and recognition models are popular approaches. An alternative approach presented in this thesis is to introduce eye gaze as an extra modality. Eye gaze behaviours play roles in interaction and contain information about cognition and visual attention; not all behaviours are relevant to speech. Therefore, gaze behaviours are used selectively to improve ASR performance. This is achieved by inference procedures using noise-dependent models of gaze behaviours and their temporal and semantic relationship with speech. 'Selective gaze-contingent ASR' systems are proposed and evaluated on a corpus of eye movement and related speech recorded in a range of clean and noisy environments. The best performing systems utilise both acoustic and language model adaptation.
APA, Harvard, Vancouver, ISO, and other styles
8

Najafian, Maryam. "Acoustic model selection for recognition of regional accented speech." Thesis, University of Birmingham, 2016. http://etheses.bham.ac.uk//id/eprint/6461/.

Full text
Abstract:
Accent is cited as an issue for speech recognition systems. Our experiments showed that the ASR word error rate is up to seven times greater for accented speech than for standard British English. The main objective of this research is to develop Automatic Speech Recognition (ASR) techniques that are robust to accent variation. We applied different acoustic modelling techniques to compensate for the effects of regional accents on ASR performance. For conventional GMM-HMM based ASR systems, we showed that using a small amount of data from a test speaker to choose an accent-dependent model with an accent identification (AID) system, or building a model using the data from N neighbouring speakers in AID space, results in superior performance compared to that obtained with unsupervised or supervised speaker adaptation. In addition, we showed that using a DNN-HMM rather than a GMM-HMM based acoustic model improves recognition accuracy considerably. Even if we apply two stages of adaptation, accent followed by speaker adaptation, to the GMM-HMM baseline system, it does not outperform the baseline DNN-HMM based system. For more contemporary DNN-HMM based ASR systems, we investigated how adding different types of accented data to the training set can provide better recognition accuracy on accented speech. Finally, we proposed a new approach for visualisation of the AID feature space, which is helpful in analysing AID recognition accuracies and confusion matrices.
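The idea of building a model from the N neighbouring speakers in AID space can be sketched as a nearest-neighbour search (illustrative only; the Euclidean metric and the vector representation of speakers are assumptions):

```python
import numpy as np

def select_neighbour_speakers(test_vec, train_vecs, n_neighbours=20):
    """Pick the N training speakers closest to a test speaker in AID space.

    test_vec   : (D,) AID feature vector of the test speaker
    train_vecs : (S, D) one AID vector per training speaker
    Returns indices of speakers whose data would be pooled to build or
    adapt the accent-matched acoustic model.
    """
    dists = np.linalg.norm(train_vecs - test_vec, axis=1)
    return np.argsort(dists)[:n_neighbours]
```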
APA, Harvard, Vancouver, ISO, and other styles
9

Fritz, Isabella. "How gesture and speech interact during production and comprehension." Thesis, University of Birmingham, 2018. http://etheses.bham.ac.uk//id/eprint/8084/.

Full text
Abstract:
This thesis investigates the mechanisms that underlie the interaction of gesture and speech during the production and comprehension of language on a temporal and semantic level. The results from the two gesture-speech production experiments provide unambiguous evidence that gestural content is shaped online by the ways in which speakers package information into planning units in speech rather than being influenced by how events are lexicalised. In terms of gesture-speech synchronisation, a meta-analysis of these experiments showed that lexical items which are semantically related to the gesture's content (i.e., semantic affiliates) compete for synchronisation when these affiliates are separated within a sentence. This competition leads to large proportions of gestures not synchronising with any semantic affiliate. These findings demonstrate that gesture onset can be attracted by lexical items that do not co-occur with the gesture. The thesis then tested how listeners process gestures when synchrony is lost and whether preceding discourse related to a gesture's meaning impacts gesture interpretation and processing. Behavioural and ERP results show that gesture interpretation and processing is discourse dependent. Moreover, the ERP experiment demonstrates that when synchronisation between gesture and semantic affiliate is not present the underlying integration processes are different from synchronous gesture-speech combinations.
APA, Harvard, Vancouver, ISO, and other styles
10

Zhang, Li. "A syllable-based, pseudo-articulatory approach to speech recognition." Thesis, University of Birmingham, 2004. http://etheses.bham.ac.uk//id/eprint/4905/.

Full text
Abstract:
The prevailing approach to speech recognition is Hidden Markov Modelling, which yields good performance. However, it ignores phonetics, which has the potential for going beyond the acoustic variance to provide a more abstract underlying representation. The novel approach pursued in this thesis is motivated by phonetic and phonological considerations. It is based on the notion of pseudo-articulatory representations, which are abstract and idealized accounts of articulatory activity. The original work presented here demonstrates the recovery of syllable structure information from pseudo-articulatory representations directly without resorting to statistical models of phone sequences. The work is also original in its use of syllable structures to recover phonemes. This thesis presents the three-stage syllable based, pseudo-articulatory approach in detail. Though it still has problems, this research leads to a more plausible style of automatic speech recognition and will contribute to modelling and understanding speech behaviour. Additionally, it also permits a 'multithreaded' approach combining information from different processes.
APA, Harvard, Vancouver, ISO, and other styles
11

Hubert, Wolfgang. "Getting the most out of DAISY using synthetic speech." Deutsche Zentralbibliothek für Blinde Leipzig (DZB), 2010. https://slub.qucosa.de/id/qucosa%3A1233.

Full text
Abstract:
RTFC is a multichannel publishing tool which has been designed to convert text documents into several accessible formats. You can produce books in common file formats like plain text, HTML and HTML Help as well as large print, Braille, web Braille and DAISY. It implements a standard for e-books which was created at German schools for the blind. This standard makes it possible to get the most out of DAISY even though desktop publishing software often has no capabilities to mark up optional content like annotations or sidebars. Therefore RTFC is especially suitable to convert school books and other non-fiction literature.
APA, Harvard, Vancouver, ISO, and other styles
12

Khalil, G. "Using automatic speech recognition to evaluate Arabic to English transliteration." Thesis, Nottingham Trent University, 2013. http://irep.ntu.ac.uk/id/eprint/92/.

Full text
Abstract:
Increased travel and international communication have led to an increased need for transliteration of Arabic proper names for people, places, technical terms and organisations. There is a variety of available Arabic to English transliteration systems, such as Unicode, the Buckwalter Arabic transliteration, and ArabTeX. The transliteration tables have been developed and used by researchers for many years, but there have been only limited attempts to evaluate and compare different transliteration systems. This thesis investigates whether speech recognition technology could be used to evaluate different Arabic-English transliteration systems. In order to do so there were five main objectives: firstly, to investigate the possibility of using English speech recognition engines to recognize Arabic words; secondly, to establish the possibility of automatic transliteration of diacritised Arabic words for the purpose of creating a vocabulary for the speech recognition engine; thirdly, to explore the possibility of automatically generating transliterations of non-diacritised Arabic words; fourthly, to construct a general method to compare and evaluate different transliterations; and finally, to test the system and use it to experiment with new transliteration ideas. A novel testing method was found to evaluate transliteration rules and an automatic application system has been developed. This method was used to compare five existing transliteration tables: the UN, Qalam, Buckwalter, ArabTeX and Alghamdi tables. From the results of these comparisons, new rules were developed in order to improve transliteration performance; these rules achieved a transliteration performance score of 37.9%, higher than the 19.1% achieved using Alghamdi's table, which was the best performing of the existing transliteration tables tested. Most of the improvement was obtained by changing letter(s)-for-letter(s) transliterations; further improvements were made by more sophisticated rules based on combinations of letters and diacritics. Speech recognition performance is not a direct test of transliteration acceptability, but it does correlate well with human judgement, and offers consistency and repeatability. The issues surrounding the use of English ASR for this application are discussed, as are proposals to further improve transliteration systems.
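A letter(s)-for-letter(s) table of the kind compared here can be applied with a greedy longest-match loop. The sketch below is illustrative: the rule set is a deliberately tiny, hypothetical fragment, not any of the actual tables, and two-character keys would carry the letter-plus-diacritic rules:

```python
# Hypothetical, heavily abridged mapping; the real tables (UN, Qalam,
# Buckwalter, ArabTeX, Alghamdi) differ in exactly these choices.
RULES = {
    "ش": "sh",   # one Arabic letter -> two Latin letters
    "ث": "th",
    "س": "s",
    "ب": "b",
    "ا": "a",
}

def transliterate(word, rules=RULES, max_len=2):
    """Greedy longest-match transliteration of an Arabic string."""
    out, i = [], 0
    while i < len(word):
        for n in range(max_len, 0, -1):      # prefer longer (multi-char) rules
            chunk = word[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            i += 1                           # skip characters with no rule
    return "".join(out)

print(transliterate("شباب"))                 # -> "shbab"
```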
APA, Harvard, Vancouver, ISO, and other styles
13

Herms, Robert. "Effective Speech Features for Cognitive Load Assessment: Classification and Regression." Universitätsverlag Chemnitz, 2018. https://monarch.qucosa.de/id/qucosa%3A33346.

Full text
Abstract:
This thesis investigates the effectiveness of speech features for cognitive load assessment, with particular attention paid to new perspectives in this research area. A new cognitive load database, called CoLoSS, is introduced, containing speech recordings of users who performed a learning task. Various acoustic features from different categories, including prosody, voice quality, and spectrum, are investigated in terms of their relevance. Moreover, Teager energy parameters, which have proven highly successful in stress detection, are introduced for cognitive load assessment, and it is demonstrated how automatic speech recognition technology can be used to extract potential indicators. The suitability of the extracted features is systematically evaluated in recognition experiments with speaker-independent systems designed to discriminate between three levels of load. Additionally, a novel approach to speech-based cognitive load modelling is introduced, whereby the load is represented as a continuous quantity and its prediction can thus be regarded as a regression problem.
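Treating cognitive load as a continuous quantity turns the assessment into a standard regression over acoustic features. A minimal sketch of that framing (random placeholder data; the SVR is chosen for illustration and is not necessarily the regressor used in the thesis):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# X: one row of prosody / voice-quality / spectral features per utterance,
# y: continuous cognitive-load scores. Both are synthetic placeholders here.
rng = np.random.default_rng(0)
X = rng.random((200, 30))
y = rng.random(200)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0))
model.fit(X, y)
print(model.predict(X[:5]))   # predicted load for the first five utterances
```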
APA, Harvard, Vancouver, ISO, and other styles
14

Wang, Costello Jingjing. "Comprehending synthetic speech personal and production influences." Doctoral diss., University of Central Florida, 2011. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5077.

Full text
Abstract:
With the increasing prevalence of voice-production technology across societies, clear comprehension while listening to synthetic speech is an obvious goal. Common human factors influences include the listener's language familiarity and age; production factors include speaking rate and clarity. This study investigated the speech comprehension performance of younger and older adults who learned English as their first or second language. Presentations varied by rate of delivery in words per minute (wpm) and in two forms, synthetic or natural speech. The results showed that younger adults had significantly higher comprehension performance than older adults. English as First Language (EFL) participants performed better than English as Second Language (ESL) participants among both younger and older adults, although the performance gap for the older adults was significantly larger than for younger adults. Younger adults performed significantly better than older adults at the slow speech rate (127 wpm), but surprisingly, at the medium speech rate (188 wpm) both age groups performed similarly. Both younger and older participants had better comprehension when listening to synthetic speech than natural speech. Both theoretical and design implications are provided from these findings, and a cognitive diagnostic tool is proposed as a recommendation for future research.
Thesis (Ph.D.)--University of Central Florida, 2011. Includes bibliographical references (p. 98-104).
Ph.D.
Doctorate
Psychology
Sciences
APA, Harvard, Vancouver, ISO, and other styles
15

Verghese, Susha. "THE SPEECH SITUATION CHECKLIST: A NORMATIVE AND COMPARATIVE INVESTIGATION." Master's thesis, University of Central Florida, 2004. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3862.

Full text
Abstract:
Studies conducted over the past decades have identified the presence of a greater amount of negative emotional reaction and speech disruption in particular speech situations among children who stutter, compared to those who do not (Brutten & Vanryckeghem, 2003b; Knudson, 1939; Meyers, 1986; Trotter, 1983). Laboratory investigations have been utilized to describe the particular situations that elicit the greatest or least amount of speech concern and fluency failure. More recently, in order to deal with the limitations of laboratory research, self-report tests have gained popularity as a means of exploring the extent of negative emotional reaction and speech disruption in a wide array of speaking situations. However, the availability of such instruments for use with children has been limited. Toward this end, the Speech Situation Checklist (SSC) was designed for use with youngsters who do and do not stutter (Brutten, 1965b, 2003b). Past investigations utilizing the SSC for Children have reported reliability and validity information and provided useful normative data (Brutten & Vanryckeghem, 2003b; Trotter, 1983). Additionally, the findings from those studies have consistently revealed statistically significant differences in speech-related negative emotional response and speech disorganization between children who do and do not stutter. However, since its initial construction the SSC has undergone modifications, and the paucity of normative data for the current American form of the SSC has restricted its clinical use. To fill this void, the revised SSC for children was utilized in the present study to obtain current normative and comparative data for American grade-school stuttering and nonstuttering children. Additionally, the effect of age and gender (and their interaction) on the emotional reaction and speech disruption scores of the SSC was examined. The SSC self-report test was administered to 79 nonstuttering and 19 stuttering elementary and middle-school children between the ages of 6 and 13. Only those nonstutterers who showed no evidence of a speech, language, reading, writing or learning difficulty, or any additional motor or behavioral problems, were included in the subject pool. Similarly, only those stuttering participants who did not demonstrate any language or speech disorder other than stuttering were included in the study. Measures of central tendency and variance indicated an overall mean score of 78.26 (SD = 19.34) and 85.69 (SD = 22.25) for the sample of nonstuttering children on the Emotional Reaction and Speech Disruption sections of the SSC, respectively. For the group of stutterers the overall mean for Emotional Reaction was 109.53 (SD = 34.35) and 109.42 (SD = 21.33) for the Speech Disruption section. This difference in group means proved to be statistically significant for both emotional response (t = 3.816, p = .001) and fluency failures (t = 4.169, p = .000), indicating that, as a group, children who stutter report significantly more emotional response to, and fluency failure in, the situations described in the SSC, compared to their fluent peers. Significant high correlations were also obtained between the report of emotional response and the extent of fluency failures in the various speaking situations for both the nonstuttering (.70) and stuttering (.71) children. As far as the effect of age and gender is concerned, the present study found no significant difference in the ER and SD scores between the male and female or the younger and older groups of nonstuttering children. Interestingly, a significant age-by-gender interaction was obtained for the nonstuttering children, but only on the Speech Disruption section of the test.
M.A.
Department of Communicative Disorders
Health and Public Affairs
Communicative Disorders
APA, Harvard, Vancouver, ISO, and other styles
16

Al-Darkazali, Mohammed. "Image processing methods to segment speech spectrograms for word level recognition." Thesis, University of Sussex, 2017. http://sro.sussex.ac.uk/id/eprint/71675/.

Full text
Abstract:
The ultimate goal of automatic speech recognition (ASR) research is to allow a computer to recognize speech in real time, with full accuracy, independent of vocabulary size, noise, speaker characteristics or accent. Today, systems are trained to learn an individual speaker's voice and larger vocabularies statistically, but accuracy is not ideal. A small gap between actual speech and its acoustic representation in the statistical mapping causes a failure to match the acoustic speech signals by Hidden Markov Model (HMM) methods and consequently leads to classification errors. Inevitably, these errors in the low-level recognition stage of ASR produce unavoidable errors at the higher levels. Therefore, additional research ideas need to be incorporated within current speech recognition systems. This study seeks a new perspective on speech recognition. It incorporates a new approach, supporting it with wider previous research, validating it with a lexicon of 533 words and integrating it with a current speech recognition method to overcome the existing limitations. The study focusses on applying image processing to speech spectrogram images (SSIs). We thus develop a new writing system, which we call the Speech-Image Recogniser Code (SIR-CODE). The SIR-CODE refers to the transposition of the speech signal to an artificial domain (the SSI) that allows the classification of the speech signal into segments. The SIR-CODE allows the matching of all speech features (formants, power spectrum, duration, cues of articulation places, etc.) in one process. This was made possible by adding a Realization Layer (RL) on top of the traditional speech recognition layer (based on HMMs) to check all sequential phones of a word in a single-step matching process. The study shows that the method gives better recognition results than HMMs alone, leading to accurate and reliable ASR in noisy environments. Therefore, the addition of the RL for SSI matching is a highly promising solution to compensate for the failure of HMMs in low-level recognition. In addition, the same concept of employing SSIs can be used for whole sentences to reduce classification errors in HMM-based high-level recognition. The SIR-CODE bridges the gap between theory and practice of phoneme recognition by matching SSI patterns at the word level. Thus, it can be adapted for dynamic time warping on the SIR-CODE segments, which can help to achieve ASR based on SSI matching alone.
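The speech spectrogram image (SSI) at the heart of this approach can be produced with standard tools. A minimal sketch, assuming a mono 16-bit recording and illustrative FFT settings (none of this is the thesis's own pipeline):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, audio = wavfile.read("word.wav")            # hypothetical mono input file
f, t, sxx = spectrogram(audio, fs=rate, nperseg=256, noverlap=192)
log_sxx = 10 * np.log10(sxx + 1e-10)              # dB scale, as in typical SSIs

# Flip so low frequencies sit at the bottom, then save as a grey-scale
# image ready for image-processing-based segmentation.
plt.imsave("word_ssi.png", log_sxx[::-1], cmap="gray")
```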
APA, Harvard, Vancouver, ISO, and other styles
17

Hu, Hongwei. "Towards an improved model of dynamics for speech recognition and synthesis." Thesis, University of Birmingham, 2012. http://etheses.bham.ac.uk//id/eprint/3704/.

Full text
Abstract:
This thesis describes the research on the use of non-linear formant trajectories to model speech dynamics under the framework of a multiple-level segmental hidden Markov model (MSHMM). The particular type of intermediate-layer model investigated in this study is based on the 12-dimensional parallel formant synthesiser (PFS) control parameters, which can be directly used to synthesise speech with a formant synthesiser. The non-linear formant trajectories are generated by using the speech parameter generation algorithm proposed by Tokuda and colleagues. The performance of the newly developed non-linear trajectory model of dynamics is tested against the piecewise linear trajectory model in both speech recognition and speech synthesis. In speech synthesis experiments, the 12 PFS control parameters and their time derivatives are used as the feature vectors in the HMM-based text-to-speech system. The human listening test and objective test results show that, despite the low overall quality of the synthetic speech, the non-linear trajectory model of dynamics can significantly improve the intelligibility and naturalness of the synthetic speech. Moreover, the generated non-linear formant trajectories match actual formant trajectories in real human speech fairly well. The N-best list rescoring paradigm is employed for the speech recognition experiments. Both context-independent and context-dependent MSHMMs, based on different formant-to-acoustic mapping schemes, are used to rescore an N-best list. The rescoring results show that the introduction of the non-linear trajectory model of formant dynamics results in statistically significant improvement under certain mapping schemes. In addition, the smoothing in the non-linear formant trajectories has been shown to be able to account for contextual effects such as coarticulation.
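The N-best rescoring paradigm can be summarised in a few lines: interpolate the first-pass score with the second model's score and re-rank. A generic sketch, where `segmental_score` stands in for the MSHMM formant-trajectory likelihood and the interpolation weight is a tuning assumption:

```python
def rescore_nbest(hypotheses, segmental_score, weight=0.3):
    """Re-rank an N-best list with an additional model score.

    hypotheses      : list of (text, baseline_log_score) from the first pass
    segmental_score : callable mapping text -> log score under the new model
    weight          : interpolation weight for the new model
    """
    rescored = [(text, (1 - weight) * base + weight * segmental_score(text))
                for text, base in hypotheses]
    return max(rescored, key=lambda h: h[1])    # best hypothesis after rescoring
```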
APA, Harvard, Vancouver, ISO, and other styles
18

Hmad, N. F. "Deep neural network acoustic models for multi-dialect Arabic speech recognition." Thesis, Nottingham Trent University, 2015. http://irep.ntu.ac.uk/id/eprint/27934/.

Full text
Abstract:
Speech is a desirable communication method between humans and computers. The major concerns of automatic speech recognition (ASR) are determining a set of classification features and finding a suitable recognition model for these features. Hidden Markov Models (HMMs) have been demonstrated to be powerful models for representing time-varying signals. Artificial Neural Networks (ANNs) have also been widely used for representing time-varying quasi-stationary signals. Arabic is one of the oldest living languages and one of the oldest Semitic languages in the world; it is also the fifth most widely used language and the mother tongue of roughly 200 million people. Arabic speech recognition has been a fertile area of research over the previous two decades, as attested by the various papers that have been published on this subject. This thesis investigates phoneme and acoustic models based on Deep Neural Networks (DNNs) and deep Echo State Networks for multi-dialect Arabic speech recognition. The TIMIT corpus, with its wide variety of American dialects, is also used to evaluate the proposed models. The availability of speech data that is time-aligned and labelled at the phonemic level is a fundamental requirement for building speech recognition systems. A developed Arabic phoneme database (APD) was manually time-aligned and phonetically labelled. This dataset was constructed from the King Abdul-Aziz Arabic Phonetics Database (KAPD) for the Saudi Arabian dialect and the Centre for Spoken Language Understanding (CSLU2002) database for different Arabic dialects. This dataset covers 8148 Arabic phonemes. In addition, a corpus of 120 speakers (13 hours of Arabic speech) randomly selected from the Levantine Arabic dialect database was used for training and 24 speakers (2.4 hours) for testing; these were revised and transcription errors manually corrected. The selected dataset is labelled automatically using the HTK Hidden Markov Model toolkit. The TIMIT corpus is also used for the phone recognition and acoustic modelling task, with 462 speakers (3.14 hours) for training and 24 speakers (0.81 hours) for testing. For ASR, a Deep Neural Network (DNN) is used to evaluate its adoption in developing a framewise phoneme recognition and acoustic modelling system for Arabic speech recognition. Restricted Boltzmann Machine (RBM) DNN models had not previously been explored for any Arabic corpora, which allows us to claim priority for adopting this RBM DNN model for Levantine Arabic acoustic models. A post-processing enhancement was also applied to the DNN acoustic model outputs in order to improve the recognition accuracy and to obtain the accuracy at the phoneme level instead of the frame level; this post-processing significantly improved recognition performance. An Echo State Network (ESN) is developed and evaluated for Arabic phoneme recognition with different learning algorithms, investigating the use of the conventional ESN trained with supervised and forced learning algorithms. A novel combined supervised/forced supervised learning algorithm (unsupervised adaptation) was developed and tested on the proposed optimised Arabic phoneme recognition datasets. This new model is evaluated on the Levantine dataset and empirically compared with the results obtained from the baseline Deep Neural Networks (DNNs). A significant improvement in recognition performance was achieved with the ESN model compared to the baseline RBM DNN model. The results show that the ESN model has a better ability to recognise phoneme sequences than the DNN model on a small-vocabulary dataset. The adoption of ESN models for acoustic modelling appears more valid than the adoption of DNN models, as ESNs are recurrent models and are expected to support sequence modelling better than RBM DNN models, even with a contextual input window. The TIMIT corpus is also used to investigate deep learning for framewise phoneme classification and acoustic modelling using DNNs and ESNs, allowing a direct and valid comparison between the systems investigated in this thesis and published work on framewise phoneme recognition using TIMIT. Our main finding on this corpus is that ESN networks outperform time-windowed RBM DNN ones. However, our ESN-based system shows 10% lower performance than other systems recently reported in the literature on the same corpus. This is due to hardware availability and to not applying speaker and noise adaptation, which could improve the results; the aim in this thesis is to investigate the proposed models for speech recognition and to make a direct comparison between them.
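For readers unfamiliar with echo state networks, the core mechanics (a fixed random reservoir driven by the input, plus a trained linear readout) can be sketched compactly; this is a generic minimal ESN, not the thesis's architecture or its combined learning algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

class MinimalESN:
    """Tiny echo state network: fixed random reservoir, ridge-trained readout."""

    def __init__(self, n_in, n_res=300, spectral_radius=0.9):
        self.w_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        w = rng.uniform(-0.5, 0.5, (n_res, n_res))
        self.w = w * (spectral_radius / np.max(np.abs(np.linalg.eigvals(w))))

    def states(self, inputs):                      # inputs: (T, n_in)
        x, out = np.zeros(self.w.shape[0]), []
        for u in inputs:
            x = np.tanh(self.w_in @ u + self.w @ x)
            out.append(x.copy())
        return np.array(out)

    def fit_readout(self, inputs, targets, ridge=1e-6):
        s = self.states(inputs)                    # targets: (T, n_classes) one-hot
        self.w_out = np.linalg.solve(s.T @ s + ridge * np.eye(s.shape[1]),
                                     s.T @ targets)

    def predict(self, inputs):
        return self.states(inputs) @ self.w_out    # framewise class scores
```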
APA, Harvard, Vancouver, ISO, and other styles
19

Welsh, Mackenzie. "A Systematic Examination of Practice Amount in Childhood Apraxia of Speech (CAS) Treatment Using an Integral Stimulation Approach." Master's thesis, Temple University Libraries, 2017. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/468695.

Full text
Abstract:
Communication Sciences
M.A.
The purpose of this study was to examine how a critical principle of motor learning, practice amount (a high number of trials versus a low number of trials), affects speech motor learning in childhood apraxia of speech (CAS). It also sought to contribute to the literature base regarding the use of an integral stimulation approach for these children. Currently, a limited evidence base exists for decision-making regarding practice amount in CAS treatment. Using a single-case experimental design with two participants, three target sets of utterances (High Amount, Low Amount, and Control) received different amounts of treatment, and outcomes were compared in terms of retention. Targets were scored for perceptual (prosodic and segmental) accuracy, and effect sizes were computed to quantify the extent of treatment effects. For both participants, results show some evidence suggesting that a higher amount of practice is advantageous and leads to greater learning. A low amount of treatment did not show clear differences compared to not receiving any treatment. Caution should be taken when interpreting these findings due to the small sample size and modest effects. Results suggest that the integral stimulation approach may only be effective if provided with a significantly high amount of practice. Further research is needed to examine how the principles of motor learning and the integral stimulation approach should be sensibly and systematically applied to promote the best outcomes for this population.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
20

Keerio, Ayaz. "Acoustic analysis of Sindhi speech : a pre-curser for an ASR system." Thesis, University of Sussex, 2011. http://sro.sussex.ac.uk/id/eprint/6325/.

Full text
Abstract:
The functional and formative properties of speech sounds are usually referred to as acoustic-phonetics in linguistics. This research aims to demonstrate the acoustic-phonetic features of the elemental sounds of Sindhi, a branch of the Indo-European family of languages mainly spoken in the Sindh province of Pakistan and in some parts of India. In addition to the available articulatory-phonetic knowledge, acoustic-phonetic knowledge has been classified for the identification and classification of Sindhi language sounds. Determining the acoustic features of the language sounds helps to bring together sounds with similar acoustic characteristics under one natural class of meaningful phonemes. The obtained acoustic features and corresponding statistical results for a particular natural class of phonemes provide a clear understanding of the meaningful phonemes of Sindhi and also help to eliminate redundant sounds present in the inventory. At present Sindhi includes nine redundant, three interchanging, three substituting, and three confused pairs of consonant sounds. Some of the unique acoustic-phonetic features of Sindhi highlighted in this study are the acoustic features of its large number of contrastive voiced implosives and the acoustic impact of the language's flexibility in terms of the insertion and elision of short vowels in the utterance. In addition, the issue of the presence of the affricate class of sounds and the diphthongs in Sindhi is addressed. The compilation of the meaningful phoneme set by learning the sounds' acoustic-phonetic features serves one of the major goals of this study, because twelve sounds of Sindhi are studied that are not yet part of the language alphabet. The main acoustic features learned for the phonological structures of Sindhi are the fundamental frequency, formants, and duration, along with analysis of the obtained acoustic waveforms, formant tracks and computer-generated spectrograms. The impetus for this research comes from the fact that detailed knowledge of the sound characteristics of the language elements has a broad variety of applications, from developing accurate synthetic speech production systems to modelling robust speaker-independent speech recognisers. The major research achievements and contributions of this study include: the compilation and classification of the elemental sounds of Sindhi; comprehensive measurement of the acoustic features of the language sounds, suitable for incorporation into the design of a Sindhi ASR system; an understanding of the dialect-specific acoustic variation of the elemental sounds of Sindhi; a speech database comprising voice samples of native Sindhi speakers; identification of the language's redundant, substituting and interchanging pairs of sounds; and identification of the sounds that can potentially lead to segmentation and recognition errors in a Sindhi ASR system. These achievements create the fundamental building blocks for future work to design a state-of-the-art prototype: a gender- and environment-independent, continuous, conversational ASR system for Sindhi.
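At least two of the acoustic measurements named above, fundamental frequency and duration, are easy to reproduce with common tools. A minimal sketch assuming librosa and a hypothetical mono clip of one sound (formant tracking requires LPC analysis or a tool such as Praat and is omitted):

```python
import librosa

y, sr = librosa.load("phoneme.wav", sr=None)      # hypothetical segment recording
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)     # fundamental frequency track
duration_ms = 1000 * len(y) / sr                  # segment duration

print(f"mean F0: {f0.mean():.1f} Hz, duration: {duration_ms:.0f} ms")
```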
APA, Harvard, Vancouver, ISO, and other styles
21

Samsudin, Nur Hana. "A study on reusing resources of speech synthesis for closely-related languages." Thesis, University of Birmingham, 2017. http://etheses.bham.ac.uk//id/eprint/7783/.

Full text
Abstract:
This thesis describes research on building a text-to-speech (TTS) framework that can accommodate the lack of linguistic information for under-resourced languages by using existing resources from another language. It describes the adaptation process required when such limited resources are used. The main natural languages involved in this research are Malay and Iban. The thesis includes a study on grapheme-to-phoneme mapping and the substitution of phonemes, and presents a set of substitution matrices showing phoneme confusion in terms of perception among respondents. The experiments conducted study intelligibility as well as perception based on the context of utterances. A study of phonetic prosody is then presented and compared to the Klatt duration model, to establish whether a similar cross-language duration model exists. Then a comparative study of native Iban speech against an Iban polyglot TTS built from Malay resources is presented, to confirm that Malay prosody can be used to generate Iban synthesised speech. The central hypothesis of this thesis is that by using a closely-related language resource, natural-sounding speech can be produced. The aim of this research was to show that by sticking to the indigenous language characteristics, it is possible to build a polyglot synthesised speech system even with insufficient speech resources.
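The phoneme substitution driven by the perception matrices can be pictured as a simple lookup applied before synthesis. The table below is hypothetical, purely to show the shape of the idea:

```python
# Hypothetical fragment: phonemes missing from the resource language are
# replaced by their closest perceptual substitutes, as suggested by the
# kind of substitution matrices the thesis derives.
SUBSTITUTIONS = {
    "f": "p",
    "v": "b",
    "z": "s",
}

def adapt_phonemes(phoneme_seq, table=SUBSTITUTIONS):
    """Replace unavailable phonemes before handing the sequence to the TTS."""
    return [table.get(p, p) for p in phoneme_seq]

print(adapt_phonemes(["v", "o", "k", "a", "l"]))   # -> ['b', 'o', 'k', 'a', 'l']
```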
APA, Harvard, Vancouver, ISO, and other styles
22

Donaher, Joseph Gerard. "SPEECH FLUENCY DEMONSTRATED BY CHILDREN WITH TOURETTE SYNDROME." Diss., Temple University Libraries, 2008. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/7333.

Full text
Abstract:
Communication Sciences
Ph.D.
Children with Tourette Syndrome (CWTS) frequently exhibit a high prevalence of disfluent speech behaviors which are often labeled stuttering. The present study analyzed the fluency characteristics of CWTS, in comparison to children who stutter (CWS) and typically developing peers (TDP). It was predicted that CWTS would be less fluent than TDP but more fluent than CWS. A related purpose was to explore whether differences existed in the pattern of disfluencies demonstrated by these groups. To this end, it was predicted that CWTS would demonstrate significantly lower proportions of stuttering-like disfluencies than CWS and significantly higher proportions of stuttering-like disfluencies than TDP. Participants included eight CWTS, eight CWS and eight TDP. Speech samples, collected during a narrative story telling task, were analyzed to determine whether significant differences in the type and frequency of disfluencies were evident between the groups. Results revealed that CWTS were significantly more fluent than CWS and that CWTS produced significantly lower proportions of stuttering-like disfluencies than CWS. Although not statistically significant, CWTS were twice as disfluent as TDP and CWTS produced significantly higher proportions of stuttering-like disfluencies than TDP. These findings confirmed that CWTS present with an atypical disfluency pattern which can be differentiated from that of CWS and TDP based on the total disfluency level and the proportion of stuttering-like disfluencies.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
23

Stemmer, Georg. "Modeling variability in speech recognition /." Berlin : Logos-Verl, 2005. http://deposit.ddb.de/cgi-bin/dokserv?id=2659313&prov=M&dok_var=1&dok_ext=htm.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Strange, John. "VOICE AUTHENTICATION: A STUDY OF POLYNOMIAL REPRESENTATION OF SPEECH SIGNALS." Master's thesis, University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4015.

Full text
Abstract:
A subset of speech recognition is the use of speech recognition techniques for voice authentication. Voice authentication is an alternative security application to other biometric security measures such as the use of fingerprints or iris scans. Voice authentication has an advantage over the other biometric measures in that it can be utilized remotely, via a device like a telephone. However, it has disadvantages in that the authentication system typically requires larger memory and more processing time than do fingerprint or iris scanning systems. Also, voice authentication research has yet to provide an authentication system as reliable as the other biometric measures. Most voice recognition systems use Hidden Markov Models (HMMs) as their basic probabilistic framework, and most use a frame-based approach to analyze the voice features. An example of research which has been shown to provide more accurate results is the use of a segment-based model. The HMMs impose a requirement that each frame is conditionally independent of the next. However, at a fixed frame rate, typically 10 ms, adjacent feature vectors might span the same phonetic segment; they often exhibit smooth dynamics and are highly correlated, while the relationship between features of different phonetic segments is much weaker. Therefore, the segment-based approach makes fewer conditional independence assumptions, and these are also violated to a lesser degree than in the frame-based approach. Thus, HMMs using segment-based approaches are more accurate. The speech polynomials (feature vectors) used in the segmental model have been shown to be Chebychev polynomials. Use of the properties of these polynomials has made it possible to reduce the computation time for speech recognition systems. Also, representing the spoken word waveform as a Chebychev polynomial allows the recognition system to easily extract useful and repeatable features from the waveform, allowing for a more accurate identification of the speaker. This thesis describes the segmental approach to speech recognition and addresses in detail the use of Chebychev polynomials in the representation of spoken words, specifically in the area of speaker recognition.
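Representing a segment's feature trajectory by low-order Chebyshev coefficients can be sketched with NumPy's polynomial module (synthetic data; the polynomial degree is an illustrative choice):

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Hypothetical trajectory: 40 frames of one feature across a phonetic segment.
t = np.linspace(-1, 1, 40)                 # Chebyshev domain is [-1, 1]
traj = np.sin(2.2 * t) + 0.05 * np.random.default_rng(0).normal(size=40)

coeffs = C.chebfit(t, traj, deg=4)         # compact, repeatable descriptor
reconstruction = C.chebval(t, coeffs)      # smooth fit back in the time domain

print(np.round(coeffs, 3))                 # low-order coefficients as features
```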
M.S.
Department of Mathematics
Arts and Sciences
Mathematics
APA, Harvard, Vancouver, ISO, and other styles
25

Kovacs, Nicolette. "TREATMENT OF CHILDHOOD APRAXIA OF SPEECH: A SINGLE-CASE EXPERIMENTAL DESIGN STUDY OF INTENSITY OF TREATMENT." Master's thesis, Temple University Libraries, 2017. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/462591.

Full text
Abstract:
Communication Sciences
M.A.
Childhood Apraxia of Speech (CAS) is a pediatric motor-speech disorder that has been controversial because it is difficult to diagnose and progress in its treatment has been limited. The purpose of the present study was to examine a principle of motor learning (PML) within the context of an evidence-based treatment for this disorder, as a way to improve outcomes for children with CAS. In particular, this study examines the role of intensity, specifically massed versus distributed practice, when treating CAS using a modified form of Dynamic Temporal Tactile Cueing (DTTC; Strand et al., 2006). Two participants with CAS between the ages of 5 and 11 received massed and distributed practice on individualized targets in a single-case alternating treatments design with multiple baselines. Accuracy of speech targets on probe tasks was judged by blinded listeners. Results were interpreted through inspection of graphs and calculation of effect sizes. The results of the study showed that massed practice had a marginal benefit over distributed practice. Implications from this study suggest the importance of continued research examining the role of PML in CAS treatment and the value of using a massed-treatment approach when treating CAS.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
26

Moranski, Kara. "Spanish Native-Speaker Perception of Accentedness in Learner Speech." Diss., Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/185727.

Full text
Abstract:
Spanish
Ph.D.
Building upon current research in native-speaker (NS) perception of L2 learner phonology (Zielinski, 2008; Derwing & Munro, 2009), the present investigation analyzed multiple dimensions of NS speech perception in order to achieve a more complete understanding of the specific linguistic elements and attitudinal variables that contribute to perceptions of accent in learner speech. In this mixed-methods study, Spanish monolinguals (n = 18) provided information regarding their views of L1 American English (AE) speakers learning Spanish and also evaluated the extemporaneous production of L2 learners from this same population. The evaluators' preconceived attitudinal notions of L1 AE speakers learning Spanish negatively correlated with numerical accentedness ratings for the speech samples, indicating that evaluators with more positive perceptions of the learners rated their speech as less accented. Following initial numerical ratings, evaluators provided detailed commentary on the individual phonological elements from each utterance that they perceived as "nonnative." Results show that differences in the relative salience of the nonnative segmental productions correspond with certain phonetic and phonemic processes occurring within the sounds, such as aspiration, spirantization and lateralization.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
27

Alverio, Gustavo. "DISCUSSION ON EFFECTIVE RESTORATION OF ORAL SPEECH USING VOICE CONVERSION TECHNIQUES BASED ON GAUSSIAN MIXTURE MODELING." Master's thesis, University of Central Florida, 2007. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2909.

Full text
Abstract:
Today's world offers many ways to communicate information, and one of the most effective is speech. Unfortunately, many people lose the ability to converse, which in turn has a large negative psychological impact; in addition, skills such as lecturing and singing must then be restored via other methods. Text-to-speech synthesis, which converts text into speech, has been a popular means of restoring the capability to use oral speech. Although text-to-speech systems are useful, they offer only a few default voice selections that do not represent the voice of the user. In order to achieve total restoration, voice conversion must be introduced. Voice conversion is a method that adjusts a source voice to sound like a target voice. It consists of a training process and a converting process. Training is conducted by composing a speech corpus, encompassing a variety of speech sounds, to be spoken by both the source and the target voice. Once training is finished, the conversion function is employed to transform the source voice into the target voice. Effectively, voice conversion allows a speaker to sound like any other person, so it can be applied to alter the voice output of a text-to-speech system to produce the target voice. The thesis investigates how one approach, specifically voice conversion using Gaussian mixture modeling, can be applied to alter the voice output of a text-to-speech synthesis system. Researchers found that acceptable results can be obtained from these methods. Although voice conversion and text-to-speech synthesis are effective in restoring voice, a sample of the speaker's voice from before voice loss must be used during the training process; it is therefore vital that voice samples are recorded to guard against voice loss.
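The joint-density GMM mapping that underlies this style of voice conversion can be sketched briefly. The following is a minimal illustration, assuming time-aligned parallel source/target feature matrices, rather than the thesis's exact implementation: one GMM is fitted on stacked source-target frames, and conversion is the posterior-weighted conditional mean E[y|x].

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def train_joint_gmm(X, Y, n_components=4):
    """Fit one GMM on stacked source/target frames Z = [x, y]."""
    return GaussianMixture(n_components=n_components, covariance_type="full",
                           random_state=0).fit(np.hstack([X, Y]))

def convert(gmm, X, d):
    """Map source frames to the target space via the posterior-weighted
    conditional mean E[y|x] of each mixture component."""
    K = gmm.n_components
    # Responsibilities under the marginal GMM over the source dimensions.
    like = np.stack([gmm.weights_[k] * multivariate_normal.pdf(
                         X, gmm.means_[k, :d], gmm.covariances_[k][:d, :d])
                     for k in range(K)], axis=1)
    post = like / like.sum(axis=1, keepdims=True)
    Y_hat = np.zeros_like(X)
    for k in range(K):
        mu_x, mu_y = gmm.means_[k, :d], gmm.means_[k, d:]
        Sxx_inv = np.linalg.inv(gmm.covariances_[k][:d, :d])
        Syx = gmm.covariances_[k][d:, :d]
        # E[y|x] = mu_y + Syx Sxx^{-1} (x - mu_x), in row-vector form.
        Y_hat += post[:, [k]] * (mu_y + (X - mu_x) @ Sxx_inv @ Syx.T)
    return Y_hat

# Toy parallel corpus: the "target voice" is a warped version of the source.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (500, 2))
Y = 0.8 * X + 1.0 + 0.05 * rng.normal(size=X.shape)
gmm = train_joint_gmm(X, Y)
print(float(np.abs(convert(gmm, X, d=2) - Y).mean()))  # small residual error
```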
M.S.E.E.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering MSEE
APA, Harvard, Vancouver, ISO, and other styles
28

Kaplan, Leah. "The Meaning of Being in Speech: Language, Narrative, and Thought." Honors in the Major Thesis, University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/952.

Full text
Abstract:
In this thesis I will follow the works of Jacques Derrida and Hans-Georg Gadamer, reconciling both thinkers by providing a reflection on the necessary and foundational conditions for the experience of meaning. A reflection on Jacques Derrida's formulations on différance, trace, absence, presence, clôture, and hospitality, alongside Gadamer's critical hermeneutics on the aesthetics of play and interpretation, will open up this tension and provide a new relation for the possibility of meaning. By reconciling these two philosophers it will become apparent that the Self-Other relationship, the activity of difference, and the trace all condition a space for heterogeneity within linguistic, hermeneutic, and narrative meaning. It is my case here that we must submit to the multiplicity of identities of meaning in language and reformulate the idea of meaning as a development that emerges not from a radically subjective consciousness, but is constituted by absence, history as trace, and most importantly the 'Other.'
B.A.
Bachelors
Arts and Humanities
Philosophy
APA, Harvard, Vancouver, ISO, and other styles
29

Goldenberg, Rebecca. "EFFECTS OF CONVERSATIONAL GROUP TREATMENT ON PATIENT-REPORTED OUTCOME MEASURES OF COMMUNICATION AND SOCIAL ISOLATION." Master's thesis, Temple University Libraries, 2018. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/493605.

Full text
Abstract:
Public Health
M.A.
Individuals with aphasia (IWA) experience deficits in language and communication as well as loss of social networks and decreased social participation. The purpose of the present study was to build on previous research and design a randomized controlled study that measures the direct effects of conversational group treatment on language and social isolation from the perspective of the individual with aphasia. Group conversational treatment was administered for one hour, twice weekly, for 10 weeks. Thirty-two IWA were randomly assigned to a treatment group or a delay control group. All participants were administered a battery of standardized measures of language and communication and two patient-reported outcome measures (PROs): the Lubben Social Network Scale (Lubben) and the Aphasia Communication Outcome Measure (adaptive ACOM), administered at pre-treatment, post-treatment, and six weeks post-treatment. The ACOM specifically measures the effects of aphasia on everyday communication tasks and changes in language and communication. The Lubben determines outcomes related to social isolation and perceived social support from family and friends. Significant changes were found on the ACOM for IWA in the treatment group from pre-treatment to post-treatment and from pre-treatment to maintenance. No significant changes were found for the control group. For the Lubben, no significant changes were found for IWA in the treatment group or the control group from pre-treatment to post-treatment or from pre-treatment to maintenance. The results from this study indicate that conversational group treatment was effective in increasing self-perceived language and communication abilities in individuals with aphasia. As IWA feel they can communicate effectively, this can increase group participation and communication with friends and family, and facilitate a return to pre-stroke activities.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
30

Rodriguez, Victoria. "LOOKING BIAS: AN EXAMINATION OF THE RELATIONSHIP BETWEEN VISUAL SEARCH AND PRENOMINAL ADJECTIVE ORDER IN ENGLISH." Master's thesis, Temple University Libraries, 2017. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/443754.

Full text
Abstract:
Public Health
M.A.
The aim of the present study was to determine the relationship between the order of analysis of objects within the visual system and prenominal adjective ordering rules in English, as past syntactic and semantic theories have proven insufficient to explain the phenomenon in its entirety. Three experiments were designed to investigate whether ordering preferences, when multiple adjectives are stacked before a noun, are determined by properties of the visual system that subsequently map directly onto language via the semantic system. First, an experimental protocol was designed to discover whether participants' visual search pattern varied based on the type of stimuli presented. A second experiment was created to determine whether participants observed features of objects in an order that corresponded to grammatical adjective ordering rules in English. A third and final experiment was devised to explore whether inversions of adjective categories typically positioned closer to the noun were more acceptable than inversions of adjective categories placed further away from the noun, or vice versa. Eye-tracking data were analyzed for scan sequence (Experiments 1 and 2), and acceptability judgments were obtained using a 7-point Likert scale survey (Experiment 3). Results showed that participants did not systematically vary their scan patterns based on image type, and showed a greater propensity not to fixate when presented with shapes. Data from the second experiment demonstrated that participants viewed objects in an order that was correlated with prenominal adjective ordering, with varying levels of significance. Acceptability judgments from the third experiment indicated that inversions of adjective classes that are typically placed closer to the noun were generally more acceptable than inversions of adjective classes typically placed further from the noun. This study provides preliminary evidence that language rules may be derived from properties of the visual system and cognition. Further research is necessary to explore the nature and extent of correlations between perception, the semantic system, and grammatical features of language.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
31

Leach, Corinne. "MANIPULATING TEMPORAL COMPONENTS DURING SINGLE-WORD PROCESSING TO FACILITATE ACCESS TO STORED ORTHOGRAPHIC REPRESENTATIONS IN LETTER-BY-LETTER READERS." Master's thesis, Temple University Libraries, 2019. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/574233.

Full text
Abstract:
Public Health
M.A.
This study investigated the benefits of rapid presentation of written words as a treatment strategy to enhance reading speed and accuracy in two participants with acquired alexia who are letter-by-letter readers. Previous studies of pure alexia have shown that when words are rapidly presented, participants can accurately perform lexical decision and category judgment tasks, yet they are unable to read the words aloud. These studies suggest that rapid presentation of words could be used as a treatment technique to promote whole-word reading. It was predicted that treatment utilizing rapid presentation (250/500 ms) would increase reading speed and accuracy for both trained and untrained words compared to words trained with standard presentation (5000 ms). A single-subject ABACA/ACABA multiple-baseline treatment design was used. Treatment was provided twice per week for four weeks for both rapid- and standard-presentation treatment. Each session comprised a spoken-to-written word decision task and a semantic category judgment task. Stimuli included 80 trained words divided between the two treatments and 20 untrained controls. Weekly probes to assess reading accuracy were administered after every two treatment sessions. Based on effect sizes, results showed no consistent, unambiguous benefit for rapid or standard presentation treatment. However, possible generalization to untrained words due to rapid presentation treatment was observed. Future research is warranted to investigate the effectiveness of rapid presentation treatment in letter-by-letter readers.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
32

Hanani, Abualseoud. "Human and computer recognition of regional accents and ethnic groups from British English speech." Thesis, University of Birmingham, 2012. http://etheses.bham.ac.uk//id/eprint/3279/.

Full text
Abstract:
The paralinguistic information in a speech signal includes clues to the geographical and social background of the speaker. This thesis is concerned with automatic extraction of this information from a short segment of speech. A state-of-the-art Language Identification (ID) system, obtained by fusing variants of Gaussian mixture models and support vector machines, is developed and evaluated on the NIST 2003 and 2005 Language Recognition Evaluation (LRE) tasks. This system is applied to the problems of regional accent recognition for British English, and ethnic group recognition within a particular accent. We compare the results with human performance and, for accent recognition, the 'text dependent' ACCDIST accent recognition measure. For the fourteen regional accents of British English in the ABI-1 corpus (good quality read speech), our language ID system achieves a recognition accuracy of 86.4%, compared with 95.18% for our best ACCDIST-based system and 58.24% for human listeners. The "Voices across Birmingham" corpus contains significant amounts of telephone conversational speech for the two largest ethnic groups in the city of Birmingham (UK), namely the 'Asian' and 'White' communities. Our language ID system distinguishes between these two groups with an accuracy of 94.3% compared with 90.24% for human listeners. Although direct comparison is difficult, it seems that our language ID system performs much better on the standard twelve-class NIST 2003 Language Recognition Evaluation task or the two-class ethnic group recognition task than on the fourteen-class regional accent recognition task. We conclude that automatic accent recognition is a challenging task for speech technology, and that the use of natural conversational speech may be advantageous for these types of paralinguistic task. One issue with conventional approaches to language ID that use high-order Gaussian Mixture Models (GMMs) and high-dimensional feature vectors is the amount of computing power that they require. Currently, multi-core Graphics Processing Units (GPUs) provide a possible solution at very little cost. In this thesis we also explore the application of GPUs to speech signal and pattern processing, using language ID as a vehicle to demonstrate their benefits. Realisation of the full potential of GPUs requires both effective coding of predetermined algorithms, and, in cases where there is a choice, selection of the algorithm or technique for a specific function that is most able to exploit the properties of the GPU. We demonstrate these principles using the NIST LRE 2003 task, which involves processing over 600 hours of speech. We focus on two parts of the system, namely the acoustic classifier, which is based on a 2048-component GMM, and the acoustic feature extraction process. In the case of the latter we compare a conventional FFT-based analysis with an FIR filter bank, both in terms of their ability to exploit the GPU architecture and language ID performance. With no increase in error rate, our GPU-based system, with an FIR-based front-end, completes the full NIST LRE 2003 task in 16 hours, compared with 180 hours for the more conventional FFT-based system on a standard CPU (a speed-up factor of more than 11).
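The GMM backbone of such a language/accent ID system is straightforward to sketch. The snippet below is an illustrative stand-in that omits the SVM fusion and GPU acceleration discussed in the thesis: one diagonal-covariance GMM is trained per class on acoustic feature frames, and a test utterance is assigned to the class whose GMM gives the highest average frame log-likelihood.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(class_frames, n_components=64):
    """class_frames: dict mapping class label -> (num_frames, dim) array."""
    return {label: GaussianMixture(n_components=n_components,
                                   covariance_type="diag",
                                   random_state=0).fit(frames)
            for label, frames in class_frames.items()}

def classify(gmms, test_frames):
    """Return the class whose GMM gives the highest mean log-likelihood."""
    scores = {label: gmm.score(test_frames) for label, gmm in gmms.items()}
    return max(scores, key=scores.get)

# Toy usage with random stand-ins for MFCC-like acoustic feature frames.
rng = np.random.default_rng(0)
data = {"accent_a": rng.normal(0.0, 1.0, (500, 13)),
        "accent_b": rng.normal(0.5, 1.0, (500, 13))}
gmms = train_class_gmms(data, n_components=4)
print(classify(gmms, rng.normal(0.5, 1.0, (200, 13))))  # likely "accent_b"
```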
APA, Harvard, Vancouver, ISO, and other styles
33

Bai, Linxue. "Speech analysis using very low-dimensional bottleneck features and phone-class dependent neural networks." Thesis, University of Birmingham, 2018. http://etheses.bham.ac.uk//id/eprint/8137/.

Full text
Abstract:
The first part of this thesis focuses on very low-dimensional bottleneck features (BNFs), extracted from deep neural networks (DNNs) for speech analysis and recognition. Very low-dimensional BNFs are analysed in terms of their capability of representing speech and their suitability for modelling speech dynamics. Nine-dimensional BNFs obtained from a phone discrimination DNN are shown to give comparable phone recognition accuracy to 39-dimensional MFCCs, and an average of 34% higher phone recognition accuracy than formant-based features of the same dimensions. They also preserve the trajectory continuity well and thus hold promise for modelling speech dynamics. Visualisations and interpretations of the BNFs are presented, with phonetically motivated studies of the strategies that DNNs employ to create these features. The relationships between BNF representations resulting from different initialisations of DNNs are explored. The second part of this thesis considers BNFs from the perspective of feature extraction. It is motivated by the observation that different types of speech sounds lend themselves to different acoustic analysis, and that the mapping from spectra-in-context to phone posterior probabilities implemented by the DNN is a continuous approximation to a discontinuous function. This suggests that it may be advantageous to replace the single DNN with a set of phone class dependent DNNs. In this case, the appropriate mathematical structure is a manifold. It is shown that this approach leads to significant improvements in frame level phone classification accuracy.
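A bottleneck feature extractor of this kind is easy to sketch. The PyTorch snippet below is a hypothetical architecture, not the thesis's exact network: a phone-discrimination DNN whose narrow 9-unit layer yields the BNFs once training is complete.

```python
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    """Phone classifier with a very low-dimensional bottleneck layer;
    after training, the bottleneck activations serve as 9-dim BNFs."""
    def __init__(self, in_dim=39 * 11, n_phones=40, bn_dim=9):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
        )
        self.bottleneck = nn.Linear(1024, bn_dim)  # the 9-dim BNF layer
        self.classifier = nn.Sequential(
            nn.Linear(bn_dim, 1024), nn.Sigmoid(),
            nn.Linear(1024, n_phones),             # phone posterior logits
        )

    def forward(self, x):
        bnf = self.bottleneck(self.encoder(x))
        return self.classifier(bnf), bnf

# Extract BNFs for a batch of spliced spectral frames (random stand-ins).
model = BottleneckDNN()
logits, bnfs = model(torch.randn(8, 39 * 11))
print(bnfs.shape)  # torch.Size([8, 9])
```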
APA, Harvard, Vancouver, ISO, and other styles
34

Steinmeier, Ralf, Stephan B. Sobottka, Gilfe Reiss, Jan Bredow, Johannes Gerber, and Gabriele Schackert. "Surgery of Low-Grade Gliomas Near Speech-Eloquent Regions: Brainmapping versus Preoperative Functional Imaging." Karger, 2002. https://tud.qucosa.de/id/qucosa%3A27614.

Full text
Abstract:
The identification of eloquent areas is of utmost importance in the surgery of tumors located near speech-eloquent brain areas, since the classical concept of a constant localization was proven to be untrue and the spatial localization of these areas may show large interindividual differences. Some neurosurgical centers apply intraoperative electrophysiological methods that, however, necessitate the performance of surgery in the awake patient. This might be a severe burden both for the patient and the operating team in a procedure that lasts several hours; in addition, electrical stimulation may generate epileptic seizures. Alternatively, methods of functional brain imaging (e.g., PET, fMRI, MEG) may be applied, which allow individual localization of speech-eloquent areas. Matching of these image data with a conventional 3D-CT or MRI now allows the exact transfer of this information into the surgical field by neuronavigation. Whereas standards concerning electrophysiological stimulation techniques that could prevent a permanent postoperative worsening of language are available, until now it remains unclear whether the resection of regions shown to be active in functional brain imaging will cause a permanent postoperative deficit.
This article is freely accessible with the consent of the rights holder on the basis of a (DFG-funded) Alliance or National Licence.
APA, Harvard, Vancouver, ISO, and other styles
35

Tronnier, Mechtild. "Nasals and nasalisation in speech production with special emphasis on methodology and Osaka Japanese /." Lund : Lund University Press, 1998. http://books.google.com/books?id=nxZZAAAAMAAJ.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Abe, Mariko. "Syntactic variation across proficiency levels in Japanese EFL learner speech." Diss., Temple University Libraries, 2015. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/350754.

Full text
Abstract:
Teaching & Learning
Ed.D.
Overall patterns of language use variation across oral proficiency levels of 1,243 Japanese EFL learners and 20 native speakers of English, using the linguistic feature set from Biber (1988), were investigated in this study. The approach combined learner corpora, language processing techniques, visual inspection of descriptive statistics, and multivariate statistical analysis to identify characteristics of learner language use. The largest spoken learner corpus in Japan, the National Institute of Information and Communications Technology Japanese Learner English (NICT JLE) Corpus, was used for the analysis. It consists of over one million running words of L2 spoken English with oral proficiency level information. The level of the material in the corpus is approximately equal to a Test of English for International Communication (TOEIC) range of 356 to 921. It also includes data gathered from 20 native speakers who performed the same speaking tasks as the learners. The 58 linguistic features (e.g., grammatical features) were taken from the original list of 67 linguistic features in Biber (1988) to explore the variation of learner language. The following research questions were addressed. First, what linguistic features characterize different oral proficiency levels? Second, to what degree do the language features appearing in the spoken production of high proficiency learners match those of native speakers who perform the same task? Third, is the oral production of Japanese EFL learners rich enough to display the full range of features used by Biber? Grammatical features alone would not be enough to comprehensively distinguish oral proficiency levels, but the results of the study show that various types of grammatical features can be used to describe differences in the levels. First, frequency change patterns (i.e., rising, falling, plateauing, or a combination of these) across the oral proficiency levels were shown through linguistic features from a wide range of categories: (a) part-of-speech (noun, pronoun it, first person pronoun, demonstrative pronoun, indefinite pronoun, possibility modal, adverb, causative adverb), (b) stance markers (emphatic, hedge, amplifier), (c) reduced forms (contraction, stranded preposition), (d) specialized verb class (private verb), complementation (infinitive), (e) coordination (phrasal coordination), (f) passive (agentless passive), and (g) possibly tense and aspect markers (past tense, perfect aspect). In addition, there is a noticeable gap between native and non-native speakers of English. There are six items that native speakers of English use more frequently than the most advanced learners (perfect aspect, place adverb, pronoun it, stranded preposition, synthetic negation, emphatic) and five items that native speakers use less frequently (past tense, first person pronoun, infinitive, possibility modal, analytic negation). Other linguistic features are used with similar frequency across the levels. What is clear is that the speaking tasks and the time allotted provided ample opportunity for most of Biber's features to be used across the levels. The results of this study show that various linguistic features can be used to distinguish different oral proficiency levels, and to distinguish the oral language use of native and non-native speakers of English.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
37

Al-Owaidi, Muhtaram. "Investigating speech acts in English and Arabic short news interviews : a cross-cultural pragmatic study." Thesis, University of Huddersfield, 2018. http://eprints.hud.ac.uk/id/eprint/34754/.

Full text
Abstract:
In the last three decades, Speech Act Theory has been displaced from the spotlight of pragmatic research and relegated to the back seat of the field, despite the potential this theory still has to serve pragmatic research. This study is an attempt to revive and develop speech act theory by applying it to interactive, naturally-occurring discourse, proposing a number of different types of speech act, and incorporating a wider range of pragmatic IFIDs into the analysis. The main purpose of the study is to: (1) investigate speech acts in interaction and find out which illocutionary force indicating devices (IFIDs) are used to identify speech acts in an interactive context, and (2) compare the investigated speech acts and IFIDs cross-culturally between English and Arabic. Regarding data, the study investigated 12 English and Arabic short news interviews (six each). Some of these were video-recorded live from BBC and Sky News channels (English dataset) and Al-Arabiya, Sky News Arabia and Al-Wataniya channels (Arabic dataset); other interviews were downloaded from YouTube. Two topics were the focus of these interviews: (1) the immigration crisis in 2015 (six English and Arabic interviews), and (2) the Iranian nuclear deal in 2015 (six English and Arabic interviews). The study investigated the two datasets to find which speech acts are used in short news interviews and what interactional IFIDs are used to identify them. Results show that many different speech acts are used in news interviews; the study counted 48 individual speech acts in the analysed interviews. However, it was found that merely itemizing and classifying speech acts in the classical sense (Austin's and Searle's classifications) was not enough. In addition, the study identifies various new types of speech acts according to the role they play in the ongoing discourse. The first type is termed 'turn speech acts': speech acts which have special status in the turn they occur in, and which are of two subtypes, 'main act' and 'overall speech act'. The second type is 'interactional acts': speech acts which are named in relation to other speech acts in the same exchange. The third type is 'superior speech acts': superordinate speech acts with the performance of which other subordinate (inferior) speech acts are performed as well. The study also found three different types of utterances vis-à-vis the speech acts they perform. These are the 'single utterance' (which performs a single speech act only), the 'double-edged utterance' (which performs two speech acts concurrently) and the 'Fala utterance' (which performs three speech acts together). As for IFIDs, the study found that several already-established pragmatic concepts can help identify speech acts in interaction: Adjacency Pair, Activity Type, Cooperative Principle, Politeness Principle, Facework, and Context (co-utterance and pragmalinguistic cues). These devices are new additions to Searle's original list of IFIDs; furthermore, they expand this concept, as they include a type of IFID different from the original ones. Finally, the study found no significant differences between English and Arabic news interviews as regards speech act types, utterance types, and the analysed IFIDs. The study draws attention to Speech Act Theory and encourages further involvement of this theory in other genres of interactive discourse (e.g., long interviews, chat shows, written internet chat, etc.).
It also encourages further exploration of the different types of speech acts and utterances discussed in this study, as well as probing the currently investigated and other IFIDs. It is hoped that by returning to the core insight of SAT (i.e., that language-in-use does things) and at the same time freeing it from its pragmalinguistic shackles, its value can be seen more clearly.
APA, Harvard, Vancouver, ISO, and other styles
38

Almeman, Khalid Abdulrahman. "Reducing out-of-vocabulary in morphology to improve the accuracy in Arabic dialects speech recognition." Thesis, University of Birmingham, 2015. http://etheses.bham.ac.uk//id/eprint/5763/.

Full text
Abstract:
This thesis has two aims: developing resources for Arabic dialects and improving the speech recognition of Arabic dialects. Two important components are considered: the Pronunciation Dictionary (PD) and the Language Model (LM). Six parts are involved, which relate to building and evaluating dialect resources and improving the performance of systems for the speech recognition of dialects. Three resources were built and evaluated: one tool and two corpora. The methodology used for building the multi-dialect morphology analyser involves the proposal and evaluation of linguistic and statistical bases. We obtained an overall accuracy of 94%. The dialect text corpora have four sub-dialects, with more than 50 million tokens. The multi-dialect speech corpora have 32 speech hours, collected from 52 participants; the resultant speech corpora have more than 67,000 speech files. The main objective is improvement in the PDs and LMs of Arabic dialects. The use of an incremental methodology made it possible to check orthography and phonology rules incrementally, and we were able to distinguish the rules that positively affected the PDs. The Word Error Rate (WER) improved by 5.3% in MSA and 5% in Levantine. Three levels of morphemes were used to improve the LMs of dialects: stem, prefix+stem and stem+suffix. We checked the three forms using two different types of LMs. Eighteen experiments were carried out on MSA, the Gulf dialect and the Egyptian dialect, all of which yielded positive results, showing that WERs were reduced by 0.5% to 6.8%.
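The intuition behind morpheme-level LM units reducing out-of-vocabulary rates can be illustrated with a toy decomposition. The affix lists and word forms below are hypothetical stand-ins, not the thesis's analyser: unseen surface forms often split into a seen stem plus a seen affix.

```python
# Hypothetical affix inventories, for illustration only.
PREFIXES = ["al", "wa"]
SUFFIXES = ["at", "ha"]

def decompose(word):
    """Greedy prefix+stem / stem+suffix split, mirroring the three morpheme
    levels mentioned above (stem, prefix+stem, stem+suffix)."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p) + 1:
            return [p + "+", word[len(p):]]
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s) + 1:
            return [word[:-len(s)], "+" + s]
    return [word]

def oov_rate(tokens, vocab):
    return sum(t not in vocab for t in tokens) / len(tokens)

words = ["kitab", "alkitab", "kitabat"]   # toy surface forms
word_vocab = {"kitab"}                    # word-level vocabulary
morph_vocab = {"kitab", "al+", "+at"}     # morpheme-level vocabulary
print(oov_rate(words, word_vocab))        # two of three forms unseen as words
morphs = [m for w in words for m in decompose(w)]
print(oov_rate(morphs, morph_vocab))      # 0.0: every morpheme is in-vocabulary
```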
APA, Harvard, Vancouver, ISO, and other styles
39

Argyriou, Paraskevi. "Gestures and metaphor : evidence for gestures' self-oriented functions and hemispheric involvement for speech production." Thesis, University of Birmingham, 2016. http://etheses.bham.ac.uk//id/eprint/6631/.

Full text
Abstract:
The current thesis investigates the link between gestures and metaphor. In Chapter 3, we investigated whether left-hand gestures improve metaphor explanation compared to right-hand gestures or not gesturing at all. Additionally, we collected individual measurements of hemispheric involvement during speech production using the mouth asymmetry technique. We found a left-over-right hand gesturing advantage, which was higher for those with stronger right-hemispheric involvement during speech production. This finding suggested that gestures' self-oriented functions are hemisphere-specific. In Chapter 4, we investigated whether left-hand gestures, rather than taps, trigger metaphorical language use. We found no such evidence, but we found that gestures compared to taps increased the number of words uttered, which in turn led to the use of more metaphors. This points towards gestures' facilitative effect on speech production, but further research is needed to pinpoint exactly what process is facilitated. In Chapter 5, we investigated whether gestures with a particular hand, when produced without speech, prime semantic categorisation of sentences (metaphorical and literal). We found no evidence for priming effects, and further research is needed to examine the effects that gestures, when produced alone, might have on semantic processing. Finally, in Chapter 6 we found that producing content words, compared to function words, makes metaphor processing right-hemisphere specific. This indicated that semantic processing is the key to the lateralisation of metaphor processing. The results validated the use of the mouth asymmetry technique as an indirect measurement of hemispheric involvement during speech production tasks.
APA, Harvard, Vancouver, ISO, and other styles
40

Müller, Rainer, Andreas Höhlein, Annette Wolf, Jutta Markwardt, Matthias C. Schulz, Ursula Range, and Bernd Reitemeier. "Evaluation of Selected Speech Parameters after Prosthesis Supply in Patients with Maxillary or Mandibular Defects." Karger, 2013. https://tud.qucosa.de/id/qucosa%3A71635.

Full text
Abstract:
Background: Ablative surgery of oropharyngeal tumors frequently leads to defects in the speech organs, resulting in impairment of speech up to the point of unintelligibility. The aim of the present study was the assessment of selected parameters of speech with and without resection prostheses. Patients and Methods: The speech sounds of 22 patients suffering from maxillary and mandibular defects were recorded using a digital audio tape (DAT) recorder with and without resection prostheses. Evaluation of the resonance and the production of the sounds /s/, /sch/, and /ch/ was performed by 2 experienced speech therapists. Additionally, the patients completed a non-standardized questionnaire containing a linguistic self-assessment. Results: After prosthesis supply, the number of patients with rhinophonia aperta decreased from 7 to 2 while the number of patients with intelligible speech increased from 2 to 20. Correct production of the sounds /s/, /sch/, and /ch/ increased from 2 to 13 patients. A significant improvement of the evaluated parameters could be observed only in patients with maxillary defects. The linguistic self-assessment showed a higher satisfaction in patients with maxillary defects. Conclusion: In patients with maxillary defects due to ablative tumor surgery, an increase in speech performance and intelligibility is possible by supplying resection prostheses.
APA, Harvard, Vancouver, ISO, and other styles
41

Koblick, Heather. "EFFECTS OF SIMULTANEOUS EXERCISE AND SPEECH TASKS ON THE PERCEPTION OF." Master's thesis, University of Central Florida, 2004. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2965.

Full text
Abstract:
The purpose of this study was to investigate the effects of simultaneous tasks of exercise and speech production on voice production and the perception of dyspnea in aerobic instructors. The study aimed to document changes that occur during four conditions: 1) voice production without exercise and no use of amplification; 2) voice production without exercise and the use of amplification; 3) voice production during exercise without the use of amplification; 4) voice production during exercise with the use of amplification. Participants included ten aerobic instructors (two male and eight female). The dependent variables included vocal intensity, average fundamental frequency (F0), noise-to-harmonic ratio (NHR), jitter percent (jitt %), shimmer percent (shim %), and participants' self-perception of dyspnea. The results indicated that speech alone, whether with or without amplification, had no effect on the sensation of dyspnea. However, when speech was combined with exercise, the speech task became increasingly difficult, even more so without the use of amplification. Exercise was observed to inhibit vocal loudness, as vocal intensity measures were lowest in the exercise conditions with the use of amplification. Increases in F0 occurred in conditions involving exercise without the use of amplification. Moreover, four participants in various conditions exhibited frequencies that diverged from the normal range for their gender. Participants' NHR increased during periods of exercise; however, no participants were found to have NHR measures outside the normal range. Four participants were found to have moderate laryngeal pathology that was hemorrhagic in nature. Findings suggest that traditional treatment protocols may need to be modified beyond hygienic approaches in order to address both the respiratory and laryngeal workloads that are encountered in this population and others with similar occupational tasks.
M.A.
Department of Communicative Disorders
Health and Public Affairs
Communicative Disorders
APA, Harvard, Vancouver, ISO, and other styles
42

Cooper, Douglas. "Speech Detection using Gammatone Features and One-Class Support Vector Machine." Master's thesis, University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5923.

Full text
Abstract:
A network gateway is a mechanism which provides protocol translation and/or validation of network traffic using the metadata contained in network packets. For media applications such as Voice-over-IP, the portion of the packets containing speech data cannot be verified and can provide a means of maliciously transporting code or sensitive data undetected. One solution to this problem is Voice Activity Detection (VAD). Many VADs rely on time-domain features and simple thresholds for efficient speech detection; however, this says little about the signal being passed. More sophisticated methods employ machine learning algorithms, but train on specific noises intended for a target environment. Validating speech under a variety of unknown conditions must be possible, as must differentiating between speech and non-speech data embedded within the packets. A real-time speech detection method is proposed that relies only on a clean speech model for detection. Through the use of Gammatone filter bank processing, the cepstrum and several frequency-domain features are used to train a One-Class Support Vector Machine which provides a clean-speech model irrespective of environmental noise. A Wiener filter is used to provide improved operation in harsh noise environments. Greater than 90% detection accuracy is achieved for clean speech, with approximately 70% accuracy for SNR as low as 5 dB.
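The one-class modelling idea is simple to sketch. In the snippet below, random arrays stand in for the Gammatone filter-bank (GFCC-like) features that the thesis computes from audio; the point is that the SVM is trained on clean speech only and rejects anything that falls outside the learned region.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Stand-ins for Gammatone/cepstral feature frames computed elsewhere.
clean_speech_feats = rng.normal(0.0, 1.0, (1000, 20))
test_feats = np.vstack([rng.normal(0.0, 1.0, (100, 20)),   # speech-like
                        rng.normal(4.0, 1.0, (100, 20))])  # non-speech / anomalous

# Train on clean speech only: no noise or attack examples are needed.
scaler = StandardScaler().fit(clean_speech_feats)
svm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
svm.fit(scaler.transform(clean_speech_feats))

# +1 = consistent with the clean-speech model, -1 = rejected as non-speech.
decisions = svm.predict(scaler.transform(test_feats))
print((decisions[:100] == 1).mean(), (decisions[100:] == -1).mean())
```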
M.S.E.E.
Masters
Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering; Accelerated BS to MS
APA, Harvard, Vancouver, ISO, and other styles
43

Lukman, Joshua R. "Right to publicity and privacy versus first amendment freedom of speech." Honors in the Major Thesis, University of Central Florida, 2003. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/323.

Full text
Abstract:
This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf.edu/Systems/DigitalInitiatives/DigitalCollections/InternetDistributionConsentAgreementForm.pdf You may also contact the project coordinator, Kerri Bottorff, at kerri.bottorff@ucf.edu for more information.
Bachelors
Health and Public Affairs
Legal Studies
APA, Harvard, Vancouver, ISO, and other styles
44

Williams, Leslie Rachele. "EFFICACY OF A COGNITIVE BEHAVIORAL THERAPY-BASED INTENSIVE SUMMER CAMP FOR AN ADOLESCENT WHO STUTTERS: SINGLE-SUBJECT DATA." Master's thesis, Temple University Libraries, 2016. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/390932.

Full text
Abstract:
Communication Sciences
M.A.
Clinicians are increasingly incorporating cognitive behavioral therapy (CBT)-based approaches into fluency treatment for children and adolescents who stutter. However, minimal research has examined the efficacy of such programs. The present study assesses the efficacy of a CBT-based, intensive, five-day summer camp that promotes self-acceptance and aims to improve the quality of life of adolescents who stutter. Specifically, this study examines whether the camp is effective in reducing state and trait anxiety, decreasing the negative impact of stuttering on daily life, and increasing fluency. A single-subject design on a 14-year-old male adolescent who stutters, LM, and personal interview data from LM's mother, MM, are utilized. Post-treatment, LM's scores reflect improvements in self-efficacy surrounding communication situations, as measured by the Self-Efficacy for Adolescents Scale (SEA-Scale), and improvements in overall speaking-related quality of life, as measured by the Overall Assessment of the Speaker's Experience of Stuttering - Teen (OASES-T). These improvements were maintained at one- and three-month follow-ups. Nonetheless, a large degree of variation in percent syllables stuttered (%SS) and LM's consistently low rates of state and trait anxiety, as measured by the State-Trait Anxiety Inventory for Children (STAIC), suggest that additional study is warranted before conclusions can be drawn about the efficacy of the summer camp program in reducing stuttering severity and anxiety.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
45

Dookhoo, Raul. "AUTOMATED REGRESSION TESTING APPROACH TO EXPANSION AND REFINEMENT OF SPEECH RECOGNITION GRAMMARS." Master's thesis, University of Central Florida, 2008. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2634.

Full text
Abstract:
This thesis describes an approach to automated regression testing for speech recognition grammars. A prototype Audio Regression Tester called ART has been developed using Microsoft's Speech API and C#. ART allows a user to perform any of three tasks: automatically generate a new XML-based grammar file from standardized SQL database entries, record and cross-reference audio files for use by an underlying speech recognition engine, and perform regression tests with the aid of an oracle grammar. ART takes as input a wave sound file containing speech and a newly created XML grammar file. It then simultaneously executes two tests: one with the wave file and the new grammar file, and the other with the wave file and the oracle grammar. The comparison of the test results is used to determine whether the test was successful or not. This allows rapid, exhaustive evaluation of additions to grammar files to guarantee forward progress as the complexity of the voice domain grows. The data used in this research to derive results were taken from the LifeLike project; however, the capabilities of ART extend beyond LifeLike. The results gathered have shown that using a person's recorded voice to do regression testing is as effective as having the person do live testing. A cost-benefit analysis, using two published equations, one for Cost and the other for Benefit, was also performed to determine whether automated regression testing is really more effective than manual testing. Cost captures the salaries of the engineers who perform regression testing tasks, and Benefit captures revenue gains or losses related to changes in product release time. ART had a higher benefit of $21,461.08 when compared to manual regression testing, which had a benefit of $21,393.99. Coupled with its excellent error detection rates, ART has proven to be very efficient and cost-effective in speech grammar creation and refinement.
M.S.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science MS
APA, Harvard, Vancouver, ISO, and other styles
46

Scofield, Sherri. "Perceptions of the Cognitive, Social, and Physical Competence of Speech Impaired Individuals." Honors in the Major Thesis, University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETH/id/803.

Full text
Abstract:
This item is only available in print in the UCF Libraries. If this is your Honors Thesis, you can help us make it available online for use by researchers around the world by following the instructions on the distribution consent form at http://library.ucf.edu/Systems/DigitalInitiatives/DigitalCollections/InternetDistributionConsentAgreementForm.pdf You may also contact the project coordinator, Kerri Bottorff, at kerri.bottorff@ucf.edu for more information.
Bachelors
Arts and Sciences
Psychology
APA, Harvard, Vancouver, ISO, and other styles
47

Sinatra, Anne M. "The Impact of Degraded Speech and Stimulus Familiarity in a Dichotic Listening Task." Doctoral diss., University of Central Florida, 2012. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5502.

Full text
Abstract:
It has been previously established that when engaged in a difficult, attention-intensive task that involves repeating information while blocking out other information (the dichotic listening task), participants are often able to report hearing their own names in an unattended audio channel (Moray, 1959). This phenomenon, called the cocktail party effect, is a result of words that are important to oneself having a lower threshold, so that less attention is necessary to process them (Treisman, 1960). The current studies examined the ability of a person who was engaged in an attention-demanding task to hear and recall low-threshold words from a fictional story. These low-threshold words included a traditional alert word, "fire", and fictional character names from a popular franchise, Harry Potter. Further, the role of stimulus degradation was examined by including synthetic and accented speech in the task to determine how it would impact attention and performance. In Study 1 participants repeated passages from a novel that was largely unfamiliar to them, The Secret Garden, while blocking out a passage from a much more familiar source, Harry Potter and the Deathly Hallows. Each unattended Harry Potter passage was edited so that it would include four names from the series and the word "fire" twice. The type of speech present in the attended and unattended ears (Natural or Synthetic) was varied to examine the impact that processing degraded speech would have on performance. The speech that the participant shadowed did not impact unattended recall; however, it did impact shadowing accuracy. The speech type present in the unattended ear did impact the ability to recall low-threshold Harry Potter information: when the unattended speech type was synthetic, significantly less Harry Potter information was recalled. Interestingly, while Harry Potter information was recalled by participants with both high and low Harry Potter experience, the traditional low-threshold word, "fire", was not noticed by participants. In order to determine whether synthetic speech impeded the ability to report low-threshold Harry Potter names because it was degraded or simply because it was different from natural speech, Study 2 was designed. In Study 2 the attended (shadowed) speech was held constant as American Natural speech, and the unattended ear was manipulated. An accent different from the native accent of the participants was included as a mild form of degradation. There were four experimental stimuli which contained one of the following in the unattended ear: American Natural, British Natural, American Synthetic and British Synthetic. Overall, more unattended information was reported when the unattended channel was Natural than Synthetic. This implies that synthetic speech does take more working-memory processing power than even an accented natural speech. Further, it was found that experience with the Harry Potter franchise played a role in the ability to report unattended Harry Potter information. Those who had high levels of Harry Potter experience, particularly with audiobooks, were able to process and report Harry Potter information from the unattended stimulus when it was British Natural, while those with low Harry Potter experience were not able to report unattended Harry Potter information from this slightly degraded stimulus.
Therefore, it is believed that the previous audiobook experience of those in the high Harry Potter experience group acted as training and resulted in less working memory being necessary to encode the unattended Harry Potter information. A pilot study was designed in order to examine the impact of story familiarity in the attended and unattended channels of a dichotic listening task. In the pilot study, participants shadowed a Harry Potter passage (familiar) in one condition with a passage from The Secret Garden (unfamiliar) playing in the unattended ear. A second condition had participants shadowing The Secret Garden (unfamiliar) with a passage from Harry Potter (familiar) present in the unattended ear. There was no significant difference in the number of unattended names recalled. Those with low Harry Potter experience reported significantly less attended information when they shadowed Harry Potter than when they shadowed The Secret Garden. Further, there appeared to be a trend such that those with high Harry Potter experience reported more attended information when they shadowed Harry Potter than The Secret Garden. This implies that experience with a franchise and its characters may make it easier to recall information about a passage, while lack of experience provides no assistance. Overall, the results of the studies indicate that we do treat fictional characters in a way similar to how we treat ourselves. Names and information about fictional characters were able to break through into attention during a task that required a great deal of attention. Experience with the characters also served to assist working memory in processing the information in degraded circumstances. These results have important implications for training, the design of alerts, and the use of popular media in the classroom.
ID: 031001451; System requirements: World Wide Web browser and PDF reader; Mode of access: World Wide Web; Adviser: Valerie K. Sims; Title from PDF title page (viewed July 2, 2013); Thesis (Ph.D.)--University of Central Florida, 2012; Includes bibliographical references (p. 168-173).
Ph.D.
Doctorate
Psychology
Sciences
Psychology; Human Factors Psychology
APA, Harvard, Vancouver, ISO, and other styles
48

Steinberg, John. "A Comparative Analysis of Bayesian Nonparametric Variational Inference Algorithms for Speech Recognition." Master's thesis, Temple University Libraries, 2013. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/216605.

Full text
Abstract:
Electrical and Computer Engineering
M.S.E.E.
Nonparametric Bayesian models have become increasingly popular in speech recognition tasks such as language and acoustic modeling due to their ability to discover underlying structure in an iterative manner. These methods do not require a priori assumptions about the structure of the data, such as the number of mixture components, and can learn this structure directly. Dirichlet process mixtures (DPMs) are a widely used nonparametric Bayesian method in which the Dirichlet process serves as a prior that determines an optimal number of mixture components and their respective weights in a Gaussian mixture model (GMM). Because DPMs potentially require an infinite number of parameters, inference algorithms are needed to make posterior calculations tractable. The focus of this work is an evaluation of three of these Bayesian variational inference algorithms, which have only recently become computationally viable: Accelerated Variational Dirichlet Process Mixtures (AVDPM), Collapsed Variational Stick Breaking (CVSB), and Collapsed Dirichlet Priors (CDP). To eliminate other effects on performance, such as language models, a phoneme classification task was chosen to more clearly assess the viability of these algorithms for acoustic modeling. Evaluations were conducted on the CALLHOME English and Mandarin corpora, consisting of two languages that, from a human perspective, are phonologically very different. It is shown in this work that these inference algorithms yield error rates comparable to a baseline Gaussian mixture model (GMM) but with a factor of up to 20 fewer mixture components. AVDPM is shown to be the most attractive choice because it delivers the most compact models and is computationally efficient, enabling its application to big data problems.
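The core behaviour of a DPM prior, inferring the effective number of mixture components rather than fixing it, can be demonstrated with scikit-learn's variational Dirichlet-process mixture, which stands in here for the AVDPM/CVSB/CDP algorithms compared in the thesis.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Data actually drawn from 3 clusters; the model is allowed up to 20.
X = np.vstack([rng.normal(m, 0.3, (200, 2)) for m in (-3.0, 0.0, 3.0)])

dpm = BayesianGaussianMixture(
    n_components=20,
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500, random_state=0).fit(X)

# Most of the 20 component weights collapse toward zero; roughly 3 survive,
# mirroring the compact models reported above.
print(np.sort(dpm.weights_)[::-1][:5].round(3))
```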
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
49

Saito, Yukie. "Effects of Prosody-Based Instruction and Self-Assessment in L2 Speech Development." Diss., Temple University Libraries, 2019. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/597863.

Full text
Abstract:
Teaching & Learning
Ph.D.
The main purpose of this study was to investigate the effects of form-focused instruction (FFI) on prosody, with or without self-assessment, on the prosodic and global aspects of L2 speech by Japanese EFL learners using a pre-post design. In addition, native English speaking (NS) and non-native English speaking (NNS) raters with high levels of English proficiency were compared to examine the influence of raters' L1 backgrounds on their comprehensibility ratings. Sixty-one Japanese university students from four intact English presentation classes participated in the study. The comparison group (n = 16) practiced making one-minute speeches in class (45 minutes x 8 times) without explicit instruction on prosody, while the two experimental groups (n = 17 for the FFI-only group; n = 28 for the FFI + SA group) received FFI on word stress, rhythm, and intonation, practiced the target prosodic features in communicative contexts, and received metalinguistic feedback from the instructor. In total, the experimental groups received six hours of instruction in class, which was comparable to the comparison group. Additionally, the experimental groups completed homework three times; only the FFI + SA group recorded their reading performance and self-assessed it in terms of word stress, rhythm, and intonation. Three oral tasks were employed to elicit the participants' speech before and after the treatment: reading aloud, one-minute speech, and picture description. The speech samples were rated for comprehensibility by NS and NNS raters and were also analyzed with four prosodic measurements: word stress, rhythm, pitch contour, and pitch range. Instructional effects on prosody were observed clearly. The FFI-only group improved their controlled production of rhythm and pitch contour, while the FFI + SA group significantly improved all of the prosodic features except pitch range. Moreover, the instructional gains for the FFI + SA group were not limited to the controlled task but transferred to the less-controlled tasks. The results showed differential instructional effects on the four prosodic aspects. The FFI in this study did not help the participants widen their pitch range. The FFI on prosody, which was focused on the cross-linguistic differences between Japanese and English, tended to be more effective in improving rhythm and pitch contour, which were categorized as rule-based features, than in improving an item-based feature, word stress. The study offered mixed results regarding instructional effects on comprehensibility. The FFI-only group did not significantly improve comprehensibility despite their significant prosodic improvements on the reading aloud task. Their significant comprehensibility growth on the picture description task was not because of the development of prosody, but of other linguistic variables that influence comprehensibility, such as speech rate. The FFI + SA group made significant gains in comprehensibility on the three tasks, but the effect sizes were small. This finding indicated that the effects of FFI with self-assessment on comprehensibility were limited due to the multi-faceted nature of comprehensibility. The data elicited from the post-activity questionnaires and student interviews revealed that not all the participants in the FFI + SA group reacted positively to the self-assessment practice.
Individual differences such as previous learning experience and self-efficacy appeared to influence the learners’ perceptions of the self-assessment practice and possibly their instructional gains. The two groups of raters, L1 English raters (n = 6) and L2 English raters with advanced or native-like English proficiency (n = 6) did not differ in terms of consistency and severity. These findings indicated that NNS raters with high English proficiency could function as reliably as NS raters; however, the qualitative data revealed that the NS raters tended to be more sensitive to pronunciation, especially at the segmental level, across the three tasks compared to the NNS raters. This study provides evidence that FFI, especially when it is reinforced by self-assessment, has pedagogical value; it can improve learners’ production of English prosody in controlled and less-controlled speech, and these gains can in turn contribute to enhanced L2 comprehensibility.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles
50

Mahoney, Phillip Matthew. "Script Training and Feedback Type in the Treatment of Apraxia of Speech." Master's thesis, Temple University Libraries, 2019. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/602648.

Full text
Abstract:
Communication Sciences
M.A.
Acquired apraxia of speech (AOS) is a type of motor speech disorder (MSD) characterized by deficits in the motor planning or programming of speech movements (Duffy, 2005). Because AOS is often a chronic condition that may severely impair intelligibility and, thus, significantly reduce quality of life (Ballard et al., 2015), it is necessary to develop efficient and effective treatment protocols. A previous study by Youmans, Youmans, and Hancock (2011), demonstrated the efficacy of script training in the treatment of AOS. Furthermore, extensive research in general motor learning has shown that feedback is one of the most important components of motor learning (Schmidt & Lee, 2011). Research devoted specifically to speech motor learning has generally favored this view, though few studies have distinguished between the two major types of feedback: feedback providing knowledge of results (KR) and feedback providing knowledge of performance (KP). The present study is the first to examine feedback type in treatment for AOS, and the first to examine the utility of script training specifically for a participant with AOS, but no aphasia. The findings from this single-case experimental design study reveal that, compared to KR, KP resulted in greater improvements in speaking rate. KR and KP feedback resulted in comparable gains for accuracy, but condition differences were difficult to interpret due to unexpected rising baselines for the KR scripts. Both KR and KP scripts, but especially the KP scripts, outperformed the untreated control scripts, providing further support for the efficacy of script training for AOS.
Temple University--Theses
APA, Harvard, Vancouver, ISO, and other styles