Dissertations / Theses on the topic 'Acoustic and Linguistic Modalities'

Consult the top 27 dissertations / theses for your research on the topic 'Acoustic and Linguistic Modalities.'

1

Pérez-Rosas, Verónica. "Exploration of Visual, Acoustic, and Physiological Modalities to Complement Linguistic Representations for Sentiment Analysis." Thesis, University of North Texas, 2014. https://digital.library.unt.edu/ark:/67531/metadc699996/.

Abstract:
This research is concerned with the identification of sentiment in multimodal content. This is of particular interest given the increasing presence of subjective multimodal content on the web and in other sources, which is a rich and vast source of people's opinions, feelings, and experiences. Despite the need for tools that can identify opinions in the presence of diverse modalities, most current methods for sentiment analysis are designed for textual data only, and few attempts have been made to address this problem. The dissertation investigates techniques for augmenting linguistic representations with acoustic, visual, and physiological features. The potential benefits of using these modalities include linguistic disambiguation, visual grounding, and the integration of information about people's internal states. The main goal of this work is to build computational resources and tools that allow sentiment analysis to be applied to multimodal data. This thesis makes three important contributions. First, it shows that modalities such as audio, video, and physiological data can be successfully used to improve existing linguistic representations for sentiment analysis. We present a method that integrates linguistic features with features extracted from these modalities. Features are derived from verbal statements, audiovisual recordings, thermal recordings, and physiological sensor signals. The resulting multimodal sentiment analysis system is shown to significantly outperform the use of language alone. Using this system, we were able to predict both the sentiment expressed in video reviews and the sentiment experienced by viewers while exposed to emotionally loaded content. Second, the thesis provides evidence of the portability of the developed strategies to other affect recognition problems, which we support by studying the deception detection problem.
Third, this thesis contributes several multimodal datasets that will enable further research in sentiment and deception detection.
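The abstract does not say how the modalities are combined. One common option is feature-level (early) fusion: standardize each modality's features and concatenate them into one vector per sample. The sketch below assumes that approach and uses only the standard library; the helper names `zscore_columns` and `early_fusion` and the toy numbers are hypothetical, not taken from the thesis.

```python
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    # A constant column has pstdev 0; fall back to 1.0 to avoid dividing by zero.
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def early_fusion(*modalities):
    """Concatenate per-sample feature vectors from several modalities.

    Each modality is a list of rows (one row per sample); all modalities
    must describe the same samples in the same order.
    """
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("all modalities need the same number of samples")
    normed = [zscore_columns(m) for m in modalities]
    return [sum((m[i] for m in normed), []) for i in range(n)]

linguistic = [[0.2, 1.0], [0.8, 0.0], [0.5, 0.5]]  # hypothetical text-based scores
acoustic = [[120.0], [180.0], [150.0]]              # hypothetical mean F0 in Hz
fused = early_fusion(linguistic, acoustic)
print(len(fused), len(fused[0]))  # 3 3
```

In a real experiment the means and standard deviations would be estimated on training data only and reused for test samples.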
2

Sinclair, Roderick. "Acoustic guitar practice and acousticity : establishing modalities of creative practice." Thesis, University of Newcastle Upon Tyne, 2008. http://hdl.handle.net/10443/654.

Abstract:
The contemporary acoustic guitar has developed from its origins in the 'Spanish' guitar to become a global instrument and the musical voice of a wide range of styles. The very 'acousticity' of the instrument positions it as a binary opposite to the electric guitar and as a signifier for the organic and the natural world, artistry and maturity, eclecticism and the esoteric. In this concept-rooted submission, the acoustic and guitaristic nature of the instrument is considered in relation to a range of social, cultural and artistic concerns, and composition is used primarily to test a thesis, wherein a portfolio of original compositions, presented as recordings and understood as phonograms, comment upon and reflect upon modes of performativity: instrument-specific performance, introspection, virtuosity, mediation by technology and performance subjectivities.
3

Dietz, Kimberly F. "Acoustic and linguistic interdependencies of irregular phonation." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/61154.

Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 57-58).
Irregular phonation is a commonly occurring but only partially understood phenomenon of human speech production. We know properties of irregular phonation can be clues to a speaker's dialect and even identity. We also have evidence that irregular phonation is used as a signal of linguistic and acoustic intent. Nonetheless, there remain fundamental questions about the nature of irregular phonation and the interdependencies of irregular phonation with acoustic and linguistic speech characteristics, as well as the implications of this relationship for speech processing applications. In this thesis, we hypothesize that irregular phonation occurs naturally in situations with large amounts of change in pitch or power. We therefore focus on investigating parameters such as pitch variance and power variance as well as other measurable properties involving speech dynamics. In this work, we have investigated the frequency and structure of irregular phonation, the acoustic characteristics of the TIMIT Acoustic-Phonetic Speech Corpus, and relationships between these two groups. We show that characteristics of irregular phonation are positively correlated with several of our potential predictors including pitch and power variance. Finally, we demonstrate that these correlations lead to a model with the potential to predict the occurrence and properties of irregular phonation.
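As an illustration of the kind of analysis described (correlating pitch and power variance with properties of irregular phonation), the sketch below computes per-utterance pitch and power variance and a Pearson correlation with a per-utterance irregular-phonation score. The frame conventions (F0 of 0 for unvoiced frames, power in dB), the function names, and all numbers are hypothetical; none of them are data from the thesis or from TIMIT.

```python
from statistics import mean, pvariance

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def utterance_predictors(f0_track, power_track):
    """Return (pitch variance, power variance) for one utterance.

    Assumes f0_track holds per-frame F0 in Hz with 0 marking unvoiced
    frames, and power_track holds per-frame power in dB.
    """
    voiced = [f for f in f0_track if f > 0]
    return pvariance(voiced), pvariance(power_track)

# Toy utterances: (F0 track, power track); values are made up.
utterances = [
    ([100, 102, 0, 101], [60, 61, 59, 60]),
    ([100, 130, 0, 90], [60, 70, 50, 65]),
    ([100, 150, 80, 0], [55, 75, 45, 70]),
]
# Hypothetical fraction of frames labelled as irregular phonation.
irregular_fraction = [0.0, 0.1, 0.2]

pitch_var = [utterance_predictors(f0, pw)[0] for f0, pw in utterances]
power_var = [utterance_predictors(f0, pw)[1] for f0, pw in utterances]
print(round(pearson_r(pitch_var, irregular_fraction), 3))
print(round(pearson_r(power_var, irregular_fraction), 3))
```

A predictive model of the kind the abstract mentions would go further, for example by regressing the irregular-phonation score on several such predictors at once.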
4

Ouellette, Gene Paul. "The neurological basis of linguistic prosody : an acoustic investigation." Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=56630.

Abstract:
This study explored the ability of left hemisphere damaged (LHD) nonfluent aphasics, right hemisphere damaged (RHD) patients, and normal speakers to produce acoustic correlates of linguistic prosody. Productions of phonemic stress contrasts (e.g., black′board vs. black board′) and contrastive stress tokens (e.g., The man took the bus) were elicited and subjected to acoustic analyses. Results indicated that RHD and LHD groups resembled normal speakers in the use of fundamental frequency and amplitude to encode stress, indicating preserved abilities in both neurological populations. However, the LHD aphasic subjects demonstrated patterns of durational alterations that were statistically different from those obtained for the control and RHD groups. The data are indicative of a basic impairment in speech timing subsequent to LHD. Results are discussed in relation to current theories regarding the neurological basis of linguistic prosody.
5

Deschamps-Berger, Théo. "Social Emotion Recognition with multimodal deep learning architecture in emergency call centers." Electronic Thesis or Diss., Université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG036.

Abstract:
This thesis explores automatic speech emotion recognition systems in a medical emergency context. It addresses some of the challenges encountered when studying emotions in social interactions and is rooted in modern theories of emotion, particularly Lisa Feldman Barrett's work on the construction of emotions. Indeed, the manifestation of spontaneous emotions in human interactions is complex, often nuanced and mixed, and closely tied to context. This study is based on the CEMO corpus, which consists of telephone conversations between callers and emergency medical dispatchers (EMD) at a French emergency call center. This corpus provides a rich dataset for assessing the capacity of deep learning systems, such as Transformers and pre-trained models, to recognize spontaneous emotions in spoken interactions. Possible applications include providing emotional cues that could improve call handling and decision-making by EMDs, or summarizing calls. The work carried out in my thesis focused on different techniques related to speech emotion recognition, including transfer learning from pre-trained models, multimodal fusion strategies, dialogic context integration, and mixed emotion detection. An initial acoustic system based on temporal convolutions and recurrent networks was developed and validated on IEMOCAP, an emotional corpus widely used by the affective computing community, and then on the CEMO corpus. Extensive research on multimodal systems, pre-trained on acoustic and linguistic data and adapted to emotion recognition, is presented. In addition, the integration of dialog context in emotion recognition was explored, highlighting the complex dynamics of emotions in social interactions. Finally, work was initiated on multi-label, multimodal systems capable of handling the subtleties of mixed emotions, which arise from ambiguity in annotators' perception and from the social context.
Our research highlights some solutions and challenges in recognizing emotions "in the wild". This thesis was funded by the CNRS AI Chair HUMAAINE (HUman-MAchine Affective Interaction & Ethics).
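The abstract mentions multimodal fusion strategies and multi-label detection of mixed emotions without detailing them. A minimal sketch, assuming decision-level (late) fusion by a weighted average of per-class probabilities from an acoustic model and a linguistic model, followed either by a single-label argmax or by a per-label threshold for mixed emotions. The weight, the threshold, the label names, and the probabilities are all hypothetical and are not the thesis's configuration.

```python
def late_fusion(acoustic_probs, linguistic_probs, w_acoustic=0.5):
    """Weighted average of two {label: probability} dicts (decision-level fusion)."""
    if set(acoustic_probs) != set(linguistic_probs):
        raise ValueError("both models must score the same label set")
    return {
        label: w_acoustic * acoustic_probs[label]
        + (1.0 - w_acoustic) * linguistic_probs[label]
        for label in acoustic_probs
    }

def single_label(fused):
    """Pick the label with the highest fused score."""
    return max(fused, key=fused.get)

def multi_label(fused, threshold=0.3):
    """Keep every label whose fused score reaches the threshold (mixed emotions)."""
    return sorted(label for label, p in fused.items() if p >= threshold)

acoustic = {"anger": 0.6, "fear": 0.3, "neutral": 0.1}
linguistic = {"anger": 0.2, "fear": 0.5, "neutral": 0.3}
fused = late_fusion(acoustic, linguistic, w_acoustic=0.6)
print(single_label(fused))  # anger
print(multi_label(fused))   # ['anger', 'fear']
```

Late fusion keeps the two models independent, so each can be pre-trained on its own modality; feature-level or attention-based fusion inside one network are common alternatives.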
6

Daly, Nancy Ann. "Acoustic-phonetic and linguistic analyses of spontaneous speech : implications for speech understanding." Thesis, Massachusetts Institute of Technology, 1994. http://hdl.handle.net/1721.1/12009.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1994.
Includes bibliographical references (leaves 142-149).
7

Bianchi, Michelle. "Effects of clear speech and linguistic experience on acoustic characteristics of vowel production." University of South Florida, Tampa, FL, 2007. http://purl.fcla.edu/usf/dc/et/SFE0002084.

8

Marklund, Ellen. "Perceptual reorganization of vowels : Separating the linguistic and acoustic parts of the mismatch response." Doctoral thesis, Stockholms universitet, Institutionen för lingvistik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-148559.

Abstract:
During the first year of life, infants go from perceiving speech sounds primarily based on their acoustic characteristics to perceiving speech sounds as belonging to speech sound categories relevant in their native language(s). The transition is apparent in that very young infants typically discriminate both native and non-native speech sound contrasts, whereas older infants show better discrimination for native contrasts and worse or no discrimination for non-native contrasts. The rate of this perceptual reorganization depends, among other things, on the salience of the relevant speech sounds within the speech signal. As such, the perceptual reorganization of vowels and lexical tone typically precedes the perceptual reorganization of consonants. Perceptual reorganization of speech sounds is often demonstrated by measuring infants’ discrimination of specific speech sound contrasts across development. One way of measuring discriminatory ability is to use the mismatch response (MMR). This is a brain response that can be measured using external electroencephalography recordings. Presenting an oddball (deviant) stimulus among a series of standard stimuli elicits a response that, in adults, correlates well with behavioral discrimination. When the two stimuli are speech sounds contrastive in the listeners’ language, the response arguably reflects both acoustic and linguistic processing. In infants, the response is less studied, but has nevertheless already proven useful for studies on the perceptual reorganization of speech sounds. The present thesis documents a series of studies with the ultimate aim of investigating how the amount of speech exposure influences perceptual reorganization, and whether the learning mechanisms involved in speech sound category learning are specific to speech or domain-general.
To be able to compare MMR results across different age groups in infancy, however, a non-speech control condition needed to be devised to account for changes in the MMR across development that are attributable to general brain maturation rather than to language development specifically. Findings of studies incorporated in the thesis show that spectrally rotated speech can be used to approximate the acoustic part of the MMR in adults. Subtracting the acoustic part of the MMR from the full MMR thus estimates the part of the MMR that is linked to linguistic, rather than acoustic, processing. The strength of this linguistic part of the MMR in four- and eight-month-old infants is directly related to the daily amount of speech that the infants are exposed to. No evidence of distributional learning of non-speech auditory categories was demonstrated in adults, but the results, together with previous research, generated hypotheses for future study. In conclusion, the research performed within the scope of this thesis highlights the need for a non-speech control condition in developmental speech perception studies using the MMR, demonstrates the viability of one such non-speech control condition, and points toward relevant future research on speech sound category development.

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 3: Manuscript.
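The subtraction described in the abstract can be written down directly. A minimal sketch, assuming each MMR is computed as a deviant-minus-standard difference wave on a shared time base, and that the acoustic part is estimated from the spectrally rotated (non-speech) condition. The function names and the microvolt values are hypothetical, not data from the thesis.

```python
def difference_wave(a, b):
    """Sample-by-sample a - b; both waveforms must share one time base."""
    if len(a) != len(b):
        raise ValueError("waveforms must have the same number of samples")
    return [x - y for x, y in zip(a, b)]

def linguistic_mmr(speech_dev, speech_std, rotated_dev, rotated_std):
    """Full speech MMR minus the acoustic MMR estimated from rotated speech."""
    full_mmr = difference_wave(speech_dev, speech_std)
    acoustic_mmr = difference_wave(rotated_dev, rotated_std)
    return difference_wave(full_mmr, acoustic_mmr)

# Toy averaged ERPs in microvolts (four time points each).
speech_dev = [0.0, -2.0, -3.0, -1.0]
speech_std = [0.0, -0.5, -0.5, 0.0]
rotated_dev = [0.0, -1.0, -1.5, -0.5]
rotated_std = [0.0, -0.5, -0.5, 0.0]
print(linguistic_mmr(speech_dev, speech_std, rotated_dev, rotated_std))
# [0.0, -1.0, -1.5, -0.5]
```

In practice the resulting waveform would usually be summarized, for example as a mean amplitude within a latency window, before relating it to measures such as daily speech exposure.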

9

Levi, Susannah V. "The representation of underlying glides: a cross-linguistic study." Thesis, University of Washington, 2004. http://hdl.handle.net/1773/8406.

10

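Quadros, Talita Lidirene Limanski de. "Análise do uso do par é + adjetivo e do verbo poder em recortes de produção escrita de alunos de ensino fundamental e médio" [Analysis of the use of the pair é + adjective and of the verb poder in excerpts of written production by elementary and high school students]. Universidade Estadual do Oeste do Paraná, 2017. http://tede.unioeste.br/handle/tede/3433.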

Abstract:
The present research stems from the need to promote reflection on an important aspect of the teaching and learning of linguistic analysis: the use of elements that mark the writer's positioning. This motivated a study of how modalizers operate in excerpts from texts written by elementary and high school students at a rural public school in a city in the State of Paraná. It was based on the analysis of material collected in the databank of the projects Theoretical Application and Reflection in the Classroom: linguistic analysis as a support for the production of texts by students of a public school in the State of Paraná (ART) and Diagnostics and Theoretical Application in the Classroom: verification of the performance and evaluation of the teaching of linguistic analysis and textual production of high school students of a public school in the State of Paraná (DAT). This basic, qualitative research drew on authors who address linguistic modality, such as Castilho and Castilho (1992), Neves (2006), Corbari (2008/2013), Koch (2009) and Sella (2011). This path, based on defining the study focus, reading the theoretical literature, and collecting and interpreting data, led us to examine in the analyzed excerpts how the modalizers “é + adjetivo” (is + adjective) and the verb poder (can) indicate points of view that are linked sometimes to the innermost and sometimes to the outermost layers of meaning. The goal is to interpret the occurrences of these modalizers in text excerpts written by students who participated in the aforementioned projects, as well as to verify the degree of the writers’ engagement with the content expressed through these structures. Verifying these layers allowed us to evaluate the degree of engagement established with the propositional content.
This study also showed that the modalizers under analysis establish notions of emphasis and attenuation, which points to articulations that indicate negotiations of points of view.
11

Ulrich, Natalja. "Linguistic and speaker variation in Russian fricatives." Electronic Thesis or Diss., Lyon 2, 2022. http://www.theses.fr/2022LYO20031.

Abstract:
This thesis presents an acoustic-phonetic investigation of phonetic details in Russian fricatives. The main aim was to detect acoustic correlates that carry linguistic and idiosyncratic information. The questions addressed were whether the place of articulation, speakers' gender and ID can be predicted by a set of acoustic cues, and which acoustic measures are the most reliable indicators. Furthermore, the distribution of speaker-specific characteristics and inter- and intra-speaker variation across acoustic cues were studied in more detail. The project started with the creation of a large audio database of Russian fricatives. Acoustic recordings were collected from 59 native Russian speakers. The resulting dataset consists of 22,561 tokens including the fricatives [f], [s], [ʃ], [x], [v], [z], [ʒ], [sʲ], [ɕ], [vʲ], [zʲ]. Two follow-up analyses were then conducted on this database. The first study employed a data sample of 6,320 tokens (from 40 speakers). Temporal and spectral measurements were extracted using three acoustic cue extraction techniques (full sound, the noise part, and the middle 30 ms window). Furthermore, 13 Mel Frequency Cepstral Coefficients were computed from the middle 30 ms window. Classifiers based on single decision trees, random forests, support vector machines, and neural networks were trained and tested to distinguish between the three non-palatalized fricatives [f], [s] and [ʃ]. The results demonstrate that machine learning techniques are very successful at classifying the Russian voiceless non-palatalized fricatives [f], [s] and [ʃ] using the centre of gravity and the spectral spread, irrespective of contextual and speaker variation. The three acoustic cue extraction techniques performed similarly in terms of classification accuracy (between 93% and 99%), but the spectral measurements extracted from the noise parts resulted in slightly better accuracy.
Furthermore, Mel Frequency Cepstral Coefficients show marginally higher predictive power than spectral cues (< 2%). This suggests that both spectral measures and Mel Frequency Cepstral Coefficients provide sufficient information for the classification of these fricatives, and that the choice between them depends on the particular research question or application. The second study's dataset consists of 15,812 tokens (59 speakers) containing [f], [s], [ʃ], [x], [v], [z], [ʒ], [sʲ], [ɕ]. As in the first study, two types of acoustic cues were extracted: 11 acoustic speech features (spectral cues, duration and HNR measures) and 13 Mel Frequency Cepstral Coefficients. Classifiers based on single decision trees and random forests were trained and tested to predict speakers' gender and ID
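Spectral centre of gravity (the power-weighted mean frequency) and spectral spread (the power-weighted standard deviation around it) are the two cues the abstract singles out. A minimal sketch under stated assumptions: a naive DFT in pure Python, no analysis window, and power (|X|²) weighting over bins from 0 Hz to Nyquist. Real toolkits may window the frame and weight the spectrum differently, so values will not match any particular software exactly; the function names are hypothetical.

```python
import cmath
import math

def power_spectrum(frame, sample_rate):
    """Naive DFT of a real frame; returns (frequencies in Hz, power) for bins 0..N/2."""
    n = len(frame)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        power.append(abs(acc) ** 2)
    return freqs, power

def cog_and_spread(frame, sample_rate):
    """Power-weighted mean frequency and standard deviation around it, in Hz."""
    freqs, power = power_spectrum(frame, sample_rate)
    total = sum(power)
    cog = sum(f * p for f, p in zip(freqs, power)) / total
    var = sum((f - cog) ** 2 * p for f, p in zip(freqs, power)) / total
    return cog, math.sqrt(var)

# A 30 ms frame at 8 kHz (240 samples) holding two equal-amplitude tones.
fs = 8000
frame = [math.sin(2 * math.pi * 1000 * t / fs) + math.sin(2 * math.pi * 3000 * t / fs)
         for t in range(240)]
cog, spread = cog_and_spread(frame, fs)
print(round(cog), round(spread))  # 2000 1000
```

In a classification setup like the one described, these two numbers per token would become input features for a classifier; the sketch stops at feature extraction.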
12

Clark, Anna. "Acoustic correlates of linguistic prosody in the speech of children with cochlear implants: A study in comparison with typical-hearing peers." 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:1453512.

13

Oh, Soo Hee. "Top-Down Processes in Simulated Combined Electric-Acoustic Hearing: The Effect of Context and the Role of Low-Frequency Cues in the Perception of Temporally Interrupted Speech." Scholar Commons, 2014. https://scholarcommons.usf.edu/etd/5379.

Abstract:
In recent years, the number of unilateral cochlear implant (CI) users with functional residual-hearing has increased and bimodal hearing has become more prevalent. According to the multi-source speech perception model, both bottom-up and top-down processes are important components of speech perception in bimodal hearing. Additionally, these two components are thought to interact with each other to different degrees depending on the nature of the speech materials and the quality of the bottom-up cues. Previous studies have documented the benefits of bimodal hearing as compared with a CI alone, but most of them have focused on the importance of bottom-up, low-frequency cues. Because only a few studies have investigated top-down processing in bimodal hearing, relatively little is known about the top-down mechanisms that contribute to bimodal benefit, or the interactions that may occur between bottom-up and top-down processes during bimodal speech perception. The research described in this dissertation investigated top-down processes of bimodal hearing, and potential interactions between top-down and bottom-up processes, in the perception of temporally interrupted speech. Temporally interrupted sentences were used to assess listeners' ability to fill in missing segments of speech by using top-down processing. Young normal hearing listeners were tested in simulated bimodal listening conditions in which noise band vocoded sentences were presented to one ear with or without low-pass (LP) filtered speech or LP harmonic complexes (LPHCs) presented to the contralateral ear. Sentences were square-wave gated at a rate of 5 Hz with a 50 percent duty cycle. Two factors that were expected to influence bimodal benefit were examined: the amount of linguistic context available in the speech stimuli, and the continuity of low-frequency cues. 
Experiment 1 evaluated the effect of sentence context on bimodal benefit for temporally interrupted sentences from the City University of New York (CUNY) and Institute of Electrical and Electronics Engineers (IEEE) sentence corpora. It was hypothesized that acoustic low-frequency information would facilitate linguistic top-down processing such that the higher-context CUNY sentences would produce more bimodal benefit than the lower-context IEEE sentences. Three vocoder channel conditions were tested for each type of sentence (8-, 12-, and 16-channels for CUNY; 12-, 16-, and 32-channels for IEEE), in combination with either LP speech or LPHCs. Bimodal benefit was compared for similar amounts of spectral degradation (matched-channels) and similar ranges of baseline performance. Two gain measures, percentage point gain and normalized gain, were examined. Experiment 1 revealed clear effects of context on bimodal benefit for temporally interrupted speech when LP speech was presented to the residual-hearing ear, thereby providing additional support for the notion that low-frequency cues can enhance listeners' use of top-down processing. However, the bimodal benefits observed for temporally interrupted speech were considerably smaller than those observed in an earlier study that used continuous speech. In addition, unlike previous findings for continuous speech, no bimodal benefits were observed when LPHCs were presented to the LP ear. Experiments 2 and 3 further investigated the effects of low-frequency cues on bimodal benefit by systematically restoring continuity to temporally interrupted signals in the vocoder and/or LP ears. Stimuli were 12-channel CUNY sentences presented to the vocoder ear, and LPHCs presented to the LP ear. Signal continuity was restored to the vocoder ear by filling silent gaps in sentences with envelope-modulated, speech-shaped noise.
Continuity was restored to signals in the LP ear by filling gaps with envelope-modulated LP noise or by using continuous LPHCs. It was hypothesized that the restoration of continuity in one or both ears would improve bimodal benefit relative to the condition in which both ears received temporally interrupted stimuli. The results from Experiments 2 and 3 showed that restoring continuity to the simulated residual-hearing or CI ear improved bimodal benefits, but that the greatest improvement was observed when continuity was restored to both ears. These findings support the conclusion that temporal interruption disrupts top-down enhancement effects in bimodal hearing. Lexical segmentation and perceptual continuity were identified as factors that could potentially explain the increased bimodal benefit for continuous, as compared to temporally interrupted, speech. Taken together, the findings from Experiments 1-3 provide additional evidence that low-frequency sensory information can provide bimodal benefit for speech that is spectrally and/or temporally degraded by improving listeners' ability to make use of top-down processing. Findings further suggest that temporal degradation reduces top-down enhancement effects in bimodal hearing, thereby reducing bimodal benefit for temporally interrupted speech as compared to continuous speech.
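The stimulus manipulation (5 Hz square-wave gating, 50 percent duty cycle) and the two gain measures can be made concrete. A minimal sketch, assuming an abrupt on/off gate with no onset or offset ramps (which a real study would likely apply), and assuming definitions commonly used in the bimodal literature: percentage point gain as the bimodal score minus the CI-alone score, and normalized gain as that difference divided by the headroom above the CI-alone score. The abstract does not give these formulas, so treat them as assumptions; the function names are hypothetical.

```python
def square_wave_gate(num_samples, sample_rate, rate_hz=5.0, duty_cycle=0.5):
    """1.0 during the 'on' part of each gating cycle, 0.0 during the silent gap."""
    period = sample_rate / rate_hz  # samples per on/off cycle
    on_samples = duty_cycle * period
    return [1.0 if (i % period) < on_samples else 0.0 for i in range(num_samples)]

def interrupt(signal, sample_rate, rate_hz=5.0, duty_cycle=0.5):
    """Apply the square-wave gate to a signal, silencing the off phases."""
    gate = square_wave_gate(len(signal), sample_rate, rate_hz, duty_cycle)
    return [s * g for s, g in zip(signal, gate)]

def percentage_point_gain(bimodal_pct, ci_alone_pct):
    """Raw improvement in percent correct (assumed definition)."""
    return bimodal_pct - ci_alone_pct

def normalized_gain(bimodal_pct, ci_alone_pct):
    """Improvement as a percentage of the headroom above CI alone (assumed definition)."""
    if ci_alone_pct >= 100:
        raise ValueError("no headroom above a CI-alone score of 100%")
    return 100.0 * (bimodal_pct - ci_alone_pct) / (100.0 - ci_alone_pct)

fs = 16000
gate = square_wave_gate(fs, fs)             # one second: five on/off cycles
print(sum(gate) / len(gate))                # 0.5
print(percentage_point_gain(70.0, 40.0))    # 30.0
print(normalized_gain(70.0, 40.0))          # 50.0
```

Normalized gain makes listeners with different CI-alone baselines easier to compare, since a 30-point gain from 40% uses half of the available headroom while the same gain from 10% uses only a third.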
14

Winters, Stephen James. "Empirical investigations into the perceptual and articulatory origins of cross-linguistic asymmetries in place assimilation." PhD thesis, Ohio State University, 2003. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1054756426.

Abstract:
Thesis (Ph.D.), Ohio State University, 2003. Title from first page of PDF file. xx, 351 pages; includes graphics and bibliographical references (leaves 344-351). Available online via OhioLINK's ETD Center.
15

Wang, Xiaoyun (王暁芸). "Phoneme set design for second language speech recognition." Thesis, Doshisha University, 2017. https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB13044980/?lang=0.

Abstract:
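This dissertation reports research on how to construct a phoneme set for recognizing the speech of second-language speakers with high accuracy. It treats second-language speech as an information source whose frequency distribution of acoustic features differs from that of native speakers' speech, and proposes a method for building an appropriate phoneme set to represent it. Specifically, the optimal phoneme set is determined by a criterion that combines the similarity between the target second language and the speaker's native language, such as place and manner of articulation, with the loss of word discrimination performance caused by newly created homophones. The proposed method was applied to the recognition of English speech by Japanese students, and improvements in recognition accuracy were verified under a variety of conditions.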
This dissertation addresses the problem of mispronunciations that cause phoneme confusion, with the aim of improving recognition performance for second-language (L2) speech. A novel method that integrates acoustic and linguistic features is proposed to derive a reduced phoneme set for L2 speech recognition. The customized phoneme set is created with a top-down sequential splitting method based on a phonetic decision tree (PDT) that exploits phonological knowledge about the relationship between the first language (L1) and L2. The dissertation verifies the efficacy of the proposed method for Japanese-accented English and shows that a speech recognizer built with the proposed phoneme set can alleviate the problems caused by second-language speakers' confusable mispronunciations.
Doctor of Philosophy in Engineering
Doshisha University
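The abstract describes a criterion that weighs the articulatory similarity between phonemes against the loss of word discriminability caused by new homophones. The sketch below illustrates only that trade-off for a single merge step. It swaps in a simple bottom-up greedy merge in place of the thesis's top-down PDT-based splitting, and the lexicon, similarity values and weight are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def merge_phonemes(lexicon, p, q, merged):
    """Return a copy of the lexicon with phonemes p and q mapped to one symbol."""
    return {word: tuple(merged if ph in (p, q) else ph for ph in pron)
            for word, pron in lexicon.items()}


def homophone_count(lexicon):
    """Number of words whose pronunciation is shared with at least one other word."""
    counts = Counter(lexicon.values())
    return sum(1 for pron in lexicon.values() if counts[pron] > 1)


def best_merge(lexicon, similarity, weight=1.0):
    """Pick the phoneme pair with the best score for a single greedy merge.

    score = articulatory similarity - weight * (words newly turned into homophones)
    Pairs missing from `similarity` are treated as similarity 0.
    """
    phonemes = sorted({ph for pron in lexicon.values() for ph in pron})
    base = homophone_count(lexicon)
    best = None
    for p, q in combinations(phonemes, 2):
        sim = similarity.get(frozenset((p, q)), 0.0)
        merged = merge_phonemes(lexicon, p, q, p + "+" + q)
        new_homophones = homophone_count(merged) - base
        score = sim - weight * new_homophones
        if best is None or score > best[0]:
            best = (score, p, q)
    return best


# Toy lexicon and similarity values, invented for illustration only.
LEXICON = {
    "light": ("l", "ay", "t"),
    "right": ("r", "ay", "t"),
    "rice": ("r", "ay", "s"),
    "bat": ("b", "ae", "t"),
    "but": ("b", "ah", "t"),
}
SIMILARITY = {frozenset(("l", "r")): 0.9, frozenset(("ae", "ah")): 0.8}

if __name__ == "__main__":
    print(best_merge(LEXICON, SIMILARITY, weight=0.1))
```

Raising `weight` makes the homophone penalty dominate, so highly similar pairs whose merge would collapse minimal pairs such as light/right stop being chosen.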
16

Santos, Jeylla Salomé Barbosa dos. "As realizações de /R/ em coda silábica na comunidade de Porto da Rua, litoral norte de Alagoas : análise lingüística e sociolinguística." Universidade Federal de Alagoas, 2010. http://repositorio.ufal.br/handle/riufal/487.

Abstract:
In the light of the Theory of Variation and Change and of Generative Phonology, this study investigates the realization of the segment /R/ in the community of Porto da Rua, on the northern coast of Alagoas. We determined the phonetic environment in which this realization occurs and examined the influence of extralinguistic factors. The corpus consisted of audio-recorded spontaneous narratives produced by 48 informants, men and women who were born in the community and have always lived there. Data categorization and statistical analysis were carried out with the VARBRUL package, with the data coded according to linguistic and social factor groups. The objective was thus to study the correlation between linguistic phenomena and stratified external variables (sex, age, and education). The results indicate that the variant under study may be undergoing a process of linguistic change, since it is realized mainly by informants with no schooling and by those over 50 years of age.
Fundação de Amparo à Pesquisa do Estado de Alagoas
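VARBRUL fits a multivariate (logistic) model that estimates a weight for each factor while controlling for the others. As a much simpler, descriptive first look, the sketch below only tabulates the raw rate of the variant at each level of one factor group. The token coding and field names are hypothetical, and this is not a substitute for the VARBRUL analysis.

```python
from collections import defaultdict


def rates_by_factor(tokens, factor):
    """Share of tokens realizing the target variant at each level of one factor group.

    Each token is a dict with a 0/1 "variant" field plus one field per factor group.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for token in tokens:
        level = token[factor]
        totals[level] += 1
        hits[level] += int(token["variant"])
    return {level: hits[level] / totals[level] for level in totals}


# Hypothetical coded tokens, not data from the study.
TOKENS = [
    {"variant": 1, "age": "50+", "schooling": "none"},
    {"variant": 1, "age": "50+", "schooling": "none"},
    {"variant": 0, "age": "50+", "schooling": "some"},
    {"variant": 0, "age": "under 50", "schooling": "some"},
]

if __name__ == "__main__":
    print(rates_by_factor(TOKENS, "age"))
    print(rates_by_factor(TOKENS, "schooling"))
```

Marginal rates like these can be confounded when factors overlap (here, age and schooling), which is exactly what a multivariate model is meant to disentangle.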
17

Guedes, Clara Peron da Silva. "Investigação das interferências linguísticas e das modalidades tradutórias na tradução para o português brasileiro do conto "Tenth of December"." Universidade Federal de Pelotas, 2015. http://repositorio.ufpel.edu.br:8080/handle/prefix/2855.

Abstract:
No scholarship funding.
The theory of Languages in Contact was developed from the investigation and description of linguistic phenomena that result from contact between languages in bilingual or multilingual individuals and societies. Research in this area now covers a range of topics, translation among them. However, the relationship between language contact and translation appears to have received little academic attention. Conversely, since Linguistics broadened its object of analysis, Translation Studies have examined translation through the lens of linguistic phenomena, in addition to literary investigation. This thesis therefore seeks to link Languages in Contact, a specialty of Applied Linguistics, with the multidisciplinary field of Translation Studies, both belonging to the broad area of Letters and Linguistics. To that end, it investigates the linguistic interferences described by Weinreich (1970) and the translation modalities proposed by Aubert (1998) in the Brazilian Portuguese translation of the short story “Tenth of December” (SAUNDERS, 2013) by José Geraldo Couto, published as “Dez de Dezembro” (SAUNDERS, 2014). To quantify the data, the noun phrases of the source text were selected and classified with labels created for each category of linguistic interference and translation modality found in the target text. The data were then saved as a TXT file and annotated in Notepad++ as an XML file which, combined with an XSL stylesheet, yields the total count of each category, in absolute numbers, in an HTML file. The results point to a prevalence of linguistic interference in the direction of American English; that is, the noun phrases remain closer to the source language.
As for the translation modalities, the choices made indicate little distance between the translated text and the source text. However, the small percentage difference between the categories closer to the source language and those closer to the target language suggests a degree of linguistic proximity, in the analyzed corpus, between Brazilian Portuguese and American English. Likewise, the classification of the most frequent modalities according to the scale proposed by Aubert (1998) shows a degree of equivalence between the source and target texts. Quantitative and qualitative analyses of each category of linguistic interference and translation modality made it possible to draw parallels between the two. This investigation thus relates Languages in Contact, within Applied Linguistics, to the multidisciplinary field of Translation Studies.
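The counting step described above (XML annotation plus an XSL stylesheet that outputs per-category totals as HTML) can be approximated in a few lines. This sketch swaps in Python's `xml.etree.ElementTree` for the XSL transform, and the `<np>` element, its attributes and the category labels are a hypothetical annotation scheme, not the thesis's actual tag set.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical annotation scheme: one <np> element per source-text noun phrase,
# with attributes for the interference type and the translation modality.
SAMPLE = """<corpus>
  <np interference="lexical" modality="literal translation">NP 1</np>
  <np interference="grammatical" modality="transposition">NP 2</np>
  <np interference="lexical" modality="literal translation">NP 3</np>
</corpus>"""


def count_categories(xml_text, attribute):
    """Count how many <np> elements carry each value of the given attribute."""
    root = ET.fromstring(xml_text)
    return Counter(phrase.get(attribute) for phrase in root.iter("np"))


if __name__ == "__main__":
    for attribute in ("interference", "modality"):
        print(attribute, dict(count_categories(SAMPLE, attribute)))
```

Keeping both categories as attributes on the same element makes it easy to cross-tabulate interference type against translation modality later, which is what drawing parallels between the two requires.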
18

Azzabou-Kacem, Soundess. "Stress shift in English rhythm rule environments : effects of prosodic boundary strength and stress clash types." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/33200.

Abstract:
It is well known that the early assignment of prominence in sequences like THIRteen MEN vs. thirTEEN (known as the Rhythm Rule, or post-lexical stress shift) is an optional phenomenon. This dissertation examines some of the factors that encourage stress shift in English and how it is phonetically realised. The aim is to answer two sets of questions about why and how stress shift occurs in English: 1a) Does prosodic boundary strength influence stress shift? 1b) Does the adjacency of prominences above the level of the segmental string encourage stress shift? 2) How is stress shift realised: a) is it only a perceptual phenomenon, and b) which syllables, if any, change acoustically when stress shift is perceived? To answer these questions, four experiments were designed. The first three experiments test whether the strength of the prosodic boundaries before and after the target word (e.g., canteen) influences stress shift. The effect of the strength of the left-edge prosodic boundary was investigated by comparing the perceived stress patterns of the target (e.g., canteen) when produced in isolation, where it is preceded by an utterance- and phrase-initial prosodic boundary (the Isolated condition), with its rendition when embedded in a frame sentence (e.g., Say canteen again), where the left prosodic boundary before canteen is weaker (the Embedded condition). Results show a very clear tendency towards late phrasal prominence on the final accentable syllable (e.g., -teen in canteen) in the Embedded condition, whereas in the Isolated condition this pattern appeared in fewer than half of the targets, showing that the stronger left boundary increased the incidence of stress shift. Two further experiments manipulated the strength of the boundary to the right of the target (#), by changing the syntactic parse of the critical phrase (e.g., canteen cook) in sequences like (1) and by manipulating constituent length in sequences like (2), respectively.
Results showed that the syntactic manipulation significantly affected the strength of the prosodic boundary between the clashing words, which was stronger in (1b) than in (1a), and affected the incidence of stress shift, which was higher in (1a) than in (1b). The length manipulation also affected the rate of stress shift, which was significantly higher in the phrase with the shorter word, e.g., soups (2a), than in the phrase with the longer word, e.g., supervisors (2b).

(1) Example from the Syntax Experiment
a. Who is the canteen (#) cook these days? (Pre-modifier + Noun)
b. How do the canteen (#) cook these days? (NP + VP)

(2) Example from the Length Experiment
a. It should include the canteen (#) soups again. (Shorter constituent)
b. It should include the canteen (#) supervisors again. (Longer constituent)

Whilst we knew from the literature that grouping the clashing words within one Intonational Phrase (IP) encourages stress shift, results from the Syntax and Length experiments indicate that this (i.e., phrasing the clashing words within the same IP) is not a sufficient condition for stress shift, and that fine-grained degrees of boundary strength below the Intonational Phrase can drive changes in prominence pattern. The fact that higher rates of stress shift (and associated significant acoustic changes) were driven by manipulations of constituent length, for sequences with the same syntactic structure, supports the idea that prosodic (rather than syntactic) boundaries directly influence stress shift. The fourth experiment tests the definition of stress clash in English in cases like fourteen candles, where the two main lexical prominences are strictly adjacent along the time dimension; fourteen canoes, where the prominences are not adjacent in time but are adjacent at the higher levels of the metrical hierarchy; and fourteen canteens, where the main lexical prominences are not adjacent and do not clash.
This experiment highlighted and resolved an unacknowledged disagreement about the clash status of sequences with one weak intervening syllable (e.g., fourTEEN caNOES). Sequences of the fourTEEN caNOES type were shown to behave like metrically clashing sequences (e.g., fourteen CANdles) in attracting stress shift, and unlike the non-clashing sequences (e.g., fourteen CANTEENS), which discouraged it. These results provide empirical support for the claim of Standard Metrical Theory (e.g., Selkirk, 1984; Nespor & Vogel, 1989) that 1) stress clash matters in triggering stress shift and 2) stress clash in English is defined at the higher prosodic levels and is not restricted to the level of the segmental string, as indirectly assumed in a growing body of research (e.g., Vogel, Bunnell & Hoskins, 1995; Tomlinson, Liu & Fox Tree, 2014). Besides establishing prosodic boundary strength as one of the predictors of stress shift, the thesis makes another important contribution: it provides empirical evidence that the English Rhythm Rule is not solely a perceptual phenomenon but has acoustic correlates. The main correlates of perceived stress shift that appear consistently across experiments are a decrease in the duration of the main lexical prominence of the target (e.g., -teen in canteen) and an increase in the fundamental frequency and sound pressure level peaks on the initial syllable (e.g., can- in canteen) when the target is followed by a main clashing phrasal prominence. The acoustic analysis shows that the first accentable syllable also contributes to the perception of stress shift. This result does not support the deletion formulation of the Rhythm Rule (Gussenhoven, 1991), which stipulates that impressions of stress shift are associated solely with changes of prominence on the last accentable syllable of the target (e.g., -teen in canteen).
Along with the determination of the acoustic correlates of perceived stress shift in English, the present research 1) indicates that fine-grained gradations of prosodic boundary strength can influence stress shift, 2) shows that while stress clash can increase the incidence of stress shift, stress shift can take place even in environments completely free of stress clash, and 3) provides evidence that stress clash should not be construed simply as the concatenation of two main lexical prominences along the time dimension.
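To make the reported acoustic correlates concrete, the sketch below compares cue means between tokens perceived with and without stress shift. The cue names, the token format and all measurements are hypothetical, and the comparison is purely descriptive. It does not reproduce the thesis's statistical analysis, which would also need inferential tests across speakers and items. The direction the abstract reports is a negative difference for final-syllable duration and positive differences for the initial syllable's F0 and SPL peaks.

```python
from statistics import mean

# Cue names are hypothetical; each token also has a boolean "shifted" label
# from the perception judgement.
CUES = ("final_dur_ms", "initial_f0_peak_hz", "initial_spl_peak_db")


def cue_differences(tokens):
    """Mean of each cue for perceived-shift tokens minus the mean for unshifted ones."""
    shifted = [t for t in tokens if t["shifted"]]
    unshifted = [t for t in tokens if not t["shifted"]]
    if not shifted or not unshifted:
        raise ValueError("need at least one shifted and one unshifted token")
    return {cue: mean(t[cue] for t in shifted) - mean(t[cue] for t in unshifted)
            for cue in CUES}


# Invented measurements for four renditions of a target such as "canteen".
TOKENS = [
    {"shifted": True, "final_dur_ms": 180, "initial_f0_peak_hz": 210, "initial_spl_peak_db": 72},
    {"shifted": True, "final_dur_ms": 200, "initial_f0_peak_hz": 230, "initial_spl_peak_db": 74},
    {"shifted": False, "final_dur_ms": 260, "initial_f0_peak_hz": 190, "initial_spl_peak_db": 70},
    {"shifted": False, "final_dur_ms": 240, "initial_f0_peak_hz": 200, "initial_spl_peak_db": 70},
]

if __name__ == "__main__":
    print(cue_differences(TOKENS))
```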
19

Ghannay, Sahar. "Étude sur les représentations continues de mots appliquées à la détection automatique des erreurs de reconnaissance de la parole." Thesis, Le Mans, 2017. http://www.theses.fr/2017LEMA1019/document.

Abstract:
This thesis presents a study of continuous word representations (word embeddings) applied to the automatic detection of errors in speech recognition transcriptions. It focuses on a neural approach that exploits word embeddings to improve the detection of automatic speech recognition (ASR) errors. The use of embeddings is motivated by the fact that ASR error detection consists of locating possible linguistic or acoustic incongruities in automatic transcriptions; the goal is therefore to find a word representation that captures the information needed to detect these anomalies. The contributions of the thesis span several directions. First, a preliminary study proposes a neural architecture able to integrate different types of features, including word embeddings. Second, an in-depth study of continuous word representations evaluates different types of linguistic word embeddings and their combinations, to take advantage of their complementarity, and also examines acoustic word embeddings. Then, an analysis of classification errors aims to identify the errors that are hard to detect, and ways to improve system performance by modelling errors at the sentence level are proposed. Finally, the linguistic and acoustic embeddings, together with the information provided by the ASR error detection system, are exploited in several downstream applications.
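The abstract does not specify how the embeddings enter the neural classifier. One common way to feed them to a word-level error detector, shown here as an assumption rather than the thesis's architecture, is to concatenate the linguistic embeddings of the word and its neighbours with the word's acoustic embedding. The function name, the window size and the toy vectors are hypothetical.

```python
def word_features(words, ling_emb, acoustic_vecs, dim, width=1):
    """Build one feature vector per word for an error-detection classifier.

    The vector concatenates the linguistic embeddings of the word and of `width`
    neighbours on each side (zero-padded at sentence edges or for unknown words),
    followed by that word's acoustic embedding.
    """
    if len(acoustic_vecs) != len(words):
        raise ValueError("need one acoustic embedding per word")
    pad = [0.0] * dim
    features = []
    for i in range(len(words)):
        vec = []
        for j in range(i - width, i + width + 1):
            if 0 <= j < len(words):
                vec.extend(ling_emb.get(words[j], pad))
            else:
                vec.extend(pad)
        vec.extend(acoustic_vecs[i])
        features.append(vec)
    return features


# Toy 2-dimensional linguistic and 1-dimensional acoustic embeddings (invented).
LING = {"the": [1.0, 0.0], "cat": [0.0, 1.0]}
ACOUSTIC = [[0.5], [0.25]]

if __name__ == "__main__":
    for row in word_features(["the", "cat"], LING, ACOUSTIC, dim=2):
        print(row)
```

Each resulting vector would then be labelled correct or erroneous for that ASR word; the context window lets the classifier notice linguistic incongruity with neighbouring words.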
20

Cristino, Luciana dos Santos. "Bilingüismo e code-switching: um estudo de caso." Pontifícia Universidade Católica de São Paulo, 2008. https://tede2.pucsp.br/handle/handle/13934.

Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
This research investigates the occurrence of code-switching in the speech of a late bilingual (English/Portuguese), focusing on prosodic aspects and lexical use from sociolinguistic and psycholinguistic perspectives. Code-switching, or code alternation, is a communicative strategy used by bilingual speakers according to the social situation. The word bilingual primarily describes someone who is proficient in two languages; however, the term can also include the many people in the world who have varying degrees of proficiency in two, three, or more languages simultaneously (Wei, 2000). Following the parameters of qualitative research, we conducted a case study of a 39-year-old Nigerian male bilingual who has lived in Brazil for about 6 years, works as an English teacher, and is married to a Brazilian. Data were collected with five instruments: an audio and video recording of the subject's oral presentation to a group of students at a Brazilian school in a bilingual (English/Portuguese) context, followed by a question session; a closed individual interview, recorded on audio tape and made up of discrete questions; a written questionnaire collecting personal data about the subject; a visual perception test to determine the language he prefers in free speech; and a self-confrontation (reflexive) interview. Only the passages in which code-switching occurred were transcribed and analyzed. Some sentences from this corpus were selected for acoustic analysis, and charts of duration and F0 measures were produced to analyze prosodic aspects of the subject's speech in his first and second languages.
The results indicate that: (1) although the subject prefers his mother tongue (English), code-switching occurs in both directions, from the first language to the second and from the second to the first; (2) the subject uses different strategies for choosing lexical items depending on the context, the interlocutor, and the place, and switches of linguistic code are most often initiated by the interjection "OK". The emotional aspect is also a factor: the subject is always concerned about the interlocutor and about whether his messages have been understood. His pronunciation of Portuguese words is heavily influenced by his first language; (3) the acoustic analysis showed that the intonation curves of yes/no questions produced in Portuguese display English prosodic features, that is, the subject keeps the intonational patterns of the matrix language; (4) the phonotactics of some of the words the speaker uses are altered; (5) altered lexical items are replaced by words of the same syntactic level.
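The abstract reports that yes/no questions were compared through their F0 curves but gives no measurement details. As a rough illustration of how a terminal contour can be quantified, the sketch below measures the final F0 movement in semitones (12 × log2 of the frequency ratio) and labels it as a rise, fall or level stretch. The five-frame window, the two-semitone threshold and the F0 track are arbitrary assumptions, not the thesis's procedure.

```python
import math


def terminal_movement(f0_hz, last_frames=5, threshold_st=2.0):
    """Classify the end of an F0 track as a final rise, fall, or level contour.

    Unvoiced frames are coded as 0 and ignored. The change is measured in
    semitones between the first and last of the final `last_frames` voiced frames.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2 or last_frames < 2:
        raise ValueError("need at least two voiced frames in the window")
    tail = voiced[-last_frames:]
    change_st = 12.0 * math.log2(tail[-1] / tail[0])
    if change_st >= threshold_st:
        return "rise", change_st
    if change_st <= -threshold_st:
        return "fall", change_st
    return "level", change_st


# Invented F0 track in Hz (0 = unvoiced frame) ending in a steep rise.
TRACK = [0, 180, 175, 170, 0, 160, 150, 170, 200, 240]

if __name__ == "__main__":
    print(terminal_movement(TRACK))
```

Working in semitones rather than hertz makes rises comparable across a speaker's registers, which matters when the same speaker's contours are compared across his two languages.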
21

Castelli, Eric. "Caractérisation acoustique des voyelles nasales du français : mesures, modélisation et simulation temporelle." Grenoble INPG, 1989. http://www.theses.fr/1989INPG0055.

22

Theron, Karin. "Temporal aspects of speech production in bilingual speakers with neurogenic speech disorders." Diss., Pretoria : [s.n.], 2003. http://upetd.up.ac.za/thesis/available/etd-08072003-152242.

23

Chien, To-Chang (錢鐸樟). "Integration of Acoustic and Linguistic Features for Maximum Entropy Speech Recognition." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/24325293971312481529.

Abstract:
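Master's thesis. National Cheng Kung University, Department of Computer Science and Information Engineering (master's and doctoral program). Academic year 93 (Republic of China calendar).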
Traditional speech recognition systems assume that acoustic and linguistic information sources are independent. The parameters of the acoustic hidden Markov model (HMM) and the linguistic n-gram model are estimated separately and then combined in a plug-in maximum a posteriori (MAP) classification rule. However, acoustic and language models are inherently correlated, so the independence assumption should be relaxed to improve speech recognition performance. In this study, we propose an integrated approach based on the maximum entropy (ME) principle, in which acoustic and linguistic features are optimally combined in a unified framework. With this approach, associations between acoustic and linguistic features are explored and merged into the integrated models. For discriminative training, we also establish the relationship between ME and discriminative maximum mutual information (MMI) models. In addition, the ME integrated model is general enough that semantic topics and long-distance association patterns can also be incorporated. In experiments, we apply the proposed ME model to broadcast news transcription on the MATBN database. Preliminary results show improvements over a conventional speech recognition system based on the plug-in MAP classification rule.
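The contrast between the plug-in MAP rule and an ME model can be seen in the scoring step alone. A maximum entropy model has a log-linear form, P(h | x) = exp(Σᵢ wᵢ fᵢ(h)) / Z. With unit weights on the acoustic and language-model log-scores, its ranking reduces to the plug-in MAP rule log p(x|W) + log p(W). The sketch below shows only that scoring step over a list of competing hypotheses. The feature names and scores are hypothetical, and the ME training that learns the weights (and can add joint acoustic-linguistic, topic or long-distance features) is not shown.

```python
import math


def log_linear_posteriors(hypotheses, weights):
    """Posterior over competing transcriptions under a log-linear (ME-form) model.

    Each hypothesis maps feature names to values (e.g. acoustic and LM log-scores).
    P(h | x) = exp(sum_i w_i * f_i(h)) / Z, normalized over the hypothesis list.
    """
    scores = [sum(weights[name] * value for name, value in feats.items())
              for feats in hypotheses]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by the max for numerical stability
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical log-scores for two competing transcriptions of one utterance.
HYPS = [
    {"acoustic": -10.0, "lm": -2.0},
    {"acoustic": -9.0, "lm": -4.0},
]

if __name__ == "__main__":
    # Unit weights reproduce the plug-in MAP ranking: log p(x|W) + log p(W).
    print(log_linear_posteriors(HYPS, {"acoustic": 1.0, "lm": 1.0}))
    # A smaller LM weight changes which hypothesis wins.
    print(log_linear_posteriors(HYPS, {"acoustic": 1.0, "lm": 0.4}))
```

With unit weights the first hypothesis wins (total scores -12 vs. -13); lowering the LM weight to 0.4 gives -10.8 vs. -10.6 and the second hypothesis wins, which is why learning the weights from data matters.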
24

Oasa, Hiroaki. "Vowel normalisation : an interface between acoustic and linguistic descriptions of speaker characteristics in Australian English." PhD thesis, 1993. http://hdl.handle.net/1885/110188.

Abstract:
This thesis examines existing normalisation procedures against the background of a theoretical model of inter-speaker formant variability, which describes observed formant differences in three major categories: phonetic variation, non-uniform variation, and uniform variation. A new normalisation strategy based on this model is proposed, which involves removing the uniform and non-uniform components of inter-speaker variation in order to isolate phonetic variation. The nature of this non-uniformity is subject to empirical investigation. Following this strategy, the method adopted in this thesis is first to acquire a phonetically stable vowel database, which is then screened for phonetic variation through a rigorous phonetic control procedure. The resulting data, now considered phonetically homogeneous, are used to explore two essential domains of inter-speaker variability that inform the design of a future normalisation procedure: (1) by applying uniform transformations with a variety of published scaling parameters, the most effective uniform scaling parameters are identified; (2) non-uniform inter-speaker variation patterns are analysed and compared with the published results of Fant (1975). A major finding is that the non-uniform inter-speaker variation patterns obtained from phonetically controlled data differ greatly from those observed by Fant. The database comprises 594 vowels in the /h_d/ word context (11 phonemic monophthongs × 9 speakers × 6 repetitions); the speakers are 4 adult females, 3 adult males, and 2 male children.
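To illustrate what a uniform transformation removes, the sketch below applies a log-mean normalisation in the style of Nearey. It subtracts one speaker-specific constant from every log formant value, so any purely multiplicative difference between speakers cancels out. This is one published uniform scaling approach, offered as an example; the abstract does not say which scaling parameters the thesis compared. By construction, it cannot remove the non-uniform variation the thesis analyses. The formant values are invented.

```python
import math


def log_mean_normalize(formants_by_speaker):
    """Uniform normalisation: subtract each speaker's grand mean log formant.

    formants_by_speaker maps a speaker ID to a list of (F1, F2, ...) tuples in Hz.
    Because one constant is removed per speaker, a speaker whose formants are all
    k times another's ends up with identical normalised values.
    """
    normalized = {}
    for speaker, tokens in formants_by_speaker.items():
        logs = [math.log(f) for token in tokens for f in token]
        mean_log = sum(logs) / len(logs)
        normalized[speaker] = [tuple(math.log(f) - mean_log for f in token)
                               for token in tokens]
    return normalized


# Invented F1/F2 values; speaker "B" is speaker "A" scaled uniformly by 1.2.
DATA = {
    "A": [(300.0, 2300.0), (700.0, 1200.0)],
    "B": [(360.0, 2760.0), (840.0, 1440.0)],
}

if __name__ == "__main__":
    for speaker, tokens in log_mean_normalize(DATA).items():
        print(speaker, [tuple(round(v, 3) for v in t) for t in tokens])
```

Whatever difference remains between speakers after such a step is, under the thesis's model, either non-uniform or phonetic variation.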
25

Bai, Yu-Wei (白育瑋). "A GMM-based Voice Conversion System using Linguistic and Acoustic Information for Customizable Text-To-Speech." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/85239144130615577973.

Abstract:
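Master's thesis
National Cheng Kung University
Department of Electrical Engineering (Master's and Doctoral Program)
Academic year 101 (ROC calendar)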
In this thesis, a customizable speaker conversion system is implemented by combining linguistic classification-and-regression-tree (CART)-based spectrum and pitch conversion with the HMM-based speech synthesis system (HTS, read "H Triple S"). HTS uses three major acoustic features in the synthesis phase: spectrum, pitch, and duration. Two of them, spectrum and pitch, are transformed by the proposed methods to synthesize the target speaker's speech. In the training phase, CART training requires parallel corpora; to keep corpus collection efficient and phonetically balanced, a pre-designed phonetically balanced text corpus is established and a phonetically balanced sentence selection algorithm is proposed. The linguistic CART and the acoustic clusters of spectrum and pitch are then constructed with the proposed mechanisms. In the synthesis phase, the conversion functions for spectrum and pitch are determined from the linguistic CART and the acoustic clusters according to the label sequence generated by the text analyzer. Next, frame-based spectrum and pitch features are produced by the parameter generation process and then converted by the linguistic and acoustic conversion functions. Combining linguistic and acoustic conversion yields a complementary effect. Finally, the target speaker's speech is synthesized from the converted features with an MLSA vocoder. In the experiments, objective and subjective evaluation tests are designed to compare the speaker conversion results; the objective evaluation is carried out on the spectrum. In the subjective evaluation, three mean opinion score (MOS) measures are used (fluency, intelligibility, and voice quality), which reach 4.03, 4.12, and 4.09, respectively. In summary, the proposed system improves speaker conversion performance.
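The abstract does not specify the form of its pitch conversion functions. Purely as an illustration, the sketch below assumes a widely used log-F0 mean-and-variance transformation applied frame by frame, with statistics looked up per cluster. The integer cluster IDs are a hypothetical stand-in for whatever unit the linguistic CART or acoustic clustering selects for a frame, and all names and numbers are invented for the example. Unvoiced frames (F0 = 0) are passed through unchanged.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class LogF0Stats:
    mu: float     # mean of log F0 over voiced training frames
    sigma: float  # population standard deviation of log F0


def estimate_stats(f0_frames: List[float]) -> LogF0Stats:
    """Estimate log-F0 statistics from voiced frames (F0 > 0) only."""
    logs = [math.log(f) for f in f0_frames if f > 0.0]
    return LogF0Stats(mean(logs), pstdev(logs))


def convert_f0(
    f0_frames: List[float],
    cluster_ids: List[int],
    source: Dict[int, LogF0Stats],
    target: Dict[int, LogF0Stats],
) -> List[float]:
    """Map source-speaker F0 to the target speaker, frame by frame.

    log f0' = mu_t + (sigma_t / sigma_s) * (log f0 - mu_s),
    where (mu, sigma) come from the frame's (hypothetical) cluster.
    """
    converted = []
    for f0, cid in zip(f0_frames, cluster_ids):
        if f0 <= 0.0:
            converted.append(0.0)  # unvoiced frame: nothing to convert
            continue
        s, t = source[cid], target[cid]
        z = (math.log(f0) - s.mu) / s.sigma  # standardise in the source domain
        converted.append(math.exp(t.mu + z * t.sigma))
    return converted


if __name__ == "__main__":
    src = {0: LogF0Stats(math.log(100.0), 0.1)}
    tgt = {0: LogF0Stats(math.log(200.0), 0.2)}
    print(convert_f0([100.0, 0.0, 110.0], [0, 0, 0], src, tgt))
```

Working in the log domain keeps converted F0 positive and scales pitch intervals proportionally. Note that `estimate_stats` yields `sigma == 0` for a cluster with constant F0, which would make `convert_f0` divide by zero, so real training data would need a floor on `sigma`.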
26

Hsieh, Hung-yun, and 謝宏昀. "A Single-Stage Architecture Integrating Acoustic, Linguistic and Prosodic Knowledge for Continuous Mandarin Speech Recognition with Very Large Vocabulary." Thesis, 1996. http://ndltd.ncl.edu.tw/handle/15111599126186101389.

27

Mikula, Peter. "Makroekologie a makroevoluce ptačího zpěvu" [Macroecology and macroevolution of birdsong]. Doctoral thesis, 2020. http://www.nusl.cz/ntk/nusl-434828.

Abstract:
Birdsong is one of the most astounding natural sounds and has profoundly shaped evolutionary thinking since the 19th century. Despite strong interest in birdsong for over 100 years, our understanding of birdsong ecology and evolution over large spatial and phylogenetic scales is still very fragmentary. Answering many basic questions requires a global synthesis covering the vast diversity of extant bird species and the adoption of multidisciplinary approaches. In this dissertation, my co-workers and I have explored important patterns in the macroecology and macroevolution of song in passerines (Order: Passeriformes), the most diverse and widespread bird order. We have focused on three key song phenomena: (1) song complexity, (2) song frequency, and (3) the presence of song in female birds. We have exploited birdsong "big data" available in public citizen-science databases and other open sources in order to fill several important gaps in current knowledge. These data were analysed using a combination of phylogenetically informed cross-species analyses and spatial macroecological approaches. Since the publication of Darwin's seminal work, elaborate songs have generally been considered the result of sexual selection. We developed a simple but reliable song complexity metric to explore global diversity in...