Doctoral dissertations on the topic "Reconnaissance de la voix chantée"
Consult the 44 best doctoral dissertations on the topic "Reconnaissance de la voix chantée".
Schaffhauser, Mireille. "La voix chantée : technique, pathologie, prévention". Université Louis Pasteur (Strasbourg) (1971-2008), 1985. http://www.theses.fr/1985STR1M138.
Lévêque, Yohana. "Le lien perception-production en voix chantée : place des représentations motrices". Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM3089.
A growing body of research reveals that action production and action perception interact. In particular, it has been shown that speech perception entails articulatory motor representations in the listener. In the present work, we investigate the perception of a singing voice, a stimulus that is not primarily linked to articulatory processes. Does listening to a singing voice induce activity in the motor system? Is this motor activity stronger for a voice than for a non-biological musical sound? Two behavioral tasks, a "virtual lesion" paradigm using TMS, a study of brain oscillations with EEG, and an fMRI experiment carried out during this PhD shed some light on these questions. Our results show that the perception of a singing voice is indeed associated with sensorimotor activity in repetition and discrimination tasks. Interestingly, the poorest singers displayed the strongest motor resonance. The motor system could facilitate the processing of sound, or the preparation of the vocal response through internal model generation, when the acoustic processing is not effective enough. The set of studies presented here thus suggests that audiomotor interactions in human voice perception are modulated by two factors: the biological dimension of the sound and the listener's vocal expertise. These results open new perspectives on our understanding of the auditory-vocal loop in speech and of sound perception in general.
Liu, Ning. "La synthèse de la voix chantée : le cas de la langue chinoise". Paris 8, 2012. http://www.theses.fr/2012PA083495.
The objective of our research is to develop singing voice synthesis for the Chinese language. The work has two main components: first, the creation of a Chinese database for use in singing voice synthesis, and second, the development of the control system. Our study requires knowledge and techniques from various scientific fields, including phonetics, music, speech synthesis and computer science. We first carried out an overview of speech synthesis methods and of Western musical practices using artificial or processed voices. Theoretical studies and analyses follow: Chinese phonetics, the theories of Chinese music, phonetic analysis of the spoken and singing voice, and comparisons between Chinese and Western music theories. Based on these studies and analyses, we carried out various experimental trials, which allowed us to define a coherent development path for a Chinese singing voice synthesizer. The research and experiments yielded good results, in two parts: the first Chinese database was created for the MBROLA algorithm and then adapted to the singing voice; and a real-time Chinese singing synthesizer and its applications, based on concatenative diphone synthesis, were developed, providing new perspectives.
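A concatenative diphone synthesizer of the kind described above assembles prerecorded phoneme-to-phoneme transition units. The following toy sketch (units as plain sample lists, a linear crossfade, and illustrative names throughout; not MBROLA's actual algorithm) shows only the concatenation step, without the pitch and duration control a real system applies:

```python
def synthesize(phonemes, diphone_db, crossfade=2):
    """Concatenate diphone units (here: plain lists of samples) with a
    linear crossfade between the tail of each unit and the head of the next."""
    out = []
    for a, b in zip(phonemes, phonemes[1:]):
        unit = diphone_db[(a, b)]
        if not out:
            out = list(unit)
            continue
        # blend the last `crossfade` samples of `out` with the start of `unit`
        for i in range(crossfade):
            w = (i + 1) / (crossfade + 1)
            out[-crossfade + i] = (1 - w) * out[-crossfade + i] + w * unit[i]
        out.extend(unit[crossfade:])
    return out

db = {('a', 'b'): [1.0, 1.0, 1.0, 1.0], ('b', 'a'): [2.0, 2.0, 2.0, 2.0]}
samples = synthesize(['a', 'b', 'a'], db)
```

The crossfade avoids the audible discontinuity that a plain concatenation of the two unit lists would produce at the join.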
Feugère, Lionel. "Synthèse par règles de la voix chantée contrôlée par le geste et applications musicales". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00926980.
Caussade, Diane. "Troubles du langage verbal et non-verbal dans la maladie d'Alzheimer : Effets d'ateliers en voix chantée". Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAL019/document.
Despite the multimodal character of language, little research has studied the verbal and non-verbal communication abilities of people with Alzheimer's disease, and even less the remediation of those disorders via the singing voice, although such remediation could help slow down the symptomatic progression of language disorders. Given increased life expectancy and the exponential prevalence of neurocognitive disorders from age 65 (whose most frequent cause is Alzheimer's disease, for which no curative treatment currently exists), identifying factors that slow the progression of symptoms is of the utmost importance. In view of these elements, this research focuses on the impact of singing on verbal and non-verbal communication disorders in Alzheimer's disease. To do so, an original protocol was set up, consisting of a repetition task performed in singing or in speech, with or without the presentation of communicative manual gestures. This protocol made it possible to evaluate the multimodal communication abilities of people with Alzheimer's disease and with 'normal' ageing. At the pre-tests, many verbal and non-verbal language disorders were found. From the mild stage of the disease, participants in the patient group produced more linguistic errors (of different types) and more pauses and/or vocalic lengthenings than control group participants. Their ability to repeat manual gestures also seems impacted, as does the quality of their iconic gesture production. From the moderate stage of the disease, participants in the patient group produced more linguistic errors, affecting more types of linguistic units, as well as more spontaneous co-verbal gestures than control group participants. From the severe stage of the disease, participants in the patient group repeated fewer utterances and produced more pauses and/or vocalic lengthenings than control group participants.
An impact of the singing voice was only noted on the utterance repetition rate, which was lower in singing than in speech for all participants and could be caused by a dual-task effect. The comparative results of verbal and non-verbal linguistic abilities showed a positive impact of the singing workshops on the production of linguistic errors and on the repetition of communicative gestures by participants in the patient group. Our results are discussed in the light of the literature in order to distinguish verbal and non-verbal language disorders linked to 'normal' ageing from those symptomatic of Alzheimer's disease. These findings contribute to the current debate on the possible origins of language in its multimodality, and suggest a line of research on the impact of the singing voice on the language disorders of people with Alzheimer's disease.
Cohen-Hadria, Alice. "Estimation de descriptions musicales et sonores par apprentissage profond". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS607.
In Music Information Retrieval (MIR) and voice processing, the use of machine learning tools has become more and more standard in recent years. In particular, many state-of-the-art systems now rely on neural networks. In this thesis, we propose a broad overview of four different MIR and voice processing tasks, using systems built with neural networks. More precisely, we use convolutional neural networks, a class of neural networks designed for images. The first task presented is music structure estimation. For this task, we show how the choice of input representation can be critical when using convolutional neural networks. The second task is singing voice detection. We present how to use a voice detection system to automatically align lyrics and audio tracks. With this alignment mechanism, we have created the largest dataset of synchronized audio and lyrics, called DALI. Singing voice separation is the third task. For this task, we present a data augmentation strategy, a way to significantly increase the size of a training set. Finally, we tackle voice anonymization. We present an anonymization method that both obfuscates the content and masks the speaker identity, while preserving the acoustic scene.
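The data augmentation idea for singing voice separation can be illustrated generically: remixing isolated vocal and accompaniment stems at random gains produces new, perfectly aligned (mixture, target) training pairs. This is a sketch of the general technique under assumed names, not necessarily the exact strategy developed in the thesis:

```python
import random

def augment_mixes(vocals, accomp, n, gain_range=(0.5, 1.5), seed=0):
    """Create n new training mixtures by remixing the two stems at random
    gains; each target stays sample-aligned with its mixture, so the
    effective training set size grows without new recordings."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        gv = rng.uniform(*gain_range)  # vocal gain
        ga = rng.uniform(*gain_range)  # accompaniment gain
        mix = [gv * v + ga * a for v, a in zip(vocals, accomp)]
        target = [gv * v for v in vocals]  # separation target: the vocals only
        pairs.append((mix, target))
    return pairs
```

Since mixing is linear, the ground-truth vocal stem is simply rescaled alongside the mixture, so labels remain exact for every augmented example.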
Vaglio, Andrea. "Leveraging lyrics from audio for MIR". Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAT027.
Lyrics provide a lot of information about music since they encapsulate much of the semantics of songs. Such information could help users navigate easily through a large collection of songs and could support recommending new music to them. However, this information is often unavailable in textual form. To get around this problem, singing voice recognition systems can be used to obtain transcripts directly from the audio. These approaches are generally adapted from speech recognition ones. Speech transcription is a decades-old domain that has lately seen significant advances due to developments in machine learning techniques. When applied to the singing voice, however, these algorithms provide poor results, and lyrics transcription remains difficult for a number of reasons. In this thesis, we investigate several scientifically and industrially challenging Music Information Retrieval problems by utilizing lyrics information generated straight from audio. The emphasis is on making the approaches as relevant as possible in real-world settings. This entails testing them on vast and diverse datasets and investigating their scalability. To do so, a huge publicly available annotated lyrics dataset is used, and several state-of-the-art lyrics recognition algorithms are successfully adapted. We notably present, for the first time, a system that detects explicit content directly from audio. The first research on the creation of a multilingual lyrics-to-audio system is also described. The lyrics-to-audio alignment task is further studied in two experiments quantifying the perception of audio and lyrics synchronization. A novel phonotactic method for language identification is also presented. Finally, we provide the first cover song detection algorithm that makes explicit use of lyrics information extracted from audio.
Henrich, Nathalie. "Etude de la source glottique en voix parlée et chantée : modélisation et estimation, mesures acoustiques et électroglottographiques, perception". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2001. http://tel.archives-ouvertes.fr/tel-00123133.
Lamesch, Sylvain. "Mécanismes laryngés et voyelles en voix chantée : dynamique vocale, phonétogrammes de paramètres glottiques et spectraux, transitions de mécanismes". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2010. http://tel.archives-ouvertes.fr/tel-00488701.
Pełny tekst źródłaCornaz, Sandra. "L'apport de la voix chantée pour l'intégration phonético-phonologique d'une langue étrangère : application auprès d'italophones apprenants de FLE". Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENL019/document.
Specialists in didactics aim to create efficient methods whose teaching and learning content and tools improve phonetic skills in foreign languages. As for the educational content, research has shown that the sounds and phonemes of a foreign language are processed according to the structure of the phonetic and phonological space of the native language. Other works point out that comparing linguistic systems is particularly relevant for predicting the difficulties and facilitations language learners will be confronted with. As for teaching tools, studies have shown the beneficial effects of interdisciplinarity and the pertinent role music plays in cognitive development and learning. Our research objective falls within this scientific context and is two-fold. First, we tried to identify which parameter of singing-voice production, as distinct from the speaking voice, may facilitate the perception of non-native vowels. Secondly, we compared the effects on the production of non-native vowels of two corrective phonetic methods, one of which used the singing voice as a tool. Through the results of these studies, we tried to understand how Italian as a native language interacts with the perception and production of French as a target language. Our studies have shown that vowel pitch and duration do not impact the discrimination of /y/ and /ø/, and that consonant sharpness plays a role in the discrimination of /y/ in a CV-type syllable. We found a positive effect of the method using the singing voice on the production of the sound spectrum of French closed vowels, but not on the evolution of the sounds and phonemes within the acoustic space. Our results support the view that phonetic teaching and learning belongs in the language classroom and suggest that the singing voice may be a useful tool to ease the perception and production of non-native vowels.
The aim of the didactics specialist is to devise an effective method whose teaching and learning content and tools improve phonetic skills in a foreign language. Regarding the pedagogical content, our research has shown that the sounds and phonemes of an unknown language are processed according to the organization of the phonetic and phonological space of the native language. This research highlights the usefulness of comparing different linguistic systems in order to predict the difficulties and facilitations to which learners of a foreign language as a second language (L2) are exposed. As for teaching and learning tools, our research demonstrates the beneficial effects of interdisciplinarity as well as the pertinent role of music in learners' cognitive and educational development. Our research interest is twofold. First, we tried to identify which parameter, inherent to singing-voice production and distinguishing it from spoken production, could facilitate the perception of vowels absent from the native language. We then compared the effects of two phonetic correction methods, one of which exploits the "singing voice" tool, on the production of the French vowels /y ø/, which are absent from the vowel system of native Italian speakers. The results of these studies help identify the impact of native Italian on the production and perception of French as the language being learned. Our work showed no effect of the pitch and duration of the vowels /y/ and /ø/ on their discrimination, but suggests a role of the pre-vocalic context in the perception of the vowel /y/ in contrast with /u/.
We found a favourable effect of the phonetic correction method including the singing voice on the production of the sound spectrum of French closed vowels, but not on the evolution of the phonological categories within the acoustic space. The results of these studies support the theory that phonetic teaching and learning fully belongs in the language classroom, and suggest that the singing voice could be, under certain conditions, a tool that facilitates the perception and production of vowels absent from the native language.
Estienne, Nathalie. "La voix chantée des enseignants d’éducation musicale dans l’enseignement secondaire en France : entre modèles esthétiques et profils éducatifs". Thesis, Sorbonne université, 2018. http://www.theses.fr/2018SORUL023.
The subject of the study is the singing activity of teachers in French music education. The study poses the following question: how should the teacher sing? Three major strands inform the answer. Book One considers the literature on the teacher's singing voice. This voice is first of all a legitimate vocal model; but this in itself is problematic in the context of mass education, so the voice strives to take on a vocality better suited to the teaching environment. The conclusion of Book One is that aesthetic choices prevail in everything relating to vocality in music teaching. Book Two tries to explain the reasons for the paradoxes inherent in the teacher's singing. It analyses the socio-historical and didactic factors that make music education such a particular activity, grounded both in traditional representations of singing and in educational concepts. Book Three shows just how present aesthetic choices are, and how teachers adapt them to their teaching purposes. The stylistic features of the various repertoires used are studied and the sound of music teachers' voices is analysed. This enables us to show that the way a teacher sings plays a vital role in their teaching. As a result of this research we have drawn up a conceptual proposal for 'educational vocal profiles'.
Jaumard-Hakoun, Aurore. "Modélisation et synthèse de voix chantée à partir de descripteurs visuels extraits d'images échographiques et optiques des articulateurs". Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066223/document.
This thesis reports newly developed methods for extracting relevant features from articulator images in rare singing styles: traditional Corsican and Sardinian polyphonies, Byzantine music, as well as Human Beat Box. We collected data and modeled it using machine learning, specifically novel deep learning methods. We first modelled tongue ultrasound image sequences, which carry relevant articulatory information that would otherwise be difficult to interpret without specialized skills in ultrasound imaging. We developed methods to automatically extract the superior contour of the tongue displayed in ultrasound images. Our tongue contour extraction results are comparable with those obtained in the literature, which could lead to applications in singing pedagogy. Afterwards, we predicted the evolution of the vocal tract filter parameters from sequences of tongue and lip images, first on isolated vowel databases and then on traditional Corsican singing. Applying the predicted filter parameters, combined with a vocal source acoustic model exploiting electroglottographic recordings, allowed us to synthesize singing voice excerpts from articulatory images (of tongue and lips) and glottal activity, with results superior to those obtained with existing techniques reported in the literature.
Etienne, Caroline. "Apprentissage profond appliqué à la reconnaissance des émotions dans la voix". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS517.
This thesis deals with the application of artificial intelligence to the automatic classification of audio sequences according to the emotional state of the customer during a commercial phone call. The goal is to improve on existing data preprocessing and machine learning models, and to suggest a model that is as efficient as possible on the reference IEMOCAP audio dataset. We draw from previous work on deep neural networks for automatic speech recognition, and extend it to the speech emotion recognition task. We are therefore interested in End-to-End neural architectures to perform the classification task including an autonomous extraction of acoustic features from the audio signal. Traditionally, the audio signal is preprocessed using paralinguistic features, as part of an expert approach. We choose a naive approach for data preprocessing that does not rely on specialized paralinguistic knowledge, and compare it with the expert approach. In this approach, the raw audio signal is transformed into a time-frequency spectrogram by using a short-term Fourier transform. In order to apply a neural network to a prediction task, a number of aspects need to be considered. On the one hand, the best possible hyperparameters must be identified. On the other hand, biases present in the database should be minimized (non-discrimination), for example by adding data and taking into account the characteristics of the chosen dataset. We study these aspects in order to develop an End-to-End neural architecture that combines convolutional layers specialized in the modeling of visual information with recurrent layers specialized in the modeling of temporal information. We propose a deep supervised learning model, competitive with the current state-of-the-art when trained on the IEMOCAP dataset, justifying its use for the rest of the experiments.
This classification model consists of a four-layer convolutional neural network and a bidirectional long short-term memory recurrent neural network (BLSTM). Our model is evaluated on two English audio databases proposed by the scientific community: IEMOCAP and MSP-IMPROV. A first contribution is to show that, with a deep neural network, we obtain high performance on IEMOCAP and promising results on MSP-IMPROV. Another contribution of this thesis is a comparative study of the output values of the layers of the convolutional module and of the recurrent module, according to the data preprocessing method used: spectrograms (naive approach) or paralinguistic indices (expert approach). We analyze the data according to their emotion class using the Euclidean distance, a deterministic proximity measure, and try to understand the characteristics of the emotional information extracted autonomously by the network. The idea is to contribute to research focused on the understanding of deep neural networks used in speech emotion recognition and to bring more transparency and explainability to these systems, whose decision-making mechanism is still poorly understood.
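The "naive" preprocessing step described above (a short-term Fourier transform producing a time-frequency spectrogram) can be sketched in pure Python. This toy version uses an illustrative window and hop size, no analysis window and a direct DFT rather than an FFT library; it only shows the structure of the representation fed to the convolutional layers:

```python
import cmath

def spectrogram(signal, win=8, hop=4):
    """Magnitude spectrogram via a short-term DFT: cut the signal into
    overlapping frames, map each frame to the magnitudes of its
    non-negative frequency bins, and return a time x frequency matrix."""
    frames = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win]
        mags = []
        for k in range(win // 2 + 1):  # non-negative frequency bins only
            z = sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / win)
                    for n in range(win))
            mags.append(abs(z))
        frames.append(mags)
    return frames
```

A pure tone whose frequency falls exactly on bin k produces a single peak in column k of every frame, which is what makes the representation readable as an "image" by convolutional layers.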
De, Lepine Philippe. "Détection automatique de la voix criée en vue d'un système d'alarme". Nancy 1, 1992. http://www.theses.fr/1992NAN10410.
Mayorga, Ortiz Pedro. "Reconnaissance vocale dans un contexte de voix sur IP : diagnostic et propositions". Grenoble INPG, 2005. http://www.theses.fr/2005INPG0014.
The purpose of this thesis is to diagnose the new challenges that voice over IP raises for speech recognition, and to propose solutions that improve the performance of automatic recognition systems. The first contribution of our work was to diagnose as precisely as possible the problems caused by compression and packet losses for two different recognition tasks: automatic speech recognition and automatic speaker recognition. From this diagnosis, we noted that compression degrades the speaker verification task more, whereas packet losses cause the most important degradation for automatic speech recognition. The second contribution of this thesis is a set of recovery techniques that improve the robustness of systems under significant packet-loss conditions, applied at both the transmitter and the receiver side. The experimental results show that transmitter-side interleaving combined with receiver-side interpolation proves the most efficient. In addition, our experiments confirm the advantages of a "distributed" architecture, where acoustic vectors travel over the network from the client to the recognition server (the "distributed speech recognition" concept proposed by the international standards body ETSI), compared with a more traditional "server-only" architecture where the signal (or its compressed version) travels from the client terminal over the network to the recognition server.
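The combination of transmitter-side interleaving and receiver-side interpolation can be sketched on toy scalar "frames". This is a generic block interleaver with a linear neighbour interpolation, illustrative of the principle only; the thesis's actual schemes operate on acoustic feature vectors and may differ in detail:

```python
def interleave(frames, depth):
    """Block interleaver: consecutive frames are spread apart in the
    transmitted stream, so a burst loss hits non-adjacent frames."""
    return [frames[i] for d in range(depth) for i in range(d, len(frames), depth)]

def deinterleave(frames, depth):
    """Restore the original frame order (lost frames stay None)."""
    out = [None] * len(frames)
    order = [i for d in range(depth) for i in range(d, len(frames), depth)]
    for pos, i in enumerate(order):
        out[i] = frames[pos]
    return out

def interpolate_losses(frames):
    """Replace each lost frame (None) by the mean of its nearest
    surviving neighbours, a simple receiver-side interpolation."""
    out = list(frames)
    for i, f in enumerate(out):
        if f is None:
            prev = next((out[j] for j in range(i - 1, -1, -1)
                         if out[j] is not None), None)
            nxt = next((out[j] for j in range(i + 1, len(out))
                        if out[j] is not None), None)
            neigh = [x for x in (prev, nxt) if x is not None]
            out[i] = sum(neigh) / len(neigh)
    return out
```

With depth 4, a burst of two consecutive lost packets becomes two isolated losses after deinterleaving, each flanked by intact neighbours, which is exactly what makes the interpolation effective.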
Pouchoulin, Gilles. "Approche statistique pour l'analyse objective et la caractérisation de la voix dysphonique". Avignon, 2008. http://www.theses.fr/2008AVIG0162.
Still today, assessing the quality of pathological voices and the reasons for their deterioration is the main clinical concern of the medical profession. Faced with the limits of auditory judgment of vocal dysfunction, voice therapists strongly express the need for an objective method of assessing pathological voice quality, complementary to perceptual analysis. In this context, this thesis is interested in adapting techniques from the Automatic Speaker Recognition domain to the task of classifying dysphonic voices according to the grade of the GRBAS scale. Its objective is to acquire a better understanding of dysphonia by using an automatic classification system as a tool for characterizing the associated acoustic phenomena in the speech signal, in order to provide experts with novel knowledge on voice degradation. Three research axes are proposed: (1) a comparison between different parametric representations of the speech signal (spectral, cepstral, predictive) showed the interest of spectral analysis in this experimental context, as well as the relevance of dynamic information; (2) a study of how the acoustic features related to dysphonia are spread over the frequency domain outlined the relevance of the [0-3000] Hz frequency band; (3) a phonetic study whose main observation highlights the relevance of the consonant class (notably unvoiced consonants), rather unexpected given the type of pathology studied. This study allowed the automatic system to fulfill its role as a tool characterizing pathological phenomena and putting them in evidence (for example the VOT) for more extensive phonetic and clinical expertise.
Ouellet, Simon. "Reconnaissance biométrique de personne utilisant le visage, la voix et la métrologie humaine pour robots mobiles". Mémoire, Université de Sherbrooke, 2016. http://hdl.handle.net/11143/8186.
Kerkeni, Leila. "Analyse acoustique de la voix pour la détection des émotions du locuteur". Thesis, Le Mans, 2020. http://www.theses.fr/2020LEMA1003.
The aim of this thesis is to propose a speech emotion recognition (SER) system for application in the classroom. This system has been built using novel features based on the amplitude and frequency modulation (AM-FM) model of the speech signal, which relies on the joint use of empirical mode decomposition (EMD) and the Teager-Kaiser energy operator (TKEO). In this system, the discrete (or categorical) emotion theory was chosen to represent the six basic emotions (sadness, anger, joy, disgust, fear and surprise) and a neutral state. Automatic recognition was optimized by finding the best combination of features, selecting the most relevant ones and comparing different classification approaches. Two reference emotional speech databases, in German and Spanish, were used to train and evaluate the system. A new database in French, more appropriate for the educational context, was built, tested and validated.
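The Teager-Kaiser energy operator mentioned above has a compact discrete form, psi[n] = x[n]^2 - x[n-1]*x[n+1], which for a pure tone A*cos(w*n) evaluates exactly to A^2*sin(w)^2, jointly tracking amplitude and frequency modulations. A minimal sketch follows; in the AM-FM pipeline it would typically be applied to each intrinsic mode function produced by EMD, and the decomposition itself is omitted here:

```python
def tkeo(x):
    """Discrete Teager-Kaiser energy operator:
    psi[n] = x[n]^2 - x[n-1] * x[n+1]
    (defined for the interior samples 1 .. len(x)-2)."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range((1), len(x) - 1)]
```

Because the output is constant for a stationary tone and varies when either amplitude or frequency changes, it makes a cheap frame-level feature for modulation-based emotion descriptors.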
Luherne, Viviane. "Reconnaissance des visages et des voix émotionnels dans une population adulte avec gliome et après accident vasculaire cérébral". Thesis, Paris 8, 2015. http://www.theses.fr/2015PA080053/document.
The emotional domain was long ignored, but today clinical neuropsychology acknowledges its overlap with the cognitive domain and its importance in the follow-up of brain-damaged patients, for whom difficulties in emotion recognition reduce the quality of interpersonal interactions and social cognition. The present thesis focuses on the recognition of five basic emotions (happiness, fear, anger, sadness, disgust) and of a neutral expression in two groups of patients: with low-grade gliomas and post-stroke. The experimental protocol, which requires non-verbal visual and auditory processing, also includes a crossmodal condition. Three case studies of patients with gliomas allow us to refine our understanding of their emotional functioning. Our results show moderate visual and auditory difficulties in emotion recognition for both groups, with milder deficits in the glioma group than in the post-stroke group. These results confirm the relevance of a hodotopical view of the brain for emotional processes, as in other cognitive domains. However, the behavioral benefit of crossmodal presentation observed in both groups is not sufficient to restore normal performance, which is likely to impact daily life. We highlight the necessity of evaluating emotion recognition as well as emotional experience in brain-damaged patients, in particular when they suffer from slowly infiltrating tumours.
Padellini, Marc. "Optimisation d'un schéma de codage de la parole à très bas débit, par indexation d'unités de taille variable". Marne-la-Vallée, 2006. http://www.theses.fr/2006MARN0293.
This thesis studies a speech coding scheme operating at a very low bit rate, around 500 bits/s, relying on speech recognition and speech synthesis techniques. It follows the work carried out in the RNRT project SYMPATEX and in Cernocky's thesis [1]. On the one hand, elementary speech units are recognized by the coder using hidden Markov models. On the other hand, concatenative speech synthesis is used in the decoder. The system takes advantage of a large speech corpus stored in the system and organized in a synthesis database. The encoder looks up in the corpus the units that best fit the speech to be encoded, then unit indexes and prosodic parameters are transmitted. The decoder retrieves from the database the units to be concatenated. This thesis addresses the overall speech quality of the coding scheme, for which a dynamic unit selection is proposed. Furthermore, the scheme has been extended to operate under realistic conditions. Noisy environments have been studied, and a noise adaptation module was created. Extension to speaker-independent mode is achieved by training the system on a large number of speakers and using a hierarchical classification of speakers to create a set of synthesis databases close to the test speaker. Finally, the complexity of the whole scheme is analyzed, and a method to compress the database is proposed.
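The index-transmission principle behind such very-low-bit-rate coders can be sketched with a toy scalar codec: the encoder sends only nearest-unit indexes, so the bit rate is set by the corpus size and frame rate, not by the waveform. This is an illustration of the idea under hypothetical names, not the thesis's HMM-based recognizer, and it ignores the prosodic side information a real scheme also transmits:

```python
import math

def encode(frames, corpus):
    """Encode each frame as the index of its nearest corpus unit (1-NN
    over toy scalar frames; a real coder matches acoustic vectors)."""
    return [min(range(len(corpus)), key=lambda i: abs(corpus[i] - f))
            for f in frames]

def decode(indexes, corpus):
    """The decoder simply looks up and concatenates the indexed units."""
    return [corpus[i] for i in indexes]

def bit_rate(n_units, frames_per_second):
    """Index bit rate in bits/s: bits per index times frame rate."""
    return frames_per_second * math.ceil(math.log2(n_units))
```

For instance, a corpus of 1024 units indexed 50 times per second costs 10 bits per index, i.e. 500 bits/s, which matches the order of magnitude targeted by the thesis.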
Tahon, Marie. "Analyse acoustique de la voix émotionnelle de locuteurs lors d'une interaction humain-robot". Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00780341.
Pełny tekst źródła
Ardaillon, Luc. "Synthesis and expressive transformation of singing voice". Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066511/document.
Pełny tekst źródła
This thesis conducted research on the synthesis and expressive transformation of the singing voice, towards the development of a high-quality synthesizer that can automatically generate a natural and expressive singing voice from a given score and lyrics. Three main research directions can be identified: methods for modelling the voice signal to automatically generate an intelligible and natural-sounding voice according to the given lyrics; control of the synthesis to render an adequate interpretation of a given score while conveying expressivity related to a specific singing style; and transformation of the voice signal to improve its naturalness and add expressivity by varying the timbre adequately according to pitch, intensity and voice quality. This thesis provides contributions in each of these three directions. First, a fully functional synthesis system has been developed, based on diphone concatenation. The modular architecture of this system makes it possible to integrate and compare different signal modeling approaches. Then the question of control is addressed, encompassing the automatic generation of f0, intensity, and phoneme durations. The modeling of specific singing styles has also been addressed by learning the expressive variations of the modeled control parameters from commercial recordings of famous French singers. Finally, expressive timbre transformations have been investigated for future integration into our synthesizer, mainly methods related to intensity transformation, considering the effects of both the glottal source and the vocal tract, and the modeling of vocal roughness.
Tahon, Marie. "Analyse acoustique de la voix émotionnelle de locuteurs lors d’une interaction humain-robot". Thesis, Paris 11, 2012. http://www.theses.fr/2012PA112275/document.
Pełny tekst źródła
This thesis deals with emotional voices during human-robot interaction. In a natural interaction, we define at least four kinds of variability: the environment (room, microphone); the speaker, including physical characteristics (gender, age, voice type) and personality; the emotional state; and the kind of interaction (game scenario, emergency, everyday life). From audio signals collected in different conditions, we tried to characterize, using acoustic features, both the speaker and his or her emotional state while taking these variabilities into account. Finding which features are essential and which are to be avoided is a hard challenge, because it requires working across a high number of variabilities and therefore having rich and diverse data at our disposal. The main results concern the collection and annotation of natural emotional corpora recorded with different kinds of speakers (children, adults, elderly people) in various environments, and the reliability of acoustic features across the four variabilities. This analysis led to two interesting outcomes: the audio characterisation of a corpus and the drawing up of a black list of features that vary a lot. Emotions are just one part of the paralinguistic information carried by the audio channel; other paralinguistic features have been studied, such as personality and stress in the voice. We have also built automatic emotion recognition and speaker characterisation modules that we tested during realistic interactions. An ethical discussion of our work is also provided.
Aman, Frédéric. "Reconnaissance automatique de la parole de personnes âgées pour les services d'assistance à domicile". Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM095/document.
Pełny tekst źródła
In the context of an aging population, the aim of this thesis is to include in the living environment of elderly people an automatic speech recognition (ASR) system that can recognize calls to alert the emergency services. The acoustic models of ASR systems are mostly trained on non-elderly speech, delivered in a neutral way, and read. However, in our context we are far from these ideal conditions (aged and expressive voices), so our system must be adapted to the task. For this work, we recorded corpora of elderly voices and distress calls. From these corpora, a study of the differences between young and aged voices, and between neutral and emotional speech, allowed us to develop an ASR system adapted to the task. This system was then evaluated on data recorded during an experiment in a realistic situation, including falls played by volunteers.
Perrotin, Olivier. "Chanter avec les mains : interfaces chironomiques pour les instruments de musique numériques". Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112207/document.
Pełny tekst źródła
This thesis deals with the real-time control of singing voice synthesis by a graphic tablet, based on the digital musical instrument Cantor Digitalis. The relevance of the graphic tablet for intonation control is first considered, showing that the tablet provides more precise pitch control than the real voice in experimental conditions. To extend this accuracy of control to any situation, a dynamic pitch-warping method for intonation correction is developed. It makes it possible to play below the pitch perception limens while preserving the musician's expressivity. Objective and perceptual evaluations validate the method's efficiency. The use of new interfaces for musical expression raises the question of the modalities involved in playing the instrument. A third study reveals a preponderance of the visual modality over auditory perception for intonation control, due to the introduction of visual cues on the tablet surface. Nevertheless, this is compensated by the expressivity allowed by the interface. The writing or drawing ability acquired since early childhood enables quick acquisition of expert control of the instrument. An ensemble of gestures dedicated to the control of different vocal effects is suggested. Finally, intensive practice of the instrument is carried out through the Chorus Digitalis ensemble, to test and promote our work. Artistic research has been conducted for the choice of the Cantor Digitalis musical repertoire. Moreover, a visual feedback display dedicated to the audience has been developed, extending the perception of the players' pitch and articulation.
Péron, Julie. "Rôle du noyau sous-thalamique et de ses connexions cortico-sous-corticales dans la reconnaissance des émotions communiquées par le visage et par la voix". Rennes 1, 2008. http://www.theses.fr/2008REN1B118.
Pełny tekst źródła
Tranquart, Nicolas. "Plateforme de test de qualité vocale et étude de faisabilité de tests subjectifs en dialogue contrôlé avec sujet virtuel". Versailles-St Quentin en Yvelines, 2011. http://www.theses.fr/2011VERS0053.
Pełny tekst źródła
Generally speaking, the number of speech quality assessment methods, including objective methods, has increased significantly over the past decade with the appearance of 2nd- and 3rd-generation mobile networks and Voice over IP. However, given the limitations of current objective methods, the interest of subjective tests is not in question, especially since their methodologies are now relatively mature and accurate. Nevertheless, their implementation remains expensive, as it is hugely time- and cost-consuming, and may not discriminate sufficiently between different test conditions. In this context, to meet the needs of industry, this thesis focuses on two axes. First, developing a test platform for objective and subjective voice quality that meets the following requirements: 1) implementation of subjective listening, conversational, and double-talk tests in accordance with the corresponding ITU-T recommendations; 2) facilitating the reproducibility of tests, by controlling and repeating the test conditions; 3) reducing the time needed for a test campaign without changing the methodology; 4) increasing the discrimination between the different test cases; 5) allowing the simultaneous running of subjective and objective tests, to better benchmark and review new implementations of objective intrusive methods. Second, studying the feasibility of subjective tests in controlled dialogue with a virtual subject, to offer better flexibility, since more than one person is otherwise needed for conversational subjective tests.
Ambert-Dahan, Emmanuèle. "Perception des émotions non verbales dans la musique, les voix et les visages chez les adultes implantés cochléaires présentant une surdité évolutive". Thesis, Lille 3, 2014. http://www.theses.fr/2014LIL30027/document.
Pełny tekst źródła
While cochlear implantation is quite successful in restoring speech comprehension in quiet environments, other auditory tasks, such as communication in noisy environments or music perception, remain very challenging for cochlear implant (CI) users. Communication involves multimodal perception, since information is transmitted by vocal and facial expressions that are crucial for interpreting a speaker's emotional state. Indeed, very few studies have examined the perception of non-verbal emotions in cases of progressive sensorineural hearing loss in adults. The aim of this thesis was to test the influence of rehabilitation by CI after acquired deafness on the emotional judgment of musical excerpts and non-verbal voices. We also examined the influence of acquired post-lingual progressive deafness on the emotional judgment of faces. For this purpose, we conducted four experimental studies in which the performance of deaf and cochlear-implanted subjects was compared to that of normal-hearing (NH) controls. To assess emotional judgment in music, voices and faces, we used a task consisting of identifying emotional categories (happiness, fear, anger, peacefulness for music, and neutral) and of dimensional judgments of valence and arousal. The first two studies evaluated emotional perception in the auditory modality, successively examining the recognition of emotions in music and in voices. The two following studies focused on emotion recognition in the visual modality, particularly emotional facial expressions before and after cochlear implantation. The results of these studies revealed greater deficits in emotion recognition in the musical and vocal domains than in the visual domain, as well as disturbed arousal judgments, stimuli being perceived as less arousing by CI patients than by NH subjects. Yet recognition of emotions in music and voices, although limited, was performed above chance level, demonstrating the benefits of CI for auditory emotion processing.
Conversely, valence judgments were not impaired in the musical, vocal or facial emotion tests. Surprisingly, the results of these studies suggest that, at least for a subgroup of patients, recognition of facial emotions is affected by acquired deafness, indicating consequences of progressive hearing loss on the processing of emotions presented in another modality. Thus, it seems that progressive deafness, as well as the limited spectral cues transmitted by the cochlear implant, might foster verbal communication to the detriment of non-verbal emotional communication.
Ouni, Slim. "Modélisation de l'espace articulatoire par un codebook hypercubique pour l'inversion acoustico-articulatoire". Nancy 1, 2001. http://www.theses.fr/2001NAN10210.
Pełny tekst źródła
Aglieri, Virginia. "Behavioural and neural inter-individual variability in voice perception processes". Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0176/document.
Pełny tekst źródła
In humans, the voice conveys heterogeneous information, such as the speaker's identity, which can be extracted automatically even when language content and emotional state vary. We hypothesized that the ability to recognize a speaker varies considerably across the population, as previously observed for face recognition. To test this hypothesis, a short voice recognition test was delivered to 1120 subjects in order to observe how voice recognition abilities are distributed in the general population. Since considerable inter-individual variability has previously been observed in voice-elicited activity in the temporal voice areas (TVAs), regions along the superior temporal sulcus/gyrus (STS/STG) that show preferential activation for voices over other sounds, the second aim of this work was to better characterize the link between the behavioral and neural mechanisms underlying inter-individual variability in voice recognition through functional magnetic resonance imaging (fMRI). The results of a first fMRI study showed that functional connectivity between frontal and temporal voice-sensitive regions increased with the scores obtained on a voice recognition test. Another fMRI study showed that speaker identity is processed in an extended network of regions, including the TVAs but also frontal regions, and that voice/non-voice classification accuracy in the right STS increased with speaker identification abilities. Altogether, these results suggest that voice recognition abilities vary considerably across subjects and that this variability is mirrored by different neural profiles within the voice perception network.
Sliwa, Julia. "Représentation des individus par le macaque Rhésus : approche neurophysiologique et comportementale". Phd thesis, Université Claude Bernard - Lyon I, 2012. http://tel.archives-ouvertes.fr/tel-00979701.
Pełny tekst źródła
Jamet, Éric. "L'organisation des étapes de traitement de l'information pendant la dénomination : apports de l'amorçage phonologique". Rennes 2, 1995. http://www.theses.fr/1995REN20015.
Pełny tekst źródła
Picture naming requires more time than word naming. A number of independent lines of research have suggested that semantic and phonological information become available differentially for pictures and words. We report here a series of experiments using semantic and phonological priming. The data show that, for pictures, phonological coding is accessible only after semantic processing, but that these two codes are computed partially in parallel.
Ajili, Moez. "Reliability of voice comparison for forensic applications". Thesis, Avignon, 2017. http://www.theses.fr/2017AVIG0223/document.
Pełny tekst źródła
It is common to see voice recordings presented as forensic traces in court. Generally, a forensic expert is asked to analyse both the suspect's and the criminal's voice samples in order to indicate whether the evidence supports the prosecution (same-speaker) or defence (different-speakers) hypothesis. This process is known as Forensic Voice Comparison (FVC). Since the emergence of the DNA typing model, the likelihood-ratio (LR) framework has become the new "golden standard" in forensic sciences. The LR not only supports one of the hypotheses but also quantifies the strength of its support. However, the LR has some practical limitations due to its estimation process itself. This is particularly true when Automatic Speaker Recognition (ASpR) systems are considered, as they output a score in all situations regardless of the case-specific conditions. Indeed, several factors are not taken into account by the estimation process, such as the quality and quantity of information in the two voice recordings, their phonological content, or the speakers' intrinsic characteristics. All these factors call into question the validity and reliability of FVC. In this thesis, we address these issues. First, we analyse how the phonetic content of a pair of voice recordings affects FVC accuracy. We show that oral vowels, nasal vowels and nasal consonants carry more speaker-specific information than the average phonemic content. In contrast, plosives, liquids and fricatives do not have a significant impact on LR accuracy. This investigation demonstrates the importance of the phonemic content and highlights interesting differences between inter-speaker and intra-speaker effects. A further study examines the individual speaker-specific information of each vowel based on formant parameters, without any use of an ASpR system.
This study revealed interesting differences between vowels in terms of the quantity of speaker information they carry. The results clearly show the importance of intra-speaker variability effects on the estimation of FVC reliability. Second, we investigate an approach to predict LR reliability based only on the pair of voice recordings. We define a homogeneity criterion (NHM) able to measure the presence of relevant information and the homogeneity of this information between the pair of voice recordings. We expected the lowest homogeneity values to be correlated with the lowest LR accuracy measures, and the opposite for high values. The results showed the interest of the homogeneity measure for FVC reliability. Our studies also revealed large behavioural differences between genuine and impostor FVC trials, confirming the importance of intra-speaker variability effects in the estimation of FVC reliability. The main takeaway of this thesis is that averaging a system's behaviour over a high number of factors (speaker, duration, content...) potentially hides many important details. For a better understanding of an FVC approach and/or an ASpR system, it is mandatory to explore the system's behaviour at as detailed a scale as possible (the devil lies in the details).
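The likelihood-ratio framework at the heart of this abstract can be illustrated with a minimal numeric sketch. The Gaussian score models and all parameter values below are hypothetical, chosen only to show how an ASpR comparison score is turned into an LR; they are not the thesis's system:

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used as a toy score model."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def likelihood_ratio(score, ss_mean=2.0, ss_std=1.0, ds_mean=-2.0, ds_std=1.0):
    """LR = p(score | same speaker) / p(score | different speakers).

    ss_* and ds_* are hypothetical parameters of the same-speaker and
    different-speakers score distributions.
    """
    return gaussian_pdf(score, ss_mean, ss_std) / gaussian_pdf(score, ds_mean, ds_std)

lr = likelihood_ratio(1.5)   # a score closer to the same-speaker model
print(lr > 1)                # → True: the evidence supports the same-speaker hypothesis
```

An LR above 1 supports the prosecution (same-speaker) hypothesis and an LR below 1 the defence hypothesis, with the magnitude quantifying the strength of that support; a score exactly between the two symmetric models yields LR = 1 (no support either way).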
Cheippe, Emmanuelle. "La voie musicale pour remédier aux difficultés de prononciation des voyelles de l'allemand dans des textes lus : expérimentation dans une classe bilingue : analyse acoustique". Phd thesis, Université de Strasbourg, 2012. http://tel.archives-ouvertes.fr/tel-00781335.
Pełny tekst źródła
Debladis, Jimmy. "Traitement des signaux de communication dans le syndrome de Prader-Willi : aspects descriptifs, analytiques et évolutifs". Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30036.
Pełny tekst źródła
Prader-Willi syndrome (PWS) is a rare genetic syndrome affecting around 1 in 20,000 births in France. The two most frequent genetic origins are either a deletion in the 15q11q12 region of the paternal chromosome 15 or maternal uniparental disomy. This syndrome is easily identified through the hypotonia and feeding difficulties observed at birth, and is later marked by hyperphagia, a constant sensation of hunger, and behavioural difficulties that appear over time. From a social point of view, these patients present atypical social interactions, similar to those reported in autism spectrum disorder (ASD). In PWS, very little research has addressed the behavioural and social interaction difficulties observed. Previous research has shown that these patients have deficits in recognizing emotions, as well as atypical cortical signatures in response to faces. Nonetheless, an unexplored gap remains regarding how social signals are processed and analyzed. This thesis brings new data on potentially altered vocal and facial processing in PWS. We developed a complete battery of behavioural tests aiming to study how voices and faces are processed. We demonstrated that patients with PWS have slower motor and perceptive skills. Furthermore, we identified a facial processing deficit that is not present for voices. We suggest that the facial processing deficits observed could originate from a global perception deficit and from the unification of several sources of information, thereby relating to central coherence. Finally, we showed that patients with maternal disomy suffered from more severe social interaction difficulties than patients presenting a deletion. Additionally, a therapeutic axis will be developed with the administration of oxytocin in children and adults with PWS; oxytocin has, over the past few years, gained renewed interest for individuals with social interaction deficits.
This therapeutic axis will allow us to study the long-term effects of oxytocin in children and the potential benefits of treatment on social and feeding behaviours.
Le, Moine Veillon Clément. "Neural Conversion of Social Attitudes in Speech Signals". Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS034.pdf.
Pełny tekst źródła
As social animals, humans communicate with each other by transmitting various types of information about the world and about themselves. At the heart of this process, the voice allows the transmission of linguistic messages denoting a strict meaning that can be decoded by the interlocutor. By conveying other information, such as attitudes or emotions that connote the strict meaning, the voice enriches and enhances the communication process. In recent decades, the digital world has become an important part of our lives. In many everyday situations, we are moving away from keyboards, mice and even touch screens towards interactions with voice assistants or virtual agents that enable human-like communication with machines. With the emergence of a hybrid world where physical and virtual reality coexist, it becomes crucial to enable machines to capture, interpret and replicate the emotions and attitudes conveyed by the human voice. This research focuses on social attitudes in speech, which can be defined, in a context of interaction, as speech dispositions towards others, and aims to develop algorithms for their conversion. Fulfilling this objective requires data, i.e. a collection of audio recordings of utterances conveying various vocal attitudes. This research is thus built on this initial step of gathering raw material: a dataset dedicated to social attitudes in speech. Designing such algorithms requires a thorough understanding of what these attitudes are, both in terms of production (how do individuals use their vocal apparatus to produce attitudes?) and perception (how do they decode those attitudes in speech?). We therefore conducted two studies: a first uncovering the production strategies of speech attitudes, and a second, based on a Best-Worst Scaling (BWS) experiment, mainly revealing biases involved in the perception of such vocal attitudes, thus providing a twofold account of how speech attitudes are communicated by French speakers.
These findings informed the choice of speech signal representation as well as the architectural and optimisation choices for the design of a speech attitude conversion algorithm. In order to extend the knowledge on the perception of vocal attitudes gathered during this second study to the whole database, we developed a BWS-Net allowing the detection of mis-communicated attitudes, thus providing clean data for conversion learning. To learn to convert vocal attitudes, we adopted a transformer-based approach in a many-to-many conversion paradigm, with the mel-spectrogram as the speech signal representation. Since early experiments revealed a loss of intelligibility in the converted utterances, we proposed a linguistic conditioning of the conversion algorithm through the incorporation of a speech-to-text module. Both objective and subjective measures have shown that the resulting algorithm achieves better performance than the baseline transformer, in terms of both intelligibility and the attitude conveyed.
Regnier, Lise. "Localization, Characterization and Recognition of Singing Voices". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00687475.
Pełny tekst źródła
Zoghlami, Naouel. "Processus ascendants et descendants en compréhension de l'oral en langue étrangère - Problèmes et retombées didactiques pour la compréhension de l'anglais". Electronic Thesis or Diss., Paris 8, 2015. http://www.theses.fr/2015PA080041.
Pełny tekst źródła
This thesis focuses on the complex relationship between bottom-up and top-down processes in L2 speech comprehension, i.e. between the use of the signal and linguistic input on the one hand, and the integration of various types of knowledge (linguistic, discourse, pragmatic, general) on the other. Despite a large body of research on the cognitive processes underlying listening in psycholinguistics and in foreign language (L2) acquisition and teaching (e.g., Cutler & Clifton, 1999; Field, 2008a; Rost, 2002; Brown, 1990), there are still gaps in our understanding of these processes and of the impact certain factors have on listening comprehension. Assuming that L1 and L2 listening follow the same cognitive architecture, we first review recent psycholinguistic models of L1 listening. We also examine the main factors constraining L2 listening comprehension. As our summary of the few SLA studies that have investigated the role of bottom-up information and the strategic behavior of L2 listeners points to the important contribution of metacognition, we clarify the terminological fuzziness characterizing this concept and propose a model of metacognition in real-world unidirectional L2 listening. We then present the results of a study we conducted to investigate the exact contribution of these different factors to L2 listening. The participants in this study were French and Tunisian EFL teachers (n=23) and learners (n=226). Using mixed quantitative (different tests and questionnaires) and qualitative (protocol analysis and gating experiments; Ericsson & Simon, 1993; Grosjean, 1980) methods, our aim was to investigate: 1) the factors perceived by learners and teachers as influencing L2 listening; 2) the relative contribution of linguistic knowledge, auditory discrimination, spoken word recognition (SWR), and meta-comprehension knowledge to successful L2 listening; 3) on-line listening problems and strategy use.
For all of these parameters, we looked more closely at different levels of listening proficiency (various analyses of the performance of skilled and unskilled L2 listeners), as well as at the possible influence of the two L1s (French and Tunisian Arabic) involved in the study. Our analyses show that: 1) there is a general discrepancy between what is perceived as making L2 listening difficult and what really renders it problematic; 2) SWR and vocabulary knowledge contribute significantly to the variance in L2 listening, with SWR being the stronger predictor; 3) listening problems encountered on-line are mainly lower-level (segmentation) and, although strategies contribute to speech comprehension, they are not discriminatory. What characterizes a proficient L2 listener seems to be accurate formal processing, not strategic processing of oral input. The findings are discussed from theoretical and pedagogical perspectives. Keywords: listening comprehension, French and Tunisian learners of L2 English, bottom-up and top-down processes, formal processing, integration and situation model, attentional resources, gating, protocol analysis, comparative analysis.
Mermet, Michel. "Informatique et maîtrise de l'oral en maternelle bilingue breton-français : modèle de l'élève dans le dialogue enfant-ordinateur et ergonomie de la parole". Phd thesis, Université Rennes 2, 2006. http://tel.archives-ouvertes.fr/tel-00199337.
Pełny tekst źródła
Rosa, Christine. "Spécialisation hémisphérique de la reconnaissance de sa propre voix". Thèse, 2008. http://hdl.handle.net/1866/6367.
Pełny tekst źródła
Le Rolle, Katia. "La dramaturgie vocale : approche herméneutique des qualités expressives de la voix chantée. Le cas du metal symphonique". Thèse, 2018. http://hdl.handle.net/1866/21472.
Pełny tekst źródła