Theses on the topic "Synthèse de parole à partir du texte"
Consult the top 26 theses for your research on the topic "Synthèse de parole à partir du texte".
Pouget, Maël. "Synthèse incrémentale de la parole à partir du texte". Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAT008/document.
In this thesis, we investigate a new paradigm for text-to-speech synthesis (TTS) that delivers synthetic speech while the text is being entered: incremental text-to-speech synthesis. Unlike conventional TTS systems, which trigger synthesis only after a whole sentence has been typed, incremental TTS devices deliver speech in a "piecemeal" fashion (i.e. word after word) while aiming to preserve the speech quality achievable by conventional TTS systems. By reducing the waiting time between two speech outputs while maintaining good speech quality, such a system should improve the quality of interaction for speech-impaired people who use TTS devices to express themselves. The main challenge of incremental TTS is to synthesize a word, or a group of words, with the same segmental and supra-segmental quality as conventional TTS, but without knowing the end of the sentence to be synthesized. In this thesis, we propose to adapt the two main modules of a TTS system (natural language processing and speech synthesis) to the incremental paradigm. For the natural language processing module, we focus on part-of-speech tagging, which is a key step for phonetization and prosody generation. We propose an "adaptive latency algorithm" for part-of-speech tagging that estimates whether the part of speech inferred for a given word (based on the n-gram approach) is likely to change when one or several words are added. If the part of speech is considered likely to change, the synthesis of the word is delayed. Otherwise, the word may be synthesized without risk of altering the segmental or supra-segmental quality of the synthetic speech. The proposed method is based on a set of binary decision trees trained on a large text corpus. We achieve 92.5% precision for the incremental part-of-speech tagging task and a mean delay of 1.4 words. For the speech synthesis module, in the context of HMM-based speech synthesis, we propose a training method that takes into account the uncertainty about contextual features that cannot be computed at synthesis time (namely, contextual features related to the following words). We compare the proposed method to other strategies (baselines) described in the literature. Objective and subjective evaluations show that the proposed method outperforms the baselines for French. Finally, we describe a prototype developed during this thesis that implements the proposed solution for incremental part-of-speech tagging and speech synthesis. A perceptual evaluation of the word grouping derived from the proposed adaptive latency algorithm, as well as of the segmental quality of the synthetic speech, tends to show that our system reaches a good trade-off between reactivity (minimizing the waiting time between the input and the synthesis of a word) and speech quality (at both segmental and supra-segmental levels).
Tran, Do Dat. "Synthèse de la parole à partir du texte en langue vietnamienne". Grenoble INPG, 2007. http://www.theses.fr/2007INPG0181.
The development of a high-quality speech synthesis system represents a considerable workload. It requires knowledge of two main domains: linguistics and signal processing. Because of the intrinsic diversity of languages, designing a common universal structure for speech synthesis systems for all languages still seems elusive. The work of this thesis not only chooses, implements and assesses techniques and theories already developed for other languages, but also helps to complete and refine these techniques and theories in order to adapt them to the Vietnamese language. This thesis also focuses on improving the quality of the synthetic signal specifically for Vietnamese by modeling the acoustic and temporal parameters of the prosody of this tonal Asian language, a task not yet undertaken, to our knowledge.
Le Faucheur, Laurent. "Traitement du signal de parole pour la synthèse à partir du texte". Brest, 1991. http://www.theses.fr/1991BRES2008.
Evrard, Marc. "Synthèse de parole expressive à partir du texte : Des phonostyles au contrôle gestuel pour la synthèse paramétrique statistique". Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112202.
The subject of this thesis was the study and design of a platform for expressive speech synthesis. The LIPS3 Text-to-Speech system, developed in the context of this thesis, includes a linguistic module and a parametric statistical module (built upon HTS and STRAIGHT). The system was based on a new single-speaker corpus, which was designed, recorded and annotated. The first study analyzed the influence of the precision of the phonetic labeling of the training corpus on synthesis quality. It showed that statistical parametric synthesis is robust to labeling and alignment errors. This addresses the issue of variation in phonetic realizations for expressive speech. The second study presents an acoustico-phonetic analysis of the corpus, characterizing the expressive space used by the speaker to instantiate the instructions that described the different expressive conditions. Voice source parameters and articulatory settings were analyzed according to their phonetic classes, which allowed for a fine phonostylistic characterization. The third study focused on intonation and rhythm. Calliphony 2.0 is a real-time chironomic interface that controls the f0 and rhythmic parameters of prosody, using drawing/writing hand gestures with a stylus and a graphic tablet. These hand-controlled modulations are used to enhance the TTS output, producing more realistic speech without degradation, since they are applied directly to the vocoder parameters. Intonation and rhythm stylization using this interface brings significant improvement to the prototypicality of expressivity, as well as to the general quality of synthetic speech. These studies show that parametric statistical synthesis, combined with a chironomic interface, offers an efficient solution for expressive speech synthesis, as well as a powerful tool for the study of prosody.
Baloul, Sofiane. "Développement d'un système automatique de synthèse de la parole à partir du texte arabe standard voyellé". Le Mans, 2003. http://cyberdoc.univ-lemans.fr/theses/2003/2003LEMA1015.pdf.
The work of this thesis is a contribution to the study and development of a diphone-based text-to-speech system for voweled standard Arabic. This contribution takes place at various levels of the system: construction of the acoustic database, syntactic analysis, grapheme-phoneme conversion and prosody generation. The morpho-syntactic analysis implemented is based on a partial lexicon, default tagging and the propagation of contextual deductions. It enables the segmentation of the text into non-recursive chunks (intermediate units between the word and the sentence). The syntax-prosody interface enables the allocation of pauses and the generation of the prosodic parameters of pitch and duration. All of these processing steps are integrated into the multilingual system of the Elan Speech Company.
Zaki, Ahmed. "Modélisation de la prosodie pour la synthèse de la parole arabe standard à partir du texte". Bordeaux 1, 2004. http://www.theses.fr/2004BOR12913.
Boula de Mareüil, Philippe. "Etude linguistique appliquee a la synthese de la parole a partir du texte". Paris 11, 1997. http://www.theses.fr/1997PA112371.
Le Maguer, Sébastien. "Évaluation expérimentale d'un système statistique de synthèse de la parole, HTS, pour la langue française". Phd thesis, Université Rennes 1, 2013. http://tel.archives-ouvertes.fr/tel-00934060.
Mohamadi, Tayeb. "Synthèse à partir du texte de visages parlants : réalisation d'un prototype et mesures d'intelligibilité bimodale". Grenoble INPG, 1993. http://www.theses.fr/1993INPG0010.
Gibert, Guillaume. "Conception et évaluation d'un système de synthèse 3D de Langue française Parlée Complétée (LPC) à partir du texte". Phd thesis, Grenoble INPG, 2006. http://tel.archives-ouvertes.fr/tel-00203134.
Baloul, Sofiane Baudry Marc. "Développement d'un système automatique de synthèse de la parole à partir du texte arabe standard voyellé". [S.l.] : [s.n.], 2003. http://cyberdoc.univ-lemans.fr/theses/2003/2003LEMA1015.pdf.
El Kafi, Jamal. "Contribution à la réalisation d'un système multilingue de synthèse de la parole à partir du texte autour d'un processeur spécialisé le TMS50C42". Bordeaux 1, 1990. http://www.theses.fr/1990BOR10512.
Kulkarni, Ajinkya. "Expressivity transfer in deep learning based text-to-speech synthesis". Electronic Thesis or Diss., Université de Lorraine, 2022. http://www.theses.fr/2022LORR0122.
Recently, text-to-speech (TTS) synthesis has gained immense success in the human-computer interaction domain. Current TTS systems are monotonous due to the absence of expressivity. Expressivity in speech generally refers to suprasegmental speech characteristics represented by emotions, speaking styles, and the relationship between speech and gestures, facial expressions, etc. It seems likely that expressive speech synthesis can greatly improve the user experience with machines. The development of an expressive TTS system relies heavily on the speech data used to train the system. The thesis aims at developing an expressive TTS system in a speaker's voice for which only neutral speech data is available. The main focus of the thesis is to investigate deep learning approaches for exploring the disentanglement of speaker information and expressivity in a multispeaker TTS setting. The scope of the work incorporates expressivity as an emotion attribute with well-defined emotion classes. We present various deep neural network architectures to create latent representations of speaker and expressivity in multispeaker TTS settings. During the expressivity transfer phase, the expressivity and speaker representations are interpolated to synthesize expressive speech in the desired speaker's voice. We present a deep metric learning framework for improving the latent representation of expressivity in a multispeaker TTS system setting, which results in improved expressivity transfer. The thesis work also investigates the expressivity transfer capability of probability density estimation based on deep generative models. The usage of deep generative models provides scalable modeling of complex, high-dimensional speech data and tractability of the system, resulting in high-quality speech synthesis. The evaluation of the proposed systems is a challenging aspect of the thesis, as no reference expressive speech data was available in the target speaker's voice. Therefore, we propose two subjective evaluation metrics, speaker MOS and expressive MOS, which indicate how well the framework transfers the expressivity and retains the target speaker's voice. As it is time-consuming to conduct a subjective evaluation each time a system is developed, we propose a cosine similarity-based evaluation metric to measure the strength of expressivity and of the speaker's voice. The obtained results demonstrate the ability of the proposed work to transfer the expressivity while maintaining the overall quality of synthesized expressive speech in the target speaker's voice. It is hard to identify which neural network parameters represent the attributes of speaker characteristics and expressivity. Moreover, expressivity and speaker characteristics are bounded aspects of prosody parameters.
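The cosine similarity-based metric mentioned in the abstract above can be illustrated with a minimal sketch. This is not the thesis's implementation: the embedding vectors, the function `transfer_scores` and its argument names are all hypothetical, and the sketch only assumes that a synthesized utterance, a reference for the target speaker and a reference for the target emotion have each been mapped to fixed-length vectors by some encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def transfer_scores(synth_speaker_emb, target_speaker_emb,
                    synth_expr_emb, target_expr_emb):
    """Hypothetical helper: return (speaker score, expressivity score).

    Each score is the cosine similarity between an embedding of the
    synthesized speech and the matching reference embedding.
    """
    return (
        cosine_similarity(synth_speaker_emb, target_speaker_emb),
        cosine_similarity(synth_expr_emb, target_expr_emb),
    )
```

Under these assumptions, a score close to 1 would suggest that the synthesized speech lies near the reference in the embedding space, while a score near 0 would suggest little similarity; how such scores relate to the subjective MOS ratings is a question for the thesis itself.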
Maurel, Fabrice. "Transmodalité et multimodalité écrit/oral : modélisation, traitement automatique et évaluation de stratégies de présentation des structures "visuo-architecturale" des textes". Toulouse 3, 2004. http://www.theses.fr/2004TOU30256.
We are interested in the utility and, where relevant, the usability of the visual structure of texts in the context of their oral transposition. We propose an outline of an oralisation system that leads to a text representation directly interpretable by Text-To-Speech systems. We partially implemented the module specific to the oralisation strategies, in order to render some meaningful parts of the text that are often “forgotten” by synthesis systems. The first results of this study led to specifications that are being integrated by an industrial partner. Predictive hypotheses, related to the impact on memorizing/understanding of two strategies derived from our Reformulation-based Oralisation Model for Texts Written to be Silently Read (MORTELS), have been formulated and tested. This work shows that cognitive functions were lost. Prototypes exploiting the “Page Reflection” notion have been designed through interfaces in which multimodality is used to fill these gaps.
Le Goff, Bertrand. "Synthèse à partir du texte de visage 3D parlant français". Grenoble INPG, 1997. http://www.theses.fr/1997INPG0140.
Di Cristo, Philippe. "Génération automatique de la prosodie pour la synthèse à partir du texte". Aix-Marseille 1, 1998. http://www.theses.fr/1998AIX11050.
Nicolas, Pascale. "Contribution de la prosodie à l'amélioration de la parole de synthèse : cas du texte lu en français". Aix-Marseille 1, 1995. http://www.theses.fr/1995AIX10053.
In this study, we present an analysis of the intonational organisation of read text in French. We note the rarity of studies on French that aim to describe intonational organisation above the sentence level in text reading. It is nevertheless important to look for and to model the effects of a structure larger than the sentence, for two reasons: first, to increase prosodic knowledge of continuous speech as distinct from isolated sentences; second, to improve the generation of synthetic intonation patterns for text-to-speech systems. In the first part of the study, a definition of the concept of "text" is proposed, determining the prosodic specifications of the text with respect to other linguistic activities, and this concept is applied to the speech domain. The integration of the prosodic component into different text-to-speech systems is discussed. Finally, the various problems concerning the transcription of intonation are reviewed. In the second part of this study, it is shown that the largest unit with regard to the intonational structure of a text coincides with the paragraph rather than with the whole text. The analysis of a text read by four speakers leads to a modification of the transcription system used in this study to account for the observed phenomena. The temporal aspects linking the segmental string to the intonation curve are also examined, in search of the simplest method to predict the influence of fundamental frequency height on the vocalic duration of the analysed text.
Cotto, Daniel. "Traitement automatique des textes en vue de la synthèse vocale". Toulouse 3, 1992. http://www.theses.fr/1992TOU30225.
Tournemire, Stéphanie de. "Identification et génération automatique de contours prosodiques pour la synthèse vocale à partir du texte en français". Paris, ENST, 1998. http://www.theses.fr/1998ENST0017.
Lescop, Cyrille. "Synthèse d'analogues de nucléosides bicycliques et d'analogues de la 2,4-méthanoproline à partir de cyclobutènes : Texte imprimé". Le Mans, 2000. http://www.theses.fr/2000LEMA1008.
This work deals with the synthesis of two new types of bicyclic nucleoside analogues and of three analogues of 2,4-methanoproline from a common starting material: cis-cyclobut-3-en-1,2-dicarboxylic anhydride. A new synthesis of this anhydride, easier and safer than the traditional method, is described. This compound is available in two steps, by photochemical [2+2] cycloaddition between trans-1,2-dichloroethene and maleic anhydride followed by a reductive chlorine elimination with activated zinc. Bicyclic nucleoside analogues with a [3.3.0]-fused γ-butyrolactone moiety have been prepared. The key step involves a stereoselective rearrangement of two epoxycyclobutanes, in aqueous medium, which leads to a bicyclic lactol with a 3-oxo-2,7-dioxabicyclo[3.3.0]octane skeleton. A second type of bicyclic nucleosides with a fused cyclopropane ring substituted by a hydroxymethyl group has been synthesised from a bicyclic hydroxylactol. The 3-oxabicyclo[3.1.0]hexane structure is obtained via two convergent routes involving two ring contractions, one in acidic medium and the other in hydride medium. For both categories of nucleosides, the regio- and stereochemistry of the nucleobase condensation have been elucidated by NMR studies, using 1D and 2D experiments. Finally, three analogues of a natural bicyclic amino acid, 2,4-methanoproline, have been prepared. The synthesis of the unusual 2-azabicyclo[2.1.1]hexane skeleton involves a stereoselective electrophilic addition of phenylselenyl bromide to a symmetrical cyclobutene with nitrogen substituents, followed by an intramolecular nucleophilic substitution. The preparation of these three different types of bicyclic compounds from cis-cyclobut-3-en-1,2-dicarboxylic anhydride shows the importance of cyclobutenes in organic synthesis.
Blin, Laurent. "Apprentissage de structures d'arbres à partir d'exemples ; application à la prosodie pour la synthèse de la parole". Rennes 1, 2002. http://www.theses.fr/2002REN10117.
Doukhan, David. "Synthèse de parole expressive au delà du niveau de la phrase : le cas du conte pour enfant : conception et analyse de corpus de contes pour la synthèse de parole expressive". Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112165/document.
The aim of this thesis is to propose ways to improve the expressiveness of speech synthesis systems. One of the central propositions of this work is to define, use and measure the impact of linguistic structures operating beyond the sentence level, as opposed to approaches operating on sentences taken out of their context. The scope of the study is restricted to the case of storytelling for children. Tales are distinctive in that they have been the subject of a number of studies highlighting their narrative structure, and in that they involve a number of stereotypical characters (hero, villain, fairy) whose speech is often reported. These special features are used to model the prosodic properties of tales beyond the sentence level. The oral transmission of tales was often associated with musical practice (vocals, instruments), and their reading is associated with rich melodic properties whose reproduction remains a challenge for modern speech synthesizers. To address these issues, a first corpus of written tales is collected and annotated with information about the narrative structure of the stories, the identification and attribution of direct quotations, references to characters, named entities and enumerations. The annotated corpus is described in terms of coverage and inter-annotator agreement. It is used to build systems for segmenting tales into episodes and for detecting direct quotes, dialogue acts and modes of communication. A second corpus of stories read by a professional speaker is presented. The speech is aligned with the lexical and phonetic transcriptions, the annotations of the text corpus and meta-information describing the characteristics of the characters involved in the story. The relationships between the linguistic annotations and the prosodic properties observed in the speech corpus are described and modeled. Finally, a prototype that controls the expressive parameters of the Acapela unit-selection synthesizer is built. The prototype generates prosodic instructions operating beyond the sentence level, notably using information related to the structure of the story and the distinction between direct speech and reported speech. The prototype is validated through a perceptual experiment, which shows a significant improvement in the quality of the synthesis.
Garnier-Rizet, Martine. "Élaboration d'un module de règles phonético-acoustiques pour un système de synthèse à partir du texte pour le français". Paris 3, 1994. http://www.theses.fr/1994PA030146.
The purpose of this work is the elaboration of a rule-based module for a text-to-speech synthesizer for French. Speech synthesis has to deal with one of the main aspects of speech: speech is a continuum that is usually divided into units. The nature and complexity of these units differ depending on the level of description at which we work. The input of the segmental module is a stream of phonetic units. In isolation, the phonetic unit is instantiated in the vocal tract by a phonetic gesture, which is its canonical form. The acoustic result is a "target" with specific spectral values. In continuous speech, there is a temporal overlap in the succession of gestures that instantiate the segments. At the acoustic level, the gestural interaction induces spectral modifications that operate on the target values. The elaboration of the module starts with the analysis of a large natural speech database from a single speaker. First, the target values are extracted from the database for all the phonemes. They characterize the speaker. The coarticulation phenomena are then modeled by context-sensitive rules at the acoustic level. This study addresses some major aspects of speech synthesis by rule, for example: the validity of a corpus with constraints; the search for an interface between different levels of description; the use of acoustic features for writing rules; the intelligibility and quality of the synthesis obtained. This study was carried out within Polyglot, ESPRIT project 1024, "a multilingual text-to-speech and speech-to-text system". The aim of Polyglot was to build a multilingual text-to-speech system for six European languages.
Essien, Akpan Jimmy. "Contribution à la recherche sur la perception des tons du yoruba : évidences expérimentales à partir des tambours, des signaux de la parole et la synthèse". Paris 3, 2000. http://www.theses.fr/2000PA030077.
Busset, Julie. "Inversion acoustique articulatoire à partir de coefficients cepstraux". Electronic Thesis or Diss., Université de Lorraine, 2013. http://www.theses.fr/2013LORR0027.
The acoustic-to-articulatory inversion of speech consists in recovering the vocal tract shape from the speech signal. This problem is tackled with an analysis-by-synthesis method relying on a physical model of speech production controlled by a small number of parameters describing the vocal tract shape: the jaw opening, the shape and position of the tongue, and the position of the lips and larynx. In order to approximate the geometry of the speaker, the articulatory model is built from articulatory contours extracted from cineradiographic images of the sagittal view of the vocal tract. This articulatory synthesizer allows us to create a table of pairs, each associating an articulatory vector with the corresponding acoustic vector. The formants (resonance frequencies of the vocal tract) are not used as the acoustic vector because their extraction is not always reliable, which causes errors during inversion. Cepstral coefficients are used as the acoustic vector instead. Moreover, the source effect and the mismatch between the speaker's vocal tract and the articulatory model are taken into account explicitly by comparing the natural spectrum with those produced by the synthesizer, since both signals are available.
Busset, Julie. "Inversion acoustique articulatoire à partir de coefficients cepstraux". Phd thesis, Université de Lorraine, 2013. http://tel.archives-ouvertes.fr/tel-00838913.