Dissertations on the topic "Traitement Automatique des Langues cliniques"
Consult the top 50 dissertations for research on the topic "Traitement Automatique des Langues cliniques".
Grouin, Cyril. „Anonymisation de documents cliniques : performances et limites des méthodes symboliques et par apprentissage statistique“. Phd thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00848672.
Bannour, Nesrine. „Information Extraction from Electronic Health Records : Studies on temporal ordering, privacy and environmental impact“. Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG082.
Automatically extracting rich information contained in Electronic Health Records (EHRs) is crucial to improve clinical research. However, most of this information is in the form of unstructured text. The complexity and the sensitive nature of clinical text involve further challenges. As a result, sharing data is difficult in practice and is governed by regulations. Neural-based models showed impressive results for Information Extraction, but they need significant amounts of manually annotated data, which is often limited, particularly for non-English languages. Thus, the performance is still not ideal for practical use. In addition to privacy issues, using deep learning models has a significant environmental impact. In this thesis, we develop methods and resources for clinical Named Entity Recognition (NER) and Temporal Relation Extraction (TRE) in French clinical narratives. Specifically, we propose a privacy-preserving mimic models architecture by exploring the mimic learning approach to enable knowledge transfer through a teacher model trained on a private corpus to a student model. This student model could be publicly shared without disclosing the original sensitive data or the private teacher model on which it was trained. Our strategy offers a good compromise between performance and data privacy preservation. Then, we introduce a novel event- and task-independent representation of temporal relations. Our representation enables identifying homogeneous text portions from a temporal standpoint and classifying the relation between each text portion and the document creation time. This makes the annotation and extraction of temporal relations easier and reproducible through different event types, as no prior definition and extraction of events is required. Finally, we conduct a comparative analysis of existing tools for measuring the carbon emissions of NLP models. We adopt one of the studied tools to calculate the carbon footprint of all the models created during the thesis, as we consider it a first step toward increasing awareness and control of their environmental impact. To summarize, we generate shareable privacy-preserving NER models that clinicians can efficiently use. We also demonstrate that the TRE task may be tackled independently of the application domain and that good results can be obtained using real-world oncology clinical notes.
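The teacher-student transfer described in this abstract can be illustrated with a minimal sketch. The toy linear "taggers", feature sizes and training loop below are stand-ins, not the thesis's NER models; only the student, trained on public text against the teacher's soft predictions, would be released.

```python
# Minimal sketch of privacy-preserving mimic (teacher-student) learning.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_feats, n_tags = 32, 5
teacher = torch.nn.Linear(n_feats, n_tags)   # assumed trained on private notes
student = torch.nn.Linear(n_feats, n_tags)   # the model to share publicly
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

public_x = torch.randn(256, n_feats)         # stand-in for a public corpus
for _ in range(100):
    with torch.no_grad():
        soft = F.softmax(teacher(public_x), dim=-1)   # teacher's soft labels
    loss = F.kl_div(F.log_softmax(student(public_x), dim=-1),
                    soft, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
```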
Tirilly, Pierre. „Traitement automatique des langues pour l'indexation d'images“. Phd thesis, Université Rennes 1, 2010. http://tel.archives-ouvertes.fr/tel-00516422.
Tirilly, Pierre. „Traitement automatique des langues pour l'indexation d'images“. Phd thesis, Rennes 1, 2010. http://www.theses.fr/2010REN1S045.
In this thesis, we propose to integrate natural language processing (NLP) techniques into image indexing systems. We first address the issue of describing the visual content of images. We rely on the visual word-based image description, which raises problems that are well known in the text indexing field. First, we study various NLP methods (weighting schemes and stop-lists) to automatically determine which visual words are relevant to describe the images. Then we use language models to take into account some geometrical relations between the visual words. We also address the issue of describing the semantic content of images: we propose an image annotation scheme that relies on extracting relevant named entities from texts accompanying the images to annotate.
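As an illustration of carrying a text-indexing weighting scheme over to visual words, here is a minimal tf-idf sketch; the visual-word vocabulary and counts are toy values, not the thesis's data.

```python
# Sketch: tf-idf weighting of "visual words" (quantized image descriptors),
# mirroring standard text-indexing practice.
import math
from collections import Counter

def tfidf(counts, docfreq, n_docs):
    total = sum(counts.values())
    return {w: (c / total) * math.log(n_docs / docfreq[w])
            for w, c in counts.items()}

images = [["vw3", "vw3", "vw7"], ["vw7", "vw1"], ["vw3", "vw1", "vw1"]]
df = Counter(w for img in images for w in set(img))
weights = [tfidf(Counter(img), df, len(images)) for img in images]
print(weights[0])  # higher weight for visual words rare across images
```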
Colin, Émilie. „Traitement automatique des langues et génération automatique d'exercices de grammaire“. Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0059.
Our perspective is educational: creating grammar exercises for French. Paraphrasing is an operation of reformulation. Our work tends to show that sequence-to-sequence models are not simple repeaters but can learn syntax. First, by combining various models, we have shown that representing information in multiple forms (using formal data (RDF), coupled with text to extend or reduce it, or text only) allows us to exploit a corpus from different angles, increasing the diversity of outputs and exploiting the syntactic levers put in place. We also addressed a recurrent problem, that of data quality, and obtained paraphrases with a high syntactic adequacy (up to 98% coverage of the demand) and a very good linguistic level. We obtain up to 83.97 BLEU-4* points, 78.41 more than our baseline average, without syntactic leverage. This rate indicates better control of the outputs, which are varied and of good quality in the absence of syntactic leverage. Our idea was to be able to work from raw text: to produce a representation of its meaning. The transition to French text was also an imperative for us. Working from plain text, by automating the procedures, allowed us to create a corpus of more than 450,000 sentence/representation pairs, thanks to which we learned to generate massively correct texts (92% on qualitative validation). Anonymising everything that is not functional contributed significantly to the quality of the results (68.31 BLEU, i.e. +3.96 compared to the baseline, which was the generation of text from non-anonymised data). This second piece of work can be applied to the integration of a syntactic lever guiding the outputs. What was our baseline at stage 1 (generation without constraint) would then be combined with a constrained model. By applying an error search, this would allow the constitution of a silver base associating representations with texts. This base could then be multiplied by reapplying constrained generation, and thus achieve the applied objective of the thesis. The formal representation of information in a language-specific framework is a challenging task. This thesis offers some ideas on how to automate this operation. Moreover, we were only able to process relatively short sentences; the use of more recent neural models would likely improve the results. The use of appropriate output controls would allow for extensive checks. *BLEU: quality of a text, on a scale from 0 (worst) to 100 (best), Papineni et al. (2002).
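The BLEU-4 scores reported above follow the Papineni et al. (2002) definition; a minimal scoring example with NLTK's implementation (toy sentences, smoothing choice assumed) might look like this:

```python
# Sketch: BLEU-4 scoring of a generated paraphrase against a reference.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the cat sat on the mat".split()]
candidate = "the cat is sitting on the mat".split()
score = sentence_bleu(reference, candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)
print(round(100 * score, 2))  # reported on the 0-100 scale used above
```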
Dary, Franck. „Modèles incrémentaux pour le traitement automatique des langues“. Electronic Thesis or Diss., Aix-Marseille, 2022. http://www.theses.fr/2022AIXM0248.
This thesis is about natural language processing, and more specifically concerns the prediction of the syntactic-morphological structure of sentences. This involves segmenting a text into sentences and then into words, associating with each word a part of speech and morphological features, and then linking the words to make the syntactic structure explicit. The thesis proposes a predictive model that performs these tasks simultaneously and in an incremental fashion: the text is read character by character, and all linguistic predictions are updated by the information brought by each new character. The reason why we explored this architecture is the desire to draw inspiration from human reading, which imposes these two constraints. From an experimental point of view, we compute the correlation between eye-tracking variables measured on human subjects and complexity metrics specific to our model. Moreover, we propose a backtracking mechanism, inspired by the regressive saccades observed in humans. To this end, we use reinforcement learning, which allows the model to backtrack when it reaches a dead end.
Denoual, Etienne. „Méthodes en caractères pour le traitement automatique des langues“. Phd thesis, Université Joseph Fourier (Grenoble), 2006. http://tel.archives-ouvertes.fr/tel-00107056.
This work promotes the use of methods operating at the level of the written signal: the character, a unit immediately accessible in any computerized language, makes it possible to dispense with word segmentation, a step currently unavoidable for languages such as Chinese or Japanese.
First, we transpose and apply to characters a well-established method for the objective evaluation of machine translation, BLEU.
The encouraging results then allow us, in a second step, to address other linguistic data processing tasks: first, grammaticality filtering; then, the characterization of the similarity and homogeneity of linguistic resources. In all these tasks, character-level processing obtains acceptable results, comparable to those obtained with words.
In a third step, we address linguistic data production tasks: analogical computation on character strings enables the production of paraphrases as well as machine translation.
This work shows that a complete machine translation system requiring no segmentation can be built, a fortiori one able to process languages without an orthographic separator.
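The move from word-level to character-level evaluation can be sketched as clipped n-gram precision over character strings; this toy function (not the thesis's code) shows why no segmentation step is needed:

```python
# Sketch: clipped n-gram precision computed over characters rather than
# words, the core move of a character-level BLEU.
from collections import Counter

def char_ngram_precision(candidate, reference, n):
    cand = Counter(candidate[i:i + n] for i in range(len(candidate) - n + 1))
    ref = Counter(reference[i:i + n] for i in range(len(reference) - n + 1))
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    return clipped / max(1, sum(cand.values()))

# Works unchanged for languages without word separators:
print(char_ngram_precision("自然言語処理", "自然言語の処理", 2))
```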
Pellegrino, François. „Une approche phonétique en identification automatique des langues“. Toulouse 3, 1998. http://www.theses.fr/1998TOU30294.
Moreau, Fabienne. „Revisiter le couplage traitement automatique des langues et recherche d'information“. Phd thesis, Université Rennes 1, 2006. http://tel.archives-ouvertes.fr/tel-00524514.
Der volle Inhalt der QuelleBardet, Adrien. „Architectures neuronales multilingues pour le traitement automatique des langues naturelles“. Thesis, Le Mans, 2021. http://www.theses.fr/2021LEMA1002.
The translation of languages has become an essential need for communication between humans in a world where the possibilities of communication are expanding. Machine translation is a response to this evolving need. More recently, neural machine translation has come to the fore with the strong performance of neural systems, opening up a new area of machine learning. Neural systems use large amounts of data to learn how to perform a task automatically. In the context of machine translation, the sometimes large amounts of data needed to learn efficient systems are not always available for all languages. The use of multilingual systems is one solution to this problem. Multilingual machine translation systems make it possible to translate several languages within the same system. They allow languages with little data to be learned alongside languages with more data, thus improving the performance of the translation system. This thesis focuses on multilingual machine translation approaches to improve performance for languages with limited data. I have worked on several multilingual translation approaches based on different transfer techniques between languages. The different approaches proposed, as well as additional analyses, have revealed the impact of the criteria relevant for transfer. They also show the sometimes neglected importance of the balance of languages within multilingual approaches.
Moreau, Fabienne, and Pascale Sébillot. „Revisiter le couplage traitement automatique des langues et recherche d'information“. [S.l.] : [s.n.], 2006. ftp://ftp.irisa.fr/techreports/theses/2006/moreau.pdf.
Manad, Otman. „Nettoyage de corpus web pour le traitement automatique des langues“. Thesis, Paris 8, 2018. http://www.theses.fr/2018PA080011.
Corpora are the main material of computational linguistics and natural language processing. Few languages have corpora made from web resources (forums, blogs, etc.), even among those that lack other resources. Web resources contain a lot of noise (menus, ads, etc.), and filtering boilerplate and repetitive data requires large-scale manual cleaning by the researcher. This thesis presents an automatic system that constructs web corpora with a low level of noise. It consists of three modules: (a) one for building corpora in any language and for any type of data, intended to be collaborative and to preserve corpus history; (b) one for crawling web forums and blogs; (c) one for extracting relevant data from the structure of web pages, using clustering techniques with different distances. The system is evaluated in terms of the efficacy of noise filtering and of computing time. Our experiments, carried out on four languages, are evaluated using our own gold-standard corpus. To measure quality, we use recall, precision and F-measure. Feature distance and Jaro distance give the best results, but not in the same contexts, feature distance having the best average quality. We compare our method with three methods dealing with the same problem: Nutch, BootCat and JusText. The performance of our system is better as regards extraction quality, even if, in terms of computing time, Nutch and BootCat dominate.
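Of the clustering distances compared above, the Jaro similarity is simple enough to sketch in full; this is a textbook implementation for illustration, not the system's module:

```python
# Textbook implementation of the Jaro similarity between two strings.
def jaro(s1, s2):
    if s1 == s2:
        return 1.0
    window = max(len(s1), len(s2)) // 2 - 1
    m1, m2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a = [c for i, c in enumerate(s1) if m1[i]]
    b = [c for j, c in enumerate(s2) if m2[j]]
    transpositions = sum(x != y for x, y in zip(a, b)) / 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3

print(round(jaro("MARTHA", "MARHTA"), 3))  # 0.944
```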
Vasilescu, Ioana Gabriela. „Contribution à l'identification automatique des langues romanes“. Lyon 2, 2001. http://theses.univ-lyon2.fr/documents/lyon2/2001/vasilescu_ig.
This work deals with the automatic identification of Romance languages. The aim of our study is to provide linguistic patterns that are potentially robust for the discrimination of five languages from the Latin family (i.e., Spanish, French, Italian, Portuguese and Romanian). The Romance languages have the advantage of a centuries-old linguistic tradition and are official languages in several countries of the world; the study of the taxonomic approaches devoted to this linguistic family shows the special relevance of the typological classification. More precisely, the vocalic patterns provide relevant criteria for a division of the five idioms into two groups, according to the complexity of each Romance vocalic system: Italian, Spanish vs. Romanian, French, Portuguese. The first group includes languages with prototypical vocalic systems, whereas the second includes languages with vocalic systems that are complex in terms of the number of oppositions. In addition to the vocalic criteria, this hierarchy is supported by consonantal and prosodic particularities. We conducted two experimental paradigms to test the correspondence between the perceptual patterns used by naïve listeners to differentiate the Romance languages and the linguistic patterns employed by the typological classification. A first series of discrimination experiments on four groups of subjects, selected according to the criterion [+/- Romance native language] (i.e., French, Romanian vs. Japanese, American), showed different perceptual strategies related both to the native language and to familiarity with the Romance languages. The linguistic strategies lead to a macro-discrimination of the languages into two groups similar to those obtained via the typological taxonomy based on vocalic particularities (i.e., Spanish, Italian vs. Romanian, French, Portuguese). The second series of perceptual experiments, on two groups of subjects (French and American), consisted in evaluating the acoustic similarity of the five languages. The results confirmed the division of the Romance languages into the same two groups as those obtained via the discrimination experiments. We concluded that the vocalic patterns may be a robust cue for the discrimination of the Latin idioms into two major linguistic groups: Italian, Spanish vs. Romanian, French, Portuguese.
Vasilescu, Ioana Gabriela, and Jean-Marie Hombert. „Contribution à l'identification automatique des langues romanes“. [S.l.] : [s.n.], 2001. http://demeter.univ-lyon2.fr:8080/sdx/theses/lyon2/2001/vasilescu_ig.
Bouamor, Houda. „Etude de la paraphrase sous-phrastique en traitement automatique des langues“. Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00717702.
Filhol, Michael. „Modèle descriptif des signes pour un traitement automatique des langues des signes“. Phd thesis, Université Paris Sud - Paris XI, 2008. http://tel.archives-ouvertes.fr/tel-00300591.
Der volle Inhalt der QuelleDégremont, Jean-François. „Ethnométhodologie et innovation technologique : le cas du traitement automatique des langues naturelles“. Paris 7, 1989. http://www.theses.fr/1989PA070043.
The thesis begins with a short historical reminder of ethnomethodology, considered as a scientific field, from its very beginnings in the 1930s until its explosion in the US and Europe in 1967. The first part is an explanation of the main concepts of ethnomethodology. They are developed from the theoretical point of view of the "pariseptist" school, which tries to combine the strictest refusal of induction with the indifference principle, especially when natural languages, considered both as objects of study and as communication tools, are used. The second part of the thesis is devoted to the concrete application of these theoretical concepts in the field of the technology strategies that have been elaborated in France in the area of natural language processing. Three studies successively describe the ethnomethods and rational properties of the practical activities at work in an administrative team, in the elaboration of a technology policy, and in indexical descriptions of the language-industry field. The conclusion tries to show how the concepts and methods developed by ethnomethodology can increase, in this field, the efficacy of strategic analysis and the quality of research and development programs.
Millour, Alice. „Myriadisation de ressources linguistiques pour le traitement automatique de langues non standardisées“. Thesis, Sorbonne université, 2020. http://www.theses.fr/2020SORUL126.
Citizen science, in particular voluntary crowdsourcing, represents a little-explored solution for producing language resources for languages which are still under-resourced despite the presence of sufficient speakers online. We present in this work the experiments we have conducted to enable the crowdsourcing of linguistic resources for the development of automatic part-of-speech annotation tools. We have applied the methodology to three non-standardised languages, namely Alsatian, Guadeloupean Creole and Mauritian Creole. For different historical reasons, multiple (ortho)graphic practices coexist for these three languages. The difficulties raised by this variation phenomenon led us to propose various crowdsourcing tasks that allow the collection of raw corpora, part-of-speech annotations, and graphic variants. The intrinsic and extrinsic analysis of these resources, used for the development of automatic annotation tools, shows the interest of using crowdsourcing in a non-standardised linguistic setting: the participants are not seen in this context as a uniform set of contributors whose cumulative efforts allow the completion of a particular task, but rather as a set of holders of complementary knowledge. The resources they collectively produce make possible the development of tools that embrace the variation. The platforms developed, the language resources, as well as the trained tagger models, are freely available.
Shen, Ying. „Élaboration d'ontologies médicales pour une approche multi-agents d'aide à la décision clinique“. Thesis, Paris 10, 2015. http://www.theses.fr/2015PA100040/document.
The combination of the semantic processing of knowledge and the modelling of the reasoning employed in the clinical field offers exciting and necessary opportunities to develop ontologies relevant to the practice of medicine. In this context, medical databases such as MEDLINE and PubMed are valuable tools, but not sufficient, because usable knowledge cannot easily be acquired from them in a clinical approach. Indeed, the abundance of inappropriate citations constitutes noise and requires a tedious sorting that is incompatible with the practice of medicine. In an iterative process, the objective is to build, in an approach as automated as possible, reusable medical knowledge bases founded on an ontology of the fields concerned. In this thesis, the author develops a series of tools for knowledge acquisition, combining linguistic analysis operators and clinical modelling, based on the implemented knowledge typology and an implementation of the different forms of reasoning employed. Knowledge is not limited to the information drawn from data, but also and especially includes the cognitive operators of reasoning that make it operational in the context relevant to the practitioner. A multi-agent system enables the integration and cooperation of the various modules used in the development of a medical ontology. The data sources are medical databases such as MEDLINE, the citations retrieved by PubMed, and the concepts and vocabulary from the Unified Medical Language System (UMLS). Regarding the scope of the produced knowledge bases, the research concerns the entire clinical process: diagnosis, prognosis, treatment, and therapeutic monitoring of various diseases in a given medical field. It is essential to identify the different approaches and the work already done. Different paradigms are explored: 1) Evidence-Based Medicine, where an index can be defined as a sign related to its mode of implementation; 2) case-based reasoning, which is based on the analogy with clinical situations already encountered; 3) the different semantic approaches used to implement ontologies. On the whole, we worked on the logical aspects related to the cognitive operators of the reasoning used, and we organised the cooperation and integration of the exploited knowledge during the various stages of the clinical process (diagnosis, prognosis, treatment, therapeutic monitoring). This integration is based on a SMAAD: a multi-agent system for decision support.
Bourgeade, Tom. „Interprétabilité a priori et explicabilité a posteriori dans le traitement automatique des langues“. Thesis, Toulouse 3, 2022. http://www.theses.fr/2022TOU30063.
With the advent of Transformer architectures in Natural Language Processing a few years ago, we have observed unprecedented progress in various text classification and generation tasks. However, the explosion in the number of parameters and the complexity of these state-of-the-art black-box models is making ever more apparent the now urgent need for transparency in machine learning approaches. The ability to explain, interpret, and understand algorithmic decisions will become paramount as computer models become more and more present in our everyday lives. Using eXplainable AI (XAI) methods, we can for example diagnose dataset biases and spurious correlations which can ultimately taint the training process of models, leading them to learn undesirable shortcuts, which could lead to unfair, incomprehensible, or even risky algorithmic decisions. These failure modes of AI may ultimately erode the trust humans might otherwise have placed in beneficial applications. In this work, we more specifically explore two major aspects of XAI in the context of Natural Language Processing tasks and models. In the first part, we approach the subject of intrinsic interpretability, which encompasses all methods which are inherently easy to produce explanations for. In particular, we focus on word embedding representations, which are an essential component of practically all NLP architectures, allowing these mathematical models to process human language in a more semantically rich way. Unfortunately, many of the models which generate these representations produce them in a way which is not interpretable by humans. To address this problem, we experiment with the construction and usage of Interpretable Word Embedding models, which attempt to correct this issue by using constraints which enforce interpretability on these representations. We then make use of these, in a simple but effective novel setup, to attempt to detect lexical correlations, spurious or otherwise, in some popular NLP datasets. In the second part, we explore post-hoc explainability methods, which can target already trained models and attempt to extract various forms of explanations of their decisions, ranging from diagnosing which parts of an input were the most relevant to a particular decision, to generating adversarial examples which are carefully crafted to help reveal weaknesses in a model. We explore a novel type of approach, in part enabled by the highly performant but opaque recent Transformer architectures: instead of using a separate method to produce explanations of a model's decisions, we design and fine-tune an architecture which jointly learns both to perform its task and to produce free-form Natural Language Explanations of its own outputs. We evaluate our approach on a large-scale dataset annotated with human explanations, and qualitatively judge some of our approach's machine-generated explanations.
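A simple instance of the post-hoc attribution family discussed above is leave-one-token-out occlusion; the toy "model" below is a stand-in for a trained classifier, purely to show the mechanics:

```python
# Sketch of leave-one-token-out occlusion for post-hoc attribution.
def toy_model(tokens):
    # pretend sentiment score: fraction of (hypothetical) cue words
    return sum(t in {"great", "excellent"} for t in tokens) / max(1, len(tokens))

def occlusion_importance(model, tokens, mask="[MASK]"):
    base = model(tokens)
    # importance = how much the score drops when a token is masked out
    return [(t, base - model(tokens[:i] + [mask] + tokens[i + 1:]))
            for i, t in enumerate(tokens)]

print(occlusion_importance(toy_model, "the film was great".split()))
```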
Mauger, Serge. „L'interprétation des messages énigmatiques. Essai de sémantique et de traitement automatique des langues“. Caen, 1999. http://www.theses.fr/1999CAEN1255.
Oedipus, the character in Sophocles' tragedy, solves the Sphinx's enigma by "his own intelligence". This is the starting point of a general reflection on the linguistic status of language games, whose practice can be observed throughout all periods and in all cultures. Oedipus's intelligence is based on a capacity for "calculating" the interpretation of the enigma by giving up inductive reasoning (by recurrence) so as to adopt analogical reasoning instead. In the second part, it is shown that the calculation of the meaning of polysemous messages enables us to propose a pattern of combinatory analysis, a tool for automatic language processing able to help solve riddles and interpret the coded definitions of crosswords. This pattern is used as a touchstone for an analysis of the semantic structures underlying interpretations and shows which lexical items are concerned by isotopy. Isotopy is not considered here to be an element of the message but a process of the interpretation; the whole approach is thus based on interpretative semantics. The third part develops this reflection by including the treatment of enigmatic messages in the issues of man-machine dialogue (MMD), which enables us to deal with the ambiguities of some utterances and to understand "strange messages" on the basis of interpretation proposals extrapolated from the pattern. We then analyse, step by step, the calculation performed by the receiver of such messages as an activity consisting in analysing graphematic and acoustic signs. Taking these signs into account means confronting them with what the linguistic system leads one to expect, and it enables a series of decisions leading to the identification of a coherent analysis. This coherence and this analysis are compared to the approach adopted when "reading" an anamorphosis (in painting) or when decoding the organisation rules of card sequences in the Eleusis game. A similar approach is found when interpreting the "scriptio continua" of palaeographic inscriptions, a technique which serves as a basis for certain literary experiments under constraint and for hidden puns.
Dubé, Martine. „Étude terminologique et analyse des modes de formation de 50 notions sur le traitement automatique des langues naturelles /“. Thèse, Québec : Université Laval, École des gradués, 1990. http://theses.uqac.ca.
"Thesis presented for the degree of Master of Arts (M.A.) under an agreement between Université Laval and Université du Québec à Chicoutimi." Bibliography: ff. 137-141. Electronic document also available in PDF format.
Knyazeva, Elena. „Apprendre par imitation : applications à quelques problèmes d'apprentissage structuré en traitement des langues“. Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS134/document.
Structured learning has become ubiquitous in Natural Language Processing; a multitude of applications, such as personal assistants, machine translation and speech recognition, to name just a few, rely on such techniques. The structured learning problems that must now be solved are becoming increasingly more complex and require an increasing amount of information at different linguistic levels (morphological, syntactic, etc.). It is therefore crucial to find the best trade-off between the degree of modelling detail and the exactitude of the inference algorithm. Imitation learning aims to perform approximate learning and inference in order to better exploit richer dependency structures. In this thesis, we explore the use of this specific learning setting, in particular using the SEARN algorithm, both from a theoretical perspective and in terms of the practical applications to Natural Language Processing tasks, especially to complex tasks such as machine translation. Concerning the theoretical aspects, we introduce a unified framework for different imitation learning algorithm families, allowing us to review and simplify the convergence properties of the algorithms. With regards to the more practical application of our work, we use imitation learning first to experiment with free-order sequence labelling and secondly to explore two-step decoding strategies for machine translation.
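The roll-in/policy-mixing idea behind SEARN can be caricatured on a toy tagging task; the following sketch compresses the cost-sensitive classification step into a lookup table, and should not be read as the thesis's algorithm:

```python
# Caricature of a SEARN-style iteration: roll in with the current mixed
# policy, take the expert's action as supervision, retrain, and mix the
# new classifier into the policy.
import random

def searn_train(data, rounds=3, beta=0.5):
    table, p_learned = {}, 0.0       # p_learned: prob. of following the table
    for _ in range(rounds):
        examples = []
        for words, gold in data:
            prev = "<s>"
            for i, w in enumerate(words):
                f = (w, prev)
                if random.random() < p_learned and f in table:
                    action = table[f]            # roll-in with learned policy
                else:
                    action = gold[i]             # roll-in with the expert
                examples.append((f, gold[i]))    # expert supplies the target
                prev = action
        table = dict(examples)                   # trivial "classifier"
        p_learned = 1 - (1 - beta) * (1 - p_learned)
    return table

data = [("le chat dort".split(), ["DET", "NOUN", "VERB"])]
print(searn_train(data)[("chat", "DET")])  # NOUN
```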
Hamon, Olivier. „Vers une architecture générique et pérenne pour l'évaluation en traitement automatique des langues : spécifications, méthodologies et mesures“. Paris 13, 2010. http://www.theses.fr/2010PA132022.
The development of Natural Language Processing (NLP) systems requires determining the quality of their results. Whether aiming to compare several systems to each other or to identify both the strong and weak points of an isolated system, evaluation implies defining precisely, and for each particular context, a methodology, a protocol, language resources (data needed for both system training and testing) and even evaluation measures and metrics. It is under these conditions that system improvement is possible, so as to obtain more reliable and easier-to-exploit results. The contribution of evaluation to NLP is important due to the creation of new language resources, the homogenisation of formats for the data used, or the promotion of a technology or a system. However, evaluation requires considerable manual work, whether to formulate human judgments or to manage the evaluation procedure. This compromises the evaluation's reliability, increases costs and makes it harder to reproduce. We have tried to reduce and delimit these manual interventions. To do so, we have supported our work by either conducting or participating in evaluation campaigns where systems are compared to each other, or where isolated systems are evaluated. The management of the evaluation procedure has been formalised in this work and its different phases have been listed so as to define a common evaluation framework, understandable by all. The main point of those evaluation phases regards quality measurement through the usage of metrics. Three consecutive studies have been carried out: on human measures, on automatic measures and the automation of quality computation, and on the meta-evaluation of the measures so as to evaluate their reliability. Moreover, evaluation measures use language resources whose practical and administrative aspects must be taken into account. Among these are their creation, standardisation, validation, impact on the results, costs of production and usage, identification and legal issues. In that context, the study of the similarities between the technologies and between their evaluations has allowed us to highlight their common features and classify them. This has helped us to show that a small set of measures covers a wide range of applications for different technologies. Our final goal has been to define a generic evaluation architecture, adaptable to different NLP technologies, and sustainable, namely allowing the reuse of language resources, measures and methods over time. Our proposal has been built on the conclusions drawn from previous steps, with the objective of integrating the evaluation phases into our architecture and incorporating the evaluation measures, all while bearing in mind the place of language resource usage. The definition of this architecture has been done with the aim of fully automating the evaluation management work, regardless of whether this concerns an evaluation campaign or the evaluation of an isolated system. Following initial experiments, we have designed an evaluation architecture taking into account all the constraints found, as well as using Web services. The latter provide the means to interconnect architecture components and make them accessible through the Internet.
Dimon, Pierre. „Un système multilingual d'interprétation automatique : étape du sous-logiciel "analyse" pour les langues germaniques“. Metz, 1994. http://docnum.univ-lorraine.fr/public/UPV-M/Theses/1994/Dimon.Pierre.LMZ945_1.pdf.
In part one of the thesis, the reader is first reminded of the language models underlying the grammars from which systems for the automatic processing of languages borrow, and second of the computing aids that make applications possible. A vast survey of the machine translation and computer-assisted translation systems developed from the earliest beginnings up to 1991 illustrates the developments in connection with translating. As a counterpoint to the limits of the present systems, in part 2 of this thesis another path is laid down, based on the following hypothesis: is it possible, by keeping the quality of the target text to a minimum, for a reader (a specialist of the area who, however, is not familiar with the language of the source text) to recreate its meaning through implicit comprehension? Hyperanalysis applies to the whole of the text. The local hypersyntactic module explores everything that introduces an object, defines it, names it…
Charnois, Thierry. „Accès à l'information : vers une hybridation fouille de données et traitement automatique des langues“. Habilitation à diriger des recherches, Université de Caen, 2011. http://tel.archives-ouvertes.fr/tel-00657919.
Poibeau, Thierry. „Extraction d'information à base de connaissances hybrides“. Paris 13, 2002. http://www.theses.fr/2002PA132001.
Der volle Inhalt der QuelleStroppa, Nicolas. „Définitions et caractérisations de modèles à base d'analogies pour l'apprentissage automatique des langues naturelles /“. Paris : École nationale supérieure des télécommunications, 2006. http://catalogue.bnf.fr/ark:/12148/cb40129220d.
Beust, Pierre. „Pour une démarche centrée sur l'utilisateur dans les ENT. Apport au Traitement Automatique des Langues“. Habilitation à diriger des recherches, Université de Caen, 2013. http://tel.archives-ouvertes.fr/tel-01070522.
Der volle Inhalt der QuelleKirman, Jerome. „Mise au point d'un formalisme syntaxique de haut niveau pour le traitement automatique des langues“. Thesis, Bordeaux, 2015. http://www.theses.fr/2015BORD0330/document.
The goal of computational linguistics is to provide a formal account of linguistic knowledge, and to produce algorithmic tools for natural language processing. Often, this is done in a so-called generative framework, where grammars describe sets of valid sentences by iteratively applying some set of rewrite rules. Another approach, based on model theory, instead describes grammaticality as a set of well-formedness logical constraints, relying on deep links between logic and automata in order to produce efficient parsers. This thesis favors the latter approach. Making use of several existing results in theoretical computer science, we propose a tool for linguistic description that is both expressive and designed to facilitate grammar engineering. It first tackles the abstract structure of sentences, providing a logical language based on lexical properties of words in order to concisely describe the set of grammatically valid sentences. It then draws the link between these abstract structures and their representations (both in syntax and semantics), through the use of linearization rules that rely on logic and lambda-calculus. In order to validate this proposal, we use it to model various linguistic phenomena, ending with a specific focus on languages that include free word order phenomena (that is, sentences which allow the free reordering of some of their words or syntagmas while keeping their meaning), and on their algorithmic complexity.
Depain-Delmotte, Frédérique. „Proposition d'un modèle linguistique pour la résolution d'anaphores en vue du traitement automatique des langues“. Besançon, 2000. http://www.theses.fr/2000BESA1015.
Namer, Fiammetta. „Pronominalisation et effacement du sujet en génération automatique de textes en langues romanes“. Paris 7, 1990. http://www.theses.fr/1990PA077249.
Der volle Inhalt der QuelleOkabe, Shu. „Modèles faiblement supervisés pour la documentation automatique des langues“. Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG091.
In the wake of the threatened extinction of half of the languages spoken today by the end of the century, language documentation is a field of linguistics notably dedicated to the recording, annotation, and archiving of data. In this context, computational language documentation aims to devise tools for linguists to ease several documentation steps through natural language processing approaches. As part of the CLD2025 computational language documentation project, this thesis focuses mainly on two tasks: word segmentation, to identify word boundaries in an unsegmented transcription of a recorded sentence, and automatic interlinear glossing, to predict linguistic annotations for each sentence unit. For the first task, we improve the performance of the Bayesian non-parametric models used until now through weak supervision. For this purpose, we leverage resources realistically available during documentation, such as already-segmented sentences or dictionaries. Since we still observe an over-segmenting tendency in our models, we introduce a second segmentation level: the morphemes. Our experiments with various types of two-level segmentation models indicate a slight improvement in segmentation quality. However, we also face limitations in differentiating words from morphemes using statistical cues only. The second task concerns the generation of either grammatical or lexical glosses. As the latter cannot be predicted using training data alone, our statistical sequence-labelling model adapts the set of possible labels for each sentence and provides a competitive alternative to the most recent neural models.
Ravaut, Frédéric. „Analyse automatique des manifestations faciales cliniques par techniques de traitement d'images : application aux manifestations de l'épilepsie“. Paris 5, 1999. http://www.theses.fr/1999PA05S027.
Stroppa, Nicolas. „Définitions et caractérisations de modèles à base d'analogies pour l'apprentissage automatique des langues naturelles“. Phd thesis, Télécom ParisTech, 2005. http://tel.archives-ouvertes.fr/tel-00145147.
In the context of machine learning from linguistic data, alternative inferential models have been proposed that call into question the abstraction principle underlying rules or probabilistic models. In this view, linguistic knowledge remains implicitly represented in the accumulated corpus. In the field of Machine Learning, methods following the same principles are grouped under the name of "lazy" learning. These methods generally rely on the following learning bias: if an object Y is "close" to an object X, then its analysis f(Y) is a good candidate for f(X). While this hypothesis is justified for the applications usually addressed in Machine Learning, the structured nature and paradigmatic organisation of linguistic data suggest a slightly different approach. To account for this particularity, we study a model based on the notion of "analogical proportion". In this model, the analysis f(T) of a new object T is obtained by identifying an analogical proportion with already known objects X, Y and Z. The analogical hypothesis thus postulates that if X : Y :: Z : T, then f(X) : f(Y) :: f(Z) : f(T). To infer f(T) from the already known f(X), f(Y) and f(Z), one solves the "analogical equation" with unknown I: f(X) : f(Y) :: f(Z) : I.
In the first part of this work, we study this analogical proportion model within a more general framework that we call "learning by analogy". This framework is instantiated in a number of contexts: in cognitive science, it corresponds to analogical reasoning, an essential faculty at the heart of many cognitive processes; in traditional linguistics, it provides support for a number of mechanisms such as analogical creation, opposition or commutation; in machine learning, it corresponds to the family of lazy learning methods. This perspective sheds light on the nature of the model and its underlying mechanisms.
The second part of our work proposes a unified algebraic framework defining the notion of analogical proportion. Starting from a model of analogical proportion between strings of symbols, elements of a free monoid, we present an extension to the more general case of semigroups. This generalisation leads directly to a definition valid for all sets deriving from the semigroup structure, thus allowing the modelling of analogical proportions between common representations of linguistic entities such as strings, trees, feature structures and finite languages. Algorithms suited to processing analogical proportions between such structured objects are presented. We also propose some directions for enriching the model, so as to allow its use in more complex cases.
The inferential model studied, motivated by the needs of Natural Language Processing, is then explicitly interpreted as a Machine Learning method. This formalisation highlights several of its characteristic features. A notable particularity of the model lies in its ability to handle structured objects both as input and as output, whereas the classical classification task generally assumes an output space consisting of a finite set of classes. We then show how to express the learning bias of the method by introducing the notion of analogical extension. Finally, we conclude by presenting experimental results from applying our model to several Natural Language Processing tasks: orthographic/phonetic transcription, inflectional analysis and derivational analysis.
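For the restricted case of a suffix alternation on strings, solving the analogical equation X : Y :: Z : I can be sketched in a few lines (the thesis's semigroup formulation is far more general than this toy):

```python
# Sketch: solving x : y :: z : ? when x -> y is a single suffix alternation
# (e.g. walk : walked :: talk : ?).
def lcp_len(a, b):
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return i

def solve_analogy(x, y, z):
    p = lcp_len(x, y)
    x_suf, y_suf = x[p:], y[p:]          # the alternating suffixes
    if x_suf and not z.endswith(x_suf):
        return None                      # the alternation does not apply to z
    stem = z[:len(z) - len(x_suf)] if x_suf else z
    return stem + y_suf

print(solve_analogy("walk", "walked", "talk"))         # talked
print(solve_analogy("chanter", "chanteur", "danser"))  # danseur
```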
Saadane, Houda. „Le traitement automatique de l’arabe dialectalisé : aspects méthodologiques et algorithmiques“. Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAL022/document.
Der volle Inhalt der QuelleMunch, Damien. „Un modèle dynamique et parcimonieux du traitement automatisé de l'aspect dans les langues naturelles“. Electronic Thesis or Diss., Paris, ENST, 2013. http://www.theses.fr/2013ENST0058.
The purpose of this work is to design and to implement a computational model for the processing of aspect in natural language. Our goal is to elaborate a detailed and explicative model of aspect. This model should be able to process aspect on a chosen number of sentences, while following strong constraints of parsimony and cognitive plausibility. We were successful in creating such a model, with both an original design and an extensive explanatory power. New explanations have been obtained for phenomena like repetition, perfectivity and inchoativity. We also propose a new mechanism based on the notion of “predication”.
Perez, Laura Haide. „Génération automatique de phrases pour l'apprentissage des langues“. Thesis, Université de Lorraine, 2013. http://www.theses.fr/2013LORR0062/document.
In this work, we explore how Natural Language Generation (NLG) techniques can be used to address the task of (semi-)automatically generating language learning material and activities in Computer-Assisted Language Learning (CALL). In particular, we show how a grammar-based Surface Realiser (SR) can be usefully exploited for the automatic creation of grammar exercises. Our surface realiser uses a wide-coverage reversible grammar, namely SemTAG, which is a Feature-Based Tree Adjoining Grammar (FB-TAG) equipped with a unification-based compositional semantics. More precisely, the FB-TAG grammar integrates a flat and underspecified representation of First Order Logic (FOL) formulae. In the first part of the thesis, we study the task of surface realisation from flat semantic formulae, and we propose an optimised FB-TAG-based realisation algorithm that supports the generation of longer sentences given a large-scale grammar and lexicon. The approach followed to optimise TAG-based surface realisation from flat semantics draws on the fact that an FB-TAG can be translated into a Feature-Based Regular Tree Grammar (FB-RTG) describing its derivation trees. The derivation tree language of TAG constitutes a simpler language than the derived tree language, and thus generation approaches based on derivation trees have already been proposed. Our approach departs from previous ones in that our FB-RTG encoding accounts for the feature structures present in the original FB-TAG, which has important consequences regarding over-generation and the preservation of the syntax-semantics interface. The concrete derivation tree generation algorithm that we propose is an Earley-style algorithm integrating a set of well-known optimisation techniques: tabulation, sharing-packing, and semantic-based indexing. In the second part of the thesis, we explore how our SemTAG-based surface realiser can be put to work for the (semi-)automatic generation of grammar exercises. Usually, teachers manually edit exercises and their solutions, and classify them according to their degree of difficulty or the expected learner level. A strand of research in Natural Language Processing (NLP) for CALL addresses the (semi-)automatic generation of exercises. Mostly, this work draws on texts extracted from the Web and uses machine learning and text analysis techniques (e.g. parsing, POS tagging, etc.). These approaches expose the learner to sentences that have a potentially complex syntax and diverse vocabulary. In contrast, the approach we propose in this thesis addresses the (semi-)automatic generation of grammar exercises of the type found in grammar textbooks; in other words, it deals with the generation of exercises whose syntax and vocabulary are tailored to specific pedagogical goals and topics. Because the grammar-based generation approach associates natural language sentences with a rich linguistic description, it permits defining a specification language for syntactic and morpho-syntactic constraints for the selection of stem sentences in compliance with a given pedagogical goal. Further, it allows for the post-processing of the generated stem sentences to build grammar exercise items. We show how fill-in-the-blank, shuffle and reformulation grammar exercises can be automatically produced. The approach has been integrated into the Interactive French Learning Game (I-FLEG), a serious game for learning French, and has been evaluated both through interactions with online players and in collaboration with a language teacher.
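The exercise-building step that follows surface realisation can be illustrated with a toy fill-in-the-blank generator over a POS-annotated stem sentence; the tags and sentence are invented, and this is not the I-FLEG pipeline:

```python
# Toy fill-in-the-blank (cloze) builder over a POS-annotated stem sentence.
def make_cloze(tagged_sentence, target_pos="VERB"):
    text, answers = [], []
    for word, pos in tagged_sentence:
        if pos == target_pos:
            answers.append(word)      # the expected solution
            text.append("_____")
        else:
            text.append(word)
    return " ".join(text), answers

stem = [("Le", "DET"), ("chat", "NOUN"), ("dort", "VERB")]
print(make_cloze(stem))  # ('Le chat _____', ['dort'])
```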
Baldy, Bernard. „Vérifications, détections et corrections syntaxico-sémantiques dans le traitement automatique des langues à partir du formalisme des grammaires syntagmatiques généralisées“. Paris 13, 1995. http://www.theses.fr/1995PA132029.
Arnulphy, Béatrice. „Désignations nominales des événements : étude et extraction automatique dans les textes“. Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00758062.
Li, Yiping. „Étude des problèmes spécifiques de l'intégration du chinois dans un système de traitement automatique pour les langues européennes“. Université de Marne-la-Vallée, 2006. http://www.theses.fr/2006MARN0282.
Linguistic analysis is a fundamental and essential step for natural language processing. It often includes part-of-speech tagging and named entity identification, in order to enable higher-level applications such as information retrieval, automatic translation, question answering, etc. Chinese linguistic analysis must perform the same tasks as that of other languages, but it must resolve an additional difficulty caused by the lack of delimiters between words. Since the word is the elementary unit for automated language processing, it is indispensable to segment sentences into words for Chinese language processing. In most existing systems described in the literature, segmentation, part-of-speech tagging and named entity recognition are presented as three sequential, independent steps. But since segmentation provides the basis for and impacts the other two steps, some statistical methods which collapse all three treatments, or two of the three, into one module have been proposed. With such combinations of steps, segmentation can be improved by complementary information supplied by part-of-speech tagging and named entity recognition, and the global analysis of Chinese improved with it. However, this single-treatment model is not modular and is difficult to adapt to languages other than Chinese; consequently, this approach is not suitable for creating multilingual automatic analysis systems. This dissertation studies the integration of Chinese automatic analysis into an existing multilingual analysis system, LIMA. Originally built for European languages, LIMA's modular approach imposes some constraints that a monolingual Chinese analysis system need not consider. Firstly, the treatment of Chinese should be compatible with, and follow the same flow as, that of other languages. Secondly, in order to keep the system coherent, it is preferable to employ common modules for all the languages treated by the system, including a new language like Chinese. To respect these constraints, we chose to realise the phases of segmentation, part-of-speech tagging and named entity recognition separately. Our modular treatment includes a specific module for Chinese analysis that should be reusable for other languages with similar linguistic features. After an error analysis of this purely modular approach, we were able to improve our segmentation with enriched information supplied by part-of-speech tagging, named entity recognition and some linguistic knowledge. In our final results, three specific treatments have been added to the LIMA system: a pretreatment based on a co-occurrence model applied before segmentation, a tokenization of numbers written in Chinese characters, and a complementary treatment after segmentation that identifies certain named entities before subsequent part-of-speech tagging. We evaluate and discuss the improvement that these additional treatments bring to our analysis, while retaining the modular and linear approach of the underlying LIMA natural language processing system.
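To illustrate the underlying segmentation problem, here is the classic dictionary-based forward-maximum-matching baseline (the thesis's pretreatment relies on a co-occurrence model instead; this sketch only shows why segmentation is needed at all):

```python
# Classic forward-maximum-matching baseline for Chinese word segmentation.
def fmm_segment(sentence, lexicon, max_len=4):
    words, i = [], 0
    while i < len(sentence):
        for j in range(min(len(sentence), i + max_len), i, -1):
            if sentence[i:j] in lexicon or j == i + 1:
                words.append(sentence[i:j])  # longest dictionary match wins
                i = j
                break
    return words

lexicon = {"自然", "语言", "自然语言", "处理"}
print(fmm_segment("自然语言处理", lexicon))  # ['自然语言', '处理']
```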
Munch, Damien. „Un modèle dynamique et parcimonieux du traitement automatisé de l'aspect dans les langues naturelles“. Thesis, Paris, ENST, 2013. http://www.theses.fr/2013ENST0058/document.
The purpose of this work is to design and to implement a computational model for the processing of aspect in natural language. Our goal is to elaborate a detailed and explicative model of aspect. This model should be able to process aspect on a chosen number of sentences, while following strong constraints of parsimony and cognitive plausibility. We were successful in creating such a model, with both an original design and an extensive explanatory power. New explanations have been obtained for phenomena like repetition, perfectivity and inchoativity. We also propose a new mechanism based on the notion of “predication”.
Boulaknadel, Siham. „Traitement automatique des langues et recherche d'information en langue arabe dans un domaine de spécialité : apport des connaissances morphologiques et syntaxiques pour l'indexation“. Nantes, 2008. http://www.theses.fr/2008NANT2052.
Information retrieval aims to provide a user with easy access to information. To achieve this goal, an information retrieval system (IRS) must represent, store and organise information, then provide the user with the elements corresponding to the information need expressed in the query. Most information retrieval systems use simple terms to index and retrieve documents. However, this representation is not precise enough to represent the contents of documents and queries, because of the ambiguity of terms isolated from their context. One solution to this problem is to replace simple terms with multi-word terms, based on the assumption that a multi-word term is less ambiguous than a simple term. Our thesis falls within information retrieval for Arabic in a specialised domain. The objective of our work was, on the one hand, to identify the multi-word terms present in queries and documents, and on the other hand, to exploit the richness of the language by combining several kinds of linguistic knowledge at the morphological and syntactic levels, showing how the contribution of syntactic and morphological knowledge helps improve access to information. Thus, we proposed a platform integrating various publicly available components, which demonstrates the significant contribution of these components. In addition, we have given a linguistic definition of multi-word terms in Arabic, and we developed a system for identifying multi-word terms based on a mixed approach combining a statistical model and linguistic data.
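Multi-word term candidates of the kind discussed above are often gathered with shallow POS patterns; the patterns and tags below are toy examples, not the thesis's Arabic grammar:

```python
# Gathering multi-word term candidates with shallow POS patterns.
PATTERNS = [("NOUN", "ADJ"), ("NOUN", "PREP", "NOUN")]

def extract_terms(tagged):
    terms = []
    for i in range(len(tagged)):
        for pat in PATTERNS:
            window = tagged[i:i + len(pat)]
            if len(window) == len(pat) and \
               all(t == p for (_, t), p in zip(window, pat)):
                terms.append(" ".join(w for w, _ in window))
    return terms

print(extract_terms([("recherche", "NOUN"), ("d'", "PREP"),
                     ("information", "NOUN")]))  # ["recherche d' information"]
```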
Thibeault, Mélanie. „La catégorisation grammaticale automatique : adaptation du catégoriseur de Brill au français et modification de l'approche“. Master's thesis, Université Laval, 2004. http://hdl.handle.net/20.500.11794/17984.
Honour roll (Tableau d'honneur) of the Faculté des études supérieures et postdoctorales, 2004-2005.
Automatic part-of-speech tagging is a field where much remains to be done. Very good taggers exist for English, but those available to the French-speaking community are far less effective. We therefore trained the Brill tagger for French and then improved its results. Moreover, whatever technique is used, some problems remain unsolved: unknown words are still difficult to tag correctly. We sought solutions to this problem. In short, we made a series of modifications to Brill's approach and evaluated their impact on performance. These modifications raised performance on French unknown words from 70.7% to 78.6%. We thus improved performance appreciably, although much work remains to be done before the treatment of French unknown words is satisfactory.
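Unknown-word handling of the kind discussed above is typically done in Brill-style taggers with lexical rules that guess a tag from word shape (capitalization, digits, suffixes). A minimal sketch with invented French suffix rules, not the rule set of the thesis:

    # Illustrative suffix rules for guessing tags of unknown French words;
    # these examples are not the thesis's actual rule set.
    SUFFIX_RULES = [
        ("ement", "ADV"),   # doucement, rapidement
        ("tion", "NOUN"),   # catégorisation
        ("aient", "VERB"),  # imperfect, 3rd person plural
        ("eux", "ADJ"),     # heureux
    ]

    def guess_tag(word, default="NOUN"):
        """Brill-style lexical guess for a word absent from the lexicon."""
        if word[:1].isupper():
            return "PROPN"                    # capitalised: likely proper noun
        if any(ch.isdigit() for ch in word):
            return "NUM"
        for suffix, tag in SUFFIX_RULES:
            if word.endswith(suffix):
                return tag
        return default                        # most unknown words are nouns

    print(guess_tag("modernisation"))  # NOUN
    print(guess_tag("doucement"))      # ADV

In Brill's actual architecture, such initial guesses are then corrected by contextual transformation rules learned from an annotated corpus.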
Sébillot, Pascale. „Apprentissage sur corpus de relations lexicales sémantiques - La linguistique et l'apprentissage au service d'applications du traitement automatique des langues“. Habilitation à diriger des recherches, Université Rennes 1, 2002. http://tel.archives-ouvertes.fr/tel-00533657.
Der volle Inhalt der Quelle
Sébillot, Pascale. „Apprentissage sur corpus de relations lexicales sémantiques : la linguistique et l'apprentissage au service d'applications du traitement automatique des langues“. [S.l.] : [s.n.], 2002. http://www.irisa.fr/centredoc/publis/HDR/2002/irisapublication.2005-08-03.1402955054.
Der volle Inhalt der QuelleChalendar, Gaël de. „SVETLAN', un système de structuration du lexique guidé par la détermination automatique du contexte thématique“. Paris 11, 2001. http://www.theses.fr/2001PA112258.
Der volle Inhalt der Quelle
Semantic knowledge is mandatory for Natural Language Processing. Unfortunately, classifications with universal goals are a utopia. Systems exist that extract semantic knowledge from specialized texts, but it is commonly held that such extraction is impossible from texts of so-called "general" language. The goal of this doctoral dissertation is to show that this idea is false. We show that a thematic analysis of non-specialized texts (newspapers, newswires or HTML pages gathered from the Web) usually reduces the problem to the classical one of analyzing a technical corpus, while limiting human intervention. In our approach, the theme of a text segment is detected by statistical analysis of word distributions, together with purpose-built notions of similarity and aggregation, which allow the words of similar segments to be aggregated into thematic domains whose highest-weighted words describe the theme. We then group the words that appear as the same argument of the same verb across the text segments belonging to a theme, forming classes of words. We implemented this model in a system called SVETLAN', tested on several French and English corpora of millions of words. Empirical analysis of the results shows that, as anticipated, the words in the resulting classes usually stand in a strong mutual semantic relation within the context determined by the theme. Since human judgment of word classes is not very consistent, we indirectly validate the semantic knowledge obtained by SVETLAN' by using it in a query expansion task to improve the results of a natural-language question-answering system.
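The grouping step described above, collecting the words that fill the same argument slot of the same verb within a theme, can be sketched as follows; the dependency triples and field names are illustrative, not SVETLAN''s actual data structures:

    from collections import defaultdict

    # Toy (verb, slot, noun, theme) tuples as a parser plus thematic analyser
    # might produce; the data are invented for the example.
    TRIPLES = [
        ("piloter", "obj", "avion", "aviation"),
        ("piloter", "obj", "hélicoptère", "aviation"),
        ("piloter", "obj", "appareil", "aviation"),
        ("voter", "obj", "loi", "politique"),
        ("voter", "obj", "amendement", "politique"),
    ]

    def word_classes(triples):
        """Group nouns filling the same slot of the same verb within a theme."""
        classes = defaultdict(set)
        for verb, slot, noun, theme in triples:
            classes[(theme, verb, slot)].add(noun)
        # a single noun forms no class, so keep only groups of two or more
        return {k: v for k, v in classes.items() if len(v) > 1}

    for key, nouns in word_classes(TRIPLES).items():
        print(key, sorted(nouns))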
Colotte, Vincent. „Techniques d'analyse et de synthèse de la parole appliquées à l'apprentissage des langues“. Nancy 1, 2002. http://www.theses.fr/2002NAN10222.
Der volle Inhalt der Quelle
Nowadays, when exchanges between people are increasingly international, mastering foreign languages is becoming essential, and computer-assisted language learning appears to be a new stake. In particular, improving oral comprehension is one of the keys to mastering a language. To improve intelligibility, I designed a first strategy based on selective slowing down of the speech signal. Transitory parts, regions with a high concentration of acoustic cues, turn out to be privileged candidates for slowing down; these regions are detected by computing a coefficient reflecting the rate of spectral variation. I designed a second strategy that enhances the relevant events of speech, i.e. those whose amplification improves intelligibility. This strategy is based on preserving phonetic contrasts, in particular between voiced and unvoiced consonants; to this end I developed an algorithm detecting unvoiced plosives and unvoiced fricatives from energy criteria. Two perception experiments were carried out to validate these intelligibility-improvement strategies: a preliminary one with French listeners on American English sentences, and a second with foreign students learning French as a foreign language on French sentences. Finally, to modify the prosodic elements (rhythm, intensity, fundamental frequency), my work was based on the PSOLA method (Pitch Synchronous OverLap and Add); I designed a pitch-marking algorithm and improved the accuracy of the synthesis method. These strategies are fully automatic and improve the intelligibility of the speech signal in the context of language learning.
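A rate-of-spectral-variation coefficient of the kind used above to locate transitory regions can be approximated by a frame-to-frame spectral flux. The following sketch assumes NumPy; its frame size, hop and threshold are illustrative values, not those of the thesis:

    import numpy as np

    def spectral_variation(signal, frame=512, hop=128):
        """Frame-to-frame spectral flux as a rate-of-variation coefficient."""
        windows = [signal[i:i + frame] * np.hanning(frame)
                   for i in range(0, len(signal) - frame, hop)]
        spectra = [np.abs(np.fft.rfft(w)) for w in windows]
        return np.array([np.linalg.norm(s2 - s1) / (np.linalg.norm(s1) + 1e-9)
                         for s1, s2 in zip(spectra, spectra[1:])])

    def transitory_regions(flux):
        """Flag frames whose variation exceeds an (illustrative) threshold."""
        threshold = flux.mean() + flux.std()
        return flux > threshold   # boolean mask of candidate slow-down frames

Frames flagged by the mask would then be time-stretched, for instance with PSOLA, while the rest of the signal is left untouched.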
Ramisch, Carlos Eduardo. „Un environnement générique et ouvert pour le traitement des expressions polylexicales : de l'acquisition aux applications“. PhD thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00859910.
Der volle Inhalt der Quelle
Samson, Juan Sarah Flora. „Exploiting resources from closely-related languages for automatic speech recognition in low-resource languages from Malaysia“. Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM061/document.
Der volle Inhalt der Quelle
Languages in Malaysia are dying at an alarming rate: as of today, 15 languages are in danger and two are extinct. One way to save languages is to document them, but this is a tedious task when performed manually. An Automatic Speech Recognition (ASR) system could be a tool to help speed up the documentation of speech from native speakers. However, building an ASR system for a target language requires a large amount of training data, as current state-of-the-art techniques are based on empirical approaches; hence, there are many challenges in building ASR for languages with limited available data. The main aim of this thesis is to investigate the effects of using data from closely related languages to build ASR for low-resource languages in Malaysia. Past studies have shown that cross-lingual and multilingual methods can improve low-resource ASR performance. In this thesis, we try to answer several questions concerning these approaches: How do we know which language is beneficial for our low-resource language? How does the relationship between source and target languages influence speech recognition performance? Is pooling language data an optimal approach for a multilingual strategy? Our case study is Iban, an under-resourced language spoken on the island of Borneo. We study the effects of using data from Malay, a locally dominant language close to Iban, for developing Iban ASR under different resource constraints, and propose several approaches to adapt Malay data to obtain pronunciation and acoustic models for Iban speech. Building a pronunciation dictionary from scratch is time-consuming, as the sound units of each word in a vocabulary must be properly defined, so we developed a semi-supervised approach to quickly build a pronunciation dictionary for Iban, based on bootstrapping techniques that adapt Malay pronunciations to match Iban ones. To increase the performance of low-resource acoustic models, we explored two acoustic modelling techniques, Subspace Gaussian Mixture Models (SGMM) and Deep Neural Networks (DNN), applying cross-lingual strategies in both frameworks to adapt out-of-language data to Iban speech. Results show that using Malay data is beneficial for increasing the performance of Iban ASR. We also tested SGMM and DNN for improving low-resource non-native ASR, proposed a fine-merging strategy for obtaining an optimal multi-accent SGMM, and developed an accent-specific DNN using native speech data. After applying both methods, we obtained significant improvements in ASR accuracy. From our study, we observe that SGMM and DNN cross-lingual strategies are effective when training data is very limited.
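The bootstrapping idea described above, deriving Iban pronunciations from a Malay lexicon, can be illustrated with a toy phone-rewrite step. The words, phone sets and the single rewrite rule below are invented for the example; the thesis derives its mappings semi-automatically and has the hypotheses checked by native speakers:

    # Toy Malay lexicon (word -> space-separated phones); entries invented.
    MALAY_LEXICON = {
        "mata": "m a t a",     # eye
        "dua": "d u a",        # two
        "makan": "m a k a n",  # eat
    }

    def adapt_pronunciation(phones):
        """Apply a single, purely illustrative Malay-to-Iban phone rewrite:
        word-final /a/ is replaced by a schwa-like vowel."""
        out = phones.split()
        if out and out[-1] == "a":
            out[-1] = "ə"
        return " ".join(out)

    # Bootstrap an Iban lexicon; the resulting pronunciations are hypotheses
    # to be validated manually, which is the semi-supervised part of the loop.
    iban_lexicon = {w: adapt_pronunciation(p) for w, p in MALAY_LEXICON.items()}
    print(iban_lexicon)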