Dissertations / Theses on the topic 'Modèles multilingues'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 19 dissertations / theses for your research on the topic 'Modèles multilingues.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Charton, Eric. "Génération de phrases multilingues par apprentissage automatique de modèles de phrases." Phd thesis, Université d'Avignon, 2010. http://tel.archives-ouvertes.fr/tel-00622561.
Full text
Charton, Éric. "Génération de phrases multilingues par apprentissage automatique de modèles de phrases." Thesis, Avignon, 2010. http://www.theses.fr/2010AVIG0175/document.
Full text
Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system. In this thesis report, we present an NLG system architecture relying on statistical methods. The originality of our proposal is its ability to use a corpus as a learning resource for sentence production. This method offers several advantages: it simplifies the implementation and design of a multilingual NLG system capable of producing sentences with the same meaning in several languages, and it improves the adaptability of an NLG system to a particular semantic field. In our proposal, sentence generation is achieved through the use of sentence models obtained from a training corpus. Extracted sentences are abstracted by a labelling step that draws on various information extraction and text mining methods, such as named entity recognition, co-reference resolution, semantic labelling and part-of-speech tagging. Sentence generation is then carried out by a sentence realisation module. This module selects a sentence model adapted to a communicative intent, and then transforms this model to generate a new sentence. Two methods are proposed to transform a sentence model into a generated sentence, according to the semantic content to express. In this document, we describe the complete labelling system applied to encyclopaedic content to obtain the sentence models. Then we present two models of sentence generation. The first substitutes the semantic content for the content of an original sentence. The second is used to find numerous proto-sentences, structured as Subject, Verb, Object, each able to cover part of a whole communicative intent, and then aggregates all the selected proto-sentences into a more complex one. Our experiments with various configurations of our system have shown that this new approach to NLG has interesting potential.
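The sentence-model substitution described in this abstract can be illustrated with a minimal sketch: a corpus sentence is abstracted into a labelled model whose entity slots are then filled with new semantic content. This is only an illustrative toy, not the thesis's actual system; all names and the label set are invented.

```python
# Illustrative sketch of sentence-model substitution for NLG.
# A corpus sentence is abstracted into a labelled model (named-entity
# slots), then a new sentence is realised by filling the slots with the
# semantic content to express. All names here are hypothetical.

def abstract_sentence(tokens, entity_labels):
    """Replace labelled tokens with slot placeholders to form a sentence model."""
    return [entity_labels.get(tok, tok) for tok in tokens]

def realise(model, content):
    """Substitute semantic content into a sentence model's slots."""
    return " ".join(content.get(slot, slot) for slot in model)

# A sentence model learned from one corpus sentence...
tokens = ["Avignon", "is", "a", "city", "in", "France"]
model = abstract_sentence(tokens, {"Avignon": "<CITY>", "France": "<COUNTRY>"})

# ...reused to express new content with the same structure.
sentence = realise(model, {"<CITY>": "Grenoble", "<COUNTRY>": "France"})
print(sentence)  # Grenoble is a city in France
```

A real system would obtain the labels from named entity recognition and semantic labelling, as the abstract describes, rather than from a hand-written dictionary.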
Sam, Sethserey. "Vers une adaptation autonome des modèles acoustiques multilingues pour le traitement automatique de la parole." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00685204.
Full text
Balikas, Georgios. "Explorer et apprendre à partir de collections de textes multilingues à l'aide des modèles probabilistes latents et des réseaux profonds." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM054/document.
Full text
Text is one of the most pervasive and persistent sources of information. Content analysis of text, in its broad sense, refers to methods for studying and retrieving information from documents. Nowadays, with ever-increasing amounts of text becoming available online in several languages and different styles, content analysis of text is of tremendous importance as it enables a variety of applications. To this end, unsupervised representation learning methods such as topic models and word embeddings constitute prominent tools. The goal of this dissertation is to study and address challenging problems in this area, focusing both on the design of novel text mining algorithms and tools, and on studying how these tools can be applied to text collections written in a single language or in several. In the first part of the thesis we focus on topic models and, more precisely, on how to incorporate prior information about text structure into such models. Topic models are built on the bag-of-words premise, so words are exchangeable. While this assumption benefits the calculation of conditional probabilities, it results in a loss of information. To overcome this limitation we propose two mechanisms that extend topic models by integrating knowledge of text structure. We assume that the documents are partitioned into thematically coherent text segments. The first mechanism assigns the same topic to the words of a segment. The second capitalizes on the properties of copulas, a tool mainly used in the fields of economics and risk management to model the joint probability density of random variables while having access only to their marginals. The second part of the thesis explores bilingual topic models for comparable corpora with explicit document alignments. Typically, a document collection for such models is in the form of comparable document pairs. The documents of a pair are written in different languages and are thematically similar. Unless they are translations, the documents of a pair are similar only to some extent. Yet representative topic models assume that the documents have identical topic distributions, which is a strong and limiting assumption. To overcome it we propose novel bilingual topic models that incorporate the notion of cross-lingual similarity between the documents of a pair into their generative and inference processes. Calculating this cross-lingual document similarity is a task in itself, which we propose to address using cross-lingual word embeddings. The last part of the thesis concerns the use of word embeddings and neural networks for three text mining applications. First, we discuss polylingual document classification, where we argue that translations of a document can be used to enrich its representation. Using an auto-encoder to obtain these robust document representations, we demonstrate improvements in the task of multi-class document classification. Second, we explore multi-task sentiment classification of tweets, arguing that jointly training classification systems on correlated tasks can improve the obtained performance. To this end we show how to achieve state-of-the-art performance on a sentiment classification task using recurrent neural networks. The third application we explore is cross-lingual information retrieval. Given a document written in one language, the task consists in retrieving the most similar documents from a pool of documents written in another language. In this line of research, we show that by adapting the transportation problem to the task of estimating document distances, one can achieve important improvements.
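The cross-lingual document similarity mentioned in this abstract can be sketched in its simplest form: represent each document as the average of its words' vectors in a shared bilingual embedding space, and compare documents by cosine similarity. The tiny embedding table below is invented for illustration; the thesis's actual models are far richer.

```python
# Sketch of cross-lingual document similarity with shared word embeddings:
# each document is the average of its words' vectors in a bilingual space,
# and similarity is the cosine between the two averages.
# The toy embedding table below is invented for illustration.
import math

embeddings = {
    "dog":   [0.90, 0.10],
    "chien": [0.88, 0.12],  # French "dog": close to its translation
    "bank":  [0.10, 0.90],
}

def doc_vector(words):
    """Average the embeddings of the document's known words."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

sim = cosine(doc_vector(["dog"]), doc_vector(["chien"]))
print(round(sim, 3))  # close to 1.0 for a comparable pair
```

A comparable document pair thus gets a graded similarity score rather than the all-or-nothing assumption of identical topic distributions.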
Zhang, Ying. "Modèles et outils pour des bases lexicales "métier" multilingues et contributives de grande taille, utilisables tant en traduction automatique et automatisée que pour des services dictionnairiques variés." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM017/document.
Full text
Our research is in computational lexicography, and concerns not only computer support for lexical resources useful for MT (machine translation) and MAHT (Machine Aided Human Translation), but also the linguistic architecture of the lexical databases supporting these resources in an operational context (CIFRE thesis with L&M). We begin with a study of the evolution of ideas in this area, from the computerization of classical dictionaries to platforms for building true "lexical databases" such as JIBIKI-1 [Mangeot, M. et al., 2003; Sérasset, G., 2004] and JIBIKI-2 [Zhang, Y. et al., 2014]. The starting point was the PIVAX-1 system [Nguyen, H.-T. et al., 2007; Nguyen, H. T. & Boitet, C., 2009], designed for the lexical bases of heterogeneous MT systems with a lexical pivot, able to support multiple volumes in each "lexical space", be it natural or artificial (such as UNL). Considering the industrial context, we focused our research on several issues in informatics and lexicography. To scale up, and to add new features enabled by JIBIKI-2, such as "rich links", we transformed PIVAX-1 into PIVAX-2 and reactivated the GBDLEX-UW++ project started during the ANR TRAOUIERO project, re-importing all the (multilingual) data supported by PIVAX-1 and making them available on an open server. Responding to a need of L&M concerning acronyms, we expanded the "macrostructure" of PIVAX by incorporating volumes of "prolexemes", as in PROLEXBASE [Tran, M. & Maurel, D., 2006]. We also show how to extend it to meet new needs, such as those of the INNOVALANGUES project. Finally, we created a "lemmatisation middleware", LEXTOH, which can call several morphological analyzers or lemmatizers and then merge and filter their results. Combined with a new dictionary creation tool, CREATDICO, LEXTOH makes it possible to build on the fly a "mini-dictionary" corresponding to a sentence or a paragraph of a text being "post-edited" online under IMAG/SECTRA, realising the proactive lexical support functionality foreseen in [Huynh, C.-P., 2010]. It could also be used to create parallel corpora with the aim of building MOSES-based "factored MT systems".
Daoud, Mohammad. "Utilisation de ressources non conventionnelles et de méthodes contributives pour combler le fossé terminologique entre les langues en développant des "préterminologies" multilingues." Phd thesis, Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00583682.
Full text
Daoud, Mohammad. "Utilisation de ressources non conventionnelles et de méthodes contributives pour combler le fossé terminologique entre les langues en développant des "préterminologies" multilingues." Phd thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM090.
Full text
Our motivation is to bridge the terminological gap that grows with the massive production of new concepts (50 daily) in various domains, for which terms are often first coined in a well-resourced language such as English or French. Finding equivalent terms in different languages is necessary for many applications, such as CLIR and MT. This task is very difficult, especially for some widely used languages such as Arabic, because (1) only a small proportion of new terms is properly recorded by terminologists, and only for a few languages; (2) specific communities continuously create equivalent terms without normalizing or even recording them (latent terminology); (3) in many cases, no equivalent terms are created, formally or informally (absence of terminology). This thesis proposes to replace the impossible goal of continuously building an up-to-date, complete and high-quality terminology for a large number of languages by that of building a preterminology, using unconventional methods and passive or active contributions from communities of internauts: extracting potential parallel terms not only from parallel or comparable texts, but also from logs of visits to Web sites such as the DSR (Digital Silk Road), and from data produced by serious games. A preterminology is a new kind of lexical resource that can be easily constructed and has good coverage. Following a growing trend in computational lexicography and NLP in general, we represent a multilingual preterminology by a graph structure (Multilingual Preterminological Graph, MPG), where nodes bear preterms and arcs bear simple preterminological relations (monolingual synonymy, translation, generalization, specialization, etc.) that approximate usual terminological (or ontological) relations. A complete System for Eliciting Preterminology (SEpT) has been developed to build and maintain MPGs. Passive approaches have been tested by developing an MPG for the DSR cultural Web site, and another for the domain of Arabic oneirology; the resources produced achieved good informational and linguistic coverage. The indirect active contribution approach has been under test for 8-9 months using the Arabic instance of the JeuxDeMots serious game.
Le, Thi Hoang Diem. "Utilisation de ressources externes dans un modèle Bayésien de Recherche d'Information. Application à la recherche d'information multilingue avec UMLS." Phd thesis, Université Joseph Fourier (Grenoble), 2009. http://tel.archives-ouvertes.fr/tel-00463681.
Full text
Muller, Benjamin. "How Can We Make Language Models Better at Handling the Diversity and Variability of Natural Languages?" Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS399.
Full text
Deep Learning for NLP has led to impressive empirical progress in recent years. In essence, this progress rests on better contextualized representations that can easily be used for a wide variety of tasks. However, these models usually require substantial computing power and large amounts of raw textual data. This makes language's inherent diversity and variability a vivid challenge in NLP. We focus on the following question: how can we make language models better at handling the variability and diversity of natural languages? First, we explore the generalizability of language models by building and analyzing one of the first large-scale replications of a BERT model for a non-English language. Our results raise the question of using these language models on highly variable domains such as those found online. Focusing on lexical normalization, we show that this task can be approached with BERT-like models. However, we show that it only partially helps downstream performance. In consequence, we focus on adaptation techniques using what we refer to as representation transfer and explore challenging settings such as the zero-shot setting and low-resource languages. We show that multilingual language models can be adapted and used efficiently with low-resource languages, even ones unseen during pretraining, and that the script is a critical component in this adaptation.
Le, Thi Hoang Diem. "Utilisation de ressources externes dans un modèle Bayésien de recherche d'information : application à la recherche d'information médicale multilingue avec UMLS." Phd thesis, Grenoble 1, 2009. http://www.theses.fr/2009GRE10073.
Full text
With the availability of resources external to documents, information retrieval systems evolve. These resources provide not only information on the terms and concepts needed for more precise indexing, but also the semantic relations between these terms or concepts. Our thesis work lies in the use of external resources in information retrieval. We first study conceptual indexing in comparison with term-based indexing. A problem arises when the documents and the query do not share the same concepts, but the concepts of the documents are semantically related to the concepts of the query. We propose to take these semantic relationships between concepts into account with an information retrieval model based on a Bayesian network of concepts and their semantic relationships. In addition, we propose the use of domain knowledge from an external resource to improve retrieval performance. The proposed model is validated by experiments in medical-domain information retrieval, using the UMLS metathesaurus as the external resource. An application to a multimodal (text and image) information retrieval system was also carried out.
Bayeh, Rania. "Reconnaissance de la parole multilingue : adaptation de modèles acoustiques vers une langue cible." Paris, Télécom ParisTech, 2009. http://www.theses.fr/2009ENST0060.
Full text
Speech processing has become a key technology, with different automatic speech recognition (ASR) systems available for popular languages. With the constant interaction of different cultures, not all users of such systems are native speakers, and conversations are often a mixture of several languages, which is challenging for ASR. A multilingual ASR system is therefore needed. This thesis focuses on efficiently porting the acoustic models (AM) of an under-resourced target language using the acoustic models of a better-resourced source language, with the goal of universal acoustic modeling. Different approaches are suggested and tested for porting models for the recognition of Modern Standard Arabic starting from French, for different types of speech and applications. Porting includes the association of speech units, and the initialization and adaptation of AM. Initially, methods are proposed for the creation of one-to-one phone associations by a human expert or using an automatic data-driven approach. Initialization is done at the context-independent level by copying Hidden Markov Model (HMM) phone models from the source language onto the target language phones, based on these associations. The resulting models are adapted using different amounts of target language data. Then, novel methods for one-to-many associations are introduced, and multi-path models are used for initialization. Moreover, since the superiority of context dependency extends to cross-lingual and multilingual settings, different approaches are proposed to create context-dependent AM for the under-resourced target language using robust AM from a source language. The approaches are also validated for a new language, Colloquial Levantine Arabic.
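The initialization step described in this abstract, copying source-language phone models onto target phones via a phone association table, can be sketched as follows. This is a schematic illustration only: the phone inventory, the mapping and the "HMM" placeholders are invented, and real porting operates on trained HMM parameters, not strings.

```python
# Sketch of cross-lingual acoustic-model porting by phone association:
# target-language (Arabic) phones are initialised by copying the
# source-language (French) models associated with them; a one-to-many
# association yields several candidates (multi-path initialisation).
# Phone names, the mapping and the model placeholders are illustrative.

source_models = {"a": "HMM_fr_a", "b": "HMM_fr_b", "u": "HMM_fr_u"}

# Association from target phones to source phones, produced either by
# a human expert or by an automatic data-driven method.
association = {"fatha": ["a"], "ba": ["b"], "damma": ["u", "a"]}

def initialise_target_models(association, source_models):
    """Copy source HMMs onto target phones; several sources -> multi-path."""
    return {
        target: [source_models[p] for p in sources]
        for target, sources in association.items()
    }

target_models = initialise_target_models(association, source_models)
print(target_models["damma"])  # ['HMM_fr_u', 'HMM_fr_a']  (multi-path)
```

The copied models would then be adapted with whatever target-language data is available, as the abstract describes.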
Ellouze, Nebrasse. "Approche de recherche intelligente fondée sur le modèle des Topic Maps : application au domaine de la construction durable." Phd thesis, Conservatoire national des arts et metiers - CNAM, 2010. http://tel.archives-ouvertes.fr/tel-00555929.
Full text
Bella, Gábor. "Modélisation de texte numérique multilingue : vers un modèle général et extensible fondé sur le concept de textème." Télécom Bretagne, 2008. http://www.theses.fr/2008TELB0067.
Full text
This thesis is concerned with the modelling of electronic text. This modelling involves the definition both of the atomic text elements and of the way these elements join together to form textual structures. In response to the growing need for the internationalisation of information systems, historical models of text, based on the concept of code tables, have been extended by semi-formalised knowledge related to the writing system, so that by now such knowledge is essential to text processing of even the simplest kind. Thus were born the Unicode character encoding and the so-called 'intelligent' font formats. Realising that this phenomenon marks only the beginning of a convergence towards models based on the principles of knowledge representation, we here propose an alternative approach to text modelling that defines a text element not as a table entry but through the properties that describe the element. The formal framework that we establish, initially developed for the purposes of knowledge representation, provides us with a method by which precise formal definitions can be given to much-used but ill-defined notions such as character, glyph, or usage. The same framework allows us to define a generalised text element that we call a texteme, the atomic element on which a whole family of new text models is based. The study of these models then leads us to the understanding
Haton, Sébastien. "Analyse et modélisation de la polysémie verbale dans une perspective multilingue : le dictionnaire bilingue vu dans un miroir." Nancy 2, 2006. http://www.theses.fr/2006NAN21016.
Full text
Lexical asymmetry and hidden data, i.e. data not directly visible within a single lexical entry, are phenomena peculiar to most bilingual dictionaries. Our purpose is to establish a methodology to highlight both phenomena by extracting hidden data from the dictionary and by re-establishing symmetry between its two parts. We therefore studied a large number of verbs and integrated them into a single multilingual database. In order to offset some gaps in the lexicography, we also studied verb occurrences in a literary database. The purpose is to expand the dictionaries' data without invalidating them. Finally, our database is turned into a "multilexical" graph by an algorithm which binds words from different languages into the same semantic space.
El, Abed Walid. "Meta modèle sémantique et noyau informatique pour l'interrogation multilingue des bases de données en langue naturelle (théorie et application)." Besançon, 2001. http://www.theses.fr/2001BESA1014.
Full text
Ellouze, Nebrasse. "Approche de recherche intelligente fondée sur le modèle des Topic Maps : application au domaine de la construction durable." Electronic Thesis or Diss., Paris, CNAM, 2010. http://www.theses.fr/2010CNAM0736.
Full text
The research work in this thesis is related to Topic Map construction and use in the semantic annotation of web resources, in order to help users find relevant information in these resources. The number of information sources available today is huge and continuously increasing, so it is impossible to create and maintain manually a Topic Map representing and organizing all this information. Many Topic Map building approaches can be found in the literature [Ellouze et al. 2008a]. However, none of these approaches takes multilingual document content as input. In addition, although Topic Maps are basically dedicated to user navigation and information search, no approach takes users' requests into consideration in the Topic Map building process. In this context, we have proposed ACTOM, a Topic Map building approach based on an automated process that takes multilingual documents into account and lets the Topic Map evolve according to content and usage changes. To enrich the Topic Map, we rely on a domain thesaurus, and we also propose to explore all potential questions related to the source documents in order to represent usage in the Topic Map. In our approach, we extend the existing Topic Map model by defining usage links and a list of meta-properties associated with each Topic; these meta-properties are used in the Topic Map pruning process. In ACTOM, we also propose to make the semantics of Topic Map links more precise and richer: apart from occurrence links between Topics and resources, we classify Topic Map links into two different classes, those we call "ontological links" and those we call "usage links".
Montariol, Syrielle. "Models of diachronic semantic change using word embeddings." Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG006.
Full text
In this thesis, we study lexical semantic change: temporal variations in the use and meaning of words, also called diachrony. These changes are carried by the way people use words, and mirror the evolution of various aspects of society, such as its technological and cultural environment. We explore, compare and evaluate methods to build time-varying embeddings from a corpus in order to analyse language evolution. We focus on contextualised word embeddings using pre-trained language models such as BERT. We propose several approaches to extract and aggregate the contextualised representations of words over time, and to quantify their level of semantic change. In particular, we address the practical aspects of these systems: the scalability of our approaches, with a view to applying them to large corpora or large vocabularies; their interpretability, by disambiguating the different uses of a word over time; and their applicability to concrete issues, for documents related to COVID-19. We evaluate the efficiency of these methods quantitatively, using several annotated corpora, and qualitatively, by linking the detected semantic variations to real-life events and numerical data. Finally, we extend the task of semantic change detection beyond the temporal dimension. We adapt it to a bilingual setting, to study the joint evolution of a word and its translation in two corpora of different languages, and to a synchronic frame, to detect semantic variations across different sources or communities in addition to the temporal variation.
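One common way to aggregate contextualised representations over time, consistent with the approaches this abstract describes, is to average a word's occurrence vectors per period and score change as the cosine distance between period averages. The sketch below uses invented toy vectors standing in for BERT outputs; it illustrates the aggregation idea, not the thesis's exact method.

```python
# Sketch of diachronic semantic-change scoring: average a word's
# contextualised vectors per time period, then measure the cosine
# distance between the period averages. The two-dimensional vectors
# below are invented stand-ins for real contextual embeddings.
import math

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Contextualised occurrences of one word in two periods.
period_1990s = [[1.0, 0.0], [0.9, 0.1]]   # one dominant usage
period_2020s = [[0.1, 0.9], [0.0, 1.0]]   # usage has shifted

change = cosine_distance(mean(period_1990s), mean(period_2020s))
print(round(change, 2))  # high score -> strong semantic change
```

Clustering the occurrence vectors instead of averaging them is what enables the interpretability (usage disambiguation) mentioned in the abstract.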
Moumtzidou, Argyro. "L'éveil aux langues dans la formation des enseignant/es grec/ques : vers un modèle dynamique de formation-action." Thesis, Le Mans, 2011. http://www.theses.fr/2011LEMA3013/document.
Full text
Intercultural training in teacher education is not limited to the idea of tolerance and acceptance of others. It consists of three integrated principles: the awakening and strengthening of critical thinking in the teacher, the teacher's interest in implementing educational innovation, and the ability to build a more holistic view and a more effective management of human and social complexity. Our work presents an action-research project that lasted two years and was aimed at the (long-term) training of Greek teachers. The final sample that participated in our research consists of 10 persons, all early childhood, primary and high school teachers working in multilingual classes. The training model, called the "Evolutionary training model", is based on the general assumption that the innovation of the Awakening to Languages, when incorporated into teacher education, may create among teachers the knowledge, attitudes and skills that enable them to make better use of the linguistic and cultural capital of their students, and may provide them with a set of practices and a typology of skills that can help them work with languages throughout the curriculum. To test our hypothesis we chose a triangular approach. The research tools were in part developed by us and in part drawn from comparable research: two types of questionnaires, and group interviews, recorded and transcribed. In addition, we relied on our own observations as well as on the experiment conducted by teachers in multilingual early childhood and primary school classes. In our participatory and action-oriented training, a second set of assumptions emerged: our long group discussions, individual interviews and observations led us to ask whether a dynamic and systemic approach of the action-research training type, as the training in the Awakening to Languages has been, may create the necessary intra-psychic and intra-group conditions for teachers to develop a reflexive attitude towards their own representations, manage their own social and professional problems in a dynamic way, and stop feeling professional isolation. The main conclusion is that before talking about an effective intercultural education, we need to modify some elements of the socio-professional and personal identity of the teachers, because the innovation of the Awakening to Languages can help teachers become aware of their own representations of linguistic and cultural diversity in the classroom, as well as of their teaching practices, and renegotiate them.
Tiryakioglu, Gulay. "EFL learners' writing processes : the relationship between linguistic knowledge, composing processes and text quality." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSE2047.
Full text
Writing is a complex process both in the first language (L1) and in a foreign or second language (L2). Research on second- and foreign-language writing processes is increasing, thanks to the existence of research tools that enable us to look more closely at what language learners actually do as they write (Hyland, 2016; Van Waes et al., 2012; Wengelin et al., 2019); research on plurilingual writing behaviour remains, however, scarce. This study looks at the relationship between knowledge of language, typing skills, writing processes (writing fluency, pauses and revisions) and the quality of texts written by 30 middle school French students (14-15 years old), during writing in their first (French) and second (English) languages. In the second study, we looked at this complex relationship in a sub-group of 15 middle school French-Turkish bilingual students (14-15 years old, residing in France) during writing in their home language (Turkish), school language (French), and English (a foreign language, also learned at school). The third study explores this complex relationship between a subgroup of 17 bilingual learners (15 Turkish-French bilinguals and 2 Arabic-French bilinguals) and 13 French monolingual learners. We used a mixed-method study design: a combination of keystroke logging, pre- and post-writing questionnaires, students' written texts and stimulated recall interviews. Our participants performed three writing tasks (a copy task, a descriptive task and a narrative task) in each language on the computer, using the keystroke-logging tool Inputlog (Leijten & Van Waes, 2013). Keystroke logging, which has developed over the past two decades, enables the empirical investigation of precise typing behaviour during writing.
Data related to writing processes were analyzed from the Inputlog data: writing fluency was measured as characters per minute, words per minute, and mean pause-bursts (text produced between two pauses of 2000 milliseconds); pausing was measured as number of pauses, pause length, and location (within and between words); and revisions were measured as numbers of deletions and additions, and revision-bursts (additions and deletions between two long pauses of 2000 milliseconds). Typing speed was measured with the Inputlog copy task tool in the three languages; we developed the Turkish copy task for our study, and it has been standardized and added to the Inputlog software. To assess text quality, a team of evaluators used both a holistic and an analytical rating scale to judge content, organization and language use in the L1, L2 and L3 texts, and this qualitative assessment was compared with the quantitative Inputlog measures. We also collected stimulated recall protocol data from a focus group of seven writers as they watched their keystroke-logged data unfold; this fascinating process enabled us to obtain information on the writers' thoughts during long pauses and revisions. Finally, we obtained background data on the participants' writing behaviour outside the classroom with a questionnaire. Analyses of the keystroke logging data reveal important differences between L1 and L2 as well as between L1, L2 and L3 writing processes, which appear to be linked to our bilingual subjects' linguistic backgrounds, and especially their contact with written Turkish (Akinci, 2016). Writing processes were more fluent in French, with longer pause-bursts and fewer pauses and revisions than writing in English and Turkish. Post-hoc comparisons of writing processes in the three project languages show that although there are significant differences between French and Turkish/English writing processes, English and Turkish writing processes are similar, with, however, significant fluency differences.
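The fluency measures this abstract defines (characters per minute, long pauses, and pause-bursts of text between two pauses of 2000 ms) can be sketched from a simplified keystroke log. The log format below is invented for illustration and is not Inputlog's actual data format.

```python
# Sketch of keystroke-log fluency measures: characters per minute,
# number of long pauses (>= 2000 ms), and mean pause-burst length
# (characters typed between two long pauses). The (timestamp_ms, char)
# log format is a simplified stand-in for real keystroke-logging data.

PAUSE_MS = 2000

def fluency_measures(log):
    """log: chronological list of (timestamp_ms, char) keystroke events."""
    total_ms = log[-1][0] - log[0][0]
    chars_per_min = len(log) / (total_ms / 60000)

    # Split the log into bursts at inter-keystroke intervals >= PAUSE_MS.
    bursts, current = [], [log[0]]
    for prev, cur in zip(log, log[1:]):
        if cur[0] - prev[0] >= PAUSE_MS:
            bursts.append(current)
            current = []
        current.append(cur)
    bursts.append(current)

    n_pauses = len(bursts) - 1
    mean_burst = sum(len(b) for b in bursts) / len(bursts)
    return chars_per_min, n_pauses, mean_burst

log = [(0, "T"), (200, "h"), (400, "e"), (3000, " "),
       (3200, "c"), (3400, "a"), (3600, "t")]
cpm, pauses, burst = fluency_measures(log)
print(pauses, burst)  # 1 3.5  -> one long pause, mean burst of 3.5 chars
```

Revision measures would additionally require deletion and insertion events, which real keystroke-logging tools record alongside the timestamps.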