Dissertations / Theses on the topic 'Induction du sens des mots'
Consult the top 25 dissertations / theses for your research on the topic 'Induction du sens des mots.'
Soriano-Morales, Edmundo-Pavel. "Hypergraphs and information fusion for term representation enrichment : applications to named entity recognition and word sense disambiguation." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSE2009/document.
Making sense of textual data is an essential requirement in order to make computers understand our language. To extract actionable information from text, we need to represent it by means of descriptors before using knowledge discovery techniques. The goal of this thesis is to shed light on heterogeneous representations of words and how to leverage them while addressing their implicit sparse nature. First, we propose a hypergraph network model that holds heterogeneous linguistic data in a single unified model. In other words, we introduce a model that represents words by means of different linguistic properties and links them together according to said properties. Our proposition differs from other types of linguistic networks in that we aim to provide a general structure that can hold several types of descriptive text features, instead of a single one as in most representations. This representation may be used to analyze the inherent properties of language from different points of view, or to be the starting point of an applied NLP task pipeline. Secondly, we employ feature fusion techniques to provide a final single enriched representation that exploits the heterogeneous nature of the model and alleviates the sparseness of each representation. These types of techniques are typically used only to combine multimedia data. In our approach, we consider different text representations as distinct sources of information which can be enriched by themselves. This approach has not been explored before, to the best of our knowledge. Thirdly, we propose an algorithm that exploits the characteristics of the network to identify and group semantically related words by exploiting the real-world properties of the networks.
In contrast with similar methods that are also based on the structure of the network, our algorithm reduces the number of required parameters and, more importantly, allows for the use of either lexical or syntactic networks to discover said groups of words, instead of the single type of features usually employed. We focus on two different natural language processing tasks: Word Sense Induction and Disambiguation (WSI/WSD), and Named Entity Recognition (NER). In total, we test our propositions on four different open-access datasets. The results obtained allow us to show the pertinence of our contributions and also give us some insights into the properties of heterogeneous features and their combinations with fusion methods. Specifically, our experiments are twofold: first, we show that using fusion-enriched heterogeneous features, coming from our proposed linguistic network, we outperform single-feature systems and other basic baselines. We note that using single fusion operators is not efficient compared to using a combination of them in order to obtain a final space representation. We show that the features added by each combined fusion operation are important for the models to predict the appropriate classes. We test the enriched representations on both WSI/WSD and NER tasks. Secondly, we address the WSI/WSD task with our proposed network-based method. While based on previous work, we improve it by obtaining better overall performance and reducing the number of parameters needed. We also discuss the use of either lexical or syntactic networks to solve the task. Finally, we parse a corpus based on the English Wikipedia and then store it following the proposed network model. The parsed Wikipedia version serves as a linguistic resource to be used by other researchers.
Contrary to other similar resources, instead of just storing each word's part-of-speech tag and its dependency relations, we also take into account the constituency-tree information of each word analyzed. The hope is for this resource to be used in future developments without the need to compile such a resource from scratch.
Mouton, Claire. "Ressources et méthodes semi-supervisées pour l’analyse sémantique de texte en français." Paris 11, 2010. http://www.theses.fr/2010PA112375.
The possibility of performing semantic rather than purely lexical search should improve information retrieval. This Ph.D. work aims at developing modules of lexical semantic analysis, with the further objective of improving the textual search engine of the company Exalead. The work presented deals more specifically with semantic analysis of the French language. Processing of the French language is more complex due to the lack of semantic resources and corpora for this language. Thus, making such an analysis possible implies, on the one hand, meeting the need for French linguistic resources and, on the other hand, finding alternate methods which do not require any manually annotated French corpus. Our thesis is divided into three main parts followed by a conclusion. The first part is composed of two chapters which define the objectives and the context of our work. The first of them introduces our thesis. It evokes some semantic issues in the field of Information Retrieval, then presents the notion of sense. Finally, it identifies two semantic analysis tasks, namely word sense disambiguation and semantic role labeling. These two tasks are the two main topics we address in our whole study. They are handled in parts 2 and 3 respectively. The second chapter draws up a state-of-the-art review of all the topics addressed in our work. The second part tackles the word sense disambiguation issue. Chapter 3 is devoted to the building of new French resources dedicated to this task. We first describe a method to automatically translate the nominal synsets of WordNet to French, by using bilingual dictionaries and distributional spaces. Secondly, we put forward an adaptation of two existing methods of word sense induction, in order to acquire a word senses resource in a fully automatic way. Moreover, the sense clusters built in the latter step show originality as they contain words whose syntax is similar to the syntax of the given ambiguous words.
These sense clusters are then used in the word sense disambiguation algorithm that we put forward in chapter 4. This chapter also provides recommendations for integrating such a module in a textual search engine. Semantic role labeling is handled in the third part. In a similar fashion, a first chapter deals with the building of resources for the French language, whereas the following chapter presents the algorithm developed for the labeling task itself. Chapter 5 thus describes the method we propose to translate and enrich FrameNet predicates, as well as the related evaluation. We propose in chapter 6 a semi-supervised approach which uses the distributional spaces to label semantic roles. We conclude this chapter with some considerations on the use of semantic roles in information retrieval and more specifically in the scope of question answering systems. The conclusion of our thesis summarizes our contributions. It emphasizes the fact that each step of our work uses syntactical distributional spaces and that it provides interesting results. This conclusion also outlines the main perspectives we see for pursuing our studies. The main and immediate concern is to integrate these semantic analysis modules into prototypes for textual document search.
Hadi, Abdine. "Leveraging Transformer-Based Language Models to Bridge the Gap Between Language and Specialized Domains." Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAX020.
The era of transformer-based language models has ushered in a new paradigm in Natural Language Processing (NLP), enabling remarkable performance across a wide range of tasks in both Natural Language Understanding (NLU) and Natural Language Generation (NLG). This dissertation delves into the transformative potential of transformer-based language models when applied to specialized domains and languages. It comprises four distinct research endeavors, each contributing to the overarching goal of enhancing language understanding and generation in specialized contexts. To address the scarcity of non-English pretrained language models in both general and specialized domains, we explore the creation of two language models, JuriBERT and GreekBART. JuriBERT is a set of French legal domain-specific BERT models tailored to French text, catering to the needs of legal professionals. JuriBERT is evaluated on two French legal tasks from the court of cassation in France. The findings underscore that certain specialized tasks can be better addressed with smaller domain-specific models compared to their larger generic counterparts. We also introduce GreekBART, the first Greek Seq2Seq model. Being based on BART, these models are particularly well-suited for generative tasks. We evaluate GreekBART's performance against other models on various discriminative tasks and assess its capabilities in NLG using two Greek generative tasks from GreekSUM, a novel dataset introduced in this research. We show GreekBART to be very competitive with state-of-the-art BERT-based multi-lingual and mono-lingual language models such as GreekBERT and XLM-R. We dive next into the domain of semantics by leveraging the transformer-based contextual embeddings to solve the challenging problem of Word Sense Induction (WSI).
We propose a novel unsupervised method that utilizes invariant information clustering (IIC) and agglomerative clustering to enrich and cluster the target word representations. Extensive evaluation on two WSI tasks and multiple pretrained language models demonstrates the competitiveness of our approach compared to state-of-the-art baselines. Finally, we introduce the Prot2Text framework, a multi-modal approach for generating proteins’ functions in free text by combining three modalities: protein structure, protein sequence and natural language. Prot2Text advances protein function prediction beyond traditional classifications, integrating Graph Neural Networks (GNNs) and Large Language Models (LLMs) in an encoder-decoder framework. Empirical evaluation on a multi-modal protein dataset showcases the effectiveness of Prot2Text, offering powerful tools for function prediction in a wide range of proteins.
Lauzière, Lucie. "Le sens ordinaire des mots comme règle d'interprétation." Thesis, University of Ottawa (Canada), 1986. http://hdl.handle.net/10393/4945.
Lopa, de Carvalho Alex. "Le rôle de la prosodie et des mots grammaticaux dans l'acquisition du sens des mots." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEE072/document.
Previous research demonstrates that having access to the syntactic structure of sentences helps children to discover the meaning of novel words. This implies that infants need to get access to aspects of syntactic structure before they know many words. Since in all the world’s languages the prosodic structure of a sentence correlates with its syntactic structure, and since function words/morphemes are useful to determine the syntactic category of words, infants might use phrasal prosody and function words to bootstrap their way into lexical and syntactic acquisition. In this thesis, I empirically investigated the role of phrasal prosody and function words in constraining syntactic analysis in young children (PART 1) and whether infants exploit this information to learn the meanings of novel words (PART 2). In part 1, I constructed minimal pairs of sentences in French and in English, testing whether children exploit the relationship between syntactic and prosodic structures to drive their interpretation of noun-verb homophones. I demonstrated that preschoolers use phrasal prosody online to constrain their syntactic analysis. When listening to French sentences such as [La petite ferme] […] – [The little farm] […], children interpreted ferme as a noun, but in sentences such as [La petite] [ferme…] – [The little girl] [closes…], they interpreted ferme as a verb (Chapter 3). This ability was also attested in English-learning preschoolers who listened to sentences such as ‘The baby flies…’: they used prosodic information to decide whether “flies” was a noun or a verb (Chapter 4). Importantly, in further studies I demonstrated that even infants around 20 months of age use phrasal prosody to recover syntactic structures and to predict the syntactic category of upcoming words (Chapter 5), an ability which would be extremely useful to discover the meaning of unknown words.
This is what I tested in part 2: whether the syntactic information obtained from phrasal prosody and function words could allow infants to constrain their acquisition of word meanings. A first series of studies relied on right-dislocated sentences containing a novel verb in French: [ilᵢ dase], [le bébéᵢ] - ‘heᵢ is dasing, the babyᵢ’ (meaning ‘the baby is dasing’), which is minimally different from the transitive sentence [il dase le bébé] (he is dasing the baby). 28-month-olds were shown to exploit prosodic information to constrain their interpretation of the novel verb meaning (Chapter 6). In a second series of studies, I investigated whether phrasal prosody and function words constrain the acquisition of nouns and verbs. I used sentences like ‘Regarde la petite bamoule’, which can be produced either as [Regarde la petite bamoule!] - Look at the little bamoule!, where ‘bamoule’ is a noun, or as [Regarde], [la petite] [bamoule!] - Look, the little (one) is bamouling, where ‘bamoule’ is a verb. 18-month-olds correctly parsed such sentences and attributed a noun or verb meaning to the critical word depending on its position within the syntactic-prosodic structure of the sentences (Chapter 7). Taken together, these studies show that infants exploit function words and the prosodic structure of an utterance to recover the sentences’ syntactic structure, which in turn constrains the possible meaning of novel words. This powerful mechanism might be extremely useful for infants to construct a first-pass syntactic structure of spoken sentences even before they know the meanings of many words. Although prosodic information and functional elements can surface differently across languages, our studies suggest that this information may represent a universal and extremely useful tool for infants to access syntactic information through a surface analysis of the speech stream, and to bootstrap their way into language acquisition.
Temple, Martine. "Le sens des mots construits : pour un traitement dérivationnel associatif." Lille 3, 1993. http://www.theses.fr/1993LIL30001.
This study is an attempt to work out a method to elaborate adequate semantic representations of constructed words. It is postulated that one of the main functions of any semantic analysis and representation is to account for the processes whereby linguistic units refer. On the basis of this principle, a case is made for the elaboration of an associative derivational model in morphology, where the analysis and description of constructed words are concerned. It can be shown that neither lexicographical paraphrasing nor lexical semantic theories can provide adequate descriptions of the specific ways in which the referential categories referred to by constructed words are determined, these being dependent on the morphological structure of words, while it appears that the associative derivational treatment offers satisfactory solutions where other treatments fail.
Richet, Bertrand. "Les fractales du sens." Habilitation à diriger des recherches, Université de Nanterre - Paris X, 2011. http://tel.archives-ouvertes.fr/tel-00661997.
Erbén, Tova. "Une étude diachronique du suffixe -ard : un examen du sens de quelques mots médiévaux." Thesis, Stockholms universitet, Romanska och klassiska institutionen, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-146838.
Wang, Ying. "Élucidation du sens des mots nouveaux en lecture du français par des étudiants chinois." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp04/mq23741.pdf.
Derycke, Marc. "Lecture(s) : comment le sens vient aux mots : de la sémiotique au champ freudien." Paris 8, 1990. http://www.theses.fr/1991PA080574.
This work studies the mechanisms by which meaning is disclosed in reading. The material comes from "errors" and mistakes observed in adults, especially during the initial stages of learning. The method is based on a semiotic description of the symbol's characteristics for combination and substitution by mathematical functions; it introduces the principle of qualitative order in the disorder of the "errors" and presupposes a renewed theory of the acts of speech and meaning. Results: 1) meaning is produced by an anticipation-retroaction process articulated by a basic fundamental syntax with two directed operations; 2) like a joke, this process is both conscious and unconscious, which is why it is masked; 3) the "error" is a fragment of this process that the principle of reality has not obliterated; 4) the syntax of symbolic operations is at the root of the cognitive functioning of the psychic apparatus; 5) readoption of the process of meaning attribution in the subject himself, using the topology of the Borromean knot to outline plans for a speech clinic in the Freudian field.
Wocke, Brendon John. "De la jouissance au "jouis-sens" : le jeu de mots dans l'oeuvre de Jacques Derrida." Perpignan, 2013. http://www.theses.fr/2013PERP1138.
The central thrust of this thesis lies in a study of wordplay in the work of Derrida, seen through the optic of textual “jouissance”. The starting point of the thesis is the realisation that it is perhaps better to consider Derrida’s writing in literary as opposed to philosophical terms. Accordingly, this thesis opens with a study of Derrida’s writing in terms of what, following Barthes, we can consider to be a “texte de jouissance,” a text which foregrounds an unexpected form of writing, breaking with tradition. The thesis undertakes an analysis of the manner in which Derrida’s wordplay evokes an esthetic of joy and of “jouissance”. This refers both to the notion of “jouis-sens” as understood by Lacan, through which we can understand (with Kristeva) the multiplication of meaning at the heart of Derrida’s expressive style, and to Derrida’s “poetic philosophy”, which recalls Nietzsche’s “gay science.” The ultimate question is that of an underlying philosophy of joy and of the pleasure of “poetic” expression which is implicitly articulated through Derrida’s wordplay. Far from being a purely idealistic philosophy, this thesis offers a vision of Derrida’s work as a philosophy concerned with the material aspects of textual production. Derrida’s style can thus be characterised as the expression of a semiotic chora.
Oubali, Ahmed. "Les avatars du sens dans la traduction française du Quichotte." Rennes 2, 1990. http://www.theses.fr/1990REN20010.
This work deals with past French translations of Cervantes's Don Quixote. Its main points are as follows. The questions: are these translations faithful to the original? What linguistic theories has the translator worked out? The hypothesis: if the French translations are correct from the syntagmatic point of view, they present flagrant gaps from the paradigmatic point of view. These gaps are caused by the contrastive translation theory, which deals only with linguistic codes. The answers: since the nature of the linguistic structures of Don Quixote is more semiotic than semantic, they need another model of translation: the hermeneutic one, which is based on the interpretation of the ideas of the text. Verification: thanks to this model and its methodology, the untranslated items of Don Quixote have been faithfully interpreted. Conclusion: formulated in this way, the present thesis is defined as a set of pragmatic strategies aiming to carry out an interpretive search in a new linguistic field unexplored in Don Quixote, called here the inter-said. Through this model we aim to found a new practice of translation.
Fargier, Raphaël. "Cerveau et sens des mots : de l’émergence à la flexibilité des représentations sémantiques dans le cerveau." Thesis, Lyon 1, 2013. http://www.theses.fr/2013LYO10050/document.
The aim of this work was to determine the role of sensorimotor regions in the development of meaning representation of novel words. In a first learning study that involved EEG recordings, analysis of brain oscillations revealed that listening to novel action words, but not novel visual words, after training led to the activation of motor regions. This activity, which was similar to what was seen during action observation, was however associated with an additional activity that seemed to reflect the recruitment of a convergence zone between language and motor brain regions mediating more underspecified motor information, rather than the motor events experienced during training. Furthermore, analysis of the ERPs revealed that category-specific effects could be observed rapidly: action words and visual words were associated with specific electro-cortical activities on fronto-central electrodes and occipito-parietal electrodes respectively. In a third EEG study, we observed that only verbal stimuli, but not tones, that were associated with action execution during training triggered activity in motor regions. Lexical items thus seem to provide a unique substrate to associate with the sensory-motor attributes of the referent. Finally, fine-grained analyses of kinematics revealed that the verbalization of an action word semantically congruent with the action (i.e. “grasp”) led to a facilitation of an object-directed grasping movement. The results obtained during this work indicate that word meaning is represented in modality-specific brain regions and in convergence zones between language and motor brain regions that mediate underspecified information. The specificity for verbal stimuli tends to indicate a pre-wired neural system for the representation of word meaning.
Finally, although semantic representations partly reflect perceptual and motor experiences associated with the acquisition of words, the present work points to a phenomenon that has always been assumed: a certain degree of abstraction in word-meaning representation.
Ji, Hyungsuk. "Étude d'un modèle computationnel pour la représentation du sens des mots par intégration des relations de contexte." Phd thesis, Grenoble INPG, 2004. http://tel.archives-ouvertes.fr/tel-00008384.
Giacobbe, Jorge. "Construction des mots et construction du sens : cognition et interaction dans l'acquisition du français par des adultes hispanophones." Paris 7, 1989. http://www.theses.fr/1989PA070073.
The three bodies of research which form the central part of this dissertation (chapters 2, 3 and 4) share the following feature: they study the formulation and testing of hypotheses which the adult learner is led to build during the acquisition of a second language in a social setting. To carry out this activity, learners have to diversify their cognitive and linguistic means. That diversification is a source of contradictions within the microsystems which form their interlanguage, and makes their productions unstable. As a consequence of this double conflict, there is a peculiar dynamics in the acquisition process, which the speaker has to control in order to carry out his interactions with native speakers. In chapter 2, an attempt is made to show that the lexicon is a hypothetical construct of the learner. Chapter 3, which bears on the construction of pronominal forms, focuses the analyses on the complementary roles of the internal contradictions within the systems under construction on the one hand, and the contradictions between the learners' productions and the productions of their native interlocutors on the other hand. In chapter 4, a learner is followed through his first three years of stay in the host country, and it is shown that the acquisition of motion verbs in a second language is likely to require rebuilding notions which are already part of the cognitive means of an adult.
Wocke, Brendon [Verfasser], and Dorothee [Akademischer Betreuer] Kimmich. "DE LA JOUISSANCE AU "JOUIS-SENS" - Le jeu de mots dans l'œvre de Jacques Derrida / Brendon Wocke ; Betreuer: Dorothee Kimmich." Tübingen : Universitätsbibliothek Tübingen, 2017. http://d-nb.info/1168011450/34.
Bernal, Savita. "De l'arbre (syntaxique) au fruit (du sens) : interactions des acquisitions lexicale et syntaxique chez l'enfant de moins de 2 ans." Paris 6, 2007. http://www.theses.fr/2006PA066494.
Pierre, Sylviane. "Le développement du vocabulaire actif dans des textes d'écoliers : éléments d'ordre didactique." Lyon 2, 1995. http://www.theses.fr/1995LYO20032.
The study of vocabulary development requires a distinction between passive and active vocabulary. Concerning the meaning of words, this difference contrasts contextualised understanding with the attribution of an independent generic meaning. This development is a complex phenomenon: it depends on didactics and on the thematic register induced by the subject of the text; it is related to the parallel development of syntactic maturity; and it is also marked by a differentiation of the production by sex. The written sentence grows in adjectival and adverbial specification and also shows an increase in nominalisation with a proportional decrease in the use of verbs. Owing to a better mastery of syntax, the density and the variety of the lexicon decrease whereas the lexical volume increases.
Apidianaki, Marianna. "Acquisition automatique de sens pour la désambiguïsation et la sélection lexicale en traduction." Phd thesis, Université Paris-Diderot - Paris VII, 2008. http://tel.archives-ouvertes.fr/tel-00322285.
We propose a sense acquisition method that establishes semantic correspondences of variable granularity between the words of two languages in a translation relation. Sense induction is performed by combining distributional and translational information extracted from a parallel bilingual corpus. Because the proposed method is both unsupervised and entirely data-driven, it is language-independent and allows the construction of sense inventories specific to the domains represented in the processed corpora.
The results of this method are exploited by a word sense disambiguation method, which assigns a sense to new instances of ambiguous words in context, and by a lexical selection method, which proposes their most appropriate translation. Finally, we propose a weighted evaluation of the disambiguation and lexical selection results, based on the inventory built by the sense acquisition method.
Mbame, Nazaire. "Relations partie-tout : aspects ontologiques, phénoménologiques et lexico-sémantiques." Clermont-Ferrand 2, 2006. http://www.theses.fr/2006CLF20009.
Dommes, Aurélie. "La compréhension d'ambiguïtés lexicales présentées dans différents contextes phrastiques et discursifs chez des adultes jeunes et âgés : effets des contraintes contextuelles introduites et de la familiarité des sens des mots ambigus." Paris 10, 2006. http://www.theses.fr/2006PA100096.
Our research studied the comprehension of sentences and utterances containing lexical ambiguities by younger and older adults, according to the constraints imposed by context and the familiarity of the meanings of the ambiguous words in the two age groups. The results indicated that older adults tend to rely more than younger ones on contextual constraints to access the appropriate meaning of the ambiguous word. In both groups, the dominant meaning of the ambiguous word seemed to be available at an early processing stage, independently of the contextual constraints. In the younger group, the activation of the dominant meaning appeared to decrease with time when this sense proved to be incompatible with the context. That decrease was attributed to the efficiency of the inhibitory mechanisms. However, the temporal pattern observed in the older group seemed to indicate that the suppression mechanisms are altered, especially when the meaning to be inhibited is the dominant sense.
Mercure, Évelyne. "Dynamique interhémisphérique dans le traitement du sens métaphorique des mots." Thèse, 2004. http://hdl.handle.net/1866/14893.
Mejía-Constaín, Beatriz. "Vieillissement et réorganisation neurofonctionnelle pour le traitement du sens métaphorique des mots." Thèse, 2010. http://hdl.handle.net/1866/4858.
Given the significant increase in life expectancy of the general population observed in recent decades, the study of alterations in cognitive functions during normal and pathological ageing is of great importance. The results reported in this thesis contribute to a better understanding of the nature of age-related changes in the processing of the metaphoric meaning of words and of the phenomenon of functional reorganization underlying these processes. After a brief literature review (chapter 1), a first article offering a general view of the problem of language processing in normal aging introduces the series of studies presented in this thesis. This article, presented in Chapter 2, points out the importance of developing specific protocols aiming to establish a link between the different hypotheses concerning cognitive changes during normal aging and those related to changes in the neurobiological substrate of language. Chapter 3 presents a behavioural study aiming to assess the availability of attentional resources for the phonological and semantic processing of words and its possible evolution with age. The findings of this study are consistent with the idea of an age-related restriction of available attentional resources for the processing of the metaphorical meaning of words. Chapter 4 presents a neuroimaging study. This study was conducted to compare patterns of brain activation of young and older participants during the processing of the metaphoric meaning of words. The results emphasize that both younger and older participants require the sharing of attentional resources while processing the metaphorical meaning of words, but show a functional reorganization in the older group. Taken together, the studies presented here support the hypothesis of an age-related restriction of available attentional resources and of an age-related functional reorganization for the processing of the metaphorical meaning of words.
The results enrich our understanding of neurocognitive aging models regarding the evolution of neurobiological bases of language.
Brosseau-Villeneuve, Bernard. "Désambiguisation de sens par modèles de contextes et son application à la Recherche d’Information." Thèse, 2010. http://hdl.handle.net/1866/5070.
Full textIt is known that the ambiguity present in natural language has a negative effect on Information Retrieval (IR) systems effectiveness. However, up to now, the efforts made to integrate Word Sense Disambiguation (WSD) techniques in IR systems have not been successful. Past studies end up with either poor or unconvincing results. Furthermore, investigations based on the addition of artificial ambiguity shows that a very high disambiguation accuracy would be needed in order to observe gains. This thesis has for objective to develop efficient and effective approaches for WSD, using co-occurrence statistics in order to build context models. Such models could then be used in order to do a word sense discrimination between a query and documents of a collection. In this two-part thesis, we will start by investigating the principle of strength of relation between a word and the words present in its context, proposing an approach to learn a function mapping word distance to count weights. This method is based on the idea that context models made from random samples of word in context should be similar. Experiments in English and Japanese shows that the strength of relation roughly follows a negative power law. The weights resulting from the experiments are then used in the construction of Naïve Bayes WSD systems. Evaluations of these systems in English with the Semeval-2007 English Lexical Sample (ELS), and then in Japanese with the Semeval-2010 Japanese WSD (JWSD) tasks shows that the systems have state-of-the-art accuracy even though they are much lighter and don't rely on linguistic tools or resources. The second part of this thesis aims to adapt the new methods to IR applications. Such applications put heavy constraints on performance and available resources. We thus propose the use of corpus-based latent context models based on Latent Dirichlet Allocation (LDA). The models are combined with the query likelihood Language Model (LM) approach for IR. 
Evaluating the systems on three collections from the Text REtrieval Conference (TREC), we observe average proportional improvement in the range of 12% in MAP and 23% in GMAP. We then observe that the gains are mostly made on hard queries, augmenting the robustness of the results. To our knowledge, these experiments are the first positive application of WSD techniques on standard IR tasks.
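As a rough illustration of the distance-weighted context models mentioned in this abstract, the following Python sketch accumulates co-occurrence counts around a target word, with each context word weighted by a negative power of its distance. It is only a sketch under stated assumptions: the function names, the window size and the exponent `alpha` are hypothetical choices for illustration, whereas the thesis learns the weighting function from data rather than fixing it by hand.

```python
from collections import defaultdict


def power_law_weight(distance, alpha=1.0):
    """Weight of a context word at `distance` tokens from the target.

    Assumption: a fixed negative power law, distance ** -alpha. The
    exponent is an illustrative value, not one reported in the thesis.
    """
    return distance ** -alpha


def context_model(tokens, target, window=5, alpha=1.0):
    """Build a distance-weighted co-occurrence model for `target`.

    For every occurrence of `target`, each word within `window`
    positions contributes power_law_weight(distance) to its count.
    """
    counts = defaultdict(float)
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += power_law_weight(abs(j - i), alpha)
    return dict(counts)


model = context_model("the bank of the river".split(), "bank")
```

With this weighting, words adjacent to the target dominate the model while distant words contribute progressively less; such weighted counts could then serve as features for a Naïve Bayes classifier of the kind the abstract describes.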
Jakubina, Laurent. "Induction de lexiques bilingues à partir de corpus comparables et parallèles." Thèse, 2017. http://hdl.handle.net/1866/20488.
Full text