Theses on the topic "Génération de langage naturel"
Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 theses for your research on the topic "Génération de langage naturel".
Next to every source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.
Browse theses on a wide variety of disciplines and organise your bibliography correctly.
Patoz, Evelyne. "Génération de représentations topologiques à partir de requêtes en langage naturel". Besançon, 2006. http://www.theses.fr/2006BESA1031.
Starting from the study of reasoning and of the visual-perception abilities that a human being uses to locate objects in space, we elaborate a theoretical model that allows a computing system to situate an object in space by means of linguistic signs. To this end, linguistic activity is studied in its constructive role in spatial representation, but another cognitive faculty proves equally essential: visual perception. Since visual perception rests largely on information produced according to an observer's knowledge of the world, its interpretation can lead to a mental representation. The notion of representation is thus linked to a reality of objects whose very existence depends on the perceptive aptitude of a particular individual. Representation is no longer examined as the construction of a fixed configuration, but relative to the perception of an environment. We show that the dynamic generation of a spatial representation depends on several parameters, the most important of which is the identification of a reference point. We develop a logical application, integrating a speech component, that lets a user direct a robot within an area and thereby obtain an account of the state of the world as the robot evaluates it.
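The reference-point parameter described in this abstract can be illustrated with a small sketch (hypothetical code, not drawn from the thesis): given the positions of an observer, a reference object, and a target, a relational expression such as "to the left of" is selected from the angle between the line of sight and the target's offset.

```python
import math

def spatial_relation(observer, reference, target):
    """Name the position of `target` relative to `reference`, as seen
    from `observer` (all points are 2D coordinates)."""
    # Viewing direction: from the observer towards the reference point.
    view = (reference[0] - observer[0], reference[1] - observer[1])
    offset = (target[0] - reference[0], target[1] - reference[1])
    # Signed angle between the viewing direction and the offset vector.
    angle = math.degrees(math.atan2(offset[1], offset[0])
                         - math.atan2(view[1], view[0]))
    angle = (angle + 180) % 360 - 180  # normalise to (-180, 180]
    if -45 <= angle <= 45:
        return "behind"            # further away along the line of sight
    if 45 < angle < 135:
        return "to the left of"
    if -135 < angle < -45:
        return "to the right of"
    return "in front of"           # between the observer and the reference

# An observer at the origin looks north towards the reference object.
print(spatial_relation((0, 0), (0, 5), (3, 5)))  # → to the right of
```

The thresholds (quadrants of 90°) are one arbitrary choice among many; the thesis's point is precisely that such relations only make sense once a reference point and viewpoint are fixed.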
Popesco, Liana. "Analyse et génération de textes à partir d'un seul ensemble de connaissances pour chaque langue naturelle et de meta-règles de structuration". Paris 6, 1986. http://www.theses.fr/1986PA066138.
Petitjean, Simon. "Génération modulaire de grammaires formelles". Thesis, Orléans, 2014. http://www.theses.fr/2014ORLE2048/document.
The work presented in this thesis aims at facilitating the development of resources for natural language processing. Resources of this type take different forms, because of the existence of several levels of linguistic description (syntax, morphology, semantics, ...) and of several formalisms proposed for the description of natural languages at each of these levels. Since the formalisms feature different types of structures, a unique description language is not enough: it is necessary to create a domain-specific language (DSL) for every formalism, and to implement a new tool which uses this language, which is a long and complex task. For this reason, we propose in this thesis a method to assemble, in a modular way, development frameworks specific to tasks of linguistic resource generation. The frameworks assembled thanks to our method are based on the fundamental concepts of the XMG (eXtensible MetaGrammar) approach, allowing the generation of tree-based grammars. The method relies on assembling a description language from reusable bricks, according to a unique specification file. The whole processing chain for the DSL is automatically assembled from the same specification. As a first step, we validated this approach by recreating the XMG tool from elementary bricks. Collaborations with linguists also led us to assemble compilers allowing the description of morphology and semantics.
Balicco, Laurence. "Génération de repliques en français dans une interface homme-machine en langue naturelle". Grenoble 2, 1993. http://www.theses.fr/1993GRE21025.
This research takes place in the context of natural language generation, a field long neglected because generation seemed a much easier phase than analysis. The thesis corresponds to a first work on generation in the CRISS team and places the problem of generation in the context of a man-machine dialogue in natural language. Some of its consequences are: generation starts from a logical content to be translated into natural language, and this translation must stay as close as possible to the original content. After studying existing work, we decided to create our own generation system, reusing where possible the tools elaborated for the analysis process. This generation process is based on a linguistic model, which uses syntactic and morphological information and in which linguistic transformations called operations are defined (coordination, anaphorisation, thematisation, ...). These operations can be given by the dialogue or computed during the generation process. The model allows the creation of several versions of the same utterance and therefore better adaptation to different users. This thesis presents the studied works, essentially on the French and English languages, the linguistic model developed, the computing model used, and a brief presentation of a European project which offers a possible application of our work.
Belec, Yves. "Des règles expertes pour une méthode applicative d'analyse ou de génération du langage naturel". Toulouse 3, 1990. http://www.theses.fr/1990TOU30136.
Striegnitz, Kristina. "Génération d'expressions anaphoriques : Raisonnement contextuel et planification de phrases". Nancy 1, 2004. http://www.theses.fr/2004NAN10186.
This thesis investigates the contextual reasoning involved in the production of anaphoric expressions in natural language generation systems. More specifically, I propose generation strategies for two types of discourse anaphora which have not been treated in generation before: bridging descriptions and additive particles. To this end, the contextual conditions that govern the use of these expressions have to be formalized. The formalization that I propose is based on notions from linguistics and extends previous approaches to the generation of co-referential anaphora. I then specify the reasoning tasks that have to be carried out in order to check the contextual conditions. I describe how they can be implemented using a state-of-the-art reasoning system for description logics, and I compare my proposal to alternative approaches using other kinds of reasoning tools. Finally, I describe an experimental implementation of the proposed approach.
Namer, Fiammetta. "Pronominalisation et effacement du sujet en génération automatique de textes en langues romanes". Paris 7, 1990. http://www.theses.fr/1990PA077249.
Hankach, Pierre. "Génération automatique de textes par satisfaction de contraintes". Paris 7, 2009. http://www.theses.fr/2009PA070027.
We address in this thesis the construction of a natural language generation system: computer software that transforms a formal representation of information into a text in natural language. In our approach, we define the generation problem as a constraint satisfaction problem (CSP). The implemented system ensures an integrated processing of generation operations, as their different dependencies are taken into account and no type of operation is given priority over the others. In order to define the constraint satisfaction problem, we represent the construction operations of a text by decision variables. Individual operations that implement the same type of minimal expressions in the text form a generation task. We classify decision variables according to the type of operations they represent (e.g. content-selection variables, document-structuring variables, ...). The linguistic rules that govern the operations are represented as constraints on the variables. A constraint can be defined over variables of the same type or of different types, capturing the dependency between the corresponding operations. Producing a text consists of solving the global system of constraints, that is, finding an assignment of the variables that satisfies all the constraints. As part of the grammar of constraints for generation, we particularly formulate the constraints that govern document-structuring operations. We model by constraints the rhetorical structure of SDRT in order to yield coherent texts as the generator's output. Beforehand, in order to increase the generation capacities of our system, we extend the rhetorical structure to cover texts in non-canonical order. Furthermore, in addition to defining these coherence constraints, we formulate a set of constraints that enables controlling the form of the macrostructure by communicative goals. Finally, we propose a solution to the problem of the computational complexity of generating large texts. This solution is based on generating a text by groups of clauses: the problem of generating a text is divided into many problems of reduced complexity, each concerned with generating one part of the text. These parts are of limited size, so the complexity associated with their generation remains reasonable. The proposed partitioning of generation is motivated by linguistic considerations.
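The constraint-satisfaction view of generation can be made concrete with a toy example (our own illustration, not Hankach's actual grammar of constraints): boolean content-selection variables and a connector variable are searched exhaustively, keeping only the assignments that satisfy coherence and structuring constraints.

```python
from itertools import product

# Toy decision variables: include each fact or not, plus a connector choice.
FACTS = ["rain", "wet_road", "accident"]
CONNECTORS = ["and", "because"]

def satisfies(selection, connector):
    chosen = [f for f, keep in zip(FACTS, selection) if keep]
    # Coherence constraint: an effect may not appear without its cause.
    if "accident" in chosen and "wet_road" not in chosen:
        return False
    # Document-structuring constraint: "because" needs a cause-effect pair.
    if connector == "because" and not ("rain" in chosen and "wet_road" in chosen):
        return False
    # Content-selection constraint: say at least two things.
    return len(chosen) >= 2

# Exhaustive search over all assignments; real CSP solvers prune instead.
solutions = [(sel, conn)
             for sel in product([True, False], repeat=len(FACTS))
             for conn in CONNECTORS
             if satisfies(sel, conn)]
for sel, conn in solutions:
    print([f for f, keep in zip(FACTS, sel) if keep], conn)
```

The key property mirrored here is the integrated processing the abstract describes: content selection and document structuring are decided jointly, with no ordering imposed between the two types of operation.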
Membrado, Miguel. "Génération d'un système conceptuel écrit en langage de type semi-naturel en vue d'un traitment des données textuelles : application au langage médical". Paris 11, 1989. http://www.theses.fr/1989PA112004.
We present our research and our implementation of a KBMS (Knowledge-Based Management System) aiming at processing any kind of data, especially textual data, and the related knowledge. In this field of applied artificial intelligence, we propose a way of representing knowledge: describing it in a semi-natural language able to express structures and relations as well as rules. Knowledge is managed as conceptual definitions stored in a dictionary which represents the knowledge base. The power of this language makes it possible to process many ambiguities, especially those arising from contextual polysemy, to deal with metonymy or incomplete knowledge, and to resolve several kinds of paraphrases. Simultaneous polyhierarchies as well as chunks are taken into account. The system has been specially designed for the automatic processing of medical reports, with an application to neuroradiology taken as an example, but it could equally be applied to any other professional field, including fields outside medicine. Text analysis is carried out in two steps: first a conceptual extraction, then a structural analysis. Only the first step is covered in this thesis. It aims at retrieving relevant documents, matching them to a given question by comparing concepts rather than character strings. An overview of the second step is presented. The final goal is to be able to retrieve the knowledge contained in the texts, i.e. the data themselves, and to manage it with respect to the knowledge represented in the dictionaries.
Popescu, Vladimir. "Formalisation des contraintes pragmatiques pour la génération des énoncés en dialogue homme-machine multi-locuteurs". PhD thesis, Grenoble INPG, 2008. http://www.theses.fr/2008INPG0175.
We have developed a framework for controlling utterance generation in multi-party human-computer dialogue. The process takes place in four stages: (i) the rhetorical structure of the dialogue is computed, using an emulation of SDRT ("Segmented Discourse Representation Theory"); (ii) this structure is used for computing speakers' commitments, which drive the process of adjusting the degree of illocutionary force of the utterances; (iii) the commitments are filtered and placed in a stack for each speaker, and these stacks are used for performing semantic ellipses; (iv) the discourse structure drives the choice of concessive connectors (mais, quand même, pourtant and bien que) between utterances; to do this, the utterances are ordered from an argumentative viewpoint.
Popescu, Vladimir. "Formalisation des contraintes pragmatiques pour la génération des énoncés en dialogue homme-machine multi-locuteurs". PhD thesis, Grenoble INPG, 2008. http://tel.archives-ouvertes.fr/tel-00343846.
Ponton, Claude. "Génération automatique de textes en langue naturelle : essai de définition d'un système noyau". Grenoble 3, 1996. http://www.theses.fr/1996GRE39030.
A feature common to many generation systems is a strong dependence on the application. Although a few attempts at defining "non-dedicated" systems have been made, none of them makes it possible to take into account the characteristics of the application (such as its formalism) and the communication context (application field, user, ...). The purpose of this thesis is the definition of a generation system that is both non-dedicated and able to take these elements into account. Such a system is called a "kernel generation system". With this in mind, we studied 94 generation systems through objective, relevant criteria; this study serves as a basis for the rest of our work. The definition of a kernel generator requires determining the frontier between the application and the kernel generator (generator tasks, inputs, outputs, data, ...): it is necessary to understand the role of both parts and the ways they communicate before designing the kernel generator. It follows from this study that our generator accepts as input any formal content representation, together with a set of constraints describing the communication context. The kernel generator then handles what is generally called the "how to say it?" question and is able to produce every solution compatible with the input constraints. This definition work is followed by the implementation of a first generator prototype, which has been tested on two applications that differ in every respect (formalism, field, type of texts, ...). Finally, this work opens onto several perspectives for the evolution of the generator, particularly concerning the knowledge representation formalism (cotopies d'objets) and the architecture (distributed architecture).
Pouchot, Stéphanie. "L'analyse de corpus et la génération automatique de texte : méthodes et usages". Grenoble 3, 2003. http://www.theses.fr/2003GRE39006.
Raynaud, Tanguy. "Génération de questions à choix multiples thématiques à partir de bases de connaissances". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSES066.
Using multiple-choice questions to assess knowledge is a reliable and widely used method, even in official contexts. It offers many advantages, including equal marking across candidates and, more pragmatically, the possibility of automatic correction. With the emergence of MOOCs (courses delivered in a digital format), the need for automatic evaluation has increased. This thesis is part of that context, proposing a solution that enables automatic thematic question generation. The work presented here uses knowledge bases as data sources to automatically generate thematic multiple-choice questions. Using knowledge bases in this context raises several scientific challenges, which constitute the contributions of the presented work: knowledge-base entities are generally not explicitly correlated with themes, so this thesis presents a method based on Wikipedia metadata to identify and sort knowledge-base entities according to predefined themes; to be intelligible, a question must be grammatically correct and must include enough information to remove any ambiguity about the correct answer, so we introduce question templates to identify entities within knowledge bases and generate natural language statements; and, in a multiple-choice question, the distractors (wrong answers) are no less important than the statement, since implausible distractors are easily discarded and affect the difficulty of the whole question, so a last contribution presents the method used to select distractors that are relevant not only to the question's statement but also to its context.
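The template-plus-distractor pipeline described in this abstract can be sketched as follows (a minimal illustration over an invented knowledge base; the thesis works over real knowledge bases and Wikipedia metadata):

```python
import random

# Toy knowledge base of typed entities -- hypothetical data.
KB = {
    "France":  {"type": "country", "capital": "Paris"},
    "Italy":   {"type": "country", "capital": "Rome"},
    "Spain":   {"type": "country", "capital": "Madrid"},
    "Germany": {"type": "country", "capital": "Berlin"},
}

def make_mcq(entity, n_distractors=3, seed=0):
    """Fill a question template and draw distractors of the same entity
    type, so that wrong answers stay plausible (homogeneous with the key)."""
    stem = f"What is the capital of {entity}?"
    answer = KB[entity]["capital"]
    pool = [v["capital"] for k, v in KB.items()
            if k != entity and v["type"] == KB[entity]["type"]]
    rng = random.Random(seed)  # seeded for reproducibility
    distractors = rng.sample(pool, n_distractors)
    return stem, answer, distractors

stem, answer, distractors = make_mcq("France")
print(stem, answer, distractors)
```

Restricting the distractor pool to entities of the same type is a crude stand-in for the context-aware distractor selection that the thesis actually develops.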
Pho, Van-Minh. "Génération automatique de questionnaires à choix multiples pédagogiques : évaluation de l'homogénéité des options". Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112192/document.
Recent years have seen a revival of Intelligent Tutoring Systems. For these systems to be widely usable by teachers and learners, they have to provide means of assisting teachers in their task of exercise generation. Among these exercises, multiple-choice tests are very common. However, writing Multiple-Choice Questions (MCQs) that correctly assess a learner's level is a complex task. Guidelines have been developed for writing MCQs manually, but an automatic evaluation of MCQ quality would be a useful tool for teachers. We are interested in the automatic evaluation of distractor (wrong answer choice) quality. To do this, we studied the characteristics of relevant distractors in multiple-choice test writing guidelines. This study led us to assume that homogeneity between distractors and the answer is an important criterion for validating distractors. Homogeneity is both syntactic and semantic. We validated this definition of homogeneity by analysing an MCQ corpus, and we proposed methods for the automatic recognition of syntactic and semantic homogeneity based on this analysis. We then focused our work on the semantic homogeneity of distractors. To estimate it automatically, we proposed a machine-learned ranking model combining different semantic homogeneity measures. The evaluation of the model showed that our method outperforms existing work at estimating the semantic homogeneity of distractors.
Boulanger, Hugo. "Data augmentation and generation for natural language processing". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG019.
More and more fields are looking to automate part of their processes. Natural language processing provides methods for extracting information from texts, and these methods can use machine learning. Machine learning requires annotated data to perform information extraction, so applying these methods to new domains requires obtaining annotated data related to the task. In this thesis, our goal is to study generation methods for improving the performance of learned models when little data is available. Different generation methods, with and without machine learning, are explored and used to produce the data needed to train sequence-labeling models. The first method explored is pattern filling. This data generation method produces annotated data by combining sentences with slots, or patterns, with mentions. We have shown that this method improves the performance of labeling models with tiny amounts of data, and we also study the amount of data needed to use it. The second approach tested is the use of language models for text generation alongside a semi-supervised learning method for tagging. The semi-supervised learning method used is tri-training, which serves to add labels to the generated data. Tri-training is tested on several generation methods using different pre-trained language models. We proposed a version of tri-training called generative tri-training, where the generation is not done in advance but during the tri-training process, taking advantage of it. We test both the performance of the models trained during the semi-supervision process and that of models trained on the data generated by it. In most cases, the data produced match the performance of the models trained with semi-supervision. This method improves performance at all the tested data levels with respect to models without augmentation. The third avenue of study combines aspects of the previous approaches, and several variants are tested. Using language models to do sentence replacement in the manner of the pattern-filling generation method is unsuccessful. Using a data set combining the different generation methods is tested, and does not outperform the best single method. Finally, applying the pattern-filling method to the data generated with tri-training is tested, and does not improve on the results obtained with tri-training alone. While much remains to be studied, we have highlighted both simple methods, such as pattern filling, and more complex ones, such as semi-supervised learning with sentences generated by a language model, for improving the performance of labeling models through the generation of annotated data.
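Pattern filling as described in this abstract can be sketched in a few lines (hypothetical patterns and mentions; the thesis works with real slot-filling corpora): each slotted pattern is combined with each mention to yield token sequences with BIO labels for sequence labeling.

```python
# Slotted patterns and annotated mentions -- invented examples.
PATTERNS = ["book a flight to [CITY]", "what is the weather in [CITY]"]
MENTIONS = {"[CITY]": ["Paris", "New York"]}

def fill(pattern):
    """Expand one pattern into (tokens, BIO-labels) training examples."""
    for city in MENTIONS["[CITY]"]:
        tokens, labels = [], []
        for word in pattern.split():
            if word == "[CITY]":
                parts = city.split()
                tokens += parts
                # First mention token is B-CITY, the rest are I-CITY.
                labels += ["B-CITY"] + ["I-CITY"] * (len(parts) - 1)
            else:
                tokens.append(word)
                labels.append("O")
        yield tokens, labels

data = [example for pattern in PATTERNS for example in fill(pattern)]
print(len(data), data[0])
```

The Cartesian product of patterns and mentions is what makes the method productive from tiny seed data: two patterns and two mentions already yield four annotated sequences.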
Boutouhami, Sara. "Un système de générations de descriptions argumentées". Paris 13, 2010. http://www.theses.fr/2010PA132014.
In this thesis, we investigate the expression of arguments in natural language (NL). Our work has two motivations: a theoretical one, to understand and simulate the kind of reasoning that underlies the argumentative process and to clarify the intuition that distinguishes good arguments from bad ones; and a practical one, eventually to assist in the writing of well-argued textual descriptions. The objective of this thesis is the realization of a system that can generate a description arguing in favour of one of the protagonists of an accident. In this work, the reasoning and language components cooperate in various ways within the same architecture. The idea is to take advantage of advances in artificial intelligence in the formalization of reasoning to reproduce a basic form of argument used by people every day, one which draws much of its force from the flexibility and subjectivity of NL. For knowledge representation and reasoning, we defined a reified first-order language which takes into account certain useful terms, temporal information, and non-monotonic inferences expressed using a fragment of Reiter's default logic. For the implementation, we used the Answer Set Programming paradigm, translating our inference rules into extended logic programs. Finally, to validate the quality of the descriptions generated by our system, we ran a psychological experiment with the help of specialists in cognitive psychology. The results of this experiment are encouraging and confirm the overall relevance of the argumentative strategies that we simulated.
Bourcier, Frédéric. "Représentation des connaissances pour la résolution de problèmes et la génération d'explications en langue naturelle : contribution au projet AIDE". Compiègne, 1996. http://www.theses.fr/1996COMPD903.
Shimorina, Anastasia. "Natural Language Generation : From Data Creation to Evaluation via Modelling". Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0080.
Natural language generation is the process of generating natural language text from some input. This input can be texts, documents, images, tables, knowledge graphs, databases, dialogue acts, meaning representations, etc. Recent methods in natural language generation, mostly based on neural modelling, have yielded significant improvements in the field. Despite this recent success, numerous issues with generation prevail, such as faithfulness to the source, multilingual modelling, and few-shot generation. This thesis explores several facets of natural language generation, from creating training datasets and developing models to evaluating proposed methods and model outputs. In this thesis, we address the issue of multilinguality and propose possible strategies to semi-automatically translate corpora for data-to-text generation. We show that named entities constitute a major stumbling block in translation, exemplified by the English-Russian translation pair. We proceed to handle rare entities in data-to-text modelling, exploring two mechanisms: copying and delexicalisation. We demonstrate that rare entities strongly impact performance and that the impact of these two mechanisms varies greatly depending on how datasets are constructed. Returning to multilinguality, we also develop a modular approach for shallow surface realisation in several languages. Our approach splits the surface realisation task into three submodules: word ordering, morphological inflection, and contraction generation. We show, via delexicalisation, that the word ordering component mainly depends on syntactic information. Alongside the modelling, we also propose a framework for error analysis, focused on word order, for the shallow surface realisation task. The framework makes it possible to provide linguistic insights into model performance at the sentence level and to identify patterns where models underperform. Finally, we also touch upon evaluation design, assessing automatic and human metrics and highlighting the difference between sentence-level and system-level evaluation.
Pilana, Liyanage Vijini. "Detection of automatically generated academic Content". Electronic Thesis or Diss., Paris 13, 2024. http://www.theses.fr/2024PA131014.
In this thesis, we have focused on identifying technologies and methodologies for detecting artificially generated academic content. The principal contributions of this thesis are threefold. First, we built several corpora composed of machine-generated academic text, using several recent NLG models for the generation task. These corpora contain content that is fully generated as well as content composed in a hybrid manner (with human intervention). Then, we employed several statistical as well as deep learning models for separating generated content from original (human-written) content. In this scenario, we treated detection as a binary classification task, and several SOTA classification models were employed. The models were improved or modified using ensembling techniques to reach higher detection accuracy. Moreover, we evaluated several recent detection tools to assess their ability to distinguish machine-generated text. Finally, the generated corpora were tested against knowledge bases to find mismatches that could help improve the detection task. The results of this thesis underline the importance of mimicking human behavior when leveraging generation models, as well as of using realistic and challenging corpora, in future research aimed at detecting artificially generated text. Finally, we would like to highlight that no matter how advanced the technology is, it remains crucial to attend to the ethical aspects of its use.
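Framing detection as binary classification, as this abstract does, can be illustrated with a tiny Naive Bayes sketch (invented toy sentences; the thesis uses statistical and deep learning classifiers on real corpora):

```python
import math
from collections import Counter

# Tiny labelled corpus -- invented sentences standing in for real data.
HUMAN = ["honestly the method sort of worked on our messy data",
         "we tried a few hacks before the deadline and got lucky"]
MACHINE = ["the proposed method achieves significant improvements",
           "experimental results demonstrate the effectiveness of the proposed approach"]

VOCAB = {w for text in HUMAN + MACHINE for w in text.split()}

def log_probs(texts):
    """Laplace-smoothed log unigram probabilities over the shared vocabulary."""
    counts = Counter(w for text in texts for w in text.split())
    total = sum(counts.values())
    return {w: math.log((counts[w] + 1) / (total + len(VOCAB))) for w in VOCAB}

H, M = log_probs(HUMAN), log_probs(MACHINE)

def classify(text):
    """Binary decision: which class gives the sentence higher likelihood?"""
    words = [w for w in text.split() if w in VOCAB]
    return ("machine" if sum(M[w] for w in words) > sum(H[w] for w in words)
            else "human")

print(classify("the proposed approach demonstrate significant results"))
```

A unigram model is far too weak for real machine-generated text, but it shows the shape of the task: two class-conditional models and a likelihood-ratio decision.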
Perez, Laura Haide. "Génération automatique de phrases pour l'apprentissage des langues". Electronic Thesis or Diss., Université de Lorraine, 2013. http://www.theses.fr/2013LORR0062.
In this work, we explore how Natural Language Generation (NLG) techniques can be used to address the task of (semi-)automatically generating language learning material and activities in Computer-Assisted Language Learning (CALL). In particular, we show how a grammar-based Surface Realiser (SR) can be usefully exploited for the automatic creation of grammar exercises. Our surface realiser uses a wide-coverage reversible grammar, namely SemTAG, a Feature-Based Tree Adjoining Grammar (FB-TAG) equipped with a unification-based compositional semantics. More precisely, the FB-TAG grammar integrates a flat and underspecified representation of First Order Logic (FOL) formulae. In the first part of the thesis, we study the task of surface realisation from flat semantic formulae and propose an optimised FB-TAG-based realisation algorithm that supports the generation of longer sentences given a large-scale grammar and lexicon. The approach followed to optimise TAG-based surface realisation from flat semantics draws on the fact that an FB-TAG can be translated into a Feature-Based Regular Tree Grammar (FB-RTG) describing its derivation trees. The derivation tree language of TAG constitutes a simpler language than the derived tree language, and thus generation approaches based on derivation trees have already been proposed. Our approach departs from previous ones in that our FB-RTG encoding accounts for the feature structures present in the original FB-TAG, which has important consequences regarding over-generation and the preservation of the syntax-semantics interface. The concrete derivation tree generation algorithm that we propose is an Earley-style algorithm integrating a set of well-known optimisation techniques: tabulation, sharing-packing, and semantic-based indexing. In the second part of the thesis, we explore how our SemTAG-based surface realiser can be put to work for the (semi-)automatic generation of grammar exercises.
Usually, teachers manually edit exercises and their solutions, and classify them according to their degree of difficulty or expected learner level. A strand of research in Natural Language Processing (NLP) for CALL addresses the (semi-)automatic generation of exercises. Mostly, this work draws on texts extracted from the Web and uses machine learning and text analysis techniques (e.g. parsing, POS tagging, etc.). These approaches expose the learner to sentences that have a potentially complex syntax and diverse vocabulary. In contrast, the approach we propose in this thesis addresses the (semi-)automatic generation of grammar exercises of the type found in grammar textbooks, that is, exercises whose syntax and vocabulary are tailored to specific pedagogical goals and topics. Because the grammar-based generation approach associates natural language sentences with a rich linguistic description, it permits defining a specification language of syntactic and morpho-syntactic constraints for the selection of stem sentences in compliance with a given pedagogical goal. Further, it allows for the post-processing of the generated stem sentences to build grammar exercise items. We show how fill-in-the-blank, shuffle, and reformulation grammar exercises can be automatically produced. The approach has been integrated into the Interactive French Learning Game (I-FLEG) serious game for learning French and has been evaluated both on the basis of interactions with online players and in collaboration with a language teacher.
Fan, Huihui. "Text Generation with and without Retrieval". Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0164.
Every day we write --- from sending your mother a quick text to drafting a scientific article such as this thesis. The writing we do often goes hand-in-hand with automated assistance. For example, modern instant messaging software often suggests what word to write next, emails can be started with an autocomposer, and essays are improved with machine-suggested edits. These technologies are powered by years of research on text generation, a natural language processing field with the goal of automatically producing fluent, human-readable natural language. At a small scale, text generation systems can generate individual words or sentences, but they have wide-reaching applications beyond that. For instance, systems for summarization, dialogue, and even the writing of entire Wikipedia articles are grounded in foundational text generation technology. Producing fluent, accurate, and useful natural language faces numerous challenges. Recent advances in text generation, principally based on training neural network architectures on large datasets, have significantly improved the surface-level readability of machine-generated text. However, current systems necessitate improvement along numerous axes, including generation beyond English and the writing of increasingly longer texts. While the field has seen rapid progress, much research focus has been directed towards the English language, where large-scale training and evaluation datasets for various tasks are readily available. Nevertheless, applications from autocorrect to autocomposition of text should be available universally. After all, by population, the majority of the world does not write in English.
In this work, we create text generation systems for various tasks with the capability of incorporating languages beyond English, either as algorithms that easily extend to new languages or multilingual models encompassing up to 20 languages in one model. Beyond our work in multilingual text generation, we focus on a critical piece of generation systems: knowledge. A prerequisite to writing well is knowing what to write. This concept of knowledge is incredibly important in text generation systems. For example, automatically writing an entire Wikipedia article requires extensive research on that article topic. The instinct to research is often intuitive --- decades ago people would have gone to a library, replaced now by the information available on the World Wide Web. However, for automated systems, the question is not only what knowledge to use to generate text, but also how to retrieve that knowledge and best utilize it to achieve the intended communication goal. We face the challenge of retrieval-based text generation. We present several techniques for identifying relevant knowledge at different scales: from local knowledge available in a paragraph to sifting through Wikipedia, and finally identifying the needle-in-the-haystack on the scale of the full web. We describe neural network architectures that can perform large-scale retrieval efficiently, utilizing pre-computation and caching mechanisms. Beyond how to retrieve knowledge, we further investigate the form the knowledge should take --- from natural language such as Wikipedia articles or text on the web to structured inputs in the form of knowledge graphs. Finally, we utilize these architectures in novel, much more challenging tasks that push the boundaries of where text generation models work well today: tasks that necessitate knowledge but also require models to produce long, structured natural language output, such as answering complex questions or writing full Wikipedia articles
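The retrieval component described above relies on precomputing document representations and caching so that query-time scoring stays cheap. A minimal sketch of that idea, with an invented toy corpus and bag-of-words count vectors standing in for learned neural embeddings:

```python
from functools import lru_cache

# Hypothetical toy index: document "embeddings" are bag-of-words count
# vectors, precomputed once so that retrieval is a cheap dot product.
DOCS = [
    "the eiffel tower is in paris",
    "the great wall is in china",
    "paris is the capital of france",
]
VOCAB = sorted({w for d in DOCS for w in d.split()})

def embed(text):
    words = text.split()
    return tuple(words.count(w) for w in VOCAB)

DOC_VECS = [embed(d) for d in DOCS]  # precomputation step

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

@lru_cache(maxsize=1024)  # caching: repeated queries skip the scoring loop
def retrieve(query, k=2):
    q = embed(query)
    scored = sorted(range(len(DOCS)), key=lambda i: -dot(q, DOC_VECS[i]))
    return tuple(DOCS[i] for i in scored[:k])

print(retrieve("where is paris"))
```

Real systems replace the count vectors with dense neural encodings and the linear scan with approximate nearest-neighbour search, but the precompute-then-cache structure is the same.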
Baez Miranda, Belen. "Génération de récits à partir de données ambiantes". Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAM049/document.
Texto completoStories are a communication tool that allows people to make sense of the world around them. They represent a platform for understanding and sharing their culture, knowledge and identity. Stories carry a series of real or imaginary events, provoking a feeling or a reaction, or even triggering an action. For this reason, they have become a subject of interest for different fields beyond Literature (Education, Marketing, Psychology, etc.) that seek to achieve a particular goal through them (persuade, reflect, learn, etc.). However, stories remain underexplored in Computer Science. There are works that focus on their analysis and automatic production, but those algorithms and implementations remain constrained to imitating the creative process behind literary texts from textual sources. Thus, there are no approaches that automatically produce stories in which 1) the source consists of raw material captured from real life and 2) the content projects a perspective that seeks to convey a particular message. Working with raw data becomes relevant today as it increases exponentially each day through the use of connected devices. Given this Big Data context, we present an approach to automatically generate stories from ambient data. The objective of this work is to bring out the lived experience of a person from the data produced during a human activity. Any area that uses such raw data could benefit from this work, for example, Education or Health. It is an interdisciplinary effort that includes Natural Language Processing, Narratology, Cognitive Science and Human-Computer Interaction. This approach is based on corpora and models and includes the formalization of what we call the activity récit as well as an adapted generation approach. It consists of 4 stages: the formalization of the activity récit, corpus constitution, construction of models of the activity and the récit, and the generation of text. 
Each stage has been designed to overcome constraints related to the scientific questions raised by the nature of the objective: manipulation of uncertain and incomplete data, valid abstraction according to the activity, and construction of models from which it is possible to transpose the reality collected through the data into a subjective perspective rendered in natural language. We used the activity récit as a case study, since practitioners of such activities use connected devices and need to share their experience. The results obtained are encouraging and provide leads that open up many prospects for research
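The four-stage pipeline described in this abstract (formalization, corpus, activity/récit models, text generation) can be illustrated with a toy example; the records, thresholds and templates below are invented stand-ins, not material from the thesis:

```python
# A minimal sketch: raw ambient data is abstracted into qualitative events,
# then rendered as a first-person narrative. All values are illustrative.

RAW = [  # ambient data samples: (minute, heart_rate, altitude_m)
    (0, 90, 1200), (30, 150, 1450), (60, 155, 1700), (90, 100, 1700),
]

def abstract_events(records):
    """Turn uncertain raw samples into qualitative activity events."""
    events = []
    for (t0, hr0, alt0), (t1, hr1, alt1) in zip(records, records[1:]):
        if alt1 - alt0 > 100 and hr1 > 140:
            events.append(("hard_climb", t0, t1))
        elif abs(alt1 - alt0) <= 50 and hr1 < 120:
            events.append(("rest", t0, t1))
    return events

TEMPLATES = {  # surface realisation with simple templates
    "hard_climb": "From minute {0} to {1}, I pushed through a hard climb.",
    "rest": "From minute {0} to {1}, I took a rest.",
}

def narrate(records):
    return " ".join(TEMPLATES[e].format(t0, t1)
                    for e, t0, t1 in abstract_events(records))

print(narrate(RAW))
```

The thesis's corpus-derived models would replace both the hand-written event rules and the templates.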
Perez, Laura Haide. "Génération automatique de phrases pour l'apprentissage des langues". Thesis, Université de Lorraine, 2013. http://www.theses.fr/2013LORR0062/document.
Texto completoIn this work, we explore how Natural Language Generation (NLG) techniques can be used to address the task of (semi-)automatically generating language learning material and activities in Computer-Assisted Language Learning (CALL). In particular, we show how a grammar-based Surface Realiser (SR) can be usefully exploited for the automatic creation of grammar exercises. Our surface realiser uses a wide-coverage reversible grammar, namely SemTAG, which is a Feature-Based Tree Adjoining Grammar (FB-TAG) equipped with a unification-based compositional semantics. More precisely, the FB-TAG grammar integrates a flat and underspecified representation of First Order Logic (FOL) formulae. In the first part of the thesis, we study the task of surface realisation from flat semantic formulae and we propose an optimised FB-TAG-based realisation algorithm that supports the generation of longer sentences given a large-scale grammar and lexicon. The approach followed to optimise TAG-based surface realisation from flat semantics draws on the fact that an FB-TAG can be translated into a Feature-Based Regular Tree Grammar (FB-RTG) describing its derivation trees. The derivation tree language of TAG constitutes a simpler language than the derived tree language, and thus generation approaches based on derivation trees have already been proposed. Our approach departs from previous ones in that our FB-RTG encoding accounts for the feature structures present in the original FB-TAG, thus having important consequences regarding over-generation and the preservation of the syntax-semantics interface. The concrete derivation tree generation algorithm that we propose is an Earley-style algorithm integrating a set of well-known optimisation techniques: tabulation, sharing-packing, and semantic-based indexing. In the second part of the thesis, we explore how our SemTAG-based surface realiser can be put to work for the (semi-)automatic generation of grammar exercises. 
Usually, teachers manually edit exercises and their solutions, and classify them according to the degree of difficulty or expected learner level. A strand of research in Natural Language Processing (NLP) for CALL addresses the (semi-)automatic generation of exercises. Mostly, this work draws on texts extracted from the Web, using machine learning and text analysis techniques (e.g. parsing, POS tagging, etc.). These approaches expose the learner to sentences that have a potentially complex syntax and diverse vocabulary. In contrast, the approach we propose in this thesis addresses the (semi-)automatic generation of grammar exercises of the type found in grammar textbooks. In other words, it deals with the generation of exercises whose syntax and vocabulary are tailored to specific pedagogical goals and topics. Because the grammar-based generation approach associates natural language sentences with a rich linguistic description, it permits defining a specification language for syntactic and morpho-syntactic constraints to select stem sentences in compliance with a given pedagogical goal. Further, it allows for the post-processing of the generated stem sentences to build grammar exercise items. We show how Fill-in-the-blank, Shuffle and Reformulation grammar exercises can be automatically produced. The approach has been integrated into the Interactive French Learning Game (I-FLEG), a serious game for learning French, and has been evaluated both through interactions with online players and in collaboration with a language teacher
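Building Fill-in-the-blank and Shuffle items becomes mechanical once the stem sentence carries linguistic annotations. A hedged sketch, with an invented POS-annotated stem standing in for the rich description a grammar-based realiser would supply:

```python
import random

# Sketch: the annotation scheme (word, POS) is invented for illustration.
STEM = [("Le", "DET"), ("chat", "NOUN"), ("dort", "VERB"), ("ici", "ADV")]

def fill_in_the_blank(stem, target_pos="VERB"):
    """Blank out every word with the targeted POS; return item and solution."""
    words = [w if pos != target_pos else "____" for w, pos in stem]
    solution = [w for w, pos in stem if pos == target_pos]
    return " ".join(words), solution

def shuffle_item(stem, seed=0):
    """Return the scrambled word list and the original sentence as solution."""
    words = [w for w, _ in stem]
    shuffled = words[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled, " ".join(words)

exercise, answer = fill_in_the_blank(STEM)
print(exercise)  # Le chat ____ ici
print(answer)    # ['dort']
```

The constraint-specification language described above would replace the hard-coded `target_pos` with a full pedagogical-goal query over the grammar's derivations.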
Hadjadj, Mohammed. "Modélisation de la Langue des Signes Française : Proposition d’un système à compositionalité sémantique". Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS560/document.
Texto completoThe recognition of French Sign Language (LSF) as a natural language in 2005 created an important need for the development of tools to make information accessible to the deaf public. With this prospect, this thesis aims at the linguistic modeling needed for an LSF generation system. We first present the different linguistic approaches aimed at describing sign languages (SL). We then present the models proposed in computer science. In a second step, we propose an approach that takes into account the linguistic properties of SL while respecting the constraints of a formalisation process. By studying the links between semantic functions and their observed forms in LSF corpora, we have identified several production rules. We finally present how these rules function as a system capable of modeling an entire utterance in LSF
Cripwell, Liam. "Controllable and Document-Level Text Simplification". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0186.
Texto completoText simplification is a task that involves rewriting a text to make it easier to read and understand for a wider audience, while still expressing the same core meaning. This has potential benefits for disadvantaged end-users (e.g. non-native speakers, children, the reading impaired), while also showing promise as a preprocessing step for downstream NLP tasks. Recent advancements in neural generative models have led to the development of systems that are capable of producing highly fluent outputs. However, these end-to-end systems often rely on training corpora to implicitly learn how to perform the necessary rewrite operations. In the case of simplification, these datasets are lacking in both quantity and quality, with most corpora either being very small, automatically constructed, or subject to strict licensing agreements. As a result, many systems tend to be overly conservative, often making no changes to the original text or being limited to the paraphrasing of short word sequences without substantial structural modifications. Furthermore, most existing work on text simplification is limited to sentence-level inputs, with attempts to iteratively apply these approaches to document-level simplification failing to coherently preserve the discourse structure of the document. This is problematic, as most real-world applications of text simplification concern document-level texts. In this thesis, we investigate strategies for mitigating the conservativity of simplification systems while promoting a more diverse range of transformation types. This involves the creation of new datasets containing instances of under-represented operations and the implementation of controllable systems capable of being tailored towards specific transformations and simplicity levels. 
We later extend these strategies to document-level simplification, proposing systems that are able to consider surrounding document context and use similar controllability techniques to plan which sentence-level operations to perform ahead of time, allowing for both high performance and scalability. Finally, we analyze current evaluation processes and propose new strategies that can be used to better evaluate both controllable and document-level simplification systems
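The plan-then-execute strategy for document-level simplification can be sketched as follows; the length-based planner and the crude executors below are invented placeholders for the learned models the thesis describes:

```python
# Sketch: a planner assigns each sentence one operation label, then an
# executor applies it. All heuristics and thresholds are illustrative.

def plan(sentences, max_len=8):
    """Assign one operation per sentence ahead of time (the 'plan')."""
    ops = []
    for s in sentences:
        n = len(s.split())
        if n > 2 * max_len:
            ops.append("split")
        elif n > max_len:
            ops.append("rephrase")
        else:
            ops.append("copy")
    return ops

def execute(sentence, op):
    """Apply the planned operation (toy stand-ins for learned rewriters)."""
    words = sentence.split()
    if op == "split":
        mid = len(words) // 2
        return " ".join(words[:mid]) + ". " + " ".join(words[mid:])
    if op == "rephrase":
        return " ".join(words[: len(words) // 2 + 1])  # crude truncation
    return sentence

doc = ["The cat sat.", "a b c d e f g h i j k l m n o p q r"]
print(plan(doc))  # -> ['copy', 'split']
```

Separating planning from execution is what lets the document-level system reason over context before committing to sentence-level rewrites.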
Mickus, Timothee. "On the Status of Word Embeddings as Implementations of the Distributional Hypothesis". Electronic Thesis or Diss., Université de Lorraine, 2022. http://www.theses.fr/2022LORR0066.
Texto completoThis dissertation studies the status of word embeddings, i.e., vectors produced by NLP systems, insofar as they are relevant to linguistic studies. We more specifically focus on the relation between word embeddings and distributional semantics, the field of study based on the assumption that context correlates with meaning. We question whether word embeddings can be seen as a practical implementation of distributional semantics. Our first approach to this inquiry consists in comparing word embeddings to another representation of meaning, namely dictionary definitions. The assumption underlying this approach is that semantic representations from distinct formalisms should be equivalent, and therefore the information encoded in distributional semantics representations should be equivalent to that of definitions. We test this assumption using two distinct experimental protocols: the first is based on overall metric space similarity, the second relies on neural networks. In both cases, we find limited success, suggesting either that distributional semantics and dictionaries encode different information, or that word embeddings are not linguistically coherent representations of distributional semantics. The second angle we adopt to study the relation between word embeddings and distributional semantics consists in formalizing our expectations for distributional semantics representations and comparing these expectations to what we observe for word embeddings. We construct a dataset of human judgments on the distributional hypothesis, which we use to elicit predictions on distributional substitutability from word embeddings. While word embeddings attain some degree of performance on this task, their behavior and that of our human annotators are found to differ drastically. 
Strengthening these results, we observe that a large family of broadly successful embedding models all exhibit artifacts imputable to the neural network architecture they use, rather than to any semantically meaningful factor. Our experiments suggest that, while we can formally delineate criteria we expect of distributional semantics models, the linguistic validity of word embeddings is not a solved problem. Three main conclusions emerge from our experiments. First, the diversity of studies in distributional semantics does not entail that no formal statements regarding this theory can be made: we saw that distributional substitutability provides a very convenient handle for the linguist to grasp. Second, that we cannot easily relate distributional semantics to another lexical semantic theory raises the question of whether the distributional hypothesis actually provides an alternative account of meaning, or whether it deals with a very distinct set of facts altogether. Third, while the gap in quality between practical implementations of distributional semantics and our expectations necessarily adds to the confusion, the fact that we can make quantitative statements about this gap should be taken as a very encouraging sign for future research
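The distributional hypothesis the thesis interrogates is easy to state operationally: words occurring in similar contexts receive similar vectors, and high similarity suggests substitutability. A toy illustration with raw co-occurrence counts and cosine similarity (the corpus and window size are invented):

```python
from math import sqrt

# Toy distributional model: vectors are co-occurrence counts over a tiny
# corpus; substitutability is scored by cosine similarity.
CORPUS = "the cat sat . the dog sat . the cat ran . the dog ran".split()

def cooc_vector(word, window=1):
    vocab = sorted(set(CORPUS))
    vec = {v: 0 for v in vocab}
    for i, w in enumerate(CORPUS):
        if w == word:
            for j in range(max(0, i - window), min(len(CORPUS), i + window + 1)):
                if j != i:
                    vec[CORPUS[j]] += 1
    return [vec[v] for v in vocab]

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

# "cat" and "dog" share all their contexts here, so they come out maximally
# similar, i.e. fully substitutable under this toy model.
sim = cosine(cooc_vector("cat"), cooc_vector("dog"))
print(round(sim, 2))
```

Neural embeddings replace the count vectors with learned dense ones; the thesis's point is precisely that the learned vectors' behaviour can diverge from what this distributional recipe predicts.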
Kervajan, Loïc. "Contribution à la traduction automatique français/langue des signes française (LSF) au moyen de personnages virtuels : Contribution à la génération automatique de la LSF". Thesis, Aix-Marseille 1, 2011. http://www.theses.fr/2011AIX10172.
Texto completoSince the law of 11 February 2005 on equal rights and opportunities, places open to anyone (public places, shops, the internet, etc.) are required to welcome the Deaf in French Sign Language (FSL). We have worked on the development of technological tools to promote FSL, especially machine translation from written French to FSL. Our thesis begins with a presentation of knowledge about FSL (theoretical resources and ways of transcribing FSL) and continues with further concepts of descriptive grammar. Our working hypothesis is: FSL is a language and, therefore, machine translation is relevant. We describe the language specifications for automatic processing, based on scientific knowledge and on the proposals of our native FSL-speaking informants. We also expose our methodology and present the state of our work on formalizing linguistic data, focusing on those specificities of FSL (verb schemes, adjective and adverb modification, the organization of nouns, agreement patterns) that require further analysis. We present the application framework in which we worked: the machine translation system and the virtual character animation system of France Telecom R&D. After a short presentation of avatar technology, we explain how we control the gesture synthesis engine through the exchange format that we developed. Finally, we conclude with an evaluation and with the research and development perspectives that could follow this thesis. Our approach has produced its first results, since we have achieved our goal of running the full translation chain: from the input of a sentence in French to the realization of the corresponding sentence in FSL by a synthetic character
Manishina, Elena. "Data-driven natural language generation using statistical machine translation and discriminative learning". Thesis, Avignon, 2016. http://www.theses.fr/2016AVIG0209/document.
Texto completoHumanity has long been passionate about creating intelligent machines that can freely communicate with us in our language. Most modern systems communicating directly with the user share one common feature: they have a dialog system (DS) at their base. As of today, almost all DS components have embraced statistical methods and widely use them as their core models. Until recently, the Natural Language Generation (NLG) component of a dialog system used primarily hand-coded generation templates, which represented model phrases in a natural language mapped to a particular semantic content. Today data-driven models are making their way into the NLG domain. In this thesis, we follow along this new line of research and present several novel data-driven approaches to natural language generation. In our work we focus on two important aspects of NLG systems development: building an efficient generator and diversifying its output. Two key ideas that we defend here are the following: first, the task of NLG can be regarded as translation between a natural language and a formal meaning representation, and therefore can be performed using statistical machine translation techniques; and second, corpus extension and diversification, which traditionally involved manual paraphrasing and rule crafting, can be performed automatically using well-known and widely used synonym and paraphrase extraction methods. Concerning our first idea, we investigate the possibility of using the NGRAM translation framework and explore the potential of discriminative learning, notably Conditional Random Fields (CRF) models, as applied to NLG; we build a generation pipeline which allows for the inclusion and combination of different generation models (NGRAM and CRF) and which uses an efficient decoding framework (finite-state transducers' best path search). 
Regarding the second objective, namely corpus extension, we propose to enlarge the system's vocabulary and the set of available syntactic structures by integrating automatically obtained synonyms and paraphrases into the training corpus. To our knowledge, there have been no previous attempts to increase the size of the system vocabulary by incorporating synonyms. To date, most studies on corpus extension have focused on paraphrasing and resorted to crowd-sourcing in order to obtain paraphrases, which then required additional manual validation, often performed by system developers. We show that automatic corpus extension by means of paraphrase extraction and validation is just as effective as crowd-sourcing, while being less costly in terms of development time and resources. During intermediate experiments our generation models showed a significantly better performance than the phrase-based baseline model and appeared to be more robust in handling unknown combinations of concepts than the current in-house rule-based generator. The final human evaluation confirmed that our data-driven NLG models are a viable alternative to rule-based generators
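Corpus extension by synonym substitution, as proposed above, can be sketched with a toy synonym table standing in for the automatically extracted (and automatically validated) resources:

```python
# Sketch: each (semantic frame, utterance) training pair is expanded by
# substituting known synonyms. The frame notation and the synonym table
# are invented for illustration.

SYNONYMS = {"cheap": ["inexpensive", "affordable"], "restaurant": ["eatery"]}

def extend(corpus):
    """Return the corpus plus one new pair per applicable synonym swap."""
    out = list(corpus)
    for frame, sent in corpus:
        for word, alts in SYNONYMS.items():
            if word in sent.split():
                for alt in alts:
                    out.append((frame, sent.replace(word, alt)))
    return out

corpus = [("inform(price=cheap)", "it is a cheap restaurant")]
extended = extend(corpus)
print(len(extended))  # -> 4
```

A real pipeline would also validate each candidate substitution (e.g. against a language model), since blind replacement can produce disfluencies such as "a inexpensive".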
Moriceau, Véronique. "Intégration de données dans un système question-réponse sur le Web". Toulouse 3, 2007. http://www.theses.fr/2007TOU30019.
Texto completoIn the framework of question-answering systems on the Web, our main goals are to model, develop and evaluate a system which can, from a question in natural language, search for relevant answers on the Web and generate a synthetic answer, even if the search engine selected several candidate answers. We focused on temporal and numerical questions. Our system deals with: (1) the integration of data from candidate answers by using a knowledge base and knowledge extracted from the Web; this component allows the detection of data inconsistencies and deals with user expectations in order to produce a relevant answer; and (2) the generation of synthetic answers in natural language which are relevant with respect to users. Indeed, generated answers have to be short and understandable, and have to express the cooperative know-how which has been used to solve data inconsistencies. We also propose methods to evaluate our system from a technical and cognitive point of view
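The data-integration step, detecting inconsistencies among candidate answers before generating a single synthetic one, can be sketched for numeric answers; the median-based outlier rule and the tolerance are invented for illustration:

```python
from statistics import median

# Sketch: candidate numeric answers gathered from several web pages are
# compared against their median; values too far away are flagged as
# inconsistent, and the median is kept for the synthetic answer.

def integrate(candidates, tolerance=0.1):
    m = median(candidates)
    consistent = [c for c in candidates if abs(c - m) <= tolerance * m]
    outliers = [c for c in candidates if c not in consistent]
    return m, consistent, outliers

# e.g. invented candidate heights (in metres) for a numerical question
value, ok, bad = integrate([4808, 4810, 4807, 3500])
print(value, bad)  # -> 4807.5 [3500]
```

A cooperative answer, as described above, would then verbalise not just the value but also the fact that a conflicting candidate was set aside.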
Mba, Mathieu Leonel. "Génération automatique de plate-forme matérielles distribuées pour des applications de traitement du signal". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS341.
Texto completoLocal languages or mother tongues play an essential role in individuals' fulfillment in their various socio-economic activities. African languages, and specifically Cameroonian languages, are exposed to disappearance in favor of the foreign languages adopted as official languages after independence. This is why it is essential to digitize them and integrate them into the majority of dematerialized services to ensure their sustainability. Speech recognition, widely used as a human-machine interface, can be not only a tool for integrating local languages into applications but also a tool for collecting and digitizing corpora. Embedded systems are the preferred environment for deploying applications that use this human-machine interface. This implies that measures must be taken (by reducing reaction time) to satisfy the real-time constraint very often met in this type of application. Two approaches exist for reducing an application's response time, namely parallelization and the use of efficient hardware architectures. In this thesis, we exploit a hybrid approach to reduce the response time of an application: we parallelize the application and implement it on a reconfigurable architecture, whose implementation languages are known to be low-level. Moreover, given the multitude of problems posed by the implementation of parallel systems on reconfigurable architectures, design productivity becomes an issue for the engineer. In this thesis, to implement a real-time speech recognition system on an embedded system, we propose an approach for the productive implementation of parallel applications on reconfigurable architectures. Our approach exploits MATIP, a platform-based design tool, as an FPGA overlay based on high-level synthesis. 
We exploit this approach to implement a parallel model of a feature extraction algorithm for the recognition of tonal languages (a characteristic of the majority of Cameroonian languages). Experiments with this implementation on isolated words of the Kóló language, in comparison to other implementations (a software version and a hardware IP), show that our approach is not only productive in terms of implementation time but also yields a parallel application that is efficient in processing time. This is why we implemented XMATIP, an extension of MATIP, to make this approach compatible with hardware-software co-design and co-synthesis
Kow, Eric. "Réalisation de surface : ambiguïté et déterminisme". Phd thesis, Université Henri Poincaré - Nancy I, 2007. http://tel.archives-ouvertes.fr/tel-00192773.
Texto completoThe first extension increases the realiser's efficiency in handling lexical ambiguity. It is an adaptation of the "electrostatic tagging" optimisation that already exists for parsing.
The second extension concerns the number of outputs returned by the realiser. Normally, the GenI algorithm returns all the sentences associated with a given logical form. While these outputs can be considered to have the same meaning, they often present subtle nuances. Here, we show how the input specification can be augmented with annotations that allow these additional factors to be controlled. This extension is made possible by the fact that the FB-LTAG grammar used by the generator was built from a "metagrammar" that makes explicit the generalisations the grammar encodes.
The last extension enables the realiser to serve as a debugging environment for the metagrammar. Errors in the metagrammar can have important consequences for the grammar. Since the realiser outputs all the strings associated with an input semantics, it can be used to find these errors and locate them in the metagrammar.
Molina, Villegas Alejandro. "Compression automatique de phrases : une étude vers la génération de résumés". Phd thesis, Université d'Avignon, 2013. http://tel.archives-ouvertes.fr/tel-00998924.
Texto completoNarayan, Shashi. "Generating and simplifying sentences". Electronic Thesis or Diss., Université de Lorraine, 2014. http://www.theses.fr/2014LORR0166.
Texto completoDepending on the input representation, this dissertation investigates issues from two classes: meaning representation (MR) to text and text-to-text generation. In the first class (MR-to-text generation, "Generating Sentences"), we investigate how to make symbolic grammar-based surface realisation robust and efficient. We propose an efficient approach to surface realisation using an FB-LTAG and taking shallow dependency trees as input. Our algorithm combines techniques and ideas from the head-driven and lexicalist approaches. In addition, the input structure is used to filter the initial search space using a concept called local polarity filtering, and to parallelise processes. To further improve robustness, we propose two error mining algorithms: first, an algorithm for mining dependency trees rather than sequential data, and second, an algorithm that structures the output of error mining into a tree to represent it in a more meaningful way. We show that our realisers together with these error mining algorithms improve both efficiency and coverage by a wide margin. In the second class (text-to-text generation, "Simplifying Sentences"), we argue for using deep semantic representations (as opposed to syntax- or SMT-based approaches) to improve the sentence simplification task. We use Discourse Representation Structures for the deep semantic representation of the input. We propose two methods: a supervised approach (with state-of-the-art results) to hybrid simplification using deep semantics and SMT, and an unsupervised approach (with results competitive with state-of-the-art systems) to simplification using the comparable Wikipedia corpus
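Error mining assigns a suspicion score to forms that occur disproportionately in inputs the system fails on. A minimal word-level sketch (the thesis mines dependency trees, which this toy version does not attempt):

```python
from collections import Counter

# Sketch: score each word by its suspicion rate, i.e. the proportion of its
# occurrences that appear in failing inputs. Data is invented.

def suspicion(failed, passed):
    fail_counts = Counter(w for s in failed for w in s.split())
    all_counts = fail_counts + Counter(w for s in passed for w in s.split())
    return {w: fail_counts[w] / all_counts[w] for w in all_counts}

failed = ["the man whom I saw left", "the car whom I saw left"]
passed = ["the man left", "I saw the car"]
scores = suspicion(failed, passed)
print(max(scores, key=scores.get))  # -> whom
```

Here "whom" only ever occurs in failing inputs (suspicion 1.0), so it is the first candidate to inspect in the grammar or lexicon; the tree-structured variants described above additionally localise the suspicious form within its syntactic context.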
Tomeh, Bachira. "Les quantificateurs dans le langage naturel". Poitiers, 1991. http://www.theses.fr/1991POIT5004.
Texto completoQuantifiers are generators of natural languages (either universal or existential, either affirmative or negative) for which there is an interval between their use in the field of discourse and their use in logic. They have been the topic of manifold works by logicians and psychologists, for instance in the study of syllogisms, whose logical operators are quantifiers, or of the processes of reasoning. But the focus of most of those studies has been standard quantifiers (some, each, even, none) and not other formulations. The chief aim of this research is to study, on the one hand, the cognitive operations that underlie the processing of quantifiers, and, on the other hand, whether or not those operations are the same across the different formulations belonging to the same type of quantifier. We have tried to survey the representation of quantifiers through the following experiments: a study of reaction time while checking quantified sentences; a study of the processing of quantifiers while solving syllogisms; a study of the semantic proximity of quantifiers; a study of the numerical evaluation that a given quantifier might represent; and a survey of the graphic representation of quantifiers. In most experiments, we have chosen the following quantifiers: universal (all the, all, the, each) and existential (some, many, there is). The results enable us to stress several dimensions in the meaning of the quantifiers of natural language
Fouquère, Christophe. "Systèmes d'analyse tolérante du langage naturel". Grenoble 2 : ANRT, 1988. http://catalogue.bnf.fr/ark:/12148/cb376136855.
Texto completoFouqueré, Christophe. "Systèmes d'analyse tolérante du langage naturel". Paris 13, 1988. http://www.theses.fr/1988PA132003.
Texto completoSabatier, Paul. "Contribution au développement d'interfaces en langage naturel". Paris 7, 1987. http://www.theses.fr/1987PA077081.
Texto completoJOAB, MICHELE. "Modelisation d'un dialogue pedagogique en langage naturel". Paris 6, 1990. http://www.theses.fr/1990PA066556.
Texto completoSabatier, Paul. "Contribution au développement d'interfaces en langage naturel". Grenoble 2 : ANRT, 1987. http://catalogue.bnf.fr/ark:/12148/cb37609547k.
Texto completoMax, Aurélien. "De la création de documents normalisés à la normalisation de documents en domaine contraint". Grenoble 1, 2003. http://www.theses.fr/2003GRE10227.
Texto completoWell-formedness conditions on documents in constrained domains are often hard to apply. An active research trend approaches the authoring of normalized documents through semantic specification, thereby facilitating such applications as multilingual production. However, current systems are not able to analyse an existing document in order to normalize it. We therefore propose an approach that reuses the resources of such systems to recreate the semantic content of a document, from which a normalized textual version can be generated. This approach is based on two main paradigms: fuzzy inverted generation, which heuristically finds candidate semantic representations, and interactive negotiation, which allows a domain expert to progressively validate the semantic representation that corresponds to the original document
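Fuzzy inverted generation can be caricatured as generate-and-match: render each candidate semantic representation with the generator and rank candidates by similarity to the input document, leaving the final choice to the expert's interactive validation. A toy sketch with an invented candidate table and word-overlap scoring:

```python
# Sketch: candidate semantic representations mapped to their generated text.
# The frames, texts and the Jaccard scoring are invented for illustration.
CANDIDATES = {
    "warn(icing)": "warning icing is expected on the runway",
    "warn(fog)": "warning fog is expected near the runway",
    "info(delay)": "the flight is delayed",
}

def overlap(a, b):
    """Jaccard word overlap between two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def rank_candidates(document):
    """Rank semantic representations by how well their rendering matches."""
    return sorted(CANDIDATES, key=lambda c: -overlap(document, CANDIDATES[c]))

doc = "warning icing expected on the runway"
print(rank_candidates(doc)[0])  # -> warn(icing)
```

In the real system, the ranking is only a heuristic shortlist; interactive negotiation with the expert resolves the remaining ambiguity.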
Vert, Jean-Philippe. "Méthodes statistiques pour la modélisation du langage naturel". Paris 6, 2001. http://www.theses.fr/2001PA066247.
Gaita, Mihai. "Le langage naturel en tant que problème philosophique". Paris 8, 1993. http://www.theses.fr/1993PA080867.
Texto completoStarting from aristotle's opposition between deixis ("pointing") and apodeixis ("proof"), we argue that language is radically indexical. Some authors take indexicality as a phenomenon of context-sensitivity we claim that indexicality means total neutrality towards apodeixis and suggest a definition of speech that seems to get on well with heraclite's notion of logos xunos
Ameli, Samila. "Construction d'un langage de dictionnaire conceptuel en vue du traitement du langage naturel : application au langage médical". Compiègne, 1989. http://www.theses.fr/1989COMPD226.
This study deals with the realisation of a « new generation » information retrieval system that takes the meaning of texts into account. The system compares texts (questions and documents) by their content. Since a knowledge base is indispensable for text "comprehension", a dictionary of concepts has been designed in which concepts and their mutual relations are defined through a user-friendly language called SUMIX. SUMIX enables us (1) to resolve ambiguities due to polysemy by taking context dependencies into account, (2) to make use of property inheritance, which can greatly help knowledge engineers in the creation of the knowledge and inference base, and (3) to define subject-dependent relations between concepts, which makes metaknowledge handling possible. The dictionary of concepts is essentially used (1) to index concepts (and not character strings), which enables us to select a wide range of documents in the conceptual extraction phase, and (2) to filter the previously selected documents by comparing the structure of each document with that of the query in the structural analysis phase.
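A concept dictionary with property inheritance and context-based disambiguation of a polysemous term, as described above, might look like this in miniature. All concepts, relations, and names below are invented for illustration; the SUMIX language itself is not modeled.

```python
# Toy concept dictionary: concepts with is-a links and inherited
# properties, plus context-based disambiguation of a polysemous term.

CONCEPTS = {
    "organ":       {"is_a": None,    "props": {"kind": "body part"}},
    "heart":       {"is_a": "organ", "props": {}},
    "pump_device": {"is_a": None,    "props": {"kind": "machine"}},
}

# A polysemous surface form maps to several candidate concepts.
LEXICON = {"pump": ["heart", "pump_device"]}

def lookup(concept, prop):
    """Property lookup with inheritance along is-a links."""
    while concept is not None:
        entry = CONCEPTS[concept]
        if prop in entry["props"]:
            return entry["props"][prop]
        concept = entry["is_a"]
    return None

def disambiguate(word, context):
    """Pick the candidate concept whose inherited 'kind' occurs in context."""
    for concept in LEXICON[word]:
        if lookup(concept, "kind") in context:
            return concept
    return LEXICON[word][0]

print(lookup("heart", "kind"))                       # inherited from "organ"
print(disambiguate("pump", "the patient's body part"))
```

Indexing documents by concepts rather than character strings then amounts to storing the disambiguated concept identifiers instead of the raw tokens.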
Faille, Juliette. "Data-Based Natural Language Generation : Evaluation and Explainability". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0305.
Recent Natural Language Generation (NLG) models achieve very high average performance. Their output texts are generally grammatically and syntactically correct which makes them sound natural. Though the semantics of the texts are right in most cases, even the state-of-the-art NLG models still produce texts with partially incorrect meanings. In this thesis, we propose evaluating and analyzing content-related issues of models used in the NLG tasks of Resource Description Framework (RDF) graphs verbalization and conversational question generation. First, we focus on the task of RDF verbalization and the omissions and hallucinations of RDF entities, i.e. when an automatically generated text does not mention all the input RDF entities or mentions other entities than those in the input. We evaluate 25 RDF verbalization models on the WebNLG dataset. We develop a method to automatically detect omissions and hallucinations of RDF entities in the outputs of these models. We propose a metric based on omissions or hallucination counts to quantify the semantic adequacy of the NLG models. We find that this metric correlates well with what human annotators consider to be semantically correct and show that even state-of-the-art models are subject to omissions and hallucinations. Following this observation about the tendency of RDF verbalization models to generate texts with content-related issues, we propose to analyze the encoder of two such state-of-the-art models, BART and T5. We use the probing explainability method and introduce two probing classifiers (one parametric and one non-parametric) to detect omissions and distortions of RDF input entities in the embeddings of the encoder-decoder models. We find that such probing classifiers are able to detect these mistakes in the encodings, suggesting that the encoder of the models is responsible for some loss of information about omitted and distorted entities.
Finally, we propose a T5-based conversational question generation model that, given an input RDF graph and a conversational context, generates both a question and its corresponding RDF triples. This setting allows us to introduce a fine-grained evaluation procedure that automatically assesses coherence with the conversation context and semantic adequacy with the input RDF. Our contributions belong to the fields of NLG evaluation and explainability and use techniques and methodologies from these two research fields in order to work towards providing more reliable NLG models.
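The parametric probing classifier described in this abstract (a classifier trained on encoder embeddings to detect whether an input entity was omitted) can be sketched as follows, with synthetic vectors standing in for BART/T5 encodings. The dimensions, data, and training setup are illustrative only, not those of the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for encoder embeddings: vectors whose mean is
# shifted when the corresponding input entity was "omitted" from the
# generated text. A real probe would use the encoder's actual outputs.
DIM = 16

def make_embeddings(n, omitted):
    shift = 0.8 if omitted else -0.8
    return rng.normal(loc=shift, scale=1.0, size=(n, DIM))

X = np.vstack([make_embeddings(200, True), make_embeddings(200, False)])
y = np.array([1] * 200 + [0] * 200)  # 1 = entity omitted

# Parametric probe: logistic regression trained by gradient descent.
w, b = np.zeros(DIM), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(omitted)
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
acc = np.mean(pred == y)
print(f"probe accuracy: {acc:.2f}")
```

If such a probe performs well above chance on held-out encodings, the information about the mistake is already present in the encoder, which is the thesis's line of argument.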
Nazarenko, Adeline. "Compréhension du langage naturel : le problème de la causalité". Paris 13, 1994. http://www.theses.fr/1994PA132007.
Cavazza, Marc. "Analyse sémantique du langage naturel par construction de modèles". Paris 7, 1991. http://www.theses.fr/1991PA077220.
Texto completoMaršík, Jiří. "Les effects et les handlers dans le langage naturel". Thesis, Université de Lorraine, 2016. http://www.theses.fr/2016LORR0322/document.
In formal semantics, researchers assign meanings to sentences of a natural language. This work is guided by the principle of compositionality: the meaning of an expression is a function of the meanings of its parts. These functions are often formalized using the λ-calculus. However, there are areas of language which challenge the notion of compositionality, e.g. anaphoric pronouns or presupposition triggers. These force researchers to either abandon compositionality or adjust the structure of meanings. In the first case, meanings are derived by processes that no longer correspond to pure mathematical functions but rather to context-sensitive procedures, much like the functions of a programming language that manipulate their context with side effects. In the second case, when the structure of meanings is adjusted, the new meanings tend to be instances of the same mathematical structure, the monad. Monads themselves being widely used in functional programming to encode side effects, the common theme that emerges in both approaches is the introduction of side effects. Furthermore, different problems in semantics lead to different theories which are challenging to unite. Our thesis claims that by looking at these theories as theories of side effects, we can reuse results from programming language research to combine them. This thesis extends λ-calculus with a monad of computations. The monad implements effects and handlers, a recent technique in the study of programming language side effects. In the first part of the thesis, we prove some of the fundamental properties of this calculus: subject reduction, confluence and termination. Then in the second part, we demonstrate how to use the calculus to implement treatments of several linguistic phenomena: deixis, quantification, conventional implicature, anaphora and presupposition. In the end, we build a grammar that features all of these phenomena and their interactions.
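The idea of computations that issue effect requests discharged by handlers can be mimicked with Python generators: a computation yields requests, and a handler decides how each one is answered. This is only an analogy for the calculus the abstract describes (which extends the λ-calculus, not Python), and all names below are illustrative, here using deixis as the example effect.

```python
# A computation is a generator that yields effect requests
# (operation name, payload) and finally returns a value.

def speaker():
    # "deixis" effect: ask the context who the speaker is.
    return (yield ("get_speaker", None))

def meaning():
    # Meaning of "I am here": depends on the deictic context.
    who = yield from speaker()
    return f"{who} is here"

def handle(computation, env):
    """Handler: run the computation, answering each effect request
    from the environment; return the computation's final value."""
    gen = computation()
    try:
        request = next(gen)
        while True:
            op, _payload = request
            if op == "get_speaker":
                request = gen.send(env["speaker"])
            else:
                raise ValueError(f"unhandled effect: {op}")
    except StopIteration as done:
        return done.value

print(handle(meaning, {"speaker": "Alice"}))
```

Composing several such effects (quantification, anaphora, presupposition) under one handler stack is, loosely, what the thesis's monad of computations makes precise.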
Maršík, Jiří. "Les effects et les handlers dans le langage naturel". Electronic Thesis or Diss., Université de Lorraine, 2016. http://www.theses.fr/2016LORR0322.