Theses on the topic "Génération de textes"
Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 theses for your research on the topic "Génération de textes".
Next to every source in the list of references there is an "Add to bibliography" button. Click this button, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Hankach, Pierre. "Génération automatique de textes par satisfaction de contraintes". Paris 7, 2009. http://www.theses.fr/2009PA070027.
Full text
We address in this thesis the construction of a natural language generation system: computer software that transforms a formal representation of information into a text in natural language. In our approach, we define the generation problem as a constraint satisfaction problem (CSP). The implemented system ensures an integrated processing of generation operations, as their different dependencies are taken into account and no priority is given to any type of operation over the others. In order to define the constraint satisfaction problem, we represent the construction operations of a text by decision variables. Individual operations that implement the same type of minimal expressions in the text form a generation task. We classify decision variables according to the type of operations they represent (e.g. content selection variables, document structuring variables, etc.). The linguistic rules that govern the operations are represented as constraints on the variables. A constraint can be defined over variables of the same type or of different types, capturing the dependency between the corresponding operations. The production of a text consists of solving the global system of constraints, that is, finding an evaluation of the variables that satisfies all the constraints. As part of the grammar of constraints for generation, we formulate in particular the constraints that govern document structuring operations. We model by constraints the rhetorical structure of SDRT in order to yield coherent texts as the generator's output. Beforehand, in order to increase the generation capacities of our system, we extend the rhetorical structure to cover texts in non-canonical order. Furthermore, in addition to defining these coherence constraints, we formulate a set of constraints that enables the form of the macrostructure to be controlled by communicative goals. Finally, we propose a solution to the problem of the computational complexity of generating large texts.
This solution is based on generating a text by groups of clauses. The problem of generating a text is thus divided into several problems of reduced complexity, each concerned with generating one part of the text. These parts are of limited size, so the complexity associated with their generation remains reasonable. The proposed partitioning of generation is motivated by linguistic considerations.
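The CSP framing described above can be illustrated with a minimal, self-contained sketch (ours, not the thesis system): generation decisions such as content selection and document structuring become decision variables, linguistic rules become constraints over them, and a brute-force solver enumerates the consistent text plans. All variable names and rules here are invented for illustration.

```python
from itertools import product

# Toy constraint-satisfaction view of text generation (illustration only):
# decision variables for content selection and document structuring,
# "linguistic rules" expressed as constraints over those variables.

# Variables and their domains (all invented for this sketch).
domains = {
    "select_weather": [True, False],              # content selection
    "select_traffic": [True, False],              # content selection
    "order": ["weather_first", "traffic_first"],  # document structuring
}

# Constraints may relate variables of the same or of different types,
# capturing dependencies between generation operations.
constraints = [
    # at least one piece of content must be selected
    lambda a: a["select_weather"] or a["select_traffic"],
    # an ordering choice is only valid if the content it places first exists
    lambda a: a["order"] != "weather_first" or a["select_weather"],
    lambda a: a["order"] != "traffic_first" or a["select_traffic"],
]

def solve(domains, constraints):
    """Enumerate assignments and keep those satisfying every constraint."""
    names = list(domains)
    sols = []
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            sols.append(assignment)
    return sols

solutions = solve(domains, constraints)
print(len(solutions), "consistent text plans")  # → 4 consistent text plans
```

A real system would use a propagation-based solver rather than enumeration, but the structure is the same: one evaluation of all variables that satisfies the global constraint system corresponds to one text plan.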
Godbout, Mathieu. "Approches par bandit pour la génération automatique de résumés de textes". Master's thesis, Université Laval, 2021. http://hdl.handle.net/20.500.11794/69488.
Full text
This thesis discusses the use of bandit methods to solve the problem of training extractive summary generation models. Extractive models, which build summaries by selecting sentences from an original document, are difficult to train because the target summary of a document is usually not built in an extractive way. It is for this purpose that we propose to view the production of extractive summaries as different bandit problems, for which there exist algorithms that can be leveraged to train summarization models. We first present BanditSum, an approach drawn from the literature that views the generation of summaries for a set of documents as a contextual bandit problem. Next, we introduce CombiSum, a new algorithm which formulates the generation of the summary of a single document as a combinatorial bandit. By exploiting the combinatorial formulation, CombiSum manages to incorporate the notion of the extractive potential of each sentence of a document into its training. Finally, we propose LinCombiSum, the linear variant of CombiSum, which exploits the similarities between sentences in a document and uses the linear combinatorial bandit formulation instead.
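As a rough illustration of the combinatorial-bandit framing (a deliberate simplification, not BanditSum, CombiSum, or LinCombiSum themselves), one can treat each sentence as an arm, a summary as a subset of k arms, and credit each selected sentence with its own word overlap against a reference, in the spirit of the per-sentence extractive potential mentioned above. The document and reward below are toy examples.

```python
import random

# Semi-bandit sketch of extractive summarization: each sentence is an arm,
# a summary is a subset of k arms, and each selected sentence is credited
# with its own overlap against a reference summary.

def sentence_reward(sentence, reference_words):
    # toy "extractive potential": word overlap with the reference
    words = set(sentence.split())
    return len(words & reference_words) / max(len(reference_words), 1)

def train(sentences, reference, k=2, rounds=500, eps=0.2, seed=0):
    rng = random.Random(seed)
    ref_words = set(reference.split())
    values = [0.0] * len(sentences)   # running value estimate per arm
    counts = [0] * len(sentences)
    for _ in range(rounds):
        if rng.random() < eps:        # explore: random subset of k sentences
            arms = rng.sample(range(len(sentences)), k)
        else:                          # exploit: current top-k estimates
            arms = sorted(range(len(sentences)), key=lambda i: -values[i])[:k]
        for i in arms:                 # semi-bandit feedback: per-arm reward
            r = sentence_reward(sentences[i], ref_words)
            counts[i] += 1
            values[i] += (r - values[i]) / counts[i]
    return sorted(range(len(sentences)), key=lambda i: -values[i])[:k]

doc = ["the cat sat on the mat",
       "stock markets fell sharply today",
       "investors reacted to the bank failure",
       "my neighbour has a red bicycle"]
best = train(doc, "markets fell after the bank failure and investors reacted")
print(sorted(best))
```

In a realistic setup the reward would be a ROUGE-style score on the whole summary and the policy a neural network; the point here is only the arm/subset/reward decomposition.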
Boussema, Kaouther. "Système de génération automatique de programmes d'entrées-sorties : le système IO". Paris 9, 1998. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1998PA090048.
Full text
Chali, Yllias. "L'expansion de texte. Une approche basée sur l'explication par questions/réponses pour la génération de versions de textes". Toulouse 3, 1997. http://www.theses.fr/1997TOU30078.
Full text
Manuélian, Hélène. "Descriptions définies et démonstratives : analyses de corpus pour la génération de textes". PhD thesis, Nancy 2, 2003. http://tel.archives-ouvertes.fr/tel-00526602.
Full text
Ponton, Claude. "Génération automatique de textes en langue naturelle : essai de définition d'un système noyau". Grenoble 3, 1996. http://www.theses.fr/1996GRE39030.
Full text
One of the features common to many generation systems is their strong dependence on the application. While a few attempts have been made to define "non-dedicated" systems, none of them makes it possible to take into account the characteristics of the application (such as its formalism) and the communication context (application field, user, etc.). The purpose of this thesis is the definition of a generation system that is both non-dedicated and able to take these elements into account. Such a system is called a "kernel generation system". With this in mind, we studied 94 generation systems against objective, relevant criteria; this study serves as a basis for the rest of our work. Defining a kernel generator requires determining the boundary between the application and the kernel generator (generator tasks, inputs, outputs, data, etc.): it is necessary to understand the role of both parts and the ways they communicate before designing the kernel generator. As a result of this study, our generator accepts as input any formal content representation together with a set of constraints describing the communication context. The kernel generator then handles what is generally called the "how to say it?" and is able to produce every solution compatible with the input constraints. This definition work is followed by the implementation of a first generator prototype, which has been tested on two applications that differ in every respect (formalism, field, type of texts, etc.). Finally, this work opens onto several perspectives for evolving the generator, particularly concerning the knowledge representation formalism ("cotopies d'objets") and the architecture (distributed architecture).
Namer, Fiammetta. "Pronominalisation et effacement du sujet en génération automatique de textes en langues romanes". Paris 7, 1990. http://www.theses.fr/1990PA077249.
Full text
Faiz, Rim. "Modélisation formelle des connaissances temporelles à partir de textes en vue d'une génération automatique de programmes". Paris 9, 1996. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1996PA090023.
Texto completoFan, Huihui. "Text Generation with and without Retrieval". Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0164.
Full text
Every day we write, from sending your mother a quick text to drafting a scientific article such as this thesis. The writing we do often goes hand in hand with automated assistance. For example, modern instant messaging software often suggests what word to write next, emails can be started with an autocomposer, and essays are improved with machine-suggested edits. These technologies are powered by years of research on text generation, a natural language processing field with the goal of automatically producing fluent, human-readable natural language. At a small scale, text generation systems can generate individual words or sentences, but they have wide-reaching applications beyond that. For instance, systems for summarization, dialogue, and even the writing of entire Wikipedia articles are grounded in foundational text generation technology. Producing fluent, accurate, and useful natural language faces numerous challenges. Recent advances in text generation, principally leveraging the training of neural network architectures on large datasets, have significantly improved the surface-level readability of machine-generated text. However, current systems require improvement along numerous axes, including generation beyond English and the writing of increasingly longer texts. While the field has seen rapid progress, much research focus has been directed towards the English language, where large-scale training and evaluation datasets for various tasks are readily available. Nevertheless, applications from autocorrect to autocomposition of text should be available universally. After all, by population, the majority of the world does not write in English.
In this work, we create text generation systems for various tasks with the capability of incorporating languages beyond English, either as algorithms that easily extend to new languages or as multilingual models encompassing up to 20 languages in one model. Beyond our work in multilingual text generation, we focus on a critical piece of generation systems: knowledge. A prerequisite to writing well is knowing what to write. This concept of knowledge is incredibly important in text generation systems. For example, automatically writing an entire Wikipedia article requires extensive research on that article's topic. The instinct to research is often intuitive: decades ago people would have gone to a library, replaced now by the information available on the World Wide Web. However, for automated systems, the question is not only what knowledge to use to generate text, but also how to retrieve that knowledge and best utilize it to achieve the intended communication goal. We face the challenge of retrieval-based text generation. We present several techniques for identifying relevant knowledge at different scales: from local knowledge available in a paragraph, to sifting through Wikipedia, and finally to identifying the needle in the haystack at the scale of the full web. We describe neural network architectures that can perform large-scale retrieval efficiently, utilizing pre-computation and caching mechanisms. Beyond how to retrieve knowledge, we further investigate the form the knowledge should take, from natural language such as Wikipedia articles or text on the web to structured inputs in the form of knowledge graphs. Finally, we utilize these architectures in novel, much more challenging tasks that push the boundaries of where text generation models work well today: tasks that necessitate knowledge but also require models to produce long, structured natural language output, such as answering complex questions or writing full Wikipedia articles.
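The pre-computation and caching idea behind efficient retrieval can be sketched with a toy bag-of-words retriever (our illustration; the thesis architectures are neural and operate at web scale): document vectors and their norms are computed once at index time, so each query costs only a similarity scan.

```python
import math
from collections import Counter

# Toy retrieve-before-generate component: cosine similarity over
# bag-of-words vectors, with document vectors and norms pre-computed
# once and cached in the index.

class Retriever:
    def __init__(self, docs):
        self.docs = docs
        # pre-computation step: done once, reused for every query
        self.vecs = [Counter(d.lower().split()) for d in docs]
        self.norms = [math.sqrt(sum(c * c for c in v.values()))
                      for v in self.vecs]

    def retrieve(self, query, k=1):
        q = Counter(query.lower().split())
        qn = math.sqrt(sum(c * c for c in q.values())) or 1.0
        scores = []
        for i, v in enumerate(self.vecs):
            dot = sum(q[w] * v[w] for w in q)   # Counter returns 0 if absent
            scores.append((dot / (qn * (self.norms[i] or 1.0)), i))
        return [self.docs[i] for _, i in sorted(scores, reverse=True)[:k]]

docs = ["the Eiffel Tower is in Paris",
        "bandit algorithms balance exploration and exploitation",
        "Wikipedia articles cite reliable sources"]
r = Retriever(docs)
print(r.retrieve("where is the Eiffel Tower", k=1)[0])
# → the Eiffel Tower is in Paris
```

The retrieved passage would then be fed, together with the query, into a generation model; swapping the cached vectors for pre-computed neural embeddings keeps the same index/query split.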
Popesco, Liana. "Analyse et génération de textes à partir d'un seul ensemble de connaissances pour chaque langue naturelle et de meta-règles de structuration". Paris 6, 1986. http://www.theses.fr/1986PA066138.
Full text
Colin, Émilie. "Traitement automatique des langues et génération automatique d'exercices de grammaire". Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0059.
Full text
Our perspective is educational: creating grammar exercises for French. Paraphrasing is an operation of reformulation. Our work aims to show that sequence-to-sequence models are not simple repeaters but can learn syntax. First, by combining various models, we showed that representing information in multiple forms (formal data (RDF) coupled with text to extend or reduce it, or text alone) allows a corpus to be exploited from different angles, increasing the diversity of outputs and exploiting the syntactic levers put in place. We also addressed a recurrent problem, that of data quality, and obtained paraphrases with a high syntactic adequacy (up to 98% coverage of the demand) and a very good linguistic level. We obtain up to 83.97 points of BLEU-4*, 78.41 more than our baseline average, without a syntactic lever. This rate indicates better control of the outputs, which are varied and of good quality even in the absence of a syntactic lever. Our idea was to be able to work from raw text: to produce a representation of its meaning. The transition to French text was also an imperative for us. Working from plain text, by automating the procedures, allowed us to create a corpus of more than 450,000 sentence/representation pairs, thanks to which we learned to generate massively correct texts (92% on qualitative validation). Anonymizing everything that is not functional contributed significantly to the quality of the results (68.31 BLEU, i.e. +3.96 compared to the baseline, which was the generation of text from non-anonymized data). This second piece of work can be extended with the integration of a syntactic lever guiding the outputs. What was our baseline at stage one (generating without constraints) would then be combined with a constrained model. Applying an error search would then allow the constitution of a silver base associating representations with texts.
This base could then be multiplied by reapplying constrained generation, and thus achieve the applied objective of the thesis. The formal representation of information in a language-specific framework is a challenging task, and this thesis offers some ideas on how to automate this operation. Moreover, we were only able to process relatively short sentences; the use of more recent neural models would likely improve the results. The use of appropriate output features would allow for extensive checks. *BLEU: quality of a text, on a scale from 0 (worst) to 100 (best); Papineni et al. (2002).
Moyse, Gilles. "Résumés linguistiques de données numériques : interprétabilité et périodicité de séries". Electronic Thesis or Diss., Paris 6, 2016. http://www.theses.fr/2016PA066526.
Full text
Our research is in the field of fuzzy linguistic summaries (FLS), which make it possible to generate natural language sentences describing very large amounts of numerical data, providing concise and intelligible views of these data. We first focus on the interpretability of FLS, crucial to provide end-users with an easily understandable text, but hard to achieve due to its linguistic form. Going beyond existing works on that topic, which are based on the basic components of FLS, we propose a general approach to the interpretability of summaries, considering them globally as groups of sentences. We focus more specifically on their consistency. In order to guarantee it in the framework of standard fuzzy logic, we introduce a new model of oppositions between increasingly complex sentences. The model allows us to show that these consistency properties can be satisfied by selecting a specific negation approach. Moreover, based on this model, we design a 4-dimensional cube displaying all the possible oppositions between sentences in an FLS and show that it generalises several existing logical opposition structures. We then consider the case of data in the form of numerical series and focus on linguistic summaries about their periodicity: the sentences we propose indicate the extent to which the series are periodic and offer an appropriate linguistic expression of their periods. The proposed extraction method, called DPE (Detection of Periodic Events), splits the data in an adaptive manner, without any prior information, using tools from mathematical morphology. The segments are then exploited to compute the period and the periodicity, measuring the quality of the estimation and the extent to which the series is periodic. Lastly, DPE returns descriptive sentences of the form "Approximately every 2 hours, the customer arrival is important". Experiments with artificial and real data show the relevance of the proposed DPE method.
From an algorithmic point of view, we propose an incremental and efficient implementation of DPE, based on established update formulas. This implementation makes DPE scalable and allows it to process real-time data streams. We also present an extension of DPE based on the concept of local periodicity, allowing the identification of locally periodic subsequences in a numerical series using an original statistical test. The method, validated on artificial and real data, returns natural language sentences that extract information of the form "Every two weeks during the first semester of the year, sales are high".
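A much-simplified sketch of period extraction (ours; the actual DPE method relies on mathematical morphology rather than the peak detection used here) estimates a period as the mean gap between salient events, and a periodicity score as the regularity of those gaps. The threshold and series are invented toy inputs.

```python
import statistics

# Toy period/periodicity estimation: detect salient events as local maxima
# above a threshold, take the mean gap between events as the period, and
# score periodicity by the regularity of the gaps (1.0 = perfectly regular).

def estimate_period(series, threshold):
    events = [i for i in range(1, len(series) - 1)
              if series[i] > threshold
              and series[i] >= series[i - 1]
              and series[i] >= series[i + 1]]
    gaps = [b - a for a, b in zip(events, events[1:])]
    if not gaps:
        return None, 0.0
    period = statistics.mean(gaps)
    spread = statistics.pstdev(gaps)
    periodicity = max(0.0, 1.0 - spread / period)
    return period, periodicity

# A series peaking every 4 steps: "approximately every 4 steps, values are high"
series = [0, 1, 9, 1, 0, 1, 9, 1, 0, 1, 9, 1, 0]
period, score = estimate_period(series, threshold=5)
print(period, score)
```

A linguistic-summary layer would then render the pair as a sentence such as "Approximately every 4 steps, the value is high", with the score qualifying the hedge ("approximately", "roughly").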
Cripwell, Liam. "Controllable and Document-Level Text Simplification". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0186.
Full text
Text simplification is a task that involves rewriting a text to make it easier to read and understand for a wider audience, while still expressing the same core meaning. This has potential benefits for disadvantaged end-users (e.g. non-native speakers, children, the reading impaired), while also showing promise as a preprocessing step for downstream NLP tasks. Recent advancements in neural generative models have led to the development of systems that are capable of producing highly fluent outputs. However, these end-to-end systems often rely on training corpora to implicitly learn how to perform the necessary rewrite operations. In the case of simplification, these datasets are lacking in both quantity and quality, with most corpora either being very small, automatically constructed, or subject to strict licensing agreements. As a result, many systems tend to be overly conservative, often making no changes to the original text or being limited to the paraphrasing of short word sequences without substantial structural modifications. Furthermore, most existing work on text simplification is limited to sentence-level inputs, with attempts to iteratively apply these approaches to document-level simplification failing to coherently preserve the discourse structure of the document. This is problematic, as most real-world applications of text simplification concern document-level texts. In this thesis, we investigate strategies for mitigating the conservativity of simplification systems while promoting a more diverse range of transformation types. This involves the creation of new datasets containing instances of under-represented operations and the implementation of controllable systems capable of being tailored towards specific transformations and simplicity levels.
We later extend these strategies to document-level simplification, proposing systems that are able to consider the surrounding document context and use similar controllability techniques to plan ahead which sentence-level operations to perform, allowing for both high performance and scalability. Finally, we analyze current evaluation processes and propose new strategies that can be used to better evaluate both controllable and document-level simplification systems.
Faille, Juliette. "Data-Based Natural Language Generation : Evaluation and Explainability". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0305.
Full text
Recent Natural Language Generation (NLG) models achieve very high average performance. Their output texts are generally grammatically and syntactically correct, which makes them sound natural. Although the semantics of the texts are right in most cases, even state-of-the-art NLG models still produce texts with partially incorrect meanings. In this thesis, we propose evaluating and analyzing content-related issues of models used in the NLG tasks of Resource Description Framework (RDF) graph verbalization and conversational question generation. First, we focus on the task of RDF verbalization and the omissions and hallucinations of RDF entities, i.e. when an automatically generated text does not mention all the input RDF entities or mentions entities other than those in the input. We evaluate 25 RDF verbalization models on the WebNLG dataset. We develop a method to automatically detect omissions and hallucinations of RDF entities in the outputs of these models. We propose a metric based on omission and hallucination counts to quantify the semantic adequacy of the NLG models. We find that this metric correlates well with what human annotators consider to be semantically correct, and show that even state-of-the-art models are subject to omissions and hallucinations. Following this observation about the tendency of RDF verbalization models to generate texts with content-related issues, we propose to analyze the encoder of two such state-of-the-art models, BART and T5. We use the probing explainability method and introduce two probing classifiers (one parametric and one non-parametric) to detect omissions and distortions of RDF input entities in the embeddings of the encoder-decoder models. We find that such probing classifiers are able to detect these mistakes in the encodings, suggesting that the encoder of the models is responsible for some loss of information about omitted and distorted entities.
Finally, we propose a T5-based conversational question generation model that, given an input RDF graph and a conversational context, generates both a question and its corresponding RDF triples. This setting allows us to introduce a fine-grained evaluation procedure that automatically assesses coherence with the conversational context and semantic adequacy with respect to the input RDF. Our contributions belong to the fields of NLG evaluation and explainability, and use techniques and methodologies from these two research fields in order to work towards providing more reliable NLG models.
Barrère, Killian. "Architectures de Transformer légères pour la reconnaissance de textes manuscrits anciens". Electronic Thesis or Diss., Rennes, INSA, 2023. http://www.theses.fr/2023ISAR0017.
Full text
Transformer architectures deliver low error rates but are challenging to train due to the limited annotated data available in handwritten text recognition. We propose lightweight Transformer architectures adapted to the limited amounts of annotated handwritten text available. We introduce a fast Transformer architecture with an encoder, processing up to 60 pages per second. We also present architectures using a Transformer decoder to incorporate language modeling into character recognition. To effectively train our architectures, we offer algorithms for generating synthetic data adapted to the visual style of modern and historical documents. Finally, we propose strategies for learning with limited data and reducing prediction errors. Our architectures, combined with synthetic data and these strategies, achieve competitive error rates on lines of text from modern documents. For historical documents, they train effectively with minimal annotated data, surpassing state-of-the-art approaches. Remarkably, just 500 annotated lines are sufficient for character error rates close to 5%.
Raynaud, Tanguy. "Génération de questions à choix multiples thématiques à partir de bases de connaissances". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSES066.
Full text
The use of multiple-choice questions to assess knowledge is a reliable and widely used method, even in official contexts. Such a method offers many advantages, including equal marking across candidates and, more pragmatically, the possibility of automatic correction. With the emergence of MOOCs (courses delivered in a digital format), the need for automatic evaluation has increased. This thesis is part of this context, proposing a solution that enables automatic thematic question generation. The work presented in this thesis uses knowledge bases as data sources to automatically generate thematic multiple-choice questions. The use of knowledge bases in this context raises several scientific challenges that constitute the contributions of the presented work:
- Knowledge base entities are generally not explicitly correlated to themes. This thesis presents a method based on Wikipedia metadata to identify and sort knowledge base entities according to predefined themes.
- In order to be intelligible, a question must be grammatically correct and must include enough information to remove any ambiguity about the correct answer. To that end, we have introduced question templates to identify entities within knowledge bases and generate natural language statements.
- In multiple-choice questions, distractors (wrong answers) are no less important than the statement. Weak distractors are easily discarded and affect the difficulty of the whole question. In a last contribution, we present the method used to select distractors that are relevant not only to the question's statement but also to its context.
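A minimal sketch of similarity-based distractor selection (our illustration, not the thesis method): candidates sharing more attributes with the correct answer make more plausible wrong answers. The entities and attribute sets below are invented examples.

```python
# Toy distractor selection: rank candidate entities by attribute overlap
# (Jaccard similarity) with the correct answer, so distractors are
# plausible but wrong. All entities/attributes here are invented.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def pick_distractors(answer, candidates, attributes, n=2):
    ranked = sorted((c for c in candidates if c != answer),
                    key=lambda c: -jaccard(attributes[answer], attributes[c]))
    return ranked[:n]

attributes = {
    "Paris":  {"city", "capital", "europe", "france"},
    "Lyon":   {"city", "europe", "france"},
    "Berlin": {"city", "capital", "europe", "germany"},
    "Nile":   {"river", "africa"},
}
# Question: "Which city is the capital of France?"  Answer: Paris
print(pick_distractors("Paris", list(attributes), attributes))
# → ['Lyon', 'Berlin']
```

"Nile" is (rightly) never selected: with no shared attributes it would be too easy to discard, which is exactly the failure mode weak distractors introduce.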
Shimorina, Anastasia. "Natural Language Generation : From Data Creation to Evaluation via Modelling". Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0080.
Full text
Natural language generation is the process of generating natural language text from some input. This input can be texts, documents, images, tables, knowledge graphs, databases, dialogue acts, meaning representations, etc. Recent methods in natural language generation, mostly based on neural modelling, have yielded significant improvements in the field. Despite this recent success, numerous issues with generation prevail, such as faithfulness to the source, developing multilingual models, and few-shot generation. This thesis explores several facets of natural language generation, from creating training datasets and developing models to evaluating proposed methods and model outputs. In this thesis, we address the issue of multilinguality and propose possible strategies for semi-automatically translating corpora for data-to-text generation. We show that named entities constitute a major stumbling block in translation, exemplified by the English-Russian translation pair. We proceed to handle rare entities in data-to-text modelling by exploring two mechanisms: copying and delexicalisation. We demonstrate that rare entities strongly impact performance and that the impact of these two mechanisms greatly varies depending on how datasets are constructed. Returning to multilinguality, we also develop a modular approach for shallow surface realisation in several languages. Our approach splits the surface realisation task into three submodules: word ordering, morphological inflection, and contraction generation. We show, via delexicalisation, that the word ordering component mainly depends on syntactic information. Along with the modelling, we also propose a framework for error analysis, focused on word order, for the shallow surface realisation task. The framework makes it possible to provide linguistic insights into model performance at the sentence level and to identify patterns where models underperform.
Finally, we also touch upon the subject of evaluation design while assessing automatic and human metrics, highlighting the difference between sentence-level and system-level evaluation.
Hadjadj, Mohammed. "Modélisation de la Langue des Signes Française : Proposition d’un système à compositionalité sémantique". Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS560/document.
Full text
The recognition of French Sign Language (LSF) as a natural language in 2005 created an important need for the development of tools to make information accessible to the deaf public. With this prospect, this thesis aims at linguistic modelling for an LSF generation system. We first present the different linguistic approaches aimed at describing sign language (SL), and then the models proposed in computer science. In a second step, we propose an approach that takes the linguistic properties of SL into account while respecting the constraints of a formalisation process. By studying the links between semantic functions and the forms observed for them in LSF corpora, we identified several production rules. We finally present how these rules function as a system capable of modelling an entire utterance in LSF.
Pascual, Elsa. "Représentation de l'architecture textuelle et génération de texte". Toulouse 3, 1991. http://www.theses.fr/1991TOU30123.
Texto completoMickus, Timothee. "On the Status of Word Embeddings as Implementations of the Distributional Hypothesis". Electronic Thesis or Diss., Université de Lorraine, 2022. http://www.theses.fr/2022LORR0066.
Full text
This dissertation studies the status of word embeddings, i.e., vectors produced by NLP systems, insofar as they are relevant to linguistic studies. We focus more specifically on the relation between word embeddings and distributional semantics, the field of study based on the assumption that context correlates with meaning. We question whether word embeddings can be seen as a practical implementation of distributional semantics. Our first approach to this inquiry consists in comparing word embeddings to another representation of meaning, namely dictionary definitions. The assumption underlying this approach is that semantic representations from distinct formalisms should be equivalent, and therefore the information encoded in distributional semantic representations should be equivalent to that of definitions. We test this assumption using two distinct experimental protocols: the first is based on overall metric space similarity, the second relies on neural networks. In both cases, we find limited success, suggesting either that distributional semantics and dictionaries encode different information, or that word embeddings are not linguistically coherent representations of distributional semantics. The second angle we adopt to study the relation between word embeddings and distributional semantics consists in formalizing our expectations for distributional semantic representations and comparing these expectations to what we observe for word embeddings. We construct a dataset of human judgments on the distributional hypothesis, which we use to elicit predictions on distributional substitutability from word embeddings. While word embeddings attain some degree of performance on this task, their behavior and that of our human annotators are found to differ drastically.
Strengthening these results, we observe that a large family of broadly successful embedding models all exhibit artifacts imputable to the neural network architecture they use, rather than to any semantically meaningful factor. Our experiments suggest that, while we can formally delineate criteria we expect of distributional semantics models, the linguistic validity of word embeddings is not a solved problem. Three main conclusions emerge from our experiments. First, the diversity of studies in distributional semantics does not entail that no formal statements regarding this theory can be made: we saw that distributional substitutability provides a very convenient handle for the linguist to grasp. Second, that we cannot easily relate distributional semantics to another lexical semantic theory raises the question of whether the distributional hypothesis actually provides an alternative account of meaning, or whether it deals with a very distinct set of facts altogether. Third, while the gap in quality between practical implementations of distributional semantics and our expectations necessarily adds to the confusion, that we can make quantitative statements about this gap should be taken as a very encouraging sign for future research.
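The first experimental protocol mentioned above ("overall metric space similarity") can be sketched in a minimal, hypothetical form: compare two embedding spaces over the same vocabulary by correlating their pairwise cosine-similarity structures. The toy vectors, words, and the use of Pearson correlation below are illustrative assumptions, not the thesis's actual setup.

```python
# Minimal sketch, assuming made-up vectors: if two formalisms encode the same
# information, the similarity structures of their spaces should correlate.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def pairwise_sims(space):
    # Flatten the upper triangle of the cosine-similarity matrix, in a
    # vocabulary order shared by both spaces.
    words = sorted(space)
    return [cosine(space[w1], space[w2])
            for i, w1 in enumerate(words) for w2 in words[i + 1:]]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sqrt(sum((x - mx) ** 2 for x in xs))
    vy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

# Two hypothetical representations of the same three words, with different
# dimensionalities but similar neighborhood structure.
space_a = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "car": [0.0, 1.0]}
space_b = {"cat": [0.2, 1.0, 0.1], "dog": [0.1, 0.9, 0.3], "car": [1.0, 0.0, 0.0]}
print(round(pearson(pairwise_sims(space_a), pairwise_sims(space_b)), 3))
```

Dimensionality does not matter here, since only the similarity structures, not the raw vectors, are compared.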
Billiez, Jacqueline. "La génération du texte cendrarsien : poétique et sémiotique du fragmentaire". Grenoble 3, 1993. http://www.theses.fr/1993GRE39002.
This study reverses the usual strategy that privileged biographical or symbolic anecdotes. Thanks to a scriptural approach that connects word after word back to its production process, it shows that the chain of often fantastic stories credited to a prodigious imagination also has a metatextual dimension. Cendrars' writing, an area of confrontation between the logics of meaning and the logics of text, often chooses as its narrative object the operations required by the act of writing and the act of reading, either masked by a stock of themes in which the warlike and the mystical are prominent, or conducting a conflictual dialogue with another writer's text. Cendrars' text, fragmented and re-written, can only be reconstructed by a translinear reading informed by a theory of text that defeats the effect of representation. A given fragment may offer a reflection of the system without that reflection being perceptible in such a restricted space: the defensive strategy of the text. Conversely, cross-references to a limited and ever-recycled lexicon make it possible to reconstitute the system: the offensive strategy of the reader. Such an approach changes the writer's class. A traveller on books rather than trains, equipped with a cutting rather than cut hand, Cendrars the poet chisels the words of the dictionary and builds on his typewriter a vast network of which the reader is a part.
Cavalier, Arthur. "Génération procédurale de textures pour enrichir les détails surfaciques". Thesis, Limoges, 2019. http://www.theses.fr/2019LIMO0108.
With the increasing power of consumer machines, Computer Graphics offers us the opportunity to immerse ourselves in ever more detailed virtual worlds. Artists are thus tasked with modelling and animating these complex virtual scenes. This leads to prohibitive authoring times, a larger memory cost and difficulties in correctly and efficiently rendering this abundance of details. Many tools for procedural content generation have been proposed to resolve these issues. In this thesis, we focused our work on on-the-fly generation of mesoscopic details in order to easily add tiny details to 3D mesh surfaces. Focusing on procedural texture synthesis, we propose improvements in order to correctly render textures that not only modify the surface color but also fake the surface meso-geometry in real time. We present a methodology for rendering high-quality textures without aliasing issues for controllable structured pattern synthesis. We also propose on-the-fly normal map generation to perturb the shading computation and add irregularities and relief to the textured surface.
Di, Cristo Philippe. "Génération automatique de la prosodie pour la synthèse à partir du texte". Aix-Marseille 1, 1998. http://www.theses.fr/1998AIX11050.
Kou, Huaizhong. "Génération d'adaptateurs web intelligents à l'aide de techniques de fouilles de texte". Versailles-St Quentin en Yvelines, 2003. http://www.theses.fr/2003VERS0011.
This thesis defines a system framework for semantically integrating Web information, called SEWISE. It can integrate text information from various Web sources belonging to an application domain into a common domain-specific concept ontology. In SEWISE, Web wrappers are built around different Web sites to automatically extract the information of interest from them. Text mining technologies are then used to discover the semantics that Web documents talk about. SEWISE can ease topic-oriented information retrieval over the Web. Three problems related to document categorization are studied. First, we investigate approaches to feature selection and propose two approaches, CBA and IBA, to select features. To estimate statistical term associations and integrate them within a document similarity model, a mathematical model is proposed. Finally, the category score calculation algorithms used by k-NN classifiers are studied, and two weighted algorithms, CBW and IBW, to calculate category scores are proposed.
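The category score calculation that the CBW and IBW algorithms refine can be illustrated with a plain similarity-weighted k-NN vote; the exact weighting schemes are specific to the thesis and not reproduced here, and the function name below is an assumption.

```python
# Illustrative sketch only: each of the k most similar training documents
# votes for its category with a weight equal to its similarity score.
from collections import defaultdict

def category_scores(neighbors, k=3):
    """neighbors: list of (similarity, category) pairs for a test document."""
    top = sorted(neighbors, key=lambda t: t[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, cat in top:
        scores[cat] += sim  # similarity-weighted vote
    return dict(scores)

neighbors = [(0.9, "sports"), (0.7, "sports"), (0.6, "politics"), (0.2, "sports")]
scores = category_scores(neighbors, k=3)
print(max(scores, key=scores.get))  # highest-scoring category wins
```

Unweighted k-NN would instead add 1 per neighbor; weighting by similarity lets close neighbors dominate distant ones.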
Pouchot, Stéphanie. "L'analyse de corpus et la génération automatique de texte : méthodes et usages". Grenoble 3, 2003. http://www.theses.fr/2003GRE39006.
Dischler, Jean-Michel. "La génération de textures 3D et de textures a microstructure complexe pour la synthese d'images". Université Louis Pasteur (Strasbourg) (1971-2008), 1996. http://www.theses.fr/1996STR13015.
Andriamarozakaniaina, Tahiry. "Du texte à la génération d'environnements virtuels 3D : application à la scénographie théâtrale". Phd thesis, Université Toulouse le Mirail - Toulouse II, 2012. http://tel.archives-ouvertes.fr/tel-00772129.
Alaa, eddine Jalal. "Technique de caractérisation de textiles nouvelle génération pour le blindage électromagnétique". Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALT064.
Electromagnetic shielding consists in reducing the electromagnetic field in the vicinity of an object by interposing a barrier between the source of the field and the object to be protected. The objective of the thesis is the development of techniques for characterizing and predicting the performance of new-generation shielding based on textiles and polymers for avionics applications between DC and 1 GHz. It takes place within the framework of the FUI NextGen project, which belongs to the field of aeronautical wiring protection. The thesis is composed of two main parts: 1) the characterization of the shielding effectiveness from tests on plane samples (solid material or metal braid), 2) the transfer impedance measurement for sheaths. In the literature, the measurement of shielding effectiveness (SE) is generally obtained either in free space or in a reverberating or anechoic chamber. In these methods, the material under test is placed between two antennas. The value of the measured shielding effectiveness depends on the operating band of the antennas, which starts at a few MHz and cannot reach low frequencies in the band of a few kHz. Shielding effectiveness is obtained by measuring the attenuation of the electromagnetic field through a material relative to the transmission of the field without the material. Here, the shielding effectiveness measurement is made with a coaxial cell, taking into account all the parameters that affect the measurement, such as electrical contact and measurement dynamics, allowing the shielding effectiveness to be measured from DC to 1 GHz, which suits the demand in the aeronautical field. These measurements are validated with theoretical modeling and electromagnetic simulations. Concerning the sheath transfer impedance, its measurement is carried out with a triaxial cell.
The transfer impedance (Z_t) is defined as the quotient of the voltage (V1) induced in the internal circuit by the current (I2) introduced into the external circuit, over a given coupling length. The current standard for transfer impedance measurement is defined up to 100 MHz, while in our developed cell the measurement reaches 300 MHz without adaptation. Finally, to make the link between the shielding effectiveness and the transfer impedance, a model for predicting the transfer impedance from the measurement of the shielding effectiveness is proposed.
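The stated definition lends itself to a one-line computation, Z_t = V1 / (I2 × L), with units of ohms per metre. A minimal sketch follows; the numeric values and the function name are illustrative assumptions, not measurements from the thesis.

```python
# Transfer impedance per the definition above: induced inner-circuit voltage
# divided by injected outer-circuit current, normalised by coupling length.

def transfer_impedance(v1_volts, i2_amps, coupling_length_m):
    return v1_volts / (i2_amps * coupling_length_m)  # ohms per metre

# Hypothetical reading: 2 mV induced by a 1 A injection over a 0.5 m coupling
# length gives 4 milliohms per metre.
zt = transfer_impedance(v1_volts=2e-3, i2_amps=1.0, coupling_length_m=0.5)
print(zt)
```

A low Z_t over the frequency band of interest is what indicates a well-performing cable screen.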
Salmon, Raphael. "Natural language generation using abstract categorial grammars". Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCC196/document.
This thesis explores the use of Abstract Categorial Grammars (ACG) for Natural Language Generation (NLG) in an industrial context. While NLG systems based on linguistic theories have a long history, they are not prominent in industry, which, for the sake of simplicity and efficiency, usually prefers more "pragmatic" methods. This study shows that recent advances in computational linguistics make it possible to reconcile the requirements of soundness and efficiency, by using ACG to build the main elements of a production-grade NLG framework (document planner and microplanner), with performance comparable to existing, less advanced methods used in industry.
Aslanides, Sophie. "Syntaxe et structure d'un texte : les connecteurs du français dans un système de génération automatique". Paris 7, 1995. http://www.theses.fr/1995PA070081.
This study aims at defining the content and structure of the linguistic databases of an NLG system. More precisely, it concentrates on the lexical encoding of cue phrases, in which we include the full stop, complex verb phrases, relativization and participles, and on the evaluation of the potential ambiguities of a complex discourse structure. As demonstrated by Danlos (1985), the relevant item for lexical choice is not the connective by itself, but a set of constraints attached to it (henceforth, discourse structure, or DS). To define the relevant DSs for a given semantic relation, a thorough analysis of the linguistic properties of cue phrases is required, and more specifically, the determination of differential syntactic properties that reflect semantic variation. Once the DS families are defined, i.e. all the possible DSs built around a given cue phrase, they are organised in a hierarchy which can serve as an interface between the conceptual level and the lexicon. But the ambiguities of complex discourse structures are thus only partly controlled. We therefore study the possible scope ambiguities in P1 C1 P2 C2 P3 discourses, and show the various factors which interfere with the choice of cue phrases to create ambiguity (subordinate clause moving, ellipsis, pronominalisation, causal inference). The last part of this work proposes a TAG-inspired tree representation for elementary DSs and discusses the linguistic relevance of possible representations of complex DSs as tree structures.
Ulysse, Jean-Christophe. "Génération d'atlas de textures de radiosité pour le rendu réaliste en temps réel". Vandoeuvre-les-Nancy, INPL, 2003. http://www.theses.fr/2003INPL052N.
Radiosity is a method that is both physically correct and efficient for simulating global illumination. However, huge and complex CAD models are difficult to visualise when simulated by radiosity. In this thesis, we present an approach to optimise the visualisation of such illuminated models. It is based on texture atlases that store illumination in textures, which are mapped onto models in real time by graphics hardware. Our contributions are: (1) a robust approach to generate atlases and (2) a method to efficiently build those textures from global illumination data, both well suited to complex faceted models. The results presented are applied to architectural models and models from the design industry. They show that, while this approach is simple to integrate into an existing rendering software, it allows real-time visualisation of this kind of illuminated models.
Paranthoën, Thomas. "Génération aléatoire et structure des automates à états finis". Rouen, 2004. http://www.theses.fr/2004ROUES032.
Random generation of combinatorial structures allows one to test algorithms based on these structures and to investigate their behavior. In the case of deterministic automata, we give generation algorithms that allow us to build these objects over any alphabet. We show that almost all complete accessible deterministic automata are minimal. In the case of nondeterministic automata, we establish a probabilistic generation protocol that maximises the determinism of the automata associated with these nondeterministic automata. Finally, we continue progress in the use of determinization for the pattern-matching problem. We formalize the technique of partial determinization and establish a data structure, the deterministic cover, which allows one to manipulate non-deterministic automata and to derive their properties. We deduce from this structure a technique that reduces the complexity of the classical brute-force determinization algorithm.
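A minimal sketch of the kind of object involved: uniformly sampling the transition table of a complete deterministic automaton over a given alphabet. The thesis's algorithms (e.g. for sampling accessible automata) are more involved; the names and the 0.5 final-state probability here are assumptions for illustration.

```python
# Sample a random complete DFA: every (state, letter) pair gets exactly one
# target state, which is what "complete deterministic" means.
import random

def random_complete_dfa(n_states, alphabet, seed=None):
    rng = random.Random(seed)
    delta = {(q, a): rng.randrange(n_states)
             for q in range(n_states) for a in alphabet}
    finals = {q for q in range(n_states) if rng.random() < 0.5}
    return delta, finals

def run(delta, word, start=0):
    q = start
    for a in word:
        q = delta[(q, a)]  # completeness guarantees this lookup never fails
    return q

delta, finals = random_complete_dfa(4, "ab", seed=0)
print(len(delta))  # 4 states x 2 letters = 8 transitions
```

Sampling the transition table uniformly does not yield a uniform distribution over accessible automata, which is precisely the kind of issue dedicated generation algorithms address.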
Tournemire, Stéphanie de. "Identification et génération automatique de contours prosodiques pour la synthèse vocale à partir du texte en français". Paris, ENST, 1998. http://www.theses.fr/1998ENST0017.
Charton, Eric. "Génération de phrases multilingues par apprentissage automatique de modèles de phrases". Phd thesis, Université d'Avignon, 2010. http://tel.archives-ouvertes.fr/tel-00622561.
Texto completoNarayan, Shashi. "Generating and simplifying sentences". Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0166/document.
Depending on the input representation, this dissertation investigates issues from two classes: meaning representation (MR) to text and text-to-text generation. In the first class (MR-to-text generation, "Generating Sentences"), we investigate how to make symbolic grammar based surface realisation robust and efficient. We propose an efficient approach to surface realisation using an FB-LTAG and taking shallow dependency trees as input. Our algorithm combines techniques and ideas from the head-driven and lexicalist approaches. In addition, the input structure is used to filter the initial search space using a concept called local polarity filtering, and to parallelise processes. To further improve robustness, we propose two error mining algorithms: first, an algorithm for mining dependency trees rather than sequential data, and second, an algorithm that structures the output of error mining into a tree to represent it in a more meaningful way. We show that our realisers, together with these error mining algorithms, improve on both efficiency and coverage by a wide margin. In the second class (text-to-text generation, "Simplifying Sentences"), we argue for using deep semantic representations (compared to syntax or SMT based approaches) to improve the sentence simplification task. We use Discourse Representation Structures for the deep semantic representation of the input. We propose two methods: a supervised approach (with state-of-the-art results) to hybrid simplification using deep semantics and SMT, and an unsupervised approach (with results competitive with state-of-the-art systems) to simplification using the comparable Wikipedia corpus.
Pichon, Noémie. "Méthode de génération de données d’inventaire du génie des procédés textiles : contribution à l’écoconception des vêtements". Electronic Thesis or Diss., Centrale Lille Institut, 2023. http://www.theses.fr/2023CLIL0039.
The fashion and textile industry is a complex, highly fragmented, and globalized value chain, requiring a wide range of professions with specific expertise, and a highly heterogeneous level of knowledge regarding the sector's environmental burdens. Given that climate and environmental issues have never been so high on the agenda, scientific literature has been growing in recent years to assess the environmental and human health impacts of this sector, which has been identified as the fourth most polluting industry in Europe, all impact categories combined. The eco-design of products is today a central approach to achieving the sector's impact reduction targets. The challenge today is to extend its use to as many players as possible. The main aim of this research was to develop a method for generating textile Life Cycle Inventory (LCI) data, in order to promote eco-design and continuous improvement in the production stage of a garment's life cycle. The research work was carried out at the finest scale of textile process engineering, i.e. at the unit process scale. An illustration of this method for a specific transformation stage in textile engineering, from fiber to yarn, also known as spinning, was therefore carried out, including the calculation of uncertainties. Finally, the analysis of the contributions to the results highlighted eco-design levers.
Charton, Éric. "Génération de phrases multilingues par apprentissage automatique de modèles de phrases". Thesis, Avignon, 2010. http://www.theses.fr/2010AVIG0175/document.
Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system. In this thesis report, we present an NLG system architecture relying on statistical methods. The originality of our proposal is its ability to use a corpus as a learning resource for sentence production. This method offers several advantages: it simplifies the implementation and design of a multilingual NLG system capable of producing sentences with the same meaning in several languages. Our method also improves the adaptability of an NLG system to a particular semantic field. In our proposal, sentence generation is achieved through the use of sentence models obtained from a training corpus. Extracted sentences are abstracted by a labelling step based on various information extraction and text mining methods such as named entity recognition, co-reference resolution, semantic labelling and part-of-speech tagging. The sentence generation process is carried out by a sentence realisation module. This module provides a sentence model adapted to fit a communicative intent, and then transforms this model to generate a new sentence. Two methods are proposed to transform a sentence model into a generated sentence, according to the semantic content to express. In this document, we describe the complete labelling system applied to encyclopaedic content to obtain the sentence models. Then we present two models of sentence generation. The first model substitutes the new semantic content for the original sentence content. The second model is used to find numerous proto-sentences, structured as Subject, Verb, Object, each able to fit part of a whole communicative intent, and then aggregates all the selected proto-sentences into a more complex one. Our experiments in sentence generation with various configurations of our system have shown that this new approach to NLG has interesting potential.
El, Jed Olfa. "WebSum : système de résumé automatique de réponses des moteurs de recherche". Toulouse 3, 2006. http://www.theses.fr/2006TOU30145.
This thesis lies within the general framework of information retrieval and, more precisely, within the framework of web document classification and organization. Our objective is to develop a system for automatic summarizing of search engine answers in the encyclopaedic style (WebSum). This type of summary aims at classifying the search engine answers according to the various topics, or what we call in our work facets, of the user query. To achieve this objective, we propose: - a method for identifying the facets of a given query based on the generative lexicon; - an approach for classifying the search engine answers under these various facets; - and a method for evaluating the relevance of the web pages.
Solanki, Jigar. "Approche générative conjointe logicielle-matérielle au développement du support protocolaire d’applications réseaux". Thesis, Bordeaux, 2014. http://www.theses.fr/2014BORD0301/document.
Communication between network applications is achieved by using rulesets known as protocols. Protocol messages are managed by an application layer known as the protocol parsing layer or protocol handling layer. Protocol parsers are coded in software, in hardware or based on a co-design approach. They represent the interface between the application logic and the outside world. Thus, they are critical components of network applications. The global performance of network applications is directly linked to the performance of their protocol parser layers. Developing protocol parsers consists of translating protocol specifications, written in a high-level language such as ABNF, into low-level software or hardware code. As the use of embedded systems grows, hardware resources become more and more available to applications on systems on chip (SoC). Nonetheless, developing a network application that uses hardware resources is challenging, requiring not only expertise in hardware design, but also knowledge of the protocols involved and an understanding of low-level network programming. This thesis proposes a generative hardware-software co-design based approach to the development of network protocol message parsers, to improve their performance without increasing the expertise the developer may need. Our approach is based on a dedicated language, called Zebra, that generates both the hardware and software elements that compose protocol parsers. The necessary expertise is deported into the use of the Zebra language, and the generated hardware components improve global performance. The contributions of this thesis are as follows: we provide an analysis of network protocols and applications, which allows us to detect the elements whose performance can be improved using hardware resources. We present the domain-specific language Zebra for describing protocol handling layers. Software and hardware components are then generated according to Zebra specifications. We have built a SoC running a Linux operating system to assess our approach. We have designed hardware accelerators for different network protocols that are deployed and driven by applications. To increase the sharing of parsing units between several tasks, we have developed a middleware that seamlessly manages all accesses to the hardware components. The Zebra middleware allows several clients to access the resources of a hardware accelerator. We have conducted several sets of experiments in real conditions and compared the performance of our approach with that of well-known protocol handling layers. We observe that protocol handling layers based on our approach are more efficient than existing approaches.
Chen, Yong. "Analyse et interprétation d'images à l'usage des personnes non-voyantes : application à la génération automatique d'images en relief à partir d'équipements banalisés". Thesis, Paris 8, 2015. http://www.theses.fr/2015PA080046/document.
Visual information is a very rich source of information to which blind and visually impaired (BVI) people do not always have access. The presence of images is a real handicap for the BVI. Transcription into an embossed image may increase the accessibility of an image to the BVI. Our work takes into account the aspects of tactile cognition and the rules and recommendations for the design of an embossed image. We focused our work on the analysis and comparison of digital image processing techniques in order to find suitable methods for creating an automatic procedure for embossing images. At the end of this research, we tested the embossed images created by our system with blind users. In the tests, two important points were evaluated: the degree of understanding of an embossed image, and the time required for exploration. The results suggest that the images made by this system are accessible to blind users who know braille. The implemented system can be regarded as an effective tool for the creation of an embossed image. It offers an opportunity to generalize and formalize the procedure for creating an embossed image, and gives a very quick and easy solution. The system can process pedagogical images with simplified semantic contents. It can be used as a practical tool for making digital images accessible. It also offers the possibility of cooperation with other modalities of presenting the image to blind people, for example a traditional interactive map.
Bourreau, Pierre. "Jeux de typage et analyse de lambda-grammaires non-contextuelles". Phd thesis, Université Sciences et Technologies - Bordeaux I, 2012. http://tel.archives-ouvertes.fr/tel-00733964.
Vaillant, Pascal. "Interaction entre modalités sémiotiques : de l'icône à la langue". Phd thesis, Université Paris Sud - Paris XI, 1997. http://tel.archives-ouvertes.fr/tel-00327266.
Texto completoMax, Aurélien. "De la création de documents normalisés à la normalisation de documents en domaine contraint". Grenoble 1, 2003. http://www.theses.fr/2003GRE10227.
Well-formedness conditions on documents in constrained domains are often hard to apply. An active research trend approaches the authoring of normalized documents through semantic specification, thereby facilitating applications such as multilingual production. However, current systems are not able to analyse an existing document in order to normalize it. We therefore propose an approach that reuses the resources of such systems to recreate the semantic content of a document, from which a normalized textual version can be generated. This approach is based on two main paradigms: fuzzy inverted generation, which heuristically finds candidate semantic representations, and interactive negotiation, which allows an expert of the domain to progressively validate the semantic representation that corresponds to the original document.
Dufour-Lussier, Valmi. "Reasoning with qualitative spatial and temporal textual cases". Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0182/document.
This thesis proposes a practical model making it possible to implement a case-based reasoning system that adapts processes represented as natural language text in response to user queries. While the cases and the solutions are in textual form, the adaptation itself is performed on networks of temporal constraints expressed with a qualitative algebra, using a belief revision operator. Natural language processing methods are used to acquire case representations and to regenerate text based on the adaptation result.
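The qualitative-algebra machinery behind such constraint networks can be illustrated with the simpler point algebra ({<, =, >}) rather than the richer interval algebra typically used for processes: a constraint is a set of allowed relations between two time points, and composition propagates constraints along chains. This is an illustrative sketch, not the thesis's actual model.

```python
# Point algebra composition table: given r1 between A and B and r2 between
# B and C, COMP yields the relations possible between A and C.
COMP = {
    ("<", "<"): {"<"}, ("<", "="): {"<"}, ("<", ">"): {"<", "=", ">"},
    ("=", "<"): {"<"}, ("=", "="): {"="}, ("=", ">"): {">"},
    (">", "<"): {"<", "=", ">"}, (">", "="): {">"}, (">", ">"): {">"},
}

def compose(r1, r2):
    """Compose two constraints, each a set of allowed point relations."""
    out = set()
    for a in r1:
        for b in r2:
            out |= COMP[(a, b)]
    return out

# If "cook" ends before "serve" starts, and "serve" before "eat",
# then "cook" is necessarily before "eat":
print(compose({"<"}, {"<"}))
```

A path-consistency algorithm repeatedly intersects each network edge with the compositions along two-edge paths; an empty intersection signals an inconsistent (non-adaptable) set of constraints.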
Chikhi, Nacim Fateh. "Calcul de centralité et identification de structures de communautés dans les graphes de documents". Phd thesis, Université Paul Sabatier - Toulouse III, 2010. http://tel.archives-ouvertes.fr/tel-00619177.
Gzawi, Mahmoud. "Désambiguïsation de l’arabe écrit et interprétation sémantique". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE2006.
This thesis lies at the frontier between linguistic research and automatic language processing. These two fields intersect in the construction of natural language processing tools and industrial applications integrating solutions for the disambiguation and interpretation of texts. A challenging task, briefly approached and applied, arose in the work of the Techlimed company: the automatic analysis of texts written in Arabic. Novel resources have emerged, such as language lexicons and semantic networks, allowing the creation of formal grammars to accomplish this task. An important piece of metadata for text analysis is "what is being said, and what does it mean". The field of computational linguistics offers very diverse and mostly partial methods to allow the computer to answer such questions. The main purpose of this thesis is to introduce and apply the rules of descriptive language grammar in formal languages specific to computer language processing. Beyond the realization of a system for processing and interpreting texts in Arabic based on computer modeling, our interest was devoted to the evaluation of the linguistic phenomena described in the literature and the methods for formalizing them in computer science. In all cases, our research was tested and validated in a rigorous experimental framework around several formalisms and computer tools. The experiments on the contribution of syntactico-semantic grammar demonstrated a significant reduction of linguistic ambiguity when using a finite-state grammar written in Java and a transformational generative grammar written in Prolog, integrating morphological, syntactic and semantic components. The implementation of our study required the construction of word processing and information retrieval tools. These tools were built by us and are available as open source. Applying our work at large scale proved to require rich and comprehensive semantic resources, so our work was redirected towards a process of producing such resources, in terms of information retrieval and knowledge extraction. The tests for this new perspective were favorable to further research and experimentation.
Landes, Pierre-Edouard. "Extraction d'information pour l'édition et la synthèse par l'exemple en rendu expressif". Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00637651.
Texto completoDufour-Lussier, Valmi. "Reasoning with qualitative spatial and temporal textual cases". Electronic Thesis or Diss., Université de Lorraine, 2014. http://www.theses.fr/2014LORR0182.
Texto completoThis thesis proposes a practical model making it possible to implement a case-based reasoning system that adapts processes represented as natural language text in response to user queries. While the cases and the solutions are in textual form, the adaptation itself is performed on networks of temporal constraints expressed with a qualitative algebra, using a belief revision operator. Natural language processing methods are used to acquire case representations and to regenerate text based on the adaptation result