Dissertations / Theses on the topic 'Simplification Automatique des Textes'
Below are the top 50 dissertations and theses on the topic 'Simplification Automatique des Textes', with abstracts included where available in the metadata.
Hijazi, Rita. "Simplification syntaxique de textes à base de représentations sémantiques exprimées avec le formalisme Dependency Minimal Recursion Semantics (DMRS)." Electronic Thesis or Diss., Aix-Marseille, 2022. http://theses.univ-amu.fr.lama.univ-amu.fr/221214_HIJAZI_602vzfxdu139bxtesm225byk629aeqyvw_TH.pdf.
Text simplification is the task of making a text easier to read and understand and more accessible to a target audience. This goal can be reached by reducing the linguistic complexity of the text while preserving the original meaning as much as possible. This thesis focuses on the syntactic simplification of texts in English, a task for which existing automatic systems have certain limitations. To overcome them, we first propose a new method of syntactic simplification exploiting semantic dependencies expressed in DMRS (Dependency Minimal Recursion Semantics), a deep semantic representation in the form of graphs combining semantics and syntax. The method represents a complex sentence as a DMRS graph, transforms this graph according to specific strategies into other DMRS graphs, and generates simpler sentences from them. It handles the syntactic simplification of complex constructions, in particular splitting operations on subordinate clauses, appositive clauses and coordination, as well as the transformation of passive forms into active forms. The results obtained by this syntactic simplification system surpass those of existing systems of the same type in producing simple, grammatical sentences that preserve the meaning, demonstrating the value of our approach to syntactic simplification based on semantic representations in DMRS.
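As an illustration of the graph-splitting idea described in this abstract, the sketch below divides a toy semantic graph at a coordination node into one subgraph per conjunct. The mini graph format, predicate names and role labels are simplified assumptions for the example, not the actual DMRS formalism or the thesis's transformation strategies.

```python
# Toy illustration of graph-based syntactic splitting: a "complex"
# sentence graph containing a coordination node is divided into two
# simpler graphs, each of which would be realised as its own sentence.
# This is a hypothetical mini-format, not real DMRS.

def reachable(graph, start):
    """All nodes reachable from `start` along outgoing edges."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend(t for (s, _), t in graph["edges"].items() if s == n)
    return seen

def split_coordination(graph):
    """Split a graph at a coordination node into one graph per conjunct."""
    coord = next((n for n, d in graph["nodes"].items()
                  if d["pred"] == "_and_c"), None)
    if coord is None:
        return [graph]  # nothing to simplify
    subgraphs = []
    for role in ("L-INDEX", "R-INDEX"):  # left/right conjunct links
        head = graph["edges"][(coord, role)]
        keep = reachable(graph, head)
        subgraphs.append({
            "nodes": {n: graph["nodes"][n] for n in keep},
            "edges": {e: t for e, t in graph["edges"].items()
                      if e[0] in keep and t in keep},
        })
    return subgraphs

# "Kim slept and Lee read." -> two conjunct subgraphs
g = {
    "nodes": {1: {"pred": "_and_c"}, 2: {"pred": "_sleep_v"},
              3: {"pred": "_kim_n"}, 4: {"pred": "_read_v"},
              5: {"pred": "_lee_n"}},
    "edges": {(1, "L-INDEX"): 2, (1, "R-INDEX"): 4,
              (2, "ARG1"): 3, (4, "ARG1"): 5},
}
print(len(split_coordination(g)))  # 2
```

Each resulting subgraph would then be handed to a surface realiser to produce a simpler sentence.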
Tremblay, Christian. "L'apport de la modélisation des connaissances à la codification et à la simplification des textes normatifs : Analyse sémantico-syntaxique des textes normatifs ou la linguistique générale au service du droit." Paris 2, 2002. http://www.theses.fr/2002PA020115.
Farzindar, Atefeh. "Résumé automatique de textes juridiques." Paris 4, 2005. http://www.theses.fr/2005PA040032.
We have developed a summarization system, called LetSum, for producing short summaries of legal decisions, in collaboration with the lawyers of the Public Law Research Center of Université de Montréal. Our method is based on a manual analysis of judgments, comparing manually written summaries with source documents, and investigates the extraction of the most important units based on the identification of the thematic structure of the document. The production of the summary is done in four steps: 1. Thematic segmentation detects the thematic structure of a judgment. We distinguish seven themes: Decision data (gives the complete reference of the decision and the relation between the parties), Introduction (who? did what? to whom?), Context (recomposes the story from the facts and events), Submission (presents the point of view of the parties), Issues (identifies the questions of law), Juridical Analysis (describes the analysis of the judge), and Conclusion (the final decision of the court). 2. Filtering identifies parts of the text which can be eliminated without losing information relevant to the summary, such as citations. 3. Selection builds a list of the best candidate units for each structural level of the summary. 4. Production chooses the units for the final summary and combines them to produce a summary of about 10% of the judgment. Evaluations of 120 summaries by 12 lawyers show the quality of the summaries produced by LetSum, which are judged excellent.
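A schematic skeleton of the four-step pipeline described above might look as follows. The theme labels and the 10% length budget come from the abstract; the segmentation, filtering and scoring helpers are hypothetical stand-ins for LetSum's actual components.

```python
# Sketch of a LetSum-style extractive pipeline (steps 1-4 from the
# abstract). The helper functions are invented placeholders.
THEMES = ["decision data", "introduction", "context", "submission",
          "issues", "juridical analysis", "conclusion"]

def summarize(judgment_sentences, ratio=0.10):
    segmented = [(s, guess_theme(s)) for s in judgment_sentences]   # 1. thematic segmentation
    kept = [(s, t) for s, t in segmented if not is_citation(s)]     # 2. filtering
    ranked = sorted(kept, key=lambda st: score(*st), reverse=True)  # 3. selection
    budget = max(1, int(ratio * len(judgment_sentences)))           # 4. production (~10%)
    chosen = {s for s, _ in ranked[:budget]}
    return [s for s in judgment_sentences if s in chosen]           # restore source order

# Hypothetical helpers: a real system needs genuine implementations.
def guess_theme(sentence):   return THEMES[len(sentence) % len(THEMES)]
def is_citation(sentence):   return sentence.lstrip().startswith('"')
def score(sentence, theme):  return len(sentence) * (2 if theme == "conclusion" else 1)

print(summarize(["The court held that the appeal fails.",
                 '"cited passage"', "Costs are awarded."]))
```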
Narayan, Shashi. "Generating and simplifying sentences." Thesis, Université de Lorraine, 2014. http://www.theses.fr/2014LORR0166/document.
Depending on the input representation, this dissertation investigates issues from two classes: meaning representation (MR) to text and text-to-text generation. In the first class (MR-to-text generation, "Generating Sentences"), we investigate how to make symbolic grammar-based surface realisation robust and efficient. We propose an efficient approach to surface realisation using an FB-LTAG grammar and taking shallow dependency trees as input. Our algorithm combines techniques and ideas from the head-driven and lexicalist approaches. In addition, the input structure is used to filter the initial search space using a concept called local polarity filtering, and to parallelise processes. To further improve robustness, we propose two error mining algorithms: one for mining dependency trees rather than sequential data, and one that structures the output of error mining into a tree so as to represent it in a more meaningful way. We show that our realisers together with these error mining algorithms improve both efficiency and coverage by a wide margin. In the second class (text-to-text generation, "Simplifying Sentences"), we argue for using deep semantic representations (as opposed to syntax- or SMT-based approaches) to improve the sentence simplification task. We use Discourse Representation Structures as the deep semantic representation of the input. We propose two methods: a supervised approach (with state-of-the-art results) to hybrid simplification using deep semantics and SMT, and an unsupervised approach (with results competitive with state-of-the-art systems) to simplification using the comparable Wikipedia corpus.
Nakamura-Delloye, Yayoi. "Alignement automatique de textes parallèles Français-Japonais." Phd thesis, Université Paris-Diderot - Paris VII, 2007. http://tel.archives-ouvertes.fr/tel-00266261.
This thesis consists of two types of work: introductory work and the work forming its core. The latter is structured around the notion of the syntactic clause.
The introductory work comprises a survey of alignment in general as well as work devoted to sentence alignment. This work led to the construction of a sentence alignment system adapted to the processing of French and Japanese texts.
The core of the thesis is composed of two types of work: linguistic studies and software implementations. The linguistic studies are themselves divided into two topics: the clause in French and the clause in Japanese. The goal of our study of the French clause is to define a grammar for clause detection. To this end, we sought to define a typology of clauses based solely on formal criteria. In the study of Japanese, we first define the Japanese sentence on the basis of the theme-rheme opposition, and then attempt to elucidate the notion of clause.
The implementations comprise three tasks which together make up the clause alignment operation, embodied in three separate software systems: two clause detectors (one for French and one for Japanese) and a clause alignment system.
Nakamura-Delloye, Yayoi. "Alignement automatique de textes parallèles français - japonais." Paris 7, 2007. http://www.theses.fr/2007PA070054.
Automatic alignment aims to match elements of parallel texts. We are especially interested in the implementation of a system which carries out alignment at the clause level, the clause being a useful linguistic unit for many applications. This thesis consists of two types of work: the introductory works and those that constitute the thesis core. It is structured around the concept of the syntactic clause. The introductory works include an overview of alignment and studies on sentence alignment. These works resulted in the creation of a sentence alignment system adapted to French and Japanese text processing. The thesis core consists of two types of work: linguistic studies and implementations. The linguistic studies are themselves divided into two topics: the French clause and the Japanese clause. The goal of our French clause studies is to define a grammar for clause identification. For this purpose, we attempted to define a typological classification of clauses based on formal criteria only. In the Japanese studies, we first define the Japanese sentence on the basis of the theme-rheme structure. We then try to elucidate the notion of clause. The implementation works consist of three tasks which together constitute the clause alignment processing. These tasks are carried out by three separate tools: two clause identification systems (one for French texts and one for Japanese texts) and a clause alignment system.
Feat, Jym. "Paramètres énonciatifs et compréhension automatique de textes." Paris 6, 1986. http://www.theses.fr/1986PA066553.
Jalam, Radwan. "Apprentissage automatique et catégorisation de textes multilingues." Lyon 2, 2003. http://theses.univ-lyon2.fr/documents/lyon2/2003/jalam_r.
Feat, Jym. "Paramètres énonciatifs et compréhension automatique de textes." Grenoble 2 : ANRT, 1986. http://catalogue.bnf.fr/ark:/12148/cb37597622q.
Jalam, Radwan, and Jean-Hugues Chauchat. "Apprentissage automatique et catégorisation de textes multilingues." Lyon : Université Lumière Lyon 2, 2003. http://demeter.univ-lyon2.fr/sdx/theses/lyon2/2003/jalam_r.
Garneau, Cyril. "Simplification automatique de modèle et étude du régime permanent." Master's thesis, Université Laval, 2009. http://hdl.handle.net/20.500.11794/21802.
Mathematical models used to simulate the behaviour of wastewater treatment plants are a powerful tool for designing a new installation or predicting the behaviour of an existing plant. However, these models provide no information about a particular system without an algorithm to solve them. A large number of integration algorithms currently exist that can compute the solution of a model accurately, but the computation times involved remain one of the obstacles to the extensive use of models. Two approaches can reduce computation times: using more powerful hardware, or developing more efficient software and algorithms. The main objective of this thesis is to propose a third way, namely the automatic simplification of a model on the basis of its eigenvalues. The Jacobian, a local approximation of the model, is used as the basis of the eigenvalue analysis. A homotopy method is then used to maintain the link between eigenvalues and state variables, from a Jacobian simplified to its diagonal alone to the eigenvalues of the full Jacobian. Since the eigenvalues are a valid approximation of the dynamics of a model's state variables, the state variables can be sorted on the basis of their associated eigenvalues. State variables whose dynamics are very fast relative to the time scale of interest are then considered to be always at equilibrium, which makes it possible to neglect their transient dynamics and thus to speed up the solution of the model. This simplification is carried out inside a Diagonally Implicit Runge-Kutta integration algorithm capable of solving systems of differential and algebraic equations. This thesis also tackles a particular case of simulation, the computation of steady state. This computation can be performed by efficient algorithms that search only for the values of the state variables that set the differential equations to zero. Such algorithms are, however, unreliable, since any mathematical solution is considered valid regardless of physical reality. The proposed solution is to inject knowledge in the form of bounds on the values that the state variables may take; implicit algebraic equations are automatically constructed on these bounds to force convergence within the desired interval.
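The eigenvalue-based sorting idea lends itself to a compact numerical illustration. The sketch below is a loose interpretation rather than the thesis's actual algorithm: it associates each state variable with a Jacobian eigenvalue and marks it as quasi-steady when its characteristic time is far below the time scale of interest. The 2x2 system and the threshold are invented.

```python
# Minimal sketch: classify state variables as "fast" (quasi-steady)
# or "slow" (kept dynamic) from the eigenvalues of the Jacobian.
import numpy as np

def classify_states(jacobian, time_scale):
    eigvals, eigvecs = np.linalg.eig(jacobian)
    labels = []
    for i in range(jacobian.shape[0]):
        # Associate state i with the eigenvalue whose eigenvector has
        # its largest component on state i (a crude association; the
        # thesis uses a homotopy method to make this link rigorous).
        k = np.argmax(np.abs(eigvecs[i, :]))
        tau = 1.0 / abs(eigvals[k].real)  # characteristic time
        labels.append("fast" if tau < time_scale else "slow")
    return labels

J = np.array([[-1000.0, 0.5],    # state 0: very fast dynamics
              [0.1,    -0.01]])  # state 1: slow dynamics
print(classify_states(J, time_scale=1.0))  # ['fast', 'slow']
```

The "fast" states would then be replaced by algebraic equilibrium equations inside the DIRK integrator, shrinking the stiff part of the system.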
Hankach, Pierre. "Génération automatique de textes par satisfaction de contraintes." Paris 7, 2009. http://www.theses.fr/2009PA070027.
We address in this thesis the construction of a natural language generation system - computer software that transforms a formal representation of information into text in natural language. In our approach, we define the generation problem as a constraint satisfaction problem (CSP). The implemented system ensures integrated processing of the generation operations, as their different dependencies are taken into account and no type of operation is given priority over the others. In order to define the constraint satisfaction problem, we represent the construction operations of a text by decision variables. Individual operations that implement the same type of minimal expressions in the text form a generation task. We classify decision variables according to the type of operations they represent (e.g. content selection variables, document structuring variables...). The linguistic rules that govern the operations are represented as constraints on the variables. A constraint can be defined over variables of the same type or of different types, capturing the dependency between the corresponding operations. The production of a text consists in solving the global system of constraints, that is, finding an assignment of the variables that satisfies all the constraints. As part of the grammar of constraints for generation, we notably formulate the constraints that govern document structuring operations. We model by constraints the rhetorical structure of SDRT in order to yield coherent texts as the generator's output. Beforehand, in order to increase the generation capacities of our system, we extend the rhetorical structure to cover texts in non-canonical order. Furthermore, in addition to defining these coherence constraints, we formulate a set of constraints that enables the form of the macrostructure to be controlled by communicative goals. Finally, we propose a solution to the problem of the computational complexity of generating large texts. This solution is based on generating a text by groups of clauses: the problem of generating a text is divided into several problems of reduced complexity, each concerned with generating one part of the text. These parts are of limited size, so the complexity associated with their generation remains reasonable. The proposed partitioning of generation is motivated by linguistic considerations.
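The decision-variable view can be made concrete with a toy constraint satisfaction program. In the sketch below the variables, their domains and the constraints are invented examples in the spirit of the abstract, solved by brute-force enumeration rather than a real CSP solver.

```python
# Toy CSP view of generation: content-selection, structuring and
# lexicalisation variables with constraints linking them.
from itertools import product

variables = {
    "select_cause":  [True, False],   # content selection
    "select_result": [True, False],
    "order":         ["cause_first", "result_first"],  # structuring
    "connective":    ["because", "so", None],          # lexicalisation
}

def consistent(a):
    # a result may only be verbalised together with its cause
    if a["select_result"] and not a["select_cause"]:
        return False
    # connective choice depends on clause order (coherence constraint)
    if a["connective"] == "because" and a["order"] != "result_first":
        return False
    if a["connective"] == "so" and a["order"] != "cause_first":
        return False
    # if both facts are selected, some connective is required
    if a["select_cause"] and a["select_result"] and a["connective"] is None:
        return False
    return True

names = list(variables)
solutions = [dict(zip(names, vals))
             for vals in product(*variables.values())
             if consistent(dict(zip(names, vals)))]
print(len(solutions), solutions[0])
```

Each surviving assignment corresponds to one globally consistent way of building the text, which is exactly what solving the global constraint system amounts to.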
Friburger, Nathalie. "Reconnaissance automatique des noms propres : application à la classification automatique de textes journalistiques." Tours, 2002. http://www.theses.fr/2002TOUR4011.
Buet, François. "Modèles neuronaux pour la simplification de parole, application au sous-titrage." Electronic Thesis or Diss., université Paris-Saclay, 2022. https://theses.hal.science/tel-03920729.
In the context of linguistics, simplification is generally defined as the process of reducing the complexity of a text (or speech) while preserving its meaning as much as possible. Its primary application is to make understanding and reading easier for a user. It is regarded, inter alia, as a way to enhance the legibility of texts for deaf and hard-of-hearing people (deafness often causes a delay in reading development), in particular in the case of subtitling. While interlingual subtitles are used to disseminate movies and programs in other languages, intralingual subtitles (or captions) are the only means, along with sign language interpretation, by which the deaf and hard-of-hearing can access audio-visual content. Yet videos have taken a prominent place in society, whether for work, recreation, or education. In order to ensure the equality of people through participation in public and social life, many countries in the world (including France) have implemented legal obligations concerning the subtitling of television programs. ROSETTA (Subtitling RObot and Adapted Translation) is a public-private collaborative research program seeking to develop technological accessibility solutions for audio-visual content in French. This thesis, conducted within the ROSETTA project, aims to study automatic speech simplification with neural models and to apply it to intralingual subtitling of French television programs. Our work mainly focuses on analysing length control methods, adapting subtitling models to television genres, and evaluating subtitle segmentation. We notably present a new subtitling corpus created from data collected as part of the ROSETTA project, as well as a new metric for subtitle evaluation, Sigma.
Kosawat, Krit. "Méthodes de segmentation et d'analyse automatique de textes thaï." Phd thesis, Université Paris-Est, 2003. http://tel.archives-ouvertes.fr/tel-00626256.
Vinot, Romain. "Classification automatique de textes dans des catégories non thématiques." Phd thesis, Télécom ParisTech, 2004. http://pastel.archives-ouvertes.fr/pastel-00000812.
Jilani, Inès. "Extraction automatique de connaissances à partir de textes biomédicaux." Paris 6, 2009. http://www.theses.fr/2009PA066271.
Nosary, Ali. "Reconnaissance automatique de textes manuscrits par adaptation au scripteur." Rouen, 2002. http://www.theses.fr/2002ROUES007.
This thesis deals with the problem of off-line handwritten text recognition. It describes a text recognition system which exploits an original principle of adaptation to the handwriting to be recognized. The adaptation principle, inspired by contextual effects observed in human readers, is based on the automatic learning, during recognition, of the graphical characteristics of the handwriting (writer invariants). Word recognition proceeds according to an analytical approach based on a segmentation-recognition principle. The on-line adaptation of the recognition system relies on the iteration of two steps: a word recognition step, which labels the writer's representations (allographs) over the whole text, and a re-evaluation step for the character models. The implementation of our adaptation strategy requires an interactive recognition scheme able to make treatments at various contextual levels interact. The interaction model retained is based on the multi-agent paradigm.
Vinot, Romain. "Classification automatique de textes dans des catégories non thématiques." Paris : École nationale supérieure des télécommunications, 2004. http://catalogue.bnf.fr/ark:/12148/cb39294964h.
Rosmorduc, Serge. "Analyse morpho-syntaxique de textes non ponctués : application aux textes hiéroglyphiques." Cachan, École normale supérieure, 1996. http://www.theses.fr/1996DENS0028.
Kraif, Olivier. "Constitution et exploitation de bi-textes pour l'aide à la traduction." Nice, 2001. http://www.theses.fr/2001NICE2018.
Martin, Louis. "Simplification automatique de phrases à l'aide de méthodes contrôlables et non supervisées." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS265.
In this thesis we study the task of automatic sentence simplification. We first study the different methods used to evaluate simplification models, highlight several shortcomings of current approaches, and propose new contributions. We then propose to train sentence simplification models that can be adapted to the target user, allowing for greater simplification flexibility. Finally, we extend the scope of sentence simplification to several languages, by proposing methods that do not require annotated training data but nevertheless achieve very strong performance.
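One common way to make a simplification model adaptable to a target user, consistent with what this abstract describes, is to prepend control tokens encoding the requested output properties to each source sentence at training time. The token format and the ratio computation below are illustrative assumptions, not the thesis's exact scheme.

```python
# Sketch of control-token conditioning for a seq2seq simplifier.
# The <CHAR_RATIO_*> token format is a hypothetical example.
def add_control_tokens(source, reference):
    ratio = round(len(reference) / max(len(source), 1), 1)
    return f"<CHAR_RATIO_{ratio}> {source}"

pair = ("The magistrate adjourned the proceedings sine die.",
        "The judge stopped the case.")
print(add_control_tokens(*pair))
# -> "<CHAR_RATIO_0.5> The magistrate adjourned the proceedings sine die."
# At inference time the user picks the ratio token, and the trained
# model produces an output of the corresponding simplicity.
```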
Cotto, Daniel. "Traitement automatique des textes en vue de la synthèse vocale." Toulouse 3, 1992. http://www.theses.fr/1992TOU30225.
Gurtner, Karine. "Extraction automatique de connaissances à partir de corpus de textes." Paris 7, 2000. http://www.theses.fr/2000PA077104.
Pham, Thi Nhung. "Résolution des anaphores nominales pour la compréhension automatique des textes." Thesis, Sorbonne Paris Cité, 2017. http://www.theses.fr/2017USPCD049/document.
In order to facilitate the interpretation of texts, this thesis is devoted to the development of a system to identify and resolve indirect nominal anaphora and associative anaphora. Resolution of indirect nominal anaphora is based on calculating salience weights of candidate antecedents, with the purpose of associating these antecedents with the identified anaphoric expressions. It is processed by two different methods based on a linguistic approach: the first method uses lexical and morphological parameters; the second uses morphological and syntactic parameters. The resolution of associative anaphora is based on syntactic and semantic parameters. The results obtained are encouraging: 90.6% for indirect anaphora resolution with the first method, 75.7% for indirect anaphora resolution with the second method, and 68.7% for associative anaphora resolution. These results show the contribution of each parameter used and the utility of this system in the automatic interpretation of texts.
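Salience-based antecedent ranking can be sketched as a weighted scoring function. The features and weights below are invented for illustration; the thesis derives its parameters from the lexical, morphological and syntactic analyses mentioned in the abstract.

```python
# Toy salience scoring for anaphora resolution: each candidate
# antecedent accumulates weighted feature scores, highest wins.
WEIGHTS = {"recency": 2.0, "subject": 1.5, "number_agree": 3.0}

def salience(candidate, anaphor):
    score = WEIGHTS["recency"] / (1 + anaphor["pos"] - candidate["pos"])
    if candidate["role"] == "subject":
        score += WEIGHTS["subject"]
    if candidate["number"] == anaphor["number"]:
        score += WEIGHTS["number_agree"]
    return score

def resolve(anaphor, candidates):
    return max(candidates, key=lambda c: salience(c, anaphor))

candidates = [
    {"text": "the contract", "pos": 3, "role": "object",  "number": "sg"},
    {"text": "the parties",  "pos": 5, "role": "subject", "number": "pl"},
]
anaphor = {"text": "it", "pos": 9, "number": "sg"}
print(resolve(anaphor, candidates)["text"])  # the contract
```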
Arnulphy, Béatrice. "Désignations nominales des événements : étude et extraction automatique dans les textes." Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00758062.
Muhammad, Humayoun. "Développement du système MathNat pour la formalisation automatique des textes mathématiques." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00680095.
Fourour, Nordine. "Identification et catégorisation automatique des entités nommées dans les textes français." Nantes, 2004. http://www.theses.fr/2004NANT2126.
Named Entity (NE) recognition is a recurring problem in different domains of Natural Language Processing. Following a linguistic investigation allowing us to set up operational parameters defining the concept of named entity, a state of the art of the domain, and a corpus investigation using referential and graphical criteria, we present Nemesis, a French named entity recognizer. This system analyzes internal and external evidence by using grammar rules and trigger word lexicons, and includes a learning process. With these processes, Nemesis achieves about 90% precision and 80% recall. To increase recall, we put forward optional modules (analysis of the wide context and utilization of the Web as a source of new contexts) and investigate setting up a disambiguation and grammar rule inference module.
Godbout, Mathieu. "Approches par bandit pour la génération automatique de résumés de textes." Master's thesis, Université Laval, 2021. http://hdl.handle.net/20.500.11794/69488.
This thesis discusses the use of bandit methods to solve the problem of training extractive summary generation models. Extractive models, which build summaries by selecting sentences from an original document, are difficult to train because the target summary of a document is usually not built in an extractive way. To this end, we propose to see the production of extractive summaries as different bandit problems, for which there exist algorithms that can be leveraged for training summarization models. In this thesis, BanditSum is first presented, an approach drawn from the literature that sees the generation of summaries for a set of documents as a contextual bandit problem. Next, we introduce CombiSum, a new algorithm which formulates the generation of the summary of a single document as a combinatorial bandit. By exploiting the combinatorial formulation, CombiSum manages to incorporate the notion of the extractive potential of each sentence of a document into its training. Finally, we propose LinCombiSum, the linear variant of CombiSum, which exploits the similarities between sentences in a document and uses the linear combinatorial bandit formulation instead.
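The combinatorial-bandit view of extractive summarization can be illustrated with a toy epsilon-greedy learner in which each arm is a sentence and an action selects k of them. The reward function here is fabricated; CombiSum and LinCombiSum are considerably more elaborate than this sketch.

```python
# Toy epsilon-greedy combinatorial bandit for sentence selection:
# play a set of k sentences, observe a reward, update per-arm values.
import random

def train(n_sentences, k, reward_fn, steps=2000, eps=0.1):
    value = [0.0] * n_sentences   # per-sentence value estimates
    count = [0] * n_sentences
    for _ in range(steps):
        if random.random() < eps:  # explore
            action = random.sample(range(n_sentences), k)
        else:                      # exploit current estimates
            action = sorted(range(n_sentences),
                            key=lambda i: value[i], reverse=True)[:k]
        r = reward_fn(action)      # e.g. ROUGE against a reference
        for i in action:           # credit the reward to each arm played
            count[i] += 1
            value[i] += (r - value[i]) / count[i]
    return value

# Fabricated reward: sentences 0 and 3 are the "good" ones.
reward = lambda action: sum(1.0 for i in action if i in (0, 3))
values = train(n_sentences=5, k=2, reward_fn=reward)
print(sorted(range(5), key=lambda i: values[i], reverse=True)[:2])
# the learned top-2 should recover sentences 0 and 3
```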
Frath, Pierre. "Sémantique, référence et acquisition automatique de connaissances à partir de textes." Strasbourg 2, 1997. http://www.theses.fr/1997STR20079.
Automatic knowledge acquisition from text ideally consists in generating a structured representation of a corpus, which a human or a machine should be able to query. Designing and realising such a system raises a number of difficulties, both theoretical and practical, which we intend to look into. The first part of this dissertation studies the two main approaches to the problem: automatic terminology retrieval, and model-driven knowledge acquisition. The second part studies the mostly implicit theoretical foundations of natural language processing, i.e. logical positivism and componential lexical semantics. We offer an alternative inspired by the work of Charles Sanders Peirce, Ludwig Wittgenstein and Georges Kleiber, i.e. a semantics based on the notions of sign, usage and reference. The third part is devoted to a detailed semantic analysis of a medical corpus. Reference is studied through two notions, denomination and denotation. Denominations allow for arbitrary, preconstructed and opaque reference; denotations, for discursive, constructed and transparent reference. In the fourth part, we manually construct a detailed representation of a fragment of the corpus. The aim is to study the relevance of the theoretical analysis and to set precise objectives for the system. The fifth part focuses on implementation. It is devoted to the construction of a terminological knowledge base capable of representing a domain corpus, and sufficiently structured for use by applications in terminology or domain modelling, for example. In a nutshell, this dissertation examines automatic knowledge acquisition from text from a theoretical and technical point of view, with the technology setting the guidelines for the theoretical discussions.
Ould Abdel Vetah, Mohamed. "Apprentissage automatique appliqué à l'extraction d'information à partir de textes biologiques." Paris 11, 2005. http://www.theses.fr/2005PA112133.
This thesis is about information extraction from textual data. Two main approaches coexist in this field. The first approach is based on shallow text analysis. These methods are easy to implement, but the information they extract is often incomplete and noisy. The second approach requires deeper structural linguistic information. Compared to the first approach, it has the double advantage of being easily adaptable and of taking into account the diversity of formulation which is an intrinsic characteristic of textual data. In this thesis, we have contributed to the realization of a complete information extraction tool based on this latter approach. Our tool is dedicated to the automatic extraction of gene interactions described in MedLine abstracts. In the first part of the work, we develop a filtering module that allows the user to identify the sentences referring to gene interactions. The module is available online and already used by biologists. The second part of the work introduces an original methodology, based on an abstraction of the syntactic analysis, for the automatic learning of information extraction rules. The preliminary results are promising and show that our abstraction approach provides a good representation for learning extraction rules.
Boussema, Kaouther. "Système de génération automatique de programmes d'entrées-sorties : le système IO." Paris 9, 1998. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1998PA090048.
Chabbat, Bertrand, Jean-Marie Pinon, and Mohamed Ou-Halima. "Modélisation multiparadigme de textes réglementaires." Villeurbanne : Doc'INSA, 2005. http://docinsa.insa-lyon.fr/these/pont.php?id=chabbat.
Szulman, Sylvie. "Enrichissement d'une base de connaissances à partir de textes en langage naturel." Paris 13, 1990. http://www.theses.fr/1990PA132020.
Nam, Hyeonsook. "Analyse linguistique de textes économiques en français en vue d'un traitement automatique." Nice, 1996. http://www.theses.fr/1996NICE2033.
The present study is a terminological analysis of the financial domain from a natural language processing perspective. The derived and compounded forms of a corpus are analysed, with emphasis on the syntactic and semantic relations between their elements, in order to pick out the constituents that are most productive in economic language. The research also covers the idiomatic and recurrent expressions of the corpus. In conclusion, Korean and French economic terms are contrasted, so as to supply the translator with an editing and translation toolbox.
Wandji Tchami, Ornella. "Analyse contrastive des verbes dans des corpus médicaux et création d'une ressource verbale de simplification de textes." Thesis, Lille 3, 2018. http://www.theses.fr/2018LIL3H015/document.
With the evolution of Web technology, healthcare documentation is becoming increasingly abundant and accessible to all, especially to patients, who have access to a large amount of health information. Unfortunately, the ease of access to medical information does not guarantee its correct understanding by the intended audience, in this case non-experts. Our PhD work aims at creating a resource for the simplification of medical texts, based on a syntactico-semantic analysis of verbs in four French medical corpora, which are distinguished according to the level of expertise of their authors and that of the target audiences. The resource created in the present thesis contains 230 syntactico-semantic patterns of verbs (called pss), aligned with their non-specialized equivalents. The semi-automatic method applied for the analysis of verbs in order to achieve our goal is based on four fundamental tasks: the syntactic annotation of the corpora, carried out thanks to the Cordial parser (Laurent et al., 2009); the semantic annotation of verb arguments, based on semantic categories of the French version of a medical terminology known as Snomed International (Côté, 1996); the acquisition of syntactico-semantic patterns of verbs; and the contrastive analysis of the verbs' behaviors in the different corpora. The pss acquired at the end of this process undergo an evaluation (by three teams of medical experts) which leads to the selection of candidates constituting the nomenclature of our text simplification resource. These pss are then aligned with their non-specialized equivalents; this alignment leads to the creation of the simplification resource, which is the main result of our PhD study. The content of the resource was evaluated by two groups of people: linguists and non-linguists. The results show that the simplification of pss makes it easier for non-experts to understand the meaning of verbs used in a specialized way, especially when a certain set of parameters is met.
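The alignment of specialized verb patterns with lay equivalents suggests a simple lookup-driven substitution at simplification time. The two patterns below are invented examples; the actual resource contains 230 expert-validated pss.

```python
# Sketch of a pss-style lookup: a specialised verb plus the semantic
# classes of its arguments maps to a non-specialised equivalent.
RESOURCE = {
    ("administer", "PERSON", "SUBSTANCE"):      "give",
    ("excise",     "PERSON", "BODY_STRUCTURE"): "remove",
}

def simplify_verb(verb, subj_class, obj_class):
    # fall back to the original verb when no pattern matches
    return RESOURCE.get((verb, subj_class, obj_class), verb)

print(simplify_verb("administer", "PERSON", "SUBSTANCE"))  # give
```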
Loughraïeb, Mounira. "Valence et rôles thématiques comme outils de réduction d’ambiguïtés en traitement automatique de textes écrits." Nancy 2, 1990. http://www.theses.fr/1990NAN21005.
Chabbat, Bertrand. "Modélisation multiparadigme de textes réglementaires." Lyon, INSA, 1997. http://theses.insa-lyon.fr/publication/1997ISAL0118/these.pdf.
The topic of this thesis is the design of a model able to represent legal texts so that they can be handled by an organization for which legal texts are a raw material. The coordinated and consistent maintenance of legal objects (texts and expert system rules) is the main goal of our study. The French Family Allowance National Fund (Cnaf) has supported this research work. First of all, we analyse the text flows from the parliament to the final users and highlight the specificities of these legal texts. Then, we propose a metamodel able to represent different kinds of semantic models for documents. We choose the SGML and HyTime standards and propose a logical paradigm defined by a logical modeling of legal texts relying on the specificities of these texts. We also propose another paradigm, called indexing and information retrieval, taking account of the semantics of the information. To answer the need for coordinated maintenance of legal objects, we then propose a semantic paradigm defined by a semantic modeling (using SGML and HyTime) relying on legal theories. This modeling enables users to locate precisely inside the texts the expert system rules and predicates that are affected by legislative changes. Finally, we synthesize the whole into a multiparadigm modeling of legal texts.
Alsandouk, Fatima. "Grammaire de scène : processus de compréhension de textes de description géométrique." Toulouse 2, 1990. http://www.theses.fr/1990TOU20058.
Morsi, Youcef Ihab. "Analyse linguistique et extraction automatique de relations sémantiques des textes en arabe." Thesis, Bourgogne Franche-Comté, 2020. http://www.theses.fr/2020UBFCC019.
This thesis focuses on the development of a tool for the automatic processing of Modern Standard Arabic, at the morphological and semantic levels, with the final objective of information extraction on technological innovations. As far as the morphological analysis is concerned, our tool includes several successive processing stages that allow occurrences in texts to be labelled and disambiguated: a morphological layer (Gibran 1.0), which relies on Arabic patterns as distinctive features; a contextual layer (Gibran 2.0), which uses contextual rules; and a third layer (Gibran 3.0), which uses a machine learning model. Our methodology is evaluated using the annotated corpus Arabic-PADT UD treebank. The evaluations obtain F-measures of 0.92 and 0.90 for the morphological analyses. These experiments demonstrate the possibility of improving such a corpus through linguistic analyses. This approach allowed us to develop a prototype of information extraction on technological innovations for the Arabic language, based on the morphological analysis and syntactico-semantic patterns. This thesis is part of a PhD-entrepreneur programme.
Constant, Mathieu. "Grammaires locales pour l'analyse automatique de textes : méthodes de construction et outils de gestion." Marne-la-Vallée, 2003. http://www.theses.fr/2003MARN0169.
Many researchers in the field of Natural Language Processing have shown the significance of descriptive linguistics and especially the use of large-scale databases of fine-grained linguistic components composed of lexicons and grammars. This approach has a drawback: it requires long-term investment. It is therefore necessary to develop methods and computational tools to help construct such data, which must be directly applicable to texts. This work focuses on a specific linguistic representation: local grammars, which describe precise and local constraints in the form of graphs. Two issues arise: how to efficiently build precise, complete and text-applicable grammars, and how to deal with their growing number and their dispersion. To handle the first problem, a set of simple and empirical methods is exposed on the basis of M. Gross (1975)'s lexicon-grammar methodology. The whole process of linguistic analysis and formal representation is described through the examples of two original phenomena: expressions of measurement (un immeuble d'une hauteur de 20 mètres) and locative prepositional phrases containing geographical proper names (à l'île de la Réunion). Each phenomenon is narrowed down to elementary sentences, which makes it possible to classify them semantically according to formal criteria. The syntactic behavior of these sentences is systematically studied according to the lexical values of their elements. The observed properties are then encoded either directly in the form of graphs with an editor, or in the form of syntactic matrices that are semi-automatically converted into graphs following E. Roche (1993). These studies led to new conversion algorithms for the case of matrix systems where linguistic information is encoded in several matrices. For the second issue, a prototype on-line library of local grammars has been designed and implemented. The objective is to centralize and distribute local grammars constructed within the RELEX network of laboratories. We developed a set of tools allowing users both to store new graphs and to search for graphs according to different criteria. The implementation of a grammar search engine led to an investigation of a new field of information retrieval: searching for linguistic information in sets of local grammars.
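A local grammar describing a precise, local constraint as a graph can be approximated by a small finite-state recognizer. The sketch below, with invented states and a toy lexicon, accepts measurement expressions of the kind cited in the abstract (une hauteur de 20 mètres).

```python
# Toy finite-state version of a local grammar for measurement
# expressions. States, transitions and lexicon are invented.
import re

TRANSITIONS = {
    ("q0", "DET"):    "q1",   # une
    ("q1", "MEAS_N"): "q2",   # hauteur
    ("q2", "de"):     "q3",
    ("q3", "NUM"):    "q4",
    ("q4", "UNIT"):   "q5",   # mètres (q5 is the accepting state)
}
LEXICON = {"une": "DET", "hauteur": "MEAS_N", "de": "de",
           "mètres": "UNIT"}

def matches(tokens):
    state = "q0"
    for tok in tokens:
        label = "NUM" if re.fullmatch(r"\d+", tok) else LEXICON.get(tok)
        state = TRANSITIONS.get((state, label))
        if state is None:
            return False
    return state == "q5"

print(matches("une hauteur de 20 mètres".split()))  # True
```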
Roussarie, Laurent. "Un modèle théorique d'inférence de structures sémantiques et discursives dans le cadre de la génération automatique de textes." Paris 7, 2000. http://www.theses.fr/2000PA070059.
Hue, Jean-François. "L'analyse contextuelle des textes en langue naturelle : les systèmes de réécritures typées." Nantes, 1995. http://www.theses.fr/1995NANT2034.
Denjean, Pascale. "Interrogation d'un système vidéotex arborescent : l'indexation des textes." Toulouse 3, 1989. http://www.theses.fr/1989TOU30235.
Yousfi-Monod, Mehdi. "Compression automatique ou semi-automatique de textes par élagage des constituants effaçables : une approche interactive et indépendante des corpus." Phd thesis, Université Montpellier II - Sciences et Techniques du Languedoc, 2007. http://tel.archives-ouvertes.fr/tel-00185367.
The originality of this thesis lies in tackling a little-explored variant, text compression, by means of an unsupervised technique.
This work proposes an incremental and interactive system for pruning the phrase-structure tree of sentences, while preserving syntactic coherence and retaining the important informational content.
On the theoretical side, the work draws on Noam Chomsky's government theory, and more particularly on the formal representation of X-bar theory, to arrive at a solid theoretical foundation for a computational model compatible with the syntactic compression of sentences.
The work resulted in operational software, named COLIN, offering two modes: fully automatic compression, and semi-automatic summarization assistance driven by interaction with the user.
The software was evaluated by 25 volunteer users under a complex protocol.
The experimental results show that 1) the notion of reference summary used in standard evaluations is debatable, 2) semi-automatic compressions were highly appreciated, and 3) fully automatic compressions also obtained good satisfaction scores.
At compression rates above 40% across all genres, COLIN provides valuable support as a text compression aid, does not depend on any training corpus, and offers a user-friendly interface.
Yousfi-Monod, Mehdi. "Compression automatique ou semi-automatique de textes par élagage des constituants effaçables : une approche interactive et indépendante des corpus." Montpellier 2, 2007. http://www.theses.fr/2007MON20228.
Nguyen, Thi Minh Huyen. "Outils et ressources linguistiques pour l'alignement de textes multilingues français-vietnamiens." Phd thesis, Université Henri Poincaré - Nancy I, 2006. http://tel.archives-ouvertes.fr/tel-00105592.
Zemirli, Zouhir. "Synthèse vocale de textes arabes voyellés." Toulouse 3, 2004. http://www.theses.fr/2004TOU30262.
Text-to-speech synthesis consists in creating speech by analysis of a text which is subject to no restriction. The object of this thesis is to describe the modeling and the taking into account of the phonetic, phonological, morpho-lexical and syntactic knowledge necessary for the development of a complete voice synthesis system for diacritized Arabic texts. The automatic generation of the prosodico-phonetic sequence required the development of several components. The morphosyntactic tagger "TAGGAR" carries out grammatical tagging, syntactic marking and grouping, and the automatic insertion of pauses. Grapheme-to-phoneme conversion is ensured by using lexicons, syntactic grammars, and morpho-orthographical and phonological rules. A multiplicative model for predicting phoneme durations is described, and a model for generating prosodic contours based on word accents and syntactic groups is presented.
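A multiplicative duration model of the kind mentioned in this abstract scales a phoneme's intrinsic duration by one factor per contextual feature. All numeric values below are invented for illustration.

```python
# Sketch of a multiplicative phoneme-duration model: intrinsic
# duration times one factor per active context feature.
BASE_MS = {"a": 90.0, "b": 75.0, "s": 110.0}
FACTORS = {"phrase_final": 1.4, "stressed": 1.2, "fast_speech": 0.8}

def duration(phoneme, context):
    d = BASE_MS[phoneme]
    for feature in context:
        d *= FACTORS[feature]
    return d

print(duration("a", ["stressed", "phrase_final"]))  # 90 * 1.2 * 1.4 ≈ 151.2
```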
Scharff, Christelle. "Déduction avec contraintes et simplification dans les théories équationnelles." Nancy 1, 1999. http://docnum.univ-lorraine.fr/public/SCD_T_1999_0271_SCHARFF.pdf.
Moulinier, Isabelle. "Une approche de la catégorisation de textes par l'apprentissage symbolique." Paris 6, 1996. http://www.theses.fr/1996PA066638.