Dissertations / Theses on the topic 'Ukrainien (langue) – Analyse automatique (linguistique)'
Saint-Joanis, Olena. "Formalisation de la langue ukrainienne avec NooJ : préparation du module ukrainien." Electronic Thesis or Diss., Bourgogne Franche-Comté, 2024. http://www.theses.fr/2024UBFCC005.
Although interest in the Ukrainian language has increased greatly in recent years, it remains poorly described and schematized. The few Natural Language Processing (NLP) software applications available do not necessarily meet the needs of students or researchers. These tools have been developed using stochastic approaches and therefore do not have a solid linguistic basis; consequently, their usefulness is questionable, as they produce too many errors. After studying these available NLP applications, we chose to use the NooJ linguistic platform to process Ukrainian, because it provides the tools we need to develop linguistic resources in the form of dictionaries and orthographic, morphological, syntactic, and semantic grammars. NooJ also provides users with tools to manage corpora and perform various statistical analyses, and is well suited to building pedagogical applications. We have built a Ukrainian module for NooJ that consists of a main dictionary, "Ukr_dictionary_V.1.3," and two secondary dictionaries, "Ukr_dictionary_Participle_V.1.3" and "Ukr_dictionary_Proper_lowercase_V.1.3". The main dictionary contains 157,534 entries and recognizes 3,184,522 inflected forms. It describes simple ALUs (Atomic Linguistic Units) made up of a single graphic form, as well as locutions made up of two or more forms; it recognizes and analyzes ALUs with alternative spellings, and makes abbreviations explicit. The inflected forms of variable entries are formalized through 303 inflectional paradigms. We have also formalized 114 derivational paradigms that link perfective verbs to imperfective verbs. The 19 morphological grammars describe numerous derived forms and spelling variants not found in the dictionary. Finally, we have listed certain forms in secondary dictionaries, notably lower-case participles and proper nouns. The "Ukr_dictionary_Participle_V.1.3" dictionary contains 13,070 entries and complements the main dictionary when the morphological grammar describing participles does not allow a participle to be recognized in the text. Thanks to these resources, 98.3% of occurrences in the test corpus were recognized and annotated with their morphological information. We also built ten syntactic grammars, which removed many ambiguities: the number of annotations fell from 206,445 to 131,415 for a corpus of 108,137 occurrences. Finally, we have outlined several avenues for future work to improve the module, namely the development of additional morphological and syntactic grammars to remove the remaining ambiguities.
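To make the paradigm mechanism concrete, here is a minimal Python sketch of how a dictionary entry pointing to an inflectional paradigm expands into inflected forms. The paradigm name and the Ukrainian endings below are invented for illustration; real NooJ paradigms are written in NooJ's own notation and also handle stem alternations.

```python
# A minimal sketch of a dictionary entry pointing to an inflectional paradigm.
# Paradigm name and endings are illustrative assumptions, not the module's data.

PARADIGMS = {
    "N_FEM_A": {  # hypothetical hard-stem feminine nouns in -а
        "Nom.Sg": "а", "Gen.Sg": "и", "Dat.Sg": "і", "Acc.Sg": "у",
        "Nom.Pl": "и", "Gen.Pl": "",
    },
}

def inflect(lemma: str, paradigm: str) -> dict:
    """Expand a lemma into its inflected forms by swapping the ending."""
    stem = lemma[:-1]  # drop the Nom.Sg ending
    return {tag: stem + end for tag, end in PARADIGMS[paradigm].items()}

print(inflect("мова", "N_FEM_A"))  # 'language': мова, мови, мові, мову, ...
```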
Ranaivo, Balisoamanandray. "Analyse automatique de l'affixation en malais." Paris, INALCO, 2001. http://www.theses.fr/2001INAL0016.
The final aim of this thesis is the creation of an affixation analyser capable of identifying, segmenting and interpreting affixed words containing prefix(es), suffix(es) and circumfix(es). The analyser takes Malaysian or Indonesian text as input. In this work, we study the standard Malay used in Malaysia, bahasa Melayu or bahasa Malaysia, which is written in the Latin alphabet. To evaluate the accuracy of the analyser, we submitted Malaysian texts and one Indonesian text to the system. The analyser uses a set of rules, a few lists of exceptions, a restricted list of bases and formal identification criteria. The algorithm is non-deterministic, and analysed words are treated without taking their contexts into account. The evaluation of the analyser gave around 97% correct analyses and 2% incorrect analyses; very few affixed words were left unanalysed (a rate of less than 0.5%).
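A rough Python sketch of rule-based affix segmentation of this kind follows; the Malay affix inventories are simplified and the base list is a toy stand-in for the thesis's restricted list of bases and exception lists.

```python
# Illustrative sketch of non-deterministic affix segmentation for Malay.
# Affix inventories are simplified; the base set is a toy example.

PREFIXES = ["meng", "mem", "men", "me", "ber", "ter", "di", "pe", "se"]
SUFFIXES = ["kan", "an", "i"]
CIRCUMFIXES = [("ke", "an"), ("per", "an"), ("pe", "an")]

def segment(word: str, bases: set) -> list:
    """Return candidate (prefix, base, suffix) analyses; every analysis
    whose base is attested is kept (non-deterministic)."""
    candidates = []
    for pre, suf in CIRCUMFIXES:
        if word.startswith(pre) and word.endswith(suf):
            base = word[len(pre):len(word) - len(suf)]
            if base in bases:
                candidates.append((pre + "-", base, "-" + suf))
    for pre in [""] + PREFIXES:
        for suf in [""] + SUFFIXES:
            if word.startswith(pre) and word.endswith(suf):
                base = word[len(pre):len(word) - len(suf)]
                if base and base in bases:
                    candidates.append((pre + "-" if pre else "", base,
                                       "-" + suf if suf else ""))
    return candidates

print(segment("keadilan", {"adil"}))    # circumfix ke-...-an
print(segment("membacakan", {"baca"}))  # prefix mem- and suffix -kan
```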
Boizou, Loïc. "Analyse lexicale automatique du lituanien." Paris, INALCO, 2009. http://www.theses.fr/2009INAL0004.
The aim of this thesis is to carry out lexical analysis of written Lithuanian texts by automatic means, according to a heuristic proceeding from form to content and based on symbolic methods. This study attempts to make expanded use of the cues given by linguistic forms, drawing on graphemic and morphological aspects. This formal starting point, in conjunction with the automation of linguistic tasks, required a revision of the traditional grammatical point of view, concerning mainly parts of speech, lexical structure and suffixation. This linguistic model, which needs further expansion, served as the basis for ALeksas, an analyzer of lexical forms. This software implements a hybrid structure expanding a system of finite-state automata. The prototype computes the analysis of word forms, giving grammatical interpretations according to a set of formal criteria instead of making use of a lexical database. The results of the analysis of a corpus compiled from various texts allowed us to delineate more precisely the advantages and shortcomings of ALeksas, as compared with other similar tools, and also to suggest possible enhancements.
Hagège, Caroline. "Analyse syntaxique automatique du portugais." Clermont-Ferrand 2, 2000. http://www.theses.fr/2000CLF20028.
Nakamura-Delloye, Yayoi. "Alignement automatique de textes parallèles français - japonais." Paris 7, 2007. http://www.theses.fr/2007PA070054.
Automatic alignment aims to match elements of parallel texts. We are especially interested in the implementation of a system which carries out alignment at the clause level, the clause being a useful linguistic unit for many applications. This thesis consists of two types of work: introductory work and the work that constitutes the thesis core, which is structured around the concept of the syntactic clause. The introductory work includes an overview of alignment and studies on sentence alignment; it resulted in the creation of a sentence alignment system adapted to French and Japanese text processing. The thesis core consists of linguistic studies and implementations. The linguistic studies are divided into two topics: the French clause and the Japanese clause. The goal of our French clause studies is to define a grammar for clause identification; for this purpose, we attempted to define a typological classification of clauses based on formal criteria only. In the Japanese studies, we first define the Japanese sentence on the basis of the theme-rheme structure, and then try to elucidate the notion of clause. The implementation work consists of three tasks which together constitute the clause alignment process, carried out by three separate tools: two clause identification systems (one for French texts and one for Japanese texts) and a clause alignment system.
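The alignment step itself can be illustrated with a toy dynamic-programming aligner; the length-difference cost below is a crude stand-in for the linguistic criteria the thesis actually relies on, and the penalty value is an arbitrary assumption.

```python
# Toy length-based aligner in the spirit of classic sentence/clause alignment.
# Only 1:1, 1:0 and 0:1 beads; cost and penalty values are illustrative.

def align(src: list, tgt: list) -> list:
    INF = float("inf")
    n, m = len(src), len(tgt)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    skip = 3.0  # penalty for leaving a unit unaligned
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            if i < n and j < m:  # 1:1 bead, cost = length difference
                c = cost[i][j] + abs(len(src[i]) - len(tgt[j])) / 10
                if c < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1], back[i + 1][j + 1] = c, (i, j)
            if i < n and cost[i][j] + skip < cost[i + 1][j]:  # 1:0 bead
                cost[i + 1][j], back[i + 1][j] = cost[i][j] + skip, (i, j)
            if j < m and cost[i][j] + skip < cost[i][j + 1]:  # 0:1 bead
                cost[i][j + 1], back[i][j + 1] = cost[i][j] + skip, (i, j)
    pairs, ij = [], (n, m)  # backtrace, keeping only 1:1 pairs
    while ij != (0, 0):
        pi, pj = back[ij[0]][ij[1]]
        if ij == (pi + 1, pj + 1):
            pairs.append((pi, pj))
        ij = (pi, pj)
    return list(reversed(pairs))

print(align(["Il pleut.", "Je reste."], ["雨が降っている。", "私は残る。"]))
```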
Segal, Natalia. "Analyse, représentation et modélisation de la prosodie pour la reconnaissance automatique de la parole." Paris 7, 2011. http://www.theses.fr/2011PA070041.
This thesis presents a new approach to automatic prosodic boundary and prosodic structure detection, based on a theoretical hierarchical representation of the prosodic organization of speech in French. We used a descriptive theory of the French prosodic system to create a rule-based linguistic prosodic model suitable for the automatic treatment of spontaneous speech. This model makes it possible to find prosodic group boundaries automatically and to structure them hierarchically, so that the prosodic structure of every phrase is represented in the form of a prosodic tree. This representation proved to be efficient for the automatic processing of continuous speech in French. The resulting prosodic segmentation was compared to manual prosodic segmentation, and the accuracy of the prosodic structure was also verified manually by an expert. We applied our model to different kinds of continuous spontaneous speech data with different phonemic and lexical segmentations: manual segmentation and several kinds of automatic segmentation. In particular, the application of our prosodic model to the output of a speech recognition system showed satisfactory performance. A correlation was also established between the level of a node in the prosodic tree and boundary detection accuracy; it is thus possible to improve the precision of boundary detection by attributing a degree of confidence to each boundary according to its level in the prosodic tree.
Gaubert, Christian. "Stratégies et règles minimales pour un traitement automatique de l'arabe." Aix-Marseille 1, 2001. http://www.theses.fr/2001AIX10040.
Full textRayon, Nadine. "Segmentation et analyse morphologique automatiques du japonais en univers ouvert." Paris, INALCO, 2003. http://www.theses.fr/2003INAL0002.
The present thesis proposes an automatic morphological analysis of the kanji sequences in Japanese texts. This analysis is based on the graphemic, morphological and syntactic characteristics of the Japanese language. It does not employ any dictionary and relies on the recognition of the immediate contexts of the kanji sequences. It leads to a tagging of the recognized linguistic units and to a segmentation of the text. The first part of the thesis describes the Japanese writing system and its encoding methods. The second part deals with the Japanese parts of speech, in particular verbs, adjectives, particles and inflectional suffixes, whose morphosyntactic characteristics are essential for the morphological analysis. The third part describes the analysis module: identification and formalization of the data necessary for the analysis, the analysis algorithm and related processing, and formalization of the object models necessary for the computational handling of Japanese.
Li, Yiping. "Étude des problèmes spécifiques de l'intégration du chinois dans un système de traitement automatique pour les langues européennes." Université de Marne-la-Vallée, 2006. http://www.theses.fr/2006MARN0282.
Linguistic analysis is a fundamental and essential step for natural language processing. It often includes part-of-speech tagging and named entity identification in order to support higher-level applications such as information retrieval, automatic translation and question answering. Chinese linguistic analysis must perform the same tasks as that of other languages, but it must also overcome an additional difficulty caused by the lack of delimiters between words. Since the word is the elementary unit for automated language processing, it is indispensable to segment sentences into words for Chinese language processing. In most existing systems described in the literature, segmentation, part-of-speech tagging and named entity recognition are presented as three sequential, independent steps. But since segmentation provides the basis for, and impacts, the other two steps, some statistical methods that collapse all three treatments, or two of the three, into one module have been proposed. With these combined approaches, segmentation can be improved by complementary information supplied by part-of-speech tagging and named entity recognition, and the global analysis of Chinese improved. However, such a monolithic treatment is not modular and is difficult to adapt to languages other than Chinese; consequently, it is not suitable for creating multilingual automatic analysis systems. This dissertation studies the integration of Chinese automatic analysis into an existing multilingual analysis system, LIMA. Originally built for European languages, LIMA's modular approach imposes some constraints that a monolingual Chinese analysis system need not consider. Firstly, the treatment of Chinese should be compatible with, and follow the same flow as, the other languages. Secondly, in order to keep the system coherent, it is preferable to employ common modules for all the languages treated by the system, including a new language like Chinese. To respect these constraints, we chose to realize the phases of segmentation, part-of-speech tagging and named entity recognition separately. Our modular treatment includes a specific module for Chinese analysis that should be reusable for other languages with similar linguistic features. After error analysis of this purely modular approach, we were able to improve our segmentation with enriched information supplied by part-of-speech tagging, named entity recognition and some linguistic knowledge. In our final results, three specific treatments were added to the LIMA system: a pre-treatment based on a co-occurrence model applied before segmentation, a tokenization of numbers written in Chinese characters, and a complementary treatment after segmentation that identifies certain named entities before subsequent part-of-speech tagging. We evaluate and discuss the improvement that these additional treatments bring to our analysis, while retaining the modular and linear approach of the underlying LIMA natural language processing system.
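The pre-treatment for numbers written in Chinese characters can be sketched as follows; the digit set, toy lexicon and greedy segmenter are illustrative assumptions, not LIMA's actual components.

```python
# Sketch: protect Chinese-numeral tokens before dictionary segmentation,
# so the segmenter cannot split them. Resources here are toy examples.
import re

CN_DIGITS = "〇一二三四五六七八九十百千萬億"
NUM_RE = re.compile(f"[{CN_DIGITS}]+")

def pretokenize(text: str) -> list:
    """Split text into protected numeral tokens and remaining runs."""
    tokens, pos = [], 0
    for m in NUM_RE.finditer(text):
        if m.start() > pos:
            tokens.append(text[pos:m.start()])
        tokens.append(m.group())  # protected numeral token
        pos = m.end()
    if pos < len(text):
        tokens.append(text[pos:])
    return tokens

def max_match(chunk: str, lexicon: set) -> list:
    """Greedy longest-match dictionary segmentation of one chunk."""
    out, i = [], 0
    while i < len(chunk):
        for j in range(len(chunk), i, -1):
            if chunk[i:j] in lexicon or j == i + 1:
                out.append(chunk[i:j]); i = j; break
    return out

lexicon = {"我", "有", "本", "书"}
for tok in pretokenize("我有三十本书"):
    print([tok] if NUM_RE.fullmatch(tok) else max_match(tok, lexicon))
```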
Badia, Toni. "Aspectes del sintagma nominal en català des de la perspectiva de la traducció automàtica." Montserrat : Abadia de Montserrat, 1994. http://catalogue.bnf.fr/ark:/12148/cb357951358.
Tzoukermann, Evelyne. "Morphologie et génération automatique du verbe français : implémentation d'un module conversationnel." Paris, INALCO, 1986. http://www.theses.fr/1986INAL0004.
Full textMesfar, Slim. "Analyse morpho-syntaxique automatique et reconnaissance des entités nommées en arabe standard." Besançon, 2008. http://www.theses.fr/2008BESA1022.
The Arabic language, although very important in terms of its number of speakers, presents particular morpho-syntactic phenomena. This particularity is mainly related to its inflectional and agglutinative morphology, the lack of vowels in most current written texts, and the multiplicity of its forms, which induces a high level of lexical and syntactic ambiguity and considerable difficulties for automatic processing. The need for a linguistic environment providing powerful tools and the ability to improve performance according to our needs led us to use the NooJ linguistic platform. We begin with a study, followed by a large-coverage formalization, of the Arabic lexicon. The dictionary we built, named "El-DicAr", links all inflectional, morphological and syntactico-semantic information to the list of lemmas. Automatic inflectional and derivational routines applied to this list produce more than 3 million inflected forms. We propose a new finite-state machine compiler that achieves optimal storage through a combination of a sequential minimization algorithm and a dynamic compression routine for the stored information. This dictionary acts as the linguistic engine for the automatic morpho-syntactic analyzer that we have developed. This analyzer includes a set of tools: a morphological analyzer that identifies the component morphemes of agglutinative forms using large-coverage morphological grammars; a new algorithm for traversing finite-state transducers that handles Arabic texts regardless of their degree of vocalisation; a corrector for the most frequent typographical errors; a named entity recognition tool based on a combination of the morphological analysis results and rules described in local grammars represented as Augmented Transition Networks (ATNs); an automatic annotator; and tools for linguistic research and contextual exploration. In order to make our work available to the scientific community, we have developed an online concordance service, "NooJ4Web: NooJ for the Web", which provides instant results for different types of queries and displays statistical reports as well as the corresponding histograms. These services are offered in order to collect feedback and improve performance. The system is used to process Arabic, as well as French and English.
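One of the listed capabilities, lookup regardless of vocalisation, can be illustrated by stripping diacritics before matching; the entry shown is invented and does not reflect El-DicAr's actual format.

```python
# Sketch of vocalisation-independent lookup: keys are stored without
# diacritics; short-vowel marks are stripped from input before matching.

HARAKAT = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652")  # tanwin, fatha, damma, kasra, shadda, sukun

def devocalize(s: str) -> str:
    return "".join(ch for ch in s if ch not in HARAKAT)

ENTRIES = {"كتب": ["kataba 'he wrote'", "kutub 'books'"]}  # ambiguity kept

def lookup(form: str) -> list:
    return ENTRIES.get(devocalize(form), [])

print(lookup("كَتَبَ"))  # vocalized input still matches the devocalized key
```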
Park, Jungyeul. "Extraction automatique d'une grammaire d'arbres adjoints à partir d'un corpus arboré pour le coréen." Paris 7, 2006. http://www.theses.fr/2006PA070007.
An electronic grammar is one of the most important elements in natural language processing. Since traditional manual grammar development is a time-consuming and labor-intensive task, many efforts toward automatic grammar development have been made over the last decades. Automatic grammar development means that a system extracts a grammar from a treebank. Since a grammar can be extracted automatically without much effort if a reliable treebank is provided, we implemented a system which extracts not only an LTAG but also an FB-LTAG from the Sejong Korean Treebank. The full-scale syntactic tags and morphological analysis in the Sejong Korean Treebank allow us to extract syntactic features automatically and to develop an FB-LTAG. During the extraction experiments, we modified the treebank to improve the extracted grammars, and extracted five different types of grammars: four lexicalized grammars and one feature-based lexicalized grammar. The extracted grammars are evaluated by their size, their coverage and their average ambiguity. The number of tree schemata is not stabilized at the end of the extraction process, which seems to indicate that the size of the treebank is not sufficient to reach convergence of the extracted grammars. However, the number of tree schemata appearing at least twice in the treebank is nearly stabilized at the end of the extraction process, and the number of superior grammars (the ones extracted after the modification of the treebank) is also much more stable than that of the inferior grammars. We also evaluate the extracted grammars using LLP2, and our extraction system using another treebank. Finally, we compare the extracted grammars with that of Han et al. (2001), which was manually constructed.
Morsi, Youcef Ihab. "Analyse linguistique et extraction automatique de relations sémantiques des textes en arabe." Thesis, Bourgogne Franche-Comté, 2020. http://www.theses.fr/2020UBFCC019.
This thesis focuses on the development of a tool for the automatic processing of Modern Standard Arabic, at the morphological and semantic levels, with the final objective of information extraction on technological innovations. As far as the morphological analysis is concerned, our tool includes several successive processing stages that allow occurrences in texts to be labelled and disambiguated: a morphological layer (Gibran 1.0), which relies on Arabic patterns as distinctive features; a contextual layer (Gibran 2.0), which uses contextual rules; and a third layer (Gibran 3.0), which uses a machine-learning model. Our methodology is evaluated using the annotated corpus Arabic-PADT UD treebank. The evaluations obtain F-measures of 0.92 and 0.90 for the morphological analyses. These experiments demonstrate the possibility of improving such a corpus through linguistic analyses. This approach allowed us to develop a prototype of information extraction on technological innovations for the Arabic language, based on the morphological analysis and syntactico-semantic patterns. This thesis is part of a PhD-entrepreneur programme.
Battistelli, Delphine. "Passer du texte à une séquence d'images : analyse spatio-temporelle de textes, modélisation et réalisation informatique (système SPAT)." Paris 4, 2000. http://www.theses.fr/1999PA040279.
Hue, Jean-François. "L'analyse contextuelle des textes en langue naturelle : les systèmes de réécritures typées." Nantes, 1995. http://www.theses.fr/1995NANT2034.
Full textMAHMOUDI, SEYED MOHAMM. "Contribution au traitement automatique de la langue persane : analyse et reconnaissance des syntagmes nominaux." Lyon 2, 1994. http://www.theses.fr/1994LYO20070.
The aim of this thesis is the conception and realisation of a morpho-syntactic parser of Persian designed for applications in automatic indexing and computer-assisted instruction or learning of the language (CAI or CAL). One of the chief extensions of this research is the automatic processing of natural language by means of artificial intelligence systems. The main interest of this contribution is the study of the automatic recognition of noun phrases in Persian. Each stage of the parsing is described in a program in the Prolog language (Turbo Prolog). All the lexical data necessary for the categorisation of morpho-syntactic forms are presented as a database.
Delagneau, Jean-Marc. "Etude quantitative assistée par ordinateur d'une langue allemande de spécialité." Caen, 2004. http://www.theses.fr/2004CAEN1409.
Bove, Rémi. "Analyse syntaxique automatique de l'oral : étude des disfluences." Phd thesis, Université de Provence - Aix-Marseille I, 2008. http://tel.archives-ouvertes.fr/tel-00647900.
Full textNasser, Eldin Safa. "Synthèse de la parole arabe : traitement automatique de l'intonation." Bordeaux 1, 2003. http://www.theses.fr/2003BOR12745.
Šmilauer, Ivan. "Acquisition du tchèque par les francophones : analyse automatique des erreurs de déclinaison." Paris, INALCO, 2008. http://www.theses.fr/2008INAL0019.
The object of this thesis is the automatic analysis of errors made by French-speaking learners in declension exercises in Czech. Our work presents the conception and realization of a platform for computer-assisted language learning, CETLEF, featuring on-line fill-in-the-blank exercises with feedback on errors. This device can also be useful for collecting samples of learner production in the context of research into second language acquisition via error analysis. CETLEF, consisting of a relational database and author and learner interfaces, made it necessary to define a model of Czech declension. This model contains a detailed classification of the paradigms and rules for the realization of vocalic and consonantal alternations. It enables the morphological annotation of the required forms, the didactic presentation of the morphological system of Czech on the learning platform, and the realization of a procedure for automatic error diagnosis. Diagnosis is carried out by comparing an erroneous production with hypothetical forms generated from the radical of the required form and various candidate endings. If a correspondence is found, the error is interpreted according to the differences in the morphological features of the required form and the hypothetical form. An appraisal of the diagnosis of the productions collected on CETLEF shows that the vast majority of errors can be interpreted with the aid of this technique.
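The diagnosis procedure can be sketched as follows; the Czech paradigm fragment and the diagnostic messages are illustrative inventions, not CETLEF's actual declension model.

```python
# Sketch: compare a learner's form against stem + candidate-ending hypotheses.
# Paradigm fragment is a tiny invented sample (hard masculine inanimate type).

ENDINGS = {
    "Gen.Sg": "u", "Dat.Sg": "u", "Loc.Sg": "ě", "Ins.Sg": "em",
}

def diagnose(stem: str, required_tag: str, produced: str) -> str:
    """Interpret an error by matching the produced form to a hypothesis."""
    if produced == stem + ENDINGS[required_tag]:
        return "correct"
    for tag, ending in ENDINGS.items():  # first matching tag is reported
        if produced == stem + ending:
            return f"error: {tag} ending used instead of {required_tag}"
    return "error: form outside the paradigm (stem or spelling problem)"

print(diagnose("hrad", "Loc.Sg", "hradu"))  # wrong case ending chosen
```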
Houle, Annie. "Délit de langue et paternité textuelle : une approche informatisée." Thesis, Université Laval, 2013. http://www.theses.ulaval.ca/2013/29405/29405.pdf.
Jaccarini, André. "Grammaires modulaires de l'arabe : modélisations, mise en oeuvre informatique et stratégies." Paris 4, 1997. http://www.theses.fr/1997PA040025.
In this work we expound, in a unified theoretical frame, the main linguistic models and the associated parsers we have developed in the D.A.T.A.T. (Département d'analyse et de traitement automatique des textes, IREMAM-CNRS). The most salient feature of these parsers is that they can work without a lexicon but can be enhanced by the introduction of selective lexicons. Our aim is then to design a syntactic monitor for the morphological program in order to reduce the various ambiguities inherent in the Arabic writing system. In order to achieve accurate descriptions, we have designed modular programs that we can modify according to the progressive complexification of linguistic data, together with an evaluation method for grammars. The existing morphological parser without a lexicon can be applied to non-vocalized as well as vocalized Arabic texts in order to extract roots, to vocalize them partially automatically, and to rank ambiguities hierarchically. In this sense the parser constitutes a powerful tool for research in linguistic engineering itself: the method of grammar variations allows the design of compact modular grammars applicable to various needs and research areas. Our aim is to create a generator for linguistic applications rather than the mere applications themselves. For example, optical character recognition (OCR) and speech processing require compact linguistic verification modules, and the use of enormous lexicons may be a handicap in some computational configurations. Our method allows the calculation of the optimum grammar.
Chen, Chao-Jan. "Modélisation de la sémantique des verbes composés chinois de type V-V." Paris 7, 2005. http://www.theses.fr/2005PA070015.
This thesis presents a model of automatic sense determination for V-V compound verbs in Chinese. First, we explore two major problems in the automatic semantic processing of V-V compounds: the incomplete collection of character senses in the source dictionaries, and the Gestalt effects in the semantic composition of a V-V compound, whereby the senses of the component verbs influence the sense of the construction, and vice versa. To solve these problems, we propose an approach using two new concepts: the "latent senses" of characters and the "compounding semantic template" associated with a V-V compound. We calculate measures of association between characters and senses, which allows us to retrieve character senses that are not explicitly listed in the source dictionary (the latent senses). Based on the association measures, we can also calculate the similarity between the semantic templates of two V-V compounds, which allows us to retrieve potential synonyms of a given V-V compound. We have thus implemented a system of automatic synonym retrieval and, based on it, a system of automatic semantic classification. The evaluation experiments show that the performance of our systems is very encouraging.
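The two computational ingredients, character-sense association and template similarity, can be sketched with toy counts; the measures below (PMI and cosine) are common stand-ins, not necessarily the thesis's exact formulas.

```python
# Sketch: (1) association between a character and candidate senses, so that
# senses unlisted for a character ("latent senses") can still be retrieved;
# (2) similarity between two compounds' sense vectors. Counts are toy data.
import math

def pmi(cooc: dict, char: str, sense: str) -> float:
    """Pointwise mutual information between a character and a sense."""
    total = sum(cooc.values())
    p_xy = cooc.get((char, sense), 0) / total
    p_x = sum(v for (c, _), v in cooc.items() if c == char) / total
    p_y = sum(v for (_, s), v in cooc.items() if s == sense) / total
    return math.log(p_xy / (p_x * p_y)) if p_xy else float("-inf")

def cosine(u: list, v: list) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# toy counts: how often each character realizes each candidate sense
cooc = {("打", "hit"): 2, ("打", "play"): 8, ("擊", "hit"): 9, ("擊", "play"): 1}
print(round(pmi(cooc, "打", "play"), 2), round(pmi(cooc, "打", "hit"), 2))
print(round(cosine([2, 8], [9, 1]), 2))  # low similarity between the two profiles
```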
Rinzler, Simone. "Passif et passivoïdes en anglais contemporain : étude d'un corpus informatisé sous MS-Excel." Poitiers, 2000. http://www.theses.fr/2000POIT5024.
Full textJamborova-Lemay, Diana. "Analyse morphologique automatique du slovaque : étude approfondie du système linguistique slovaque et sa reconnaissance d'après la forme dans les textes scientifiques et techniques, application au machinisme agricole." Paris, INALCO, 2003. http://www.theses.fr/2003INAL0013.
Automatic morphological analysis of the Slovak language is the first level of an automatic analyser for Slovak scientific and technical texts. Such a system could be used for different applications: automatic text indexing, automatic terminology extraction, or translation systems. A rule-based description of the language's regularities, together with the use of all the formal elements of words, allows the volume of dictionaries to be reduced considerably, notably in the case of inflectionally rich languages such as Slovak. The results obtained by our morphological analyser justify such an approach and confirm the high reliability of morphological analysis based on form recognition for all lexical categories.
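A toy illustration of form-based recognition, where word endings alone vote for lexical categories without a full-form dictionary; the ending table is a tiny invented fragment, far from the thesis's full rule system.

```python
# Sketch: guess categories from word endings, with no dictionary lookup.
# The ending-category table is an invented sample.

ENDING_CATEGORIES = [  # longest endings tried first
    ("ovanie", {"noun (verbal, neuter)"}),
    ("osť", {"noun (feminine, abstract)"}),
    ("ujú", {"verb (3rd person plural, present)"}),
    ("sky", {"adjective or adverb"}),
]

def guess(word: str) -> set:
    """Return the categories suggested by the longest matching ending."""
    for ending, cats in ENDING_CATEGORIES:
        if word.endswith(ending):
            return cats
    return {"unknown"}

for w in ["spracovanie", "poľnohospodársky", "rýchlosť"]:
    print(w, guess(w))
```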
Émorine, Martine. "Formalisation syntaxique et sémantique des constructions à verbes supports en français et en espagnol dans une grammaire catégorielle d'unification." Clermont-Ferrand 2, 1992. http://www.theses.fr/1992CLF2A001.
Walther, Markus. "Deklarative prosodische Morphologie : Constraint-basierte Analysen und Computermodelle zum Finnischen und Tigrinya." Tübingen : Niemeyer, 1999. http://catalogue.bnf.fr/ark:/12148/cb38814312v.
Lallich-Boidin, Geneviève. "Analyse syntaxique automatique du français écrit : applications à l'indexation automatique." Phd thesis, Ecole Nationale Supérieure des Mines de Saint-Etienne, 1986. http://tel.archives-ouvertes.fr/tel-00849913.
Khruathong, Sombat. "Vers une analyse micro-systémique en vue d'une traduction automatique thaï-français : application aux verbes sériels." Besançon, 2007. http://www.theses.fr/2007BESA1004.
Full textThis thesis, "Towards a Micro-Systemic Parsing for a Thai-French Machine Translation: Application to the Serial Verbs", is divided into 6 chapters : Chapter one presents the linguistic and data-processing approaches used in the field of computational linguistics. Chapter two explains the characteristics of the Thai language compared to the French language, the general problems of Thai-French translation, as well as the parsing models of noun phrases in Thai. Chapter three is concerned with trying to parse adjectival and adverbial syntagms of Thai. Chapter four is devoted to the parsing models for Thai serial verbs. The hypothesis there presented is the result of successive observations on the general problems of our mother tongue, the Thai language, in particular with regard to natural language processing. This has enabled us to observe that Thai serial verbs play a particular role not only in lexical formation, but also in the syntactic order of the sentence. It is not necessary to say how much the interpretation of the meaning would be obstructed if these verbs were badly analyzed. Quantitatively, Thai serial verbs are not numerous. However, in their pre or post verbal and nominal employment, even at the level of the sentence, the research outcome shows that they play a particular role which deserves to be studied. Chapter five applies the results of chapters 3 and 4 to the implementation of a Thai-French machine translation system in "interactive mode"; we believe that such analysis models for machine translation can be better developed in interactive mode because the problems, which concern the difference of the two distant languages as well as in the lexical formation in syntax, are thereby highlighted. In conclusion, we wish to underline that a Thai-French machine translation system could have many applications in particular in the area of Teaching of French as a Foreign Language for the Thai public or Teaching of Thai as a Foreign Language for French speaking countries
Lutrand-Pezant, Brigitte. "Les propositions complétives en that en anglais contemporain." Paris 4, 2003. http://www.theses.fr/2003PA040212.
This study, based on a computerized corpus of over 7,000 examples, falls into three parts: first, a presentation of the linguistic knowledge so far on the subject of that-clauses; then a description of the forms present in the corpus, together with statistics; and third, an analysis of their lexical and syntactic environments. The issue of the choice between that and Ø has been carefully examined. The behaviour of these clauses has been studied in literary texts of the 19th and 20th centuries as well as in journalistic and scientific writing. Oral English has been compared to written English where necessary. This research also tried to show the characteristics of South African, Irish, British and American English with regard to these clauses.
Culioli-Atwood, Marie-Hélène. "Opérations référentielles. Analyse de la détermination en français en vue d'un traitement informatisé." Paris 7, 1992. http://www.theses.fr/1992PA070014.
The purpose of the thesis is (1) to gather a maximum of systematic and detailed observations concerning the occurrence of determiners in French (in the pattern Det + N); (2) to build a system of metalinguistic representation enabling the modelling of the facts; and (3) to build reasoning procedures with an algorithmic treatment in mind, whether in generation or in analysis. The work provides the conceptual basis for modelling on both a formal and a semantic level. The thesis is made up of three parts: an analysis of the problems in relation to paraphrastic manipulations; a study of groups of nominalised predicates based on semantic classifications; and a study of determiners in prepositional phrases. This research constitutes the preliminary steps of any computerized treatment of determination as used in French texts.
Sedogbo, Célestin. "De la grammaire en chaîne du français à un système question-réponse." Aix-Marseille 2, 1987. http://www.theses.fr/1987AIX22092.
Bioud, Mounira. "Une normalisation de l'emploi de la majuscule et sa représentation formelle pour un système de vérification automatique des majuscules dans un texte." Besançon, 2006. http://www.theses.fr/2006BESA1002.
This research deals with the problems relating to the use of upper-case letters, from the point of view of Natural Language Processing, for automatic spelling correction. The use of French capital letters suffers from a lack of fixed standardization, which inevitably means that they are used without method. This absence reveals, on the one hand, the phenomena called "majusculite" (overuse of capital letters) and "minusculite" (overuse of lower-case letters) and, on the other hand, the presence of spelling variants (la Montagne noire, la montagne Noire, la Montagne Noire, la montagne noire). Current spelling checkers seem unable to say which form is correct. The true meaning of upper-case letters tends to disappear, and their relevance becomes less obvious. So many doubts, hesitations and fluctuations in the rules of use, and so many differences between authors, make any attempt at automatic processing very difficult. This shaky standardization particularly affects the complex proper nouns known as "dénominations". The most logical solution to stop this drift is to standardize the use of capital letters. Drawing on various reference works, we worked out clear and logical rules governing the use of the capital letter, in order to create a theoretical model of an automatic system for checking capital letters in a text. This solution also eliminates the spelling variants whose existence constitutes a major problem in research on the extraction of fixed forms.
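The intended checker can be sketched as a lookup of denominations stored in one normalized form; the particular normalization chosen below is an assumption for illustration only (by analogy with forms like "la mer Noire"), not the thesis's prescribed norm.

```python
# Sketch: denominations are stored in a single normalized casing;
# variants found in text are flagged. The norm here is assumed.

NORMALIZED = {"la montagne noire": "la montagne Noire"}  # assumed norm

def check(phrase: str) -> str:
    """Compare a phrase's casing against the stored normalized form."""
    key = phrase.lower()
    if key not in NORMALIZED:
        return "not a listed denomination"
    norm = NORMALIZED[key]
    return "ok" if phrase == norm else f"variant found; expected: {norm}"

for p in ["la montagne Noire", "la Montagne Noire", "la montagne noire"]:
    print(p, "->", check(p))
```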
Hassoun, Mohamed. "Conception d'un dictionnaire pour le traitement automatique de l'arabe dans différents contextes d'application." Lyon 1, 1987. http://www.theses.fr/1987LYO10035.
Full textBraud, Chloé. "Identification automatique des relations discursives implicites à partir de corpus annotés et de données brutes." Sorbonne Paris Cité, 2015. https://hal.inria.fr/tel-01256884.
Building discourse parsers is currently a major challenge in Natural Language Processing. The identification of the relations (such as Explanation, Contrast, etc.) linking spans of text in a document is the main difficulty. In particular, identifying so-called implicit relations, that is, relations lacking a discourse connective (such as but, because, etc.), is known to be a hard task since it requires various factors to be taken into account, and because it leads to specific difficulties in a classification system. In this thesis, we use raw data to improve the automatic identification of implicit relations. First, we propose to use discourse markers in order to automatically annotate new data. We use domain adaptation methods to deal with the distributional differences between automatically and manually annotated data, and report improvements for systems built on the French corpus ANNODIS and on the English Penn Discourse Treebank. Then, we propose to use word representations built from raw data, which may be automatically annotated with discourse markers, to feed a representation of the data based on the words found in the spans of text to be linked. We report improvements on the English Penn Discourse Treebank and, especially, we show that this method alleviates the need for rich resources, which are available for only a few languages.
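The first idea, mining explicit connectives to create artificial "implicit" training examples, can be sketched as follows; the connective-relation mapping is a simplified illustration.

```python
# Sketch: find an explicit connective in raw text, remove it, and keep the
# relation it signals as a distant label for an artificial implicit example.
import re

CONNECTIVES = {"because": "Explanation", "but": "Contrast"}

def mine(sentence: str):
    """Return (arg1, arg2, relation) with the connective removed, or None."""
    for conn, relation in CONNECTIVES.items():
        m = re.search(rf"\b{conn}\b", sentence, re.IGNORECASE)
        if m:
            arg1 = sentence[:m.start()].strip(" ,")
            arg2 = sentence[m.end():].strip()
            if arg1 and arg2:
                return (arg1, arg2, relation)
    return None

print(mine("He stayed home because it was raining."))
```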
Brault, Frédérick. "Forces et faiblesses de l'utilisation de trigrams dans l'étiquetage automatique du français : exploration à partir des homographes de type verbe-substantif." Thesis, Université Laval, 2004. http://www.theses.ulaval.ca/2004/22111/22111.pdf.
Zouari, Lotfi. "Construction automatique d'un dictionnaire orienté vers l'analyse morpho-syntaxique de l'arabe, écrit voyellé ou non voyellé." Paris 11, 1989. http://www.theses.fr/1989PA112073.
This thesis addresses the problem of the automatic processing of a natural language: Arabic. Its purpose is to treat written Arabic, as it is printed, without any pre-editing. First, we describe the automatic construction of a dictionary which allows the recognition of the lexical units that make up the text, units which do not always appear in the dictionary because of agglutination in Arabic. As for syntactic analysis, we resolve grammatical ambiguities, taking into account the problems caused by agglutination.
Svášek, Martin. "Définitions, élaboration et exploitation d'un corpus parallèle bidirectionnel français-tchèque tchèque-français." Paris, INALCO, 2007. http://www.theses.fr/2007INAL0020.
At the beginning, the concept of a parallel corpus is defined. The French and Czech texts forming the parallel corpus Fratchèque come from literature; only texts published after 1945 were selected. Fratchèque is not explicitly marked up with XML tags, because such tagging is not necessary for the proper functioning of the corpus manager ParaConc. The building of the corpus is thoroughly described, following all the steps and settings of the software used. The process starts with the optical character recognition program FineReader; after checking the accuracy of the digitized texts using MS Word 2002, it goes on to build a corpus managed by ParaConc. The linguistic investigations of the thesis rely primarily on this parallel corpus. The main purpose is to tackle a phenomenon known in Czech as částice, which has no direct equivalent in French. The most frequent terms used in the French approach are mots du discours and particules énonciatives. Existing descriptions suggest a close relationship between these words and discourse. This is demonstrated for two Czech částice, přece and vždyť, and their variants, using large Czech corpora (Analysis A) and Fratchèque (Analysis B). The study then systematically analyses all kinds of uses of vždyť and přece in order to present a lexicographical description for a bilingual Czech-French dictionary. Exercises based on the results of the linguistic analysis show how the bilingual corpus can be used in teaching foreign languages. Finally, some issues concerning the automatic evaluation of translation quality are discussed, taking into account the work on částice.
Yoon, SinWon. "Une grammaire électronique en TAG pour le coréen." Paris 7, 2010. http://www.theses.fr/2010PA070100.
This dissertation presents the development of an electronic grammar for Korean in Tree Adjoining Grammars (TAG), a formalism based on the combination of trees. We define the topology of the elementary trees associated with lexical items (nouns, verbs, adverbs, determiners, conjunctions), and specify in particular the definition of verb families, presenting the formalization of the structural variants selected by verbs for the same subcategorization frame. We first present the representations of the syntactic constructions specified by verbal suffixes in Korean: declarative, interrogative, propositive and imperative. We then justify and present the formalization of the various syntactic phenomena that can be distinguished for a subcategorization frame through the redistribution of arguments (the passive and the causative) and the realization of arguments (extraction). Besides the definition of a TAG grammar for Korean, we are interested in solving the formal problems of TAG that derive from a generative capacity insufficient for free word order. This leads us to propose Pro-VTAG, an extension of the V-TAG formalism based on the idea of dividing the standard structure of the verb into several trees that may adjoin freely within the extended projection of the verb. We show that Pro-VTAG has the capacity to allow the analysis of free word order in Korean, and that the analysis in Pro-VTAG has the advantages, compared to V-TAG, of localizing dependencies and of avoiding the addition of an artificial mechanism for blocking extraction.
Clément, Lionel. "Construction et exploitation d'un corpus syntaxiquement annoté pour le français." Paris 7, 2001. http://www.theses.fr/2001PA070029.
Very few gold-standard annotated corpora are currently available for French. We present a project to build a reference treebank for French. We annotate a newspaper corpus of one million words (Abeillé et al. 1998, 1999, 2000), following the EAGLES recommendations (von Rekowski 1996, Ide et al. 1996, Sanfilippo et al. 1996, Kahrel et al. 1997) and developing specific annotation guidelines for French. Similarly to the Penn Treebank (Marcus et al. 1993), we distinguish a tagging and a parsing phase, and adopt a process of automatic annotation followed by systematic manual validation and correction. Similarly to the Susanne corpus (Sampson 1994, this volume) and the Prague treebank (Hajicova et al. 1998, this volume), we rely on several types of morphosyntactic and syntactic annotations, for which we define extensive guidelines. Our goal is to provide a theory-neutral, surface-oriented, error-free treebank for French. Similarly to the Negra project (Brants et al. 1999, this volume), we annotate both constituents and functional relations. Due to the lack of robust reusable annotation tools at the beginning of the project, we chose to develop our own.
Ben Mlouka, Monia. "Le référencement en langue des signes : analyse et reconnaissance du pointé." Toulouse 3, 2014. http://thesesups.ups-tlse.fr/2676/.
This thesis focuses on the analysis of gaze in sign language, where it plays an important role. In any language, gaze maintains the communicative relationship; in addition, it helps structure sign language discourse and interaction between signers by taking on complex linguistic functions. We focus on the function of reference, which is to put the focus on an element of the discourse. In sign language, the components of the discourse are localized in the signing space; putting the focus on an element of discourse thus means identifying and activating its spatial location (locus), which mobilizes one or more body parts: hands, shoulders, head and eyes. We therefore analyzed the concept of reference in its manual and/or non-manual realizations and set up a recognition system for reference structures that takes a sign language video as input. The recognition system consists of three steps: (1) 3D modeling of the concept of reference; (2) transformation of the 3D model into a 2D model usable by a 2D recognition system; and (3) the detection system itself, which uses this 2D model. Modeling involves extracting the gestural characteristics of the concept of reference from a corpus consisting of 3D motion-capture and gaze data and manually annotated videos, together with the temporal pattern of time lags between motions. The modeling concerns the description of the body parts that play a role in reference and the quantification of their movements. The resulting models describe (1) the dynamic movement of the dominant hand, (2) the distances between body parts and the locus, and (3) the time lags between the onsets of motions. The implementation of the recognition method integrates these 3D models. Since the resulting models are three-dimensional and the recognition system takes a 2D video as input, we propose a transformation of the 3D models to 2D to allow their use in the analysis of 2D video and in the pattern recognition of reference structures. We can then apply a recognition algorithm to the 2D video corpus. The recognition results are a set of time slots with two main variants of reference. This pioneering work on the characterization and detection of reference structures would need to be applied to a much larger, consistent and rich corpus, with more sophisticated classification methods. However, it produced a reusable analysis methodology.
Jackiewicz, Agata. "L'expression de la causalité dans les textes : contribution au filtrage sémantique par une méthode informatique d'exploration contextuelle." Paris 4, 1998. http://www.theses.fr/1998PA040003.
The object of this thesis is to study causality through its discursive expression in French texts. This linguistic study has been made from the perspective of automatic language processing. The work takes place within a project of semantic filtering of texts (named SAFIR: automatic information filtering for summarizing texts), which is dedicated to the production of syntheses and summaries, but it also extends to knowledge acquisition from texts. Our first objective is to index the various linguistic devices that authors use to convey causal relations. We use an original contextual exploration method that is not based upon a "deep representation" of the text under consideration, but rather upon the automatic identification of markers considered to be relevant. We propose a map of 1,500 markers (verbs, phrases, adverbs, etc.) which are automatically identifiable indices of the causal relations conveyed by a speaker or a third party.
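Marker-driven contextual exploration can be illustrated with a simple scan; the marker list below is a tiny sample of the 1,500 markers mentioned above, using verb stems to catch inflected forms.

```python
# Sketch: scan sentences for causal markers and index the matches,
# with no deep representation of the text. Marker list is a tiny sample.
import re

MARKERS = ["provoqu", "entraîn", "caus", "à cause de", "parce que"]  # stems
PATTERN = re.compile("|".join(re.escape(m) for m in MARKERS), re.IGNORECASE)

def filter_causal(sentences: list) -> list:
    """Return (marker, sentence) pairs for sentences containing a marker."""
    hits = []
    for s in sentences:
        m = PATTERN.search(s)
        if m:
            hits.append((m.group(), s))
    return hits

corpus = ["La pluie a provoqué des inondations.", "Il fait beau aujourd'hui."]
print(filter_causal(corpus))
```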
Hathout, Nabil. "Théorie du gouvernement et du liage et programmation logique avec contraintes : une application à l'analyse automatique du français." Toulouse 3, 1992. http://www.theses.fr/1992TOU30200.
Goulet, Marie-Josée. "Analyse d'évaluations en résumé automatique : proposition d'une terminologie française, description des paramètres expérimentaux et recommandations." Thesis, Université Laval, 2008. http://www.theses.ulaval.ca/2008/25346/25346.pdf.
Wang, Zhen. "Extraction en langue chinoise d'actions spatiotemporalisées réalisées par des personnes ou des organismes." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016INAL0006.
We have developed an automatic analyser and an extraction module for Chinese language processing. The analyser performs automatic Chinese word segmentation based on linguistic rules and dictionaries, part-of-speech tagging based on n-gram statistics, and dependency grammar parsing. The module extracts information about named entities and activities. In order to achieve these goals, we tackled the following main issues: segmentation and part-of-speech ambiguity; unknown-word identification in Chinese text; and attachment ambiguity in parsing. Chinese texts are analysed sentence by sentence. Given a sentence, the analyser begins with typographic processing to identify sequences of Latin characters and numbers. Then, dictionaries are used for a preliminary segmentation into words. Linguistic rules are used to create proper-noun hypotheses and to change the weight of some word categories; these rules take word context into account. An n-gram language model created from a training corpus selects the best word segmentation and parts of speech. Dependency grammar parsing is then used to annotate the relations between words. A first step of named entity recognition is performed after parsing; its goal is to identify single-word and noun-phrase-based named entities and to determine their semantic type. These named entities are then used in knowledge extraction, where extraction rules validate named entities or change their types. Knowledge extraction consists of two steps: automatic content extraction and tagging from the analysed text, followed by control of the extracted content and ontology-based co-reference resolution.
Lin, Huei-Chi. "Un module NooJ pour le traitement automatique du chinois : formalisation du vocabulaire et des têtes de groupes nominaux." Besançon, 2010. http://www.theses.fr/2010BESA1025.
This study presents the development of a module for the automatic parsing of Chinese that automatically recognizes lexical units in modern Chinese, as well as central noun phrases, in texts. In order to reach these two principal objectives, we solved the following problems: (1) identifying lexical units in modern Chinese; (2) determining their categories; and (3) describing certain local syntactic structures as well as the structure of central noun phrases. First, we constructed a corpus bringing together literary and journalistic texts published in the 20th century, written in modern Chinese with traditional characters. From these textual data we collected linguistic information such as lexical units, syntagmatic structures and grammatical rules. We then constructed several electronic dictionaries in which each entry represents a lexeme, associated with linguistic information such as its lexical category, its semantic distributional class and certain formal properties. At this stage, we sought to identify the lexical units of the Chinese lexicon and their categories in order to list them. Thanks to this list, an automatic lexical analyzer can process various types of lexical units as blocks, without decomposing them into components. For instance, the lexical parser processes the following lexical units as atomic units: 理髮 lǐfà/fǎ 'have a haircut', 放假 fàngjià 'have a vacation', 刀子口 dāozikǒu 'straight talk', 研究員 yánjiū/jiù yuán 'researcher', 翻譯系統 fānyì xìtǒng 'translation system', 浪漫主義 làngmàn zhŭyì 'romanticism'. We then formally described certain local syntagms and five types of central noun phrases. Finally, we used this Chinese module to study thematic evolution in literary texts.
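The "in bloc" treatment can be illustrated with a longest-match segmenter whose lexicon is reduced to the examples above; this greedy strategy is a common stand-in, not necessarily the module's exact algorithm.

```python
# Sketch: the segmenter prefers the longest dictionary entry, so multiword
# units such as 翻譯系統 stay atomic instead of being decomposed.

LEXICON = {"理髮", "放假", "刀子口", "研究員", "翻譯系統", "浪漫主義",
           "我", "是"}

def segment(text: str) -> list:
    """Greedy longest-match segmentation over the lexicon."""
    units, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # longest match first
            if text[i:j] in LEXICON or j == i + 1:
                units.append(text[i:j]); i = j; break
    return units

print(segment("我是研究員"))  # -> ['我', '是', '研究員']
print(segment("翻譯系統"))    # one atomic unit, not 翻譯 + 系統
```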
Kosawat, Krit. "Méthodes de segmentation et d'analyse automatique de textes thaï." Phd thesis, Université Paris-Est, 2003. http://tel.archives-ouvertes.fr/tel-00626256.
Apidianaki, Marianna. "Acquisition automatique de sens pour la désambiguïsation et la sélection lexicale en traduction." Phd thesis, Université Paris-Diderot - Paris VII, 2008. http://tel.archives-ouvertes.fr/tel-00322285.
Full textNous proposons une méthode d'acquisition de sens permettant d'établir des correspondances sémantiques de granularité variable entre les mots de deux langues en relation de traduction. L'induction de sens est effectuée par une combinaison d'informations distributionnelles et traductionnelles extraites d'un corpus bilingue parallèle. La méthode proposée étant à la fois non supervisée et entièrement fondée sur des données, elle est, par conséquent, indépendante de la langue et permet l'élaboration d'inventaires sémantiques relatifs aux domaines représentés dans les corpus traités.
The results of this method are exploited by a lexical disambiguation method, which assigns a sense to new instances of ambiguous words in context, and by a lexical selection method, which proposes their most adequate translation. Finally, we propose a weighted evaluation of the disambiguation and lexical selection results, based on the inventory built by the sense acquisition method.
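A toy sketch of the sense-acquisition idea: the translations of an ambiguous source word are clustered by the overlap of the contexts they translate, yielding sense groups of variable granularity. The data, similarity measure and threshold are invented; the actual method combines distributional and translational information from parallel corpora.

```python
# Sketch: greedy single-link clustering of a word's translations by the
# Jaccard overlap of toy context-word sets. Data and threshold are invented.

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(translations: dict, threshold: float = 0.2) -> list:
    """Group translations whose contexts overlap above the threshold."""
    clusters = []
    for t, ctx in translations.items():
        for cl in clusters:
            if any(jaccard(ctx, translations[u]) >= threshold for u in cl):
                cl.append(t); break
        else:
            clusters.append([t])
    return clusters

# French translations of English "bank", with toy context-word sets
translations = {
    "banque": {"argent", "compte", "crédit"},
    "établissement": {"compte", "crédit", "finance"},
    "rive": {"fleuve", "eau"},
}
print(cluster(translations))  # -> [['banque', 'établissement'], ['rive']]
```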
Kanoun, Slim. "Identification et analyse de textes arabes par approche affixale." Rouen, 2002. http://www.theses.fr/2002ROUES040.
The work presented in this thesis tackles the problems involved in the differentiation and recognition of text, in off-line mode, in multilingual documents mixing Arabic and Latin scripts. The first part of this work presents a method for differentiating between Arabic and Latin texts, in both printed and handwritten form. The second part proposes a new approach, called the affixal approach, to Arabic word recognition and text analysis. This approach is characterized by modelling based on morpho-syntactic entities (the basic morphemes of words), integrating the morpho-phonological aspects of Arabic vocabulary into the recognition process, in contrast to traditional approaches which proceed by modelling graphic entities (word, letter, pseudo-word). The tests carried out clearly show the contribution of the approach to simplifying recognition and to the morpho-syntactic categorization of the words in an Arabic text.