Dissertations on the topic "Traitement automatique du language"
Consult the top 50 dissertations for research on the topic "Traitement automatique du language".
Colin, Émilie. "Traitement automatique des langues et génération automatique d'exercices de grammaire." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0059.
Our perspective is educational: creating grammar exercises for French. Paraphrasing is an operation of reformulation. Our work tends to show that sequence-to-sequence models are not simple repeaters but can learn syntax. First, by combining various models, we showed that representing information in multiple forms (formal data (RDF) coupled with text to extend or reduce it, or text alone) allows a corpus to be exploited from different angles, increasing the diversity of outputs and exploiting the syntactic levers put in place. We also addressed a recurrent problem, that of data quality, and obtained paraphrases with high syntactic adequacy (up to 98% coverage of the demand) and a very good linguistic level. We obtain up to 83.97 BLEU-4* points, 78.41 more than our baseline average without syntactic leverage. This rate indicates better control of the outputs, which are varied and of good quality even in the absence of syntactic leverage. Our idea was to be able to work from raw text: to produce a representation of its meaning. The transition to French text was also an imperative for us. Working from plain text, by automating the procedures, allowed us to create a corpus of more than 450,000 sentence/representation pairs, thanks to which we learned to generate massively correct texts (92% on qualitative validation). Anonymizing everything that is not functional contributed significantly to the quality of the results (68.31 BLEU, i.e. +3.96 compared to the baseline, which generated text from non-anonymized data). This second piece of work could be extended by integrating a syntactic lever guiding the outputs: what was our baseline at stage one (generating without constraint) would then be combined with a constrained model. By applying an error search, this would allow the constitution of a silver base associating representations with texts. This base could then be multiplied by reapplying constrained generation, and thus achieve the applied objective of the thesis. The formal representation of information in a language-specific framework is a challenging task, and this thesis offers some ideas on how to automate it. Moreover, we were only able to process relatively short sentences; the use of more recent neural models would likely improve the results, and appropriate processing of the outputs would allow for extensive checks. *BLEU: measure of text quality on a scale from 0 (worst) to 100 (best), Papineni et al. (2002)
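For reference, the BLEU-4 metric cited above is the standard one of Papineni et al. (2002); it combines modified n-gram precisions p_n (here up to n = 4, with uniform weights w_n = 1/N) and a brevity penalty BP comparing candidate length c to reference length r, reported on a 0-100 scale:

```latex
\mathrm{BLEU} = \mathrm{BP}\cdot\exp\!\Big(\sum_{n=1}^{N} w_n \log p_n\Big),
\qquad
\mathrm{BP} = \begin{cases} 1 & \text{if } c > r\\ e^{\,1-r/c} & \text{if } c \le r \end{cases}
```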
Saadane, Houda. "Le traitement automatique de l’arabe dialectalisé : aspects méthodologiques et algorithmiques." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAL022/document.
Millour, Alice. "Myriadisation de ressources linguistiques pour le traitement automatique de langues non standardisées." Thesis, Sorbonne université, 2020. http://www.theses.fr/2020SORUL126.
Citizen science, in particular voluntary crowdsourcing, is a little-explored way of producing language resources for languages which remain poorly resourced despite a sufficient number of speakers online. In this work we present the experiments we have led to crowdsource linguistic resources for the development of automatic part-of-speech annotation tools. We applied the methodology to three non-standardised languages, namely Alsatian, Guadeloupean Creole and Mauritian Creole. For different historical reasons, multiple (ortho)graphic practices coexist for these three languages. The difficulties raised by this variation phenomenon led us to propose various crowdsourcing tasks that allow the collection of raw corpora, part-of-speech annotations, and graphic variants. The intrinsic and extrinsic analysis of these resources, used for the development of automatic annotation tools, shows the interest of using crowdsourcing in a non-standardised linguistic setting: the participants are not seen in this context as a uniform set of contributors whose cumulative efforts allow the completion of a particular task, but rather as a set of holders of complementary knowledge. The resources they collectively produce make possible the development of tools that embrace the variation. The platforms developed, the language resources, as well as the trained tagger models are freely available.
Kessler, Rémy. "Traitement automatique d’informations appliqué aux ressources humaines." Thesis, Avignon, 2009. http://www.theses.fr/2009AVIG0167/document.
Since the 90s, the Internet has been at the heart of the labor market. First used for specific expertise, its use has spread as the number of Internet users in the population has increased. Seeking employment through electronic job boards has become commonplace, and e-recruitment is now standard practice. This information explosion poses various processing problems, as the large amount of information is difficult to manage quickly and effectively for companies. We present in this PhD thesis the work we have developed within the E-Gen project, which aims to create tools to automate the flow of information during a recruitment process. We were first interested in the problems posed by the routing of emails. The ability of a company to manage these information flows efficiently and at lower cost has become a major issue for customer satisfaction. We propose the application of machine learning methods to perform automatic classification of emails for routing, combining support vector machines and probabilistic techniques. Then, we present work conducted on the analysis and integration of job ads published on the Internet. We present a solution capable of integrating a job ad in an automatic or assisted manner in order to broadcast it quickly. Based on a combination of classifiers driven by a Markov automaton, the system obtains very good results. Thereafter, we present several strategies based on vector-space and probabilistic models to solve the problem of ranking candidate profiles against a specific job offer in order to assist recruiters. We evaluated a range of similarity measures to rank candidacies using ROC curves. A relevance feedback approach allowed us to surpass our previous results on this difficult, diverse and highly subjective task.
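As an illustration of the kind of pipeline the first part of this work describes (a minimal sketch only: the toy emails, labels and scikit-learn recipe below are ours, not the E-Gen system itself):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus; the real E-Gen data is a corporate email flow.
emails = [
    "please reset my password",       # support
    "cannot log in to my account",    # support
    "invoice 1234 is missing",        # billing
    "question about my latest bill",  # billing
]
labels = ["support", "support", "billing", "billing"]

# TF-IDF features + linear SVM, one common recipe for email routing.
router = make_pipeline(TfidfVectorizer(), LinearSVC())
router.fit(emails, labels)

print(router.predict(["where is my invoice?"]))  # likely ['billing']
```

On the real task, such a router would be trained on a large labeled email flow, and its rankings evaluated with ROC curves as in the thesis.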
Coria, Juan Manuel. "Continual Representation Learning in Written and Spoken Language." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG025.
Although machine learning has recently witnessed major breakthroughs, today's models are mostly trained once on a target task and then deployed, rarely (if ever) revisiting their parameters. This problem affects performance after deployment, as task specifications and data may evolve with user needs and distribution shifts. To solve this, continual learning proposes to train models over time as new data becomes available. However, models trained in this way suffer from significant performance loss on previously seen examples, a phenomenon called catastrophic forgetting. Although many studies have proposed different strategies to prevent forgetting, they often rely on labeled data, which is rarely available in practice. In this thesis, we study continual learning for written and spoken language. Our main goal is to design autonomous and self-learning systems able to leverage scarce on-the-job data to adapt to the new environments they are deployed in. Contrary to recent work on learning general-purpose representations (or embeddings), we propose to leverage representations that are tailored to a downstream task. We believe the latter may be easier to interpret and exploit by unsupervised training algorithms like clustering, which are less prone to forgetting. Throughout our work, we improve our understanding of continual learning in a variety of settings, such as the adaptation of a language model to new languages for sequence labeling tasks, or the adaptation to a live conversation in the context of speaker diarization. We show that task-specific representations allow for effective low-resource continual learning, and that a model's own predictions can be exploited for full self-learning.
Grandchamp, Jean-Michel. "L'argumentation dans le traitement automatique de la langue." Paris 11, 1996. http://www.theses.fr/1996PA112016.
Mela, Augusta. "Traitement automatique de la coordination par et." Paris 13, 1992. http://www.theses.fr/1992PA132040.
Moncecchi, Guillermo. "Recognizing speculative language in research texts." Paris 10, 2013. http://www.theses.fr/2013PA100039.
This thesis presents a methodology to solve certain classification problems, particularly those involving sequential classification for Natural Language Processing tasks. It proposes the use of an iterative, error-based approach to improve classification performance, suggesting the incorporation of expert knowledge into the learning process through knowledge rules. We applied and evaluated the methodology on two tasks related to the detection of hedging in scientific articles: hedge cue identification and hedge cue scope detection. Results are promising: for the first task, we improved baseline results by 2.5 points in terms of F-score by incorporating cue co-occurrence information, while for scope detection, the incorporation of syntax information and rules for syntactic scope pruning allowed us to improve classification performance from an F-score of 0.712 to a final score of 0.835. Compared with state-of-the-art methods, our results are competitive, suggesting that the approach of improving classifiers based only on errors committed on a held-out corpus could be successfully used in other, similar tasks. Additionally, this thesis proposes a class schema for representing sentence analysis in a unique structure, including the results of different linguistic analyses. This allows us to better manage the iterative process of classifier improvement, where different attribute sets for learning are used in each iteration. We also propose to store attributes in a relational model, instead of the traditional text-based structures, to facilitate learning data analysis and manipulation.
Le, Kien Van. "Génération automatique de l'accord du participe passé." Paris 7, 1987. http://www.theses.fr/1987PA077257.
Nasser, Eldin Safa. "Synthèse de la parole arabe : traitement automatique de l'intonation." Bordeaux 1, 2003. http://www.theses.fr/2003BOR12745.
Norman, Christopher. "Systematic review automation methods." Electronic Thesis or Diss., université Paris-Saclay, 2020. http://www.theses.fr/2020UPASS028.
Recent advances in artificial intelligence have seen limited adoption in systematic reviews, and much of the systematic review process remains manual, time-consuming, and expensive. Authors conducting systematic reviews face issues throughout the systematic review process. It is difficult and time-consuming to search and retrieve, collect data, write manuscripts, and perform statistical analyses. Screening automation has been suggested as a way to reduce the workload, but uptake has been limited due to a number of issues, including licensing, steep learning curves, lack of support, and mismatches to workflow. There is a need to better align current methods to the needs of the systematic review community. Diagnostic test accuracy studies are seldom indexed in an easily retrievable way, and suffer from variable terminology and missing or inconsistently applied database labels. Methodological search queries to identify diagnostic studies therefore tend to have low accuracy, and are discouraged for use in systematic reviews. Consequently, there is a particular need for alternative methods to reduce the workload in systematic reviews of diagnostic test accuracy. In this thesis we have explored the hypothesis that automation methods can offer an efficient way to make the systematic review process quicker and less expensive, provided we can identify and overcome barriers to their adoption. Automated methods have the opportunity to make the process cheaper as well as more transparent, accountable, and reproducible.
Hagège, Caroline. "Analyse syntaxique automatique du portugais." Clermont-Ferrand 2, 2000. http://www.theses.fr/2000CLF20028.
Kervajan, Loïc. "Contribution à la traduction automatique français/langue des signes française (LSF) au moyen de personnages virtuels : Contribution à la génération automatique de la LSF." Thesis, Aix-Marseille 1, 2011. http://www.theses.fr/2011AIX10172.
Since the law of 11 February 2005 on equal rights and opportunities was passed, places open to the public (public places, shops, the Internet, etc.) must welcome the Deaf in French Sign Language (FSL). We have worked on the development of technological tools to promote FSL, especially machine translation from written French to FSL. Our thesis begins with a presentation of knowledge on FSL (theoretical resources and ways to edit FSL), followed by further concepts of descriptive grammar. Our working hypothesis is: FSL is a language and, therefore, machine translation is relevant. We describe the language specifications for automatic processing, based on scientific knowledge and on the proposals of our native FSL informants. We also expose our methodology and present the state of our work on the formalization of linguistic data, based on the specificities of FSL, some of which (verb schemes, adjective and adverb modification, organization of nouns, agreement patterns) require further analysis. We present the application framework in which we worked: the machine translation system and the virtual character animation system of France Telecom R&D. After a short presentation of avatar technology, we explain how we control the gesture synthesis engine through the exchange format that we developed. Finally, we conclude with an evaluation and the research and development perspectives that could follow this thesis. Our approach has produced its first results, since we achieved our goal of running the full translation chain: from the input of a sentence in French to the realization of the corresponding sentence in FSL by a synthetic character.
Gaubert, Christian. "Stratégies et règles minimales pour un traitement automatique de l'arabe." Aix-Marseille 1, 2001. http://www.theses.fr/2001AIX10040.
Curiel Diaz, Arturo Tlacaélel. "Using formal logic to represent sign language phonetics in semi-automatic annotation tasks." Thesis, Toulouse 3, 2015. http://www.theses.fr/2015TOU30308/document.
This thesis presents a formal framework for the representation of Sign Languages (SLs), the languages of Deaf communities, in semi-automatic recognition tasks. SLs are complex visuo-gestural communication systems; by using corporal gestures, signers achieve the same level of expressivity held by sound-based languages like English or French. However, unlike these, SL morphemes correspond to complex sequences of highly specific body postures, interleaved with postural changes: during signing, signers use several parts of their body simultaneously in order to combinatorially build phonemes. This situation, paired with an extensive use of the three-dimensional space, makes them difficult to represent with tools already existent in Natural Language Processing (NLP) of vocal languages. For this reason, the current work presents the development of a formal representation framework, intended to transform SL video repositories (corpora) into an intermediate representation layer, where automatic recognition algorithms can work under better conditions. The main idea is that corpora can be described with a specialized Labeled Transition System (LTS), which can then be annotated with logic formulae for its study. A multi-modal logic was chosen as the basis of the formal language: Propositional Dynamic Logic (PDL). This logic was originally created to specify and prove properties of computer programs. In particular, PDL uses the modal operators [a] and <a> to denote necessity and possibility, respectively. For SLs, a particular variant based on the original formalism was developed: the PDL for Sign Language (PDLSL). With the PDLSL, body articulators (like the hands or head) are interpreted as independent agents; each articulator has its own set of valid actions and propositions, and executes them without influence from the others. The simultaneous execution of different actions by several articulators yields distinct situations, which can be searched over an LTS with formulae, using the semantic rules of the logic. Together, the use of PDLSL and the proposed specialized data structures could help curb some of the current problems in SL study, notably the heterogeneity of corpora and the lack of automatic annotation aids. In the same vein, this may not only increase the size of the available datasets, but even extend previous results to new corpora; the framework inserts an intermediate representation layer which can serve to model any corpus, regardless of its technical limitations. With this, annotation is possible by defining with formulae the characteristics to annotate. Afterwards, a formal verification algorithm may be able to find those features in corpora, as long as they are represented as consistent LTSs. Finally, the development of the formal framework led to the creation of a semi-automatic annotator based on the presented theoretical principles. Broadly, the system receives an untreated corpus video, converts it automatically into a valid LTS (by way of some predefined rules), and then verifies human-created PDLSL formulae over the LTS. The final product is an automatically generated sub-lexical annotation, which can later be corrected by human annotators for use in other areas such as linguistics.
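For readers unfamiliar with PDL, the two modal operators mentioned above have the usual Kripke semantics (a textbook definition, not a formula specific to the thesis): [a]φ holds in a state when φ holds after every execution of action a, and ⟨a⟩φ when some execution of a leads to a state satisfying φ:

```latex
\mathcal{M},w \models [a]\varphi \iff \forall w'\,\big(w \xrightarrow{a} w' \Rightarrow \mathcal{M},w' \models \varphi\big)
\qquad
\mathcal{M},w \models \langle a\rangle\varphi \iff \exists w'\,\big(w \xrightarrow{a} w' \wedge \mathcal{M},w' \models \varphi\big)
```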
Denoual, Etienne. "Méthodes en caractères pour le traitement automatique des langues." Phd thesis, Université Joseph Fourier (Grenoble), 2006. http://tel.archives-ouvertes.fr/tel-00107056.
This work promotes the use of methods working at the level of the written signal: the character, a unit immediately accessible in any computerized language, makes it possible to dispense with word segmentation, a step that is currently unavoidable for languages such as Chinese or Japanese.
As a first step, we transpose to characters and apply a well-established method for the objective evaluation of machine translation, BLEU.
The encouraging results allow us, as a second step, to tackle other linguistic data processing tasks: first, grammaticality filtering; then, the characterization of the similarity and homogeneity of linguistic resources. In all these tasks, character-level processing obtains acceptable results, comparable to those obtained with words.
As a third step, we tackle tasks of linguistic data production: analogical computation on character strings enables the production of paraphrases as well as machine translation.
This work shows that a complete machine translation system requiring no segmentation can be built, a fortiori to process languages without an orthographic separator.
Jalalzai, Hamid. "Learning from multivariate extremes : theory and application to natural language processing." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT043.
Extremes surround us and appear in a large variety of data. Natural data like those related to environmental sciences contain extreme measurements; in hydrology, for instance, extremes may correspond to floods and heavy rainfalls or, on the contrary, droughts. Data related to human activity can also lead to extreme situations; in the case of bank transactions, the money allocated to a sale may be considerable and exceed common transactions. The analysis of this phenomenon is one of the bases of fraud detection. Another example related to humans is the frequency of encountered words: some words are ubiquitous while others are rare. No matter the context, extremes, which are rare by definition, correspond to uncanny data. These events are of particular concern because of the disastrous impact they may have. Extreme data, however, are less considered in modern statistics and applied machine learning, mainly because they are substantially scarce: these events are outnumbered, in an era of so-called "big data", by the large amount of classical and non-extreme data that corresponds to the bulk of a distribution. Thus, the wide majority of machine learning tools and literature may not be well suited or even performant on the distributional tails where extreme observations occur. Throughout this dissertation, the particular challenges of working with extremes are detailed and methods dedicated to them are proposed. The first part of the thesis is devoted to statistical learning in extreme regions. In Chapter 4, non-asymptotic bounds for the empirical angular measure are studied. Here, a pre-established anomaly detection scheme via minimum-volume sets on the sphere is further improved. Chapter 5 addresses empirical risk minimization for binary classification of extreme samples. The resulting non-parametric analysis and guarantees are detailed. The approach is particularly well suited to treat new samples falling outside the convex envelope of encountered data. This extrapolation property is key to designing new embeddings achieving label-preserving data augmentation. Chapter 6 focuses on the challenge of learning such a heavy-tailed (and, to be precise, regularly varying) representation from a given input distribution. Empirical results show that the designed representation allows better classification performance on extremes and leads to the generation of coherent sentences. Lastly, Chapter 7 analyses the dependence structure of multivariate extremes. By noticing that extremes tend to concentrate on particular clusters where features tend to be recurrently large simultaneously, we define an optimization problem that identifies the aforementioned subgroups through weighted means of features.
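As background, the standard definition of multivariate regular variation from extreme value theory (not a result of the thesis) reads: a random vector X is regularly varying with index α > 0 and angular measure Φ on the unit sphere 𝕊 when, for x > 0 and measurable A ⊆ 𝕊,

```latex
\lim_{t\to\infty}\,
\frac{\mathbb{P}\big(\lVert X\rVert > tx,\ X/\lVert X\rVert \in A\big)}
     {\mathbb{P}\big(\lVert X\rVert > t\big)}
= x^{-\alpha}\,\Phi(A)
```

The angular measure Φ, whose empirical estimation is the object of Chapter 4, encodes the dependence structure of the largest observations.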
Boulanger, Hugo. "Data augmentation and generation for natural language processing." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG019.
More and more fields are looking to automate part of their processes. Automatic language processing provides methods for extracting information from texts. These methods can use machine learning, which requires annotated data to perform information extraction, so applying these methods to new domains requires obtaining annotated data related to the task. In this thesis, our goal is to study generation methods to improve the performance of learned models with low amounts of data. Different generation methods are explored, with or without machine learning, and used to generate the data needed to learn sequence labeling models. The first method explored is pattern filling. This data generation method produces annotated data by combining sentences with slots, or patterns, with mentions. We have shown that this method improves the performance of labeling models with tiny amounts of data. The amount of data needed to use this method is also studied. The second approach tested is the use of language models for text generation alongside a semi-supervised learning method for tagging. The semi-supervised learning method used is tri-training, which is used to add labels to the generated data. Tri-training is tested on several generation methods using different pre-trained language models. We proposed a version of tri-training called generative tri-training, where the generation is not done in advance but during the tri-training process, taking advantage of it. The performance of the models trained during the semi-supervision process and of the models trained on the data generated by it are tested. In most cases, the data produced match the performance of the models trained with the semi-supervision. This method improves performance at all the tested data levels with respect to the models without augmentation. The third avenue of study combines aspects of the previous approaches. For this purpose, different approaches are tested. The use of language models to do sentence replacement in the manner of the pattern-filling generation method is unsuccessful. Using a set of data coming from the different generation methods is tested, and does not outperform the best single method. Finally, applying the pattern-filling method to the data generated with tri-training is tested and does not improve the results obtained with tri-training. While much remains to be studied, we have highlighted both simple methods, such as pattern filling, and more complex ones, such as the use of semi-supervised learning with sentences generated by a language model, to improve the performance of labeling models through the generation of annotated data.
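To make the pattern-filling idea concrete, here is a minimal sketch (the patterns, slot names and lexicons are illustrative, not taken from the thesis): each <slot> in a pattern is replaced by a mention and the corresponding BIO tags are emitted, yielding annotated training data for a sequence labeler.

```python
import random

# Hypothetical patterns and mention lexicons.
PATTERNS = [
    "book a table at <restaurant> for <time>",
    "I would like to eat at <restaurant>",
]
MENTIONS = {
    "restaurant": ["Chez Marie", "Luigi's"],
    "time": ["8 pm", "noon"],
}

def fill(pattern: str) -> list[tuple[str, str]]:
    """Replace each <slot> with a random mention and emit BIO tags."""
    tokens_tags = []
    for token in pattern.split():
        if token.startswith("<") and token.endswith(">"):
            slot = token[1:-1]
            words = random.choice(MENTIONS[slot]).split()
            tags = [f"B-{slot}"] + [f"I-{slot}"] * (len(words) - 1)
            tokens_tags.extend(zip(words, tags))
        else:
            tokens_tags.append((token, "O"))
    return tokens_tags

print(fill(random.choice(PATTERNS)))
```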
Guibon, Gaël. "Recommandation automatique et adaptative d'émojis." Electronic Thesis or Diss., Aix-Marseille, 2019. http://www.theses.fr/2019AIXM0202.
The first emojis were created in 1999. Since then, their popularity has constantly risen in communication systems. Being images representing an idea, a concept, or an emotion, emojis are available to users in multiple software contexts: instant messaging, emails, forums, and other types of social media. Their usage has grown constantly and, with the regular addition of new emojis, there have been more than 2,789 standard emojis since winter 2018. To access a specific emoji, scrolling through huge emoji libraries or using emoji search engines is not enough to maximize their usage and diversity: an emoji recommendation system is required. To answer this need, we present our research work focused on emoji recommendation. The objective is to create an emoji recommender system adapted to a private and informal conversational context. This system must enhance the user experience and the communication quality, and take into account possible newly emerging emojis. Our first contribution is to show the limits of emoji prediction for real usage, and to demonstrate the need for a more global recommendation. We also verify the correlation between the real usage of emojis representing facial expressions and a related theory of facial expressions. We further tackle the evaluation of this system, addressing the limits of the metrics and the importance of a dedicated user interface. The approach is based on supervised and unsupervised machine learning, associated with language models. Several parts of this work were published in national and international conferences, including a best software award and a best poster award in a social media track.
Al, Imam Nahed Hamza. "Traitement automatique du système d'écriture de l'arabe : l'abjad et unicode." Besançon, 2008. http://www.theses.fr/2008BESA1027.
In this thesis, we sought to show the difficulties caused by the religious dimension of the Abjad, the writing system of Arabic, when implementing on computers the writing systems of all the peoples of the world following the Unicode project, and the consequences expected in NLP. First of all, the Abjad is not an alphabet in the European meaning of the word; it is a holy script. This script is also that of a Semitic language, endowed with a particular morphology based upon roots and schemes, and many diacritics which depend upon the pronunciation of different dialects, even if the classical Arabic script present in the Koran is accessible to anyone who can read and write. This script is not disappearing in the modern world: Arabic and the Abjad have succeeded in finding words to describe the concepts of European political thought, and all the ways and means to spread Islam throughout the world. The rare words borrowed from the Occident have to be acclimatized into Arabic, because its morphology of roots and schemes does not match European words. The Abjad is essential to translation and to high-quality information retrieval. The Abjad is not only a cultural problem but also a technical, computing one: the only solution to install Arabic on Microsoft's systems was the manual or external choice of the language, and it was evident that the whole Abjad was not taken into account. Unicode was a revolution, allowing the whole Abjad with its 943 characters to be taken into account, but Unicode requires an encoding scheme. To implement these characters on the machine, we settled on UTF-8, in spite of its drawbacks, in particular for natural language processing. Any attempt to normalise the Abjad is ruled out for religious reasons: the Abjad is the holy script of the Koran, and that prevents it from becoming a modern communication tool for business or trade; the cultural and religious dimensions of the Abjad and the Koran are at the centre of Arab life, more than any other preoccupation, be it the economic development of their countries, their cultural presence on modern communication media, or the implementation of automatic tools to process written information.
Al-Shafi, Bilal. "Traitement informatique des signes diacritiques : pour une application automatique et didactique." Université de Besançon, 1996. http://www.theses.fr/1996BESA1029.
Arnulphy, Béatrice. "Désignations nominales des événements : étude et extraction automatique dans les textes." Phd thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00758062.
Fernandez, Sabido Silvia. "Applications exploratoires des modèles de spins au Traitement Automatique de la Langue." Phd thesis, Université Henri Poincaré - Nancy I, 2009. http://tel.archives-ouvertes.fr/tel-00412369.
Fernández Sabido, Silvia Fidelina. "Applications exploratoires des modèles de spins au traitement automatique de la langue." Supervised by Bertrand Berche and Juan-Manuel Torres Moreno. S.l.: s.n., 2009. http://www.scd.uhp-nancy.fr/docnum/SCD_T_2009_0055_FERNANDEZ-SABIDO.pdf.
Fernández, Sabido Silvia Fidelina. "Applications exploratoires des modèles de spins au traitement automatique de la langue." Thesis, Nancy 1, 2009. http://www.theses.fr/2009NAN10055/document.
In this thesis we explored the ability of magnetic models from statistical physics to extract the essential information contained in texts. Documents are represented as sets of interacting magnetic units; the intensity of these interactions is measured and used to compute quantities that indicate the importance of the information conveyed. We propose two new methods. First, we studied a spin model which allowed us to introduce textual energy. This quantity was used as an indicator of information relevance. Several adaptations were necessary to adapt the energy calculation to a wide range of tasks such as summarisation, information retrieval, document classification and thematic segmentation. Second, and in a more exploratory way, we propose an algorithm that defines a grammatical coupling between types of terms in order to retain the important terms and produce compressions. In this way, the compression of a sentence is the ground state of the chain of terms. As this compression is not necessarily good, it was interesting to produce variants by thermal fluctuations. We ran Metropolis Monte Carlo simulations with the aim of finding the ground state of this system, which is analogous to a spin glass.
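To make the spin-model idea concrete, here is a minimal sketch using a Hopfield-style Hebbian energy (the thesis's exact formulation may differ): encoding each sentence μ as a binary term vector s_μ ∈ {0,1}^V over a vocabulary of size V, one can store the couplings J and read off the interaction energy E_{μν} between sentences:

```latex
J = \sum_{\mu} s_\mu^{\top} s_\mu \in \mathbb{R}^{V\times V},
\qquad
E_{\mu\nu} = -\, s_\mu\, J\, s_\nu^{\top}
```

Sentences whose total interaction energy with the rest of the document is largest are then taken as the most informative, which is the intuition behind using textual energy for summarisation.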
Gaschi, Félix. "Understanding and Evaluating Unsupervised Cross-lingual Embeddings in the General and in the Clinical Domains." Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0347.
Labeled and unlabeled data are more often available in English than in other languages. In the clinical domain, non-English data can be even more scarce. Multilingual word representations can have two properties that help with this situation. The first one is multilingual alignment, where representations from different languages share the same latent space. More concretely, words that are translations of each other must have similar representations, which is useful for cross-lingual information retrieval. The second property is cross-lingual transfer learning: it allows a model to be trained on a supervised task in one language and to provide good results for the same task in another language, without the need for any labeled data in that language. This thesis addresses some gaps in the literature regarding the understanding of multilingual embeddings. In particular, it studies the link between multilingual alignment and cross-lingual transfer, showing that models like mBERT and XLM-R, which can perform cross-lingual transfer, produce representations that have a stronger form of multilingual alignment than word embeddings that were explicitly trained for such alignment. It also finds a high correlation between cross-lingual transfer abilities and multilingual alignment, suggesting that the two multilingual properties are linked. This link makes it possible to improve cross-lingual transfer for smaller models simply by improving alignment, which can allow them to match the performance of larger models, but only for a low-level task like POS tagging, due to the impact of fine-tuning itself on multilingual alignment. While mainly focusing on the general domain, this thesis finally evaluates cross-lingual transfer in the clinical domain. It shows that translation-based methods can achieve performance similar to cross-lingual transfer but require more care in their design. While they can take advantage of monolingual clinical language models, those do not guarantee better results than large general-purpose multilingual models, whether with cross-lingual transfer or translation.
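As an illustration of what measuring multilingual alignment can look like (a minimal sketch; the thesis uses its own alignment measures), one can compute the mean cosine similarity between embeddings of known translation pairs:

```python
import numpy as np

def alignment_score(src_emb: np.ndarray, tgt_emb: np.ndarray) -> float:
    """Mean cosine similarity over aligned translation pairs
    (row i of src_emb translates row i of tgt_emb)."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    return float(np.mean(np.sum(src * tgt, axis=1)))

# Stand-in random vectors; in practice these would be mBERT/XLM-R
# embeddings of a bilingual dictionary or of parallel sentences.
rng = np.random.default_rng(0)
en, fr = rng.normal(size=(100, 768)), rng.normal(size=(100, 768))
print(alignment_score(en, fr))  # near 0 for random, higher when aligned
```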
Knyazeva, Elena. "Apprendre par imitation : applications à quelques problèmes d'apprentissage structuré en traitement des langues." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS134/document.
Structured learning has become ubiquitous in Natural Language Processing; a multitude of applications, such as personal assistants, machine translation and speech recognition, to name just a few, rely on such techniques. The structured learning problems that must now be solved are becoming increasingly more complex and require an increasing amount of information at different linguistic levels (morphological, syntactic, etc.). It is therefore crucial to find the best trade-off between the degree of modelling detail and the exactitude of the inference algorithm. Imitation learning aims to perform approximate learning and inference in order to better exploit richer dependency structures. In this thesis, we explore the use of this specific learning setting, in particular using the SEARN algorithm, both from a theoretical perspective and in terms of the practical applications to Natural Language Processing tasks, especially to complex tasks such as machine translation. Concerning the theoretical aspects, we introduce a unified framework for different imitation learning algorithm families, allowing us to review and simplify the convergence properties of the algorithms. With regards to the more practical application of our work, we use imitation learning first to experiment with free-order sequence labelling and secondly to explore two-step decoding strategies for machine translation.
Samson Juan, Sarah Flora. "Exploiting resources from closely-related languages for automatic speech recognition in low-resource languages from Malaysia." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM061/document.
Languages in Malaysia are dying at an alarming rate. As of today, 15 languages are in danger while two languages are extinct. One of the methods to save languages is to document them, but this is a tedious task when performed manually. An Automatic Speech Recognition (ASR) system could be a tool to help speed up the process of documenting speech from native speakers. However, building ASR systems for a target language requires a large amount of training data, as current state-of-the-art techniques are based on empirical approaches. Hence, there are many challenges in building ASR for languages that have limited data available. The main aim of this thesis is to investigate the effects of using data from closely-related languages to build ASR for low-resource languages in Malaysia. Past studies have shown that cross-lingual and multilingual methods can improve the performance of low-resource ASR. In this thesis, we try to answer several questions concerning these approaches: How do we know which language is beneficial for our low-resource language? How does the relationship between source and target languages influence speech recognition performance? Is pooling language data an optimal approach for a multilingual strategy? Our case study is Iban, an under-resourced language spoken on the island of Borneo. We study the effects of using data from Malay, a dominant local language which is close to Iban, for developing Iban ASR under different resource constraints. We propose several approaches to adapt Malay data to obtain pronunciation and acoustic models for Iban speech. Building a pronunciation dictionary from scratch is time-consuming, as one needs to properly define the sound units of each word in a vocabulary. We developed a semi-supervised approach to quickly build a pronunciation dictionary for Iban, based on bootstrapping techniques for adapting Malay data to match Iban pronunciations. To increase the performance of low-resource acoustic models we explored two acoustic modelling techniques, Subspace Gaussian Mixture Models (SGMM) and Deep Neural Networks (DNN). We performed cross-lingual strategies using both frameworks to adapt out-of-language data to Iban speech. Results show that using Malay data is beneficial for increasing the performance of Iban ASR. We also tested SGMM and DNN to improve low-resource non-native ASR, proposing a fine-merging strategy for obtaining an optimal multi-accent SGMM. In addition, we developed an accent-specific DNN using native speech data. After applying both methods, we obtained significant improvements in ASR accuracy. From our study, we observe that using SGMM and DNN for cross-lingual strategies is effective when training data is very limited.
Muller, Benjamin. "How Can We Make Language Models Better at Handling the Diversity and Variability of Natural Languages ?" Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS399.
Deep learning for NLP has led to impressive empirical progress in recent years. In essence, this progress is based on better contextualized representations that can be easily used for a wide variety of tasks. However, these models usually require substantial computing power and large amounts of raw textual data. This makes language's inherent diversity and variability a vivid challenge in NLP. We focus on the following question: how can we make language models better at handling the variability and diversity of natural languages? First, we explore the generalizability of language models by building and analyzing one of the first large-scale replications of a BERT model for a non-English language. Our results raise the question of using these language models on highly variable domains such as those found online. Focusing on lexical normalization, we show that this task can be approached with BERT-like models, but that they only partially help downstream performance. In consequence, we focus on adaptation techniques using what we refer to as representation transfer, and explore challenging settings such as the zero-shot setting and low-resource languages. We show that multilingual language models can be adapted and used efficiently with low-resource languages, even ones unseen during pretraining, and that the script is a critical component in this adaptation.
Pho, Van-Minh. "Génération automatique de questionnaires à choix multiples pédagogiques : évaluation de l'homogénéité des options." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112192/document.
Recent years have seen a revival of Intelligent Tutoring Systems. In order to make these systems widely usable by teachers and learners, they have to provide means to assist teachers in their task of exercise generation. Among these exercises, multiple-choice tests are very common. However, writing Multiple-Choice Questions (MCQs) that correctly assess a learner's level is a complex task. Guidelines have been developed for writing MCQs manually, but an automatic evaluation of MCQ quality would be a useful tool for teachers. We are interested in the automatic evaluation of distractor (wrong answer choice) quality. To do this, we studied the characteristics of relevant distractors in multiple-choice test writing guidelines. This study led us to assume that homogeneity between distractors and the answer is an important criterion to validate distractors. Homogeneity is both syntactic and semantic. We validated this definition of homogeneity by an MCQ corpus analysis, and we proposed methods for the automatic recognition of syntactic and semantic homogeneity based on this analysis. We then focused our work on distractor semantic homogeneity. To estimate it automatically, we proposed a machine-learned ranking model combining different semantic homogeneity measures. The evaluation of the model showed that our method is more effective than existing work at estimating distractor semantic homogeneity.
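A minimal sketch of one such semantic homogeneity measure (the thesis combines several measures in a learned ranking model; the vectors below are stand-ins):

```python
import numpy as np

def semantic_homogeneity(answer_vec: np.ndarray,
                         distractor_vec: np.ndarray) -> float:
    """Cosine similarity between answer and distractor embeddings."""
    num = float(answer_vec @ distractor_vec)
    den = float(np.linalg.norm(answer_vec) * np.linalg.norm(distractor_vec))
    return num / den

# Stand-in vectors; real inputs would be word/phrase embeddings of the
# answer (e.g. "Paris") and of candidate distractors ("Lyon", "Tuesday").
rng = np.random.default_rng(1)
answer, cand_a, cand_b = rng.normal(size=(3, 50))
print(semantic_homogeneity(answer, cand_a),
      semantic_homogeneity(answer, cand_b))
```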
Thollard, Franck. "Inférence grammaticale probabiliste pour l'apprentissage de la syntaxe en traitement de la langue naturelle." Saint-Etienne, 2000. http://www.theses.fr/2000STET4010.
Rouabhi, Miloud. "Analyse sémantico-cognitive de prépositions en vue d'un traitement automatique." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUL032.
This study aims to unify in a single approach the descriptions given by cognitive semantics and the associated representations studied by formal semantics. Cognitive semantics consists in associating the meanings of the analyzed units with schemes; formal semantics consists in studying the modes of representation of these schemes and their relations to the observables. Our study is based on the general model developed at Paris-Sorbonne University in the LaLIC group, using the GAC (Applicative and Cognitive Grammar) and GRACE (GRammar Applicative Cognitive and Enunciative) models; these two models draw on topology on the one hand and combinatory logic on the other, with a view to the automatic processing of meanings. We have chosen to study the problem of three French prepositions, dans, sous and à, and their equivalents in Arabic. This leads us to search for invariants associated with these three prepositions or relators: the preposition dans refers to the interiority of a place, be it spatial, temporal, spatio-temporal, notional or an activity; the preposition sous refers to a place that is specific, or generated by another place, whose closure is taken; and the preposition à refers to the closure of a place, where the place is a cognitive or abstract one, sufficiently general that, according to the context, it can take more particular values.
Shang, Guokan. "Spoken Language Understanding for Abstractive Meeting Summarization. Unsupervised Abstractive Meeting Summarization with Multi-Sentence Compression and Budgeted Submodular Maximization. Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding. Speaker-change Aware CRF for Dialogue Act Classification." Thesis, Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAX011.
With the impressive progress that has been made in transcribing spoken language, it is becoming increasingly possible to exploit transcribed data for tasks that require comprehension of what is said in a conversation. The work in this dissertation, carried out in the context of a project devoted to the development of a meeting assistant, contributes to ongoing efforts to teach machines to understand multi-party meeting speech. We have focused on the challenge of automatically generating abstractive meeting summaries. We first present our results on Abstractive Meeting Summarization (AMS), which aims to take a meeting transcription as input and produce an abstractive summary as output. We introduce a fully unsupervised framework for this task based on multi-sentence compression and budgeted submodular maximization. We also leverage recent advances in word embeddings and graph degeneracy applied to NLP, to take exterior semantic knowledge into account and to design custom diversity and informativeness measures. Next, we discuss our work on Dialogue Act Classification (DAC), whose goal is to assign each utterance in a discourse a label that represents its communicative intention. DAC yields annotations that are useful for a wide variety of tasks, including AMS. We propose a modified neural Conditional Random Field (CRF) layer that takes into account not only the sequence of utterances in a discourse, but also speaker information and, in particular, whether there has been a change of speaker from one utterance to the next. The third part of the dissertation focuses on Abstractive Community Detection (ACD), a sub-task of AMS, in which utterances in a conversation are grouped according to whether they can be jointly summarized by a common abstractive sentence. We provide a novel approach to ACD in which we first introduce a neural contextual utterance encoder featuring three types of self-attention mechanisms, and then train it using the siamese and triplet energy-based meta-architectures. We further propose a general sampling scheme that enables the triplet architecture to capture subtle patterns (e.g., overlapping and nested clusters).
Wurbel, Nathalie. "Dictionnaires et bases de connaissances : traitement automatique de données dictionnairiques de langue française." Aix-Marseille 3, 1995. http://www.theses.fr/1995AIX30035.
Petitjean, Simon. "Génération modulaire de grammaires formelles." Thesis, Orléans, 2014. http://www.theses.fr/2014ORLE2048/document.
The work presented in this thesis aims at facilitating the development of resources for natural language processing. Resources of this type take different forms, because of the existence of several levels of linguistic description (syntax, morphology, semantics, etc.) and of several formalisms proposed for the description of natural languages at each of these levels. Since the formalisms feature different types of structures, a single description language is not enough: it is necessary to create a domain-specific language (DSL) for every formalism, and to implement a new tool which uses this language, which is a long and complex task. For this reason, we propose in this thesis a method to assemble, in a modular way, development frameworks specific to tasks of linguistic resource generation. The frameworks assembled with our method are based on the fundamental concepts of the XMG (eXtensible MetaGrammar) approach, allowing the generation of tree-based grammars. The method rests on the assembly of a description language from reusable bricks, according to a single specification file; the entire processing chain for the DSL is automatically assembled from the same specification. We first validated this approach by recreating the XMG tool from elementary bricks. Collaborations with linguists also led us to assemble compilers allowing the description of morphology and semantics.
Dutrey, Camille. "Analyse et détection automatique de disfluences dans la parole spontanée conversationnelle." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112415/document.
Extracting information from linguistic data has gained more and more attention in the last decades, in relation to the increasing amount of information that has to be processed daily around the world. Since the 90s, this interest in information extraction has converged with the development of research on speech data. Indeed, speech data involves problems beyond those encountered in written data, in particular due to the many phenomena specific to human speech (e.g. hesitations, corrections, etc.), but also because automatic speech recognition systems applied to the speech signal potentially generate errors. Thus, extracting information from audio data requires taking into account the "noise" inherent in audio data and in the output of automatic systems; it cannot simply be a combination of methods that have proven themselves on written data. Techniques dedicated to speech/audio data processing are therefore mandatory, especially techniques which take into account the specificities of such data in relation to the corresponding signal and transcriptions (manual and automatic). This problem has given birth to a new area of research and raised new scientific challenges related to the management of the variability of speech and its spontaneous modes of expression. Furthermore, robust analysis of phone conversations has been the subject of a large number of works, in whose continuity this thesis stands. More specifically, this thesis focuses on the analysis of edit disfluencies and their realisation in conversational data from EDF call centres, using the speech signal and both manual and automatic transcriptions. This work is linked to numerous domains, from robust analysis of speech data to the analysis and management of aspects related to spoken expression. The aim of the thesis is to propose methods appropriate to speech data in order to improve text mining analyses of speech transcriptions (the treatment of disfluencies). To address these issues, we finely analysed the characteristic phenomena and behaviour of spontaneous speech (disfluencies) in conversational data from EDF call centres, and developed an automatic method for their detection using linguistic, prosodic, discursive and para-linguistic features. The contributions of this thesis are structured in three areas of research. First, we proposed a specification of call centre conversations from the perspective of spontaneous speech and of the phenomena that characterize it. Second, we developed (i) an enrichment chain and effective processing of speech data at several levels of analysis (linguistic, acoustic-prosodic, discursive and para-linguistic); and (ii) a system which automatically detects edit disfluencies, suited to conversational data and based on the speech signal and transcriptions (manual or automatic). Third, from a resource point of view, we produced a corpus of automatic transcriptions of conversations from call centres, annotated with edit disfluencies using a semi-automatic method.
Tafforeau, Jérémie. "Modèle joint pour le traitement automatique de la langue : perspectives au travers des réseaux de neurones." Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0430/document.
NLP researchers have identified different levels of linguistic analysis, leading to a hierarchical division of the various tasks performed in order to analyze a text statement. The traditional approach considers task-specific models which are subsequently arranged in cascade within processing chains (pipelines). This approach has a number of limitations: the empirical selection of model features, the accumulation of errors in the pipeline, and the lack of robustness to domain changes. These limitations lead to particularly high performance losses in the case of non-canonical language with limited data available, such as transcriptions of phone conversations. Disfluencies and speech-specific syntactic schemes, as well as transcription errors from automatic speech recognition systems, lead to a significant drop in performance. It is therefore necessary to develop robust and flexible systems. We propose to perform syntactic and semantic analysis using a multitask deep neural network model, while taking into account the variations of domain and/or language register within the data.
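A minimal sketch of the joint multitask idea (the layer sizes, task heads and BiLSTM encoder below are illustrative choices, not the thesis's exact architecture): a single shared encoder feeds one output head per level of analysis, so that all levels are predicted jointly instead of in cascade.

```python
import torch
import torch.nn as nn

class JointTagger(nn.Module):
    """Shared encoder with one head per linguistic level (hypothetical
    head names and sizes)."""
    def __init__(self, vocab=10_000, dim=128, n_pos=18, n_sem=500):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True,
                               bidirectional=True)
        self.pos_head = nn.Linear(2 * dim, n_pos)  # syntactic level
        self.sem_head = nn.Linear(2 * dim, n_sem)  # semantic level

    def forward(self, tokens):
        hidden, _ = self.encoder(self.embed(tokens))
        return self.pos_head(hidden), self.sem_head(hidden)

model = JointTagger()
pos_logits, sem_logits = model(torch.randint(0, 10_000, (2, 7)))
print(pos_logits.shape, sem_logits.shape)
```

Because the encoder is shared, training signals from every level shape one representation, which is the argument made above against cascaded pipelines.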
Dymetman, Marc. "Transformations de grammaires logiques et réversibilité en traduction automatique." Grenoble 1, 1992. http://www.theses.fr/1992GRE10097.
Mahmoudi, Seyed Mohamm. "Contribution au traitement automatique de la langue persane : analyse et reconnaissance des syntagmes nominaux." Lyon 2, 1994. http://www.theses.fr/1994LYO20070.
The aim of this thesis is the conception and realisation of a morpho-syntactic parser of Persian designed for applications in automatic indexing and computer-assisted instruction or learning of the language (CAI or CAL). One of the chief extensions of this research is the automatic processing of natural language by means of artificial intelligence systems. The main interest of this contribution is the study of the automatic recognition of noun phrases in Persian. Each stage of the parsing is implemented as a program in Prolog (Turbo Prolog). All the lexical data necessary for the categorisation of morpho-syntactic forms are presented as a database.
Chen, Lihu. "Towards efficient, general and robust entity disambiguation systems." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAT017.
Entity disambiguation aims to map mentions in documents to standard entities in a given knowledge base, which is important for various applications such as information extraction, Web search and question answering. Although the field is very vibrant, with many novel works popping up, there are three questions that are underexplored by prior work. 1) Can we use a small model to approach the performance of a big model? 2) How can we develop a single disambiguation system adapted to multiple domains? 3) Are existing systems robust to out-of-vocabulary words and different word orderings? Based on these three questions, we explore how to construct an efficient, general and robust entity disambiguation system. We also successfully apply entity disambiguation to the knowledge base completion task, especially for long-tail entities.
Buet, François. "Modèles neuronaux pour la simplification de parole, application au sous-titrage." Electronic Thesis or Diss., université Paris-Saclay, 2022. https://theses.hal.science/tel-03920729.
In the context of linguistics, simplification is generally defined as the process of reducing the complexity of a text (or speech) while preserving its meaning as much as possible. Its primary application is to make understanding and reading easier for a user. It is regarded, inter alia, as a way to enhance the legibility of texts for deaf and hard-of-hearing people (deafness often causes a delay in reading development), in particular in the case of subtitling. While interlingual subtitles are used to disseminate movies and programs in other languages, intralingual subtitles (or captions) are, together with sign language interpretation, the only means by which the deaf and hard-of-hearing can access audio-visual content. Yet videos have taken a prominent place in society, whether for work, recreation, or education. In order to ensure the equality of people through participation in public and social life, many countries in the world (including France) have implemented legal obligations concerning the subtitling of television programs. ROSETTA (Subtitling RObot and Adapted Translation) is a public-private collaborative research program seeking to develop technological accessibility solutions for audio-visual content in French. This thesis, conducted within the ROSETTA project, aims to study automatic speech simplification with neural models and to apply it in the context of intralingual subtitling for French television programs. Our work mainly focuses on analysing length-control methods, adapting subtitling models to television genres, and evaluating subtitle segmentation. We notably present a new subtitling corpus created from data collected as part of project ROSETTA, as well as a new metric for subtitle evaluation, Sigma.
Sileo, Damien. "Représentations sémantiques et discursives pour la compréhension automatique du langage naturel." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30201.
Computational models for automatic text understanding have gained a lot of interest due to remarkable performance gains over the last few years, some of them leading to super-human scores. This success has reignited grand claims about artificial intelligence, such as universal sentence representation. In this thesis, we question these claims through two complementary angles. Firstly, are neural networks and vector representations expressive enough to process text and perform a wide array of complex tasks? We present currently used computational neural models and their training techniques. We propose a criterion for expressive compositions and show that a popular evaluation suite and sentence encoders (SentEval/InferSent) have an expressivity bottleneck; minor changes can yield new compositions that are expressive and insightful, but might not be sufficient, which may justify the paradigm shift towards newer Transformer-based models. Secondly, we discuss the question of universality in sentence representation: what actually lies behind these universality claims? We delineate a few theories of meaning, and in a subsequent part of the thesis we argue that semantics (unsituated, literal content), as opposed to pragmatics (meaning as use), is preponderant in the current training and evaluation data of natural language understanding models. To alleviate that problem, we show that discourse marker prediction (classification of hidden discourse markers between sentences) can be seen as a pragmatics-centered training signal for text understanding. We build a new discourse marker prediction dataset that yields significantly better results than previous work. In addition, we propose a new discourse-based evaluation suite that could incentivize researchers to take pragmatic considerations into account when evaluating text understanding models.
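To make the discourse-marker training signal concrete, here is a simplified sketch of how such examples can be mined from raw text: the marker linking two adjacent sentences is removed and becomes the classification label. The marker list and the matching heuristic are illustrative simplifications, not the dataset construction procedure of the thesis.

```python
# Mine (sentence1, sentence2, marker) triples from consecutive sentences:
# the hidden marker becomes the label for a discourse-prediction task.
# The marker inventory and prefix heuristic are deliberately minimal.

MARKERS = ["however", "therefore", "for example", "meanwhile"]

def extract_pairs(sentences):
    examples = []
    for s1, s2 in zip(sentences, sentences[1:]):
        for m in MARKERS:
            prefix = m + ", "
            if s2.lower().startswith(prefix):
                examples.append((s1, s2[len(prefix):], m))
    return examples

sents = ["It rained all day.", "However, the match was played."]
print(extract_pairs(sents))
# [('It rained all day.', 'the match was played.', 'however')]
```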
Hassoun, Mohamed. "Conception d'un dictionnaire pour le traitement automatique de l'arabe dans différents contextes d'application." Lyon 1, 1987. http://www.theses.fr/1987LYO10035.
Balicco, Laurence. "Génération de répliques en français dans une interface homme-machine en langue naturelle." Grenoble 2, 1993. http://www.theses.fr/1993GRE21025.
This research takes place in the context of natural language generation. This field was neglected for a long time because it seemed a much easier phase than analysis. The thesis corresponds to a first work on generation in the CRISS team and places the problem of generation in the context of a man-machine dialogue in natural language. Some of its consequences are: generation from a logical content to be translated into natural language, keeping this translation as close as possible to the original content, etc. After studying the existing work, we decided to create our own generation system, reusing where possible the tools elaborated during the analysis process. This generation process is based on a linguistic model which uses syntactic and morphological information and in which linguistic transformations called operations are defined (coordination, anaphorisation, thematisation, etc.). These operations can be given by the dialogue or calculated during the generation process. The model allows the creation of several versions of the same utterance and therefore better adaptation to different users. This thesis presents the works studied, essentially on the French and English languages, the linguistic model developed, the computational model used, and a brief presentation of a European project which offers a possible application of our work.
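As a toy illustration of one of the operations mentioned above, the sketch below applies coordination to two clauses sharing a subject; the clause representation and the surface rule are deliberately minimal and are not the thesis's linguistic model.

```python
# Toy coordination operation: two (subject, predicate) clauses are
# merged, factoring out the subject when the clauses share it.

def coordinate(clause1, clause2, conj="et"):
    """Merge two (subject, predicate) clauses into one coordinated clause."""
    subj1, pred1 = clause1
    subj2, pred2 = clause2
    if subj1 == subj2:
        return f"{subj1} {pred1} {conj} {pred2}"
    return f"{subj1} {pred1} {conj} {subj2} {pred2}"

print(coordinate(("Marie", "lit"), ("Marie", "écrit")))
# Marie lit et écrit
```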
Kla, Régis. "Osmose : a natural language based object oriented approach with its CASE tool." Paris 1, 2004. http://www.theses.fr/2004PA010020.
Gianola, Lucie. "Aspects textuels de la procédure judiciaire exploitée en analyse criminelle et perspectives pour son traitement automatique." Thesis, CY Cergy Paris Université, 2020. http://www.theses.fr/2020CYUN1065.
Criminal analysis is a discipline that supports investigations within the National Gendarmerie. It is based on the use of the documents compiled in the judicial procedure file (witness interviews, search warrants, expert reports, phone and bank data, etc.) to synthesize the information collected and to propose a new understanding of the facts under examination. While criminal analysis uses data visualization software (i.e. IBM Analyst’s Notebook) to display the hypotheses formulated, the digital and textual management of the file documents is entirely manual. However, criminal analysis relies on entities to formalize its practice. The presentation of the research context details the practice of criminal analysis as well as the constitution of judicial procedure files as textual corpora. We then propose perspectives for the adaptation of natural language processing (NLP) and information extraction methods to the case study, including a comparison of the concept of entity in criminal analysis with that of named entity in NLP. This comparison is carried out at the conceptual and linguistic levels. A first approach to the detection of entities in witness interviews is presented. Finally, since textual genre is a parameter to take into account when applying automatic processing to text, we develop a structuring of the « legal » textual genre into discourse, genres, and sub-genres through a textometric study aimed at characterizing different types of texts (including witness interviews) produced by the field of justice.
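As a first, deliberately naive illustration of entity detection in procedure files, the sketch below runs pattern-based detection of French-style phone numbers and dates over a sentence; real witness interviews would of course require trained named-entity models, and these patterns are illustrative only.

```python
# Pattern-based first pass at detecting two entity types that matter
# in procedure files: phone numbers and dates. Illustrative patterns.

import re

PATTERNS = {
    "PHONE": re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def detect_entities(text):
    """Return (label, surface form, span) for every pattern match."""
    return [(label, m.group(), m.span())
            for label, pat in PATTERNS.items()
            for m in pat.finditer(text)]

txt = "Le témoin a appelé le 06 12 34 56 78 le 03/05/2019."
print(detect_entities(txt))
```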
Gonzalez, Preciado Matilde. "Computer vision methods for unconstrained gesture recognition in the context of sign language annotation." Toulouse 3, 2012. http://thesesups.ups-tlse.fr/1798/.
This PhD thesis concerns the study of computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists of a sequence of signs performed one after another, involving manual and non-manual features that convey simultaneous information. Even though standard signs are defined in dictionaries, we find a huge variability caused by the context-dependency of signs. In addition, signs are often linked by movement epenthesis, the meaningless transitional gesture between signs. This huge variability and the co-articulation effect represent a challenging problem for automatic SL processing. Numerous annotated video corpora are necessary in order to train statistical machine translators and study this language. Generally, the annotation of SL video corpora is performed manually by linguists or computer scientists experienced in SL. However, manual annotation is error-prone, unreproducible and time-consuming. In addition, the quality of the results depends on the SL annotators' knowledge. Associating annotator knowledge with image processing techniques facilitates the annotation task, increasing robustness and reducing the time required. The goal of this research is the study and development of image processing techniques to assist the annotation of SL video corpora: body tracking, hand segmentation, temporal segmentation, gloss recognition. Throughout this PhD thesis we address the problem of gloss annotation of SL video corpora. First of all, we intend to detect the limits corresponding to the beginning and end of a sign. This annotation method requires several low-level approaches for performing temporal segmentation and for extracting motion and hand shape features. First we propose a particle-filter-based approach for tracking the hands and face robustly under occlusions. Then a segmentation method for extracting the hand when it is in front of the face has been developed. Motion is used for segmenting signs, and hand shape is later used to improve the results; indeed, hand shape allows us to discard limits detected in the middle of a sign. Once signs have been segmented, we proceed to gloss recognition using lexical descriptions of signs. We have evaluated our algorithms on international corpora in order to show their advantages and limitations. The evaluation has shown the robustness of the proposed methods with respect to high dynamics and numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a gain in annotation consistency.
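As an illustration of the motion-based temporal segmentation step, the sketch below treats sustained low hand-motion energy as a boundary between signs; the energy values and thresholds are invented, and the thesis combines this kind of cue with hand-shape information to discard false boundaries.

```python
# Threshold-based temporal segmentation of a hand-motion energy signal:
# a run of low-energy frames (a pause) closes the current sign segment.
# Signal and thresholds are illustrative.

def segment_signs(motion, threshold=0.2, min_pause=2):
    """Return (start, end) frame spans of detected signs."""
    segments, start, pause = [], None, 0
    for i, m in enumerate(motion):
        if m >= threshold:
            if start is None:
                start = i
            pause = 0
        elif start is not None:
            pause += 1
            if pause >= min_pause:
                segments.append((start, i - pause + 1))
                start, pause = None, 0
    if start is not None:
        segments.append((start, len(motion) - pause))
    return segments

energy = [0.1, 0.5, 0.6, 0.1, 0.1, 0.7, 0.8, 0.1]
print(segment_signs(energy))  # [(1, 3), (5, 7)]
```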
Blanvillain, Odile. "Représentation formelle de l'emploi de should en anglais contemporain, en vue d'un traitement automatique." Paris 7, 1993. http://www.theses.fr/1993PA070074.
The aim of this thesis is to propose a formal (metalinguistic) representation of the different uses of should in modern English, within an utterer-centered approach, and to examine how this study can contribute to a possible automatic processing of the modal in context. It also aims to bring out the particular status of should in comparison with shall, must or ought to, for example, and to understand the interplay of epistemic and root modalities that allows the "ambiguous" uses of should. It appears that the difference between should and shall, must or ought to is essentially qualitative. Moreover, the different possible ambiguities in the interpretation of should can be explained, in each case, by the specific characteristics of its functioning (positive valuation, particularities of its deontic origin, etc.). Concerning the contribution to a possible automatic processing, bringing out these characteristics can be important for automatic text comprehension. From a syntactic point of view, however, it appears extremely difficult to determine the value of the modal automatically.
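As a toy illustration of why this is hard (and emphatically not the thesis's model), the sketch below keys the reading of should on a single shallow syntactic cue, the perfect aspect; such a classifier captures at best a fraction of the ambiguities analysed above.

```python
# Naive one-cue classifier for "should": tags "should have V-en/-ed" as
# epistemic/counterfactual-leaning, everything else as root (deontic).
# Purely illustrative; the labels are guesses, not the thesis's analysis.

import re

def naive_should_reading(clause):
    if re.search(r"\bshould\s+have\s+\w+(ed|en)\b", clause.lower()):
        return "epistemic/counterfactual (guess)"
    return "root/deontic (guess)"

print(naive_should_reading("You should have called her"))
print(naive_should_reading("You should call her"))
```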
Filhol, Michael. "Modèle descriptif des signes pour un traitement automatique des langues des signes." Phd thesis, Université Paris Sud - Paris XI, 2008. http://tel.archives-ouvertes.fr/tel-00300591.
Eyango, Mouen Alexis. "Lexique interactif pour l'analyse automatique du français." Lyon 1, 1986. http://www.theses.fr/1986LYO11712.