Academic literature on the topic "Extraction d'informations multilingues"
Consult the thematic lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Extraction d'informations multilingues".
Theses on the topic "Extraction d'informations multilingues"
Yeh, Hui-Syuan. "Prompt-based Relation Extraction for Pharmacovigilance". Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG097.
Extracting and maintaining up-to-date knowledge from diverse linguistic sources is imperative for the benefit of public health. While professional sources, including scientific journals and clinical notes, provide the most reliable knowledge, observations reported in patient forums and social media can bring complementary information for certain themes. Spotting entities and their relationships in these varied sources is particularly valuable. We focus on relation extraction in the medical domain. At the outset, we highlight the inconsistent terminology in the community and clarify the diverse setups used to build and evaluate relation extraction systems. To obtain reliable comparisons, we compare systems using the same setup. Additionally, we conduct a series of stratified evaluations to further investigate which data properties affect the models' performance. We show that model performance tends to decrease with relation density, relation diversity, and entity distance. Subsequently, this work explores a new training paradigm for biomedical relation extraction: prompt-based methods with masked language models. In this context, performance depends on the quality of prompt design. This requires manual effort and domain knowledge, especially when designing the label words that link model predictions to relation classes. To overcome this overhead, we introduce an automated label word generation technique leveraging a dependency parser and training data. This approach minimizes manual intervention and enhances model performance with fewer parameters to be fine-tuned. Our approach performs on par with other verbalizer methods without additional training. Then, this work addresses information extraction from text written by laypeople about adverse drug reactions. To this end, as part of a joint effort, we have curated a tri-lingual corpus in German, French, and Japanese collected from patient forums and social media platforms.
The challenges and potential applications of the corpus are discussed. We present baseline experiments on the corpus that highlight three points: the effectiveness of a multilingual model in the cross-lingual setting, the preparation of negative samples for relation extraction by considering co-reference and the distance between entities, and methods to address the highly imbalanced distribution of relations. Lastly, we integrate information from a medical knowledge base into the prompt-based approach with autoregressive language models for biomedical relation extraction. Our goal is to use external factual knowledge to enrich the context of the entities involved in the relation to be classified. We find that general models particularly benefit from external knowledge. Our experimental setup reveals that different entity markers are effective across different corpora. We show that relevant knowledge helps, though the format of the prompt has a greater impact on performance than the additional information itself.
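The prompt-based paradigm with a verbalizer that the abstract describes can be sketched as follows. The template, label words, and relation classes below are invented for illustration, and the toy scorer is a stand-in for a real masked language model (e.g. a BERT-style model scoring the `[MASK]` position):

```python
# Minimal sketch of prompt-based relation classification with a verbalizer.
# All label words and relation classes are hypothetical examples.

TEMPLATE = "{head} [MASK] {tail}."

# Verbalizer: maps label words (predicted at [MASK]) to relation classes.
VERBALIZER = {
    "causes": "ADVERSE_EFFECT",
    "treats": "TREATMENT",
    "contains": "INGREDIENT",
}

def toy_mask_scores(prompt, candidates):
    """Stand-in for masked-LM scoring: prefers a candidate word already
    hinted at in the prompt, otherwise scores uniformly."""
    return {w: (2.0 if w in prompt else 1.0) for w in candidates}

def classify_relation(head, tail, context=""):
    # Build the prompt, score each label word at [MASK], and map the
    # best-scoring word back to its relation class via the verbalizer.
    prompt = context + " " + TEMPLATE.format(head=head, tail=tail)
    scores = toy_mask_scores(prompt, VERBALIZER.keys())
    best_word = max(scores, key=scores.get)
    return VERBALIZER[best_word]

print(classify_relation("aspirin", "nausea",
                        context="The drug causes discomfort."))  # → ADVERSE_EFFECT
```

Automating the choice of label words (here hand-written in `VERBALIZER`) is precisely the manual effort the thesis's generation technique targets.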
Hanoka-Maitenaz, Valérie. "Extraction et complétion de terminologies multilingues". Sorbonne Paris Cité, 2015. https://hal.science/tel-01257201.
This work focuses on the analysis of verbatim responses produced in the context of employee surveys carried out within multinational companies and processed by the Verbatim Analysis - VERA company. It involves the design and development of a processing pipeline for automatically extracting terminologies in a virtually language-independent, register-independent and domain-independent way.
Nguyen, Tuan Dang. "Extraction d'information à partir de documents Web multilingues : une approche d'analyses structurelles". PhD thesis, Université de Caen, 2006. http://tel.archives-ouvertes.fr/tel-00258948. Also catalogued at http://www.theses.fr/2006CAEN2023.
Multilingual Web Document (MWD) processing has become one of the major interests of research and development in the area of information retrieval. Yet the structure of multilingual resources has not been sufficiently explored in most research in this area. We consider that the link structure embeds crucial information for both hyperdocument retrieval and mining. Discarding multilingual information structures can degrade processing performance and generate various problems: i) redundancy, if the site simultaneously proposes translations in several languages; ii) noisy information, from the labels used to switch from one language to another; iii) loss of information, if the process does not consider the structural specificity of each language. In this context, each Web site is considered as a hyperdocument containing a set of Web documents (pages, screens, messages) that can be explored through link paths, so the dominant languages of a Web site can be detected in different ways. The framework of this experimental thesis is structural analysis for information extraction from a great number of heterogeneous structured or semi-structured electronic documents (essentially Web documents). It covers the following aspects: enumerating the dominant languages; setting up (virtual) frontiers between those languages to enable further processing; and recognizing the dominant languages. To experiment with and validate our approach we have developed Hyperling, a formal, language-independent system dealing with Web documents. Hyperling proposes a multilingual structural-analysis approach to cluster and retrieve Web documents. Hyperling's fundamental hypothesis is based on the notion of relation density: monolingual relation density, i.e. links between Web documents written in the same language, and interlingual relation density, i.e. links between Web documents written in different languages. In a Web document representation we can encounter a high level of monolingual relation density and a low level of interlingual relation density, so a MWD can be represented by a set of clusters, where each sufficiently dense cluster may represent a dominant language. This hypothesis has been the core of Hyperling and has been experimented with and validated on real multilingual Web documents (IMF, UNDP, UNFPA, UNICEF, WTO).
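The relation-density hypothesis can be illustrated with a toy sketch. The page identifiers and link graph below are hypothetical, and keeping only triangle-closing links is one simple stand-in for the density criterion, not Hyperling's actual algorithm:

```python
# Toy illustration of the relation-density hypothesis: pages linking
# densely to each other form a cluster; each dense cluster is taken as
# one dominant language. Sparse interlingual links are filtered out.

from collections import defaultdict

links = [  # undirected hyperlinks between hypothetical page ids
    ("fr1", "fr2"), ("fr2", "fr3"), ("fr1", "fr3"),  # dense cluster
    ("en1", "en2"), ("en2", "en3"), ("en1", "en3"),  # dense cluster
    ("fr1", "en1"),                                  # sparse interlingual link
]

def dominant_language_clusters(edges):
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    # Keep only "dense" edges: those closing a triangle (both endpoints
    # share a neighbour). Isolated interlingual links rarely do.
    dense = defaultdict(set)
    for a, b in edges:
        if nbrs[a] & nbrs[b]:
            dense[a].add(b)
            dense[b].add(a)
    # Connected components over the dense edges = candidate languages.
    seen, clusters = set(), []
    for node in nbrs:
        if node in seen or node not in dense:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(dense[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

clusters = dominant_language_clusters(links)
print(len(clusters))  # → 2: one cluster per dominant language
```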
Charton, Eric. "Génération de phrases multilingues par apprentissage automatique de modèles de phrases". PhD thesis, Université d'Avignon, 2010. http://tel.archives-ouvertes.fr/tel-00622561.
Rouquet, David. "Multilinguisation d'ontologies dans le cadre de la recherche d'information translingue dans des collections d'images accompagnées de textes spontanés". PhD thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00743652.
Korenchuk, Yuliya. "Méthode d'enrichissement et d'élargissement d'une ontologie à partir de corpus de spécialité multilingues". Thesis, Strasbourg, 2017. http://www.theses.fr/2017STRAC014/document.
This thesis proposes a method for the enrichment and population of an ontology, a structure of concepts linked by semantic relations, with terms in French, English and German drawn from comparable domain-specific corpora. Our main contribution is the development of extraction methods based on endogenous resources, learned from the corpus and the ontology being analyzed. Based on character n-grams, these resources are available and independent of any particular language or domain. The first contribution concerns the use of endogenous morphological and morphosyntactic resources for the extraction of mono- and polylexical terms from the corpus. The second contribution aims to use endogenous resources to identify translations for these terms. The third contribution concerns the construction of endogenous morphological families designed to enrich and populate the ontology.
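The idea of an endogenous, language-independent resource built from character n-grams can be sketched as follows. The corpus, n-gram length, and scoring rule are illustrative assumptions, not the thesis's exact method:

```python
# Sketch of an "endogenous resource": character n-gram frequencies learned
# from the corpus itself, then used to rank candidate terms with no
# language-specific tooling. Corpus words below are invented examples.

from collections import Counter

def char_ngrams(word, n=3):
    # Pad with boundary markers so word-initial/final grams are captured.
    padded = f"_{word}_"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_profile(corpus_words, n=3):
    # The endogenous resource: n-gram counts over the analyzed corpus.
    profile = Counter()
    for w in corpus_words:
        profile.update(char_ngrams(w.lower(), n))
    return profile

def term_score(word, profile, n=3):
    # Average corpus frequency of the word's n-grams: in-domain candidates
    # share frequent grams, out-of-domain words score near zero.
    grams = char_ngrams(word.lower(), n)
    return sum(profile[g] for g in grams) / len(grams)

corpus = ["ontology", "ontologies", "ontological", "graph", "semantic"]
profile = build_profile(corpus)
print(term_score("ontologie", profile) > term_score("xylophone", profile))  # → True
```

Because the profile is learned only from characters, the same code applies unchanged to French, English, or German corpora, which is the point of the endogenous approach.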
Doucet, Antoine. "Extraction, Exploitation and Evaluation of Document-based Knowledge". Habilitation à diriger des recherches, Université de Caen, 2012. http://tel.archives-ouvertes.fr/tel-01070505.
Schleider, Thomas. "Knowledge Modeling and Multilingual Information Extraction for the Understanding of the Cultural Heritage of Silk". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS280.
Modeling any type of human knowledge is a complex effort that must consider all the specificities of its domain, including niche vocabulary. This thesis undertakes such an endeavour for knowledge about European silk object production, which can be considered obscure and therefore endangered. The fact that such cultural heritage data is heterogeneous, spread across many museums worldwide, sparse, and multilingual poses particular challenges, for which knowledge graphs have become more and more popular in recent years. Our main goal is not only to investigate knowledge representations, but also to examine how such an integration process can be accompanied by enrichments, such as information reconciliation through ontologies and vocabularies, as well as metadata predictions to fill gaps in the data. We first propose a workflow for the integration of data about silk artifacts, and then present different classification approaches, with a special focus on unsupervised and zero-shot methods. Finally, we study ways of making the subsequent exploration of such metadata and images as easy as possible.
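The zero-shot metadata prediction mentioned above can be illustrated with a minimal sketch. The labels, descriptions, and bag-of-words similarity below are invented stand-ins for the multilingual sentence embeddings a real system would use:

```python
# Toy zero-shot classification: match an artifact description against
# textual label descriptions, with no training examples per label.
# Labels and texts are hypothetical; real systems embed both with a
# multilingual encoder instead of counting raw words.

import math
from collections import Counter

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

LABELS = {  # hypothetical metadata values with textual descriptions
    "weaving": "fabric produced on a loom by interlacing warp and weft threads",
    "embroidery": "decoration stitched onto fabric with needle and thread",
}

def zero_shot(description):
    # Pick the label whose description is most similar to the input text.
    vec = bow(description)
    return max(LABELS, key=lambda label: cosine(vec, bow(LABELS[label])))

print(zero_shot("silk panel with floral motifs stitched with a needle"))  # → embroidery
```

Since only label descriptions are needed, new metadata values can be added without retraining, which is what makes the zero-shot setting attractive for sparse museum records.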
Guénec, Nadège. "Méthodologies pour la création de connaissances relatives au marché chinois dans une démarche d'Intelligence Économique : application dans le domaine des biotechnologies agricoles". PhD thesis, Université Paris-Est, 2009. http://tel.archives-ouvertes.fr/tel-00554743.