Academic literature on the topic "Multilingual information extraction"

Author: Grafiati

Published: March 1, 2025

Browse the thematic lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Multilingual information extraction".

Journal articles on the topic "Multilingual information extraction"

Claro, Daniela Barreiro, Marlo Souza, Clarissa Castellã Xavier, and Leandro Oliveira. "Multilingual Open Information Extraction: Challenges and Opportunities". Information 10, no. 7 (July 2, 2019): 228. http://dx.doi.org/10.3390/info10070228.

Abstract

The number of documents published on the Web in languages other than English grows every year. As a consequence, the need to extract useful information from different languages increases, highlighting the importance of research into Open Information Extraction (OIE) techniques. Different OIE methods have dealt with features from a unique language; however, few approaches tackle multilingual aspects. In those approaches, multilingualism is restricted to processing text in different languages, rather than exploring cross-linguistic resources, which results in low precision due to the use of general rules. Multilingual methods have been applied to numerous problems in Natural Language Processing, achieving satisfactory results and demonstrating that knowledge acquisition for a language can be transferred to other languages to improve the quality of the facts extracted. We argue that a multilingual approach can enhance OIE methods as it is ideal to evaluate and compare OIE systems, and therefore can be applied to the collected facts. In this work, we discuss how the transfer knowledge between languages can increase acquisition from multilingual approaches. We provide a roadmap of the Multilingual Open IE area concerning state of the art studies. Additionally, we evaluate the transfer of knowledge to improve the quality of the facts extracted in each language. Moreover, we discuss the importance of a parallel corpus to evaluate and compare multilingual systems.

Khairova, Nina, Orken Mamyrbayev, Kuralay Mukhsina, Anastasiia Kolesnyk, and Saurabh Pratap. "Logical-linguistic model for multilingual Open Information Extraction". Cogent Engineering 7, no. 1 (January 1, 2020): 1714829. http://dx.doi.org/10.1080/23311916.2020.1714829.

Hashemzahde, Bahare, and Majid Abdolrazzagh-Nezhad. "Improving keyword extraction in multilingual texts". International Journal of Electrical and Computer Engineering (IJECE) 10, no. 6 (December 1, 2020): 5909. http://dx.doi.org/10.11591/ijece.v10i6.pp5909-5916.

Abstract

The accuracy of keyword extraction is a leading factor in information retrieval systems and marketing. In the real world, text is produced in a variety of languages, and the ability to extract keywords based on information from different languages improves the accuracy of keyword extraction. In this paper, the available information of all languages is applied to improve a traditional keyword extraction algorithm for multilingual text. The proposed keyword extraction procedure is an unsupervised algorithm designed to select a word as a keyword of a given text if, in addition to ranking high in that language, it also ranks high according to the keyword criteria in the other languages. To achieve this aim, the average TF-IDF of the candidate words was calculated for the same and the other languages. Then the words with the higher average TF-IDF were chosen as the extracted keywords. The obtained results indicate that the accuracies on multilingual texts of the term frequency-inverse document frequency (TF-IDF) algorithm, the graph-based algorithm, and the improved proposed algorithm are 80%, 60.65%, and 91.3%, respectively.
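
The averaging idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the smoothed IDF formula, the function names, and the `concept_of` dictionary that links words across languages are all assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, corpus):
    """TF-IDF of each token in doc_tokens against a list of tokenized documents."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        tf = count / len(doc_tokens)
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (an assumption)
        scores[word] = tf * idf
    return scores


def multilingual_keywords(docs, corpora, concept_of, top_k=3):
    """Rank concepts by their TF-IDF averaged over all language versions of a text.

    docs:       {lang: tokens of the text in that language}
    corpora:    {lang: list of tokenized documents used for document frequencies}
    concept_of: hypothetical alignment {(lang, word): concept_id}
    """
    per_concept = {}
    for lang, tokens in docs.items():
        for word, score in tf_idf(tokens, corpora[lang]).items():
            concept = concept_of.get((lang, word))
            if concept is not None:
                per_concept.setdefault(concept, []).append(score)
    # Dividing by the number of languages penalizes concepts missing in a language.
    n_langs = len(docs)
    averaged = {c: sum(s) / n_langs for c, s in per_concept.items()}
    return sorted(averaged, key=averaged.get, reverse=True)[:top_k]
```

A word that scores well in only one language is pulled down by the average, which is the intuition the abstract describes; how the paper actually aligns words across languages is not stated there.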

Vasilkovsky, Michael, Anton Alekseev, Valentin Malykh, Ilya Shenbin, Elena Tutubalina, Dmitriy Salikhov, Mikhail Stepnov, Andrey Chertok, and Sergey Nikolenko. "DetIE: Multilingual Open Information Extraction Inspired by Object Detection". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11412–20. http://dx.doi.org/10.1609/aaai.v36i10.21393.

Abstract

State of the art neural methods for open information extraction (OpenIE) usually extract triplets (or tuples) iteratively in an autoregressive or predicate-based manner in order not to produce duplicates. In this work, we propose a different approach to the problem that can be equally or more successful. Namely, we present a novel single-pass method for OpenIE inspired by object detection algorithms from computer vision. We use an order-agnostic loss based on bipartite matching that forces unique predictions and a Transformer-based encoder-only architecture for sequence labeling. The proposed approach is faster and shows superior or similar performance in comparison with state of the art models on standard benchmarks in terms of both quality metrics and inference time. Our model sets the new state of the art performance of 67.7% F1 on CaRB evaluated as OIE2016 while being 3.35x faster at inference than previous state of the art. We also evaluate the multilingual version of our model in the zero-shot setting for two languages and introduce a strategy for generating synthetic multilingual data to fine-tune the model for each specific language. In this setting, we show performance improvement of 15% on multilingual Re-OIE2016, reaching 75% F1 for both Portuguese and Spanish languages. Code and models are available at https://github.com/sberbank-ai/DetIE.
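
The abstract does not spell out the loss, so the following is only a toy illustration of what "order-agnostic loss based on bipartite matching" means in general: the loss is the cheapest one-to-one assignment of predicted extractions to gold extractions, so the order in which the model emits them does not matter. The function names and the per-token mismatch cost are assumptions, and the brute-force search over permutations stands in for the Hungarian algorithm that a practical system would more likely use.

```python
from itertools import permutations


def token_mismatch(pred, gold):
    """Hypothetical pair cost: number of positions where two label sequences differ."""
    return sum(p != g for p, g in zip(pred, gold))


def order_agnostic_loss(predictions, targets, pair_cost=token_mismatch):
    """Minimum total cost over all one-to-one assignments of predictions to targets.

    Returns (cost, assignment), where assignment[i] is the index of the target
    matched to predictions[i]. Brute force is factorial in the number of
    extractions, so this is only suitable for tiny examples.
    """
    if len(predictions) != len(targets):
        raise ValueError("this sketch assumes equally many predictions and targets")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(targets))):
        cost = sum(pair_cost(predictions[i], targets[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm
```

Because the minimum is taken over assignments, two predictions that swap places incur no penalty, while two predictions that both copy the same gold extraction cannot both be matched to it.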

Ghimire, Dadhi Ram, Sanjeev Panday, and Aman Shakya. "Information Extraction from a Large Knowledge Graph in the Nepali Language". National College of Computer Studies Research Journal 3, no. 1 (December 9, 2024): 33–49. https://doi.org/10.3126/nccsrj.v3i1.72336.

Abstract

Information is abundant on the web. A knowledge graph organizes information in a structured format that can be retrieved using specialized queries. There are many knowledge graphs, but they differ in their ontologies and taxonomies as well as in the property types that bind the relations between entities, which creates problems when extracting knowledge from them. Multilingual support is also an issue: while most of them claim to be multilingual, they are more suitable for querying in the English language. Most existing knowledge graphs are based on the Wikipedia infobox. In this work, we have devised an information extraction pipeline for retrieving knowledge in the Nepali language from Wikidata using its SPARQL endpoint. Queries based on the Wikipedia infobox return more accurate responses than queries based on the paragraph content of Wikipedia articles. The main reason is that the information inside the paragraphs is not properly linked in the Wikipedia infobox.
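
As a rough idea of what querying Wikidata for Nepali content looks like, here is a minimal sketch that is not taken from the paper. It assumes the public endpoint `https://query.wikidata.org/sparql` and the language code `ne` for Nepali; the helper names and the `User-Agent` string are made up for the example.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_label_query(entity_id, lang="ne"):
    """Build a SPARQL query for the label of one Wikidata entity in a given language."""
    return (
        "SELECT ?label WHERE {\n"
        f"  wd:{entity_id} rdfs:label ?label .\n"
        f'  FILTER(LANG(?label) = "{lang}")\n'
        "}"
    )


def run_query(query):
    """Send the query to the endpoint and return the JSON result bindings.

    Requires network access; error handling is omitted in this sketch.
    """
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        WIKIDATA_ENDPOINT + "?" + params,
        headers={"User-Agent": "example-sketch/0.1"},  # placeholder identifier
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(build_label_query(some_id))` with a valid entity ID would return the Nepali label if one exists; an empty list signals the kind of coverage gap the abstract mentions.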

Azzam, Saliha, Kevin Humphreys, Robert Gaizauskas, and Yorick Wilks. "Using a language independent domain model for multilingual information extraction". Applied Artificial Intelligence 13, no. 7 (October 1999): 705–24. http://dx.doi.org/10.1080/088395199117252.

Seretan, Violeta, and Eric Wehrli. "Multilingual collocation extraction with a syntactic parser". Language Resources and Evaluation 43, no. 1 (October 1, 2008): 71–85. http://dx.doi.org/10.1007/s10579-008-9075-7.

Zhang, Ruijuan. "Multilingual pretrained based multi-feature fusion model for English text classification". Computer Science and Information Systems, no. 00 (2025): 4. https://doi.org/10.2298/csis240630004z.

Abstract

Deep learning methods have been widely applied to English text classification tasks in recent years, achieving strong performance. However, current methods face two significant challenges: (1) they struggle to effectively capture long-range contextual structure information within text sequences, and (2) they do not adequately integrate linguistic knowledge into representations for enhancing the performance of classifiers. To this end, a novel multilingual pre-training based multi-feature fusion method is proposed for English text classification (MFFMP-ETC). Specifically, MFFMP-ETC consists of multilingual feature extraction, multilevel structure learning, and multi-view representation fusion. MFFMP-ETC uses Multilingual BERT as a deep semantic extractor to introduce language information into representation learning, which makes the text representations significantly more robust. Then, MFFMP-ETC integrates Bi-LSTM and TextCNN into the multilingual pre-training architecture to capture global and local structure information of English texts by modelling bidirectional contextual semantic dependencies and multi-granularity local semantic dependencies. Meanwhile, MFFMP-ETC devises multi-view representation fusion within the invariant semantic learning of representations to aggregate consistent and complementary information among views. MFFMP-ETC synergistically integrates Multilingual BERT's deep semantic features, Bi-LSTM's bidirectional context processing, and TextCNN's local feature extraction, offering a more comprehensive and effective solution for capturing long-distance dependencies and nuanced contextual information in text classification. Finally, results on three datasets show that MFFMP-ETC sets a new baseline in terms of accuracy, sensitivity, and precision, verifying the progressiveness and effectiveness of MFFMP-ETC in text classification.

Danielsson, Pernilla. "Automatic extraction of meaningful units from corpora". International Journal of Corpus Linguistics 8, no. 1 (August 14, 2003): 109–27. http://dx.doi.org/10.1075/ijcl.8.1.06dan.

Abstract

In this article, we will reconsider the notion of a word as the basic unit of analysis in language and propose that in an information and meaning carrying system the unit of analysis should be a unit of meaning (UM). Such a UM may consist of one or more words. A method will be promoted that attempts to automatically retrieve UMs from corpora. To illustrate the results that may be obtained by this method, the node word ‘stroke’ will be used in a small study. The results will be discussed, with implications considered for both monolingual and multilingual use. The monolingual study will benefit from using the British National Corpus, while the multilingual study introduces a parallel corpus consisting of Swedish novels and their translations into English.

Aysa, Anwar, Mijit Ablimit, Hankiz Yilahun, and Askar Hamdulla. "Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision". Information 13, no. 4 (March 31, 2022): 175. http://dx.doi.org/10.3390/info13040175.

Abstract

Bilingual lexicon extraction is useful, especially for low-resource languages that can leverage from high-resource languages. The Uyghur language is a derivative language, and its language resources are scarce and noisy. Moreover, it is difficult to find a bilingual resource to utilize the linguistic knowledge of other large resource languages, such as Chinese or English. There is little related research on unsupervised extraction for the Chinese-Uyghur languages, and the existing methods mainly focus on term extraction methods based on translated parallel corpora. Accordingly, unsupervised knowledge extraction methods are effective, especially for the low-resource languages. This paper proposes a method to extract a Chinese-Uyghur bilingual dictionary by combining the inter-word relationship matrix mapped by the neural network cross-language word embedding vector. A seed dictionary is used as a weak supervision signal. A small Chinese-Uyghur parallel data resource is used to map the multilingual word vectors into a unified vector space. As the word-particles of these two languages are not well-coordinated, stems are used as the main linguistic particles. The strong inter-word semantic relationship of word vectors is used to associate Chinese-Uyghur semantic information. Two retrieval indicators, such as nearest neighbor retrieval and cross-domain similarity local scaling, are used to calculate similarity to extract bilingual dictionaries. The experimental results show that the accuracy of the Chinese-Uyghur bilingual dictionary extraction method proposed in this paper is improved to 65.06%. This method helps to improve Chinese-Uyghur machine translation, automatic knowledge extraction, and multilingual translations.
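
For readers unfamiliar with cross-domain similarity local scaling (CSLS), one of the two retrieval criteria named in this abstract, the sketch below shows the standard formulation: CSLS(x, y) = 2·cos(x, y) − r_T(x) − r_S(y), where r_T(x) is the mean cosine similarity of source word x to its k nearest target words and r_S(y) is the mean similarity of target word y to its k nearest source words. The code assumes the two embedding sets are already mapped into a shared space; the function names and toy vectors are illustrative and are not the paper's data or implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_top_k(values, k):
    """Mean of the k largest values (or of all values if there are fewer than k)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def csls_translate(src_vecs, tgt_vecs, k=2):
    """For every source word, return the target word with the highest CSLS score."""
    sims = {s: {t: cosine(sv, tv) for t, tv in tgt_vecs.items()}
            for s, sv in src_vecs.items()}
    # Average similarity of each word to its k nearest neighbours in the other language.
    r_src = {s: mean_top_k(list(sims[s].values()), k) for s in src_vecs}
    r_tgt = {t: mean_top_k([sims[s][t] for s in src_vecs], k) for t in tgt_vecs}
    return {
        s: max(tgt_vecs, key=lambda t, s=s: 2 * sims[s][t] - r_src[s] - r_tgt[t])
        for s in src_vecs
    }
```

Subtracting the neighbourhood averages lowers the score of "hub" target words that are close to many source words, which plain nearest-neighbour retrieval tends to over-select.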

Theses on the topic "Multilingual information extraction"

Ramsey, Marshall C., Thian-Huat Ong, and Hsinchun Chen. "Multilingual Input System for the Web - an Open Multimedia Approach of Keyboard and Handwriting Recognition for Chinese and Japanese". IEEE, 1998. http://hdl.handle.net/10150/105120.

Abstract

Artificial Intelligence Lab, Department of MIS, University of Arizona
The basic building block of a multilingual information retrieval system is the input system. Chinese and Japanese characters pose great challenges for the conventional 101-key alphabet-based keyboard, because they are radical-based and number in the thousands. This paper reviews the development of various approaches and then presents a framework and working demonstrations of Chinese and Japanese input methods implemented in Java, which allow open deployment over the web to any platform. The demo includes both popular keyboard input methods and neural network handwriting recognition using a mouse or pen. This framework is able to accommodate future extension to other input mediums and languages of interest.

Ramsey, Marshall C., Thian-Huat Ong, and Hsinchun Chen. "Multilingual input system for the Web - an open multimedia approach of keyboard and handwritten recognition for Chinese and Japanese". IEEE, 1998. http://hdl.handle.net/10150/105350.

Abstract

Artificial Intelligence Lab, Department of MIS, University of Arizona
The basic building block of a multilingual information retrieval system is the input system. Chinese and Japanese characters pose great challenges for the conventional 101-key alphabet-based keyboard, because they are radical-based and number in the thousands. This paper reviews the development of various approaches and then presents a framework and working demonstrations of Chinese and Japanese input methods implemented in Java, which allow open deployment over the web to any platform. The demo includes both popular keyboard input methods and neural network handwriting recognition using a mouse or pen. This framework is able to accommodate future extension to other input mediums and languages of interest.

De Wilde, Max. "From Information Extraction to Knowledge Discovery: Semantic Enrichment of Multilingual Content with Linked Open Data". Doctoral thesis, Université libre de Bruxelles, 2015. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/218774.

Abstract

Discovering relevant knowledge in unstructured text is not a trivial task. Search engines relying on full-text indexing of content reach their limits when confronted with poor quality, ambiguity, or multiple languages. Some of these shortcomings can be addressed by information extraction and related natural language processing techniques, but these still fall short of adequate knowledge representation. In this thesis, we defend a generic approach striving to be as language-independent, domain-independent, and content-independent as possible. To reach this goal, we propose to disambiguate terms with their corresponding identifiers in Linked Data knowledge bases, paving the way for full-scale semantic enrichment of textual content. The added value of our approach is illustrated with a comprehensive case study based on a trilingual historical archive, addressing constraints of data quality, multilingualism, and language evolution. A proof-of-concept implementation is also proposed in the form of a Multilingual Entity/Resource Combiner & Knowledge eXtractor (MERCKX), demonstrating to a certain extent the general applicability of our methodology to any language, domain, and type of content.
Doctorate in Information and Communication

Schleider, Thomas. "Knowledge Modeling and Multilingual Information Extraction for the Understanding of the Cultural Heritage of Silk". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS280.

Abstract

Modeling any type of human knowledge is a complex effort and needs to consider all specificities of its domain, including niche vocabulary. This thesis focuses on such an endeavour for the knowledge about European silk object production, which can be considered obscure and therefore endangered. However, the fact that such Cultural Heritage data is heterogeneous, spread across many museums worldwide, sparse and multilingual poses particular challenges, for which knowledge graphs have become more and more popular in recent years. Our main goal is not only to investigate knowledge representations, but also to examine how such an integration process can be accompanied by enrichments, such as information reconciliation through ontologies and vocabularies, as well as metadata predictions to fill gaps in the data. We will first propose a workflow for managing the integration of data about silk artifacts and afterwards present different classification approaches, with a special focus on unsupervised and zero-shot methods. Finally, we study ways of making the subsequent exploration of such metadata and images as easy as possible.

Yeh, Hui-Syuan. "Prompt-based Relation Extraction for Pharmacovigilance". Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG097.

Abstract

Extracting and maintaining up-to-date knowledge from diverse linguistic sources is imperative for the benefit of public health. While professional sources, including scientific journals and clinical notes, provide the most reliable knowledge, observations reported in patient forums and social media can bring complementary information for certain themes. Spotting entities and their relationships in these varied sources is particularly valuable. We focus on relation extraction in the medical domain. At the outset, we highlight the inconsistent terminology in the community and clarify the diverse setups used to build and evaluate relation extraction systems. To obtain reliable comparisons, we compare systems using the same setup. Additionally, we conduct a series of stratified evaluations to further investigate which data properties affect the models' performance. We show that model performance tends to decrease with relation density, relation diversity, and entity distance. Subsequently, this work explores a new training paradigm for biomedical relation extraction: prompt-based methods with masked language models. In this context, performance depends on the quality of prompt design. This requires manual efforts and domain knowledge, especially when designing the label words that link model predictions to relation classes. To overcome this overhead, we introduce an automated label word generation technique leveraging a dependency parser and training data. This approach minimizes manual intervention and enhances model performance with fewer parameters to be fine-tuned. Our approach performs on par with other verbalizer methods without additional training. Then, this work addresses information extraction from text written by laypeople about adverse drug reactions. To this end, as part of a joint effort, we have curated a tri-lingual corpus in German, French, and Japanese collected from patient forums and social media platforms. 
The challenge and the potential applications of the corpus are discussed. We present baseline experiments on the corpus that highlight three points: the effectiveness of a multilingual model in the cross-lingual setting, preparing negative samples for relation extraction by considering the co-reference and the distance between entities, and methods to address the highly imbalanced distribution of relations. Lastly, we integrate information from a medical knowledge base into the prompt-based approach with autoregressive language models for biomedical relation extraction. Our goal is to use external factual knowledge to enrich the context of the entities involved in the relation to be classified. We find that general models particularly benefit from external knowledge. Our experimental setup reveals that different entity markers are effective across different corpora. We show that the relevant knowledge helps, though the format of the prompt has a greater impact on performance than the additional information itself

Akbik, Alan. "Exploratory relation extraction in large multilingual data". Advisor: Volker Markl; reviewers: Hans Uszkoreit and Chris Biemann. Berlin: Technische Universität Berlin, 2016. http://d-nb.info/1156177308/34.

Guénec, Nadège. "Méthodologies pour la création de connaissances relatives au marché chinois dans une démarche d'Intelligence Économique : application dans le domaine des biotechnologies agricoles". PhD thesis, Université Paris-Est, 2009. http://tel.archives-ouvertes.fr/tel-00554743.

Abstract

The opening up of economies and the worldwide acceleration of trade have, in barely a decade, transformed the competitive environment of companies. The area of activity has widened, opening up new markets with very attractive potential. Such is the case of the BRIC countries (Brazil, Russia, India and China). Of these four countries, impressive in their area, their population and the economic potential they represent, China is the least accessible and the most impenetrable to our understanding, on the one hand because its linguistic system is distinct from the Indo-European languages, and on the other because its culture and system of thought are poles apart from those of the West. Yet for a company of international size that wishes to extend its influence, or simply to keep its position in its own market, it is now absolutely essential to be present in the Chinese market. How does a Western company approach a market that, by its otherness, appears at first complex and fundamentally enigmatic? Six years of observation in China allowed us to see the pitfalls in accessing information about the Chinese market. As in many foreign markets, our companies are subject to sometimes unimaginable destabilization. The inability to "read" China and to understand what is at stake there despite sustained efforts, and the tactical errors that stem from a poor assessment of the market or a biased understanding of the interplay of actors, prompted us to think about a finer-grained methodology for deciphering the business environment, one that could offer French companies an approach to China as a market.
The methods of Competitive Intelligence (Intelligence Économique, IE) then emerged as the most suitable for several reasons: the aim of IE is to find the right action to take, the specific context in which the organization operates is taken into account, and the analysis is done in real time. While a cultural approach is made of human interactions and subtleties, a "market" approach is now possible through automatic information processing and the modeling that follows from it. Indeed, in any Competitive Intelligence effort accompanying the establishment of an activity abroad, a large part of the strategically relevant information comes from analyzing the interplay of the actors operating in the same sector. Such automation of knowledge creation provides, in addition to the human approach "in the field", real added value for understanding the interactions between actors, because it contributes a body of knowledge that, by taking larger entities into account, has a global character that is otherwise out of reach. Since China has strongly developed the technologies of the knowledge economy, it is now possible to explore Chinese scientific and technical information sources. We are moreover convinced that Chinese information will become increasingly crucial over time. It is therefore becoming urgent for organizations to equip themselves with systems that not only give access to this information but can also process the masses of information coming from these sources. Our work consists mainly of adapting tools and methods from French research to the analysis of Chinese information with a view to creating elaborated knowledge. The MATHEO tool will provide, through bibliometric processing, a worldwide view of Chinese strategy.
TETRALOGIE, a tool dedicated to data mining, will be adapted to the linguistic and structural environment of Chinese scientific databases. In addition, we are taking part in the development of an information retrieval tool (MEVA) that incorporates recent findings from the cognitive sciences, and we are working on applying it to the search for relevant and adequate Chinese information. As this thesis was carried out under a CIFRE contract with the Limagrain Group, a contextualized application of our approach will be implemented in the field of agricultural biotechnology, and more specifically around the current issues in research on wheat hybridization techniques. This cutting-edge sector, which is at once a field of fundamental, experimental and applied research, is currently giving rise to patent filings and to commercial products coming to market, and is therefore a very topical subject. Is China really, as we suppose, a new world territory for the scientific research of the 21st century? Can the methods of IE be adapted to the Chinese market? After providing elements of an answer to these questions in the first two parts of our study, the third part sets out the context of agricultural biotechnology and the global stakes, in terms of economic and financial but also geopolitical power, of research on wheat hybridization. The last part then shows how to carry out an information search on the Chinese market, and the major added value that the analysis of Chinese information represents.

Charton, Éric. "Génération de phrases multilingues par apprentissage automatique de modèles de phrases". Thesis, Avignon, 2010. http://www.theses.fr/2010AVIG0175/document.

Abstract

Natural Language Generation (Génération Automatique de Texte, GAT) is the field of computational linguistics that studies the possibility of giving a machine the ability to produce intelligible text. In this dissertation, we present a proposal for a GAT system relying exclusively on statistical methods. Its originality is to exploit a corpus as a resource for forming sentences. This method offers several advantages: it simplifies the implementation of a GAT system in several languages and improves a generation system's capacity to adapt to a particular semantic domain. Producing, from a training corpus, the finely labelled sentence models required by our text generator led us to conduct in-depth research in information extraction and classification. We describe the system for labelling and classifying encyclopaedic content developed for this purpose. In the final stages of the generation process, the sentence models are exploited by a multilingual text generation module. This module uses information retrieval algorithms to extract from the model a pre-existing sentence usable as semantic and syntactic support for the intent to be communicated. Several methods are proposed for generating a sentence, chosen according to the complexity of the semantic content to be expressed. Among these methods we present in particular an original proposal for generating complex sentences by aggregating proto-sentences of the Subject, Verb, Object type. In our conclusions we consider that this particular generation method could open promising avenues of investigation into the nature of the sentence formation process.
Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system. In this thesis report, we present an architecture for an NLG system relying on statistical methods. The originality of our proposition is its ability to use a corpus as a learning resource for sentence production. This method offers several advantages: it simplifies the implementation and design of a multilingual NLG system, capable of producing sentences of the same meaning in several languages. Our method also improves the adaptability of an NLG system to a particular semantic field. In our proposal, sentence generation is achieved through the use of sentence models, obtained from a training corpus. Extracted sentences are abstracted by a labelling step obtained from various information extraction and text mining methods such as named entity recognition, co-reference resolution, semantic labelling and part-of-speech tagging. The sentence generation process is achieved by a sentence realisation module. This module provides an adapted sentence model to fit a communicative intent, and then transforms this model to generate a new sentence. Two methods are proposed to transform a sentence model into a generated sentence, according to the semantic content to express. In this document, we describe the complete labelling system applied to encyclopaedic content to obtain the sentence models. Then we present two models of sentence generation. The first generation model substitutes new semantic content for the content of an original sentence. The second model is used to find numerous proto-sentences, structured as Subject, Verb, Object, each able to fit part of a whole communicative intent, and then aggregates all the selected proto-sentences into a more complex one. Our experiments on sentence generation with various configurations of our system have shown that this new approach to NLG has interesting potential.

Gerber, Daniel. "Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications". Advisors: Klaus-Peter Fähnrich and Axel-Cyrille Ngonga Ngomo; reviewers: Klaus-Peter Fähnrich and Axel Polleres. Leipzig: Universitätsbibliothek Leipzig, 2016. http://d-nb.info/1239739478/34.

Books on the topic "Multilingual information extraction"

Poibeau, Thierry, Horacio Saggion, Jakub Piskorski and Roman Yangarber, eds. Multi-source, Multilingual Information Extraction and Summarization. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-28569-1.

Barnbrook, Geoff, Pernilla Danielsson and Michaela Mahlberg, eds. Meaningful texts: The extraction of semantic information from monolingual and multilingual corpora. London: Continuum, 2005.

Multisource Multilingual Information Extraction And Summarization. Springer, 2012.

Poibeau, Thierry, Horacio Saggion, Jakub Piskorski and Roman Yangarber. Multi-Source, Multilingual Information Extraction and Summarization. Springer London, Limited, 2012.

Poibeau, Thierry, Horacio Saggion, Jakub Piskorski and Roman Yangarber. Multi-source, Multilingual Information Extraction and Summarization. Springer, 2014.

Poibeau, Thierry, Horacio Saggion and Jakub Piskorski. Multi-source, Multilingual Information Extraction and Summarization. Springer, 2012.

Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. Univ of Birmingham, 2004.

Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. Univ of Birmingham, 2005.

Danielsson, Pernilla. Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. Bloomsbury Publishing Plc, 2010.

Danielsson, Pernilla. Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. Bloomsbury Publishing Plc, 2004.

Book chapters on the topic "Multilingual information extraction"

Gamallo, Pablo and Marcos Garcia. "Multilingual Open Information Extraction". In Progress in Artificial Intelligence, 711–22. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-23485-4_72.

Esuli, Andrea and Fabrizio Sebastiani. "Evaluating Information Extraction". In Multilingual and Multimodal Information Access Evaluation, 100–111. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-15998-5_12.

Palmer, David D., Marc B. Reichman and Noah White. "Multimedia Information Extraction in a Live Multilingual News Monitoring System". In Multimedia Information Extraction, 145–57. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2012. http://dx.doi.org/10.1002/9781118219546.ch9.

Kabadjov, Mijail, Josef Steinberger and Ralf Steinberger. "Multilingual Statistical News Summarization". In Multi-source, Multilingual Information Extraction and Summarization, 229–52. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28569-1_11.

Thurmair, Gregor. "Multiword expressions in multilingual information extraction". In Multiword Units in Machine Translation and Translation Technology, 104–23. Amsterdam: John Benjamins Publishing Company, 2018. http://dx.doi.org/10.1075/cilt.341.05thu.

Dini, Luca. "Parallel Information Extraction System for Multilingual Information Access". In Advances in Intelligent Systems, 179–90. Dordrecht: Springer Netherlands, 1999. http://dx.doi.org/10.1007/978-94-011-4840-5_16.

Piskorski, Jakub and Roman Yangarber. "Information Extraction: Past, Present and Future". In Multi-source, Multilingual Information Extraction and Summarization, 23–49. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28569-1_2.

Ribeiro, Ricardo and David Martins de Matos. "Improving Speech-to-Text Summarization by Using Additional Information Sources". In Multi-source, Multilingual Information Extraction and Summarization, 277–97. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28569-1_13.

Ji, Heng, Benoit Favre, Wen-Pin Lin, Dan Gillick, Dilek Hakkani-Tur and Ralph Grishman. "Open-Domain Multi-Document Summarization via Information Extraction: Challenges and Prospects". In Multi-source, Multilingual Information Extraction and Summarization, 177–201. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28569-1_9.

Li, Fang, Huanye Sheng, Dongmo Zhang and Tianfang Yao. "An Internet Based Multilingual Investment Information Extraction System". In The Internet Challenge: Technology and Applications, 1–9. Dordrecht: Springer Netherlands, 2002. http://dx.doi.org/10.1007/978-94-010-0494-7_1.

Conference proceedings on the topic "Multilingual information extraction"

Sanjaya, Hafidz, Kusrini Kusrini, Kumara Ari Yuana and José Ramón Martínez Salio. "Multilingual Named Entity Recognition Model for Location and Time Extraction of Forest Fire". In 2024 4th International Conference of Science and Information Technology in Smart Administration (ICSINTESA), 611–15. IEEE, 2024. http://dx.doi.org/10.1109/icsintesa62455.2024.10747844.

Yuan, Yue and Huaping Zhang. "An Improved Topic Extraction Method Based on Word Frequency Information Entropy for Multilingual Topic Attentional Division". In 2024 9th International Conference on Intelligent Computing and Signal Processing (ICSP), 675–81. IEEE, 2024. http://dx.doi.org/10.1109/icsp62122.2024.10743506.

Wiedemann, Gregor, Seid Muhie Yimam and Chris Biemann. "A Multilingual Information Extraction Pipeline for Investigative Journalism". In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Stroudsburg, PA, USA: Association for Computational Linguistics, 2018. http://dx.doi.org/10.18653/v1/d18-2014.

Kotnis, Bhushan, Kiril Gashteovski, Daniel Rubio, Ammar Shaker, Vanesa Rodriguez-Tembras, Makoto Takamoto, Mathias Niepert and Carolin Lawrence. "MILIE: Modular & Iterative Multilingual Open Information Extraction". In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.acl-long.478.

Vijayan, Karthika and Oshin Anand. "Language-Agnostic Text Processing for Information Extraction". In 12th International Conference on Artificial Intelligence, Soft Computing and Applications. Academy and Industry Research Collaboration Center (AIRCC), 2022. http://dx.doi.org/10.5121/csit.2022.122310.

Abstract

Information extraction from multilingual text for conversational AI generally implements natural language understanding (NLU) using multiple language-specific models, which may not be available for low-resource languages or code-mixed scenarios. In this paper, we study the implementation of multilingual NLU by developing a language-agnostic processing pipeline. We perform this study using the case of a conversational assistant built with the RASA framework. Automatic assistants for answering text queries are built for different languages and for code-mixed input, and in doing so we experiment with different components of the NLU pipeline. Sparse and dense feature extraction accomplishes the language-agnostic composite featurization of text in the pipeline. We perform experiments with intent classification and entity extraction as part of information extraction. The efficacy of the language-agnostic NLU pipeline is showcased (i) when dedicated language models are not available for all languages of interest, and (ii) in the case of code mixing. Our experiments delivered intent classification accuracies of 98.49%, 96.41% and 97.98% for the same queries in English, Hindi and Malayalam, respectively, without any dedicated language models.

Aone, Chinatsu, Nicholas Charocopos and James Gorlinsky. "An intelligent multilingual information browsing and retrieval system using information extraction". In the fifth conference. Morristown, NJ, USA: Association for Computational Linguistics, 1997. http://dx.doi.org/10.3115/974557.974606.

Maynard, Diana and Hamish Cunningham. "Multilingual adaptations of ANNIE, a reusable information extraction tool". In the tenth conference. Morristown, NJ, USA: Association for Computational Linguistics, 2003. http://dx.doi.org/10.3115/1067737.1067789.

Kolluru, Keshav, Muqeeth Mohammed, Shubham Mittal, Soumen Chakrabarti and Mausam. "Alignment-Augmented Consistent Translation for Multilingual Open Information Extraction". In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.acl-long.179.

Bretschneider, Claudia, Heiner Oberkampf, Sonja Zillner, Bernhard Bauer and Matthias Hammon. "Corpus-based Translation of Ontologies for Improved Multilingual Semantic Annotation". In Proceedings of the Third Workshop on Semantic Web and Information Extraction. Stroudsburg, PA, USA: Association for Computational Linguistics and Dublin City University, 2014. http://dx.doi.org/10.3115/v1/w14-6201.

Nguyen, Minh Van, Nghia Ngo, Bonan Min and Thien Nguyen. "FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction". In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.naacl-demo.14.
