Academic literature on the topic 'Multilingual information extraction'

Author: Grafiati

Published: 1 March 2025

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multilingual information extraction.'

Journal articles on the topic "Multilingual information extraction"

Claro, Daniela Barreiro, Marlo Souza, Clarissa Castellã Xavier, and Leandro Oliveira. "Multilingual Open Information Extraction: Challenges and Opportunities." Information 10, no. 7 (July 2, 2019): 228. http://dx.doi.org/10.3390/info10070228.

Abstract:

The number of documents published on the Web in languages other than English grows every year. As a consequence, the need to extract useful information from different languages increases, highlighting the importance of research into Open Information Extraction (OIE) techniques. Different OIE methods have dealt with features from a single language; however, few approaches tackle multilingual aspects. In those approaches, multilingualism is restricted to processing text in different languages, rather than exploring cross-linguistic resources, which results in low precision due to the use of general rules. Multilingual methods have been applied to numerous problems in Natural Language Processing, achieving satisfactory results and demonstrating that knowledge acquisition for a language can be transferred to other languages to improve the quality of the facts extracted. We argue that a multilingual approach can enhance OIE methods as it is ideal to evaluate and compare OIE systems, and therefore can be applied to the collected facts. In this work, we discuss how the transfer of knowledge between languages can increase acquisition from multilingual approaches. We provide a roadmap of the Multilingual Open IE area concerning state-of-the-art studies. Additionally, we evaluate the transfer of knowledge to improve the quality of the facts extracted in each language. Moreover, we discuss the importance of a parallel corpus to evaluate and compare multilingual systems.

Khairova, Nina, Orken Mamyrbayev, Kuralay Mukhsina, Anastasiia Kolesnyk, and Saurabh Pratap. "Logical-linguistic model for multilingual Open Information Extraction." Cogent Engineering 7, no. 1 (January 1, 2020): 1714829. http://dx.doi.org/10.1080/23311916.2020.1714829.

Hashemzahde, Bahare, and Majid Abdolrazzagh-Nezhad. "Improving keyword extraction in multilingual texts." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 6 (December 1, 2020): 5909. http://dx.doi.org/10.11591/ijece.v10i6.pp5909-5916.

Abstract:

The accuracy of keyword extraction is a leading factor in information retrieval systems and marketing. In the real world, text is produced in a variety of languages, and the ability to extract keywords based on information from different languages improves the accuracy of keyword extraction. In this paper, the available information from all languages is applied to improve a traditional keyword extraction algorithm for multilingual text. The proposed keyword extraction procedure is an unsupervised algorithm that selects a word as a keyword of a given text if, in addition to ranking highly in that language, it also holds a high rank according to the keyword criteria in the other languages. To this end, the average TF-IDF of the candidate words was calculated for the same and the other languages. Then the words with the highest average TF-IDF were chosen as the extracted keywords. The obtained results indicate that the accuracies on multilingual texts of the term frequency-inverse document frequency (TF-IDF) algorithm, the graph-based algorithm, and the improved proposed algorithm are 80%, 60.65%, and 91.3%, respectively.
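
The averaging step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the smoothed IDF formula, the function names `tfidf_scores` and `multilingual_keywords`, and the hand-made translation `lexicon` that aligns candidate words across languages are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(doc, corpus):
    """TF-IDF of every word in `doc`; smoothed IDF is computed over `corpus`."""
    counts = Counter(doc)
    n_docs = len(corpus)
    scores = {}
    for word, count in counts.items():
        df = sum(1 for other in corpus if word in other)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothing is an assumption
        scores[word] = (count / len(doc)) * idf
    return scores


def multilingual_keywords(src_lang, docs, corpora, lexicon, top_k=3):
    """Rank source-language words by their TF-IDF averaged over all languages.

    docs:    {lang: token list}, the same text in each language
    corpora: {lang: list of token lists}, used for document frequencies
    lexicon: {source word: {lang: translation}}, a hypothetical alignment
    """
    per_lang = {lang: tfidf_scores(docs[lang], corpora[lang]) for lang in docs}
    averaged = {}
    for word, src_score in per_lang[src_lang].items():
        total = src_score
        for lang in docs:
            if lang == src_lang:
                continue
            translation = lexicon.get(word, {}).get(lang)
            # A word with no counterpart in another language contributes 0 there.
            total += per_lang[lang].get(translation, 0.0)
        averaged[word] = total / len(docs)
    return sorted(averaged, key=averaged.get, reverse=True)[:top_k]
```

In this sketch a candidate that scores well in only one language is pulled down by the zero contributions from the others, which is one plausible reading of "holds a high rank in other languages as well".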

Vasilkovsky, Michael, Anton Alekseev, Valentin Malykh, Ilya Shenbin, Elena Tutubalina, Dmitriy Salikhov, Mikhail Stepnov, Andrey Chertok, and Sergey Nikolenko. "DetIE: Multilingual Open Information Extraction Inspired by Object Detection." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11412–20. http://dx.doi.org/10.1609/aaai.v36i10.21393.

Abstract:

State of the art neural methods for open information extraction (OpenIE) usually extract triplets (or tuples) iteratively in an autoregressive or predicate-based manner in order not to produce duplicates. In this work, we propose a different approach to the problem that can be equally or more successful. Namely, we present a novel single-pass method for OpenIE inspired by object detection algorithms from computer vision. We use an order-agnostic loss based on bipartite matching that forces unique predictions and a Transformer-based encoder-only architecture for sequence labeling. The proposed approach is faster and shows superior or similar performance in comparison with state of the art models on standard benchmarks in terms of both quality metrics and inference time. Our model sets the new state of the art performance of 67.7% F1 on CaRB evaluated as OIE2016 while being 3.35x faster at inference than previous state of the art. We also evaluate the multilingual version of our model in the zero-shot setting for two languages and introduce a strategy for generating synthetic multilingual data to fine-tune the model for each specific language. In this setting, we show performance improvement of 15% on multilingual Re-OIE2016, reaching 75% F1 for both Portuguese and Spanish languages. Code and models are available at https://github.com/sberbank-ai/DetIE.
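
To make the idea of an order-agnostic loss based on bipartite matching more concrete, here is a small Python sketch. It does not reproduce DetIE: the per-token label sequences, the mismatch-count cost in `slot_cost`, and the exhaustive search over assignments (used here in place of an efficient assignment solver such as the Hungarian algorithm) are assumptions chosen to keep the example self-contained.

```python
from itertools import permutations


def slot_cost(pred, gold):
    """Hypothetical cost: number of token positions whose labels differ."""
    return sum(p != g for p, g in zip(pred, gold))


def min_cost_matching(preds, golds):
    """Match every gold extraction to a distinct prediction at minimum total cost.

    Returns (total_cost, assignment), where assignment[i] is the index of the
    prediction matched to gold extraction i. Assumes len(preds) >= len(golds).
    Brute force over assignments, so only suitable for a handful of slots.
    """
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(golds)):
        cost = sum(slot_cost(preds[p], golds[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm
```

Because the cost is minimized over all one-to-one assignments, reordering the predictions leaves the total cost unchanged, and no two gold extractions can be explained by the same prediction, which is what discourages duplicates.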

Ghimire, Dadhi Ram, Sanjeev Panday, and Aman Shakya. "Information Extraction from a Large Knowledge Graph in the Nepali Language." National College of Computer Studies Research Journal 3, no. 1 (December 9, 2024): 33–49. https://doi.org/10.3126/nccsrj.v3i1.72336.

Abstract:

Information is abundant on the web. A knowledge graph organizes information in a structured format that can be retrieved using specialized queries. There are many knowledge graphs, but they differ in their ontologies and taxonomies as well as in the property types that define the relations between entities, which creates problems when extracting knowledge from them. Multilingual support is also an issue: while most of them claim to be multilingual, they are more suitable for querying in English. Most existing knowledge graphs are based on the Wikipedia infobox. In this work, we have devised an information extraction pipeline for retrieving knowledge in the Nepali language from Wikidata using a SPARQL endpoint. Queries based on the Wikipedia infobox yield more accurate responses than queries based on the paragraph content of Wikipedia articles. The main reason is that the information inside the paragraphs is not properly linked to the Wikipedia infobox.

Azzam, Saliha, Kevin Humphreys, Robert Gaizauskas, and Yorick Wilks. "Using a language independent domain model for multilingual information extraction." Applied Artificial Intelligence 13, no. 7 (October 1999): 705–24. http://dx.doi.org/10.1080/088395199117252.

Seretan, Violeta, and Eric Wehrli. "Multilingual collocation extraction with a syntactic parser." Language Resources and Evaluation 43, no. 1 (October 1, 2008): 71–85. http://dx.doi.org/10.1007/s10579-008-9075-7.

Zhang, Ruijuan. "Multilingual pretrained based multi-feature fusion model for English text classification." Computer Science and Information Systems, no. 00 (2025): 4. https://doi.org/10.2298/csis240630004z.

Abstract:

Deep learning methods have been widely applied to English text classification tasks in recent years, achieving strong performance. However, current methods face two significant challenges: (1) they struggle to effectively capture long-range contextual structure information within text sequences, and (2) they do not adequately integrate linguistic knowledge into representations for enhancing the performance of classifiers. To this end, a novel multilingual pre-training based multi-feature fusion method is proposed for English text classification (MFFMP-ETC). Specifically, MFFMP-ETC consists of the multilingual feature extraction, the multilevel structure learning, and the multi-view representation fusion. MFFMP-ETC utilizes the Multilingual BERT as deep semantic extractor to introduce language information into representation learning, which significantly endows text representations with robustness. Then, MFFMP-ETC integrates Bi-LSTM and TextCNN into multilingual pre-training architecture to capture global and local structure information of English texts, via modelling bidirectional contextual semantic dependencies and multi-granularity local semantic dependencies. Meanwhile, MFFMP-ETC devises the multi-view representation fusion within the invariant semantic learning of representations to aggregate consistent and complementary information among views. MFFMP-ETC synergistically integrates Multilingual BERT's deep semantic features, Bi-LSTM's bidirectional context processing, and TextCNN local feature extraction, offering a more comprehensive and effective solution for capturing long-distance dependencies and nuanced contextual information in text classification. Finally, results on three datasets show MFFMP-ETC conducts a new baseline in terms of accuracy, sensitivity, and precision, verifying progressiveness and effectiveness of MFFMP-ETC in the text classification.

Danielsson, Pernilla. "Automatic extraction of meaningful units from corpora." International Journal of Corpus Linguistics 8, no. 1 (August 14, 2003): 109–27. http://dx.doi.org/10.1075/ijcl.8.1.06dan.

Abstract:

In this article, we will reconsider the notion of a word as the basic unit of analysis in language and propose that in an information and meaning carrying system the unit of analysis should be a unit of meaning (UM). Such a UM may consist of one or more words. A method will be promoted that attempts to automatically retrieve UMs from corpora. To illustrate the results that may be obtained by this method, the node word ‘stroke’ will be used in a small study. The results will be discussed, with implications considered for both monolingual and multilingual use. The monolingual study will benefit from using the British National Corpus, while the multilingual study introduces a parallel corpus consisting of Swedish novels and their translations into English.

Aysa, Anwar, Mijit Ablimit, Hankiz Yilahun, and Askar Hamdulla. "Chinese-Uyghur Bilingual Lexicon Extraction Based on Weak Supervision." Information 13, no. 4 (March 31, 2022): 175. http://dx.doi.org/10.3390/info13040175.

Abstract:

Bilingual lexicon extraction is useful, especially for low-resource languages that can leverage from high-resource languages. The Uyghur language is a derivative language, and its language resources are scarce and noisy. Moreover, it is difficult to find a bilingual resource to utilize the linguistic knowledge of other large resource languages, such as Chinese or English. There is little related research on unsupervised extraction for the Chinese-Uyghur languages, and the existing methods mainly focus on term extraction methods based on translated parallel corpora. Accordingly, unsupervised knowledge extraction methods are effective, especially for the low-resource languages. This paper proposes a method to extract a Chinese-Uyghur bilingual dictionary by combining the inter-word relationship matrix mapped by the neural network cross-language word embedding vector. A seed dictionary is used as a weak supervision signal. A small Chinese-Uyghur parallel data resource is used to map the multilingual word vectors into a unified vector space. As the word-particles of these two languages are not well-coordinated, stems are used as the main linguistic particles. The strong inter-word semantic relationship of word vectors is used to associate Chinese-Uyghur semantic information. Two retrieval indicators, such as nearest neighbor retrieval and cross-domain similarity local scaling, are used to calculate similarity to extract bilingual dictionaries. The experimental results show that the accuracy of the Chinese-Uyghur bilingual dictionary extraction method proposed in this paper is improved to 65.06%. This method helps to improve Chinese-Uyghur machine translation, automatic knowledge extraction, and multilingual translations.
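
Cross-domain similarity local scaling (CSLS), one of the two retrieval criteria named in this abstract, is commonly defined as CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean cosine similarity between a source vector x and its k nearest target vectors, and r_S(y) is the corresponding quantity for a target vector y. The Python sketch below applies that definition to toy vectors; the function names, the dictionary-of-lists representation, and the default k are assumptions, and the embeddings are taken to be already mapped into a shared space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_top_k(query, candidates, k):
    """Mean cosine similarity between `query` and its k most similar candidates."""
    sims = sorted((cosine(query, c) for c in candidates), reverse=True)[:k]
    return sum(sims) / len(sims) if sims else 0.0


def csls_translate(src_vecs, tgt_vecs, k=2):
    """Map each source word to the target word with the highest CSLS score.

    src_vecs, tgt_vecs: {word: vector} dictionaries in a shared embedding space.
    """
    src_list = list(src_vecs.values())
    tgt_list = list(tgt_vecs.values())
    # r_S(y): how close each target word is, on average, to its nearest source words.
    tgt_penalty = {t: mean_top_k(v, src_list, k) for t, v in tgt_vecs.items()}
    result = {}
    for s, sv in src_vecs.items():
        src_penalty = mean_top_k(sv, tgt_list, k)  # r_T(x)
        result[s] = max(
            tgt_vecs,
            key=lambda t: 2 * cosine(sv, tgt_vecs[t]) - src_penalty - tgt_penalty[t],
        )
    return result
```

Plain nearest-neighbor retrieval corresponds to dropping the two penalty terms; CSLS subtracts them to reduce the influence of "hub" vectors that are close to many words at once.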

Dissertations / Theses on the topic "Multilingual information extraction"

Ramsey, Marshall C., Thian-Huat Ong, and Hsinchun Chen. "Multilingual Input System for the Web - an Open Multimedia Approach of Keyboard and Handwriting Recognition for Chinese and Japanese." IEEE, 1998. http://hdl.handle.net/10150/105120.

Abstract:

Artificial Intelligence Lab, Department of MIS, University of Arizona
The basic building block of a multilingual information retrieval system is the input system. Chinese and Japanese characters pose great challenges for the conventional 101-key alphabet-based keyboard, because they are radical-based and number in the thousands. This paper reviews the development of various approaches and then presents a framework and working demonstrations of Chinese and Japanese input methods implemented in Java, which allow open deployment over the web to any platform. The demo includes both popular keyboard input methods and neural network handwriting recognition using a mouse or pen. This framework is able to accommodate future extension to other input mediums and languages of interest.

Ramsey, Marshall C., Thian-Huat Ong, and Hsinchun Chen. "Multilingual input system for the Web - an open multimedia approach of keyboard and handwritten recognition for Chinese and Japanese." IEEE, 1998. http://hdl.handle.net/10150/105350.

Abstract:

Artificial Intelligence Lab, Department of MIS, University of Arizona
The basic building block of a multilingual information retrieval system is the input system. Chinese and Japanese characters pose great challenges for the conventional 101-key alphabet-based keyboard, because they are radical-based and number in the thousands. This paper reviews the development of various approaches and then presents a framework and working demonstrations of Chinese and Japanese input methods implemented in Java, which allow open deployment over the web to any platform. The demo includes both popular keyboard input methods and neural network handwriting recognition using a mouse or pen. This framework is able to accommodate future extension to other input mediums and languages of interest.

De Wilde, Max. "From Information Extraction to Knowledge Discovery: Semantic Enrichment of Multilingual Content with Linked Open Data." Doctoral thesis, Université libre de Bruxelles, 2015. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/218774.

Abstract:

Discovering relevant knowledge in unstructured text is not a trivial task. Search engines relying on full-text indexing of content reach their limits when confronted with poor quality, ambiguity, or multiple languages. Some of these shortcomings can be addressed by information extraction and related natural language processing techniques, but these still fall short of adequate knowledge representation. In this thesis, we defend a generic approach striving to be as language-independent, domain-independent, and content-independent as possible. To reach this goal, we propose to disambiguate terms with their corresponding identifiers in Linked Data knowledge bases, paving the way for full-scale semantic enrichment of textual content. The added value of our approach is illustrated with a comprehensive case study based on a trilingual historical archive, addressing constraints of data quality, multilingualism, and language evolution. A proof-of-concept implementation is also proposed in the form of a Multilingual Entity/Resource Combiner & Knowledge eXtractor (MERCKX), demonstrating to a certain extent the general applicability of our methodology to any language, domain, and type of content.
Doctorate in Information and Communication

Schleider, Thomas. "Knowledge Modeling and Multilingual Information Extraction for the Understanding of the Cultural Heritage of Silk." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS280.

Abstract:

Modeling any type of human knowledge is a complex effort that needs to consider all the specificities of its domain, including niche vocabulary. This thesis focuses on such an endeavour for knowledge about European silk object production, which can be considered obscure and therefore endangered. However, the fact that such cultural heritage data is heterogeneous, spread across many museums worldwide, sparse, and multilingual poses particular challenges, for which knowledge graphs have become increasingly popular in recent years. Our main goal is not only to investigate knowledge representations, but also to examine how such an integration process can be accompanied by enrichments, such as information reconciliation through ontologies and vocabularies, as well as metadata predictions to fill gaps in the data. We first propose a workflow for managing the integration of data about silk artifacts and then present different classification approaches, with a special focus on unsupervised and zero-shot methods. Finally, we study ways of making the subsequent exploration of such metadata and images as easy as possible.

Yeh, Hui-Syuan. "Prompt-based Relation Extraction for Pharmacovigilance." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG097.

Abstract:

Extracting and maintaining up-to-date knowledge from diverse linguistic sources is imperative for the benefit of public health. While professional sources, including scientific journals and clinical notes, provide the most reliable knowledge, observations reported in patient forums and social media can bring complementary information for certain themes. Spotting entities and their relationships in these varied sources is particularly valuable. We focus on relation extraction in the medical domain. At the outset, we highlight the inconsistent terminology in the community and clarify the diverse setups used to build and evaluate relation extraction systems. To obtain reliable comparisons, we compare systems using the same setup. Additionally, we conduct a series of stratified evaluations to further investigate which data properties affect the models' performance. We show that model performance tends to decrease with relation density, relation diversity, and entity distance. Subsequently, this work explores a new training paradigm for biomedical relation extraction: prompt-based methods with masked language models. In this context, performance depends on the quality of prompt design. This requires manual effort and domain knowledge, especially when designing the label words that link model predictions to relation classes. To overcome this overhead, we introduce an automated label word generation technique leveraging a dependency parser and training data. This approach minimizes manual intervention and enhances model performance with fewer parameters to be fine-tuned. Our approach performs on par with other verbalizer methods without additional training. Then, this work addresses information extraction from text written by laypeople about adverse drug reactions. To this end, as part of a joint effort, we have curated a trilingual corpus in German, French, and Japanese collected from patient forums and social media platforms. The challenges and potential applications of the corpus are discussed. We present baseline experiments on the corpus that highlight three points: the effectiveness of a multilingual model in the cross-lingual setting, preparing negative samples for relation extraction by considering the co-reference and the distance between entities, and methods to address the highly imbalanced distribution of relations. Lastly, we integrate information from a medical knowledge base into the prompt-based approach with autoregressive language models for biomedical relation extraction. Our goal is to use external factual knowledge to enrich the context of the entities involved in the relation to be classified. We find that general models particularly benefit from external knowledge. Our experimental setup reveals that different entity markers are effective across different corpora. We show that the relevant knowledge helps, though the format of the prompt has a greater impact on performance than the additional information itself.

Akbik, Alan. "Exploratory relation extraction in large multilingual data." Berlin: Technische Universität Berlin, 2016. Supervisor: Volker Markl; reviewers: Hans Uszkoreit and Chris Biemann. http://d-nb.info/1156177308/34.

Guénec, Nadège. "Méthodologies pour la création de connaissances relatives au marché chinois dans une démarche d'Intelligence Économique : application dans le domaine des biotechnologies agricoles." Phd thesis, Université Paris-Est, 2009. http://tel.archives-ouvertes.fr/tel-00554743.

Abstract:

The opening up of economies and the worldwide acceleration of trade have, in barely a decade, transformed the competitive environment of companies. The field of activity has widened, opening new markets with very attractive potential. Such is the case of the BRIC countries (Brazil, Russia, India, and China). Of these four countries, impressive in their area, population, and economic potential, China is the least accessible and the most impenetrable to our understanding, on the one hand because of a linguistic system distinct from the Indo-European languages, and on the other because of a culture and a system of thought that are the polar opposite of those of the West. Yet for a company of international size that wishes to extend its influence, or simply to keep its position in its own market, it is now absolutely essential to be present in the Chinese market. How does a Western company approach a market which, by its otherness, at first appears complex and fundamentally enigmatic? Six years of observation in China allowed us to note the pitfalls in accessing information about the Chinese market. As in many foreign markets, our companies are subject to sometimes unimaginable destabilization. The inability to "read" China and to understand what is at stake there despite sustained efforts, and the tactical errors that stem from a poor assessment of the market or a biased understanding of the interplay of actors, prompted us to think about a finer methodology for deciphering the business environment, one that can offer French companies an approach to China as a market.
The methods of Competitive Intelligence (Intelligence Économique, IE) then emerged as the most suitable for several reasons: the aim of IE is to find the right action to take, the specificity of the context in which the organization operates is taken into account, and the analysis is carried out in real time. While a cultural approach is made of human interactions and subtleties, a "market" approach is now possible through automatic information processing and the modeling that follows. Indeed, in any Competitive Intelligence process accompanying the establishment of an activity abroad, a large share of strategically relevant information comes from analyzing the interplay of actors operating in the same sector. Such automation of knowledge creation constitutes, in addition to the human approach "in the field", real added value for understanding the interactions between actors, because it provides a body of knowledge which, by taking larger entities into account, has a global character that cannot otherwise be grasped. Since China has strongly developed technologies related to the knowledge economy, it is now possible to explore Chinese scientific and technical information sources. We are also convinced that Chinese information will become increasingly crucial over time. It is therefore becoming urgent for organizations to equip themselves with systems that not only give access to this information but can also process the masses of information coming from these sources. Our work mainly consists in adapting tools and methods from French research to the analysis of Chinese information with a view to creating elaborated knowledge. The MATHEO tool will provide, through bibliometric processing, a global view of Chinese strategy.
TETRALOGIE, a tool dedicated to data mining, will be adapted to the linguistic and structural environment of Chinese scientific databases. In addition, we are taking part in the development of an information retrieval tool (MEVA) that incorporates recent findings from the cognitive sciences, and we are working on its application to the search for relevant and adequate Chinese information. As this thesis was carried out under a CIFRE contract with the Limagrain Group, a contextualized application of our approach will be implemented in the field of agricultural biotechnology, more specifically around the current issues in research on wheat hybridization techniques. This cutting-edge sector, which is at once a field of fundamental, experimental, and applied research, is currently giving rise to patent filings and to the marketing of commercial products, and is therefore a highly topical subject. Is China really, as we suppose, a new global territory for scientific research in the 21st century? Can the methods of IE be adapted to the Chinese market? After providing elements of an answer to these questions in the first two parts of our study, we set out in the third part the context of agricultural biotechnology and the global stakes, in terms of economic and financial as well as geopolitical power, of research on wheat hybridization. Finally, in the last part, we show how to carry out information retrieval on the Chinese market, and the major interest, in terms of added value, of analyzing Chinese information.

Charton, Éric. "Génération de phrases multilingues par apprentissage automatique de modèles de phrases." Thesis, Avignon, 2010. http://www.theses.fr/2010AVIG0175/document.

Abstract:

Natural Language Generation (Génération Automatique de Texte, GAT) is the field of computational linguistics that studies the possibility of giving a machine the ability to produce intelligible text. In this thesis we present a proposed GAT system that relies exclusively on statistical methods. Its originality is to exploit a corpus as a resource for forming sentences. This method offers several advantages: it simplifies the implementation of a GAT system in several languages and improves a generation system's capacity to adapt to a particular semantic domain. Producing, from a training corpus, the finely labelled sentence models required by our text generator led us to carry out in-depth research in information extraction and classification. We describe the system for labelling and classifying encyclopaedic content developed for this purpose. In the final stages of the generation process, the sentence models are used by a multilingual text generation module. This module uses information retrieval algorithms to extract from the model a pre-existing sentence that can serve as the semantic and syntactic support for the intent to be communicated. Several methods are proposed for generating a sentence, chosen according to the complexity of the semantic content to be expressed. Among these methods we present in particular an original proposal for generating complex sentences by aggregating proto-sentences of the Subject, Verb, Object type. In our conclusions we consider that this particular generation method may open promising avenues of investigation into the nature of the sentence formation process.
Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system. In this thesis report, we present an architecture for an NLG system relying on statistical methods. The originality of our proposal is its ability to use a corpus as a learning resource for sentence production. This method offers several advantages: it simplifies the implementation and design of a multilingual NLG system capable of producing sentences with the same meaning in several languages. Our method also improves the adaptability of an NLG system to a particular semantic field. In our proposal, sentence generation is achieved through the use of sentence models obtained from a training corpus. Extracted sentences are abstracted by a labelling step that draws on various information extraction and text mining methods, such as named entity recognition, co-reference resolution, semantic labelling and part-of-speech tagging. The sentence generation process is carried out by a sentence realisation module. This module provides a sentence model adapted to fit a communicative intent, and then transforms this model to generate a new sentence. Two methods are proposed to transform a sentence model into a generated sentence, according to the semantic content to be expressed. In this document, we describe the complete labelling system applied to encyclopaedic content to obtain the sentence models. Then we present two models of sentence generation. The first generation model substitutes the semantic content to be expressed for the content of an original sentence. The second model is used to find numerous proto-sentences, structured as Subject, Verb, Object, each able to fit part of a whole communicative intent, and then aggregates all the selected proto-sentences into a more complex one. Our sentence generation experiments with various configurations of our system have shown that this new approach to NLG has interesting potential.

Gerber, Daniel. "Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications." Supervisors: Klaus-Peter Fähnrich and Axel-Cyrille Ngonga Ngomo. Reviewers: Klaus-Peter Fähnrich and Axel Polleres. Leipzig: Universitätsbibliothek Leipzig, 2016. http://d-nb.info/1239739478/34.

Books on the topic "Multilingual information extraction"

Poibeau, Thierry, Horacio Saggion, Jakub Piskorski, and Roman Yangarber, eds. Multi-source, Multilingual Information Extraction and Summarization. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-28569-1.

Barnbrook, Geoff, Pernilla Danielsson, and Michaela Mahlberg, eds. Meaningful texts: The extraction of semantic information from monolingual and multilingual corpora. London: Continuum, 2005.

Multisource Multilingual Information Extraction And Summarization. Springer, 2012.

Poibeau, Thierry, Horacio Saggion, Jakub Piskorski, and Roman Yangarber. Multi-Source, Multilingual Information Extraction and Summarization. Springer London, Limited, 2012.

Poibeau, Thierry, Horacio Saggion, Jakub Piskorski, and Roman Yangarber. Multi-source, Multilingual Information Extraction and Summarization. Springer, 2014.

Poibeau, Thierry, Horacio Saggion, and Jakub Piskorski. Multi-source, Multilingual Information Extraction and Summarization. Springer, 2012.

Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. Univ of Birmingham, 2004.

Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. Univ of Birmingham, 2005.

Danielsson, Pernilla. Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. Bloomsbury Publishing Plc, 2010.

Danielsson, Pernilla. Meaningful Texts: The Extraction of Semantic Information from Monolingual and Multilingual Corpora. Bloomsbury Publishing Plc, 2004.

Book chapters on the topic "Multilingual information extraction"

Gamallo, Pablo, and Marcos Garcia. "Multilingual Open Information Extraction." In Progress in Artificial Intelligence, 711–22. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-23485-4_72.

Esuli, Andrea, and Fabrizio Sebastiani. "Evaluating Information Extraction." In Multilingual and Multimodal Information Access Evaluation, 100–111. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-15998-5_12.

Palmer, David D., Marc B. Reichman, and Noah White. "Multimedia Information Extraction in a Live Multilingual News Monitoring System." In Multimedia Information Extraction, 145–57. Hoboken, NJ, USA: John Wiley & Sons, Inc., 2012. http://dx.doi.org/10.1002/9781118219546.ch9.

Kabadjov, Mijail, Josef Steinberger, and Ralf Steinberger. "Multilingual Statistical News Summarization." In Multi-source, Multilingual Information Extraction and Summarization, 229–52. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28569-1_11.

Thurmair, Gregor. "Multiword expressions in multilingual information extraction." In Multiword Units in Machine Translation and Translation Technology, 104–23. Amsterdam: John Benjamins Publishing Company, 2018. http://dx.doi.org/10.1075/cilt.341.05thu.

Dini, Luca. "Parallel Information Extraction System for Multilingual Information Access." In Advances in Intelligent Systems, 179–90. Dordrecht: Springer Netherlands, 1999. http://dx.doi.org/10.1007/978-94-011-4840-5_16.

Piskorski, Jakub, and Roman Yangarber. "Information Extraction: Past, Present and Future." In Multi-source, Multilingual Information Extraction and Summarization, 23–49. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28569-1_2.

Ribeiro, Ricardo, and David Martins de Matos. "Improving Speech-to-Text Summarization by Using Additional Information Sources." In Multi-source, Multilingual Information Extraction and Summarization, 277–97. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28569-1_13.

Ji, Heng, Benoit Favre, Wen-Pin Lin, Dan Gillick, Dilek Hakkani-Tur, and Ralph Grishman. "Open-Domain Multi-Document Summarization via Information Extraction: Challenges and Prospects." In Multi-source, Multilingual Information Extraction and Summarization, 177–201. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-28569-1_9.

Li, Fang, Huanye Sheng, Dongmo Zhang, and Tianfang Yao. "An Internet Based Multilingual Investment Information Extraction System." In The Internet Challenge: Technology and Applications, 1–9. Dordrecht: Springer Netherlands, 2002. http://dx.doi.org/10.1007/978-94-010-0494-7_1.

Conference papers on the topic "Multilingual information extraction"

Sanjaya, Hafidz, Kusrini Kusrini, Kumara Ari Yuana, and José Ramón Martínez Salio. "Multilingual Named Entity Recognition Model for Location and Time Extraction of Forest Fire." In 2024 4th International Conference of Science and Information Technology in Smart Administration (ICSINTESA), 611–15. IEEE, 2024. http://dx.doi.org/10.1109/icsintesa62455.2024.10747844.

Yuan, Yue, and Huaping Zhang. "An Improved Topic Extraction Method Based on Word Frequency Information Entropy for Multilingual Topic Attentional Division." In 2024 9th International Conference on Intelligent Computing and Signal Processing (ICSP), 675–81. IEEE, 2024. http://dx.doi.org/10.1109/icsp62122.2024.10743506.

Wiedemann, Gregor, Seid Muhie Yimam, and Chris Biemann. "A Multilingual Information Extraction Pipeline for Investigative Journalism." In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Stroudsburg, PA, USA: Association for Computational Linguistics, 2018. http://dx.doi.org/10.18653/v1/d18-2014.

Kotnis, Bhushan, Kiril Gashteovski, Daniel Rubio, Ammar Shaker, Vanesa Rodriguez-Tembras, Makoto Takamoto, Mathias Niepert, and Carolin Lawrence. "MILIE: Modular & Iterative Multilingual Open Information Extraction." In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.acl-long.478.

Vijayan, Karthika, and Oshin Anand. "Language-Agnostic Text Processing for Information Extraction." In 12th International Conference on Artificial Intelligence, Soft Computing and Applications. Academy and Industry Research Collaboration Center (AIRCC), 2022. http://dx.doi.org/10.5121/csit.2022.122310.

Abstract:

Information extraction from multilingual text for conversational AI generally implements natural language understanding (NLU) using multiple language-specific models, which may not be available for low-resource languages or code-mixed scenarios. In this paper, we study the implementation of multilingual NLU through the development of a language-agnostic processing pipeline. We perform this study using the case of a conversational assistant built with the RASA framework. Automatic assistants for answering text queries are built in different languages and for code-mixed input, and along the way we experiment with different components of the NLU pipeline. Sparse and dense feature extraction provides the language-agnostic composite featurization of text in the pipeline. We perform experiments with intent classification and entity extraction as part of information extraction. The efficacy of the language-agnostic NLU pipeline is showcased (i) when dedicated language models are not available for all languages of interest, and (ii) in the case of code mixing. Our experiments delivered intent classification accuracies of 98.49%, 96.41% and 97.98% for the same queries in English, Hindi and Malayalam, respectively, without any dedicated language models.
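
The abstract does not say which featurizers the authors used. As an illustration only, one common language-agnostic sparse representation is a bag of character n-grams, which needs no tokenizer or language model for any particular language. The sketch below is a minimal pure-Python version under that assumption; the function names `char_ngrams`, `sparse_features` and `cosine` are hypothetical and are not part of RASA or of the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return character n-grams taken inside whitespace-delimited words.

    Each word is padded with one space on both sides so that word
    boundaries are visible in the n-grams. No language-specific
    resources are used (assumption: whitespace separates words).
    """
    grams = []
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                grams.append(padded[i:i + n])
    return grams


def sparse_features(text, n_min=2, n_max=4):
    """Map a text to a sparse count vector (n-gram -> frequency)."""
    return Counter(char_ngrams(text, n_min, n_max))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    english = sparse_features("book ticket")
    code_mixed = sparse_features("ticket book karo")
    # Shared character n-grams give the two queries a non-zero similarity
    # even though no language-specific model was involved.
    print(cosine(english, code_mixed))
```

In a pipeline of the kind the abstract describes, vectors like these would be combined with dense features and passed to an intent classifier; that combination step is not shown here.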

Aone, Chinatsu, Nicholas Charocopos, and James Gorlinsky. "An intelligent multilingual information browsing and retrieval system using information extraction." In the fifth conference. Morristown, NJ, USA: Association for Computational Linguistics, 1997. http://dx.doi.org/10.3115/974557.974606.

Maynard, Diana, and Hamish Cunningham. "Multilingual adaptations of ANNIE, a reusable information extraction tool." In the tenth conference. Morristown, NJ, USA: Association for Computational Linguistics, 2003. http://dx.doi.org/10.3115/1067737.1067789.

Kolluru, Keshav, Muqeeth Mohammed, Shubham Mittal, Soumen Chakrabarti, and Mausam. "Alignment-Augmented Consistent Translation for Multilingual Open Information Extraction." In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.acl-long.179.

Bretschneider, Claudia, Heiner Oberkampf, Sonja Zillner, Bernhard Bauer, and Matthias Hammon. "Corpus-based Translation of Ontologies for Improved Multilingual Semantic Annotation." In Proceedings of the Third Workshop on Semantic Web and Information Extraction. Stroudsburg, PA, USA: Association for Computational Linguistics and Dublin City University, 2014. http://dx.doi.org/10.3115/v1/w14-6201.

Nguyen, Minh Van, Nghia Ngo, Bonan Min, and Thien Nguyen. "FAMIE: A Fast Active Learning Framework for Multilingual Information Extraction." In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.naacl-demo.14.
