Academic literature on the topic 'Cross-Lingual knowledge transfer'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Cross-Lingual knowledge transfer.'

Where the metadata provides them, you can also download the full text of a publication as a PDF and read its abstract online.

Journal articles on the topic "Cross-Lingual knowledge transfer"

1. Wang, Yabing, Fan Wang, Jianfeng Dong, and Hao Luo. "CL2CM: Improving Cross-Lingual Cross-Modal Retrieval via Cross-Lingual Knowledge Transfer." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 6 (March 24, 2024): 5651–59. http://dx.doi.org/10.1609/aaai.v38i6.28376.

Abstract:
Cross-lingual cross-modal retrieval has garnered increasing attention recently, which aims to achieve the alignment between vision and target language (V-T) without using any annotated V-T data pairs. Current methods employ machine translation (MT) to construct pseudo-parallel data pairs, which are then used to learn a multi-lingual and multi-modal embedding space that aligns visual and target-language representations. However, the large heterogeneous gap between vision and text, along with the noise present in target language translations, poses significant challenges in effectively aligning their representations. To address these challenges, we propose a general framework, Cross-Lingual to Cross-Modal (CL2CM), which improves the alignment between vision and target language using cross-lingual transfer. This approach allows us to fully leverage the merits of multi-lingual pre-trained models (e.g., mBERT) and the benefits of the same modality structure, i.e., smaller gap, to provide reliable and comprehensive semantic correspondence (knowledge) for the cross-modal network. We evaluate our proposed approach on two multilingual image-text datasets, Multi30K and MSCOCO, and one video-text dataset, VATEX. The results clearly demonstrate the effectiveness of our proposed method and its high potential for large-scale retrieval.
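
The framework builds on contrastive alignment between paired embeddings. As a rough illustration of the kind of objective such systems optimize (not the authors' exact loss; the encoders are replaced by random placeholder embeddings, and the batch size, dimensions, and temperature are invented), here is a symmetric InfoNCE sketch in PyTorch:

    import torch
    import torch.nn.functional as F

    def info_nce(a, b, temperature=0.07):
        # Symmetric InfoNCE: row i of `a` should match row i of `b`
        # and mismatch every other row in the batch.
        a = F.normalize(a, dim=-1)
        b = F.normalize(b, dim=-1)
        logits = a @ b.t() / temperature
        targets = torch.arange(a.size(0))
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2

    # Placeholder outputs of a vision encoder and a multilingual text encoder
    # (e.g., mBERT) run on machine-translated pseudo-parallel captions.
    vision_emb = torch.randn(8, 512, requires_grad=True)
    text_emb = torch.randn(8, 512, requires_grad=True)
    info_nce(vision_emb, text_emb).backward()

In CL2CM, the comparatively clean cross-lingual (text-to-text) alignment then serves as additional supervision for the noisier cross-modal alignment.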
2. Singhal, Abhishek, Happa Khan, and Aditya Sharma. "Empowering Multilingual AI: Cross-Lingual Transfer Learning." Tuijin Jishu/Journal of Propulsion Technology 43, no. 4 (November 26, 2023): 284–87. http://dx.doi.org/10.52783/tjjpt.v43.i4.2353.

Abstract:
Multilingual Natural Language Processing (NLP) and Cross-Lingual Transfer Learning have emerged as pivotal fields in the realm of language technology. This abstract explores the essential concepts and methodologies behind these areas, shedding light on their significance in a world characterized by linguistic diversity. Multilingual NLP enables machines to process many languages, supporting global collaboration. Cross-lingual transfer learning, on the other hand, leverages knowledge from one language to enhance NLP tasks in another, facilitating efficient resource utilization and improved model performance. The abstract highlights the growing relevance of these approaches in a multilingual and interconnected world, underscoring their potential to reshape the future of natural language understanding and communication.
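
The standard recipe behind such transfer is simple: fine-tune a multilingual encoder on labeled data in a high-resource language, then run it unchanged on the target language. A minimal sketch follows; the model choice, the two-class sentiment task, and the example sentences are all assumptions for illustration:

    # Fine-tune a multilingual encoder on English labels, then apply it
    # zero-shot to another language that shares the same embedding space.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=2)

    # One English training step (a real run would loop over a labeled dataset).
    batch = tok(["great movie", "terrible movie"], return_tensors="pt", padding=True)
    model(**batch, labels=torch.tensor([1, 0])).loss.backward()

    # Zero-shot inference on the target language (here German); no target labels used.
    de = tok(["ein großartiger Film"], return_tensors="pt")
    prediction = model(**de).logits.argmax(-1)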
3. Zhang, Mozhi, Yoshinari Fujinuma, and Jordan Boyd-Graber. "Exploiting Cross-Lingual Subword Similarities in Low-Resource Document Classification." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 9547–54. http://dx.doi.org/10.1609/aaai.v34i05.6500.

Abstract:
Text classification must sometimes be applied in a low-resource language with no labeled training data. However, training data may be available in a related language. We investigate whether character-level knowledge transfer from a related language helps text classification. We present a cross-lingual document classification framework (CACO) that exploits cross-lingual subword similarity by jointly training a character-based embedder and a word-based classifier. The embedder derives vector representations for input words from their written forms, and the classifier makes predictions based on the word vectors. We use a joint character representation for both the source language and the target language, which allows the embedder to generalize knowledge about source language words to target language words with similar forms. We propose a multi-task objective that can further improve the model if additional cross-lingual or monolingual resources are available. Experiments confirm that character-level knowledge transfer is more data-efficient than word-level transfer between related languages.
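
A minimal sketch of the paper's central component, a character-based embedder feeding a word-based classifier. This shows the general shape of the idea, not the exact CACO architecture; all layer sizes are invented:

    import torch
    import torch.nn as nn

    class CharWordEmbedder(nn.Module):
        # Derives a word vector from the word's spelling, so related languages
        # that share subword patterns (e.g., cognates) share parameters.
        def __init__(self, n_chars=128, char_dim=32, word_dim=64):
            super().__init__()
            self.chars = nn.Embedding(n_chars, char_dim)
            self.lstm = nn.LSTM(char_dim, word_dim, batch_first=True)

        def forward(self, char_ids):        # shape: (batch, word_length)
            _, (h, _) = self.lstm(self.chars(char_ids))
            return h[-1]                    # shape: (batch, word_dim)

    embedder = CharWordEmbedder()
    classifier = nn.Linear(64, 2)           # word-vector-based classifier head

    word = torch.tensor([[ord(c) for c in "transfer"]])   # ASCII ids as char ids
    logits = classifier(embedder(word))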
4. Colhon, Mihaela. "Language engineering for syntactic knowledge transfer." Computer Science and Information Systems 9, no. 3 (2012): 1231–47. http://dx.doi.org/10.2298/csis120130032c.

Abstract:
In this paper we present a method for constructing an English-Romanian treebank, together with the obtained evaluation results. The treebank is built upon a parallel English-Romanian corpus, word-aligned and annotated at the morphological and syntactic levels. The syntactic trees of the Romanian texts are generated from the syntactic phrases of the parallel English texts, which are obtained automatically through syntactic parsing. The method reuses and adjusts existing tools and algorithms for cross-lingual transfer of syntactic constituents and syntactic tree alignment.
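
The transfer mechanism underneath such treebank construction is annotation projection through word alignments. A toy sketch, with part-of-speech labels standing in for the paper's richer syntactic structures (the sentences, alignment, and tags are invented):

    # Copy source-side (English) labels to the aligned target (Romanian) tokens.
    en_tokens = ["the", "red", "car"]
    ro_tokens = ["mașina", "roșie"]
    en_tags = ["DET", "ADJ", "NOUN"]
    alignment = {0: None, 1: 1, 2: 0}   # en index -> ro index; "the" is unaligned

    ro_tags = [None] * len(ro_tokens)
    for en_i, ro_i in alignment.items():
        if ro_i is not None:
            ro_tags[ro_i] = en_tags[en_i]
    print(list(zip(ro_tokens, ro_tags)))  # [('mașina', 'NOUN'), ('roșie', 'ADJ')]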
5. Zhan, Qingran, Xiang Xie, Chenguang Hu, Juan Zuluaga-Gomez, Jing Wang, and Haobo Cheng. "Domain-Adversarial Based Model with Phonological Knowledge for Cross-Lingual Speech Recognition." Electronics 10, no. 24 (December 20, 2021): 3172. http://dx.doi.org/10.3390/electronics10243172.

Abstract:
Phonological-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) to extract reliable AFs, and different multi-stream techniques are used for cross-lingual speech recognition. First, a novel universal definition of phonological attributes is proposed for Mandarin, English, German and French. Then a DANN-based AFs detector is trained using the source languages (English, German and French). For cross-lingual speech recognition, the AFs detectors are used to transfer phonological knowledge from the source languages (English, German and French) to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features and cross-lingual AFs. In addition, the monolingual AFs system (i.e., AFs extracted directly from the target language) is also investigated. Experiments show that the performance of the AFs detector can be improved by using convolutional neural networks (CNN) with a domain-adversarial learning method. The multi-head attention (MHA) based multi-stream reaches the best performance compared to the baseline, the cross-lingual adaptation approach, and other approaches. More specifically, the MHA mode with cross-lingual AFs yields significant improvements over monolingual AFs when training data are limited, and can easily be extended to other low-resource languages.
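
The domain-adversarial part hinges on a gradient reversal layer: the feature extractor is rewarded for fooling a language classifier, which pushes it toward language-independent articulatory features. A minimal PyTorch sketch (the feature sizes, number of attributes, and number of languages are invented):

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        # Identity on the forward pass; flips the gradient sign going backward,
        # so features are trained to be indistinguishable across languages.
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad):
            return -ctx.lam * grad, None

    features = nn.Sequential(nn.Linear(40, 64), nn.ReLU())  # acoustic frames -> features
    af_head = nn.Linear(64, 20)    # articulatory-feature detector (trained normally)
    lang_head = nn.Linear(64, 4)   # language classifier (trained adversarially)

    x = torch.randn(8, 40)
    h = features(x)
    af_logits = af_head(h)
    lang_logits = lang_head(GradReverse.apply(h, 1.0))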
6. Xu, Zenan, Linjun Shou, Jian Pei, Ming Gong, Qinliang Su, Xiaojun Quan, and Daxin Jiang. "A Graph Fusion Approach for Cross-Lingual Machine Reading Comprehension." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 13861–68. http://dx.doi.org/10.1609/aaai.v37i11.26623.

Abstract:
Although great progress has been made for Machine Reading Comprehension (MRC) in English, scaling out to a large number of languages remains a huge challenge due to the lack of large amounts of annotated training data in non-English languages. To address this challenge, some recent efforts of cross-lingual MRC employ machine translation to transfer knowledge from English to other languages, through either explicit alignment or implicit attention. For effective knowledge transition, it is beneficial to leverage both semantic and syntactic information. However, the existing methods fail to explicitly incorporate syntax information in model learning. Consequently, the models are not robust to errors in alignment and noises in attention. In this work, we propose a novel approach, which jointly models the cross-lingual alignment information and the mono-lingual syntax information using a graph. We develop a series of algorithms, including graph construction, learning, and pre-training. The experiments on two benchmark datasets for cross-lingual MRC show that our approach outperforms all strong baselines, which verifies the effectiveness of syntax information for cross-lingual MRC.
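
A toy illustration of the graph-fusion idea: cross-lingual alignment edges and monolingual dependency edges are combined into one graph over a sentence pair. The sentences, alignments, and dependencies below are invented, and the paper's actual construction and learning algorithms are considerably richer:

    # Nodes are (language, token-index) pairs; the fused edge set mixes
    # cross-lingual alignment edges with monolingual syntax edges.
    en = ["what", "is", "the", "capital"]
    de = ["was", "ist", "die", "Hauptstadt"]
    alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]
    en_deps = [(1, 0), (3, 2)]   # head -> dependent
    de_deps = [(1, 0), (3, 2)]

    edges = ([(("en", i), ("de", j)) for i, j in alignment]
             + [(("en", h), ("en", d)) for h, d in en_deps]
             + [(("de", h), ("de", d)) for h, d in de_deps])
    # A graph neural network over `edges` would then propagate evidence
    # between languages and along syntax for answer-span scoring.
    print(len(edges))  # 8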
7. Rijhwani, Shruti, Jiateng Xie, Graham Neubig, and Jaime Carbonell. "Zero-Shot Neural Transfer for Cross-Lingual Entity Linking." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6924–31. http://dx.doi.org/10.1609/aaai.v33i01.33016924.

Abstract:
Cross-lingual entity linking maps an entity mention in a source language to its corresponding entry in a structured knowledge base that is in a different (target) language. While previous work relies heavily on bilingual lexical resources to bridge the gap between the source and the target languages, these resources are scarce or unavailable for many low-resource languages. To address this problem, we investigate zero-shot cross-lingual entity linking, in which we assume no bilingual lexical resources are available in the source low-resource language. Specifically, we propose pivot-based entity linking, which leverages information from a high-resource “pivot” language to train character-level neural entity linking models that are transferred to the source low-resource language in a zero-shot manner. With experiments on 9 low-resource languages and transfer through a total of 54 languages, we show that our proposed pivot-based framework improves entity linking accuracy by 17% (absolute) on average over the baseline systems for the zero-shot scenario. Further, we also investigate the use of language-universal phonological representations, which improves average accuracy (absolute) by 36% when transferring between languages that use different scripts.
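
A sketch of the character-level property that makes pivoting possible: because the linker reads spellings rather than language-specific word ids, a scorer trained on pivot-language (mention, entity) pairs can be reused unchanged on the low-resource source language. The toy model below is untrained, and its architecture and sizes are invented:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CharEncoder(nn.Module):
        # Shared character-level encoder for mentions and KB entity names.
        def __init__(self, n_chars=128, dim=64):
            super().__init__()
            self.emb = nn.Embedding(n_chars, dim)
            self.gru = nn.GRU(dim, dim, batch_first=True)

        def forward(self, s):
            ids = torch.tensor([[ord(c) % 128 for c in s]])  # crude char ids
            _, h = self.gru(self.emb(ids))
            return h[-1].squeeze(0)

    enc = CharEncoder()

    def link_score(mention, entity_name):
        return F.cosine_similarity(enc(mention), enc(entity_name), dim=0)

    print(link_score("Lisboa", "Lisbon"))  # meaningful only after pivot training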
8. Bari, M. Saiful, Shafiq Joty, and Prathyusha Jwalapuram. "Zero-Resource Cross-Lingual Named Entity Recognition." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7415–23. http://dx.doi.org/10.1609/aaai.v34i05.6237.

Abstract:
Recently, neural methods have achieved state-of-the-art (SOTA) results in Named Entity Recognition (NER) tasks for many languages without the need for manually crafted features. However, these models still require manually annotated training data, which is not available for many languages. In this paper, we propose an unsupervised cross-lingual NER model that can transfer NER knowledge from one language to another in a completely unsupervised way without relying on any bilingual dictionary or parallel data. Our model achieves this through word-level adversarial learning and augmented fine-tuning with parameter sharing and feature augmentation. Experiments on five different languages demonstrate the effectiveness of our approach, outperforming existing models by a good margin and setting a new SOTA for each language pair.
9. Qi, Kunxun, and Jianfeng Du. "Translation-Based Matching Adversarial Network for Cross-Lingual Natural Language Inference." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8632–39. http://dx.doi.org/10.1609/aaai.v34i05.6387.

Abstract:
Cross-lingual natural language inference is a fundamental task in cross-lingual natural language understanding, widely addressed by neural models recently. Existing neural model based methods either align sentence embeddings between source and target languages, heavily relying on annotated parallel corpora, or exploit pre-trained cross-lingual language models that are fine-tuned on a single language and struggle to transfer knowledge to another language. To resolve these limitations in existing methods, this paper proposes an adversarial training framework to enhance both pre-trained models and classical neural models for cross-lingual natural language inference. It trains on the union of data in the source language and data in the target language, learning language-invariant features to improve the inference performance. Experimental results on the XNLI benchmark demonstrate that three popular neural models enhanced by the proposed framework significantly outperform the original models.
10. Zhang, Weizhao, and Hongwu Yang. "Meta-Learning for Mandarin-Tibetan Cross-Lingual Speech Synthesis." Applied Sciences 12, no. 23 (November 28, 2022): 12185. http://dx.doi.org/10.3390/app122312185.

Abstract:
The paper proposes a meta-learning-based Mandarin-Tibetan cross-lingual text-to-speech (TTS) to realize both Mandarin and Tibetan speech synthesis within a single framework. First, we build two kinds of Tacotron2-based Mandarin-Tibetan cross-lingual baseline TTS. One is a shared encoder Mandarin-Tibetan cross-lingual TTS, and the other is a separate encoder Mandarin-Tibetan cross-lingual TTS. Both baseline TTS use the speaker classifier with a gradient reversal layer to disentangle speaker-specific information from the text encoder. At the same time, we design a prosody generator to extract prosodic information from sentences to explore syntactic and semantic information adequately. To further improve the synthesized speech quality of the Tacotron2-based Mandarin-Tibetan cross-lingual TTS, we propose a meta-learning-based Mandarin-Tibetan cross-lingual TTS. Based on the separate encoder Mandarin-Tibetan cross-lingual TTS, we use an additional dynamic network to predict the parameters of the language-dependent text encoder, which realizes better cross-lingual knowledge sharing in the sequence-to-sequence TTS. Lastly, we synthesize Mandarin or Tibetan speech through a single acoustic model. The baseline experimental results show that the separate encoder Mandarin-Tibetan cross-lingual TTS could handle the input of different languages better than the shared encoder Mandarin-Tibetan cross-lingual TTS. The experimental results further show that the proposed meta-learning-based Mandarin-Tibetan cross-lingual speech synthesis method could effectively improve the voice quality of synthesized speech in terms of naturalness and speaker similarity.
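
The "additional dynamic network" is, in spirit, a small hypernetwork: it maps a language embedding to the parameters of the language-dependent text encoder, so the two languages share knowledge through the parameter predictor instead of through duplicated encoders. A deliberately tiny sketch, in which a single linear layer stands in for the Tacotron2 text encoder and all sizes are invented:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    IN_DIM, OUT_DIM, LANG_DIM = 32, 32, 8
    lang_emb = nn.Embedding(2, LANG_DIM)                     # 0 = Mandarin, 1 = Tibetan
    hyper = nn.Linear(LANG_DIM, OUT_DIM * IN_DIM + OUT_DIM)  # predicts W and b

    def encode(x, lang_id):
        p = hyper(lang_emb(torch.tensor(lang_id)))
        W = p[:OUT_DIM * IN_DIM].view(OUT_DIM, IN_DIM)       # predicted weights
        b = p[OUT_DIM * IN_DIM:]                             # predicted bias
        return F.linear(x, W, b)

    phoneme_feats = torch.randn(1, IN_DIM)
    mandarin_h = encode(phoneme_feats, 0)
    tibetan_h = encode(phoneme_feats, 1)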

Dissertations / Theses on the topic "Cross-Lingual knowledge transfer"

1. Aufrant, Lauriane. "Training parsers for low-resourced languages : improving cross-lingual transfer with monolingual knowledge." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS089/document.

Abstract:
As a result of the recent blossoming of Machine Learning techniques, the Natural Language Processing field faces an increasingly thorny bottleneck: the most efficient algorithms entirely rely on the availability of large training data. These technological advances remain consequently unavailable for the 7,000 languages in the world, out of which most are low-resourced. One way to bypass this limitation is the approach of cross-lingual transfer, whereby resources available in another (source) language are leveraged to help build accurate systems in the desired (target) language. However, despite promising results in research settings, the standard transfer techniques lack the flexibility regarding cross-lingual resources needed to be fully usable in real-world scenarios: exploiting very sparse resources, or assorted arrays of resources. This limitation strongly diminishes the applicability of that approach. This thesis consequently proposes to combine multiple sources and resources for transfer, with an emphasis on selectivity: can we estimate which resource of which language is useful for which input? This strategy is put into practice in the frame of transition-based dependency parsing. To this end, a new transfer framework is designed, with a cascading architecture: it enables the desired combination, while ensuring better targeted exploitation of each resource, down to the level of the word. Empirical evaluation indeed dampens the enthusiasm for the purely cross-lingual approach -- it remains in general preferable to annotate just a few target sentences -- but also highlights its complementarity with other approaches. Several metrics are developed to characterize precisely cross-lingual similarities, syntactic idiosyncrasies, and the added value of cross-lingual information compared to monolingual training. The substantial benefits of typological knowledge are also explored. The whole study relies on a series of technical improvements regarding the parsing framework: this work includes the release of a new open-source software package, PanParser, which revisits the so-called dynamic oracles to extend their use cases. Several purely monolingual contributions complete this work, including an exploration of monolingual cascading, which offers promising perspectives with easy-then-hard strategies.
2. Raithel, Lisa. "Cross-lingual Information Extraction for the Assessment and Prevention of Adverse Drug Reactions." Electronic thesis or dissertation, Université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG011.

Abstract:
The work described in this thesis deals with the cross- and multi-lingual detection and extraction of adverse drug reactions (ADRs) in biomedical texts written by laypeople. This includes the design and creation of a multi-lingual corpus, exploring ways to collect data without harming users' privacy, and investigating whether cross-lingual data can mitigate class imbalance in document classification. It further addresses the question of whether zero-shot and cross-lingual learning can be successful in medical entity detection across languages. I describe the creation of a new tri-lingual corpus (German, French, Japanese) focusing on German and French, including the development of annotation guidelines applicable to any language and oriented towards user-generated texts. I further describe the annotation process and give an overview of the resulting dataset. The data is provided with annotations on four levels: the document level, describing whether a text contains ADRs; the entity level, capturing relevant expressions; the attribute level, further specifying these expressions; and the relation level, extracting information on how the aforementioned entities interact. I then discuss the topic of user privacy in data about health-related issues and the question of how to collect such data for research purposes without harming the person's privacy. I provide a prototype study of how users react when they are directly asked about their experiences with ADRs. The study reveals that most people do not mind describing their experiences if asked, but that data collection might suffer from too many questions in the questionnaire. Next, I analyze the results of a potential second way of collecting social media data: the synthetic generation of pseudo-tweets based on real Twitter messages. In the analysis, I focus on the challenges this approach entails and find, despite some preliminary cleaning, that there are still problems in the translations, both with respect to the meaning of the text and the annotated labels. I therefore give anecdotal examples of what can go wrong during automatic translation, summarize the lessons learned, and present potential steps for improvement. Subsequently, I present experimental results for cross-lingual document classification with respect to ADRs in English and German. For this, I fine-tuned classification models on different dataset configurations, first on English and then on German documents, a task complicated by the strong label imbalance of either language's dataset. I find that incorporating English training data helps in the classification of relevant documents in German, but that it is not enough to mitigate the natural imbalance of document labels efficiently. Nevertheless, the developed models seem promising and might be particularly useful for collecting more texts describing experiences with side effects, to extend the current corpus and improve the detection of relevant documents for other languages. Next, I describe my participation in the n2c2 2022 shared task on medication detection, which is then extended from English to German, French and Spanish using datasets from different sub-domains based on different annotation guidelines. I show that the multi- and cross-lingual transfer works well but also strongly depends on the annotation types and definitions. After that, I re-use the discussed models to show some preliminary results on the presented corpus, first only on medication detection and then across all the annotated entity types. I find that medication detection shows promising results, especially considering that the models were fine-tuned on data from another sub-domain and applied in a zero-shot fashion to the new data. Regarding the detection of other medical expressions, I find that the performance of the models strongly depends on the entity type, and I propose ways to handle this. Lastly, the presented work is summarized and future steps are discussed.

Book chapters on the topic "Cross-Lingual knowledge transfer"

1. Gui, Lin, Qin Lu, Ruifeng Xu, Qikang Wei, and Yuhui Cao. "Improving Transfer Learning in Cross Lingual Opinion Analysis Through Negative Transfer Detection." In Knowledge Science, Engineering and Management, 394–406. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-25159-2_36.

2. Tian, Lin, Xiuzhen Zhang, and Jey Han Lau. "Rumour Detection via Zero-Shot Cross-Lingual Transfer Learning." In Machine Learning and Knowledge Discovery in Databases. Research Track, 603–18. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-86486-6_37.

3. Han, Soyeon Caren, Yingru Lin, Siqu Long, and Josiah Poon. "Low Resource Named Entity Recognition Using Contextual Word Representation and Neural Cross-Lingual Knowledge Transfer." In Neural Information Processing, 299–311. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-36708-4_25.

4. Daou, Ousmane, Satya Ranjan Dash, and Shantipriya Parida. "Cross-Lingual Transfer Learning for Bambara Leveraging Resources From Other Languages." In Advances in Computational Intelligence and Robotics, 183–97. IGI Global, 2024. http://dx.doi.org/10.4018/979-8-3693-0728-1.ch009.

Abstract:
Bambara, a language spoken primarily in West Africa, faces resource limitations that hinder the development of natural language processing (NLP) applications. This chapter presents a comprehensive cross-lingual transfer learning (CTL) approach to harness knowledge from other languages and substantially improve the performance of Bambara NLP tasks. The authors meticulously outline the methodology, including the creation of a Bambara corpus, training a CTL classifier, evaluating its performance across different languages, conducting a rigorous comparative analysis against baseline methods, and providing insights into future research directions. The results indicate that CTL is a promising and feasible approach to elevate the effectiveness of NLP tasks in Bambara.

Conference papers on the topic "Cross-Lingual knowledge transfer"

1. Swietojanski, Pawel, Arnab Ghoshal, and Steve Renals. "Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR." In 2012 IEEE Spoken Language Technology Workshop (SLT 2012). IEEE, 2012. http://dx.doi.org/10.1109/slt.2012.6424230.

2. Lu, Di, Xiaoman Pan, Nima Pourdamghani, Shih-Fu Chang, Heng Ji, and Kevin Knight. "A Multi-media Approach to Cross-lingual Entity Knowledge Transfer." In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2016. http://dx.doi.org/10.18653/v1/p16-1006.

3. Singh, Sumit, and Uma Tiwary. "Silp_nlp at SemEval-2023 Task 2: Cross-lingual Knowledge Transfer for Mono-lingual Learning." In Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023). Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.semeval-1.164.

4. Feng, Xiaocheng, Xiachong Feng, Bing Qin, Zhangyin Feng, and Ting Liu. "Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer." In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/566.

Abstract:
Neural networks have been widely used for high-resource language (e.g., English) named entity recognition (NER) and have shown state-of-the-art results. However, for low-resource languages such as Dutch and Spanish, due to the limitation of resources and lack of annotated data, taggers tend to have lower performance. To narrow this gap, we propose three novel strategies to enrich the semantic representations of low-resource languages: we first develop neural networks to improve low-resource word representations by knowledge transfer from a high-resource language using bilingual lexicons. Further, a lexicon extension strategy is designed to address the out-of-lexicon problem by automatically learning semantic projections. Thirdly, we regard word-level entity type distribution features as external language-independent knowledge and incorporate them into our neural architecture. Experiments on two low-resource languages (Dutch and Spanish) demonstrate the effectiveness of these additional semantic representations (average 4.8% improvement). Moreover, on the Chinese OntoNotes 4.0 dataset, our approach achieved an F-score of 83.07% with a 2.91% absolute gain compared to the state-of-the-art results.
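
The first strategy, transferring word representations through a bilingual lexicon, is commonly realized as a learned linear map between embedding spaces. A minimal sketch using the orthogonal Procrustes solution (the embeddings and lexicon here are random placeholders, and the paper's full method adds lexicon extension and type-distribution features on top):

    import numpy as np

    rng = np.random.default_rng(0)
    src = rng.standard_normal((1000, 300))   # low-resource embeddings
    tgt = rng.standard_normal((1000, 300))   # high-resource embeddings
    lexicon = [(i, i) for i in range(500)]   # index pairs from a bilingual dictionary

    X = src[[i for i, _ in lexicon]]
    Y = tgt[[j for _, j in lexicon]]
    U, _, Vt = np.linalg.svd(X.T @ Y)
    W = U @ Vt                  # argmin_W ||XW - Y||_F with W orthogonal
    projected = src @ W         # all source words mapped into the shared space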
5. Cao, Yuwei, William Groves, Tanay Kumar Saha, Joel Tetreault, Alejandro Jaimes, Hao Peng, and Philip Yu. "XLTime: A Cross-Lingual Knowledge Transfer Framework for Temporal Expression Extraction." In Findings of the Association for Computational Linguistics: NAACL 2022. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.findings-naacl.148.

6. Limkonchotiwat, Peerat, Wuttikorn Ponwitayarat, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, and Sarana Nutanong. "CL-ReLKT: Cross-lingual Language Knowledge Transfer for Multilingual Retrieval Question Answering." In Findings of the Association for Computational Linguistics: NAACL 2022. Stroudsburg, PA, USA: Association for Computational Linguistics, 2022. http://dx.doi.org/10.18653/v1/2022.findings-naacl.165.

7. Jin, Huiming, and Katharina Kann. "Exploring Cross-Lingual Transfer of Morphological Knowledge In Sequence-to-Sequence Models." In Proceedings of the First Workshop on Subword and Character Level Models in NLP. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. http://dx.doi.org/10.18653/v1/w17-4110.

8. Zhou, Yucheng, Xiubo Geng, Tao Shen, Wenqiang Zhang, and Daxin Jiang. "Improving Zero-Shot Cross-lingual Transfer for Multilingual Question Answering over Knowledge Graph." In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg, PA, USA: Association for Computational Linguistics, 2021. http://dx.doi.org/10.18653/v1/2021.naacl-main.465.

9. Fukuda, Takashi, and Samuel Thomas. "Knowledge Distillation Based Training of Universal ASR Source Models for Cross-Lingual Transfer." In Interspeech 2021. ISCA: ISCA, 2021. http://dx.doi.org/10.21437/interspeech.2021-796.

10. Guzman Nateras, Luis, Franck Dernoncourt, and Thien Nguyen. "Hybrid Knowledge Transfer for Improved Cross-Lingual Event Detection via Hierarchical Sample Selection." In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.acl-long.296.
