Journal articles on the topic "Cross-Lingual Mapping"

Author: Grafiati

Published: March 9, 2023

Updated: March 10, 2023

The 30 top journal articles for research on the topic "Cross-Lingual Mapping".

1

Fu, Zuohui, Yikun Xian, Shijie Geng, Yingqiang Ge, Yuting Wang, Xin Dong, Guang Wang, and Gerard De Melo. "ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7756–63. http://dx.doi.org/10.1609/aaai.v34i05.6279.

Abstract:

A number of cross-lingual transfer learning approaches based on neural networks have been proposed for the case when large amounts of parallel text are at our disposal. However, in many real-world settings, the size of parallel annotated training data is restricted. Additionally, prior cross-lingual mapping research has mainly focused on the word level. This raises the question of whether such techniques can also be applied to effortlessly obtain cross-lingually aligned sentence representations. To this end, we propose an Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel data. The experiments show that our method outperforms several technically more powerful approaches, especially under challenging low-resource circumstances. The source code is available from https://github.com/zuohuif/ABSent along with relevant datasets.

2

Gao, Jiahui, Yi Zhou, Philip L. H. Yu, Shafiq Joty, and Jiuxiang Gu. "UNISON: Unpaired Cross-Lingual Image Captioning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10654–62. http://dx.doi.org/10.1609/aaai.v36i10.21310.

Abstract:

Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner. However, creating such paired datasets for every target language is prohibitively expensive, which hinders the extensibility of captioning technology and deprives a large part of the world population of its benefit. In this work, we present a novel unpaired cross-lingual method to generate image captions without relying on any caption corpus in the source or the target language. Specifically, our method consists of two phases: (1) a cross-lingual auto-encoding process, which utilizes a sentence parallel (bitext) corpus to learn the mapping from the source to the target language in the scene graph encoding space and decode sentences in the target language, and (2) a cross-modal unsupervised feature mapping, which seeks to map the encoded scene graph features from image modality to language modality. We verify the effectiveness of our proposed method on the Chinese image caption generation task. The comparisons against several existing methods demonstrate the effectiveness of our approach.

3

Li, Juntao, Chang Liu, Jian Wang, Lidong Bing, Hongsong Li, Xiaozhong Liu, Dongyan Zhao, and Rui Yan. "Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8212–19. http://dx.doi.org/10.1609/aaai.v34i05.6335.

Abstract:

With the prosperity of cross-border e-commerce, there is an urgent demand for designing intelligent approaches for assisting e-commerce sellers to offer local products for consumers from all over the world. In this paper, we explore a new task of cross-lingual information retrieval, i.e., cross-lingual set-to-description retrieval in cross-border e-commerce, which involves matching product attribute sets in the source language with persuasive product descriptions in the target language. We manually collect a new and high-quality paired dataset, where each pair contains an unordered product attribute set in the source language and an informative product description in the target language. As the dataset construction process is both time-consuming and costly, the new dataset comprises only 13.5k pairs, which is a low-resource setting and can be viewed as a challenging testbed for model development and evaluation in cross-border e-commerce. To tackle this cross-lingual set-to-description retrieval task, we propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping upon the pre-trained monolingual BERT representations. Experimental results indicate that our proposed CLMN yields impressive results on the challenging task and the context-dependent cross-lingual mapping on BERT yields noticeable improvement over the pre-trained multi-lingual BERT model.

4

Abu Helou, Mamoun, Matteo Palmonari, and Mustafa Jarrar. "Effectiveness of Automatic Translations for Cross-Lingual Ontology Mapping." Journal of Artificial Intelligence Research 55 (January 25, 2016): 165–208. http://dx.doi.org/10.1613/jair.4789.

Abstract:

Accessing or integrating data lexicalized in different languages is a challenge. Multilingual lexical resources play a fundamental role in reducing the language barriers to map concepts lexicalized in different languages. In this paper we present a large-scale study on the effectiveness of automatic translations to support two key cross-lingual ontology mapping tasks: the retrieval of candidate matches and the selection of the correct matches for inclusion in the final alignment. We conduct our experiments using four different large gold standards, each one consisting of a pair of mapped wordnets, to cover four different families of languages. We categorize concepts based on their lexicalization (type of words, synonym richness, position in a subconcept graph) and analyze their distributions in the gold standards. Leveraging this categorization, we measure several aspects of translation effectiveness, such as word-translation correctness, word sense coverage, synset and synonym coverage. Finally, we thoroughly discuss several findings of our study, which we believe are helpful for the design of more sophisticated cross-lingual mapping algorithms.

5

Song, Yuting, Biligsaikhan Batjargal, and Akira Maeda. "Learning Japanese-English Bilingual Word Embeddings by Using Language Specificity." International Journal of Asian Language Processing 30, no. 03 (September 2020): 2050014. http://dx.doi.org/10.1142/s2717554520500149.

Abstract:

Cross-lingual word embeddings have been gaining attention because they can capture the semantic meaning of words across languages, which can be applied to cross-lingual tasks. Most methods learn a single mapping (e.g., a linear mapping) to transform a word embedding space from one language to another. To improve bilingual word embeddings, we propose an advanced method that adds a language-specific mapping. We focus on learning Japanese-English bilingual word embedding mapping by considering the specificity of the Japanese language. We evaluated our method by comparing it with single mapping-based-models on bilingual lexicon induction between Japanese and English. We determined that our method was more effective, with significant improvements on words of Japanese origin.
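
The "single mapping (e.g., a linear mapping)" mentioned in this abstract is often learned as an orthogonal (Procrustes) transformation from a seed dictionary. The following is a minimal sketch under that assumption, using NumPy and synthetic vectors. It is not the language-specific method proposed by Song et al., and every name in it (`learn_orthogonal_mapping`, `src`, `tgt`) is illustrative.

```python
import numpy as np

def learn_orthogonal_mapping(src, tgt):
    """Return the orthogonal matrix W minimizing ||src @ W - tgt||_F.

    src, tgt: (n_pairs, dim) arrays whose rows are the embeddings of
    dictionary translation pairs (toy data in this sketch).
    """
    # Orthogonal Procrustes solution: W = U @ Vt with U, S, Vt = svd(src.T @ tgt).
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

rng = np.random.default_rng(0)
dim = 4
src = rng.normal(size=(50, dim))

# Toy "target language" space: a hidden rotation of the source space.
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt = src @ true_rotation

w = learn_orthogonal_mapping(src, tgt)
mapped = src @ w  # source embeddings expressed in the target space
```

Because the toy target space is an exact rotation of the source space, the learned `w` recovers that rotation. Real embedding spaces are only approximately related by a rotation, which is the limitation that motivates adding further mappings.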

6

Fu, Bo, Rob Brennan, and Declan O’Sullivan. "A configurable translation-based cross-lingual ontology mapping system to adjust mapping outcomes." Journal of Web Semantics 15 (September 2012): 15–36. http://dx.doi.org/10.1016/j.websem.2012.06.001.

7

Robnik-Šikonja, Marko, Kristjan Reba, and Igor Mozetič. "Cross-lingual transfer of sentiment classifiers." Slovenščina 2.0: empirical, applied and interdisciplinary research 9, no. 1 (July 6, 2021): 1–25. http://dx.doi.org/10.4312/slo2.0.2021.1.1-25.

Abstract:

Word embeddings represent words in a numeric space so that semantic relations between words are represented as distances and directions in the vector space. Cross-lingual word embeddings transform vector spaces of different languages so that similar words are aligned. This is done by mapping one language’s vector space to the vector space of another language or by construction of a joint vector space for multiple languages. Cross-lingual embeddings can be used to transfer machine learning models between languages, thereby compensating for insufficient data in less-resourced languages. We use cross-lingual word embeddings to transfer machine learning prediction models for Twitter sentiment between 13 languages. We focus on two transfer mechanisms that have recently shown superior transfer performance. The first mechanism uses the trained models whose input is the joint numerical space for many languages as implemented in the LASER library. The second mechanism uses large pretrained multilingual BERT language models. Our experiments show that the transfer of models between similar languages is sensible, even with no target language data. The performance of cross-lingual models obtained with the multilingual BERT and LASER library is comparable, and the differences are language-dependent. The transfer with CroSloEngual BERT, pretrained on only three languages, is superior on these and some closely related languages.

8

Bhowmik, Kowshik, and Anca Ralescu. "Clustering of Monolingual Embedding Spaces." Digital 3, no. 1 (February 23, 2023): 48–66. http://dx.doi.org/10.3390/digital3010004.

Abstract:

Suboptimal performance of cross-lingual word embeddings for distant and low-resource languages calls into question the isomorphic assumption integral to the mapping-based methods of obtaining such embeddings. This paper investigates the comparative impact of typological relationship and corpus size on the isomorphism between monolingual embedding spaces. To that end, two clustering algorithms were applied to three sets of pairwise degrees of isomorphisms. It is also the goal of the paper to determine the combination of the isomorphism measure and clustering algorithm that best captures the typological relationship among the chosen set of languages. Of the three measures investigated, Relational Similarity seemed to capture best the typological information of the languages encoded in their respective embedding spaces. These language clusters can help us identify, without any pre-existing knowledge about the real-world linguistic relationships shared among a group of languages, the related higher-resource languages of low-resource languages. The presence of such languages in the cross-lingual embedding space can help improve the performance of low-resource languages in a cross-lingual embedding space.
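
To make the idea of a pairwise degree of isomorphism concrete, here is a small sketch of one common formulation of relational similarity: the Pearson correlation between the within-language cosine similarities of translation-aligned word pairs. Whether this matches the exact definition used by Bhowmik and Ralescu is an assumption, and the data and names (`relational_similarity`, `rotated`, `unrelated`) are synthetic and illustrative.

```python
import numpy as np

def relational_similarity(src, tgt):
    """Pearson correlation between within-language cosine similarities.

    src[i] and tgt[i] are assumed to be embeddings of a translation pair.
    """
    def pairwise_cosines(emb):
        unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        sims = unit @ unit.T
        upper = np.triu_indices(len(emb), k=1)  # each unordered word pair once
        return sims[upper]

    return float(np.corrcoef(pairwise_cosines(src), pairwise_cosines(tgt))[0, 1])

rng = np.random.default_rng(0)
src = rng.normal(size=(20, 5))
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
rotated = src @ rotation              # perfectly isomorphic toy space
unrelated = rng.normal(size=(20, 5))  # independently sampled toy space

score_rotated = relational_similarity(src, rotated)
score_unrelated = relational_similarity(src, unrelated)
```

A rotation preserves all cosine similarities, so the rotated copy scores essentially 1, while the independently sampled space scores lower. Scores of this kind, computed for every language pair, are the sort of input a clustering algorithm could then group.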

9

DO, Van Hai, Xiong XIAO, Eng Siong CHNG, and Haizhou LI. "Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages." IEICE Transactions on Information and Systems E97.D, no. 2 (2014): 285–95. http://dx.doi.org/10.1587/transinf.e97.d.285.

10

Shi, Xiayang, Ping Yue, Xinyi Liu, Chun Xu, and Lin Xu. "Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision." Computational Intelligence and Neuroscience 2022 (August 3, 2022): 1–9. http://dx.doi.org/10.1155/2022/5296946.

Abstract:

Machine translation relies on parallel sentences, the number of which is an important factor affecting the performance of machine translation systems, especially in low-resource languages. Recent advances in learning cross-lingual word representations from nonparallel data by machine learning open a new possibility for obtaining bilingual sentences with minimal supervision in low-resource languages. In this paper, we introduce a novel methodology to obtain parallel sentences via only a small bilingual seed lexicon of hundreds of entries. We first obtain bilingual semantics by establishing a cross-lingual mapping in monolingual languages via a seed lexicon. Then, we construct a deep learning classifier to extract bilingual parallel sentences. We demonstrate the effectiveness of our methodology by harvesting Uyghur-Chinese parallel sentences and constructing a machine translation system. The experiments indicate that our method can obtain large and high-accuracy bilingual parallel sentences in low-resource language pairs.

11

Musa, Ibrahim Hussein, and Ibrahim Zamit. "Mapping of Cross-Lingual Emotional Topic Model Research Indexed in Scopus databases from 2000-2020." Journal of Scientometric Research 11, no. 3 (January 6, 2023): 427–35. http://dx.doi.org/10.5530/jscires.11.3.46.

12

Beinborn, Lisa, Torsten Zesch, and Iryna Gurevych. "Readability for foreign language learning." Recent Advances in Automatic Readability Assessment and Text Simplification 165, no. 2 (December 31, 2014): 136–62. http://dx.doi.org/10.1075/itl.165.2.02bei.

Abstract:

In this paper, we analyse the differences between L1 acquisition and L2 learning and identify four main aspects: input quality and quantity, mapping processes, cross-lingual influence, and reading experience. As a consequence of these differences, we conclude that L1 readability measures cannot be directly mapped to L2 readability. We propose to calculate L2 readability for various dimensions and for smaller units. It is particularly important to account for the cross-lingual influence from the learner’s L1 and other previously acquired languages and for the learner’s higher experience in reading. In our analysis, we focus on lexical readability as it has been found to be the most influential dimension for L2 reading comprehension. We discuss the features frequency, lexical variation, concreteness, polysemy, and context specificity and analyse their impact on L2 readability. As a new feature specific to L2 readability, we propose the cognateness of words with words in languages the learner already knows. A pilot study confirms our assumption that learners can deduce the meaning of new words by their cognateness to other languages.

13

Bitton, Yonatan, Raphael Cohen, Tamar Schifter, Eitan Bachmat, Michael Elhadad, and Noémie Elhadad. "Cross-lingual Unified Medical Language System entity linking in online health communities." Journal of the American Medical Informatics Association 27, no. 10 (September 10, 2020): 1585–92. http://dx.doi.org/10.1093/jamia/ocaa150.

Abstract:

Objective: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce their variability, medical terms must be normalized, such as linking them to Unified Medical Language System (UMLS) concepts. We present a method to identify both transliterated and translated Hebrew medical terms and link them with UMLS entities.

Materials and Methods: We investigate the effect of linking terms in Camoni, a popular Israeli online health community in Hebrew. Our method, MDTEL (Medical Deep Transliteration Entity Linking), includes (1) an attention-based recurrent neural network encoder-decoder to transliterate words and map UMLS from English to Hebrew, (2) an unsupervised method for creating a transliteration dataset in any language without manually labeled data, and (3) an efficient way to identify and link medical entities in the Hebrew corpus to UMLS concepts, by producing a high-recall list of candidate medical terms in the corpus, and then filtering the candidates to relevant medical terms.

Results: We carry out experiments on 3 disease-specific communities: diabetes, multiple sclerosis, and depression. MDTEL tagging and normalizing on Camoni posts achieved 99% accuracy, 92% recall, and 87% precision. When tagging and normalizing terms in queries from the Camoni search logs, UMLS-normalized queries improved search results in 46% of the cases.

Conclusions: Cross-lingual UMLS entity linking from Hebrew is possible and improves search performance across communities. Annotated datasets, annotation guidelines, and code are made available online (https://github.com/yonatanbitton/mdtel).

14

Oura, Keiichiro, Junichi Yamagishi, Mirjam Wester, Simon King, and Keiichi Tokuda. "Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping." Speech Communication 54, no. 6 (July 2012): 703–14. http://dx.doi.org/10.1016/j.specom.2011.12.004.

15

Jawanpuria, Pratik, Arjun Balgovind, Anoop Kunchukuttan, and Bamdev Mishra. "Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach." Transactions of the Association for Computational Linguistics 7 (November 2019): 107–20. http://dx.doi.org/10.1162/tacl_a_00257.

Abstract:

We propose a novel geometric approach for learning bilingual mappings given monolingual embeddings and a bilingual dictionary. Our approach decouples the source-to-target language transformation into (a) language-specific rotations on the original embeddings to align them in a common, latent space, and (b) a language-independent similarity metric in this common space to better model the similarity between the embeddings. Overall, we pose the bilingual mapping problem as a classification problem on smooth Riemannian manifolds. Empirically, our approach outperforms previous approaches on the bilingual lexicon induction and cross-lingual word similarity tasks. We next generalize our framework to represent multiple languages in a common latent space. Language-specific rotations for all the languages and a common similarity metric in the latent space are learned jointly from bilingual dictionaries for multiple language pairs. We illustrate the effectiveness of joint learning for multiple languages in an indirect word translation setting.
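
The decoupling described in this abstract can be pictured as scoring a source vector x against a target vector y by rotating each into a shared latent space and comparing them there under a common metric. The sketch below only evaluates such a score with randomly generated matrices. It assumes the rotations are orthogonal matrices and the metric is a symmetric positive definite matrix, it does not perform the learning on Riemannian manifolds that the paper proposes, and all names are illustrative.

```python
import numpy as np

def latent_similarity(x, y, rot_src, rot_tgt, metric):
    """Similarity of source vector x and target vector y in the latent space."""
    z_x = rot_src.T @ x  # rotate the source embedding into the latent space
    z_y = rot_tgt.T @ y  # rotate the target embedding into the latent space
    return float(z_x @ metric @ z_y)

rng = np.random.default_rng(0)
dim = 4

# Random orthogonal matrices stand in for learned language-specific rotations.
rot_src, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
rot_tgt, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

# A random symmetric positive definite matrix stands in for the learned metric.
a = rng.normal(size=(dim, dim))
metric = a @ a.T + np.eye(dim)

x = rng.normal(size=dim)
y = rng.normal(size=dim)
score = latent_similarity(x, y, rot_src, rot_tgt, metric)
```

With an identity metric and the same rotation on both sides, the score reduces to the ordinary dot product of x and y, which shows that the rotations alone do not change similarities within one space; the shared metric is what reweights directions in the latent space.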

16

Kann, Katharina, Samuel R. Bowman, and Kyunghyun Cho. "Learning to Learn Morphological Inflection for Resource-Poor Languages." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8058–65. http://dx.doi.org/10.1609/aaai.v34i05.6316.

Abstract:

We propose to cast the task of morphological inflection—mapping a lemma to an indicated inflected form—for resource-poor languages as a meta-learning problem. Treating each language as a separate task, we use data from high-resource source languages to learn a set of model parameters that can serve as a strong initialization point for fine-tuning on a resource-poor target language. Experiments with two model architectures on 29 target languages from 3 families show that our suggested approach outperforms all baselines. In particular, it obtains a 31.7% higher absolute accuracy than a previously proposed cross-lingual transfer model and outperforms the previous state of the art by 1.7% absolute accuracy on average over languages.

17

Dong, Qianqian, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, and Lei Li. "Consecutive Decoding for Speech-to-text Translation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 14 (May 18, 2021): 12738–48. http://dx.doi.org/10.1609/aaai.v35i14.17508.

Abstract:

Speech-to-text translation (ST), which directly translates the source language speech to the target language text, has attracted intensive attention recently. However, the combination of speech recognition and machine translation in a single model poses a heavy burden on the direct cross-modal cross-lingual mapping. To reduce the learning difficulty, we propose COnSecutive Transcription and Translation (COSTT), an integral approach for speech-to-text translation. The key idea is to generate source transcript and target translation text with a single decoder. It benefits the model training so that additional large parallel text corpus can be fully exploited to enhance the speech translation training. Our method is verified on three mainstream datasets, including Augmented LibriSpeech English-French dataset, TED English-German dataset, and TED English-Chinese dataset. Experiments show that our proposed COSTT outperforms the previous state-of-the-art methods. The code is available at https://github.com/dqqcasia/st.

18

Vasconcellos, Maria Lúcia. "Systemic functional translation studies (sfts): the theory travelling in brazilian environments." DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada 25, spe (2009): 585–607. http://dx.doi.org/10.1590/s0102-44502009000300003.

Abstract:

This paper presents a mapping of the Systemic Functional Translation Studies (SFTS) tradition in the Brazilian environment, from its genesis up to developments in the 2000's. While studies during the earlier period are informed by the concept of "translation as (re)textualization", more recent SFTS research can be charted along the 'cline of instantiation', translations being investigated as "instantiations-in-contexts", cross-lingual functional varieties of language, or still as sources of SFL-based language description of Brazilian Portuguese. From the late 90's on, computerized corpora and corpus-based methodologies have been integrated into Brazilian SFTS, for which annotation methods for the tagging of SFL categories have been developed. The paper ends with a consideration of Brazilian SFTS against the background of international SFTS as disseminated in the 2nd HCLS.

19

Kuai, Xi, Lin Li, Heng Luo, Shen Hang, Zhijun Zhang, and Yu Liu. "Geospatial Information Categories Mapping in a Cross-lingual Environment: A Case Study of 'Surface Water' Categories in Chinese and American Topographic Maps." ISPRS International Journal of Geo-Information 5, no. 6 (June 14, 2016): 90. http://dx.doi.org/10.3390/ijgi5060090.

20

Kitasako, Y., A. Sadr, H. Hamba, M. Ikeda, and J. Tagami. "Gum Containing Calcium Fluoride Reinforces Enamel Subsurface Lesions in situ." Journal of Dental Research 91, no. 4 (February 15, 2012): 370–75. http://dx.doi.org/10.1177/0022034512439716.

Abstract:

The aim of this study was to assess the effect of chewing gum containing phosphoryl oligosaccharides of calcium (POs-Ca) and a low concentration of fluoride (F) on the hardness of enamel subsurface lesions, utilizing a double-blind, randomized, and controlled in situ model. Fifteen individuals wore removable lingual appliances with 3 bovine-enamel insets containing subsurface demineralized lesions. Three times a day for 14 days, they chewed one of the 3 chewing gums (placebo, POs-Ca, POs-Ca+F). After the treatment period, cross-sectional mineral content, nanoindentation hardness, and fluoride ion mapping by time-of-flight secondary ion mass spectrometry (TOF-SIMS) were evaluated. Although there were no statistical differences in overall mineral content and hardness recovery rates between POs-Ca and POs-Ca+F subsurface lesions (p > 0.05), nanoindentation at 1-μm distance increments from the surface showed statistical differences in hardness recovery rate between POs-Ca and POs-Ca+F in the superficial 20-μm region (p < 0.05). Fluoride mapping revealed distribution of the ion up to 20 μm from the surface in the POs-Ca+F group. Nanoindentation and TOF-SIMS results highlighted the benefits of bioavailability of fluoride ion on reinforcement of the superficial zone of subsurface lesions in situ (NCT01377493).

21

Espinosa-Anke, Luis, Geraint Palmer, Padraig Corcoran, Maxim Filimonov, Irena Spasić, and Dawn Knight. "English–Welsh Cross-Lingual Embeddings." Applied Sciences 11, no. 14 (July 16, 2021): 6541. http://dx.doi.org/10.3390/app11146541.

Abstract:

Cross-lingual embeddings are vector space representations where word translations tend to be co-located. These representations enable learning transfer across languages, thus bridging the gap between data-rich languages such as English and others. In this paper, we present and evaluate a suite of cross-lingual embeddings for the English–Welsh language pair. To train the bilingual embeddings, a Welsh corpus of approximately 145 M words was combined with an English Wikipedia corpus. We used a bilingual dictionary to frame the problem of learning bilingual mappings as a supervised machine learning task, where a word vector space is first learned independently on a monolingual corpus, after which a linear alignment strategy is applied to map the monolingual embeddings to a common bilingual vector space. Two approaches were used to learn monolingual embeddings, including word2vec and fastText. Three cross-language alignment strategies were explored, including cosine similarity, inverted softmax and cross-domain similarity local scaling (CSLS). We evaluated different combinations of these approaches using two tasks, bilingual dictionary induction, and cross-lingual sentiment analysis. The best results were achieved using monolingual fastText embeddings and the CSLS metric. We also demonstrated that by including a few automatically translated training documents, the performance of a cross-lingual text classifier for Welsh can increase by approximately 20 percentage points.
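
As an illustration of the CSLS metric named in this abstract, the sketch below uses the standard formulation CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean cosine similarity of x to its k nearest target-language neighbours and r_S(y) is the analogous quantity for y. The vectors are toy data and the function name `csls_scores` is illustrative; this is not the authors' code.

```python
import numpy as np

def csls_scores(src, tgt, k=2):
    """CSLS score matrix between mapped source rows and target rows."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = src @ tgt.T  # cosine similarities, shape (n_src, n_tgt)

    # Mean cosine to the k nearest neighbours in the other language.
    r_src = np.sort(cos, axis=1)[:, -k:].mean(axis=1)  # one value per source word
    r_tgt = np.sort(cos, axis=0)[-k:, :].mean(axis=0)  # one value per target word

    return 2 * cos - r_src[:, None] - r_tgt[None, :]

# Toy data: each source vector is a slightly perturbed copy of one target vector.
tgt = np.eye(3)
src = np.array([[1.0, 0.1, 0.0],
                [0.0, 1.0, 0.1],
                [0.1, 0.0, 1.0]])

scores = csls_scores(src, tgt, k=2)
best = scores.argmax(axis=1)  # index of the retrieved translation per source word
```

Subtracting the neighbourhood terms penalizes "hub" vectors that are close to many words, which is the usual motivation for preferring CSLS over plain cosine retrieval in dictionary induction.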

22

Li, Yuling, Kui Yu, and Yuhong Zhang. "Learning Cross-Lingual Mappings in Imperfectly Isomorphic Embedding Spaces." IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021): 2630–42. http://dx.doi.org/10.1109/taslp.2021.3097935.

23

Hourrane, Oumaima, and El Habib Benlahmar. "Graph transformer for cross-lingual plagiarism detection." IAES International Journal of Artificial Intelligence (IJ-AI) 11, no. 3 (September 1, 2022): 905. http://dx.doi.org/10.11591/ijai.v11.i3.pp905-915.

Abstract:

The existence of vast amounts of multilingual textual data on the internet leads to cross-lingual plagiarism which becomes a serious issue in different fields such as education, science, and literature. Current cross-lingual plagiarism detection approaches usually employ syntactic and lexical properties, external machine translation systems, or finding similarities within a multilingual set of text documents. However, most of these methods are conceived for literal plagiarism such as copy and paste, and their performance is diminished when handling complex cases of plagiarism including paraphrasing. In this paper, we propose a new graph-based approach that represents text passages in different languages using knowledge graphs. We put forward a new graph structure modeling method based on the Transformer architecture that employs precise relation encoding and delivers a more efficient way for global graph representation. The mappings between the graphs are learned both in semi-supervised and unsupervised training mechanisms. The results of our experiments in Arabic–English, French–English, and Spanish–English plagiarism detection show that our graph transformer method surpasses the state-of-the-art cross-lingual plagiarism detection approaches with and without paraphrasing cases, and provides further insights on the use of knowledge graphs on a language-independent model.

24

Zhang, Yuhong, Yuling Li, Yi Zhu, and Xuegang Hu. "Wasserstein GAN based on Autoencoder with back-translation for cross-lingual embedding mappings." Pattern Recognition Letters 129 (January 2020): 311–16. http://dx.doi.org/10.1016/j.patrec.2019.11.033.

25

Liu, Gang, Yichao Dong, Kai Wang, and Zhizheng Yan. "A cross-lingual sentence pair interaction feature capture model based on pseudo-corpus and multilingual embedding." AI Communications, April 13, 2022, 1–14. http://dx.doi.org/10.3233/aic-210085.

Abstract:

Recently, the emergence of the digital language division and the availability of cross-lingual benchmarks have made research on cross-lingual texts more popular. However, the performance of existing methods based on mapping relations is not good enough, because sometimes the structures of language spaces are not isomorphic. Besides, polysemy makes the extraction of interaction features hard. For cross-lingual word embedding, a model named Cross-lingual Word Embedding Space Based on Pseudo Corpus (CWE-PC) is proposed to obtain cross-lingual and multilingual word embedding. For cross-lingual sentence pair interaction feature capture, a Cross-language Feature Capture Based on Similarity Matrix (CFC-SM) model is built to extract cross-lingual interaction features. An ELMo pretrained model and multiple-layer convolution are used to alleviate polysemy and extract interaction features. These models are evaluated on multiple language pairs and results show that they outperform the state-of-the-art cross-lingual word embedding methods.

26

Fu, Bo, Rob Brennan, and Declan O’Sullivan. "A Configurable Translation-Based Cross-Lingual Ontology Mapping System to Adjust Mapping Outcome." SSRN Electronic Journal, 2012. http://dx.doi.org/10.2139/ssrn.3198965.

27

Lin, Ying-Chi, Phillip Hoffmann, and Erhard Rahm. "Enhancing Cross-lingual Biomedical Concept Normalization Using Deep Neural Network Pretrained Language Models." SN Computer Science 3, no. 5 (July 21, 2022). http://dx.doi.org/10.1007/s42979-022-01295-7.

Abstract:

In this study, we propose a new approach for cross-lingual biomedical concept normalization, the process of mapping text in non-English documents to English concepts of a knowledge base. The resulting mappings, called semantic annotations, enhance data integration and interoperability of documents in different languages. The US FDA (Food and Drug Administration), therefore, requires all submitted medical forms to be semantically annotated. These standardized medical forms are used in health care practice and biomedical research and are translated/adapted into various languages. Mapping them to the same concepts (normally in English) facilitates the comparison of multiple medical studies even cross-lingually. However, the translation and adaptation of these forms can cause them to deviate from their original text syntactically and in wording. This leads the conventional string matching methods to produce low-quality annotation results. Therefore, our new approach incorporates semantics into the cross-lingual concept normalization process. This is done using sentence embeddings generated by BERT-based pretrained language models. We evaluate the new approach by annotating entire questions of German medical forms with concepts in English, as required by the FDA. The new approach achieves an improvement of 136% in recall, 52% in precision and 66% in F-measure compared to the conventional string matching methods.


28

Zhang, Meng, Haoruo Peng, Yang Liu, Huanbo Luan, and Maosong Sun. "Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision." Proceedings of the AAAI Conference on Artificial Intelligence 31, no. 1 (February 12, 2017). http://dx.doi.org/10.1609/aaai.v31i1.10988.


Abstract:

Building bilingual lexica from non-parallel data is a long-standing natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances in continuous word representations have opened up new possibilities for this task, e.g. by establishing a cross-lingual mapping between word embeddings via a seed lexicon. The method is, however, unreliable when only a limited number of seeds is available, which is a realistic setting for resource-scarce languages. We tackle this limitation by introducing a novel matching mechanism into bilingual word representation learning. It captures extra translation pairs exposed by the seeds to incrementally improve the bilingual word embeddings. In our experiments, we find that the matching mechanism substantially improves the quality of the bilingual vector space, which in turn allows us to induce better bilingual lexica with as few as 10 seeds.


29

Osenova, Petya, and Kiril Simov. "The data-driven Bulgarian WordNet: BTBWN." Cognitive Studies | Études cognitives, no. 18 (December 20, 2018). http://dx.doi.org/10.11649/cs.1713.


Abstract:

The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both the syntactic and the lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on identifying the senses used in BulTreeBank, but the missing senses of a lemma have also been covered by exploring larger corpora. The identified senses have been organized into synsets for the Bulgarian WordNet and then aligned to the Princeton WordNet synsets. Various types of mappings between the two resources are considered from a cross-lingual perspective, with a view to ensuring maximum connectivity and the potential for incorporating language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.


30

Maros, Máté E., Chang Gyu Cho, Andreas G. Junge, Benedikt Kämpgen, Victor Saase, Fabian Siegel, Frederik Trinkmann, Thomas Ganslandt, Christoph Groden, and Holger Wenz. "Comparative analysis of machine learning algorithms for computer-assisted reporting based on fully automated cross-lingual RadLex mappings." Scientific Reports 11, no. 1 (March 9, 2021). http://dx.doi.org/10.1038/s41598-021-85016-9.


Abstract:

Computer-assisted reporting (CAR) tools were suggested to improve radiology report quality by context-sensitively recommending key imaging biomarkers. However, studies evaluating machine learning (ML) algorithms on cross-lingual ontological (RadLex) mappings for developing embedded CAR algorithms are lacking. Therefore, we compared ML algorithms developed on human expert-annotated features against those developed on fully automated cross-lingual (German to English) RadLex mappings using 206 CT reports of suspected stroke. The target label was whether the Alberta Stroke Programme Early CT Score (ASPECTS) should have been provided (yes/no: 154/52). We focused on probabilistic outputs of ML algorithms including tree-based methods, elastic net, support vector machines (SVMs) and fastText (linear classifier), which were evaluated in the same 5 × fivefold nested cross-validation framework. This allowed for model stacking and classifier rankings. Performance was evaluated using calibration metrics (AUC, Brier score, log loss) and calibration plots. Contextual ML-based assistance recommending ASPECTS was feasible. SVMs showed the highest accuracies on both human-extracted features (87%) and RadLex features (findings: 82.5%; impressions: 85.4%). FastText achieved the highest accuracy (89.3%) and AUC (92%) on impressions. Boosted trees fitted on findings had the best calibration profile. Our approach provides guidance for choosing ML classifiers for CAR tools in a fully automated and language-agnostic fashion using bag-of-RadLex terms on limited expert-labelled training data.
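Two of the metrics named in this abstract, the Brier score and log loss, can be computed directly from predicted probabilities. The sketch below uses their standard binary-classification definitions; the function names and the four-example toy data are assumptions made for illustration and are not taken from the study.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the labels under the predictions."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) is never evaluated.
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Hypothetical labels (1 = score should have been reported) and model outputs.
labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.6, 0.5]

print(f"Brier score: {brier_score(labels, probs):.3f}")
print(f"Log loss:    {log_loss(labels, probs):.3f}")
```

Both metrics are lower for better predictions and penalize overconfident errors, which is why they complement accuracy when the probabilistic outputs themselves are of interest.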
