Journal articles on the topic 'Cross-Lingual Mapping'


Consult the top 30 journal articles for your research on the topic 'Cross-Lingual Mapping.'


1

Fu, Zuohui, Yikun Xian, Shijie Geng, Yingqiang Ge, Yuting Wang, Xin Dong, Guang Wang, and Gerard De Melo. "ABSent: Cross-Lingual Sentence Representation Mapping with Bidirectional GANs." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7756–63. http://dx.doi.org/10.1609/aaai.v34i05.6279.

Abstract:
A number of cross-lingual transfer learning approaches based on neural networks have been proposed for the case when large amounts of parallel text are at our disposal. However, in many real-world settings, the size of parallel annotated training data is restricted. Additionally, prior cross-lingual mapping research has mainly focused on the word level. This raises the question of whether such techniques can also be applied to effortlessly obtain cross-lingually aligned sentence representations. To this end, we propose an Adversarial Bi-directional Sentence Embedding Mapping (ABSent) framework, which learns mappings of cross-lingual sentence representations from limited quantities of parallel data. The experiments show that our method outperforms several technically more powerful approaches, especially under challenging low-resource circumstances. The source code is available from https://github.com/zuohuif/ABSent along with relevant datasets.
2

Gao, Jiahui, Yi Zhou, Philip L. H. Yu, Shafiq Joty, and Jiuxiang Gu. "UNISON: Unpaired Cross-Lingual Image Captioning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 10654–62. http://dx.doi.org/10.1609/aaai.v36i10.21310.

Abstract:
Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner. However, creating such paired datasets for every target language is prohibitively expensive, which hinders the extensibility of captioning technology and deprives a large part of the world population of its benefit. In this work, we present a novel unpaired cross-lingual method to generate image captions without relying on any caption corpus in the source or the target language. Specifically, our method consists of two phases: (1) a cross-lingual auto-encoding process, which utilizes a sentence parallel (bitext) corpus to learn the mapping from the source to the target language in the scene graph encoding space and decode sentences in the target language, and (2) a cross-modal unsupervised feature mapping, which seeks to map the encoded scene graph features from image modality to language modality. We verify the effectiveness of our proposed method on the Chinese image caption generation task. The comparisons against several existing methods demonstrate the effectiveness of our approach.
3

Li, Juntao, Chang Liu, Jian Wang, Lidong Bing, Hongsong Li, Xiaozhong Liu, Dongyan Zhao, and Rui Yan. "Cross-Lingual Low-Resource Set-to-Description Retrieval for Global E-Commerce." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8212–19. http://dx.doi.org/10.1609/aaai.v34i05.6335.

Abstract:
With the prosperity of cross-border e-commerce, there is an urgent demand for designing intelligent approaches for assisting e-commerce sellers to offer local products for consumers from all over the world. In this paper, we explore a new task of cross-lingual information retrieval, i.e., cross-lingual set-to-description retrieval in cross-border e-commerce, which involves matching product attribute sets in the source language with persuasive product descriptions in the target language. We manually collect a new and high-quality paired dataset, where each pair contains an unordered product attribute set in the source language and an informative product description in the target language. As the dataset construction process is both time-consuming and costly, the new dataset comprises only 13.5k pairs, which is a low-resource setting and can be viewed as a challenging testbed for model development and evaluation in cross-border e-commerce. To tackle this cross-lingual set-to-description retrieval task, we propose a novel cross-lingual matching network (CLMN) with the enhancement of context-dependent cross-lingual mapping upon the pre-trained monolingual BERT representations. Experimental results indicate that our proposed CLMN yields impressive results on the challenging task and the context-dependent cross-lingual mapping on BERT yields noticeable improvement over the pre-trained multi-lingual BERT model.
4

Abu Helou, Mamoun, Matteo Palmonari, and Mustafa Jarrar. "Effectiveness of Automatic Translations for Cross-Lingual Ontology Mapping." Journal of Artificial Intelligence Research 55 (January 25, 2016): 165–208. http://dx.doi.org/10.1613/jair.4789.

Abstract:
Accessing or integrating data lexicalized in different languages is a challenge. Multilingual lexical resources play a fundamental role in reducing the language barriers to map concepts lexicalized in different languages. In this paper we present a large-scale study on the effectiveness of automatic translations to support two key cross-lingual ontology mapping tasks: the retrieval of candidate matches and the selection of the correct matches for inclusion in the final alignment. We conduct our experiments using four different large gold standards, each one consisting of a pair of mapped wordnets, to cover four different families of languages. We categorize concepts based on their lexicalization (type of words, synonym richness, position in a subconcept graph) and analyze their distributions in the gold standards. Leveraging this categorization, we measure several aspects of translation effectiveness, such as word-translation correctness, word sense coverage, synset and synonym coverage. Finally, we thoroughly discuss several findings of our study, which we believe are helpful for the design of more sophisticated cross-lingual mapping algorithms.
5

Song, Yuting, Biligsaikhan Batjargal, and Akira Maeda. "Learning Japanese-English Bilingual Word Embeddings by Using Language Specificity." International Journal of Asian Language Processing 30, no. 03 (September 2020): 2050014. http://dx.doi.org/10.1142/s2717554520500149.

Abstract:
Cross-lingual word embeddings have been gaining attention because they can capture the semantic meaning of words across languages, which can be applied to cross-lingual tasks. Most methods learn a single mapping (e.g., a linear mapping) to transform a word embedding space from one language to another. To improve bilingual word embeddings, we propose an advanced method that adds a language-specific mapping. We focus on learning Japanese-English bilingual word embedding mapping by considering the specificity of the Japanese language. We evaluated our method by comparing it with single mapping-based-models on bilingual lexicon induction between Japanese and English. We determined that our method was more effective, with significant improvements on words of Japanese origin.
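For readers unfamiliar with the single-mapping baseline that this abstract refers to, the following is a minimal NumPy sketch, not the authors' method. It fits one linear map from source-language vectors to target-language vectors over dictionary pairs, once by unconstrained least squares and once with the common orthogonality constraint (orthogonal Procrustes). The function names and the toy data are hypothetical, and the language-specific mapping proposed in the paper is not reproduced here.

```python
import numpy as np


def fit_linear_map(src, tgt):
    """Unconstrained least squares: W minimising ||src @ W - tgt||_F."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W


def fit_orthogonal_map(src, tgt):
    """Orthogonal Procrustes: orthogonal W minimising ||src @ W - tgt||_F."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Toy data (an assumption for illustration): the "target" space is an
# exact rotation of the "source" space, so both fits can recover it.
rng = np.random.default_rng(0)
true_rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
src = rng.normal(size=(50, 4))   # rows: source vectors of dictionary entries
tgt = src @ true_rotation        # rows: vectors of their translations

W_linear = fit_linear_map(src, tgt)
W_orth = fit_orthogonal_map(src, tgt)
```

On real embeddings the two spaces are only approximately related by such a map, which is the limitation that motivates adding further, language-specific transformations.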
6

Fu, Bo, Rob Brennan, and Declan O’Sullivan. "A configurable translation-based cross-lingual ontology mapping system to adjust mapping outcomes." Journal of Web Semantics 15 (September 2012): 15–36. http://dx.doi.org/10.1016/j.websem.2012.06.001.

7

Robnik-Šikonja, Marko, Kristjan Reba, and Igor Mozetič. "Cross-lingual transfer of sentiment classifiers." Slovenščina 2.0: empirical, applied and interdisciplinary research 9, no. 1 (July 6, 2021): 1–25. http://dx.doi.org/10.4312/slo2.0.2021.1.1-25.

Abstract:
Word embeddings represent words in a numeric space so that semantic relations between words are represented as distances and directions in the vector space. Cross-lingual word embeddings transform vector spaces of different languages so that similar words are aligned. This is done by mapping one language’s vector space to the vector space of another language or by construction of a joint vector space for multiple languages. Cross-lingual embeddings can be used to transfer machine learning models between languages, thereby compensating for insufficient data in less-resourced languages. We use cross-lingual word embeddings to transfer machine learning prediction models for Twitter sentiment between 13 languages. We focus on two transfer mechanisms that have recently shown superior transfer performance. The first mechanism uses the trained models whose input is the joint numerical space for many languages as implemented in the LASER library. The second mechanism uses large pretrained multilingual BERT language models. Our experiments show that the transfer of models between similar languages is sensible, even with no target language data. The performance of cross-lingual models obtained with the multilingual BERT and LASER library is comparable, and the differences are language-dependent. The transfer with CroSloEngual BERT, pretrained on only three languages, is superior on these and some closely related languages.
8

Bhowmik, Kowshik, and Anca Ralescu. "Clustering of Monolingual Embedding Spaces." Digital 3, no. 1 (February 23, 2023): 48–66. http://dx.doi.org/10.3390/digital3010004.

Abstract:
Suboptimal performance of cross-lingual word embeddings for distant and low-resource languages calls into question the isomorphic assumption integral to the mapping-based methods of obtaining such embeddings. This paper investigates the comparative impact of typological relationship and corpus size on the isomorphism between monolingual embedding spaces. To that end, two clustering algorithms were applied to three sets of pairwise degrees of isomorphisms. It is also the goal of the paper to determine the combination of the isomorphism measure and clustering algorithm that best captures the typological relationship among the chosen set of languages. Of the three measures investigated, Relational Similarity seemed to capture best the typological information of the languages encoded in their respective embedding spaces. These language clusters can help us identify, without any pre-existing knowledge about the real-world linguistic relationships shared among a group of languages, the related higher-resource languages of low-resource languages. The presence of such languages in the cross-lingual embedding space can help improve the performance of low-resource languages in a cross-lingual embedding space.
9

Do, Van Hai, Xiong Xiao, Eng Siong Chng, and Haizhou Li. "Cross-Lingual Phone Mapping for Large Vocabulary Speech Recognition of Under-Resourced Languages." IEICE Transactions on Information and Systems E97.D, no. 2 (2014): 285–95. http://dx.doi.org/10.1587/transinf.e97.d.285.

10

Shi, Xiayang, Ping Yue, Xinyi Liu, Chun Xu, and Lin Xu. "Obtaining Parallel Sentences in Low-Resource Language Pairs with Minimal Supervision." Computational Intelligence and Neuroscience 2022 (August 3, 2022): 1–9. http://dx.doi.org/10.1155/2022/5296946.

Abstract:
Machine translation relies on parallel sentences, the number of which is an important factor affecting the performance of machine translation systems, especially in low-resource languages. Recent advances in learning cross-lingual word representations from nonparallel data by machine learning open up a new possibility for obtaining bilingual sentences with minimal supervision in low-resource languages. In this paper, we introduce a novel methodology to obtain parallel sentences using only a small bilingual seed lexicon of a few hundred entries. We first obtain bilingual semantics by establishing a cross-lingual mapping between the monolingual representations via a seed lexicon. Then, we construct a deep learning classifier to extract bilingual parallel sentences. We demonstrate the effectiveness of our methodology by harvesting Uyghur-Chinese parallel sentences and constructing a machine translation system. The experiments indicate that our method can obtain large and high-accuracy bilingual parallel sentences in low-resource language pairs.
11

Musa, Ibrahim Hussein, and Ibrahim Zamit. "Mapping of Cross-Lingual Emotional Topic Model Research Indexed in Scopus databases from 2000-2020." Journal of Scientometric Research 11, no. 3 (January 6, 2023): 427–35. http://dx.doi.org/10.5530/jscires.11.3.46.

12

Beinborn, Lisa, Torsten Zesch, and Iryna Gurevych. "Readability for foreign language learning." Recent Advances in Automatic Readability Assessment and Text Simplification 165, no. 2 (December 31, 2014): 136–62. http://dx.doi.org/10.1075/itl.165.2.02bei.

Abstract:
In this paper, we analyse the differences between L1 acquisition and L2 learning and identify four main aspects: input quality and quantity, mapping processes, cross-lingual influence, and reading experience. As a consequence of these differences, we conclude that L1 readability measures cannot be directly mapped to L2 readability. We propose to calculate L2 readability for various dimensions and for smaller units. It is particularly important to account for the cross-lingual influence from the learner’s L1 and other previously acquired languages and for the learner’s higher experience in reading. In our analysis, we focus on lexical readability as it has been found to be the most influential dimension for L2 reading comprehension. We discuss the features frequency, lexical variation, concreteness, polysemy, and context specificity and analyse their impact on L2 readability. As a new feature specific to L2 readability, we propose the cognateness of words with words in languages the learner already knows. A pilot study confirms our assumption that learners can deduce the meaning of new words by their cognateness to other languages.
13

Bitton, Yonatan, Raphael Cohen, Tamar Schifter, Eitan Bachmat, Michael Elhadad, and Noémie Elhadad. "Cross-lingual Unified Medical Language System entity linking in online health communities." Journal of the American Medical Informatics Association 27, no. 10 (September 10, 2020): 1585–92. http://dx.doi.org/10.1093/jamia/ocaa150.

Abstract:
Objective: In Hebrew online health communities, participants commonly write medical terms that appear as transliterated forms of a source term in English. Such transliterations introduce high variability in text and challenge text-analytics methods. To reduce their variability, medical terms must be normalized, for example by linking them to Unified Medical Language System (UMLS) concepts. We present a method to identify both transliterated and translated Hebrew medical terms and link them with UMLS entities.
Materials and Methods: We investigate the effect of linking terms in Camoni, a popular Israeli online health community in Hebrew. Our method, MDTEL (Medical Deep Transliteration Entity Linking), includes (1) an attention-based recurrent neural network encoder-decoder to transliterate words and map UMLS from English to Hebrew, (2) an unsupervised method for creating a transliteration dataset in any language without manually labeled data, and (3) an efficient way to identify and link medical entities in the Hebrew corpus to UMLS concepts, by producing a high-recall list of candidate medical terms in the corpus, and then filtering the candidates to relevant medical terms.
Results: We carry out experiments on 3 disease-specific communities: diabetes, multiple sclerosis, and depression. MDTEL tagging and normalizing on Camoni posts achieved 99% accuracy, 92% recall, and 87% precision. When tagging and normalizing terms in queries from the Camoni search logs, UMLS-normalized queries improved search results in 46% of the cases.
Conclusions: Cross-lingual UMLS entity linking from Hebrew is possible and improves search performance across communities. Annotated datasets, annotation guidelines, and code are made available online (https://github.com/yonatanbitton/mdtel).
14

Oura, Keiichiro, Junichi Yamagishi, Mirjam Wester, Simon King, and Keiichi Tokuda. "Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping." Speech Communication 54, no. 6 (July 2012): 703–14. http://dx.doi.org/10.1016/j.specom.2011.12.004.

15

Jawanpuria, Pratik, Arjun Balgovind, Anoop Kunchukuttan, and Bamdev Mishra. "Learning Multilingual Word Embeddings in Latent Metric Space: A Geometric Approach." Transactions of the Association for Computational Linguistics 7 (November 2019): 107–20. http://dx.doi.org/10.1162/tacl_a_00257.

Abstract:
We propose a novel geometric approach for learning bilingual mappings given monolingual embeddings and a bilingual dictionary. Our approach decouples the source-to-target language transformation into (a) language-specific rotations on the original embeddings to align them in a common, latent space, and (b) a language-independent similarity metric in this common space to better model the similarity between the embeddings. Overall, we pose the bilingual mapping problem as a classification problem on smooth Riemannian manifolds. Empirically, our approach outperforms previous approaches on the bilingual lexicon induction and cross-lingual word similarity tasks. We next generalize our framework to represent multiple languages in a common latent space. Language-specific rotations for all the languages and a common similarity metric in the latent space are learned jointly from bilingual dictionaries for multiple language pairs. We illustrate the effectiveness of joint learning for multiple languages in an indirect word translation setting.
16

Kann, Katharina, Samuel R. Bowman, and Kyunghyun Cho. "Learning to Learn Morphological Inflection for Resource-Poor Languages." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8058–65. http://dx.doi.org/10.1609/aaai.v34i05.6316.

Abstract:
We propose to cast the task of morphological inflection—mapping a lemma to an indicated inflected form—for resource-poor languages as a meta-learning problem. Treating each language as a separate task, we use data from high-resource source languages to learn a set of model parameters that can serve as a strong initialization point for fine-tuning on a resource-poor target language. Experiments with two model architectures on 29 target languages from 3 families show that our suggested approach outperforms all baselines. In particular, it obtains a 31.7% higher absolute accuracy than a previously proposed cross-lingual transfer model and outperforms the previous state of the art by 1.7% absolute accuracy on average over languages.
17

Dong, Qianqian, Mingxuan Wang, Hao Zhou, Shuang Xu, Bo Xu, and Lei Li. "Consecutive Decoding for Speech-to-text Translation." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 14 (May 18, 2021): 12738–48. http://dx.doi.org/10.1609/aaai.v35i14.17508.

Abstract:
Speech-to-text translation (ST), which directly translates the source language speech to the target language text, has attracted intensive attention recently. However, the combination of speech recognition and machine translation in a single model poses a heavy burden on the direct cross-modal cross-lingual mapping. To reduce the learning difficulty, we propose COnSecutive Transcription and Translation (COSTT), an integral approach for speech-to-text translation. The key idea is to generate source transcript and target translation text with a single decoder. It benefits the model training so that additional large parallel text corpus can be fully exploited to enhance the speech translation training. Our method is verified on three mainstream datasets, including Augmented LibriSpeech English-French dataset, TED English-German dataset, and TED English-Chinese dataset. Experiments show that our proposed COSTT outperforms the previous state-of-the-art methods. The code is available at https://github.com/dqqcasia/st.
18

Vasconcellos, Maria Lúcia. "Systemic functional translation studies (SFTS): the theory travelling in Brazilian environments." DELTA: Documentação de Estudos em Lingüística Teórica e Aplicada 25, spe (2009): 585–607. http://dx.doi.org/10.1590/s0102-44502009000300003.

Abstract:
This paper presents a mapping of the Systemic Functional Translation Studies (SFTS) tradition in the Brazilian environment, from its genesis up to developments in the 2000's. While studies during the earlier period are informed by the concept of "translation as (re)textualization", more recent SFTS research can be charted along the 'cline of instantiation', translations being investigated as "instantiations-in-contexts", cross-lingual functional varieties of language, or still as sources of SFL-based language description of Brazilian Portuguese. From the late 90's on, computerized corpora and corpus-based methodologies have been integrated into Brazilian SFTS, for which annotation methods for the tagging of SFL categories have been developed. The paper ends with a consideration of Brazilian SFTS against the background of international SFTS as disseminated in the 2nd HCLS.
19

Kuai, Xi, Lin Li, Heng Luo, Shen Hang, Zhijun Zhang, and Yu Liu. "Geospatial Information Categories Mapping in a Cross-lingual Environment: A Case Study of “Surface Water” Categories in Chinese and American Topographic Maps." ISPRS International Journal of Geo-Information 5, no. 6 (June 14, 2016): 90. http://dx.doi.org/10.3390/ijgi5060090.

20

Kitasako, Y., A. Sadr, H. Hamba, M. Ikeda, and J. Tagami. "Gum Containing Calcium Fluoride Reinforces Enamel Subsurface Lesions in situ." Journal of Dental Research 91, no. 4 (February 15, 2012): 370–75. http://dx.doi.org/10.1177/0022034512439716.

Abstract:
The aim of this study was to assess the effect of chewing gum containing phosphoryl oligosaccharides of calcium (POs-Ca) and a low concentration of fluoride (F) on the hardness of enamel subsurface lesions, utilizing a double-blind, randomized, and controlled in situ model. Fifteen individuals wore removable lingual appliances with 3 bovine-enamel insets containing subsurface demineralized lesions. Three times a day for 14 days, they chewed one of the 3 chewing gums (placebo, POs-Ca, POs-Ca+F). After the treatment period, cross-sectional mineral content, nanoindentation hardness, and fluoride ion mapping by time-of-flight secondary ion mass spectrometry (TOF-SIMS) were evaluated. Although there were no statistical differences in overall mineral content and hardness recovery rates between POs-Ca and POs-Ca+F subsurface lesions (p > 0.05), nanoindentation at 1-μm distance increments from the surface showed statistical differences in hardness recovery rate between POs-Ca and POs-Ca+F in the superficial 20-μm region (p < 0.05). Fluoride mapping revealed distribution of the ion up to 20 μm from the surface in the POs-Ca+F group. Nanoindentation and TOF-SIMS results highlighted the benefits of bioavailability of fluoride ion on reinforcement of the superficial zone of subsurface lesions in situ (NCT01377493).
21

Espinosa-Anke, Luis, Geraint Palmer, Padraig Corcoran, Maxim Filimonov, Irena Spasić, and Dawn Knight. "English–Welsh Cross-Lingual Embeddings." Applied Sciences 11, no. 14 (July 16, 2021): 6541. http://dx.doi.org/10.3390/app11146541.

Abstract:
Cross-lingual embeddings are vector space representations where word translations tend to be co-located. These representations enable learning transfer across languages, thus bridging the gap between data-rich languages such as English and others. In this paper, we present and evaluate a suite of cross-lingual embeddings for the English–Welsh language pair. To train the bilingual embeddings, a Welsh corpus of approximately 145 M words was combined with an English Wikipedia corpus. We used a bilingual dictionary to frame the problem of learning bilingual mappings as a supervised machine learning task, where a word vector space is first learned independently on a monolingual corpus, after which a linear alignment strategy is applied to map the monolingual embeddings to a common bilingual vector space. Two approaches were used to learn monolingual embeddings, including word2vec and fastText. Three cross-language alignment strategies were explored, including cosine similarity, inverted softmax and cross-domain similarity local scaling (CSLS). We evaluated different combinations of these approaches using two tasks, bilingual dictionary induction, and cross-lingual sentiment analysis. The best results were achieved using monolingual fastText embeddings and the CSLS metric. We also demonstrated that by including a few automatically translated training documents, the performance of a cross-lingual text classifier for Welsh can increase by approximately 20 percentage points.
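The CSLS criterion named in this abstract can be illustrated with a short NumPy sketch. It follows the commonly used definition, CSLS(x, y) = 2·cos(x, y) − r_T(x) − r_S(y), where each r term is the mean cosine similarity of a word to its k nearest neighbours in the other language. This is a sketch under stated assumptions, not the authors' code: the function names, the value of k, and the toy vectors are all hypothetical.

```python
import numpy as np


def csls_scores(mapped_src, tgt, k=2):
    """CSLS score matrix of shape (n_src, n_tgt) for already-aligned spaces."""
    s = mapped_src / np.linalg.norm(mapped_src, axis=1, keepdims=True)
    t = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = s @ t.T
    # Mean similarity of each source word to its k nearest target words.
    r_src = np.sort(cos, axis=1)[:, -k:].mean(axis=1)
    # Mean similarity of each target word to its k nearest source words.
    r_tgt = np.sort(cos, axis=0)[-k:, :].mean(axis=0)
    return 2 * cos - r_src[:, None] - r_tgt[None, :]


def induce_translations(mapped_src, tgt, k=2):
    """For each source word, index of the target word with the highest CSLS."""
    return csls_scores(mapped_src, tgt, k).argmax(axis=1)


# Toy data (an assumption): four target vectors and slightly shifted copies
# standing in for source vectors that have already been mapped.
tgt = np.eye(4)
mapped_src = np.eye(4) + 0.1
predicted = induce_translations(mapped_src, tgt)
```

Subtracting the two neighbourhood terms penalises "hub" words that are close to many candidates, which plain cosine nearest-neighbour retrieval does not do.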
22

Li, Yuling, Kui Yu, and Yuhong Zhang. "Learning Cross-Lingual Mappings in Imperfectly Isomorphic Embedding Spaces." IEEE/ACM Transactions on Audio, Speech, and Language Processing 29 (2021): 2630–42. http://dx.doi.org/10.1109/taslp.2021.3097935.

23

Hourrane, Oumaima, and El Habib Benlahmar. "Graph transformer for cross-lingual plagiarism detection." IAES International Journal of Artificial Intelligence (IJ-AI) 11, no. 3 (September 1, 2022): 905. http://dx.doi.org/10.11591/ijai.v11.i3.pp905-915.

Abstract:
The existence of vast amounts of multilingual textual data on the internet leads to cross-lingual plagiarism which becomes a serious issue in different fields such as education, science, and literature. Current cross-lingual plagiarism detection approaches usually employ syntactic and lexical properties, external machine translation systems, or finding similarities within a multilingual set of text documents. However, most of these methods are conceived for literal plagiarism such as copy and paste, and their performance is diminished when handling complex cases of plagiarism including paraphrasing. In this paper, we propose a new graph-based approach that represents text passages in different languages using knowledge graphs. We put forward a new graph structure modeling method based on the Transformer architecture that employs precise relation encoding and delivers a more efficient way for global graph representation. The mappings between the graphs are learned both in semi-supervised and unsupervised training mechanisms. The results of our experiments in Arabic–English, French–English, and Spanish–English plagiarism detection show that our graph transformer method surpasses the state-of-the-art cross-lingual plagiarism detection approaches with and without paraphrasing cases, and provides further insights on the use of knowledge graphs on a language-independent model.
24

Zhang, Yuhong, Yuling Li, Yi Zhu, and Xuegang Hu. "Wasserstein GAN based on Autoencoder with back-translation for cross-lingual embedding mappings." Pattern Recognition Letters 129 (January 2020): 311–16. http://dx.doi.org/10.1016/j.patrec.2019.11.033.

25

Liu, Gang, Yichao Dong, Kai Wang, and Zhizheng Yan. "A cross-lingual sentence pair interaction feature capture model based on pseudo-corpus and multilingual embedding." AI Communications, April 13, 2022, 1–14. http://dx.doi.org/10.3233/aic-210085.

Abstract:
Recently, the emergence of the digital language division and the availability of cross-lingual benchmarks have made research on cross-lingual texts more popular. However, the performance of existing methods based on mapping relations is not good enough, because sometimes the structures of language spaces are not isomorphic. Besides, polysemy makes the extraction of interaction features hard. For cross-lingual word embedding, a model named Cross-lingual Word Embedding Space Based on Pseudo Corpus (CWE-PC) is proposed to obtain cross-lingual and multilingual word embeddings. For cross-lingual sentence pair interaction feature capture, a Cross-language Feature Capture Based on Similarity Matrix (CFC-SM) model is built to extract cross-lingual interaction features. An ELMo pretrained model and multiple-layer convolution are used to alleviate polysemy and extract interaction features. These models are evaluated on multiple language pairs, and results show that they outperform the state-of-the-art cross-lingual word embedding methods.
26

Fu, Bo, Rob Brennan, and Declan O’Sullivan. "A Configurable Translation-Based Cross-Lingual Ontology Mapping System to Adjust Mapping Outcome." SSRN Electronic Journal, 2012. http://dx.doi.org/10.2139/ssrn.3198965.

27

Lin, Ying-Chi, Phillip Hoffmann, and Erhard Rahm. "Enhancing Cross-lingual Biomedical Concept Normalization Using Deep Neural Network Pretrained Language Models." SN Computer Science 3, no. 5 (July 21, 2022). http://dx.doi.org/10.1007/s42979-022-01295-7.

Abstract:
In this study, we propose a new approach for cross-lingual biomedical concept normalization, the process of mapping text in non-English documents to English concepts of a knowledge base. The resulting mappings, called semantic annotations, enhance data integration and interoperability of documents in different languages. The US FDA (Food and Drug Administration), therefore, requires all submitted medical forms to be semantically annotated. These standardized medical forms are used in health care practice and biomedical research and are translated/adapted into various languages. Mapping them to the same concepts (normally in English) facilitates the comparison of multiple medical studies even cross-lingually. However, the translation and adaptation of these forms can cause them to deviate from their original text syntactically and in wording. This leads the conventional string matching methods to produce low-quality annotation results. Therefore, our new approach incorporates semantics into the cross-lingual concept normalization process. This is done using sentence embeddings generated by BERT-based pretrained language models. We evaluate the new approach by annotating entire questions of German medical forms with concepts in English, as required by the FDA. The new approach achieves an improvement of 136% in recall, 52% in precision and 66% in F-measure compared to the conventional string matching methods.
28

Zhang, Meng, Haoruo Peng, Yang Liu, Huanbo Luan, and Maosong Sun. "Bilingual Lexicon Induction from Non-Parallel Data with Minimal Supervision." Proceedings of the AAAI Conference on Artificial Intelligence 31, no. 1 (February 12, 2017). http://dx.doi.org/10.1609/aaai.v31i1.10988.

Abstract:
Building bilingual lexica from non-parallel data is a long-standing natural language processing research problem that could benefit thousands of resource-scarce languages which lack parallel data. Recent advances in continuous word representations have opened up new possibilities for this task, e.g. by establishing a cross-lingual mapping between word embeddings via a seed lexicon. The method is, however, unreliable when only a limited number of seeds is available, which is a reasonable setting for resource-scarce languages. We tackle this limitation by introducing a novel matching mechanism into bilingual word representation learning. It captures extra translation pairs exposed by the seeds to incrementally improve the bilingual word embeddings. In our experiments, we find that the matching mechanism substantially improves the quality of the bilingual vector space, which in turn allows us to induce better bilingual lexica with as few as 10 seeds.
29

Osenova, Petya, and Kiril Simov. "The data-driven Bulgarian WordNet: BTBWN." Cognitive Studies | Études cognitives, no. 18 (December 20, 2018). http://dx.doi.org/10.11649/cs.1713.

Abstract:
The paper presents our work towards the simultaneous creation of a data-driven WordNet for Bulgarian and a manually annotated treebank with semantic information. Such an approach requires synchronization of the word senses in both the syntactic and the lexical resources, without limiting the WordNet senses to the corpus or vice versa. Our strategy focuses on the identification of senses used in BulTreeBank, but the missing senses of a lemma have also been covered through exploration of bigger corpora. The identified senses have been organized in synsets for the Bulgarian WordNet. Then they have been aligned to the Princeton WordNet synsets. Various types of mappings are considered between both resources in a cross-lingual aspect and with respect to ensuring maximum connectivity and potential for incorporating the language-specific concepts. The mapping between the two WordNets (English and Bulgarian) is a basis for applications such as machine translation and multilingual information retrieval.
30

Maros, Máté E., Chang Gyu Cho, Andreas G. Junge, Benedikt Kämpgen, Victor Saase, Fabian Siegel, Frederik Trinkmann, Thomas Ganslandt, Christoph Groden, and Holger Wenz. "Comparative analysis of machine learning algorithms for computer-assisted reporting based on fully automated cross-lingual RadLex mappings." Scientific Reports 11, no. 1 (March 9, 2021). http://dx.doi.org/10.1038/s41598-021-85016-9.

Abstract:
Computer-assisted reporting (CAR) tools were suggested to improve radiology report quality by context-sensitively recommending key imaging biomarkers. However, studies evaluating machine learning (ML) algorithms on cross-lingual ontological (RadLex) mappings for developing embedded CAR algorithms are lacking. Therefore, we compared ML algorithms developed on human expert-annotated features against those developed on fully automated cross-lingual (German to English) RadLex mappings using 206 CT reports of suspected stroke. The target label was whether the Alberta Stroke Programme Early CT Score (ASPECTS) should have been provided (yes/no: 154/52). We focused on probabilistic outputs of ML algorithms including tree-based methods, elastic net, support vector machines (SVMs) and fastText (linear classifier), which were evaluated in the same 5 × fivefold nested cross-validation framework. This allowed for model stacking and classifier rankings. Performance was evaluated using calibration metrics (AUC, Brier score, log loss) and calibration plots. Contextual ML-based assistance recommending ASPECTS was feasible. SVMs showed the highest accuracies both on human-extracted features (87%) and on RadLex features (findings: 82.5%; impressions: 85.4%). FastText achieved the highest accuracy (89.3%) and AUC (92%) on impressions. Boosted trees fitted on findings had the best calibration profile. Our approach provides guidance for choosing ML classifiers for CAR tools in a fully automated and language-agnostic fashion using bag-of-RadLex terms on limited expert-labelled training data.