Journal articles on the topic "Multilingual Modeling"




Consult the top 50 journal articles for research on the topic "Multilingual Modeling."


You can also download the full text of a publication as a PDF and read its online abstract, where the relevant parameters are available in the metadata.

Browse journal articles from a wide range of disciplines and compile your bibliography correctly.

1

Haas, Alison, Scott E. Grapin, Lorena Llosa, and Okhee Lee. "Computational Modeling With Multilingual Learners." Science and Children 60, no. 7 (September 2023): 64–70. http://dx.doi.org/10.1080/00368148.2023.12315941.

2

Santhosh Kumar, C., and V. P. Mohandas. "Robust features for multilingual acoustic modeling." International Journal of Speech Technology 14, no. 3 (May 11, 2011): 147–55. http://dx.doi.org/10.1007/s10772-011-9092-6.

3

Grutman, Rainier. "The Missing Link: Modeling Readers of Multilingual Writing." Journal of Literary Multilingualism 1, no. 1 (May 2023): 15–36. http://dx.doi.org/10.1163/2667324x-20230103.

Abstract:
This contribution tries to fill the gap concerning the place and role of readers in multilingual studies by focusing on the ways in which multilingual texts both do and do not create multilingual readers. Three scenarios are illustrated with two examples each. So-called ‘shared multilingualism’ implies bilingual competence (and excludes monolingual readers) by juxtaposing languages with little overlap. Other texts exhibit more than one language yet construct a monolingual reader, while others still reward bilingual competence and at the same time accommodate monolingual incompetence.
4

Park, Hyunji Hayley, Katherine J. Zhang, Coleman Haley, Kenneth Steimel, Han Liu, and Lane Schwartz. "Morphology Matters: A Multilingual Language Modeling Analysis." Transactions of the Association for Computational Linguistics 9 (March 17, 2021): 261–76. http://dx.doi.org/10.1162/tacl_a_00365.

Abstract:
Prior studies in multilingual language modeling (e.g., Cotterell et al., 2018; Mielke et al., 2019) disagree on whether or not inflectional morphology makes languages harder to model. We attempt to resolve the disagreement and extend those studies. We compile a larger corpus of 145 Bible translations in 92 languages and a larger number of typological features. We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features. We find that several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data. We also investigate linguistically motivated subword segmentation strategies like Morfessor and Finite-State Transducers (FSTs) and find that these segmentation strategies yield better performance and reduce the impact of a language’s morphology on language modeling.
5

Lindén, Krister. "Multilingual modeling of cross-lingual spelling variants." Information Retrieval 9, no. 3 (June 2006): 295–310. http://dx.doi.org/10.1007/s10791-006-1541-5.

6

Han, Yao Jun, and Xue Mei Luo. "Modeling and Analysis of Multilingual Information Parallel Downloads in Data Grid." Applied Mechanics and Materials 263–266 (December 2012): 1424–28. http://dx.doi.org/10.4028/www.scientific.net/amm.263-266.1424.

Abstract:
The need for powerful graphical and analytical tools arises in parallel downloads of multilingual information, as information in a variety of languages is distributed across different Web pages, and the databases in a data grid are heterogeneous and uneven. A Petri net is a powerful graphical and mathematical tool for describing concurrent, asynchronous, and dynamic events. The parallel downloading of multilingual information was modeled and analyzed using an extended timed colored Petri net (ETSdCPN). In the ETSdCPN model, color represents information in different languages, and the time duration associated with a place, rather than a transition, is a function of tokens rather than a constant. The reachable parallel download graph (RPDG) of the ETSdCPN is defined. Finally, important results such as the satisfaction rate and makespan of multilingual information parallel downloads are obtained by analyzing the reachability of the Petri net.
7

Song, Guizhe, Degen Huang, and Zhifeng Xiao. "A Study of Multilingual Toxic Text Detection Approaches under Imbalanced Sample Distribution." Information 12, no. 5 (May 12, 2021): 205. http://dx.doi.org/10.3390/info12050205.

Abstract:
Multilingual characteristics, lack of annotated data, and imbalanced sample distribution are the three main challenges for toxic comment analysis in a multilingual setting. This paper proposes a multilingual toxic text classifier which adopts a novel fusion strategy that combines different loss functions and multiple pre-training models. Specifically, the proposed learning pipeline starts with a series of pre-processing steps, including translation, word segmentation, purification, text digitization, and vectorization, to convert word tokens to a vectorized form suitable for the downstream tasks. Two models, multilingual bidirectional encoder representation from transformers (MBERT) and XLM-RoBERTa (XLM-R), are employed for pre-training through Masking Language Modeling (MLM) and Translation Language Modeling (TLM), which incorporate semantic and contextual information into the models. We train six base models and fuse them to obtain three fusion models using the F1 scores as the weights. The models are evaluated on the Jigsaw Multilingual Toxic Comment dataset. Experimental results show that the best fusion model outperforms the two state-of-the-art models, MBERT and XLM-R, in F1 score by 5.05% and 0.76%, respectively, verifying the effectiveness and robustness of the proposed fusion strategy.
8

Hao, Shudong, and Michael J. Paul. "An Empirical Study on Crosslingual Transfer in Probabilistic Topic Models." Computational Linguistics 46, no. 1 (March 2020): 95–134. http://dx.doi.org/10.1162/coli_a_00369.

Abstract:
Probabilistic topic modeling is a common first step in crosslingual tasks to enable knowledge transfer and extract multilingual features. Although many multilingual topic models have been developed, their assumptions about the training corpus are quite varied, and it is not clear how well the different models can be utilized under various training conditions. In this article, the knowledge transfer mechanisms behind different multilingual topic models are systematically studied, and through a broad set of experiments with four models on ten languages, we provide empirical insights that can inform the selection and future development of multilingual topic models.
9

Rahimi, Razieh, Azadeh Shakery, and Irwin King. "Multilingual information retrieval in the language modeling framework." Information Retrieval Journal 18, no. 3 (May 6, 2015): 246–81. http://dx.doi.org/10.1007/s10791-015-9255-1.

10

Mitchell, Joan S., Marcia Lei Zeng, and Maja Žumer. "Modeling Classification Systems in Multicultural and Multilingual Contexts." Cataloging & Classification Quarterly 52, no. 1 (December 18, 2013): 90–101. http://dx.doi.org/10.1080/01639374.2013.845620.

11

Teferra, Solomon, Martha Yifiru, and Tanja Schultz. "DNN-based Multilingual Acoustic Modeling for Four Ethiopian Languages." SINET: Ethiopian Journal of Science 46, no. 3 (March 27, 2024): 237–49. http://dx.doi.org/10.4314/sinet.v46i3.2.

Abstract:
In this paper, we present the results of experiments on multilingual acoustic modeling for the development of an Automatic Speech Recognition (ASR) system using speech data from phonetically closely related Ethiopian languages (Amharic, Tigrigna, Oromo, and Wolaytta) with multilingual (ML) mix and multitask approaches. Using speech data from only phonetically closely related languages improved on the results reported in a previous work that used 26 languages (including the four languages). A maximum Word Error Rate (WER) reduction from 25.03% (in the previous work) to 21.52% has been achieved for Wolaytta, a relative WER reduction of 14.02%. As a result of using multilingual acoustic modeling for ASR development, a relative WER reduction of up to 7.36% (a WER reduction from 23.23% to 21.52%) has been achieved over a monolingual ASR. Compared to the ML mix, the multitask approach brought a better performance improvement (a relative WER reduction of up to 5.9%). Experiments were also conducted pairing Amharic with Tigrigna and Oromo with Wolaytta. The results showed that languages with relatively richer resources for lexical and language modeling (Amharic and Tigrigna) benefited from the use of speech data from only two languages. Generally, the findings show that using speech corpora of phonetically related languages with the multitask multilingual modeling approach to develop ASR systems for less-resourced languages is a promising solution.
12

Pian, Weiguo, Hanyu Peng, Xunzhu Tang, Tiezhu Sun, Haoye Tian, Andrew Habib, Jacques Klein, and Tegawendé F. Bissyandé. "MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 4 (June 26, 2023): 5239–47. http://dx.doi.org/10.1609/aaai.v37i4.25654.

Abstract:
Representation learning of source code is essential for applying machine learning to software engineering tasks. Learning code representation from a multilingual source code dataset has been shown to be more effective than learning from single-language datasets separately, since more training data from a multilingual dataset improves the model's ability to extract language-agnostic information from source code. However, existing multilingual training overlooks the language-specific information which is crucial for modeling source code across different programming languages, while only focusing on learning a unified model with shared parameters among different languages for language-agnostic information modeling. To address this problem, we propose MetaTPTrans, a meta learning approach for multilingual code representation learning. MetaTPTrans generates different parameters for the feature extractor according to the specific programming language type of the input code snippet, enabling the model to learn both language-agnostic and language-specific information with dynamic parameters in the feature extractor. We conduct experiments on the code summarization and code completion tasks to verify the effectiveness of our approach. The results demonstrate the superiority of our approach with significant improvements on state-of-the-art baselines.
13

Lewoniewski, Włodzimierz, Krzysztof Węcel, and Witold Abramowicz. "Modeling Popularity and Reliability of Sources in Multilingual Wikipedia." Information 11, no. 5 (May 13, 2020): 263. http://dx.doi.org/10.3390/info11050263.

Abstract:
One of the most important factors affecting the quality of content in Wikipedia is the presence of reliable sources. By following references, readers can verify facts or find more details about the described topic. A Wikipedia article can be edited independently in any of over 300 languages, even by anonymous users, so information about the same topic may be inconsistent. This also applies to the use of references in different language versions of a particular article, so the same statement can have different sources. In this paper we analyzed over 40 million articles from the 55 most developed language versions of Wikipedia to extract information about over 200 million references and find the most popular and reliable sources. We present 10 models for assessing the popularity and reliability of sources, based on analysis of meta information about the references in Wikipedia articles, page views, and authors of the articles. Using DBpedia and Wikidata we automatically identified the alignment of the sources to a specific domain. Additionally, we analyzed changes in popularity and reliability over time and identified growth leaders in each of the considered months. The results can be used to improve the quality of content in different language versions of Wikipedia.
14

Hermann, Enno, Herman Kamper, and Sharon Goldwater. "Multilingual and unsupervised subword modeling for zero-resource languages." Computer Speech & Language 65 (January 2021): 101098. http://dx.doi.org/10.1016/j.csl.2020.101098.

15

Natvig, David. "Modeling Heritage Language Phonetics and Phonology: Toward an Integrated Multilingual Sound System." Languages 6, no. 4 (December 14, 2021): 209. http://dx.doi.org/10.3390/languages6040209.

Abstract:
Although heritage language phonology is often argued to be fairly stable, heritage language speakers often sound noticeably different from both monolinguals and second-language learners. In order to model these types of asymmetries, I propose a theoretical framework—an integrated multilingual sound system—based on modular representations of an integrated set of phonological contrasts. An examination of general findings in laryngeal (voicing, aspiration, etc.) phonetics and phonology for heritage languages shows that procedures for pronouncing phonemes are variable and plastic, even if abstract representations may remain stable. Furthermore, an integrated multilingual sound system predicts that use of one language may require a subset of the available representations, which illuminates the mechanisms that underlie phonological transfer, attrition, and acquisition.
16

Shliazhko, Oleh, Alena Fenogenova, Maria Tikhonova, Anastasia Kozlova, Vladislav Mikhailov, and Tatiana Shavrina. "mGPT: Few-Shot Learners Go Multilingual." Transactions of the Association for Computational Linguistics 12 (2024): 58–79. http://dx.doi.org/10.1162/tacl_a_00633.

Abstract:
This paper introduces mGPT, a multilingual variant of GPT-3, pretrained on 61 languages from 25 linguistically diverse language families using Wikipedia and the C4 Corpus. We detail the design and pretraining procedure. The models undergo an intrinsic and extrinsic evaluation: language modeling in all languages, downstream evaluation on cross-lingual NLU datasets and benchmarks in 33 languages, and world knowledge probing in 23 languages. The in-context learning abilities are on par with the contemporaneous language models while covering a larger number of languages, including underrepresented and low-resource languages of the Commonwealth of Independent States and the indigenous peoples in Russia. The source code and the language models are publicly available under the MIT license.
17

Li, Rui, Liyang He, Qi Liu, Yuze Zhao, Zheng Zhang, Zhenya Huang, Yu Su, and Shijin Wang. "CONSIDER: Commonalities and Specialties Driven Multilingual Code Retrieval Framework." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8679–87. http://dx.doi.org/10.1609/aaai.v38i8.28713.

Abstract:
Multilingual code retrieval aims to find code snippets relevant to a user's query from a multilingual codebase, which plays a crucial role in software development and expands their application scenarios compared to classical monolingual code retrieval. Despite the performance improvements achieved by previous studies, two crucial problems are overlooked in the multilingual scenario. First, certain programming languages face data scarcity in specific domains, resulting in limited representation capabilities within those domains. Second, different programming languages can be used interchangeably within the same domain, making it challenging for multilingual models to accurately identify the intended programming language of a user's query. To address these issues, we propose the CommONalities and SpecIalties Driven Multilingual CodE Retrieval Framework (CONSIDER), which includes two modules. The first module enhances the representation of various programming languages by modeling pairwise and global commonalities among them. The second module introduces a novel contrastive learning negative sampling algorithm that leverages language confusion to automatically extract specific language features. Through our experiments, we confirm the significant benefits of our model in real-world multilingual code retrieval scenarios in various aspects. Furthermore, an evaluation demonstrates the effectiveness of our proposed CONSIDER framework in monolingual scenarios as well. Our source code is available at https://github.com/smsquirrel/consider.
18

Choi, Sung-Kwon, and Younggil Kim. "Linguistic Modeling for Multilingual Machine Translation based on Common Transfer." Language and Information 18, no. 1 (June 30, 2014): 77–97. http://dx.doi.org/10.29403/li.18.1.4.

19

Nejad, Gholamali, and Mohammadreza Rostamzadeh. "Towards an Evaluation Framework for Multilingual Supported Data Modeling Patterns." International Journal of Computer Applications 143, no. 10 (June 17, 2016): 9–13. http://dx.doi.org/10.5120/ijca2016910364.

20

Stepykin, N. I. "Experience in Modeling Associative Fields (Project 'Multilingual Associative Thesaurus of Politeness')." Nauchnyi dialog, no. 3 (March 27, 2021): 106–20. http://dx.doi.org/10.24224/2227-1295-2021-3-106-120.

Abstract:
The article is devoted to modeling associative fields vezhlivaya (f) and vezhlivyy (m) (polite) based on the materials of the project “Multilingual associative thesaurus of politeness”. The relevance of the study is due to the need to identify the structure and content of the associative-verbal network of a native speaker, which is possible when referring to the data of a free associative experiment. The author considers the combination of stimulus-response as a speech action. The novelty of the research lies in the fact that the analysis of associative data is carried out based on the operational model of speech production of distributive activation, which makes it possible to explain the presence of various reactions in the structure of the associative field. When analyzing each speech action and operation, the principle of approaching the word as a unity of the acoustic image and concept is considered. This indissoluble unity is manifested in the simultaneous mechanism of speech actions of conceptualization and internal articulation. A comparative analysis of the associations of the respondents in the masculine and female groups based on the operational model of speech production of distributive activation made it possible to identify universal and gender-specific features in the structure and content of the analyzed associative fields. It is concluded that it is possible to use the speech production model developed by the author in modeling associative fields.
21

Jayanna, H. S., and B. G. Nagaraja. "An Experimental Comparison of Modeling Techniques and Combination of Speaker-Specific Information from Different Languages for Multilingual Speaker Identification." Journal of Intelligent Systems 25, no. 4 (October 1, 2016): 529–38. http://dx.doi.org/10.1515/jisys-2014-0128.

Abstract:
Most of the state-of-the-art speaker identification systems work in a monolingual (preferably English) scenario. Therefore, English-language autocratic countries can use the system efficiently for speaker recognition. However, there are many countries, including India, that are multilingual in nature. People in such countries are habituated to speaking multiple languages. The existing speaker identification system may yield poor performance if a speaker’s train and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of the modeling techniques, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers for multilingual speaker identification, is presented. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers of our own database. It is observed from the experimental results that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose a combination of speaker-specific information from different languages for crosslingual speaker identification, and it is observed that the combination feature gives better performance in all the crosslingual speaker identification experiments.
22

Kim, Hyunah, Christine Barron, Jeanne Sinclair, and Eunice Eunhee Jang. "Change in home language environment and English literacy achievement over time: A multi-group latent growth curve modeling investigation." Language Testing 37, no. 4 (June 30, 2020): 573–99. http://dx.doi.org/10.1177/0265532220930348.

Abstract:
In most studies investigating the educational outcomes of linguistically diverse students, variables that identify this population have been considered as static. In reality, owing to the dynamic nature of students and their families, students’ home language environments change over time. This study aims to understand how elementary school students’ home language environments change over time, and how longitudinal patterns of English literacy achievement across grades 3, 6, and 10 differ among students with various home language shift patterns in Ontario, Canada. The longitudinal cohort data of 89,609 students between grades 3 and 10 from the provincial assessments were analyzed for changes in their home language environment. A subsample of 18,000 students was used to examine different patterns of relative literacy performance over time and their associations with immigration background and early intervention programming using multi-group latent growth curve modeling. Our findings suggest a strong movement toward an English-dominant home language environment among multilingual students; yet, students whose homes remained as multilingual demonstrated the highest literacy achievement in the early grade as well as the highest improvement in relative performance over time. The paper draws implications for promoting students’ home language, instilling a positive view of multilingual competence.
23

Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. "Distilling Monolingual Models from Large Multilingual Transformers." Electronics 12, no. 4 (February 18, 2023): 1022. http://dx.doi.org/10.3390/electronics12041022.

Abstract:
Although language modeling has been trending upwards steadily, models available for low-resourced languages are limited to large multilingual models such as mBERT and XLM-RoBERTa, which come with significant overheads for deployment vis-à-vis their model size, inference speeds, etc. We attempt to tackle this problem by proposing a novel methodology to apply knowledge distillation techniques to filter language-specific information from a large multilingual model into a small, fast monolingual model that can often outperform the teacher model. We demonstrate the viability of this methodology on two downstream tasks each for six languages. We further dive into the possible modifications to the basic setup for low-resourced languages by exploring ideas to tune the final vocabulary of the distilled models. Lastly, we perform a detailed ablation study to understand the different components of the setup better and find out what works best for the two under-resourced languages, Swahili and Slovene.
24

Mercha, El Mahdi, Houda Benbrahim, and Mohammed Erradi. "Heterogeneous text graph for comprehensive multilingual sentiment analysis: capturing short- and long-distance semantics." PeerJ Computer Science 10 (February 23, 2024): e1876. http://dx.doi.org/10.7717/peerj-cs.1876.

Abstract:
Multilingual sentiment analysis (MSA) involves the task of comprehending people’s opinions, sentiments, and emotions in multilingual written texts. This task has garnered considerable attention due to its importance in extracting insights for decision-making across diverse fields such as marketing, finance, and politics. Several studies have explored MSA using deep learning methods. Nonetheless, a majority of these studies depend on sequential-based approaches, which focus on capturing short-distance semantics within adjacent word sequences, but they overlook long-distance semantics, which can provide more profound insights for analysis. In this work, we propose an approach for multilingual sentiment analysis, namely MSA-GCN, leveraging a graph convolutional network to effectively capture both short- and long-distance semantics. MSA-GCN involves the comprehensive modeling of the multilingual sentiment analysis corpus through a unified heterogeneous text graph. Subsequently, a slightly deep graph convolutional network is employed to acquire predictive representations for all nodes by encouraging the transfer learning across languages. Extensive experiments are carried out on various language combinations using different benchmark datasets to assess the efficiency of the proposed approach. These datasets include the Multilingual Amazon Reviews Corpus (MARC), Internet Movie Database (IMDB), Allociné, and Muchocine. The achieved results reveal that MSA-GCN significantly outperformed all baseline models in almost all datasets with a p-value < 0.05 based on Student’s t-test. In addition, the approach shows prominent results in a variety of language combinations, revealing its robustness against language variation.
25

Fu, Hui. "Gaussian Mixture Modeling of Neighbor Characters for Multilingual Text Extraction in Images." Journal of Computer Research and Development 44, no. 11 (2007): 1920. http://dx.doi.org/10.1360/crad20071115.

26

Perrier, Pascal, and Susanne Fuchs. "Speed-curvature relations in speech production: a multilingual experimental and modeling study." Journal of the Acoustical Society of America 123, no. 5 (May 2008): 3330. http://dx.doi.org/10.1121/1.2933840.

27

Chauhan, Uttam, and Apurva Shah. "Topic Modeling Using Latent Dirichlet Allocation." ACM Computing Surveys 54, no. 7 (September 30, 2022): 1–35. http://dx.doi.org/10.1145/3462478.

Abstract:
We are not able to deal with a mammoth text corpus without summarizing it into a relatively small subset. A computational tool is urgently needed to understand such a gigantic pool of text. Probabilistic topic modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling techniques and review their extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models from multilingual perspectives. Besides, research on topic modeling in distributed environments and topic visualization approaches has also been explored. We also cover the implementation and evaluation techniques for topic models in brief. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.
28

Kreutzer, Julia, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, et al. "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets." Transactions of the Association for Computational Linguistics 10 (2022): 50–72. http://dx.doi.org/10.1162/tacl_a_00447.

Abstract:
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, Web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases.
29

Majewska, Olga, Evgeniia Razumovskaia, Edoardo M. Ponti, Ivan Vulić, and Anna Korhonen. "Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation." Transactions of the Association for Computational Linguistics 11 (2023): 139–56. http://dx.doi.org/10.1162/tacl_a_00539.

Abstract:
Multilingual task-oriented dialogue (ToD) facilitates access to services and information for many (communities of) speakers. Nevertheless, its potential is not fully realized, as current multilingual ToD datasets—both for modular and end-to-end modeling—suffer from severe limitations. 1) When created from scratch, they are usually small in scale and fail to cover many possible dialogue flows. 2) Translation-based ToD datasets might lack naturalness and cultural specificity in the target language. In this work, to tackle these limitations we propose a novel outline-based annotation process for multilingual ToD datasets, where domain-specific abstract schemata of dialogue are mapped into natural language outlines. These in turn guide the target language annotators in writing dialogues by providing instructions about each turn’s intents and slots. Through this process we annotate a new large-scale dataset for evaluation of multilingual and cross-lingual ToD systems. Our Cross-lingual Outline-based Dialogue dataset (cod) enables natural language understanding, dialogue state tracking, and end-to-end dialogue evaluation in 4 diverse languages: Arabic, Indonesian, Russian, and Kiswahili. Qualitative and quantitative analyses of cod versus an equivalent translation-based dataset demonstrate improvements in data quality, unlocked by the outline-based approach. Finally, we benchmark a series of state-of-the-art systems for cross-lingual ToD, setting reference scores for future work and demonstrating that cod prevents over-inflated performance, typically met with prior translation-based ToD datasets.
APA, Harvard, Vancouver, ISO and other citation styles
30

Blake, John, Natalia Bogach, Akemi Kusakari, Iurii Lezhenin, Veronica Khaustova, Son Luu Xuan, Van Nhi Nguyen et al. "An Open CAPT System for Prosody Practice: Practical Steps towards Multilingual Setup". Languages 9, No. 1 (January 12, 2024): 27. http://dx.doi.org/10.3390/languages9010027.

Full text of the source
Annotation:
This paper discusses the challenges posed in creating a Computer-Assisted Pronunciation Training (CAPT) environment for multiple languages. By selecting one language from each of three different language families, we show that a single environment may be tailored to cater for different target languages. We detail the challenges faced during the development of a multimodal CAPT environment comprising a toolkit that manages mobile applications using speech signal processing, visualization, and estimation algorithms. Since the applied underlying mathematical and phonological models, as well as the feedback production algorithms, are based on sound signal processing and modeling rather than on particular languages, the system is language-agnostic and serves as an open toolkit for developing phrasal intonation training exercises for an open selection of languages. However, it was necessary to tailor the CAPT environment to the language-specific particularities in the multilingual setups, especially the additional requirements for adequate and consistent speech evaluation and feedback production. In our work, we describe our response to the challenges in visualizing and segmenting recorded pitch signals and modeling the language melody and rhythm necessary for such a multilingual adaptation, particularly for tonal syllable-timed and mora-timed languages.
APA, Harvard, Vancouver, ISO and other citation styles
31

Amara, Amina, Mohamed Ali Hadj Taieb and Mohamed Ben Aouicha. "Multilingual topic modeling for tracking COVID-19 trends based on Facebook data analysis". Applied Intelligence 51, No. 5 (February 13, 2021): 3052–73. http://dx.doi.org/10.1007/s10489-020-02033-3.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
32

KITA, KENJI. "Reconstructing the Language Family Tree from Multilingual Corpus Based on Probabilistic Language Modeling". Journal of Natural Language Processing 4, No. 3 (1997): 71–82. http://dx.doi.org/10.5715/jnlp.4.3_71.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
33

Bouselmi, G., D. Fohr and I. Illina. "Multilingual recognition of non-native speech using acoustic model transformation and pronunciation modeling". International Journal of Speech Technology 15, No. 2 (March 8, 2012): 203–13. http://dx.doi.org/10.1007/s10772-012-9134-8.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
34

Vulić, Ivan, Wim De Smet, Jie Tang and Marie-Francine Moens. "Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications". Information Processing & Management 51, No. 1 (January 2015): 111–47. http://dx.doi.org/10.1016/j.ipm.2014.08.003.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
35

Miller, R. A., R. H. Baud, J. R. Scherrer and A. M. Rassinoux. "Modeling Concepts in Medicine for Medical Language Understanding". Methods of Information in Medicine 37, No. 04/05 (October 1998): 361–72. http://dx.doi.org/10.1055/s-0038-1634561.

Full text of the source
Annotation:
Over the past two decades, the construction of models for medical concept representation and for understanding of the deep meaning of medical narrative texts have been challenging areas of medical informatics research. This review highlights how these two inter-related domains have evolved, emphasizing aspects of medical modeling as a tool for medical language understanding. A representation schema, which balances partially but accurately with complete but complex representations of domain-specific knowledge, must be developed to facilitate language understanding. Representative examples are drawn from two major independent efforts undertaken by the authors: the elaboration and the subsequent adjustment of the RECIT multilingual analyzer to include a robust medical concept model, and the recasting of a frame-based interlingua system, originally developed to map equivalent concepts between controlled clinical vocabularies, to invoke a similar concept model.
APA, Harvard, Vancouver, ISO and other citation styles
36

Nagaraja, B. G. and H. S. Jayanna. "Multilingual Speaker Identification by Combining Evidence from LPR and Multitaper MFCC". Journal of Intelligent Systems 22, No. 3 (September 1, 2013): 241–51. http://dx.doi.org/10.1515/jisys-2013-0038.

Full text of the source
Annotation:
In this work, the significance of combining the evidence from multitaper mel-frequency cepstral coefficients (MFCC), linear prediction residual (LPR), and linear prediction residual phase (LPRP) features for multilingual speaker identification with the constraint of limited data condition is demonstrated. The LPR is derived from linear prediction analysis, and LPRP is obtained by dividing the LPR using its Hilbert envelope. The sine-weighted cepstrum estimators (SWCE) with six tapers are considered for multitaper MFCC feature extraction. The Gaussian mixture model–universal background model is used for modeling each speaker for different evidence. The evidence is then combined at scoring level to improve the performance. The monolingual, crosslingual, and multilingual speaker identification studies were conducted using 30 randomly selected speakers from the IITG multivariability speaker recognition database. The experimental results show that the combined evidence improves the performance by nearly 8–10% compared with individual evidence.
APA, Harvard, Vancouver, ISO and other citation styles
37

Alexandrowicz, Viviana and Bobbi Hansen. "Addressing Multilingual Learners’ Language Needs Through Engaging Inquiry-Based Science". English Language Teaching 16, No. 10 (September 30, 2023): 73. http://dx.doi.org/10.5539/elt.v16n10p73.

Full text of the source
Annotation:
This article presents an overview of a 3-year series of workshops on teaching and providing multilingual learners (MLs) with access to science by utilizing effective language development strategies. The workshops were delivered to primary teachers in California and included three different modules: (a) Think and Question Like a Scientist, (b) Observe and Record Like a Scientist through Science Notebooking, and (c) Argue Like a Scientist. The activities showcase effective, research-based second language acquisition (SLA) strategies, including providing comprehensible input via paraphrasing, using visual and media resources, gestures, and the student’s native language, and modeling tasks. Additionally, scaffolding academic language through personal dictionaries, sentence frames, and native language support constitutes some of the ideas shared. Detailed descriptions highlight the “how” of addressing the needs of MLs at a variety of proficiency levels.
APA, Harvard, Vancouver, ISO and other citation styles
38

Mykhalchuk, Nataliia, Pavlo Levchuk, Ernest Ivashkevych and Alexander Nabochuk. "Dynamic Models of Multilingualism on the Territory of Western Ukraine". PSYCHOLINGUISTICS 33, No. 2 (February 21, 2023): 114–44. http://dx.doi.org/10.31470/2309-1797-2023-33-2-114-144.

Full text of the source
Annotation:
The purpose of the article is to study lexical units, with the help of which it becomes possible to build up the models of multilingualism, which are dominant among the population on the territory of Western Ukraine. Methods. Theoretical methods – categorical and structurally-functional analysis of the texts, the methods of systematization, modeling, generalization; empirical ones – the analysis of lexical units, the experiment. For the purpose of studying the models of multilingualism we used “The Methodology of studying the models of multilingualism on the territory of Western Ukraine (by the influence of Russian, English and German)” (Mykhalchuk & Ivashkevych, 2022). Results. Dynamic models of multilingualism on the territory of Western Ukraine are: the Model of Balanced Ambilingualism and the Model of Unbalanced or Asymmetric Bilingualism. There are two types of Balanced Ambilingualism: (1) the Model of Ambilingual Balanced Bilingualism. It emphasizes that both language systems are developed to the highest level of perfect mastery of the language as mastering a native one; (2) the Model of Non-Ambilingual Balanced Bilingualism implies that both language systems aren’t at the same level of their development. Unbalanced or Asymmetric Bilingualism is presented by two sub-models: (1) Transitional Bilingualism; (2) Stable Dominant Multilingualism. Conclusions. Any multilingual system is not reduced to the summation of different monolingual systems. Multilingual psycholinguistic systems of the person are open ones. The bilingual’s metalinguistic abilities show a strengthening effect when the person is studying not only the second, but also the third or more languages. Accumulating such advantages as cognitive variability (mobility), metalinguistic abilities, metapragmatic and sociocultural “awareness”, multilinguals also accumulate some disadvantages: a deficit in the level of language proficiency due to interlanguage interactions; limitations in language acquisition and language efforts.
APA, Harvard, Vancouver, ISO and other citation styles
39

Tachbelie, Martha Yifiru, Solomon Teferra Abate and Tanja Schultz. "Multilingual speech recognition for GlobalPhone languages". Speech Communication 140 (May 2022): 71–86. http://dx.doi.org/10.1016/j.specom.2022.03.006.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
40

Khademi Zahedi, Reza, Naif Alajlan, Hooman Khademi Zahedi and Timon Rabczuk. "Multilingual Sentiment Mining System to Prognosticate Governance". Computers, Materials & Continua 71, No. 1 (2022): 389–406. http://dx.doi.org/10.32604/cmc.2022.021384.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
41

K. Alnahdi, Amany. "A Framework for Building a Multilingual Industrial Ontology: Methodology and a Case Study for Building Smartphone English-Arabic Ontology". International Journal of Web & Semantic Technology 12, No. 03 (July 31, 2021): 15–21. http://dx.doi.org/10.5121/ijwest.2021.12302.

Full text of the source
Annotation:
As Web 3.0 is blooming, ontologies augment semantic Web with semi–structured knowledge. Industrial ontologies can help in improving online commercial communication and marketing. In addition, conceptualizing the enterprise knowledge can improve information retrieval for industrial applications. Having ontologies combine multiple languages can help in delivering the knowledge to a broad sector of Internet users. In addition, multi-lingual ontologies can also help in commercial transactions. This research paper provides a framework model for building industrial multilingual ontologies which include Corpus Determination, Filtering, Analysis, Ontology Building, and Ontology Evaluation. It also addresses factors to be considered when modeling multilingual ontologies. A case study for building a bilingual English-Arabic ontology for smart phones is presented. The ontology was illustrated using an ontology editor and visualization tool. The built ontology consists of 67 classes and 18 instances presented in both Arabic and English. In addition, applications for using the ontology are presented. Future research directions for the built industrial ontology are presented.
APA, Harvard, Vancouver, ISO and other citation styles
42

Lee, Jaeseong, Dohyeon Lee and Seung-won Hwang. "Script, Language, and Labels: Overcoming Three Discrepancies for Low-Resource Language Specialization". Proceedings of the AAAI Conference on Artificial Intelligence 37, No. 11 (June 26, 2023): 13004–13. http://dx.doi.org/10.1609/aaai.v37i11.26528.

Full text of the source
Annotation:
Although multilingual pretrained models (mPLMs) enabled support of various natural language processing in diverse languages, its limited coverage of 100+ languages lets 6500+ languages remain ‘unseen’. One common approach for an unseen language is specializing the model for it as target, by performing additional masked language modeling (MLM) with the target language corpus. However, we argue that, due to the discrepancy from multilingual MLM pretraining, a naive specialization as such can be suboptimal. Specifically, we pose three discrepancies to overcome. Script and linguistic discrepancy of the target language from the related seen languages, hinder a positive transfer, for which we propose to maximize representation similarity, unlike existing approaches maximizing overlaps. In addition, label space for MLM prediction can vary across languages, for which we propose to reinitialize top layers for a more effective adaptation. Experiments over four different language families and three tasks shows that our method improves the task performance of unseen languages with statistical significance, while previous approach fails to.
APA, Harvard, Vancouver, ISO and other citation styles
43

Gutiérrez-Fandiño, Asier, David Pérez-Fernández, Jordi Armengol-Estapé, David Griol, Ksenia Kharitonova and Zoraida Callejas. "esCorpius-m: A Massive Multilingual Crawling Corpus with a Focus on Spanish". Applied Sciences 13, No. 22 (November 8, 2023): 12155. http://dx.doi.org/10.3390/app132212155.

Full text of the source
Annotation:
In recent years, transformer-based models have played a significant role in advancing language modeling for natural language processing. However, they require substantial amounts of data and there is a shortage of high-quality non-English corpora. Some recent initiatives have introduced multilingual datasets obtained through web crawling. However, there are notable limitations in the results for some languages, including Spanish. These datasets are either smaller compared to other languages or suffer from lower quality due to insufficient cleaning and deduplication. In this paper, we present esCorpius-m, a multilingual corpus extracted from around 1 petabyte of Common Crawl data. It is the most extensive corpus for some languages with such a level of high-quality content extraction, cleanliness, and deduplication. Our data curation process involves an efficient cleaning pipeline and various deduplication methods that maintain the integrity of document and paragraph boundaries. We also ensure compliance with EU regulations by retaining both the source web page URL and the WARC shared origin URL.
APA, Harvard, Vancouver, ISO and other citation styles
44

Liu, Xiabi, Hui Fu and Yunde Jia. "Gaussian mixture modeling and learning of neighboring characters for multilingual text extraction in images". Pattern Recognition 41, No. 2 (February 2008): 484–93. http://dx.doi.org/10.1016/j.patcog.2007.06.004.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
45

Grapin, Scott E., Sharon Dudek and Okhee Lee. "Justice-Centered STEM Education With Multilingual Learners: Computational Modeling to Address COVID-19 Disparities". Science Scope 46, No. 5 (May 2023): 36–44. http://dx.doi.org/10.1080/19434901.2023.12290258.

Full text of the source
APA, Harvard, Vancouver, ISO and other citation styles
46

Longpre, Shayne, Yi Lu and Joachim Daiber. "MKQA: A Linguistically Diverse Benchmark for Multilingual Open Domain Question Answering". Transactions of the Association for Computational Linguistics 9 (2021): 1389–406. http://dx.doi.org/10.1162/tacl_a_00433.

Full text of the source
Annotation:
Progress in cross-lingual modeling depends on challenging, realistic, and diverse evaluation sets. We introduce Multilingual Knowledge Questions and Answers (MKQA), an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). Answers are based on heavily curated, language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to-date for evaluating question answering. We benchmark a variety of state-of-the-art methods and baselines for generative and extractive question answering, trained on Natural Questions, in zero shot and translation settings. Results indicate this dataset is challenging even in English, but especially in low-resource languages.
APA, Harvard, Vancouver, ISO and other citation styles
47

Lupancu, Viorica-Camelia and Adrian Iftene. "Multilingual Fine-Grained Named Entity Recognition". Computer Science Journal of Moldova 31, No. 3(93) (December 2023): 321–39. http://dx.doi.org/10.56415/csjm.v31.16.

Full text of the source
Annotation:
The “MultiCoNER II Multilingual Complex Named Entity Recognition” task within the SemEval 2023 competition focuses on identifying complex named entities (NEs), such as the titles of creative works (e.g., songs, books, movies), people with different titles (e.g., politicians, scientists, artists, athletes), different categories of products (e.g., food, drinks, clothing), and so on, in several languages. In the context of SemEval, our team, FII_Better, presented an exploration of a base transformer model’s capabilities regarding the task, focused more specifically on five languages (English, Spanish, Swedish, German, and Italian). We took DistilBERT (a distilled version of BERT) and BERT (Bidirectional Encoder Representations from Transformers) as two examples of basic transformer models, using DistilBERT as a baseline and BERT as the platform to create an improved model. In this process, we managed to get fair results in the chosen languages. We have managed to get moderate results in the English track (we ranked 17th out of 34), while our results in the other tracks could be further improved in the future (overall third to last).
APA, Harvard, Vancouver, ISO and other citation styles
48

Del Río, Miguel, Corey Miller, Ján Profant, Jennifer Drexler-Fox, Quinn Mcnamara, Nishchal Bhandari, Natalie Delworth et al. "Accents in Speech Recognition through the Lens of a World Englishes Evaluation Set". Research in Language 21, No. 3 (December 28, 2023): 225–44. http://dx.doi.org/10.18778/1731-7533.21.3.02.

Full text of the source
Annotation:
Automatic Speech Recognition (ASR) systems generalize poorly on accented speech, creating bias issues for users and providers. The phonetic and linguistic variability of accents present challenges for ASR systems in both data collection and modeling strategies. We present two promising approaches to accented speech recognition—custom vocabulary and multilingual modeling—and highlight key challenges in the space. Among these, lack of a standard benchmark makes research and comparison difficult. We address this with a novel corpus of accented speech: Earnings-22, a 125-file, 119-hour corpus of English-language earnings calls gathered from global companies. We compare commercial models showing variation in performance when taking country of origin into consideration and demonstrate targeted improvements using the methods we introduce.
APA, Harvard, Vancouver, ISO and other citation styles
49

Radke, Sarah C., Sara E. Vogel, Jasmine Y. Ma, Christopher Hoadley and Laura Ascenzi-Moreno. "Emergent Bilingual Middle Schoolers’ Syncretic Reasoning in Statistical Modeling". Teachers College Record: The Voice of Scholarship in Education 124, No. 5 (May 2022): 206–28. http://dx.doi.org/10.1177/01614681221104141.

Full text of the source
Annotation:
Background/Context: Bi/multilingual students’ STEM learning is better supported when educators leverage their language and cultural practices as resources, but STEM subject divisions have been historically constructed based on oppressive, dominant values and exclude the ways of knowing of nondominant groups. Truly promoting equity requires expanding and transforming STEM disciplines. Purpose/Objective/Research Question/Focus of Study: This article contributes to efforts to illuminate emergent bi/multilingual students’ ways of knowing, languaging, and doing in STEM. We follow the development of syncretic literacies in relation to translanguaging practices, asking, How do knowledges and practices from different communities get combined and reorganized by students and teachers in service of new modeling practices? Setting and Participants: We focus on a seventh-grade science classroom, deliberately designed to support syncretic literacies and translanguaging practices, where computer science concepts were infused into the curriculum through modeling activities. The majority of the students in the bilingual program had arrived in the United States at most three years before enrolling, from the Caribbean and Central and South America. Research Design: We analyze one lesson that was part of a larger research–practice partnership focused on teaching computer science through leveraging translanguaging practices and syncretic literacies. The lesson was a modeling and computing activity codesigned by the teacher and two researchers about post–Hurricane María outmigration from Puerto Rico. Analysis used microethnographic methods to trace how students assembled translanguaging, social, and schooled practices to make sense of and construct models. Findings/Results: Findings show how students assembled representational forms from a variety of practices as part of accomplishing and negotiating both designed and emergent goals. These included sensemaking, constructing, explaining, justifying, and interpreting both the physical and computational models of migration. Conclusions/Recommendations: Implications support the development of theory and pedagogy that intentionally make space for students to engage in meaning-making through translanguaging and syncretic practices in order to provide new possibilities for lifting up STEM learning that may include, but is not constrained by, disciplinary learning. Additional implications for teacher education and student assessment practices call for reconceptualizing schooling beyond day-to-day curriculum as part of making an ontological shift away from prioritizing math, science, and CS disciplinary and language objectives as defined by and for schooling, and toward celebrating, supporting, and centering students’ diverse, syncretic knowledges and knowledge use.
APA, Harvard, Vancouver, ISO and other citation styles
50

Bovi, Claudio Delli and Roberto Navigli. "Multilingual semantic dictionaries for natural language processing: The case of BabelNet". Encyclopedia with Semantic Computing and Robotic Intelligence 01, No. 01 (March 2017): 1630015. http://dx.doi.org/10.1142/s2425038416300159.

Full text of the source
Annotation:
Accurate semantic modeling lies at the very core of today’s Natural Language Processing (NLP). Getting a handle on the various phenomena that regulate the meaning of linguistic utterances can pave the way for solving many compelling and ambitious tasks in the field, from Machine Translation to Question Answering and Information Retrieval. A complete semantic model of language, however, needs first of all reliable building blocks. In the last two decades, research in lexical semantics (which focuses on the meaning of individual linguistic elements, i.e., words and expressions), has produced increasingly comprehensive and effective machine-readable dictionaries in multiple languages: like humans, NLP systems can now leverage these sources of lexical knowledge to discriminate among various senses of a given lexeme, thereby improving their performances on downstream tasks and applications. In this paper, we focus on the case study of BabelNet, a large multilingual encyclopedic dictionary and semantic network, to describe in detail how such knowledge resources are built, improved and exploited for crucial NLP tasks such as Word Sense Disambiguation, Entity Linking and Semantic Similarity.
APA, Harvard, Vancouver, ISO and other citation styles