Selected scholarly literature on the topic "Multilingual Modeling"

Cite a source in APA, MLA, Chicago, Harvard, and other citation styles

Select a type of source:

Consult the lists of current articles, books, dissertations, reports, and other scholarly sources on the topic "Multilingual Modeling".

Next to every work in the list of references there is an "Add to bibliography" option. Use it, and the bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication as a PDF and read its online annotation, provided the relevant parameters are available in its metadata.

Journal articles on the topic "Multilingual Modeling"

1

Haas, Alison, Scott E. Grapin, Lorena Llosa, and Okhee Lee. "Computational Modeling With Multilingual Learners." Science and Children 60, no. 7 (2023): 64–70. http://dx.doi.org/10.1080/00368148.2023.12315941.

2

Santhosh Kumar, C., and V. P. Mohandas. "Robust features for multilingual acoustic modeling." International Journal of Speech Technology 14, no. 3 (2011): 147–55. http://dx.doi.org/10.1007/s10772-011-9092-6.

3

Grutman, Rainier. "The Missing Link: Modeling Readers of Multilingual Writing." Journal of Literary Multilingualism 1, no. 1 (2023): 15–36. http://dx.doi.org/10.1163/2667324x-20230103.

Annotation:
This contribution tries to fill the gap concerning the place and role of readers in multilingual studies by focusing on the ways in which multilingual texts both do and do not create multilingual readers. Three scenarios are illustrated with two examples each. So-called ‘shared multilingualism’ implies bilingual competence (and excludes monolingual readers) by juxtaposing languages with little overlap. Other texts exhibit more than one language yet construct a monolingual reader, while still others reward bilingual competence and at the same time accommodate monolingual incompetence.
4

Park, Hyunji Hayley, Katherine J. Zhang, Coleman Haley, Kenneth Steimel, Han Liu, and Lane Schwartz. "Morphology Matters: A Multilingual Language Modeling Analysis." Transactions of the Association for Computational Linguistics 9 (March 17, 2021): 261–76. http://dx.doi.org/10.1162/tacl_a_00365.

Annotation:
Prior studies in multilingual language modeling (e.g., Cotterell et al., 2018; Mielke et al., 2019) disagree on whether or not inflectional morphology makes languages harder to model. We attempt to resolve the disagreement and extend those studies. We compile a larger corpus of 145 Bible translations in 92 languages and a larger number of typological features. We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features. We find that several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data. We also investigate linguistically motivated subword segmentation strategies like Morfessor and Finite-State Transducers (FSTs) and find that these segmentation strategies yield better performance and reduce the impact of a language’s morphology on language modeling.
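The comparison described in this abstract hinges on normalizing a model's total surprisal by a unit that does not depend on the segmentation, such as characters. The sketch below illustrates only that bookkeeping, under the assumption of invented token probabilities; it is not the paper's code or its LSTM models.

```python
import math

# Minimal sketch: compare segmentations of the same text by total surprisal
# normalized per character, so that models producing different numbers of
# subword tokens remain comparable. Probabilities are made-up placeholders.
def bits_per_char(token_probs, text):
    total_bits = -sum(math.log2(p) for p in token_probs)  # total surprisal in bits
    return total_bits / len(text)                         # normalize by characters

text = "unbelievably"
bpe_probs = [0.02, 0.10, 0.30]        # e.g. BPE pieces "un", "believ", "ably"
morfessor_probs = [0.05, 0.20, 0.25]  # e.g. morpheme-like pieces for the same word

print(bits_per_char(bpe_probs, text))
print(bits_per_char(morfessor_probs, text))
```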
5

Lindén, Krister. "Multilingual modeling of cross-lingual spelling variants." Information Retrieval 9, no. 3 (2006): 295–310. http://dx.doi.org/10.1007/s10791-006-1541-5.

6

Han, Yao Jun, and Xue Mei Luo. "Modeling and Analysis of Multilingual Information Parallel Downloads in Data Grid." Applied Mechanics and Materials 263-266 (December 2012): 1424–28. http://dx.doi.org/10.4028/www.scientific.net/amm.263-266.1424.

Annotation:
Parallel downloading of multilingual information calls for powerful graphical and analytical tools, because information in many different languages is distributed across different Web pages and the underlying databases in a data grid are heterogeneous and uneven. A Petri net is a powerful graphical and mathematical tool for describing concurrent, asynchronous, and dynamic events. The parallel downloading of multilingual information was modeled and analyzed using an extended timed colored Petri net (ETSdCPN). In the ETSdCPN model, colors represent information in different languages, and the time duration, associated with places rather than transitions, is a function of the tokens rather than a constant. The reachable parallel download graph (RPDG) of the ETSdCPN is defined. Finally, important results, such as the satisfaction rate and the makespan of parallel multilingual downloads, are obtained by analyzing the reachability of the Petri net.
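As a rough illustration of the two quantities this analysis targets, the toy calculation below derives a makespan and a simple satisfaction rate for parallel downloads in several languages. It uses plain Python with invented timings and a hypothetical deadline; it does not reproduce the ETSdCPN formalism itself.

```python
# Toy calculation, not the paper's Petri-net model: per-page download times for
# documents in several languages fetched in parallel.
download_times = {              # seconds per page, invented numbers
    "en": [1.2, 0.8],
    "zh": [2.5],
    "de": [1.9, 2.1, 0.7],
}
deadline = 2.2                  # hypothetical acceptable waiting time

# Makespan: time until the last parallel download finishes.
makespan = max(t for times in download_times.values() for t in times)

# Satisfaction rate: fraction of languages with at least one page in time.
satisfied = [lang for lang, times in download_times.items() if min(times) <= deadline]
satisfaction_rate = len(satisfied) / len(download_times)

print(makespan, satisfaction_rate)
```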
7

Song, Guizhe, Degen Huang, and Zhifeng Xiao. "A Study of Multilingual Toxic Text Detection Approaches under Imbalanced Sample Distribution." Information 12, no. 5 (2021): 205. http://dx.doi.org/10.3390/info12050205.

Annotation:
Multilingual characteristics, a lack of annotated data, and imbalanced sample distribution are the three main challenges for toxic comment analysis in a multilingual setting. This paper proposes a multilingual toxic text classifier which adopts a novel fusion strategy combining different loss functions and multiple pre-trained models. Specifically, the proposed learning pipeline starts with a series of pre-processing steps, including translation, word segmentation, purification, text digitization, and vectorization, to convert word tokens into a vectorized form suitable for the downstream tasks. Two models, multilingual Bidirectional Encoder Representations from Transformers (MBERT) and XLM-RoBERTa (XLM-R), are employed for pre-training through Masked Language Modeling (MLM) and Translation Language Modeling (TLM), which incorporate semantic and contextual information into the models. We train six base models and fuse them to obtain three fusion models, using the F1 scores as the weights. The models are evaluated on the Jigsaw Multilingual Toxic Comment dataset. Experimental results show that the best fusion model outperforms the two state-of-the-art models, MBERT and XLM-R, in F1 score by 5.05% and 0.76%, respectively, verifying the effectiveness and robustness of the proposed fusion strategy.
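The F1-weighted fusion mentioned in the abstract reduces to a weighted average of each model's predicted class probabilities. A minimal sketch of that idea, with two invented base models standing in for the paper's six and made-up probabilities and F1 scores:

```python
import numpy as np

# Hedged sketch of F1-weighted probability fusion; values are illustrative.
def f1_weighted_fusion(prob_list, f1_scores):
    weights = np.asarray(f1_scores, dtype=float)
    weights /= weights.sum()                       # normalize weights to sum to 1
    stacked = np.stack(prob_list)                  # (n_models, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)  # weighted average over models

probs_mbert = np.array([[0.7, 0.3], [0.2, 0.8]])   # per-sample class probabilities
probs_xlmr = np.array([[0.6, 0.4], [0.1, 0.9]])
fused = f1_weighted_fusion([probs_mbert, probs_xlmr], f1_scores=[0.91, 0.93])
print(fused.argmax(axis=1))                        # fused class decisions
```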
8

Hao, Shudong, and Michael J. Paul. "An Empirical Study on Crosslingual Transfer in Probabilistic Topic Models." Computational Linguistics 46, no. 1 (2020): 95–134. http://dx.doi.org/10.1162/coli_a_00369.

Annotation:
Probabilistic topic modeling is a common first step in crosslingual tasks to enable knowledge transfer and extract multilingual features. Although many multilingual topic models have been developed, their assumptions about the training corpus are quite varied, and it is not clear how well the different models can be utilized under various training conditions. In this article, the knowledge transfer mechanisms behind different multilingual topic models are systematically studied, and through a broad set of experiments with four models on ten languages, we provide empirical insights that can inform the selection and future development of multilingual topic models.
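One common way such models enable transfer is that each document, whatever its language, receives a distribution over a shared set of topics, and those distributions act as language-independent features. A minimal sketch of comparing two documents through such features, with invented topic vectors rather than output of an actual multilingual topic model:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hedged sketch: topic distributions as crosslingual features; vectors invented.
doc_en = np.array([0.70, 0.10, 0.15, 0.05])  # English document over 4 shared topics
doc_de = np.array([0.65, 0.05, 0.20, 0.10])  # German document over the same topics

# Jensen-Shannon distance is bounded in [0, 1] with base-2 logs,
# so 1 - distance serves as a simple crosslingual similarity score.
similarity = 1.0 - jensenshannon(doc_en, doc_de, base=2)
print(round(float(similarity), 3))
```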
9

Rahimi, Razieh, Azadeh Shakery, and Irwin King. "Multilingual information retrieval in the language modeling framework." Information Retrieval Journal 18, no. 3 (2015): 246–81. http://dx.doi.org/10.1007/s10791-015-9255-1.

10

Mitchell, Joan S., Marcia Lei Zeng, and Maja Žumer. "Modeling Classification Systems in Multicultural and Multilingual Contexts." Cataloging & Classification Quarterly 52, no. 1 (2013): 90–101. http://dx.doi.org/10.1080/01639374.2013.845620.


Dissertations on the topic "Multilingual Modeling"

1

Wicentowski, Richard. "Modeling and learning multilingual inflectional morphology in a minimally supervised framework." Available to US Hopkins community, 2002. http://wwwlib.umi.com/dissertations/dlnow/3068229.

2

Schleider, Thomas. "Knowledge Modeling and Multilingual Information Extraction for the Understanding of the Cultural Heritage of Silk." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS280.

Annotation:
Modeling any type of human knowledge is a complex effort that needs to consider all the specificities of its domain, including niche vocabulary. This thesis focuses on such an endeavour for knowledge about European silk object production, which can be considered obscure and therefore endangered. The fact that such cultural heritage data is heterogeneous, spread across many museums worldwide, sparse, and multilingual poses particular challenges, for which knowledge graphs have become more and more popular in recent years. Our main goal is not only to investigate knowledge representations, but also to examine how such an integration process can be accompanied by enrichments, such as information reconciliation through ontologies and vocabularies, as well as metadata prediction to fill gaps in the data. We first propose a workflow for managing the integration of data about silk artifacts and then present different classification approaches, with a special focus on unsupervised and zero-shot methods. Finally, we study ways of making the subsequent exploration of this metadata and the associated images as easy as possible.
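One way to read the zero-shot classification idea above is as nearest-label matching in a shared multilingual embedding space: a catalogue record in any language is assigned the controlled-vocabulary label whose embedding it is closest to. The sketch below assumes the sentence-transformers package and an off-the-shelf multilingual checkpoint; the record text and candidate labels are invented examples, not the thesis's data or method.

```python
from sentence_transformers import SentenceTransformer, util

# Hedged sketch: zero-shot label assignment via multilingual sentence embeddings.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

record = "Fragment de soie brochée, Lyon, XVIIIe siècle"      # French record text
labels = ["brocaded silk", "velvet", "damask", "embroidery"]  # English vocabulary

record_emb = model.encode(record, convert_to_tensor=True)
label_embs = model.encode(labels, convert_to_tensor=True)
scores = util.cos_sim(record_emb, label_embs)[0]   # cosine similarity to each label
print(labels[int(scores.argmax())])                # nearest label across languages
```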
3

Caon, Daniel Régis Sarmento. "Automatic speech recognition, with large vocabulary, robustness, independence of speaker and multilingual processing." Universidade Federal do Espírito Santo, 2010. http://repositorio.ufes.br/handle/10/6390.

Annotation:
This work aims to provide automatic cognitive assistance via a speech interface to elderly people who live alone and are at risk. Distress expressions and voice commands are part of the target vocabulary for speech recognition. Throughout the work, the large-vocabulary continuous speech recognition system Julius is used in conjunction with the Hidden Markov Model Toolkit (HTK). The main features of the Julius system are described, including the modification made to it; this modification is part of the contribution of this work, as is the detection of distress expressions (utterances that suggest an emergency). Four target languages were considered for recognition: French, Dutch, Spanish, and English. In this order (determined by data availability and by the locations of the system-integration scenarios), theoretical studies and experiments were conducted to address the needs of each new configuration. This work includes studies of French and Dutch. Initial experiments (in French) were carried out with adaptation of hidden Markov models and analyzed by cross-validation. To carry out a new demonstration in Dutch, acoustic and language models were built and the system was integrated with other auxiliary modules (such as a voice activity detector and the dialogue system). Speech recognition results after acoustic adaptation to a specific speaker (and the creation of language models for a specific system demonstration scenario) showed a sentence accuracy rate of 86.39% for the Dutch acoustic models, and a semantic sentence accuracy rate of 94.44% on the same data.
4

Gohr, André [Verfasser], Alexander [Akademischer Betreuer] Hinneburg, and Stefan [Akademischer Betreuer] Wrobel. "Learning and visualizing topics and their change with time for the exploratory analysis of social tags and multilingual topic modeling of chemical compounds / André Gohr. Betreuer: Alexander Hinneburg ; Stefan Wrobel." Halle, Saale : Universitäts- und Landesbibliothek Sachsen-Anhalt, 2012. http://d-nb.info/1033306614/34.

5

Lam-Yee-Mui, Léa-Marie. "Modélisations pour la reconnaissance de la parole à données contraintes." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPASG075.

Annotation:
This thesis explores the development of speech recognition systems under low-resource conditions. Over the last decade, advances with deep neural networks have led to large improvements in the performance of speech-to-text systems. The success of deep learning methods relies on supervised training with very large annotated corpora, typically comprising thousands of hours of recordings with manual transcriptions, and on increasing the number of trainable parameters in the models. However, sufficient training corpora are not always available, owing to the lengthy and costly process of data collection and annotation.
Our aim is to build systems under low-resource conditions (a few hours) for the transcription of conversational speech. Recent research shows that state-of-the-art hybrid systems with distinct acoustic and linguistic models are more efficient than neural end-to-end systems when less than ten hours of annotated speech are available. We therefore adopt hybrid models and investigate multilingual acoustic modeling to pool linguistic resources from multiple sources. For the multilingual models, we first investigate the impact of the amount of training data as well as the similarity between the training and target languages. The multilingual models are evaluated both without adaptation and after fine-tuning via transfer learning on conversational telephone speech data in four languages (Amharic, Assamese, Georgian, and Kurmandji) collected as part of the IARPA Babel program. These languages are linguistically varied and were chosen to cover several language families. Next, we study language-adaptive training, in which the acoustic feature vector is augmented with a language embedding when training the multilingual acoustic model. Our multilingual models can be used to decode speech or to extract multilingual features. These features are evaluated both on the Babel corpus and on the South African corpus Soap Operas, composed of code-switched speech. We compare our hybrid models with publicly available multilingual self-supervised pretrained models trained on large amounts of data from various domains. For every proposed method and for all target languages, we show that hybrid multilingual systems remain competitive and robust under low-resource conditions, while having the advantage of being industrializable with low computational resource requirements. Lastly, we show the usefulness of multilingual acoustic modeling for keyword spotting when only a few hours of monolingual data are available.
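The language-adaptive training described here amounts to appending a per-language embedding to every acoustic feature frame before it enters the acoustic model. A minimal sketch of that input-side augmentation, with invented dimensions and a random embedding table standing in for learned embeddings:

```python
import numpy as np

# Hedged sketch of language-adaptive input augmentation; sizes are illustrative.
rng = np.random.default_rng(0)
lang_table = {"amharic": rng.normal(size=8),    # 8-dim language embeddings
              "georgian": rng.normal(size=8)}   # (learned in a real system)

def augment_frames(frames, language):
    lang_vec = lang_table[language]
    tiled = np.tile(lang_vec, (frames.shape[0], 1))  # one copy per frame
    return np.concatenate([frames, tiled], axis=1)   # (T, n_features + 8)

mfcc = rng.normal(size=(120, 13))                    # 120 frames of 13 MFCCs
augmented = augment_frames(mfcc, "georgian")
print(augmented.shape)                               # (120, 21)
```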
6

Wright, Chrysalis L. "Parental Absence and Academic Achievement in Immigrant Students." FIU Digital Commons, 2010. http://digitalcommons.fiu.edu/etd/322.

Annotation:
Academic achievement and educational expectations as a function of parental absence were examined among 268 newly immigrant elementary, middle, and high-school students from Spanish-speaking countries. Data collected as part of a longitudinal study of adaptation and achievement in newly immigrant students were analyzed. Participants had varying experiences with parental absence, in terms of length of absence, gender of absent parent, and reason for absence. Reasons for parental absence included parental divorce, parental death, and serial migration, a cause unique to immigrant children. Students who experienced parental absence reported lower educational expectations. Students who experienced the death of a parent had lower achievement scores and lower expectations than students who did not experience parental death. Prolonged absence was also important, with students who experienced parental absence for more than one year performing worse than students who had minimal parental separation. In addition, boys who experienced parental absence because of serial migration performed worse academically than boys who did not have this occurrence. Educational expectations were reduced among students who experienced parental absence as a result of the migratory process, especially for younger students. The extent to which parental absence related to achievement and expectations through potential mediating factors, such as economic hardship, perceived school support, and parental school involvement was assessed with structural equation modeling. Overall, the model was able to explain some of the relationship between parental absence and the academic achievement and educational expectations of immigrant students from Spanish-speaking countries.
7

Jackson, Brianne L. "Assessing K12 Online Teachers Knowledge of Online Student Identities and Characteristics." VCU Scholars Compass, 2018. https://scholarscompass.vcu.edu/etd/5316.

Annotation:
As K12 online learning continues to grow across the nation, the population of online students, much like the population of face-to-face students, continues to change. As the online student population becomes increasingly diverse, not only in terms of race but also in terms of religion, sexual orientation, and socioeconomic status, research must be undertaken to assess how well prepared K12 online teachers are to teach this population. This dissertation is intended to serve as a baseline analysis, providing information on K12 online teachers' knowledge of the types of student characteristics and identities that may be present in their online students, as well as their ability to meet the needs of these increasingly diverse students. Using the MAKSS-T survey measure and framed through the lens of Bourdieu's field theory, this study found that while K12 online teachers feel they have a "good" understanding of a number of possible characteristics and identities in their online students, terms related to sexual orientation were not as well understood. Additionally, teachers felt "good" about their skills in addressing the unique needs of these students; however, they felt weakest in their ability to critique multicultural research. Teachers also noted that they do not feel adequately prepared to handle this changing population and want additional training in this area.
8

Muller, Benjamin. "How Can We Make Language Models Better at Handling the Diversity and Variability of Natural Languages ?" Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS399.

Annotation:
Deep learning for NLP has led to impressive empirical progress in recent years. In essence, this progress rests on better contextualized representations that can be easily used for a wide variety of tasks. However, these models usually require substantial computing power and large amounts of raw textual data, which makes language's inherent diversity and variability a pressing challenge in NLP. We focus on the following question: how can we make language models better at handling the variability and diversity of natural languages? First, we explore the generalizability of language models by building and analyzing one of the first large-scale replications of a BERT model for a non-English language. Our results raise the question of using these language models on highly variable domains such as those found online. Focusing on lexical normalization, we show that this task can be approached with BERT-like models; however, it only partially helps downstream performance. Consequently, we focus on adaptation techniques using what we refer to as representation transfer and explore challenging settings such as the zero-shot setting and low-resource languages. We show that multilingual language models can be adapted to and used efficiently with low-resource languages, even ones unseen during pretraining, and that the script is a critical component in this adaptation.
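As a rough illustration of approaching lexical normalization with a BERT-like model, one can mask a noisy token and let a multilingual masked language model propose standard-form candidates for that position. This is a hedged sketch, not the thesis's actual system; the checkpoint name and the example sentence are assumptions for illustration.

```python
from transformers import pipeline

# Hedged sketch: propose normalization candidates for a noisy token by masking
# its position and querying a multilingual masked language model.
fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

noisy = "on se voit dm1 au cinéma"        # "dm1" is a noisy form of "demain"
masked = "on se voit [MASK] au cinéma"    # [MASK] replaces the noisy token
for candidate in fill_mask(masked)[:5]:
    print(candidate["token_str"], round(candidate["score"], 3))
```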
9

Martin, Terrence Lance. "Towards improved speech recognition for resource poor languages." Thesis, Queensland University of Technology, 2006. https://eprints.qut.edu.au/35771/1/Terrence_Martin_Thesis.pdf.

Annotation:
In recent times, the improved levels of accuracy obtained by Automatic Speech Recognition (ASR) technology have made it viable for use in a number of commercial products. Unfortunately, these types of applications are limited to only a few of the world's languages, primarily because ASR development relies on the availability of large amounts of language-specific resources. This motivates the need for techniques which reduce this language-specific resource dependency. Ideally, these approaches should generalise across languages, thereby providing scope for the rapid creation of ASR capabilities for resource-poor languages. Cross-lingual ASR emerges as a means of addressing this need. Underpinning this approach is the observation that sound production is largely influenced by the physiological construction of the vocal tract and, accordingly, is human- rather than language-specific. As a result, a common inventory of sounds exists across languages, a property which can be exploited: sounds from a resource-poor target language can be recognised using models trained on resource-rich source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech; subsequent improvements in these environments have gained consumer confidence. Pragmatically, if cross-lingual techniques are to be considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross-lingual techniques in two speech environments: clean read speech and conversational telephone speech. The languages used in the evaluations are German, Mandarin, Japanese, and Spanish. Results highlight that previously proposed approaches provide respectable results in simpler environments such as read speech, but degrade significantly in the more taxing conversational environment. Two separate approaches for addressing this degradation are proposed. The first is based on deriving a better target-language lexical representation in terms of the source-language model set. The second, and ultimately more successful, approach focuses on improving the classification accuracy of context-dependent (CD) models by catering for the adverse influence of language-specific phonotactic properties. Whilst the primary research goal of this thesis is directed towards improving cross-lingual techniques, the catalyst for investigating their use was expressed interest from several organisations in an Indonesian ASR capability. In Indonesia alone there are over 200 million speakers of some Malay variant, which provides further impetus and commercial justification for speech-related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated to obtaining an ASR system with a 10,000-word recognition vocabulary for the Indonesian language.
10

Balikas, Georgios. "Explorer et apprendre à partir de collections de textes multilingues à l'aide des modèles probabilistes latents et des réseaux profonds." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM054/document.

Annotation:
Text is one of the most pervasive and persistent sources of information. Content analysis of text, in its broad sense, refers to methods for studying and retrieving information from documents. Nowadays, with ever-increasing amounts of text becoming available online in several languages and different styles, content analysis of text is of tremendous importance, as it enables a variety of applications. To this end, unsupervised representation learning methods such as topic models and word embeddings constitute prominent tools. The goal of this dissertation is to study and address challenging problems in this area, focusing both on the design of novel text mining algorithms and tools and on studying how these tools can be applied to text collections written in one or several languages.
In the first part of the thesis we focus on topic models and, more precisely, on how to incorporate prior information about text structure into such models. Topic models are built on the bag-of-words premise, and therefore words are exchangeable. While this assumption benefits the calculation of the conditional probabilities, it results in a loss of information. To overcome this limitation we propose two mechanisms that extend topic models by integrating knowledge of text structure. We assume that the documents are partitioned into thematically coherent text segments. The first mechanism assigns the same topic to the words of a segment. The second capitalizes on the properties of copulas, a tool mainly used in economics and risk management to model the joint probability density of random variables while having access only to their marginals.
The second part of the thesis explores bilingual topic models for comparable corpora with explicit document alignments. Typically, a document collection for such models comes in the form of comparable document pairs. The documents of a pair are written in different languages and are thematically similar. Unless they are translations, the documents of a pair are only similar to some extent. Meanwhile, representative topic models assume that the documents have identical topic distributions, which is a strong and limiting assumption. To overcome it, we propose novel bilingual topic models that incorporate the notion of cross-lingual similarity of the documents constituting the pairs into their generative and inference processes. Calculating this cross-lingual document similarity is a task in itself, which we propose to address using cross-lingual word embeddings.
The last part of the thesis concerns the use of word embeddings and neural networks for three text mining applications. First, we discuss polylingual document classification, where we argue that translations of a document can be used to enrich its representation; using an auto-encoder to obtain these robust document representations, we demonstrate improvements in the task of multi-class document classification. Second, we explore multi-task sentiment classification of tweets, arguing that jointly training classification systems on correlated tasks can improve the obtained performance; to this end, we show how to achieve state-of-the-art performance on a sentiment classification task using recurrent neural networks. The third application we explore is cross-lingual information retrieval: given a document written in one language, the task consists in retrieving the most similar documents from a pool of documents written in another language. In this line of research, we show that adapting the transportation problem to the task of estimating document distances yields important improvements.
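To make the transport-based document distance mentioned at the end concrete: with cross-lingual word embeddings, each document becomes a set of vectors, and the distance is the cost of moving one set onto the other. The sketch below is a crude stand-in that uses a one-to-one assignment instead of a full optimal-transport solution, with made-up three-dimensional vectors in place of real embeddings.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hedged sketch: match each word vector of one document to a word vector of the
# other via the cheapest one-to-one assignment and sum the Euclidean costs.
doc_en = np.array([[0.90, 0.10, 0.00],    # "dog"
                   [0.10, 0.80, 0.10]])   # "house"
doc_fr = np.array([[0.85, 0.15, 0.00],    # "chien"
                   [0.05, 0.75, 0.20]])   # "maison"

cost = np.linalg.norm(doc_en[:, None, :] - doc_fr[None, :, :], axis=-1)
rows, cols = linear_sum_assignment(cost)  # cheapest one-to-one matching
print(float(cost[rows, cols].sum()))      # smaller = more similar documents
```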

Books on the topic "Multilingual Modeling"

1

Hyltenstam, Kenneth, and Manfred Pienemann, eds. Modelling and Assessing Second Language Acquisition (Multilingual Matters). Multilingual Matters Limited, 1985.

2

Hyltenstam, Kenneth, and Manfred Pienemann, eds. Modelling and Assessing Second Language Acquisition (Multilingual Matters). Multilingual Matters, 1998.


Book chapters on the topic "Multilingual Modeling"

1

Ghorab, M. Rami, Séamus Lawless, Alexander O’Connor, Dong Zhou, and Vincent Wade. "Multilingual vs. Monolingual User Models for Personalized Multilingual Information Retrieval." In User Modeling, Adaptation, and Personalization. Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-38844-6_38.

2

Steichen, Ben, M. Rami Ghorab, Alexander O’Connor, Séamus Lawless, and Vincent Wade. "Towards Personalized Multilingual Information Access - Exploring the Browsing and Search Behavior of Multilingual Users." In User Modeling, Adaptation, and Personalization. Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-08786-3_39.

3

Gao, Ming, Shilian Wu, and Zengfu Wang. "A Length-Sensitive Language-Bound Recognition Network for Multilingual Text Recognition." In MultiMedia Modeling. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-27818-1_12.

4

Embley, David W., Stephen W. Liddle, Deryle W. Lonsdale, and Yuri Tijerino. "Multilingual Ontologies for Cross-Language Information Extraction and Semantic Search." In Conceptual Modeling – ER 2011. Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24606-7_12.

5

Díaz Esteban, Alberto. "Integrating Multilingual Text Classification Tasks and User Modeling in Personalized Newspaper Services." In User Modeling 2001. Springer Berlin Heidelberg, 2001. http://dx.doi.org/10.1007/3-540-44566-8_41.

6

Chew, Peter A., and Jessica G. Turnley. "Understanding Russian Information Operations Using Unsupervised Multilingual Topic Modeling." In Social, Cultural, and Behavioral Modeling. Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-60240-0_12.

7

Donahue, Christiane. "Trends in modeling academic writing in multilingual contexts." In Academic writing across languages: multilingual and contrastive approaches in higher education. Böhlau Verlag, 2019. http://dx.doi.org/10.7767/9783205208815.41.

8

Chew, Peter A. "‘Linguistics-Lite’ Topic Extraction from Multilingual Social Media Data." In Social Computing, Behavioral-Cultural Modeling, and Prediction. Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-16268-3_30.

9

Mogadala, Aditya, Rambhoopal Kothwal, and Vasudeva Varma. "Language Modeling Approach to Retrieval for SMS and FAQ Matching." In Multilingual Information Access in South Asian Languages. Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40087-2_12.

10

Wu, Jiajia, Kun Zhao, Zhengyan Yang, Bing Yin, Cong Liu, and Lirong Dai. "End-to-End Multilingual Text Recognition Based on Byte Modeling." In Lecture Notes in Computer Science. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-46311-2_11.


Conference papers on the topic "Multilingual Modeling"

1

Fedorova, Mariia, Timothee Mickus, Niko Partanen, Janine Siewert, Elena Spaziani, and Andrey Kutuzov. "AXOLOTL’24 Shared Task on Multilingual Explainable Semantic Change Modeling." In Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.lchange-1.8.

2

Zhang, Demi, Bushi Xiao, Chao Gao, Sangpil Youm, and Bonnie J. Dorr. "Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming." In Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024). Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.mrl-1.8.

3

Li, Zihao, Shaoxiong Ji, Timothee Mickus, Vincent Segonne, and Jörg Tiedemann. "A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives." In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.emnlp-main.888.

4

Limisiewicz, Tomasz, Terra Blevins, Hila Gonen, Orevaoghene Ahia, and Luke Zettlemoyer. "MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling." In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2024. http://dx.doi.org/10.18653/v1/2024.acl-long.804.

5

Tian, Jilei, Juha Häkkinen, and Olli Viikki. "Multilingual pronunciation modeling for improving multilingual speech recognition." In 7th International Conference on Spoken Language Processing (ICSLP 2002). ISCA, 2002. http://dx.doi.org/10.21437/icslp.2002-176.

6

Datta, Arindrima, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, and Brian Roark. "Language-Agnostic Multilingual Modeling." In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. http://dx.doi.org/10.1109/icassp40776.2020.9053443.

7

Kanthak, S., and Hermann Ney. "Multilingual acoustic modeling using graphemes." In 8th European Conference on Speech Communication and Technology (Eurospeech 2003). ISCA, 2003. http://dx.doi.org/10.21437/eurospeech.2003-373.

8

Musa, Ibrahim Hussein, Kang Xu, and Ibrahim Zamit. "Multilingual Document Concept Topic Modeling." In 2022 European Conference on Natural Language Processing and Information Retrieval (ECNLPIR). IEEE, 2022. http://dx.doi.org/10.1109/ecnlpir57021.2022.00027.

9

Lowe, Ryan, and Ben Steichen. "Multilingual Search User Behaviors -- Exploring Multilingual Querying and Result Selection Through Crowdsourcing." In UMAP '17: 25th Conference on User Modeling, Adaptation and Personalization. ACM, 2017. http://dx.doi.org/10.1145/3079628.3079702.

10

Moosa, Ibraheem Muhammad, Mahmud Elahi Akhter, and Ashfia Binte Habib. "Does Transliteration Help Multilingual Language Modeling?" In Findings of the Association for Computational Linguistics: EACL 2023. Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.findings-eacl.50.
