Academic literature on the topic 'Multilingual Modeling'

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multilingual Modeling.'

Journal articles on the topic "Multilingual Modeling"

1

Haas, Alison, Scott E. Grapin, Lorena Llosa, and Okhee Lee. "Computational Modeling With Multilingual Learners." Science and Children 60, no. 7 (September 2023): 64–70. http://dx.doi.org/10.1080/00368148.2023.12315941.

2

Santhosh Kumar, C., and V. P. Mohandas. "Robust features for multilingual acoustic modeling." International Journal of Speech Technology 14, no. 3 (May 11, 2011): 147–55. http://dx.doi.org/10.1007/s10772-011-9092-6.

3

Grutman, Rainier. "The Missing Link: Modeling Readers of Multilingual Writing." Journal of Literary Multilingualism 1, no. 1 (May 2023): 15–36. http://dx.doi.org/10.1163/2667324x-20230103.

Abstract:
This contribution tries to fill the gap concerning the place and role of readers in multilingual studies by focusing on the ways in which multilingual texts both do and do not create multilingual readers. Three scenarios are illustrated with two examples each. So-called ‘shared multilingualism’ implies bilingual competence (and excludes monolingual readers) by juxtaposing languages with little overlap. Other texts exhibit more than one language yet construct a monolingual reader, while others still reward bilingual competence and at the same time accommodate monolingual incompetence.
4

Park, Hyunji Hayley, Katherine J. Zhang, Coleman Haley, Kenneth Steimel, Han Liu, and Lane Schwartz. "Morphology Matters: A Multilingual Language Modeling Analysis." Transactions of the Association for Computational Linguistics 9 (March 17, 2021): 261–76. http://dx.doi.org/10.1162/tacl_a_00365.

Abstract:
Prior studies in multilingual language modeling (e.g., Cotterell et al., 2018; Mielke et al., 2019) disagree on whether or not inflectional morphology makes languages harder to model. We attempt to resolve the disagreement and extend those studies. We compile a larger corpus of 145 Bible translations in 92 languages and a larger number of typological features. We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features. We find that several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data. We also investigate linguistically motivated subword segmentation strategies like Morfessor and Finite-State Transducers (FSTs) and find that these segmentation strategies yield better performance and reduce the impact of a language’s morphology on language modeling.
5

Lindén, Krister. "Multilingual modeling of cross-lingual spelling variants." Information Retrieval 9, no. 3 (June 2006): 295–310. http://dx.doi.org/10.1007/s10791-006-1541-5.

6

Han, Yao Jun, and Xue Mei Luo. "Modeling and Analysis of Multilingual Information Parallel Downloads in Data Grid." Applied Mechanics and Materials 263-266 (December 2012): 1424–28. http://dx.doi.org/10.4028/www.scientific.net/amm.263-266.1424.

Abstract:
Parallel downloads of multilingual information call for powerful graphical and analytical tools, because information in a variety of languages is distributed across different Web pages and the databases in a data grid are heterogeneous and uneven. A Petri net is a powerful graphical and mathematical tool for describing concurrent, asynchronous, and dynamic events. The parallel downloading of multilingual information was modeled and analyzed using an extended timed colored Petri net (ETSdCPN). In the ETSdCPN model, the color represents information in different languages, and the time duration is associated with places instead of transitions and is a function of tokens instead of a constant. The reachable parallel download graph (RPDG) of the ETSdCPN is defined. Finally, some important results, such as the rate of satisfaction and the makespan of multilingual information parallel downloads, are obtained by analyzing the reachability of the Petri net.
7

Song, Guizhe, Degen Huang, and Zhifeng Xiao. "A Study of Multilingual Toxic Text Detection Approaches under Imbalanced Sample Distribution." Information 12, no. 5 (May 12, 2021): 205. http://dx.doi.org/10.3390/info12050205.

Abstract:
Multilingual characteristics, lack of annotated data, and imbalanced sample distribution are the three main challenges for toxic comment analysis in a multilingual setting. This paper proposes a multilingual toxic text classifier which adopts a novel fusion strategy that combines different loss functions and multiple pre-training models. Specifically, the proposed learning pipeline starts with a series of pre-processing steps, including translation, word segmentation, purification, text digitization, and vectorization, to convert word tokens to a vectorized form suitable for the downstream tasks. Two models, multilingual bidirectional encoder representation from transformers (MBERT) and XLM-RoBERTa (XLM-R), are employed for pre-training through Masking Language Modeling (MLM) and Translation Language Modeling (TLM), which incorporate semantic and contextual information into the models. We train six base models and fuse them to obtain three fusion models using the F1 scores as the weights. The models are evaluated on the Jigsaw Multilingual Toxic Comment dataset. Experimental results show that the best fusion model outperforms the two state-of-the-art models, MBERT and XLM-R, in F1 score by 5.05% and 0.76%, respectively, verifying the effectiveness and robustness of the proposed fusion strategy.
8

Hao, Shudong, and Michael J. Paul. "An Empirical Study on Crosslingual Transfer in Probabilistic Topic Models." Computational Linguistics 46, no. 1 (March 2020): 95–134. http://dx.doi.org/10.1162/coli_a_00369.

Abstract:
Probabilistic topic modeling is a common first step in crosslingual tasks to enable knowledge transfer and extract multilingual features. Although many multilingual topic models have been developed, their assumptions about the training corpus are quite varied, and it is not clear how well the different models can be utilized under various training conditions. In this article, the knowledge transfer mechanisms behind different multilingual topic models are systematically studied, and through a broad set of experiments with four models on ten languages, we provide empirical insights that can inform the selection and future development of multilingual topic models.
9

Rahimi, Razieh, Azadeh Shakery, and Irwin King. "Multilingual information retrieval in the language modeling framework." Information Retrieval Journal 18, no. 3 (May 6, 2015): 246–81. http://dx.doi.org/10.1007/s10791-015-9255-1.

10

Mitchell, Joan S., Marcia Lei Zeng, and Maja Žumer. "Modeling Classification Systems in Multicultural and Multilingual Contexts." Cataloging & Classification Quarterly 52, no. 1 (December 18, 2013): 90–101. http://dx.doi.org/10.1080/01639374.2013.845620.


Dissertations / Theses on the topic "Multilingual Modeling"

1

Wicentowski, Richard. "Modeling and learning multilingual inflectional morphology in a minimally supervised framework." Available to US Hopkins community, 2002. http://wwwlib.umi.com/dissertations/dlnow/3068229.

2

Schleider, Thomas. "Knowledge Modeling and Multilingual Information Extraction for the Understanding of the Cultural Heritage of Silk." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS280.

Abstract:
Modeling any type of human knowledge is a complex effort that must take into account all the specificities of its domain, including niche vocabulary. This thesis focuses on such an endeavour for knowledge about European silk object production, which can be considered obscure and therefore endangered. However, the fact that such cultural heritage data is heterogeneous, spread across many museums worldwide, sparse, and multilingual poses particular challenges, for which knowledge graphs have become increasingly popular in recent years. Our main goal is not only to investigate knowledge representations, but also to examine how such an integration process can be accompanied by enrichments, such as information reconciliation through ontologies and vocabularies, as well as metadata predictions to fill gaps in the data. We first propose a workflow for managing the integration of data about silk artifacts and then present different classification approaches, with a special focus on unsupervised and zero-shot methods. Finally, we study ways of making the subsequent exploration of such metadata and images as easy as possible.
3

Caon, Daniel Régis Sarmento. "Automatic speech recognition, with large vocabulary, robustness, independence of speaker and multilingual processing." Universidade Federal do Espírito Santo, 2010. http://repositorio.ufes.br/handle/10/6390.

Abstract:
This work aims to provide automatic cognitive assistance, via a speech interface, to elderly people who live alone and are at risk. Distress expressions and voice commands are part of the target vocabulary for speech recognition. Throughout the work, the large-vocabulary continuous speech recognition system Julius is used in conjunction with the Hidden Markov Model Toolkit (HTK). The main features of Julius are described, including its modification. This modification is part of the contribution of this work, as is the detection of distress expressions (speech situations that suggest an emergency). Four languages were targeted for recognition: French, Dutch, Spanish, and English. In this same order of languages (determined by data availability and the location of the system-integration scenarios), theoretical studies and experiments were conducted to meet the need of working with each new configuration. This work includes studies of French and Dutch. Initial experiments (in French) were made with adaptation of hidden Markov models and were analyzed by cross-validation. In order to perform a new demonstration in Dutch, acoustic and language models were built and the system was integrated with other auxiliary modules (such as the voice activity detector and the dialogue system). Speech recognition results after acoustic adaptation to a specific speaker (and the creation of language models for a specific scenario to demonstrate the system) showed an 86.39% sentence accuracy rate for the Dutch acoustic models. The same data show a 94.44% semantic sentence accuracy rate.
4

Gohr, André. "Learning and visualizing topics and their change with time for the exploratory analysis of social tags and multilingual topic modeling of chemical compounds." Supervised by Alexander Hinneburg and Stefan Wrobel. Halle, Saale: Universitäts- und Landesbibliothek Sachsen-Anhalt, 2012. http://d-nb.info/1033306614/34.

5

Wright, Chrysalis L. "Parental Absence and Academic Achievement in Immigrant Students." FIU Digital Commons, 2010. http://digitalcommons.fiu.edu/etd/322.

Abstract:
Academic achievement and educational expectations as a function of parental absence were examined among 268 newly immigrant elementary, middle, and high-school students from Spanish-speaking countries. Data collected as part of a longitudinal study of adaptation and achievement in newly immigrant students were analyzed. Participants had varying experiences with parental absence, in terms of length of absence, gender of absent parent, and reason for absence. Reasons for parental absence included parental divorce, parental death, and serial migration, a cause unique to immigrant children. Students who experienced parental absence reported lower educational expectations. Students who experienced the death of a parent had lower achievement scores and lower expectations than students who did not experience parental death. Prolonged absence was also important, with students who experienced parental absence for more than one year performing worse than students who had minimal parental separation. In addition, boys who experienced parental absence because of serial migration performed worse academically than boys who did not have this occurrence. Educational expectations were reduced among students who experienced parental absence as a result of the migratory process, especially for younger students. The extent to which parental absence related to achievement and expectations through potential mediating factors, such as economic hardship, perceived school support, and parental school involvement was assessed with structural equation modeling. Overall, the model was able to explain some of the relationship between parental absence and the academic achievement and educational expectations of immigrant students from Spanish-speaking countries.
6

Jackson, Brianne L. "Assessing K12 Online Teachers Knowledge of Online Student Identities and Characteristics." VCU Scholars Compass, 2018. https://scholarscompass.vcu.edu/etd/5316.

Abstract:
As K12 online learning continues to grow across the nation, the population of online students, much like the population of face-to-face students, continues to change. As the online student population becomes increasingly diverse, not only in terms of race, but in terms of religion, sexual orientation, and socioeconomic status, research must be undertaken to assess the level of preparation that K12 online teachers have for teaching this population. This dissertation intends to serve as a baseline analysis, providing information on K12 online teachers' knowledge of the types of student characteristics and identities that may be present in their online students, as well as their abilities to meet the needs of these increasingly diverse students. Using the MAKSS-T survey measure and framed within the lens of Bourdieu's field theory, this study found that while K12 online teachers feel that they have a "good" understanding of a number of possible characteristics and identities in their online students, terms related to sexual orientation were not as well understood. Additionally, teachers felt "good" about their skills in addressing the unique needs of these students. However, teachers felt weakest in their ability to critique multicultural research. Teachers also noted that they do not feel adequately prepared to handle this changing population and desire additional training in this area.
7

Muller, Benjamin. "How Can We Make Language Models Better at Handling the Diversity and Variability of Natural Languages ?" Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS399.

Abstract:
In recent years, the scaling of deep-learning-based language models (mainly in terms of model size, training dataset size, and training compute) has become one of the main driving forces of empirical progress in Natural Language Processing (NLP). As illustrated by (Peters et al., 2018b; Devlin et al., 2018a; Brown et al., 2020; Zhang et al., 2022; Chowdhery et al., 2022), this leads to better performance in supervised learning as well as better zero-shot capabilities (i.e., without annotated data for a task in a given language) and few-shot capabilities (i.e., with a very limited amount of annotated data), across a wide variety of tasks. In this thesis, we work with monolingual and multilingual BERT-like models (Devlin et al., 2018a). To answer our main research question, "How can we make language models better at handling the diversity and variability of languages?", we explore three main directions: 1. behavioral and structural analyses of language models; 2. an approach based on reducing domain differences; 3. an approach based on adaptation techniques. First, BERT-like language models are complex objects. The first step of this thesis was to conduct in-depth analyses to understand the behavior of these models in different training and test scenarios (behavioral analysis). These analyses were enriched by structural studies of the models that describe their inner workings. Next, we focused on an approach to reducing the gap between domains. In this approach, the goal is to make highly variable out-of-domain data more similar to the training data. Finally, we present adaptation techniques that directly model data that is out of domain or in a language different from that of the training data.
Deep Learning for NLP has led to impressive empirical progress in recent years. In essence, this progress is based on better contextualized representations that can be easily used for a wide variety of tasks. However, these models usually require substantial computing power and large amounts of raw textual data. This makes language’s inherent diversity and variability a vivid challenge in NLP. We focus on the following question: How can we make language models better at handling the variability and diversity of natural languages? First, we explore the generalizability of language models by building and analyzing one of the first large-scale replications of a BERT model for a non-English language. Our results raise the question of using these language models on highly variable domains such as those found online. Focusing on lexical normalization, we show that this task can be approached with BERT-like models. However, we show that it only partially helps downstream performance. Consequently, we focus on adaptation techniques using what we refer to as representation transfer and explore challenging settings such as the zero-shot setting and low-resource languages. We show that multilingual language models can be adapted and used efficiently with low-resource languages, even those unseen during pretraining, and that the script is a critical component in this adaptation.
8

Martin, Terrence Lance. "Towards improved speech recognition for resource poor languages." Thesis, Queensland University of Technology, 2006. https://eprints.qut.edu.au/35771/1/Terrence_Martin_Thesis.pdf.

Abstract:
In recent times, the improved levels of accuracy obtained by Automatic Speech Recognition (ASR) technology have made it viable for use in a number of commercial products. Unfortunately, these types of applications are limited to only a few of the world’s languages, primarily because ASR development is reliant on the availability of large amounts of language-specific resources. This motivates the need for techniques which reduce this language-specific resource dependency. Ideally, these approaches should generalise across languages, thereby providing scope for rapid creation of ASR capabilities for resource-poor languages. Cross-lingual ASR emerges as a means of addressing this need. Underpinning this approach is the observation that sound production is largely influenced by the physiological construction of the vocal tract and, accordingly, is human-specific rather than language-specific. As a result, a common inventory of sounds exists across languages, a property which is exploitable, as sounds from a resource-poor target language can be recognised using models trained on resource-rich source languages. One of the initial impediments to the commercial uptake of ASR technology was its fragility in more challenging environments, such as conversational telephone speech. Subsequent improvements in these environments have gained consumer confidence. Pragmatically, if cross-lingual techniques are to be considered a viable alternative when resources are limited, they need to perform under the same types of conditions. Accordingly, this thesis evaluates cross-lingual techniques using two speech environments: clean read speech and conversational telephone speech. The languages used in evaluations are German, Mandarin, Japanese, and Spanish. Results highlight that previously proposed approaches provide respectable results for simpler environments such as read speech, but degrade significantly in the more taxing conversational environment. Two separate approaches for addressing this degradation are proposed. The first is based on deriving a better target-language lexical representation in terms of the source-language model set. The second, and ultimately more successful, approach focuses on improving the classification accuracy of context-dependent (CD) models by catering for the adverse influence of language-specific phonotactic properties. Whilst the primary research goal in this thesis is directed towards improving cross-lingual techniques, the catalyst for investigating their use was expressed interest from several organisations in an Indonesian ASR capability. In Indonesia alone, there are over 200 million speakers of some Malay variant, which provides further impetus and commercial justification for speech-related research on this language. Unfortunately, at the beginning of the candidature, limited research had been conducted on the Indonesian language in the field of speech science, and virtually no resources existed. This thesis details the investigative and development work dedicated towards obtaining an ASR system with a 10000 word recognition vocabulary for the Indonesian language.
9

Balikas, Georgios. "Explorer et apprendre à partir de collections de textes multilingues à l'aide des modèles probabilistes latents et des réseaux profonds." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM054/document.

Abstract:
Le texte est l'une des sources d'informations les plus répandues et les plus persistantes. L'analyse de contenu du texte se réfère à des méthodes d'étude et de récupération d'informations à partir de documents. Aujourd'hui, avec une quantité de texte disponible en ligne toujours croissante l'analyse de contenu du texte revêt une grande importance parce qu' elle permet une variété d'applications. À cette fin, les méthodes d'apprentissage de la représentation sans supervision telles que les modèles thématiques et les word embeddings constituent des outils importants.L'objectif de cette dissertation est d'étudier et de relever des défis dans ce domaine.Dans la première partie de la thèse, nous nous concentrons sur les modèles thématiques et plus précisément sur la manière d'incorporer des informations antérieures sur la structure du texte à ces modèles.Les modèles de sujets sont basés sur le principe du sac-de-mots et, par conséquent, les mots sont échangeables. Bien que cette hypothèse profite les calculs des probabilités conditionnelles, cela entraîne une perte d'information.Pour éviter cette limitation, nous proposons deux mécanismes qui étendent les modèles de sujets en intégrant leur connaissance de la structure du texte. Nous supposons que les documents sont répartis dans des segments de texte cohérents. Le premier mécanisme attribue le même sujet aux mots d'un segment. La seconde, capitalise sur les propriétés de copulas, un outil principalement utilisé dans les domaines de l'économie et de la gestion des risques, qui sert à modéliser les distributions communes de densité de probabilité des variables aléatoires tout en n'accédant qu'à leurs marginaux.La deuxième partie de la thèse explore les modèles de sujets bilingues pour les collections comparables avec des alignements de documents explicites. En règle générale, une collection de documents pour ces modèles se présente sous la forme de paires de documents comparables. 
Les documents d'une paire sont écrits dans différentes langues et sont thématiquement similaires. À moins de traductions, les documents d'une paire sont semblables dans une certaine mesure seulement. Pendant ce temps, les modèles de sujets représentatifs supposent que les documents ont des distributions thématiques identiques, ce qui constitue une hypothèse forte et limitante. Pour le surmonter, nous proposons de nouveaux modèles thématiques bilingues qui intègrent la notion de similitude interlingue des documents qui constituent les paires dans leurs processus générateurs et d'inférence.La dernière partie de la thèse porte sur l'utilisation d'embeddings de mots et de réseaux de neurones pour trois applications d'exploration de texte. Tout d'abord, nous abordons la classification du document polylinguistique où nous soutenons que les traductions d'un document peuvent être utilisées pour enrichir sa représentation. À l'aide d'un codeur automatique pour obtenir ces représentations de documents robustes, nous démontrons des améliorations dans la tâche de classification de documents multi-classes. Deuxièmement, nous explorons la classification des tweets à plusieurs tâches en soutenant que, en formant conjointement des systèmes de classification utilisant des tâches corrélées, on peut améliorer la performance obtenue. À cette fin, nous montrons comment réaliser des performances de pointe sur une tâche de classification du sentiment en utilisant des réseaux neuronaux récurrents. La troisième application que nous explorons est la récupération d'informations entre langues. Compte tenu d'un document écrit dans une langue, la tâche consiste à récupérer les documents les plus similaires à partir d'un ensemble de documents écrits dans une autre langue. Dans cette ligne de recherche, nous montrons qu'en adaptant le problème du transport pour la tâche d'estimation des distances documentaires, on peut obtenir des améliorations importantes
Text is one of the most pervasive and persistent sources of information. Content analysis of text, in its broad sense, refers to methods for studying and retrieving information from documents. Nowadays, with ever-increasing amounts of text becoming available online in several languages and different styles, content analysis of text is of tremendous importance, as it enables a variety of applications. To this end, unsupervised representation learning methods such as topic models and word embeddings constitute prominent tools. The goal of this dissertation is to study and address challenging problems in this area, focusing both on the design of novel text mining algorithms and tools and on how these tools can be applied to text collections written in one or several languages.
In the first part of the thesis we focus on topic models, and more precisely on how to incorporate prior information about text structure into such models. Topic models are built on the bag-of-words premise, under which words are exchangeable. While this assumption simplifies the calculation of the conditional probabilities, it results in a loss of information. To overcome this limitation we propose two mechanisms that extend topic models by integrating knowledge of text structure. We assume that the documents are partitioned into thematically coherent text segments. The first mechanism assigns the same topic to all the words of a segment. The second capitalizes on the properties of copulas, a tool mainly used in economics and risk management to model the joint probability distribution of random variables while having access only to their marginals.
The second part of the thesis explores bilingual topic models for comparable corpora with explicit document alignments. Typically, a document collection for such models takes the form of comparable document pairs. The documents of a pair are written in different languages and are thematically similar. Unless they are translations, the documents of a pair are similar only to some extent. Meanwhile, representative topic models assume that the documents have identical topic distributions, which is a strong and limiting assumption. To overcome it we propose novel bilingual topic models that incorporate the cross-lingual similarity of the paired documents into their generative and inference processes. Calculating this cross-lingual document similarity is a task in itself, which we propose to address using cross-lingual word embeddings.
The last part of the thesis concerns the use of word embeddings and neural networks for three text mining applications. First, we discuss polylingual document classification, where we argue that translations of a document can be used to enrich its representation. Using an auto-encoder to obtain these robust document representations, we demonstrate improvements in multi-class document classification. Second, we explore multi-task sentiment classification of tweets, arguing that jointly training classification systems on correlated tasks can improve performance. To this end we show how one can achieve state-of-the-art performance on a sentiment classification task using recurrent neural networks. The third application we explore is cross-lingual information retrieval. Given a document written in one language, the task consists in retrieving the most similar documents from a pool of documents written in another language. In this line of research, we show that by adapting the transportation problem to the task of estimating document distances one can achieve important improvements.
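The bag-of-words premise mentioned in this abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the example sentences, the hand-made segment boundaries, and the helper name `bag_of_words` are all illustrative assumptions. The sketch only shows that two texts with the same words in a different order have identical bag-of-words counts, and that keeping segment boundaries preserves some of the structure that is otherwise lost.

```python
from collections import Counter


def bag_of_words(text):
    """Return word counts for a text, ignoring word order (hypothetical helper)."""
    return Counter(text.lower().split())


# Two illustrative sentences: same words, different meaning.
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"

# Under the bag-of-words assumption the two documents are indistinguishable.
same_bag = bag_of_words(doc_a) == bag_of_words(doc_b)

# Assumed segmentation: each document is split by hand into two segments.
segments_a = ["the dog", "bit the man"]
segments_b = ["the man", "bit the dog"]

# Comparing per-segment bags keeps part of the ordering information.
same_segments = (
    [bag_of_words(s) for s in segments_a]
    == [bag_of_words(s) for s in segments_b]
)

print(same_bag)       # True
print(same_segments)  # False
```

A segment-aware topic model would go further than this comparison, for instance by tying the topic assignments of the words inside each segment, but the counts above are enough to show what information the plain bag-of-words view discards.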
10

Cossu, Jean-Valère. "Analyse de l’image de marque sur le Web 2.0." Thesis, Avignon, 2015. http://www.theses.fr/2015AVIG0207/document.

Abstract:
Image on the web: analysis of the dynamics of images on the Web 2.0. Besides being a means of accessing knowledge, the Internet has become in just a few years a prime venue for the emergence and spread of opinions. Every day, millions of people publish their views on the Web 2.0 (social networks, blogs, etc.). These comments cover subjects as varied as current events, politics, sports results, cultural goods, consumer products, etc. The accumulation and aggregation of the opinions published about an entity (whether a product, a company, or a public figure) give rise to that entity's brand image. The image of an entity is understood here as the idea that a person or a group of people forms of that entity. This idea concerns, a priori, a particular subject and is valid only in a given context at a given moment. This perceived image is by nature different from the one the entity initially wished to convey (for example through a communication campaign). Moreover, in reality several images ultimately coexist in parallel on the network, each specific to a community and all evolving differently over time (imagine how a rapprochement between two politicians from opposing sides would be perceived in each camp). Finally, in addition to the controversies deliberately provoked by the behavior of certain entities in order to draw attention to themselves (think of shocking outfits or statements), the spread of an image sometimes goes beyond the framework that governed it and may even turn against the entity (for example, "le mariage pour tous" [marriage for all] becoming "la manif pour tous" [the demonstration for all]). The opinions expressed then provide clues for understanding the logic by which these images are constructed and evolve.
Until now this analysis has been entrusted to e-communication specialists who monetize their subjective judgment. They can consider only a limited volume of information and rarely agree with one another. In this thesis, we propose to use several automatic, statistical, supervised, low-complexity methods to analyze and represent an entity's brand image from textual content that mentions it. More specifically, we seek to identify the content (and its authors) most damaging to an entity's brand image. We introduce an automatic optimization process for these methods that enriches the data using simulated relevance feedback (with no action required from the entity concerned). We also compare several approaches for contextualizing short messages using information retrieval and automatic summarization methods. We further draw on modeling algorithms (such as partial least squares regression), within a conceptual model of brand image, to improve our automatic systems for categorizing textual documents. These modeling methods, and in particular the representations of the correlations between the concepts we manipulate, let us represent on the one hand the thematic context of a query about the entity and on the other the general context of its brand image. We experiment with using and combining different general information sources representing the main types of information encountered on the Internet: long, objective content written for informative purposes, and short user-generated content intended to share opinions.
We evaluate our approaches using two data collections: the first was built as part of the Imagiweb project, and the second is the reference collection on the subject, CLEF RepLab.
Analysis of entities' representation over the Web 2.0. Every day, millions of people publish their views on the Web 2.0 (social networks, blogs, etc.). These comments focus on subjects as diverse as news, politics, sports scores, consumer objects, etc. The accumulation and agglomeration of these opinions about an entity (be it a product, a company, or a public figure) give birth to the brand image of that entity. The Internet has become in recent years a privileged place for the emergence and dissemination of opinions, putting the Web 2.0 at the head of the observatories of opinion, as a means of accessing knowledge of the opinions of the world population. The image is here understood as the idea that a person or a group of people has of that entity. This idea bears a priori on a particular subject and is valid only in a given context at a given time. This perceived image is different from the one the entity initially wanted to broadcast (e.g. via a communication campaign). Moreover, in reality, several images end up living together in parallel on the network, each specific to a community and all evolving differently over time (imagine how a rapprochement between two politicians from opposite sides would be perceived in each camp). Finally, in addition to the controversies deliberately caused by the behavior of some entities to attract attention (think of shocking outfits or statements), it also happens that the dissemination of an image goes beyond the framework that governed it and sometimes turns against the entity (for example, « marriage for all » became « the demonstration for all »). The views expressed are then so many clues for understanding the logic of construction and evolution of these images. The aim is to be able to know what is being talked about and how it is talked about, with, implicitly, the opportunity to know who is speaking. In this thesis we propose to use several simple supervised statistical automatic methods to monitor an entity's online reputation based on the textual content mentioning it.
More precisely, we look for the most important content and its authors (from a reputation manager's point of view). We introduce an optimization process allowing us to enrich the data using simulated relevance feedback (without any human involvement). We also compare content contextualization methods based on information retrieval and automatic summarization. We also propose a reflection on, and a new approach to, modeling online reputation, and we improve and evaluate reputation monitoring methods using Partial Least Squares Path Modelling (PLS-PM). In designing the system, we wanted to address the local and global context of the reputation, that is, the features that can explain the decision and the correlation between topics and reputation. The goal of our work was to propose a different way of combining usual methods and features that may render reputation monitoring systems more accurate than the existing ones. We evaluate and compare our systems using state-of-the-art frameworks: Imagiweb and RepLab. The performance of our proposals is comparable to the state of the art. In addition, the fact that we provide reputation models makes our methods even more attractive for reputation managers or scientists from various fields.

Books on the topic "Multilingual Modeling"

1

Hyltenstam, Kenneth, and Manfred Pienemann, eds. Modelling and Assessing Second Language Acquisition (Multilingual Matters). Multilingual Matters Limited, 1985.

2

Hyltenstam, Kenneth, and Manfred Pienemann, eds. Modelling and Assessing: Second Language Acquisition (Multilingual Matters). Multilingual Matters, 1998.


Book chapters on the topic "Multilingual Modeling"

1

Ghorab, M. Rami, Séamus Lawless, Alexander O’Connor, Dong Zhou, and Vincent Wade. "Multilingual vs. Monolingual User Models for Personalized Multilingual Information Retrieval." In User Modeling, Adaptation, and Personalization, 356–58. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-38844-6_38.

2

Steichen, Ben, M. Rami Ghorab, Alexander O’Connor, Séamus Lawless, and Vincent Wade. "Towards Personalized Multilingual Information Access - Exploring the Browsing and Search Behavior of Multilingual Users." In User Modeling, Adaptation, and Personalization, 435–46. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-08786-3_39.

3

Gao, Ming, Shilian Wu, and Zengfu Wang. "A Length-Sensitive Language-Bound Recognition Network for Multilingual Text Recognition." In MultiMedia Modeling, 139–50. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-27818-1_12.

4

Embley, David W., Stephen W. Liddle, Deryle W. Lonsdale, and Yuri Tijerino. "Multilingual Ontologies for Cross-Language Information Extraction and Semantic Search." In Conceptual Modeling – ER 2011, 147–60. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24606-7_12.

5

Díaz Esteban, Alberto. "Integrating Multilingual Text Classification Tasks and User Modeling in Personalized Newspaper Services." In User Modeling 2001, 268–70. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001. http://dx.doi.org/10.1007/3-540-44566-8_41.

6

Chew, Peter A., and Jessica G. Turnley. "Understanding Russian Information Operations Using Unsupervised Multilingual Topic Modeling." In Social, Cultural, and Behavioral Modeling, 102–7. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-60240-0_12.

7

Donahue, Christiane. "Trends in modeling academic writing in multilingual contexts." In Academic writing across languages: multilingual and contrastive approaches in higher education, 41–58. Wien: Böhlau Verlag, 2019. http://dx.doi.org/10.7767/9783205208815.41.

8

Chew, Peter A. "‘Linguistics-Lite’ Topic Extraction from Multilingual Social Media Data." In Social Computing, Behavioral-Cultural Modeling, and Prediction, 276–82. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-16268-3_30.

9

Mogadala, Aditya, Rambhoopal Kothwal, and Vasudeva Varma. "Language Modeling Approach to Retrieval for SMS and FAQ Matching." In Multilingual Information Access in South Asian Languages, 119–30. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40087-2_12.

10

Wu, Jiajia, Kun Zhao, Zhengyan Yang, Bing Yin, Cong Liu, and Lirong Dai. "End-to-End Multilingual Text Recognition Based on Byte Modeling." In Lecture Notes in Computer Science, 128–37. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-46311-2_11.


Conference papers on the topic "Multilingual Modeling"

1

Tian, Jilei, Juha Häkkinen, and Olli Viikki. "Multilingual pronunciation modeling for improving multilingual speech recognition." In 7th International Conference on Spoken Language Processing (ICSLP 2002). ISCA, 2002. http://dx.doi.org/10.21437/icslp.2002-176.

2

Datta, Arindrima, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, and Brian Roark. "Language-Agnostic Multilingual Modeling." In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020. http://dx.doi.org/10.1109/icassp40776.2020.9053443.

3

Kanthak, S., and Hermann Ney. "Multilingual acoustic modeling using graphemes." In 8th European Conference on Speech Communication and Technology (Eurospeech 2003). ISCA, 2003. http://dx.doi.org/10.21437/eurospeech.2003-373.

4

Musa, Ibrahim Hussein, Kang Xu, and Ibrahim Zamit. "Multilingual Document Concept Topic Modeling." In 2022 European Conference on Natural Language Processing and Information Retrieval (ECNLPIR). IEEE, 2022. http://dx.doi.org/10.1109/ecnlpir57021.2022.00027.

5

Lowe, Ryan, and Ben Steichen. "Multilingual Search User Behaviors -- Exploring Multilingual Querying and Result Selection Through Crowdsourcing." In UMAP '17: 25th Conference on User Modeling, Adaptation and Personalization. New York, NY, USA: ACM, 2017. http://dx.doi.org/10.1145/3079628.3079702.

6

Moosa, Ibraheem Muhammad, Mahmud Elahi Akhter, and Ashfia Binte Habib. "Does Transliteration Help Multilingual Language Modeling?" In Findings of the Association for Computational Linguistics: EACL 2023. Stroudsburg, PA, USA: Association for Computational Linguistics, 2023. http://dx.doi.org/10.18653/v1/2023.findings-eacl.50.

7

Zha, Hongyuan, and Xiang Ji. "Correlating multilingual documents via bipartite graph modeling." In the 25th annual international ACM SIGIR conference. New York, New York, USA: ACM Press, 2002. http://dx.doi.org/10.1145/564376.564485.

8

Goyal, Naman, Jingfei Du, Myle Ott, Giri Anantharaman, and Alexis Conneau. "Larger-Scale Transformers for Multilingual Masked Language Modeling." In Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021). Stroudsburg, PA, USA: Association for Computational Linguistics, 2021. http://dx.doi.org/10.18653/v1/2021.repl4nlp-1.4.

9

Imseng, David, John Dines, Petr Motlicek, Philip N. Garner, and Hervé Bourlard. "Comparing different acoustic modeling techniques for multilingual boosting." In Interspeech 2012. ISCA, 2012. http://dx.doi.org/10.21437/interspeech.2012-369.

10

Romeo, Salvatore, Andrea Tagarelli, and Dino Ienco. "Semantic-Based Multilingual Document Clustering via Tensor Modeling." In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg, PA, USA: Association for Computational Linguistics, 2014. http://dx.doi.org/10.3115/v1/d14-1065.
