Follow this link to see other types of publications on the topic: Neural language models.

Theses on the topic "Neural language models"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 theses for your research on the topic "Neural language models".

Next to each source in the list of references there is an "Add to bibliography" button. Press this button, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication in PDF format and read its abstract online whenever it is available in the metadata.

Browse theses on a wide variety of disciplines and organize your bibliography correctly.

1

Lei, Tao. "Interpretable neural models for natural language processing". Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/108990.

Full text
Abstract
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
The success of neural network models often comes at a cost of interpretability. This thesis addresses the problem by providing justifications behind the model's structure and predictions. In the first part of this thesis, we present a class of sequence operations for text processing. The proposed component generalizes from convolution operations and gated aggregations. As justifications, we relate this component to string kernels, i.e. functions measuring the similarity between sequences, and demonstrate how it encodes the efficient kernel computing algorithm into its structure. The proposed model achieves state-of-the-art or competitive results compared to alternative architectures (such as LSTMs and CNNs) across several NLP applications. In the second part, we learn rationales behind the model's prediction by extracting input pieces as supporting evidence. Rationales are tailored to be short and coherent, yet sufficient for making the same prediction. Our approach combines two modular components, generator and encoder, which are trained to operate well together. The generator specifies a distribution over text fragments as candidate rationales and these are passed through the encoder for prediction. Rationales are never given during training. Instead, the model is regularized by the desiderata for rationales. We demonstrate the effectiveness of this learning framework in applications such as multi-aspect sentiment analysis. Our method achieves a performance of over 90% when evaluated against manually annotated rationales.
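The regularization "by the desiderata for rationales" mentioned above can be made concrete with a small sketch. The following Python snippet is illustrative only (not taken from the thesis; the penalty weights and the binary-mask encoding are assumptions) and scores a sampled rationale mask so that short, contiguous selections are preferred:

```python
import numpy as np

def rationale_cost(task_loss, z, lam_sparsity=0.05, lam_coherence=0.05):
    """Regularized cost for one sampled rationale mask.

    task_loss : the encoder's prediction loss computed on the masked text
    z         : binary vector, z[t] = 1 if token t is kept in the rationale
    The two penalties encode the desiderata: rationales should be short
    (few selected tokens) and coherent (few on/off transitions).
    """
    z = np.asarray(z, dtype=float)
    sparsity = z.sum()                       # number of selected tokens
    coherence = np.abs(np.diff(z)).sum()     # number of selection boundaries
    return task_loss + lam_sparsity * sparsity + lam_coherence * coherence

# A contiguous rationale is cheaper than a scattered one of the same length
print(rationale_cost(0.30, [0, 1, 1, 1, 0, 0]))   # 0.55
print(rationale_cost(0.30, [1, 0, 1, 0, 1, 0]))   # 0.70
```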
2

Kunz, Jenny. "Neural Language Models with Explicit Coreference Decision". Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-371827.

Full text
Abstract
Coreference is an important and frequent concept in any form of discourse, and Coreference Resolution (CR) is a widely used task in Natural Language Understanding (NLU). In this thesis, we implement and explore two recent models that include the concept of coreference in Recurrent Neural Network (RNN)-based Language Models (LMs). Entity and reference decisions are modeled explicitly in these models using attention mechanisms. Both models learn to save the previously observed entities in a set and to decide if the next token created by the LM is a mention of one of the entities in the set, an entity that has not been observed yet, or not an entity. After a theoretical analysis where we compare the two LMs to each other and to a state-of-the-art Coreference Resolution system, we perform an extensive quantitative and qualitative analysis. For this purpose, we train the two models and a classical RNN-LM as the baseline model on the OntoNotes 5.0 corpus with coreference annotation. While we do not reach the baseline in the perplexity metric, we show that the models' relative performance on entity tokens has the potential to improve when including the explicit entity modeling. We show that the most challenging point in the systems is the decision of whether the next token is an entity token, while the decision of which entity the next token refers to performs comparatively well. Our analysis in the context of a text generation task shows that a widespread error source for the mention creation process is the confusion of tokens that refer to related but different entities in the real world, presumably a result of the context-based word representations in the models. Our re-implementation of the DeepMind model by Yang et al. (2016) performs notably better than the re-implementation of the EntityNLM model by Ji et al. (2017), with a perplexity of 107 compared to a perplexity of 131.
3

Labeau, Matthieu. "Neural language models : Dealing with large vocabularies". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS313/document.

Full text
Abstract
This work investigates practical methods to ease training and improve the performance of neural language models with large vocabularies. The main limitation of neural language models is their expensive computational cost: it grows linearly with the size of the vocabulary. Despite several training tricks, the most straightforward way to limit computation time is to limit the vocabulary size, which is not a satisfactory solution for numerous tasks. Most of the existing methods used to train large-vocabulary language models revolve around avoiding the computation of the partition function, which ensures that output scores are normalized into a probability distribution. Here, we focus on sampling-based approaches, including importance sampling and noise contrastive estimation. These methods allow an approximate computation of the partition function. After examining the mechanism of self-normalization in noise-contrastive estimation, we first propose to improve its efficiency with solutions that are adapted to the inner workings of the method and experimentally show that they considerably ease training. Our second contribution is to expand on a generalization of several sampling-based objectives as Bregman divergences, in order to experiment with new objectives. We use Beta divergences to derive a set of objectives from which noise contrastive estimation is a particular case. Finally, we aim at improving the performance of full-vocabulary language models by augmenting the output word representations with subwords. We experiment on a Czech dataset and show that using character-based representations besides word embeddings for output representations gives better results. We also show that reducing the size of the output look-up table improves results even more.
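As a rough illustration of how noise contrastive estimation sidesteps the partition function, here is a minimal Python sketch (illustrative only; the scores and noise probabilities are made up) of the per-context NCE objective, where the model learns to separate the observed word from k words drawn from a noise distribution:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nce_objective(score_target, scores_noise, q_target, q_noise, k):
    """Noise contrastive estimation objective for one context.

    score_target : unnormalized model score s(w, context) of the observed word
    scores_noise : scores of the k words sampled from the noise distribution q
    q_target     : noise probability q(w) of the observed word
    q_noise      : noise probabilities of the k sampled words
    The model learns to classify data vs. noise, so the partition function
    never has to be computed explicitly.
    """
    pos = np.log(sigmoid(score_target - np.log(k * q_target)))
    neg = np.sum(np.log(sigmoid(-(np.asarray(scores_noise) - np.log(k * np.asarray(q_noise))))))
    return pos + neg  # objective to be maximized

# Toy numbers with k = 2 noise samples drawn from a unigram distribution
print(nce_objective(score_target=2.0, scores_noise=[-1.0, 0.5],
                    q_target=0.01, q_noise=[0.05, 0.2], k=2))
```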
4

Bayer, Ali Orkan. "Semantic Language models with deep neural Networks". Doctoral thesis, Università degli studi di Trento, 2015. https://hdl.handle.net/11572/367784.

Full text
Abstract
Spoken language systems (SLS) communicate with users in natural language through speech. There are two main problems related to processing the spoken input in SLS. The first one is automatic speech recognition (ASR), which recognizes what the user says. The second one is spoken language understanding (SLU), which understands what the user means. We focus on the language model (LM) component of SLS. LMs constrain the search space that is used in the search for the best hypothesis. Therefore, they play a crucial role in the performance of SLS. It has long been discussed that an improvement in the recognition performance does not necessarily yield a better understanding performance. Therefore, optimization of LMs for the understanding performance is crucial. In addition, long-range dependencies in languages are hard to handle with statistical language models. These two problems are addressed in this thesis. We investigate two different LM structures. The first LM that we investigate enables SLS to understand better what they recognize by searching the ASR hypotheses for the best understanding performance. We refer to these models as joint LMs. They use lexical and semantic units jointly in the LM. The second LM structure uses the semantic context of an utterance, which can also be described as “what the system understands”, to search for a better hypothesis that improves the recognition and the understanding performance. We refer to these models as semantic LMs (SELMs). SELMs use features that are based on a well-established theory of lexical semantics, namely the theory of frame semantics. They incorporate the semantic features which are extracted from the ASR hypothesis into the LM and handle long-range dependencies by using the semantic relationships between words and semantic context. ASR noise is propagated to the semantic features; to suppress this noise, we introduce the use of deep semantic encodings for semantic feature extraction. In this way, SELMs optimize both the recognition and the understanding performance.
5

Bayer, Ali Orkan. "Semantic Language models with deep neural Networks". Doctoral thesis, University of Trento, 2015. http://eprints-phd.biblio.unitn.it/1578/1/bayer_thesis.pdf.

Full text
Abstract
Spoken language systems (SLS) communicate with users in natural language through speech. There are two main problems related to processing the spoken input in SLS. The first one is automatic speech recognition (ASR), which recognizes what the user says. The second one is spoken language understanding (SLU), which understands what the user means. We focus on the language model (LM) component of SLS. LMs constrain the search space that is used in the search for the best hypothesis. Therefore, they play a crucial role in the performance of SLS. It has long been discussed that an improvement in the recognition performance does not necessarily yield a better understanding performance. Therefore, optimization of LMs for the understanding performance is crucial. In addition, long-range dependencies in languages are hard to handle with statistical language models. These two problems are addressed in this thesis. We investigate two different LM structures. The first LM that we investigate enables SLS to understand better what they recognize by searching the ASR hypotheses for the best understanding performance. We refer to these models as joint LMs. They use lexical and semantic units jointly in the LM. The second LM structure uses the semantic context of an utterance, which can also be described as “what the system understands”, to search for a better hypothesis that improves the recognition and the understanding performance. We refer to these models as semantic LMs (SELMs). SELMs use features that are based on a well-established theory of lexical semantics, namely the theory of frame semantics. They incorporate the semantic features which are extracted from the ASR hypothesis into the LM and handle long-range dependencies by using the semantic relationships between words and semantic context. ASR noise is propagated to the semantic features; to suppress this noise, we introduce the use of deep semantic encodings for semantic feature extraction. In this way, SELMs optimize both the recognition and the understanding performance.
6

Li, Zhongliang. "Slim Embedding Layers for Recurrent Neural Language Models". Wright State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=wright1531950458646138.

Full text
7

Gangireddy, Siva Reddy. "Recurrent neural network language models for automatic speech recognition". Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28990.

Full text
Abstract
The goal of this thesis is to advance the use of recurrent neural network language models (RNNLMs) for large vocabulary continuous speech recognition (LVCSR). RNNLMs are currently state-of-the-art and have been shown to consistently reduce the word error rates (WERs) of LVCSR tasks when compared to other language models. In this thesis we propose various advances to RNNLMs. The advances are: improved learning procedures for RNNLMs, enhancing the context, and adaptation of RNNLMs. We learned better parameters by a novel pre-training approach and enhanced the context using prosody and syntactic features. We present a pre-training method for RNNLMs, in which the output weights of a feed-forward neural network language model (NNLM) are shared with the RNNLM. This is accomplished by first fine-tuning the weights of the NNLM, which are then used to initialise the output weights of an RNNLM with the same number of hidden units. To investigate the effectiveness of the proposed pre-training method, we have carried out text-based experiments on the Penn Treebank Wall Street Journal data, and ASR experiments on the TED lectures data. Across the experiments, we observe small but significant improvements in perplexity (PPL) and ASR WER. Next, we present unsupervised adaptation of RNNLMs. We adapted the RNNLMs to a target domain (topic or genre or television programme (show)) at test time using ASR transcripts from first pass recognition. We investigated two approaches to adapt the RNNLMs. In the first approach, the forward-propagating hidden activations are scaled (learning hidden unit contributions, LHUC). In the second approach we adapt all parameters of the RNNLM. We evaluated the adapted RNNLMs by showing the WERs on multi-genre broadcast speech data. We observe small (on average 0.1% absolute) but significant improvements in WER compared to a strong unadapted RNNLM model. Finally, we present the context-enhancement of RNNLMs using prosody and syntactic features. The prosody features were computed from the acoustics of the context words and the syntactic features were from the surface form of the words in the context. We trained the RNNLMs with word duration, pause duration, final phone duration, syllable duration, syllable F0, part-of-speech tag and Combinatory Categorial Grammar (CCG) supertag features. The proposed context-enhanced RNNLMs were evaluated by reporting PPL and WER on two speech recognition tasks, Switchboard and TED lectures. We observed substantial improvements in PPL (5% to 15% relative) and small but significant improvements in WER (0.1% to 0.5% absolute).
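The LHUC adaptation mentioned above re-scales each hidden unit of the unadapted network with a single learned parameter. A minimal Python sketch of that scaling (illustrative only, not taken from the thesis; the activation values are made up):

```python
import numpy as np

def lhuc_scale(hidden, a):
    """Learning Hidden Unit Contributions (LHUC) adaptation.

    hidden : hidden-layer activations of the unadapted RNNLM, shape (n_units,)
    a      : per-unit adaptation parameters learned on target-domain data
    Each unit is re-scaled by an amplitude in (0, 2); the original network
    weights stay frozen, so only len(a) parameters are adapted at test time.
    """
    return 2.0 / (1.0 + np.exp(-a)) * hidden

h = np.array([0.3, -0.8, 1.2])
print(lhuc_scale(h, np.zeros(3)))                 # a = 0 gives scale 1: unchanged
print(lhuc_scale(h, np.array([2.0, -2.0, 0.5])))  # adapted contributions
```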
8

Scarcella, Alessandro. "Recurrent neural network language models in the context of under-resourced South African languages". Master's thesis, University of Cape Town, 2018. http://hdl.handle.net/11427/29431.

Full text
Abstract
Over the past five years, neural network models have been successful across a range of computational linguistic tasks. However, these triumphs have been concentrated in languages with significant resources such as large datasets. Thus, many languages, which are commonly referred to as under-resourced languages, have received little attention and have yet to benefit from recent advances. This investigation aims to evaluate the implications of recent advances in neural network language modelling techniques for under-resourced South African languages. Rudimentary, single-layered recurrent neural networks (RNNs) were used to model four South African text corpora. The accuracy of these models was compared directly to legacy approaches. A suite of hybrid models was then tested. Across all four datasets, neural networks led to overall better-performing language models either directly or as part of a hybrid model. A short examination of punctuation marks in text data revealed that performance metrics for language models are greatly overestimated when punctuation marks have not been excluded. The investigation concludes by appraising the sensitivity of RNN language models (RNNLMs) to the size of the datasets by artificially constraining the datasets and evaluating the accuracy of the models. It is recommended that future research endeavours within this domain be directed towards evaluating more sophisticated RNNLMs as well as measuring their impact on application-focused tasks such as speech recognition and machine translation.
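The point about punctuation inflating language model metrics follows directly from the definition of perplexity: easy-to-predict punctuation tokens pull the average negative log-probability down, so the reported number looks better than the model's performance on words alone. A small Python illustration (the probabilities are invented for the example):

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    return float(np.exp(-np.mean(np.log(token_probs))))

word_probs = [0.05, 0.10, 0.02, 0.08]        # probabilities assigned to content words
punct_probs = [0.60, 0.70]                   # punctuation is far easier to predict

print(perplexity(word_probs))                # words only: higher (worse-looking) perplexity
print(perplexity(word_probs + punct_probs))  # punctuation included: metric looks much better
```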
9

Le, Hai Son. "Continuous space models with neural networks in natural language processing". PhD thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00776704.

Full text
Abstract
The purpose of language models is in general to capture and to model regularities of language, thereby capturing morphological, syntactical and distributional properties of word sequences in a given language. They play an important role in many successful applications of Natural Language Processing, such as Automatic Speech Recognition, Machine Translation and Information Extraction. The most successful approaches to date are based on the n-gram assumption and the adjustment of statistics from the training data by applying smoothing and back-off techniques, notably the Kneser-Ney technique, introduced twenty years ago. In this way, language models predict a word based on its n-1 previous words. In spite of their prevalence, conventional n-gram based language models still suffer from several limitations that could be intuitively overcome by consulting human expert knowledge. One critical limitation is that, ignoring all linguistic properties, they treat each word as one discrete symbol with no relation with the others. Another point is that, even with a huge amount of data, the data sparsity issue always has an important impact, so the optimal value of n in the n-gram assumption is often 4 or 5, which is insufficient in practice. This kind of model is constructed based on the count of n-grams in training data. Therefore, the pertinence of these models is conditioned only on the characteristics of the training text (its quantity, its representation of the content in terms of theme, date). Recently, one of the most successful attempts that tries to directly learn word similarities is to use distributed word representations in language modeling, where distributionally similar words, which share semantic and syntactic properties, are expected to be represented as neighbors in a continuous space. These representations and the associated objective function (the likelihood of the training data) are jointly learned using a multi-layer neural network architecture. In this way, word similarities are learned automatically. This approach has shown significant and consistent improvements when applied to automatic speech recognition and statistical machine translation tasks. A major difficulty with the continuous space neural network based approach remains the computational burden, which does not scale well to the massive corpora that are nowadays available. For this reason, the first contribution of this dissertation is the definition of a neural architecture based on a tree representation of the output vocabulary, namely Structured OUtput Layer (SOUL), which makes them well suited for large scale frameworks. The SOUL model combines the neural network approach with the class-based approach. It achieves significant improvements on both state-of-the-art large-scale automatic speech recognition and statistical machine translation tasks. The second contribution is to provide several insightful analyses on their performances, their pros and cons, their induced word space representation. Finally, the third contribution is the successful adoption of the continuous space neural network into a machine translation framework. New translation models are proposed and reported to achieve significant improvements over state-of-the-art baseline systems.
10

Miao, Yishu. "Deep generative models for natural language processing". Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:e4e1f1f9-e507-4754-a0ab-0246f1e1e258.

Full text
Abstract
Deep generative models are essential to Natural Language Processing (NLP) due to their outstanding ability to use unlabelled data, to incorporate abundant linguistic features, and to learn interpretable dependencies among data. As the structure becomes deeper and more complex, having an effective and efficient inference method becomes increasingly important. In this thesis, neural variational inference is applied to carry out inference for deep generative models. While traditional variational methods derive an analytic approximation for the intractable distributions over latent variables, here we construct an inference network conditioned on the discrete text input to provide the variational distribution. The powerful neural networks are able to approximate complicated non-linear distributions and grant the possibilities for more interesting and complicated generative models. Therefore, we develop the potential of neural variational inference and apply it to a variety of models for NLP with continuous or discrete latent variables. This thesis is divided into three parts. Part I introduces a generic variational inference framework for generative and conditional models of text. For continuous or discrete latent variables, we apply a continuous reparameterisation trick or the REINFORCE algorithm to build low-variance gradient estimators. To further explore Bayesian non-parametrics in deep neural networks, we propose a family of neural networks that parameterise categorical distributions with continuous latent variables. Using the stick-breaking construction, an unbounded categorical distribution is incorporated into our deep generative models which can be optimised by stochastic gradient back-propagation with a continuous reparameterisation. Part II explores continuous latent variable models for NLP. Chapter 3 discusses the Neural Variational Document Model (NVDM): an unsupervised generative model of text which aims to extract a continuous semantic latent variable for each document. In Chapter 4, the neural topic models modify the neural document models by parameterising categorical distributions with continuous latent variables, where the topics are explicitly modelled by discrete latent variables. The models are further extended to neural unbounded topic models with the help of stick-breaking construction, and a truncation-free variational inference method is proposed based on a Recurrent Stick-breaking construction (RSB). Chapter 5 describes the Neural Answer Selection Model (NASM) for learning a latent stochastic attention mechanism to model the semantics of question-answer pairs and predict their relatedness. Part III discusses discrete latent variable models. Chapter 6 introduces latent sentence compression models. The Auto-encoding Sentence Compression Model (ASC), as a discrete variational auto-encoder, generates a sentence by a sequence of discrete latent variables representing explicit words. The Forced Attention Sentence Compression Model (FSC) incorporates a combined pointer network biased towards the usage of words from source sentence, which significantly improves the performance when jointly trained with the ASC model in a semi-supervised learning fashion. Chapter 7 describes the Latent Intention Dialogue Models (LIDM) that employ a discrete latent variable to learn underlying dialogue intentions. Additionally, the latent intentions can be interpreted as actions guiding the generation of machine responses, which could be further refined autonomously by reinforcement learning. 
Finally, Chapter 8 summarizes our findings and directions for future work.
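The continuous reparameterisation trick referred to in this abstract is what makes the variational bound differentiable. A minimal Python sketch (illustrative; the Gaussian posterior and the numbers are assumptions) of the sampling step and the corresponding KL regularizer:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(sigma^2)) as a differentiable function of noise."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), the regularizer in the variational bound."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

rng = np.random.default_rng(0)
mu, log_var = np.array([0.2, -0.1]), np.array([-1.0, -2.0])
z = reparameterize(mu, log_var, rng)
print(z, kl_to_standard_normal(mu, log_var))
```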
11

Sun, Qing. "Greedy Inference Algorithms for Structured and Neural Models". Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/81860.

Full text
Abstract
A number of problems in Computer Vision, Natural Language Processing, and Machine Learning produce structured outputs in high-dimensional space, which makes searching for the globally optimal solution extremely expensive. Thus, greedy algorithms, making trade-offs between precision and efficiency, are widely used. Unfortunately, they generally lack theoretical guarantees. In this thesis, we prove that greedy algorithms are effective and efficient at searching for multiple top-scoring hypotheses from structured (neural) models: 1) Entropy estimation. We aim to find deterministic samples that are representative of the Gibbs distribution via a greedy strategy. 2) Searching for a set of diverse and high-quality bounding boxes. We formulate this problem as the constrained maximization of a monotonic sub-modular function such that there exists a greedy algorithm with a near-optimal guarantee. 3) Fill-in-the-blank. The goal is to generate missing words conditioned on context given an image. We extend Beam Search, a greedy algorithm applicable to unidirectional expansion, to bidirectional neural models when both past and future information have to be considered. We test our proposed approaches on a series of Computer Vision and Natural Language Processing benchmarks and show that they are effective and efficient.
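For reference, the standard unidirectional beam search that the third contribution extends can be sketched in a few lines of Python (illustrative only; the toy scoring function stands in for a trained neural model):

```python
import heapq
import math

def beam_search(step_log_probs, beam_width=3, length=4):
    """Greedy left-to-right beam search.

    step_log_probs(prefix) must return a dict {word: log_prob} for the next position.
    At every step only the beam_width highest-scoring prefixes are kept, instead of
    exploring the full exponential space of sequences.
    """
    beam = [(0.0, [])]                                   # (cumulative log-prob, prefix)
    for _ in range(length):
        candidates = []
        for score, prefix in beam:
            for word, lp in step_log_probs(prefix).items():
                candidates.append((score + lp, prefix + [word]))
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beam

# Toy scoring function standing in for a neural model: prefers repeating the last word
VOCAB = ["a", "b", "c"]
def toy_model(prefix):
    favored = prefix[-1] if prefix else "a"
    return {w: math.log(0.6 if w == favored else 0.2) for w in VOCAB}

for score, seq in beam_search(toy_model, beam_width=2):
    print(round(score, 2), seq)
```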
12

Hedström, Simon. "General Purpose Vector Representation for Swedish Documents : An application of Neural Language Models". Thesis, Umeå universitet, Institutionen för fysik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-160109.

Full text
Abstract
This thesis is a proof-of-concept for embedding Swedish documents using continuous vectors. These vectors can be used as input in any subsequent task and serve as an alternative to discrete bag-of-words vectors. The differences go beyond fewer dimensions, as the continuous vectors also hold contextual information. This means that documents with no shared vocabulary can be directly identified as contextually similar, which is impossible for the bag-of-words vectors. The continuous vectors are the result of neural language models and algorithms that pool the model output into document-level representations. This thesis has looked into the latest research regarding such models, starting from the Word2Vec algorithms. A wide variety of neural language models were selected together with algorithms for pooling word and sentence vectors into document vectors. For the training of the neural language models we have assembled a training corpus spanning 1.2 billion Swedish words. The trained neural language models were later paired with pooling algorithms to finalize an array of document vector models. The document vector models were evaluated on five classification tasks and compared against the baseline bag-of-words vectors. A few models that were trained directly on the evaluation data were also included as reference. For each evaluation task the setup was held constant, which ensured that any difference in performance came from the quality of the document representations. The results show that the continuous document vectors outperform the baseline on topic and text-format classification tasks. It was noted that the best performance was achieved when a document vector model was trained directly on the evaluation data. However, this result was only marginally better than that of the best general document vector models. In conclusion, it was a successful proof of concept but there are still improvements to be made, such as optimizing the composition of the training corpus. Due to its simplicity and overall performance we recommend a general Sent2Vec model as a new baseline for future projects.
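The simplest pooling strategy in this family, mean-pooling word vectors into a document vector and comparing documents by cosine similarity, can be sketched as follows. The tiny embedding table is invented for the example and stands in for vectors produced by a trained model:

```python
import numpy as np

def document_vector(tokens, word_vectors, dim=2):
    """Mean-pool word embeddings into one fixed-size document vector."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Tiny embedding table standing in for vectors from a trained neural language model
word_vectors = {
    "bil":    np.array([0.9, 0.1]),   # "car"
    "fordon": np.array([0.8, 0.2]),   # "vehicle"
    "katt":   np.array([0.1, 0.9]),   # "cat"
}
doc_a = document_vector(["bil"], word_vectors)
doc_b = document_vector(["fordon"], word_vectors)   # shares no words with doc_a
print(cosine(doc_a, doc_b))                          # still high: contextual similarity
```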
13

Parthiban, Dwarak Govind. "On the Softmax Bottleneck of Word-Level Recurrent Language Models". Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/41412.

Full text
Abstract
For different input contexts (sequence of previous words), to predict the next word, a neural word-level language model outputs a probability distribution over all the words in the vocabulary using a softmax function. When the log of probability outputs for all such contexts are stacked together, the resulting matrix is a log probability matrix which can be denoted as Q_theta, where theta denotes the model parameters. When language modeling is formulated as a matrix factorization problem, the matrix to be factorized Q_theta is expected to be high-rank as natural language is highly context-dependent. But existing softmax based word-level language models have a limitation of not being able to produce such matrices; this is known as the softmax bottleneck. There are several works that attempted to overcome the limitations introduced by softmax bottleneck, such as the models that can produce high-rank Q_theta. During the process of reproducing the results of these works, we observed that the rank of Q_theta does not always positively correlate with better performance (i.e., lower test perplexity). This puzzling observation triggered us to conduct a systematic investigation to check the influence of rank of Q_theta on better performance of a language model. We first introduce a new family of activation functions called the Generalized SigSoftmax (GSS). By controlling the parameters of GSS, we were able to construct language models that can produce Q_theta with diverse ranks (i.e., low, medium, and high ranks). For models that use GSS with different parameters, we observe that rank does not have a strong positive correlation with perplexity on the test data, reinforcing the support of our initial observation. By inspecting the top-5 predictions made by different models for a selected set of input contexts, we observe that a high-rank Q_theta does not guarantee a strong qualitative performance. Then, we conduct experiments to check if there are any other additional benefits in having models that can produce high-rank Q_theta. We expose that Q_theta rather suffers from the phenomenon of fast singular value decay. Additionally, we also propose an alternative metric to denote the rank of any matrix known as epsilon-effective rank, which can be useful to approximately quantify the singular value distribution when different values for epsilon are used. We conclude by showing that it is the regularization which has played a positive role in the performance of these high-rank models in comparison to the chosen baselines, and there is no single model yet which truly gains improved expressiveness just because of breaking the softmax bottleneck.
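One plausible reading of the epsilon-effective rank proposed in this abstract is the number of singular values that stay above a fraction epsilon of the largest one; the thesis' exact definition may differ. A short Python sketch under that assumption, which also illustrates fast singular value decay in a nominally full-rank matrix:

```python
import numpy as np

def epsilon_effective_rank(matrix, eps=1e-2):
    """Number of singular values above eps * sigma_max.

    A log-probability matrix Q_theta can have full nominal rank while most of
    its singular values are tiny; this metric quantifies how fast they decay.
    """
    s = np.linalg.svd(matrix, compute_uv=False)   # singular values, descending
    return int(np.sum(s > eps * s[0]))

rng = np.random.default_rng(0)
# Low-rank structure plus small noise: nominal rank is full, effective rank is not
Q = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 1000))
Q += 1e-3 * rng.standard_normal(Q.shape)
print(np.linalg.matrix_rank(Q), epsilon_effective_rank(Q, eps=1e-2))
```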
14

Kamper, Herman. "Unsupervised neural and Bayesian models for zero-resource speech processing". Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/25432.

Full text
Abstract
Zero-resource speech processing is a growing research area which aims to develop methods that can discover linguistic structure and representations directly from unlabelled speech audio. Such unsupervised methods would allow speech technology to be developed in settings where transcriptions, pronunciation dictionaries, and text for language modelling are not available. Similar methods are required for cognitive models of language acquisition in human infants, and for developing robotic applications that are able to automatically learn language in a novel linguistic environment. There are two central problems in zero-resource speech processing: (i) finding frame-level feature representations which make it easier to discriminate between linguistic units (phones or words), and (ii) segmenting and clustering unlabelled speech into meaningful units. The claim of this thesis is that both top-down modelling (using knowledge of higher-level units to learn, discover and gain insight into their lower-level constituents) as well as bottom-up modelling (piecing together lower-level features to give rise to more complex higher-level structures) are advantageous in tackling these two problems. The thesis is divided into three parts. The first part introduces a new autoencoder-like deep neural network for unsupervised frame-level representation learning. This correspondence autoencoder (cAE) uses weak top-down supervision from an unsupervised term discovery system that identifies noisy word-like terms in unlabelled speech data. In an intrinsic evaluation of frame-level representations, the cAE outperforms several state-of-the-art bottom-up and top-down approaches, achieving a relative improvement of more than 60% over the previous best system. This shows that the cAE is particularly effective in using top-down knowledge of longer-spanning patterns in the data; at the same time, we find that the cAE is only able to learn useful representations when it is initialized using bottom-up pretraining on a large set of unlabelled speech. The second part of the thesis presents a novel unsupervised segmental Bayesian model that segments unlabelled speech data and clusters the segments into hypothesized word groupings. The result is a complete unsupervised tokenization of the input speech in terms of discovered word types: the system essentially performs unsupervised speech recognition. In this approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional vector space. The model, implemented as a Gibbs sampler, then builds a whole-word acoustic model in this embedding space while jointly performing segmentation. We first evaluate the approach in a small-vocabulary multi-speaker connected digit recognition task, where we report unsupervised word error rates (WER) by mapping the unsupervised decoded output to ground truth transcriptions. The model achieves around 20% WER, outperforming a previous HMM-based system by about 10% absolute. To achieve this performance, the acoustic word embedding function (which maps variable-duration segments to single vectors) is refined in a top-down manner by using terms discovered by the model in an outer loop of segmentation. The third and final part of the study extends the small-vocabulary system in order to handle larger vocabularies in conversational speech data. To our knowledge, this is the first full-coverage segmentation and clustering system that is applied to large-vocabulary multi-speaker data.
To improve efficiency, the system incorporates a bottom-up syllable boundary detection method to eliminate unlikely word boundaries. We compare the system on English and Xitsonga datasets to several state-of-the-art baselines. We show that by imposing a consistent top-down segmentation while also using bottom-up knowledge from detected syllable boundaries, both single-speaker and multi-speaker versions of our system outperform a purely bottom-up single-speaker syllable-based approach. We also show that the discovered clusters can be made less speaker- and gender-specific by using features from the cAE (which incorporates both top-down and bottom-up learning). The system's discovered clusters are still less pure than those of two multi-speaker unsupervised term discovery systems, but provide far greater coverage. In summary, the different models and systems presented in this thesis show that both top-down and bottom-up modelling can improve representation learning, segmentation and clustering of unlabelled speech data.
15

Kryściński, Wojciech. "Training Neural Models for Abstractive Text Summarization". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-236973.

Full text
Abstract
Abstractive text summarization aims to condense long textual documents into a short, human-readable form while preserving the most important information from the source document. A common approach to training summarization models is by using maximum likelihood estimation with the teacher forcing strategy. Despite its popularity, this method has been shown to yield models with suboptimal performance at inference time. This work examines how using alternative, task-specific training signals affects the performance of summarization models. Two novel training signals are proposed and evaluated as part of this work. One, a novelty metric, measuring the overlap between n-grams in the summary and the summarized article. The other, utilizing a discriminator model to distinguish human-written summaries from generated ones on a word-level basis. Empirical results show that using the mentioned metrics as rewards for policy gradient training yields significant performance gains measured by ROUGE scores, novelty scores and human evaluation.
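The novelty training signal described above can be approximated by a simple n-gram overlap computation. A minimal Python sketch (illustrative; the thesis' exact formulation may differ), returning the fraction of summary n-grams that never occur in the source article:

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def novelty(summary_tokens, article_tokens, n=2):
    """Fraction of summary n-grams that do not appear in the source article."""
    summary_ngrams = ngrams(summary_tokens, n)
    if not summary_ngrams:
        return 0.0
    article_ngrams = ngrams(article_tokens, n)
    return len(summary_ngrams - article_ngrams) / len(summary_ngrams)

article = "the committee approved the new budget after a long debate".split()
summary = "the committee passed the budget".split()
print(novelty(summary, article, n=2))   # 0.75: three of four summary bigrams are novel
```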
16

Wen, Tsung-Hsien. "Recurrent neural network language generation for dialogue systems". Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/275648.

Full text
Abstract
Language is the principal medium for ideas, while dialogue is the most natural and effective way for humans to interact with and access information from machines. Natural language generation (NLG) is a critical component of spoken dialogue and it has a significant impact on usability and perceived quality. Many commonly used NLG systems employ rules and heuristics, which tend to generate inflexible and stylised responses without the natural variation of human language. However, the frequent repetition of identical output forms can quickly make dialogue become tedious for most real-world users. Additionally, these rules and heuristics are not scalable and hence not trivially extensible to other domains or languages. A statistical approach to language generation can learn language decisions directly from data without relying on hand-coded rules or heuristics, which brings scalability and flexibility to NLG. Statistical models also provide an opportunity to learn in-domain human colloquialisms and cross-domain model adaptations. A robust and quasi-supervised NLG model is proposed in this thesis. The model leverages a Recurrent Neural Network (RNN)-based surface realiser and a gating mechanism applied to input semantics. The model is motivated by the Long-Short Term Memory (LSTM) network. The RNN-based surface realiser and gating mechanism use a neural network to learn end-to-end language generation decisions from input dialogue act and sentence pairs; it also integrates sentence planning and surface realisation into a single optimisation problem. The single optimisation not only bypasses the costly intermediate linguistic annotations but also generates more natural and human-like responses. Furthermore, a domain adaptation study shows that the proposed model can be readily adapted and extended to new dialogue domains via a proposed recipe. Continuing the success of end-to-end learning, the second part of the thesis speculates on building an end-to-end dialogue system by framing it as a conditional generation problem. The proposed model encapsulates a belief tracker with a minimal state representation and a generator that takes the dialogue context to produce responses. These features suggest comprehension and fast learning. The proposed model is capable of understanding requests and accomplishing tasks after training on only a few hundred human-human dialogues. A complementary Wizard-of-Oz data collection method is also introduced to facilitate the collection of human-human conversations from online workers. The results demonstrate that the proposed model can talk to human judges naturally, without any difficulty, for a sample application domain. In addition, the results also suggest that the introduction of a stochastic latent variable can help the system model intrinsic variation in communicative intention much better.
17

Pasquiou, Alexandre. "Deciphering the neural bases of language comprehension using latent linguistic representations". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG041.

Full text
Abstract
In the last decades, language models (LMs) have reached human-level performance on several tasks. They can generate rich representations (features) that capture various linguistic properties such as semantics or syntax. Following these improvements, neuroscientists have increasingly used them to explore the neural bases of language comprehension. Specifically, an LM's features computed from a story are used to fit the brain data of humans listening to the same story, allowing the examination of multiple levels of language processing in the brain. If an LM's features closely align with a specific brain region, then it suggests that both the model and the region are encoding the same information. LM-brain comparisons can then teach us about language processing in the brain. Using the fMRI brain data of fifty US participants listening to "The Little Prince" story, this thesis 1) investigates the reasons why LMs' features fit brain activity and 2) examines the limitations of such comparisons. The comparison of several pre-trained and custom-trained LMs (GloVe, LSTM, GPT-2 and BERT) revealed that Transformers better fit fMRI brain data than LSTM and GloVe. Yet, none are able to explain all the fMRI signal, suggesting either limitations related to the encoding paradigm or to the LMs. Focusing specifically on Transformers, we found that no brain region is better fitted by a specific attention head or layer. Our results caution that the nature and the amount of training data greatly affect the outcome, indicating that using off-the-shelf models trained on small datasets is not effective in capturing brain activations. We showed that LMs' training influences their ability to fit fMRI brain data, and that perplexity was not a good predictor of brain score. Still, training LMs particularly improves their fitting performance in core semantic regions, irrespective of the architecture and training data. Moreover, we showed a partial convergence between the brain's and the LM's representations: they first converge during model training before diverging from one another. This thesis further investigates the neural bases of syntax, semantics and context-sensitivity by developing a method that can probe specific linguistic dimensions. This method makes use of "information-restricted LMs", i.e. customized LM architectures trained on feature spaces containing a specific type of information, in order to fit brain data. First, training LMs on semantic and syntactic features revealed a good fitting performance in a widespread network, albeit with varying relative degrees. The quantification of this relative sensitivity to syntax and semantics showed that brain regions most attuned to syntax tend to be more localized, while semantic processing remains widely distributed over the cortex. One notable finding from this analysis was that the extent of semantic and syntactic sensitive brain regions was similar across hemispheres. However, the left hemisphere had a greater tendency to distinguish between syntactic and semantic processing compared to the right hemisphere. In a last set of experiments we designed "masked-attention generation", a method that controls the attention mechanisms in transformers, in order to generate latent representations that leverage fixed-size context. This approach provides evidence of context-sensitivity across most of the cortex. Moreover, this analysis found that the left and right hemispheres tend to process shorter and longer contextual information respectively.
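The "fit" between LM features and fMRI data described here is typically estimated with a regularized linear encoding model scored by held-out correlation per voxel (the brain score). A minimal Python sketch under that assumption, using synthetic data in place of real features and recordings:

```python
import numpy as np
from sklearn.linear_model import Ridge

def brain_scores(lm_features, voxel_responses, n_train):
    """Fit LM features to fMRI voxels and return per-voxel held-out correlations."""
    X_tr, X_te = lm_features[:n_train], lm_features[n_train:]
    Y_tr, Y_te = voxel_responses[:n_train], voxel_responses[n_train:]
    model = Ridge(alpha=10.0).fit(X_tr, Y_tr)
    pred = model.predict(X_te)
    return np.array([np.corrcoef(pred[:, v], Y_te[:, v])[0, 1]
                     for v in range(Y_te.shape[1])])

rng = np.random.default_rng(0)
features = rng.standard_normal((300, 50))             # one LM feature vector per fMRI scan
weights = rng.standard_normal((50, 4))                # synthetic feature-to-voxel mapping
voxels = features @ weights + 0.5 * rng.standard_normal((300, 4))
print(np.round(brain_scores(features, voxels, n_train=240), 2))
```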
18

Rossi, Alex. "Self-supervised information retrieval: a novel approach based on Deep Metric Learning and Neural Language Models". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Search full text
Abstract
Most of the existing open-source search engines utilize keyword- or tf-idf-based techniques to find relevant documents and web pages for an input query. Although these methods, with the help of PageRank or knowledge graphs, proved to be effective in some cases, they often fail to retrieve relevant instances for more complicated queries that would require a semantic understanding to be exploited. In this thesis, a self-supervised information retrieval system based on transformers is employed to build a semantic search engine over the library of the Gruppo Maggioli company. Semantic search, or search with meaning, can refer to an understanding of the query instead of simply finding word matches and, in general, it represents knowledge in a way suitable for retrieval. We chose to investigate a new self-supervised strategy to handle the training of unlabeled data based on the creation of pairs of 'artificial' queries and the respective positive passages. We claim that by removing the reliance on labeled data, we may use the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.
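The retrieval step of such a system reduces to ranking passage embeddings by cosine similarity with the query embedding. A minimal Python sketch under that assumption (the transformer encoder is not shown; the embeddings below are random stand-ins and the 384-dimensional size is an assumption):

```python
import numpy as np

def rank_passages(query_vec, passage_vecs, top_k=3):
    """Return indices and scores of the top_k passages by cosine similarity.

    query_vec    : embedding of the query, shape (d,)
    passage_vecs : matrix of passage embeddings, shape (n_passages, d),
                   produced offline by the same transformer encoder
    """
    q = query_vec / np.linalg.norm(query_vec)
    P = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = P @ q
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

rng = np.random.default_rng(0)
passages = rng.standard_normal((100, 384))                # pretend passage embeddings
query = passages[42] + 0.1 * rng.standard_normal(384)     # query close to passage 42
idx, scores = rank_passages(query, passages)
print(idx, np.round(scores, 3))                            # passage 42 ranks first
```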
19

Brorson, Erik. "Classifying Hate Speech using Fine-tuned Language Models". Thesis, Uppsala universitet, Statistiska institutionen, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-352637.

Full text
Abstract
Given the explosion in the size of social media, the amount of hate speech is also growing. To efficiently combat this issue, we need reliable and scalable machine learning models. Current solutions rely on crowdsourced datasets that are limited in size, or on training data from self-identified hateful communities, which lacks specificity. In this thesis, we introduce a novel semi-supervised modelling strategy. It is first trained on the freely available data from the hateful communities and then fine-tuned to classify hateful tweets from crowdsourced annotated datasets. We show that our model reaches state-of-the-art performance with minimal hyper-parameter tuning.
20

Chen, Charles L. "Neural Network Models for Tasks in Open-Domain and Closed-Domain Question Answering". Ohio University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1578592581367428.

Full text
21

Suzić, Siniša. "Parametarska sinteza ekspresivnog govora". PhD thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2019. https://www.cris.uns.ac.rs/record.jsf?recordId=110631&source=NDLTD&language=en.

Full text
Abstract
In this thesis, methods for expressive speech synthesis using parametric approaches are presented. It is shown that better results are achieved with the usage of deep neural networks compared to synthesis based on hidden Markov models. Three new methods for the synthesis of expressive speech using deep neural networks are presented: style codes, model re-training, and a shared hidden layer architecture. It is shown that the best results are achieved by using the style code method. A new method for style transplantation based on the shared hidden layer architecture is also proposed. It is shown that this method outperforms the reference method from the literature.
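The style-code method named above amounts to conditioning a single acoustic network on the desired style by appending a style vector to its input features. A minimal Python sketch of that conditioning (illustrative only; the style inventory and feature values are made up):

```python
import numpy as np

STYLES = ["neutral", "happy", "angry", "sad"]

def add_style_code(linguistic_features, style):
    """Concatenate a one-hot style code to one frame's linguistic features."""
    code = np.zeros(len(STYLES))
    code[STYLES.index(style)] = 1.0
    return np.concatenate([linguistic_features, code])

frame = np.array([0.2, 0.7, 0.1])          # placeholder linguistic features for one frame
print(add_style_code(frame, "happy"))      # same network, different target style
```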
22

Fancellu, Federico. "Computational models for multilingual negation scope detection". Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/33038.

Full text
Abstract
Negation is a common property of languages, in that there are few languages, if any, that lack means to revert the truth-value of a statement. A challenge to cross-lingual studies of negation lies in the fact that languages encode and use it in different ways. Although this variation has been extensively researched in linguistics, little has been done in automated language processing. In particular, we lack computational models of processing negation that can be generalized across language. We even lack knowledge of what the development of such models would require. These models however exist and can be built by means of existing cross-lingual resources, even when annotated data for a language other than English is not available. This thesis shows this in the context of detecting string-level negation scope, i.e. the set of tokens in a sentence whose meaning is affected by a negation marker (e.g. 'not'). Our contribution has two parts. First, we investigate the scenario where annotated training data is available. We show that Bi-directional Long Short Term Memory (BiLSTM) networks are state-of-the-art models whose features can be generalized across language. We also show that these models suffer from genre effects and that for most of the corpora we have experimented with, high performance is simply an artifact of the annotation styles, where negation scope is often a span of text delimited by punctuation. Second, we investigate the scenario where annotated data is available in only one language, experimenting with model transfer. To test our approach, we first build NEGPAR, a parallel corpus annotated for negation, where pre-existing annotations on English sentences have been edited and extended to Chinese translations. We then show that transferring a model for negation scope detection across languages is possible by means of structured neural models where negation scope is detected on top of a cross-linguistically consistent representation, Universal Dependencies. On the other hand, we found cross-lingual lexical information only to help very little with performance. Finally, error analysis shows that performance is better when a negation marker is in the same dependency substructure as its scope and that some of the phenomena related to negation scope requiring lexical knowledge are still not captured correctly. In the conclusions, we tie up the contributions of this thesis and we point future work towards representing negation scope across languages at the level of logical form as well.
Los estilos APA, Harvard, Vancouver, ISO, etc.
23

Zamora, Martínez Francisco Julián. "Aportaciones al modelado conexionista de lenguaje y su aplicación al reconocimiento de secuencias y traducción automática". Doctoral thesis, Universitat Politècnica de València, 2012. http://hdl.handle.net/10251/18066.

Texto completo
Resumen
El procesamiento del lenguaje natural es un área de aplicación de la inteligencia artificial, en particular, del reconocimiento de formas que estudia, entre otras cosas, incorporar información sintáctica (modelo de lenguaje) sobre cómo deben juntarse las palabras de una determinada lengua, para así permitir a los sistemas de reconocimiento/traducción decidir cuál es la mejor hipótesis «con sentido común». Es un área muy amplia, y este trabajo se centra únicamente en la parte relacionada con el modelado de lenguaje y su aplicación a diversas tareas: reconocimiento de secuencias mediante modelos ocultos de Markov y traducción automática estadística. Concretamente, esta tesis tiene su foco central en los denominados modelos conexionistas de lenguaje, esto es, modelos de lenguaje basados en redes neuronales. Los buenos resultados de estos modelos en diversas áreas del procesamiento del lenguaje natural han motivado el desarrollo de este estudio. Debido a determinados problemas computacionales de los que adolecen los modelos conexionistas de lenguaje, los sistemas que aparecen en la literatura se construyen en dos etapas totalmente desacopladas. En la primera fase se encuentra, a través de un modelo de lenguaje estándar, un conjunto de hipótesis factibles, asumiendo que dicho conjunto es representativo del espacio de búsqueda en el cual se encuentra la mejor hipótesis. En segundo lugar, sobre dicho conjunto, se aplica el modelo conexionista de lenguaje y se extrae la hipótesis con mejor puntuación. A este procedimiento se le denomina «rescoring». Este escenario motiva los objetivos principales de esta tesis: (1) proponer alguna técnica que pueda reducir drásticamente dicho coste computacional degradando lo mínimo posible la calidad de la solución encontrada; (2) estudiar el efecto que tiene la integración de los modelos conexionistas de lenguaje en el proceso de búsqueda de las tareas propuestas; (3) proponer algunas modificaciones del modelo original que permitan mejorar su calidad.
Zamora Martínez, FJ. (2012). Aportaciones al modelado conexionista de lenguaje y su aplicación al reconocimiento de secuencias y traducción automática [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18066
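The decoupled two-stage «rescoring» procedure described in the abstract can be illustrated with a small Python sketch; the neural language model is stubbed out, and the interpolation weight and hypotheses are illustrative assumptions:

```python
# Minimal sketch of n-best rescoring: a baseline system produces an n-best list with its
# own scores, and a (here stubbed) connectionist LM re-scores each hypothesis.
def neural_lm_logprob(words):
    # Stand-in for the connectionist LM; a real model would return sum_i log P(w_i | history).
    return -2.0 * len(words)

def rescore(nbest, lm_weight=0.7):
    rescored = []
    for hypothesis, baseline_score in nbest:
        words = hypothesis.split()
        score = (1 - lm_weight) * baseline_score + lm_weight * neural_lm_logprob(words)
        rescored.append((hypothesis, score))
    return max(rescored, key=lambda pair: pair[1])

nbest = [("la casa es roja", -12.3), ("la casa est roja", -11.9)]
print(rescore(nbest))      # best hypothesis after combining baseline and LM scores
```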
Los estilos APA, Harvard, Vancouver, ISO, etc.
24

VENTURA, FRANCESCO. "Explaining black-box deep neural models' predictions, behaviors, and performances through the unsupervised mining of their inner knowledge". Doctoral thesis, Politecnico di Torino, 2021. http://hdl.handle.net/11583/2912972.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
25

Wenestam, Arvid. "Labelling factual information in legal cases using fine-tuned BERT models". Thesis, Uppsala universitet, Statistiska institutionen, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-447230.

Texto completo
Resumen
Labelling factual information at the token level in legal cases requires legal expertise and is time-consuming. This thesis proposes a transfer-learning implementation that fine-tunes pre-trained state-of-the-art BERT models to perform this labelling task. Investigations are done to compare whether models pre-trained solely on legal corpora outperform a BERT trained on a generic corpus, and to study the models' behaviour as the number of cases in the training sample varies. This work showed that the models' metric scores are stable and on par using 40-60 professionally annotated cases as opposed to using the full sample of 100 cases. Also, the generic-trained BERT model is a strong baseline, and a BERT pre-trained solely on legal corpora is not crucial for this task.
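A minimal fine-tuning sketch of this kind of token-labelling setup, assuming the Hugging Face transformers library and a generic English BERT checkpoint; the label count, the example sentence and the single gradient step are illustrative assumptions, not the thesis configuration:

```python
# Minimal token-classification fine-tuning sketch with Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased", num_labels=3)

inputs = tokenizer("The witness signed the contract on 3 May.", return_tensors="pt")
labels = torch.zeros(inputs["input_ids"].shape, dtype=torch.long)   # dummy token-level labels

outputs = model(**inputs, labels=labels)
outputs.loss.backward()         # one step of the fine-tuning loop
print(outputs.logits.shape)     # (1, sequence_length, num_labels)
```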
Los estilos APA, Harvard, Vancouver, ISO, etc.
26

Callin, Jimmy. "Word Representations and Machine Learning Models for Implicit Sense Classification in Shallow Discourse Parsing". Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-325876.

Texto completo
Resumen
CoNLL 2015 featured a shared task on shallow discourse parsing. In 2016, the efforts continued with an increasing focus on sense classification. In the case of implicit sense classification, there was an interesting mix of traditional and modern machine learning classifiers using word representation models. In this thesis, we explore the performance of a number of these models, and investigate how they perform using a variety of word representation models. We show that there are large performance differences between word representation models for certain machine learning classifiers, while others are more robust to the choice of word representation model. We also show that with the right choice of word representation model, simple and traditional machine learning classifiers can reach competitive scores even when compared with modern neural network approaches.
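The "traditional classifier over word representations" setup can be sketched as follows; the random vectors stand in for a real pretrained embedding model, and the texts and sense labels are invented for illustration:

```python
# Minimal sketch: average the word vectors of a discourse argument pair and fit a linear model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=50) for w in "because so however but result cause".split()}

def encode(text):
    vectors = [embeddings[w] for w in text.split() if w in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(50)

X = np.stack([encode("because cause result"), encode("however but")])
y = np.array([0, 1])                     # e.g. Contingency vs. Comparison senses
classifier = LogisticRegression().fit(X, y)
print(classifier.predict(X))
```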
Los estilos APA, Harvard, Vancouver, ISO, etc.
27

Das, Manirupa. "Neural Methods Towards Concept Discovery from Text via Knowledge Transfer". The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1572387318988274.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
28

Andruccioli, Matteo. "Previsione del Successo di Prodotti di Moda Prima della Commercializzazione: un Nuovo Dataset e Modello di Vision-Language Transformer". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24956/.

Texto completo
Resumen
A differenza di quanto avviene nel commercio tradizionale, in quello online il cliente non ha la possibilità di toccare con mano o provare il prodotto. La decisione di acquisto viene maturata in base ai dati messi a disposizione dal venditore attraverso titolo, descrizioni, immagini e alle recensioni di clienti precedenti. È quindi possibile prevedere quanto un prodotto venderà sulla base di queste informazioni. La maggior parte delle soluzioni attualmente presenti in letteratura effettua previsioni basandosi sulle recensioni, oppure analizzando il linguaggio usato nelle descrizioni per capire come questo influenzi le vendite. Le recensioni, tuttavia, non sono informazioni note ai venditori prima della commercializzazione del prodotto; usando solo dati testuali, inoltre, si tralascia l’influenza delle immagini. L'obiettivo di questa tesi è usare modelli di machine learning per prevedere il successo di vendita di un prodotto a partire dalle informazioni disponibili al venditore prima della commercializzazione. Si fa questo introducendo un modello cross-modale basato su Vision-Language Transformer in grado di effettuare classificazione. Un modello di questo tipo può aiutare i venditori a massimizzare il successo di vendita dei prodotti. A causa della mancanza, in letteratura, di dataset contenenti informazioni relative a prodotti venduti online che includono l’indicazione del successo di vendita, il lavoro svolto comprende la realizzazione di un dataset adatto a testare la soluzione sviluppata. Il dataset contiene un elenco di 78300 prodotti di Moda venduti su Amazon, per ognuno dei quali vengono riportate le principali informazioni messe a disposizione dal venditore e una misura di successo sul mercato. Questa viene ricavata a partire dal gradimento espresso dagli acquirenti e dal posizionamento del prodotto in una graduatoria basata sul numero di esemplari venduti.
Los estilos APA, Harvard, Vancouver, ISO, etc.
29

Azeraf, Elie. "Classification avec des modèles probabilistes génératifs et des réseaux de neurones. Applications au traitement des langues naturelles". Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. https://theses.hal.science/tel-03880848.

Texto completo
Resumen
Un nombre important de modèles probabilistes connaissent une grande perte d'intérêt pour la classification avec apprentissage supervisé depuis un certain nombre d'années, tels que le Naive Bayes ou la chaîne de Markov cachée. Ces modèles, qualifiés de génératifs, sont critiqués car leur classificateur induit doit prendre en compte la loi des observations, qui peut s'avérer très complexe à apprendre quand le nombre de features de ces derniers est élevé. C'est notamment le cas en Traitement des Langues Naturelles, où les récents algorithmes convertissent des mots en vecteurs numériques de grande taille pour atteindre de meilleures performances.Au cours de cette thèse, nous montrons que tout modèle génératif peut définir son classificateur sans prendre en compte la loi des observations. Cette proposition remet en question la catégorisation connue des modèles probabilistes et leurs classificateurs induits - en classes générative et discriminante - et ouvre la voie à un grand nombre d'applications possibles. Ainsi, la chaîne de Markov cachée peut être appliquée sans contraintes à la décomposition syntaxique de textes, ou encore le Naive Bayes à l'analyse de sentiments.Nous allons plus loin, puisque cette proposition permet de calculer le classificateur d'un modèle probabiliste génératif avec des réseaux de neurones. Par conséquent, nous « neuralisons » les modèles cités plus haut ainsi qu'un grand nombre de leurs extensions. Les modèles ainsi obtenus permettant d'atteindre des scores pertinents pour diverses tâches de Traitement des Langues Naturelles tout en étant interprétable, nécessitant peu de données d'entraînement, et étant simple à mettre en production
Many probabilistic models, such as Naive Bayes or the Hidden Markov Chain, have been neglected for supervised classification tasks for several years. These models, called generative, are criticized because their induced classifier must learn the observations' law, which becomes too complex when the number of observation features is large. This is especially the case in Natural Language Processing, where recent embedding algorithms convert words into large numerical vectors to achieve better scores. This thesis shows that every generative model can define its induced classifier without using the observations' law. This proposition questions the usual categorization of probabilistic models and classifiers and allows many new applications. Therefore, the Hidden Markov Chain can be efficiently applied to chunking and Naive Bayes to sentiment analysis. We go further, as this proposition makes it possible to define the classifier induced from a generative model with neural network functions. We "neuralize" the models mentioned above and many of their extensions. The models so obtained achieve relevant scores for many Natural Language Processing tasks while being interpretable, requiring little training data, and being easy to serve.
Los estilos APA, Harvard, Vancouver, ISO, etc.
30

Gorana, Mijatović. "Dekompozicija neuralne aktivnosti: model za empirijsku karakterizaciju inter-spajk intervala". Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2018. https://www.cris.uns.ac.rs/record.jsf?recordId=107498&source=NDLTD&language=en.

Texto completo
Resumen
Disertacija se bavi analizom mogućnosti brze, efikasne i pouzdane klasterizacije masivnog skupa neuralnih snimaka na osnovu probabilističkih parametara procenjenih iz obrazaca generisanja akcionih potencijala, tzv. "spajkova", na izlazu pojedinih neurona. Neuralna aktivnost se grubo može podeliti na periode intenzivne, umerene i niske aktivnosti. Shodno tome, predložena je gruba dekompozicija neuralne aktivnosti na tri moda koja odgovaraju navedenim obrascima neuralne aktivnosti, na osnovu dobro poznatog Gilbert-Eliot modela. Modovi su dodatno raščlanjeni na sopstvena stanja na osnovu osobina sukcesivnih spajkova, omogućujući finiji, kompozitni opis neuralne aktivnosti. Za svaki neuron empirijski se procenjuju probabilistički parametri grube dekompozicije - na osnovu Gilbert-Eliotovog modela - i finije dekompozicije - na osnovu sopstvenih stanja modova, obezbeđujući željeni skup deskriptora. Dobijeni deskriptori koriste se kao obeležja nekoliko algoritama klasterizacije nad simuliranim i eksperimentalnim podacima. Za generisanje simuliranih podataka primenjen je jednostavan model za generisanje akcionih potencijala različitih oscilatornih ponašanja pobuđujućih i blokirajućih kortikalnih neurona. Validacija primene probabilističkih parametara za klasterizaciju rada neurona izvršena je na osnovu estimacije parametara nad generisanim neuralnim odzivima. Eksperimentalni podaci su dobijeni snimanjem kortikografskih signala iz dorzalnog anteriornog cingularnog korteksa i lateralnog prefrontalnog korteksa budnih rezus majmuna. U okviru predloženog protokola evaluacije različitih pristupa klasterizacije testirano je nekoliko metoda. Klasterizacija zasnovana na akumulaciji dokaza iz ansambla particija dobijenih k-means klasterovanjem dala je najstabilnije grupisanje neuralnih jedinica uz brzu i efikasnu implementaciju. Predložena empirijska karakterizacija može da posluži za identifikaciju korelacije sa spoljašnjim stimulusima, akcijama i ponašanjem životinja u okviru eksperimentalne procedure. Prednosti ovog postupka za opis neuralne aktivnosti su brza estimacija i mali skup deskriptora. Računarska efikasnost omogućuje primenu nad obimnim, paralelno snimanim neuralnim podacima u toku snimanja ili u periodima od interesa za identifikaciju aktiviranih i povezanih zona pri određenim aktivnostima.
The advances in extracellular neural recording techniques result in big data volumes that necessitate fast, reliable, and automatic identification of statistically similar units. This study proposes a single framework yielding a compact set of probabilistic descriptors that characterise the firing patterns of a single unit. Probabilistic features are estimated from an inter-spike interval time series, without assumptions about the firing distribution or the stationarity. The first level of the proposed firing pattern decomposition divides the inter-spike intervals into bursting, moderate and idle firing modes, yielding a coarse feature set. The second level identifies the successive bursting spikes, or the spiking acceleration/deceleration in the moderate firing mode, yielding a refined feature set. The features are estimated from simulated data and from experimental recordings from the lateral prefrontal cortex in awake, behaving rhesus monkeys. An efficient and stable partitioning of neural units is provided by the ensemble evidence accumulation clustering. The possibility of selecting the number of clusters and choosing among coarse and refined feature sets provides an opportunity to explore and compare different data partitions. The estimation of features, if applied to a single unit, can serve as a tool for the firing analysis, observing either overall spiking activity or the periods of interest in trial-to-trial recordings. If applied to massively parallel recordings, it additionally serves as an input to the clustering procedure, with the potential to compare the functional properties of various brain structures and to link the types of neural cells to the particular behavioural states.
Los estilos APA, Harvard, Vancouver, ISO, etc.
31

Korger, Christina. "Clustering of Distributed Word Representations and its Applicability for Enterprise Search". Master's thesis, Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2016. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-208869.

Texto completo
Resumen
Machine learning of distributed word representations with neural embeddings is a state-of-the-art approach to modelling semantic relationships hidden in natural language. The thesis “Clustering of Distributed Word Representations and its Applicability for Enterprise Search” covers different aspects of how such a model can be applied to knowledge management in enterprises. A review of distributed word representations and related language modelling techniques, combined with an overview of applicable clustering algorithms, constitutes the basis for practical studies. The latter have two goals: firstly, they examine the quality of German embedding models trained with gensim and a selected choice of parameter configurations. Secondly, clusterings conducted on the resulting word representations are evaluated against the objective of retrieving immediate semantic relations for a given term. The application of the final results to company-wide knowledge management is subsequently outlined by the example of the platform intergator and conceptual extensions.
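A minimal sketch of the workflow studied here, assuming gensim (version 4 or later) and scikit-learn; the toy corpus and the number of clusters are illustrative assumptions:

```python
# Train a small word2vec model on a toy corpus and cluster the resulting vectors with k-means.
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

corpus = [["invoice", "payment", "order"], ["server", "backup", "network"],
          ["invoice", "order", "customer"], ["network", "server", "firewall"]]
w2v = Word2Vec(corpus, vector_size=20, min_count=1, epochs=50, seed=1)

words = w2v.wv.index_to_key
labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(w2v.wv[words])
for word, label in zip(words, labels):
    print(label, word)                   # words with related contexts should share a cluster
```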
Los estilos APA, Harvard, Vancouver, ISO, etc.
32

Rolnic, Sergiu Gabriel. "Anonimizzazione di documenti mediante Named Entity Recognition e Neural Language Model". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Buscar texto completo
Resumen
I transformers hanno rivoluzionato il mondo dell'interpretazione linguistica da parte delle macchine. La possibilità di addestrare un neural language model su vocabolari ed enciclopedie intere, per poi utilizzare le conoscenze acquisite e trasmetterle a task specifici, ha permesso di raggiungere lo stato dell'arte in quasi tutti i domini applicativi del Natural Language Processing. In questo contesto è stato sviluppato un applicativo per l'anonimizzazione di file, in grado di identificare entità specifiche rappresentative di dati personali.
Los estilos APA, Harvard, Vancouver, ISO, etc.
33

BIANCHI, FEDERICO. "Corpus-based Comparison of Distributional Models of Language and Knowledge Graphs". Doctoral thesis, Università degli Studi di Milano-Bicocca, 2020. http://hdl.handle.net/10281/263553.

Texto completo
Resumen
L'intelligenza artificiale cerca di spiegare come gli agenti intelligenti si comportano. Il linguaggio è uno dei media di comunicazione più importanti e studiare delle teorie che permettano di definire il significato di espressioni naturali è molto importante. I linguisti hanno usato con successo linguaggi artificiali basati su logiche, ma una teoria che ha avuto un impatto significativo in intelligenza artificiale è la semantica distribuzionale. La semantica distribuzionale afferma che il significato di espressioni in linguaggio naturale può essere derivato dal contesto in cui tali espressioni compaiono. Questa teoria è stata implementata da algoritmi che permettono di generare rappresentazioni vettoriali delle espressioni del linguaggio naturale in modo che espressioni simili vengano rappresentate con vettori simili. Negli ultimi anni, gli scienziati cognitivi hanno sottolineato che queste rappresentazioni sono correlate con l'associative learning e che sono anche in grado di catturare bias e stereotipi del testo. Diventa quindi importante trovare metodologie per comparare rappresentazioni che arrivano da sorgenti diverse. Ad esempio, usare questi algoritmi su testi di periodi differenti genera rappresentazioni differenti: visto che il linguaggio muta nel tempo, trovare delle metodologie per comparare come le parole si sono mosse è un task importante per l'intelligenza artificiale (e.g., la parola "amazon" ha cambiato il suo significato principale negli ultimi anni). In questa tesi, introduciamo un modello comparativo basato su testi che permette di comparare rappresentazioni di sorgenti diverse generate con la semantica distribuzionale. Proponiamo un modello che è efficiente ed efficace e mostriamo che possiamo anche gestire nomi di entità e non solo parole, superando problemi legati all'ambiguità del linguaggio. Alla fine, mostriamo che è possibile combinare questi metodi con approcci logici e fare comparazioni utilizzando costrutti logici.
One of the main goals of artificial intelligence is understanding how intelligent agents act. Language is one of the most important media of communication, and studying theories that can account for the meaning of natural language expressions is a crucial task in artificial intelligence. Distributional semantics states that the meaning of natural language expressions can be derived from the context in which the expressions appear. This theory has been implemented by algorithms that generate vector representations of natural language expressions, representing similar natural language expressions with similar vectors. In the last years, several cognitive scientists have shown that these representations are correlated with associative learning and that they capture cognitive biases and stereotypes as they are encoded in text corpora. If language encodes important aspects of cognition and our associative knowledge, and language usage changes across contexts, the comparison of language usage in different contexts may reveal important associative knowledge patterns. Thus, if we want to reveal these patterns, we need ways to compare distributional representations that are generated from different text corpora. For example, using these algorithms on textual documents from different periods will generate different representations: since language evolves over time, finding a way to compare words that have shifted over time is a valuable task for artificial intelligence (e.g., the word "Amazon" has changed its prevalent meaning during the last years). In this thesis, we introduce a corpus-based comparative model that allows us to compare representations of different sources generated under the distributional semantic theory. We propose a model that is both effective and efficient, and we show that it can also deal with entity names and not just words, overcoming some problems that follow from the ambiguity of natural language. Eventually, we combine these methods with logical approaches. We show that we can do logical reasoning on these representations and make comparisons based on logical constructs.
Los estilos APA, Harvard, Vancouver, ISO, etc.
34

Keisala, Simon. "Using a Character-Based Language Model for Caption Generation". Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-163001.

Texto completo
Resumen
Using AI to automatically describe images is a challenging task. The aim of this study has been to compare the use of character-based language models with one of the current state-of-the-art token-based language models, im2txt, to generate image captions, with focus on morphological correctness. Previous work has shown that character-based language models are able to outperform token-based language models in morphologically rich languages. Other studies show that simple multi-layered LSTM-blocks are able to learn to replicate the syntax of its training data. To study the usability of character-based language models an alternative model based on TensorFlow im2txt has been created. The model changes the token-generation architecture into handling character-sized tokens instead of word-sized tokens. The results suggest that a character-based language model could outperform the current token-based language models, although due to time and computing power constraints this study fails to draw a clear conclusion. A problem with one of the methods, subsampling, is discussed. When using the original method on character-sized tokens this method removes characters (including special characters) instead of full words. To solve this issue, a two-phase approach is suggested, where training data first is separated into word-sized tokens where subsampling is performed. The remaining tokens are then separated into character-sized tokens. Future work where the modified subsampling and fine-tuning of the hyperparameters are performed is suggested to gain a clearer conclusion of the performance of character-based language models.
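The two-phase subsampling suggested above could look roughly like this (an assumption about one possible implementation; the threshold and example sentence are illustrative):

```python
# Frequent words are dropped with a word2vec-style keep probability at the word level,
# and only the surviving words are then split into character-sized tokens.
import math
import random
from collections import Counter

def subsample_then_chars(tokens, t=1e-3, seed=0):
    random.seed(seed)
    counts = Counter(tokens)
    total = len(tokens)
    kept = []
    for word in tokens:
        freq = counts[word] / total
        keep_prob = min(1.0, math.sqrt(t / freq))     # word-level subsampling
        if random.random() < keep_prob:
            kept.append(word)
    return [ch for word in kept for ch in list(word) + [" "]]   # character-sized tokens

print(subsample_then_chars("the cat sat on the mat the end".split()))
```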
Los estilos APA, Harvard, Vancouver, ISO, etc.
35

Garagnani, Max. "Understanding language and attention : brain-based model and neurophysiological experiments". Thesis, University of Cambridge, 2009. https://www.repository.cam.ac.uk/handle/1810/243852.

Texto completo
Resumen
This work concerns the investigation of the neuronal mechanisms at the basis of language acquisition and processing, and the complex interactions of language and attention processes in the human brain. In particular, this research was motivated by two sets of existing neurophysiological data which cannot be reconciled on the basis of current psycholinguistic accounts: on the one hand, the N400, a robust index of lexico-semantic processing which emerges at around 400ms after stimulus onset in attention demanding tasks and is larger for senseless materials (meaningless pseudowords) than for matched meaningful stimuli (words); on the other, the more recent results on the Mismatch Negativity (MMN, latency 100-250ms), an early automatic brain response elicited under distraction which is larger to words than to pseudowords. We asked what the mechanisms underlying these differential neurophysiological responses may be, and whether attention and language processes could interact so as to produce the observed brain responses, having opposite magnitude and different latencies. We also asked questions about the functional nature and anatomical characteristics of the cortical representation of linguistic elements. These questions were addressed by combining neurocomputational techniques and neuroimaging (magneto-encephalography, MEG) experimental methods. Firstly, a neurobiologically realistic neural-network model composed of neuron-like elements (graded response units) was implemented, which closely replicates the neuroanatomical and connectivity features of the main areas of the left perisylvian cortex involved in spoken language processing (i.e., the areas controlling speech output – left inferior-prefrontal cortex, including Broca’s area – and the main sensory input – auditory – areas, located in the left superior-temporal lobe, including Wernicke’s area). Secondly, the model was used to simulate early word acquisition processes by means of a Hebbian correlation learning rule (which reflects known synaptic plasticity mechanisms of the neocortex). The network was “taught” to associate pairs of auditory and articulatory activation patterns, simulating activity due to perception and production of the same speech sound: as a result, neuronal word representations distributed over the different cortical areas of the model emerged. Thirdly, the network was stimulated, in its “auditory cortex”, with either one of the words it had learned, or new, unfamiliar pseudoword patterns, while the availability of attentional resources was modulated by changing the level of non-specific, global cortical inhibition. In this way, the model was able to replicate both the MMN and N400 brain responses by means of a single set of neuroscientifically grounded principles, providing the first mechanistic account, at the cortical-circuit level, for these data. Finally, in order to verify the neurophysiological validity of the model, its crucial predictions were tested in a novel MEG experiment investigating how attention processes modulate event-related brain responses to speech stimuli. Neurophysiological responses to the same words and pseudowords were recorded while the same subjects were asked to attend to the spoken input or ignore it. The experimental results confirmed the model’s predictions; in particular, profound variability of magnetic brain responses to pseudowords but relative stability of activation to words as a function of attention emerged. 
While the results of the simulations demonstrated that distributed cortical representations for words can spontaneously emerge in the cortex as a result of neuroanatomical structure and synaptic plasticity, the experimental results confirm the validity of the model and provide evidence in support of the existence of such memory circuits in the brain. This work is a first step towards a mechanistic account of cognition in which the basic atoms of cognitive processing (e.g., words, objects, faces) are represented in the brain as discrete and distributed action-perception networks that behave as closed, independent systems.
Los estilos APA, Harvard, Vancouver, ISO, etc.
36

Al-Kadhimi, Staffan y Paul Löwenström. "Identification of machine-generated reviews : 1D CNN applied on the GPT-2 neural language model". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280335.

Texto completo
Resumen
With recent advances in machine learning, computers are able to create more convincing text, creating a concern for an increase in fake information on the internet. At the same time, researchers are creating tools for detecting computer-generated text. Researchers have been able to exploit flaws in neural language models and use them against themselves; for example, GLTR provides human users with a visual representation of texts that assists in classification as human-written or machine-generated. By training a convolutional neural network (CNN) on GLTR output data from analysis of machine-generated and human-written movie reviews, we are able to take GLTR a step further and use it to automatically perform this classification. However, using a CNN with GLTR as the main source of data for classification does not appear to be enough to be on par with the best existing approaches.
I och med de senaste framstegen inom maskininlärning kan datorer skapa mer och mer övertygande text, vilket skapar en oro för ökad falsk information på internet. Samtidigt vägs detta upp genom att forskare skapar verktyg för att identifiera datorgenererad text. Forskare har kunnat utnyttja svagheter i neurala språkmodeller och använda dessa mot dem. Till exempel tillhandahåller GLTR användare en visuell representation av texter, som hjälp för att klassificera dessa som människoskrivna eller maskingenererade. Genom att träna ett faltningsnätverk (convolutional neural network, eller CNN) på utdata från GLTR-analys av maskingenererade och människoskrivna filmrecensioner, tar vi GLTR ett steg längre och använder det för att genomföra klassifikationen automatiskt. Emellertid tycks det ej vara tillräckligt att använda en CNN med GLTR som huvuddatakälla för att klassificera på en nivå som är jämförbar med de bästa existerande metoderna.
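A minimal sketch of this classification setup, assuming Keras (tensorflow.keras); the sequence length, the four-bin encoding standing in for GLTR output and the random data are illustrative assumptions:

```python
# A 1D CNN over a per-token feature sequence (e.g. GLTR rank bins), classifying a review
# as human-written or machine-generated.
import numpy as np
from tensorflow.keras import layers, models

seq_len, n_bins = 200, 4                       # e.g. GLTR's top-10/top-100/top-1000/rest buckets
model = models.Sequential([
    layers.Input(shape=(seq_len, n_bins)),
    layers.Conv1D(32, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),     # P(machine-generated)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.rand(8, seq_len, n_bins)         # stand-in for GLTR output of 8 reviews
y = np.random.randint(0, 2, size=8)
model.fit(X, y, epochs=1, verbose=0)
print(model.predict(X[:1], verbose=0))
```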
Los estilos APA, Harvard, Vancouver, ISO, etc.
37

Cavallucci, Martina. "Speech Recognition per l'italiano: Sviluppo e Sperimentazione di Soluzioni Neurali con Language Model". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Buscar texto completo
Resumen
Le e-mail e i servizi di messaggistica hanno cambiato significativamente la comunicazione umana, ma la parola è ancora il metodo più importante di comunicazione tra esseri umani. Pertanto, il riconoscimento vocale automatico (ASR) è di particolare rilevanza perché fornisce una trascrizione della lingua parlata che può essere valutata da sistemi automatizzati. Con altoparlanti intelligenti come Google Home, Alexa o Siri, l'ASR è già una parte integrante di molte famiglie ed è usato per suonare musica, rispondere alle domande o controllare altri dispositivi intelligenti come un sistema di domotica. Tuttavia, l'ASR può essere trovato anche in molti altri sistemi, come sistemi di dettatura, traduttori vocali o interfacce utente vocali. Sempre più aziende ne comprendono le potenzialità, soprattutto per migliorare i processi aziendali; il lavoro di tesi mira infatti a sperimentare modelli neurali per la trascrizione di Webinar creati dall'azienda ospitante Maggioli, dove si è svolto il tirocinio, ottenendo così trascrizioni utili per il recupero delle informazioni e la loro gestione. A tale scopo si sono utilizzati modelli basati sui recenti Transformers e, grazie alla tecnica dell'apprendimento auto-supervisionato che apprende da dati non etichettati, è stato possibile ottenere buoni risultati su dataset con audio e trascrizioni in italiano, per cui si dispone ancora di poche risorse rispetto alla lingua inglese.
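A minimal inference sketch, assuming the Hugging Face transformers pipeline and a publicly available wav2vec 2.0 checkpoint fine-tuned for Italian; the model name and the audio file are assumptions, not the system actually built in the thesis:

```python
# Transcribe an audio clip with a self-supervised speech model fine-tuned for Italian ASR.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition",
               model="jonatasgrosman/wav2vec2-large-xlsr-53-italian")

result = asr("webinar_clip.wav")       # hypothetical 16 kHz mono recording
print(result["text"])                  # raw transcription, before any language-model rescoring
```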
Los estilos APA, Harvard, Vancouver, ISO, etc.
38

Roos, Magnus. "Speech Comprehension : Theoretical approaches and neural correlates". Thesis, Högskolan i Skövde, Institutionen för biovetenskap, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-11240.

Texto completo
Resumen
This review has examined the spatial and temporal neural activation of speech comprehension. Six theories on speech comprehension were selected and reviewed. The most fundamental structures for speech comprehension are the superior temporal gyrus, the fusiform gyrus, the temporal pole, the temporoparietal junction, and the inferior frontal gyrus. Considering temporal aspects of processes, the N400 ERP effect indicates semantic violations, and the P600 indicates re-evaluation of a word due to ambiguity or syntax error. The dual-route processing model provides the most accurate account of neural correlates and streams of activation necessary for speech comprehension, while also being compatible with both the reviewed studies and the reviewed theories. The integrated theory of language production and comprehension provides a contemporary theory of speech production and comprehension with roots in computational neuroscience, which in conjunction with the dual-route processing model could drive the fields of language and neuroscience even further forward.
Los estilos APA, Harvard, Vancouver, ISO, etc.
39

Souza, Cristiano Roberto de. "Modelos para previsão do risco de crédito". [s.n.], 2010. http://repositorio.unicamp.br/jspui/handle/REPOSIP/259123.

Texto completo
Resumen
Orientador: Gilmar Barreto
Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Eletrica e de Computação
Resumo: Os modelos computacionais para previsão do risco financeiro têm ganhado grande importância desde 1970. Com a atual crise financeira os governos têm discutido formas de regular o setor financeiro e a mais conhecida e adotada é a de Basiléia I e II, que é fortemente suportada por modelos de previsão de risco de crédito. Assim este tipo de modelo pode ajudar os governos e as instituições financeiras a conhecerem melhor suas carteiras para assim criarem controle sobre os riscos envolvidos. Para se ter uma idéia da importância destes modelos para as instituições financeiras a avaliação de risco dada pelo modelo é utilizada como forma de mostrar ao Banco Central a qualidade da carteira de crédito. Através desta medida de qualidade o Banco Central exige que os acionistas do banco deixem depositados um percentual do dinheiro emprestado como garantia dos empréstimos duvidosos criando assim o Índice de Basiléia. Com o objetivo de estudar as ferramentas que atualmente auxiliam no desenvolvimento dos modelos de risco de crédito iremos abordar: 1. Técnicas tradicionais Estatísticas, 2. Técnicas Não Paramétricas, 3. Técnicas de Computação Natural
Abstract: The computer models to forecast financial risk have gained great importance since 1970 [1]. With the current financial crisis, governments have discussed ways to regulate the financial sector, and the most widely known and adopted framework is Basel I and II, which is strongly supported by credit risk forecasting models. This type of model can help governments and financial institutions to better understand their portfolios so they can establish control over the risks involved. To get an idea of the importance of these models for financial institutions, the risk assessment given by the model is used as a way of showing the Central Bank the quality of the credit portfolio. Based on this quality measure, the Central Bank requires the bank's shareholders to keep a percentage of the money lent deposited as collateral against doubtful loans, thus creating the Basel index. In order to study the tools that currently support the development of credit risk models, we cover: 1. Statistical techniques, 2. Non-parametric techniques, 3. Natural computation techniques
Mestrado
Automação
Mestre em Engenharia Elétrica
Los estilos APA, Harvard, Vancouver, ISO, etc.
40

Gennari, Riccardo. "End-to-end Deep Metric Learning con Vision-Language Model per il Fashion Image Captioning". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25772/.

Texto completo
Resumen
L'image captioning è un task di machine learning che consiste nella generazione di una didascalia, o caption, che descriva le caratteristiche di un'immagine data in input. Questo può essere applicato, ad esempio, per descrivere in dettaglio i prodotti in vendita su un sito di e-commerce, migliorando l'accessibilità del sito web e permettendo un acquisto più consapevole ai clienti con difficoltà visive. La generazione di descrizioni accurate per gli articoli di moda online è importante non solo per migliorare le esperienze di acquisto dei clienti, ma anche per aumentare le vendite online. Oltre alla necessità di presentare correttamente gli attributi degli articoli, infatti, descrivere i propri prodotti con il giusto linguaggio può contribuire a catturare l'attenzione dei clienti. In questa tesi, ci poniamo l'obiettivo di sviluppare un sistema in grado di generare una caption che descriva in modo dettagliato l'immagine di un prodotto dell'industria della moda dato in input, sia esso un capo di vestiario o un qualche tipo di accessorio. A questo proposito, negli ultimi anni molti studi hanno proposto soluzioni basate su reti convoluzionali e LSTM. In questo progetto proponiamo invece un'architettura encoder-decoder, che utilizza il modello Vision Transformer per la codifica delle immagini e GPT-2 per la generazione dei testi. Studiamo inoltre come tecniche di deep metric learning applicate in end-to-end durante l'addestramento influenzino le metriche e la qualità delle caption generate dal nostro modello.
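A minimal inference sketch of such an encoder-decoder setup, assuming the Hugging Face transformers library and a public ViT+GPT-2 captioning checkpoint; the checkpoint name and image file are illustrative assumptions, and the thesis additionally applies deep metric learning during training:

```python
# Generate a caption for a product image with a ViT encoder and GPT-2 decoder.
from PIL import Image
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer

name = "nlpconnect/vit-gpt2-image-captioning"
model = VisionEncoderDecoderModel.from_pretrained(name)
processor = ViTImageProcessor.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

pixels = processor(images=Image.open("dress.jpg"), return_tensors="pt").pixel_values
caption_ids = model.generate(pixels, max_length=32)
print(tokenizer.decode(caption_ids[0], skip_special_tokens=True))
```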
Los estilos APA, Harvard, Vancouver, ISO, etc.
41

Lombardini, Alessandro. "Estrazione di Correlazioni Medicali da Social Post non Etichettati con Language Model Neurali e Data Clustering". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Buscar texto completo
Resumen
La progressiva informatizzazione della società a cui il mondo contemporaneo sta assistendo, ha generato un radicale cambiamento nelle abitudini delle persone, le quali oggi giorno trascorrono sempre più tempo online e creano reti di conoscenza prima inimmaginabili. Tale cambiamento ha coinvolto, nel suo avanzare, anche gli individui affetti da malattie di varia natura. In particolare, la scarsa disponibilità di informazioni che caratterizza alcuni contesti medici, unita al bisogno di dialogare con altre persone aventi la medesima problematica, ha determinato negli ultimi anni una forte crescita di comunità sulle piattaforme social, all’interno delle quali vengono scambiati dettagli rispetto a trattamenti, centri specializzati e dottori. In questo senso, i social network sono diventati il luogo in cui i pazienti sono più propensi a condividere le proprie esperienze e opinioni maturate durante il corso della propria malattia. Questa tesi nasce dalla consapevolezza del valore di tali dati e dalla volontà di consentire un ragionamento logico deduttivo al di sopra di essi. Nello specifico, si intende estrarre — con un approccio non supervisionato, mediante l’uso di language model neurali e data clustering — le correlazioni semantiche racchiuse nell’elevata quantità di testo generato dagli utenti attraverso interazioni social, prendendo l’Acalasia Esofagea come caso di studio.
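A minimal sketch of this kind of unsupervised pipeline, assuming the sentence-transformers library and scikit-learn; the multilingual checkpoint, the example posts and the number of clusters are illustrative assumptions:

```python
# Embed user posts with a neural language model and cluster them to surface recurring topics.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

posts = ["Dopo la miotomia riesco di nuovo a mangiare",
         "Qualcuno ha provato la dilatazione pneumatica?",
         "La dilatazione mi ha aiutato solo per qualche mese",
         "Cerco un centro specializzato per la miotomia"]
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(encoder.encode(posts))
for post, label in zip(posts, labels):
    print(label, post)        # posts about the same treatment should fall in the same cluster
```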
Los estilos APA, Harvard, Vancouver, ISO, etc.
42

Bojan, Batinić. "Model za predviđanje količine ambalažnog i biorazgradivog otpada primenom neuronskih mreža". Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2015. http://www.cris.uns.ac.rs/record.jsf?recordId=94084&source=NDLTD&language=en.

Texto completo
Resumen
U okviru disertacije, korišćenjem veštačkih neuronskih mreža razvijeni su modeli za predviđanje količina ambalažnog i biorazgradivog komunalnog otpada u Republici Srbiji do kraja 2030. godine. Razvoj modela baziran je na zavisnosti između ukupne potrošnje domaćinstva i generisane količine dva posmatrana toka otpada. Pored toga, na bazi zavisnosti sa bruto domaćim proizvodom (BDP), definisan je i model za projekciju zastupljenosti osnovnih opcija tretmana komunalnog otpada u Republici Srbiji za isti period. Na osnovu dobijenih rezultata, stvorene su polazne osnove za procenu potencijala za reciklažu ambalažnog otpada, kao i za procenu u kojoj meri se može očekivati da određene količine biorazgradivog otpada u narednom periodu ne budu odložene na deponije, što je u skladu sa savremenim principima upravljanja otpadom i postojećim zahtevima EU u ovoj oblasti.
By using artificial neural networks, models for prediction of the quantities of packaging and biodegradable municipal waste in the Republic of Serbia by the end of 2030 were developed. The models were based on the dependence between total household consumption and the generated quantities of the two observed waste streams. In addition, based on the dependence on Gross Domestic Product (GDP), a model for the projection of the share of different municipal solid waste treatment options in the Republic of Serbia for the same period was created. The obtained results represent a starting point for assessing the potential for recycling of packaging waste, and for determining the quantities of biodegradable municipal waste that are expected not to be disposed of at landfills in the future period, in accordance with modern principles of waste management and existing EU requirements in this area.
Los estilos APA, Harvard, Vancouver, ISO, etc.
43

Galdo, Carlos y Teddy Chavez. "Prototyputveckling för skalbar motor med förståelse för naturligt språk". Thesis, KTH, Hälsoinformatik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-223350.

Texto completo
Resumen
Förståelse för naturligt språk, språk som har utvecklats av människan ex. talspråk eller teckenspråk, är en del av språkteknik. Det är ett brett ämnesområde där utvecklingen har gått fram i snabb takt senaste 20 åren. En bidragande faktor till denna utveckling är framgångarna med neurala nätverk som är en matematisk modell inspirerad av biologiska hjärnor. Förståelse för naturligt språk används inom många områden där det krävs att applikationer förstår innebörden av textinmatning. Exempel på applikationer som använder förståelse för naturligt språk är Google translate, Googles sökmotor och rättstavningsfunktionen i textredigerarprogram.   A Great Thing AB har utvecklat applikationen Thing Launcher. Thing Launcher är en applikation som hanterar andra applikationer med hjälp av användarens olika kriterier i samband mobilens olika funktionaliteter som; väder, geografisk position, tid mm. Ett exempel kan vara att användaren vill att Spotify ska spela en specifik låt när användaren kommer hem, eller att en taxi ska vara på plats när användaren anländer till en geografisk position.  I dagsläget styr man Thing Launcher med hjälp av textinmatningar. A Great Thing AB behöver hjälp att ta en prototyp på en motor med förståelse för naturligt språk som kan styras av både textinmatning och röstinmatning. Motorn ska användas i applikationen Thing Launcher. Med skalbarhet menas att motorn ska kunna utvecklas, att nya funktioner och applikationer ska kunna läggas till, samtidigt som systemet ska kunna vara i drift och att prestandan påverkas så lite som möjligt.   Detta examensarbete har som syfte att undersöka vilka algoritmer som är lämpliga för att bygga en skalbar motor med förståelse av naturligt språk. Utifrån detta utveckla en prototyp. En litteraturstudie gjordes mellan dolda Markovmodeller och neurala nätverk. Resultatet visade att neurala nätverk var överlägset i förståelse av naturligt språk. Flera typer av neurala nätverk finns implementerade i TensorFlow och den är mycket flexibelt med sitt bredda utbud av kompatibla mobila enheter, vilket nyttar utvecklingen med det modulära aspekten och därför valdes detta som ramverk för att utveckla prototypen. De två viktigaste komponenterna i prototypen bestod av Command tagger, som ska kunna identifiera vilken applikation som användaren vill styra och NER tagger, som ska identifiera vad användaren vill att applikationen ska utföra. För att mäta träffsäkerheten utfördes det två tester, en för respektive tagger, flera gånger som mätte hur ofta komponenterna gissade rätt efter varje träningsrunda. Varje träningsrunda bestod av att komponenterna fick tiotusentals meningar som de fick gissa på följt av facit för att ge feedback. Med hjälp av feedback kunde komponenterna anpassas för hur de agerar i framtiden i samma situation. Command tagger gissade rätt 94 procent av gångerna och Ner tagger gissade rätt 96 procent av gångerna efter de sista träningsrundorna. I prototypen användes Androids inbyggda mjukvara för taligenkänning. Det är en funktion som omvandlar ljudvågor till text. En serverbaserad lösning med REST applikationsgränssnitt utvecklades för att göra motorn skalbar.   Resultatet visar att fungerande prototyp som kan vidareutvecklas till en skalbar motor för naturligt språk.
Natural Language Understanding is a field that is part of Natural Language Processing. Big improvements have been made in the broad field of Natural Language Understanding during the past two decades. One big contribution to this improvement is neural networks, a mathematical model inspired by biological brains. Natural Language Understanding is used in fields that require deeper understanding by applications. Google Translate, the Google search engine and grammar/spelling checkers are some examples of applications requiring deeper understanding. Thing Launcher is an application developed by A Great Thing AB. Thing Launcher is an application capable of managing other applications with different parameters. Some examples of parameters the user can use are geographic position and time. The user can, as an example, control what song will be played when they get home or order an Uber when they arrive at a certain destination. It is possible to control Thing Launcher today by text input. A Great Thing AB needs help developing a prototype capable of understanding text input and speech. Scalable means that it should be possible to develop and add functions and applications with as little impact as possible on the uptime and performance of the service. A comparison of suitable algorithms, tools and frameworks has been made in this thesis in order to research what it takes to develop a scalable engine with natural language understanding, and then to build a prototype from this gathered information. A theoretical comparison was made between Hidden Markov Models and Neural Networks. The results showed that Neural Networks are superior in the field of natural language understanding. The tests made in this thesis indicated that high accuracy could be achieved using neural networks. The TensorFlow framework was chosen because it has many different types of neural networks implemented in C/C++ ready to be used with Python, and also for its wide compatibility with mobile devices. The prototype should be able to identify voice commands. The prototype has two important components called Command tagger, which is going to identify which application the user wants to control, and NER tagger, which is going to identify what the user wants to do. To calculate the accuracy, two types of tests, one for each component, were executed several times to calculate how often the components guessed right after each training iteration. Each training iteration consisted of giving the components thousands of sentences to guess and then giving them feedback by letting them know the right answers. With the help of feedback, the components were molded to act right in situations like the training. The tests after the training process resulted in the Command tagger guessing right 94% of the time and the NER tagger guessing right 96% of the time. The built-in software in Android was used for speech recognition. This is a function that converts sound waves to text. A server-based solution with a REST interface was developed to make the engine scalable. This thesis resulted in a working prototype that can be further developed into a scalable engine.
Los estilos APA, Harvard, Vancouver, ISO, etc.
44

Zarrinkoub, Sahand. "Transfer Learning in Deep Structured Semantic Models for Information Retrieval". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286310.

Texto completo
Resumen
Recent approaches to IR include neural networks that generate query and document vector representations. The representations are used as the basis for document retrieval and are able to encode semantic features if trained on large datasets, an ability that sets them apart from classical IR approaches such as TF-IDF. However, the datasets necessary to train these networks are not available to the owners of most search services used today, since they are not used by enough users. Thus, methods for enabling the use of neural IR models in data-poor environments are of interest. In this work, a bag-of-trigrams neural IR architecture is used in a transfer learning procedure in an attempt to increase performance on a target dataset by pre-training on external datasets. The target dataset used is WikiQA, and the external datasets are Quora’s Question Pairs, Reuters’ RCV1 and SQuAD. When considering individual model performance, pre-training on Question Pairs and fine-tuning on WikiQA gives us the best individual models. However, when considering average performance, pre-training on the chosen external dataset result in lower performance on the target dataset, both when all datasets are used together and when they are used individually, with different average performance depending on the external dataset used. On average, pre-training on RCV1 and Question Pairs gives the lowest and highest average performance respectively, when considering only the pre-trained networks. Surprisingly, the performance of an untrained, randomly generated network is high, and beats the performance of all pre-trained networks on average. The best performing model on average is a neural IR model trained on the target dataset without prior pre-training.
Nya modeller inom informationssökning inkluderar neurala nät som genererar vektorrepresentationer för sökfrågor och dokument. Dessa vektorrepresentationer används tillsammans med ett likhetsmått för att avgöra relevansen för ett givet dokument med avseende på en sökfråga. Semantiska särdrag i sökfrågor och dokument kan kodas in i vektorrepresentationerna. Detta möjliggör informationssökning baserat på semantiska enheter, vilket ej är möjligt genom de klassiska metoderna inom informationssökning, som istället förlitar sig på den ömsesidiga förekomsten av nyckelord i sökfrågor och dokument. För att träna neurala sökmodeller krävs stora datamängder. De flesta av dagens söktjänster används i för liten utsträckning för att möjliggöra framställande av datamängder som är stora nog att träna en neural sökmodell. Därför är det önskvärt att hitta metoder som möjliggör användandet av neurala sökmodeller i domäner med små tillgängliga datamängder. I detta examensarbete har en neural sökmodell implementerats och använts i en metod avsedd att förbättra dess prestanda på en måldatamängd genom att förträna den på externa datamängder. Måldatamängden som används är WikiQA, och de externa datamängderna är Quoras Question Pairs, Reuters RCV1 samt SquAD. I experimenten erhålls de bästa enskilda modellerna genom att föträna på Question Pairs och finjustera på WikiQA. Den genomsnittliga prestandan över ett flertal tränade modeller påverkas negativt av vår metod. Detta gäller både när samtliga externa datamängder används tillsammans, samt när de används enskilt, med varierande prestanda beroende på vilken datamängd som används. Att förträna på RCV1 och Question Pairs ger den största respektive minsta negativa påverkan på den genomsnittliga prestandan. Prestandan hos en slumpmässigt genererad, otränad modell är förvånansvärt hög, i genomsnitt högre än samtliga förtränade modeller, och i nivå med BM25. Den bästa genomsnittliga prestandan erhålls genom att träna på måldatamängden WikiQA utan tidigare förträning.
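The bag-of-character-trigrams ("word hashing") representation used by this family of neural IR models can be sketched as follows; the hash dimension and example texts are illustrative, and a trained model would replace the raw cosine with learned projection towers:

```python
# Map queries and documents to hashed character-trigram count vectors and compare them.
import numpy as np

def trigram_vector(text, dim=2000):
    vec = np.zeros(dim)
    for word in text.lower().split():
        padded = f"#{word}#"
        for i in range(len(padded) - 2):
            vec[hash(padded[i:i + 3]) % dim] += 1.0   # count hashed character trigrams
    return vec

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

query = trigram_vector("who wrote hamlet")
document = trigram_vector("hamlet is a tragedy written by william shakespeare")
print(cosine(query, document))
```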
Los estilos APA, Harvard, Vancouver, ISO, etc.
45

Prencipe, Michele Pio. "Elaborazione del Linguaggio Naturale con Metodi Probabilistici e Reti Neurali". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24312/.

Texto completo
Resumen
L'elaborazione del linguaggio naturale (NLP) è il processo per il quale la macchina tenta di imparare le informazioni del parlato o dello scritto tipico dell'essere umano. La procedura è resa particolarmente complessa dalle numerose ambiguità tipiche della lingua o del testo: ironia, metafore, errori ortografici e così via. Grazie all'apprendimento profondo, il Deep Learning, che ha permesso lo sviluppo delle reti neurali, si è raggiunto lo stato dell'arte nell'ambito NLP, tramite l'introduzione di architetture quali Encoder-Decoder, Transformers o meccanismi di attenzione. Le reti neurali, in particolare quelle con memoria o ricorrenti, si prestano molto bene ai task di NLP, per via della loro capacità di apprendere da una grande mole di dati a disposizione, ma anche perché riescono a concentrarsi particolarmente bene sul contesto di ciascuna parola in input o sulla sentiment analysis di una frase. In questo elaborato vengono analizzate le principali tecniche per fare apprendere il linguaggio naturale al calcolatore elettronico; il tutto viene descritto con esempi e parti di codice Python. Per avere una visione completa sull'ambito, si prende come riferimento il libro di testo "Hands-On Machine Learning with Scikit-Learn, Keras and Tensorflow" di Aurélien Géron, oltre che alla bibliografia correlata.
Los estilos APA, Harvard, Vancouver, ISO, etc.
46

Hubková, Helena. "Named-entity recognition in Czech historical texts : Using a CNN-BiLSTM neural network model". Thesis, Uppsala universitet, Institutionen för lingvistik och filologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-385682.

Texto completo
Resumen
The thesis presents named-entity recognition in Czech historical newspapers from the Modern Access to Historical Sources Project. Our goal was to create a specific corpus and annotation manual for the project and to evaluate neural network methods for named-entity recognition within the task. We created the corpus using scanned Czech historical newspapers. The scanned pages were converted to digitized text by an optical character recognition (OCR) method. The data were preprocessed by deleting some OCR errors. We also defined specific named entity types for our task and created an annotation manual with examples for the project. Based on that, we annotated the final corpus. To find the most suitable neural network model for our task, we experimented with different neural network architectures, namely long short-term memory (LSTM), bidirectional LSTM and CNN-BiLSTM models. Moreover, we experimented with randomly initialized word embeddings that were trained during the training process and with pretrained word embeddings for contemporary Czech published as open source by fastText. We achieved the best result, an F1 score of 0.444, using the CNN-BiLSTM model and the pretrained word embeddings by fastText. We found out that we do not need to normalize the spelling of our historical texts to get closer to the contemporary language if we use the neural network model. We provided a qualitative analysis of observed linguistic phenomena as well. We found out that some word forms and pairs of words which were not frequent in our training data set were mis-tagged or not tagged at all. Based on that, we can say that larger data sets could improve the results.
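A simplified sketch of a CNN-BiLSTM tagger, assuming Keras; here the convolution runs over the word sequence rather than over characters (a simplification of the architecture named above), and the vocabulary, tag set and random data are illustrative assumptions:

```python
# A word-level CNN + BiLSTM sequence tagger producing one named-entity tag per token.
import numpy as np
from tensorflow.keras import layers, models

vocab_size, n_tags, seq_len = 5000, 5, 40
model = models.Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 100),
    layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.TimeDistributed(layers.Dense(n_tags, activation="softmax")),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

X = np.random.randint(1, vocab_size, size=(4, seq_len))   # stand-in for indexed sentences
y = np.random.randint(0, n_tags, size=(4, seq_len))       # stand-in for per-token NE tags
model.fit(X, y, epochs=1, verbose=0)
```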
Los estilos APA, Harvard, Vancouver, ISO, etc.
47

Šůstek, Martin. "Word2vec modely s přidanou kontextovou informací". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363837.

Texto completo
Resumen
This thesis is concerned with the explanation of word2vec models. Even though word2vec was introduced recently (2013), many researchers have already tried to extend, understand or at least use the model because it provides surprisingly rich semantic information. This information is encoded in an N-dimensional vector representation and can be recalled by performing simple algebraic operations on the vectors. In addition, I suggest model modifications in order to obtain different word representations. To achieve that, I use public picture datasets. This thesis also includes parts dedicated to a word2vec extension based on convolutional neural networks.
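The semantic information mentioned above is typically recovered with vector arithmetic; a minimal sketch, assuming gensim and a pretrained vector file in word2vec text format (the file name is an assumption):

```python
# Query pretrained word2vec vectors with the classic king - man + woman analogy.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("vectors.txt", binary=False)
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
print(vectors.similarity("paris", "france"))
```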
Los estilos APA, Harvard, Vancouver, ISO, etc.
48

Morillot, Olivier. "Reconnaissance de textes manuscrits par modèles de Markov cachés et réseaux de neurones récurrents : application à l'écriture latine et arabe". Electronic Thesis or Diss., Paris, ENST, 2014. http://www.theses.fr/2014ENST0002.

Texto completo
Resumen
Handwriting recognition is an essential component of document analysis. A current trend in the field is to move from isolated-word recognition to the recognition of word sequences. Our work therefore proposes a text-line recognition system without explicit segmentation of the line into words. In order to build an efficient model, we intervene at several levels of the recognition system. First of all, we introduce two original preprocessing techniques: a cleaning of text-line images and a local baseline correction. Then, a language model is built and optimized for handwritten mail. Afterwards, we propose two state-of-the-art recognition systems based on contextual HMMs (Hidden Markov Models) and BLSTM (Bi-directional Long Short-Term Memory) recurrent neural networks, and we optimize both systems in order to compare the two approaches. Our systems are evaluated on Latin and Arabic cursive handwriting and have been submitted to two international handwriting recognition competitions. Finally, as a prospect for future work, we present a recognition strategy for certain out-of-vocabulary character strings.
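A schematic Keras sketch (not the systems described in the thesis) of the BLSTM side of such a comparison is shown below; the per-frame feature dimension and alphabet size are assumptions, and in a full system the per-frame outputs would be trained with a CTC-style loss and decoded together with the language model.

```python
# Sketch of a BLSTM optical model for text-line recognition (assumed sizes).
import tensorflow as tf
from tensorflow.keras import layers

N_FEATURES = 40     # assumed per-frame feature dimension (e.g. sliding-window pixel features)
N_CHARS = 80        # assumed alphabet size; one extra symbol reserved for the CTC "blank"

inputs = layers.Input(shape=(None, N_FEATURES))                        # frame sequence of a text line
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(inputs)
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
outputs = layers.Dense(N_CHARS + 1, activation="softmax")(x)           # per-frame character posteriors

model = tf.keras.Model(inputs, outputs)
```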
Los estilos APA, Harvard, Vancouver, ISO, etc.
49

Vrbaški, Dunja. "Primena mašinskog učenja u problemu nedostajućih podataka pri razvoju prediktivnih modela". Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2020. https://www.cris.uns.ac.rs/record.jsf?recordId=114270&source=NDLTD&language=en.

Texto completo
Resumen
The problem of missing data is often present when developing predictive models. Instead of removing records that contain missing values, imputation methods can be applied. The dissertation proposes a methodology for analysing the performance of imputation in the development of predictive models. Based on the proposed methodology, it presents the results of applying machine learning algorithms, as an imputation method, in the development of specific predictive models.
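For readers unfamiliar with model-based imputation, a minimal scikit-learn sketch (not the dissertation's actual pipeline) of filling missing values before fitting a predictor follows; the data here are synthetic and the choice of IterativeImputer and LogisticRegression is an assumption for illustration only.

```python
# Minimal sketch: impute missing values with a model-based imputer, then fit a predictor.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic data with missing entries (np.nan).
X = np.array([[1.0, 2.0], [3.0, np.nan], [np.nan, 6.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])

pipeline = make_pipeline(IterativeImputer(random_state=0), LogisticRegression())
pipeline.fit(X, y)
print(pipeline.predict([[2.0, np.nan]]))  # missing feature is imputed before prediction
```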
Los estilos APA, Harvard, Vancouver, ISO, etc.
50

Botha, Jan Abraham. "Probabilistic modelling of morphologically rich languages". Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c7.

Texto completo
Resumen
This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption does not fit morphologically complex language well, where words can have rich internal structure and sub-word elements are shared across distinct word forms. Our approach is to encode basic notions of morphology into the assumptions of three different types of language models, with the intention that leveraging shared sub-word structure can improve model performance and help overcome data sparsity that arises from morphological processes. In the context of n-gram language modelling, we formulate a new Bayesian model that relies on the decomposition of compound words to attain better smoothing, and we develop a new distributed language model that learns vector representations of morphemes and leverages them to link together morphologically related words. In both cases, we show that accounting for word sub-structure improves the models' intrinsic performance and provides benefits when applied to other tasks, including machine translation. We then shift the focus beyond the modelling of word sequences and consider models that automatically learn what the sub-word elements of a given language are, given an unannotated list of words. We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to. This approach is demonstrated on Semitic languages, and we find that modelling discontiguous sub-word structures leads to improvements in the task of segmenting words into their contiguous morphemes.
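As an illustration only (not Botha's model), the idea of letting morphologically related words share sub-word structure can be sketched by composing a word vector from morpheme vectors; the segmentation and the random vectors below are made up for the example.

```python
# Toy sketch: build a word representation from morpheme vectors (invented data).
import numpy as np

rng = np.random.default_rng(0)
morpheme_vecs = {m: rng.normal(size=50) for m in ["un", "break", "able"]}

def word_vector(morphemes):
    """Compose a word vector as the sum of its morpheme vectors."""
    return np.sum([morpheme_vecs[m] for m in morphemes], axis=0)

# "unbreakable" and "breakable" share morphemes, so their vectors are linked.
v1 = word_vector(["un", "break", "able"])
v2 = word_vector(["break", "able"])
cosine = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"cosine(unbreakable, breakable) = {cosine:.2f}")
```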
Los estilos APA, Harvard, Vancouver, ISO, etc.