
Dissertations / Theses on the topic 'Deep Discriminative Probabilistic Models'


Consult the top 21 dissertations / theses for your research on the topic 'Deep Discriminative Probabilistic Models.'


1

Misino, Eleonora. "Deep Generative Models with Probabilistic Logic Priors." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24058/.

Abstract:
Many extensions of the VAE framework have been introduced in the past. However, the vast majority of them focus on purely sub-symbolic approaches that are not sufficient for solving generative tasks that require a form of reasoning. In this thesis, we propose the probabilistic logic VAE (PLVAE), a neuro-symbolic deep generative model that combines the representational power of VAEs with the reasoning ability of probabilistic logic programming. The strength of PLVAE resides in its probabilistic logic prior, which provides an interpretable structure to the latent space that can be easily changed in order to apply the model to different scenarios. We provide empirical results by training PLVAE on a base task and then using the same model to generalize to novel tasks that involve reasoning with the same set of symbols.
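The VAE machinery that PLVAE builds on rests on two computations: drawing a latent sample via the reparameterization trick and penalizing the divergence between the approximate posterior and the prior. A minimal NumPy sketch, where the shifted prior mean stands in (purely hypothetically) for structure a logic program would supply:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # z = mu + sigma * eps, eps ~ N(0, I): a differentiable sample
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    # KL(q || p) for diagonal Gaussians, summed over dimensions
    var_q, var_p = np.exp(log_var_q), np.exp(log_var_p)
    return 0.5 * np.sum(
        log_var_p - log_var_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

mu_q, log_var_q = np.zeros(4), np.zeros(4)
z = reparameterize(mu_q, log_var_q)
# Prior mean shifted to 1 per dimension (an invented "symbolic" prior):
print(kl_diag_gaussians(mu_q, log_var_q, np.ones(4), np.zeros(4)))  # 2.0
```

In PLVAE proper, the prior over the latent space would be derived from a probabilistic logic program rather than fixed by hand as here.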
2

Zhai, Menghua. "Deep Probabilistic Models for Camera Geo-Calibration." UKnowledge, 2018. https://uknowledge.uky.edu/cs_etds/74.

Abstract:
The ultimate goal of image understanding is to transfer visual images into numerical or symbolic descriptions of the scene that are helpful for decision making. By establishing when, where, and in which direction a picture was taken, geo-calibration makes it possible to use imagery to understand the world and how it changes over time. Current models for geo-calibration are mostly deterministic and in many cases fail to model the inherent uncertainty that arises when the image content is ambiguous. Furthermore, without proper modeling of this uncertainty, subsequent processing can yield overly confident predictions. To address these limitations, we propose a probabilistic model for camera geo-calibration using deep neural networks. While our primary contribution is geo-calibration, we also show that learning to geo-calibrate a camera allows us to implicitly learn to understand the content of the scene.
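The overconfidence issue raised here can be made concrete: a network that outputs a distribution over discretized headings, rather than a point estimate, exposes its uncertainty through entropy. A small sketch; the logits and the 8-bin discretization are illustrative assumptions, not the thesis's actual parameterization:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def entropy(p):
    # Shannon entropy in nats; high entropy = honest uncertainty
    return float(-(p * np.log(p + 1e-12)).sum())

# Hypothetical network logits over 8 discretized compass bins.
ambiguous = softmax(np.zeros(8))                            # no visual cue
confident = softmax(np.array([6.0, 0, 0, 0, 0, 0, 0, 0]))  # strong cue

print(entropy(ambiguous) > entropy(confident))  # True
```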
3

Georgatzis, Konstantinos. "Dynamical probabilistic graphical models applied to physiological condition monitoring." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28838.

Abstract:
Intensive Care Units (ICUs) host patients in critical condition who are being monitored by sensors which measure their vital signs. These vital signs carry information about a patient's physiology and can have a very rich structure at fine resolution levels. The task of analysing these biosignals for the purposes of monitoring a patient's physiology is referred to as physiological condition monitoring. Physiological condition monitoring of patients in ICUs is of critical importance as their health is subject to a number of events of interest. For the purposes of this thesis, the overall task of physiological condition monitoring is decomposed into the sub-tasks of modelling a patient's physiology a) under the effect of physiological or artifactual events and b) under the effect of drug administration. The first sub-task is concerned with modelling artifact (such as the taking of blood samples, suction events, etc.) and physiological episodes (such as bradycardia), while the second sub-task is focussed on modelling the effect of drug administration on a patient's physiology. The first contribution of this thesis is the formulation, development and validation of the Discriminative Switching Linear Dynamical System (DSLDS) for the first sub-task. The DSLDS is a discriminative model which identifies the state-of-health of a patient given their observed vital signs using a discriminative probabilistic classifier, and then infers their underlying physiological values conditioned on this status. It is demonstrated on two real-world datasets that the DSLDS outperforms an alternative, generative approach in most cases of interest, and that an α-mixture of the two models achieves higher performance than either model separately. The second contribution of this thesis is the formulation, development and validation of the Input-Output Non-Linear Dynamical System (IO-NLDS) for the second sub-task.
The IO-NLDS is a non-linear dynamical system for modelling the effect of drug infusions on the vital signs of patients. More specifically, in this thesis the focus is on modelling the effect of the widely used anaesthetic drug Propofol on a patient’s monitored depth of anaesthesia and haemodynamics. A comparison of the IO-NLDS with a model derived from the Pharmacokinetics/Pharmacodynamics (PK/PD) literature on a real-world dataset shows that significant improvements in predictive performance can be provided without requiring the incorporation of expert physiological knowledge.
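The DSLDS described above alternates a discriminative state classifier with state-conditioned linear-Gaussian filtering. A toy sketch of that two-step loop, with invented dynamics and a simple threshold classifier standing in for the learned one:

```python
import numpy as np

def kalman_step(x, P, z, F, Q, H, R):
    # One predict/update cycle of a linear-Gaussian filter
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

def classify_state(z):
    # Stand-in for the discriminative classifier: flag a reading
    # as artifact when it leaves a plausible physiological range.
    return "artifact" if abs(z[0]) > 50 else "normal"

# Per-state parameters: artifact readings are modelled as far noisier,
# so the filtered physiological estimate largely ignores them.
F = np.eye(2)
H = np.eye(2)[:1]
Q = {"normal": 0.1 * np.eye(2), "artifact": 0.1 * np.eye(2)}
R = {"normal": np.eye(1), "artifact": 100.0 * np.eye(1)}

x, P = np.zeros(2), np.eye(2)
for z in [np.array([1.0]), np.array([80.0]), np.array([0.5])]:
    s = classify_state(z)
    x, P = kalman_step(x, P, z, F, Q[s], H, R[s])
```

The artifact observation (80.0) barely moves the estimate, which is the intended behaviour of conditioning the dynamics on the inferred state-of-health.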
4

Wu, Di. "Human action recognition using deep probabilistic graphical models." Thesis, University of Sheffield, 2014. http://etheses.whiterose.ac.uk/6603/.

Abstract:
Building intelligent systems that are capable of representing or extracting high-level representations from high-dimensional sensory data lies at the core of solving many A.I. related tasks. Human action recognition is an important topic in computer vision that lies in high-dimensional space. Its applications include robotics, video surveillance, human-computer interaction, user interface design, and multi-media video retrieval amongst others. A number of approaches have been proposed to extract representative features from high-dimensional temporal data, most commonly hard-wired geometric or bio-inspired shape context features. This thesis first demonstrates some ad hoc hand-crafted rules for effectively encoding motion features, and later elicits a more generic approach for incorporating structured feature learning and reasoning, i.e. deep probabilistic graphical models. The hierarchical dynamic framework first extracts high-level features and then uses the learned representation for estimating emission probabilities to infer action sequences. We show that better action recognition can be achieved by replacing Gaussian mixture models with Deep Neural Networks that contain many layers of features to predict probability distributions over states of Markov Models. The framework can easily be extended to include an ergodic state to segment and recognise actions simultaneously. The first part of the thesis focuses on analysis and applications of hand-crafted features for human action representation and classification. We show that the 'hard-coded' concept of correlogram can incorporate correlations between time-domain sequences, and we further investigate multi-modal inputs, e.g. depth sensor input and its unique traits for action recognition. The second part of this thesis focuses on marrying probabilistic graphical models with Deep Neural Networks (both Deep Belief Networks and Deep 3D Convolutional Neural Networks) for structured sequence prediction.
The proposed Deep Dynamic Neural Network provides a general framework for structured 2D data representation and classification, and inspires us to further investigate the application of various graphical models to time-variant video sequences.
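The hybrid decoding described above, where a network predicts state posteriors that are rescaled into emission scores for a Markov model, can be sketched with a standard Viterbi pass. The posteriors, priors, and transition matrix below are made-up toy numbers:

```python
import numpy as np

def viterbi(log_emit, log_trans, log_init):
    # log_emit: (T, S) frame scores, e.g. log p(s|x_t) - log p(s)
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans     # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Toy 2-state example: network posteriors favour state 0 then state 1.
post = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
prior = np.array([0.5, 0.5])
log_emit = np.log(post) - np.log(prior)     # "scaled likelihood" trick
log_trans = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))
print(viterbi(log_emit, log_trans, np.log(prior)))  # [0, 0, 1, 1]
```

Dividing the posterior by the state prior is the usual way of turning a discriminative network output into an emission score for HMM decoding.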
5

Sokolovska, Nataliya. "Contributions to the estimation of probabilistic discriminative models: semi-supervised learning and feature selection." PhD thesis, Télécom ParisTech, 2010. http://pastel.archives-ouvertes.fr/pastel-00006257.

Abstract:
In this thesis we study the estimation of discriminative probabilistic models, focusing on semi-supervised learning and feature selection. The goal of semi-supervised learning is to improve the efficiency of supervised learning by using unlabeled data, an objective that is difficult to achieve with discriminative models. Discriminative probabilistic models make it possible to handle rich linguistic representations in the form of very high-dimensional feature vectors. Working in high dimension raises problems, computational ones in particular, which are exacerbated in sequence models such as conditional random fields (CRFs). Our contribution is twofold. First, we introduce an original and simple method for integrating unlabeled data into a semi-supervised objective function, and we show that the corresponding semi-supervised estimator is asymptotically optimal; the case of logistic regression is illustrated with experimental results. Second, we propose an estimation algorithm for CRFs that performs model selection through an L1 penalty. We also present experimental results on natural language processing tasks (chunking and named-entity recognition), analyzing generalization performance and the selected features, and we finally suggest several directions for improving the computational efficiency of this technique.
6

Hager, Paul Andrew. "Investigation of connection between deep learning and probabilistic graphical models." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119552.

Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
The field of machine learning (ML) has benefited greatly from its relationship with the field of classical statistics. In support of that continued expansion, the following proposes an alternative perspective on the link between these fields, focusing on probabilistic graphical models in the context of reinforcement learning. Viewing certain algorithms as reinforcement learning gives one the ability to map ML concepts to statistics problems: training a multi-layer nonlinear perceptron is equivalent to structure learning in probabilistic graphical models (PGMs); boosting weak rules into an ensemble is weighted sampling; and regularizing neural networks with the dropout technique is conditioning on certain observations in PGMs.
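The dropout-as-conditioning view mentioned above operates on a simple mechanism: during training each unit is zeroed with some probability, and the survivors are rescaled so the expected activation is unchanged (the standard "inverted dropout" formulation). A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(h, p_drop, train=True):
    # Training: zero each unit with prob p_drop ("condition" it out),
    # and rescale survivors so the expected activation is unchanged.
    if not train:
        return h
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

h = np.ones(100_000)
dropped = dropout(h, 0.5)
print(dropped.mean())  # close to 1.0, as the rescaling intends
```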
7

Azizpour, Hossein. "Visual Representations and Models: From Latent SVM to Deep Learning." Doctoral thesis, KTH, Datorseende och robotik, CVAP, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192289.

Abstract:
Two important components of a visual recognition system are representation and model. Both involve the selection and learning of features that are indicative for recognition and the discarding of features that are uninformative. This thesis, in its general form, proposes different techniques within the frameworks of two learning systems for representation and modeling: latent support vector machines (latent SVMs) and deep learning. First, we propose various approaches to group the positive samples into clusters of visually similar instances; given a fixed representation, the sampled space of the positive distribution is usually structured. The proposed clustering techniques include a novel similarity measure based on exemplar learning, an approach for using additional annotation, and an augmentation of the latent SVM that automatically finds clusters whose members can be reliably distinguished from the background class. In another effort, a strongly supervised DPM is suggested to study how these models can benefit from privileged information. The extra information comes in the form of semantic part annotations (i.e. their presence and location), which are used to constrain the DPM's latent variables during or prior to the optimization of the latent SVM. Its effectiveness is demonstrated on the task of animal detection. Finally, we generalize the formulation of discriminative latent variable models, including DPMs, to incorporate a new set of latent variables representing the structure or properties of negative samples, which we therefore term negative latent variables. We show that this generalization affects state-of-the-art techniques and helps visual recognition by explicitly searching for counter-evidence of an object's presence. Following the resurgence of deep networks, the last works of this thesis focus on deep learning in order to produce a generic representation for visual recognition.
A Convolutional Network (ConvNet) is trained on a large annotated image classification dataset, ImageNet, with ~1.3 million images. The activations at each layer of the trained ConvNet can then be treated as the representation of an input image. We show that such a representation is surprisingly effective for various recognition tasks, clearly superior to all the handcrafted features previously used in visual recognition (such as the HOG features in our first works on DPMs). We further investigate ways to improve this representation for a given task, proposing various factors, applied before or after training the representation, that can improve the efficacy of the ConvNet representation. These factors are analyzed on 16 datasets from various subfields of visual recognition.


8

Farouni, Tarek. "An Overview of Probabilistic Latent Variable Models with an Application to the Deep Unsupervised Learning of Chromatin States." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492189894812539.

9

Qian, Weizhu. "Discovering human mobility from mobile data : probabilistic models and learning algorithms." Thesis, Bourgogne Franche-Comté, 2020. http://www.theses.fr/2020UBFCA025.

Abstract:
Smartphone usage data can be used to study human indoor and outdoor mobility. In our work, we investigate both aspects, proposing machine learning algorithms adapted to the different information sources that can be collected. For outdoor mobility, we use collected GPS coordinate data to discover the daily mobility patterns of users. To this end, we propose an automatic clustering algorithm using the Dirichlet process Gaussian mixture model (DPGMM) to cluster daily GPS trajectories. This clustering method is based on estimating the probability densities of the trajectories, which alleviates the problems caused by data noise. By contrast, we utilize collected WiFi fingerprint data to study indoor human mobility. In order to predict the indoor user location at the next time point, we devise a hybrid deep learning model, called the convolutional mixture density recurrent neural network (CMDRNN), which combines the advantages of multiple deep neural networks. Moreover, for accurate indoor location recognition, we presume that there exists a latent distribution governing the input and output at the same time. Based on this assumption, we develop a variational autoencoder (VAE)-based semi-supervised learning model: in the unsupervised learning procedure, a VAE learns a latent distribution of the input, the WiFi fingerprint data; in the supervised learning procedure, a neural network computes the target, the user coordinates. Furthermore, based on the same assumption, we leverage information bottleneck theory to devise a variational information bottleneck (VIB)-based model, an end-to-end deep learning model that is easier to train and has better performance. Finally, we validate the proposed methods on several public real-world datasets, with results that verify the efficiency of our methods compared to existing ones.
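The Dirichlet process prior behind the DPGMM can be illustrated through its Chinese restaurant process representation, which assigns points to clusters without fixing the number of clusters in advance. A sketch with illustrative parameters; a full DPGMM would add per-cluster Gaussian likelihoods to these prior weights:

```python
import random

def crp_partition(n, alpha, seed=0):
    # Chinese restaurant process: customer i joins existing table k with
    # probability |table k| / (i + alpha), or opens a new table with
    # probability alpha / (i + alpha).
    rng = random.Random(seed)
    tables = []
    for i in range(n):
        weights = [len(t) for t in tables] + [alpha]
        k = rng.choices(range(len(tables) + 1), weights=weights)[0]
        if k == len(tables):
            tables.append([i])
        else:
            tables[k].append(i)
    return tables

tables = crp_partition(500, alpha=2.0)
# Expect only a handful of clusters (roughly alpha * log n) for 500 points.
print(len(tables), sum(len(t) for t in tables))
```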
10

Syed, Muhammad Farrukh Shahid. "Data-Driven Approach based on Deep Learning and Probabilistic Models for PHY-Layer Security in AI-enabled Cognitive Radio IoT." Doctoral thesis, Università degli studi di Genova, 2021. http://hdl.handle.net/11567/1048543.

Abstract:
Cognitive Radio Internet of Things (CR-IoT) has revolutionized almost every field of life and reshaped the technological world. Several tiny devices are seamlessly connected in a CR-IoT network to perform various tasks in many applications. Nevertheless, CR-IoT suffers from malicious attacks that disrupt communication and degrade network performance. Therefore, it has recently been envisaged to introduce higher-level Artificial Intelligence (AI) by incorporating Self-Awareness (SA) capabilities into CR-IoT objects, enabling CR-IoT networks to autonomously establish secure transmission against vicious attacks. In this context, sub-band information from the Orthogonal Frequency Division Multiplexing (OFDM) modulated transmission in the spectrum is extracted at the radio device receiver terminal, and a generalized state vector (GS) is formed containing low-dimension in-phase and quadrature components. Accordingly, a probabilistic method based on learning a switching Dynamic Bayesian Network (DBN) from OFDM transmission with no abnormalities is proposed to statistically model signal behaviors inside the CR-IoT spectrum. A Bayesian filter, the Markov Jump Particle Filter (MJPF), is implemented to perform state estimation and capture malicious attacks. Subsequently, a GS containing a higher number of subcarriers is investigated. In this connection, a variational autoencoder (VAE) is used as a deep learning technique to extract features from high-dimension radio signals into a low-dimension latent space z, and a DBN is learned from a GS containing latent-space data. Afterward, to perform state estimation and capture abnormalities in the spectrum, an Adapted Markov Jump Particle Filter (A-MJPF) is deployed. The proposed method can capture anomalies that appear due either to jammer attacks in transmission or to cognitive devices in a network experiencing transmission sources that have not been observed previously.
The performance is assessed using the receiver operating characteristic (ROC) curves and the area under the curve (AUC) metrics.
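The Markov Jump Particle Filter builds on sequential importance resampling. A bootstrap particle filter on a toy random-walk model (not the thesis's learned DBN) shows the propagate-weight-resample loop at its core:

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter(observations, n_particles=1000, q=0.5, r=1.0):
    # Bootstrap filter for x_t = x_{t-1} + N(0, q), z_t = x_t + N(0, r).
    x = rng.standard_normal(n_particles)
    means = []
    for z in observations:
        x = x + np.sqrt(q) * rng.standard_normal(n_particles)  # propagate
        logw = -0.5 * (z - x) ** 2 / r                         # weight
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)   # resample
        x = x[idx]
        means.append(float(x.mean()))
    return means

# Simulate a hidden random walk and noisy observations of it.
true_x = np.cumsum(np.sqrt(0.5) * rng.standard_normal(50))
obs = true_x + rng.standard_normal(50)
est = particle_filter(obs)
```

The MJPF extends this loop with a discrete regime variable sampled alongside the particles, so that jumps between learned dynamic models (and anomalies with respect to them) can be detected.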
11

El-Shaer, Mennat Allah. "An Experimental Evaluation of Probabilistic Deep Networks for Real-time Traffic Scene Representation using Graphical Processing Units." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1546539166677894.

12

Azeraf, Elie. "Classification avec des modèles probabilistes génératifs et des réseaux de neurones. Applications au traitement des langues naturelles." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. https://theses.hal.science/tel-03880848.

Abstract:
For several years, many probabilistic models, such as Naive Bayes or the Hidden Markov Chain, have been neglected for supervised classification tasks. These models, called generative, are criticized because their induced classifier must learn the observations' law, which becomes too complex when the number of observation features is large. This is especially the case in Natural Language Processing, where recent embedding algorithms convert words into large numerical vectors to achieve better scores. This thesis shows that every generative model can define its induced classifier without using the observations' law. This proposition questions the usual categorization of probabilistic models and classifiers and allows many new applications: the Hidden Markov Chain can be efficiently applied to chunking, and Naive Bayes to sentiment analysis. We go further, as this proposition allows us to define the classifier induced from a generative model with neural network functions; we "neuralize" the models mentioned above and many of their extensions. The models so obtained achieve relevant scores on many Natural Language Processing tasks while being interpretable, requiring little training data, and being easy to put into production.
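The central observation, that a generative classifier never needs the observations' law p(x), is visible already in Naive Bayes: the argmax over labels involves only p(y) and p(w|y), since p(x) cancels. A toy sentiment sketch with an invented four-document corpus:

```python
import math
from collections import Counter

# Invented toy corpus: (words, label)
corpus = [
    (["good", "great", "fun"], "pos"),
    (["good", "nice"], "pos"),
    (["bad", "boring"], "neg"),
    (["bad", "awful", "boring"], "neg"),
]

labels = {y for _, y in corpus}
prior = {y: sum(1 for _, l in corpus if l == y) / len(corpus) for y in labels}
counts = {y: Counter(w for ws, l in corpus if l == y for w in ws) for y in labels}
vocab = {w for ws, _ in corpus for w in ws}

def log_joint(words, y, smooth=1.0):
    # log p(y) + sum_i log p(w_i | y); p(x) never appears because
    # it is constant in y and cancels in the argmax.
    total = sum(counts[y].values()) + smooth * len(vocab)
    return math.log(prior[y]) + sum(
        math.log((counts[y][w] + smooth) / total) for w in words
    )

def classify(words):
    return max(labels, key=lambda y: log_joint(words, y))

print(classify(["good", "fun"]))    # pos
print(classify(["boring", "bad"]))  # neg
```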
13

Hu, Xu. "Towards efficient learning of graphical models and neural networks with variational techniques." Thesis, Paris Est, 2019. http://www.theses.fr/2019PESC1037.

Abstract:
In this thesis, I mainly focus on variational inference and probabilistic models, covering several projects I worked on during my PhD about improving the efficiency of AI/ML systems with variational techniques. The thesis consists of two parts. In the first part, the computational efficiency of probabilistic graphical models is studied. In the second part, several problems of learning deep neural networks are investigated, related to either energy efficiency or sample efficiency.
14

Balikas, Georgios. "Explorer et apprendre à partir de collections de textes multilingues à l'aide des modèles probabilistes latents et des réseaux profonds." Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM054/document.

Abstract:
Text is one of the most pervasive and persistent sources of information. Content analysis of text in its broad sense refers to methods for studying and retrieving information from documents. Nowadays, with the ever increasing amounts of text becoming available online is several languages and different styles, content analysis of text is of tremendous importance as it enables a variety of applications. To this end, unsupervised representation learning methods such as topic models and word embeddings constitute prominent tools.The goal of this dissertation is to study and address challengingproblems in this area, focusing on both the design of novel text miningalgorithms and tools, as well as on studying how these tools can be applied to text collections written in a single or several languages.In the first part of the thesis we focus on topic models and more precisely on how to incorporate prior information of text structure to such models.Topic models are built on the premise of bag-of-words, and therefore words are exchangeable. While this assumption benefits the calculations of the conditional probabilities it results in loss of information.To overcome this limitation we propose two mechanisms that extend topic models by integrating knowledge of text structure to them. We assume that the documents are partitioned in thematically coherent text segments. The first mechanism assigns the same topic to the words of a segment. The second, capitalizes on the properties of copulas, a tool mainly used in the fields of economics and risk management that is used to model the joint probability density distributions of random variables while having access only to their marginals.The second part of the thesis explores bilingual topic models for comparable corpora with explicit document alignments. Typically, a document collection for such models is in the form of comparable document pairs. The documents of a pair are written in different languages and are thematically similar. 
Unless they are translations of each other, the documents of a pair are similar only to some extent. Yet, representative topic models assume that the paired documents have identical topic distributions, which is a strong and limiting assumption. To overcome it we propose novel bilingual topic models that incorporate the notion of cross-lingual similarity between the documents of a pair into their generative and inference processes. Calculating this cross-lingual document similarity is a task in itself, which we propose to address using cross-lingual word embeddings. The last part of the thesis concerns the use of word embeddings and neural networks for three text mining applications. First, we discuss polylingual document classification, where we argue that translations of a document can be used to enrich its representation. Using an auto-encoder to obtain these robust document representations, we demonstrate improvements in the task of multi-class document classification. Second, we explore multi-task sentiment classification of tweets, arguing that jointly training classification systems on correlated tasks can improve the obtained performance. To this end we show how to achieve state-of-the-art performance on a sentiment classification task using recurrent neural networks. The third application we explore is cross-lingual information retrieval. Given a document written in one language, the task consists in retrieving the most similar documents from a pool of documents written in another language. In this line of research, we show that by adapting the transportation problem to the task of estimating document distances one can achieve important improvements.
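Adapting the transportation problem to document distances can be sketched with a tiny earth mover's distance solver. The cost matrix, the toy "documents", and the use of `scipy.optimize.linprog` below are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np
from scipy.optimize import linprog

def emd(p, q, C):
    """Earth mover's distance between histograms p and q with cost matrix C,
    solved as a linear program over the transport plan x[i, j] >= 0."""
    n, m = C.shape
    A_eq, b_eq = [], []
    # Row-sum constraints: mass transported out of bin i equals p[i].
    for i in range(n):
        row = np.zeros(n * m)
        row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row)
        b_eq.append(p[i])
    # Column-sum constraints: mass arriving at bin j equals q[j].
    for j in range(m):
        col = np.zeros(n * m)
        col[j::m] = 1.0
        A_eq.append(col)
        b_eq.append(q[j])
    res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

# Toy example: two "documents" as distributions over 3 embedding points
# lying on a line, so the cost is the distance between points.
C = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(emd(p, q, C))  # every unit of mass moves one step: distance 1.0
```

In the retrieval setting described above, the bins would be word embeddings and the histograms the word distributions of the two documents, so the optimal transport cost measures how far one document's words must "travel" to become the other's.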
APA, Harvard, Vancouver, ISO, and other styles
15

Pandey, Gaurav. "Deep Learning with Minimal Supervision." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/4315.

Full text
Abstract:
In recent years, deep neural networks have achieved extraordinary performance on supervised learning tasks. Convolutional neural networks (CNNs) have vastly improved the state of the art for most computer vision tasks, including object recognition and segmentation. However, their success relies on the presence of a large amount of labeled data. In contrast, relatively less work has been done in deep learning to handle scenarios in which access to ground truth is limited, partial or completely absent. In this thesis, we propose models to handle challenging problems with limited labeled information. Our first contribution is a neural architecture that allows for the extraction of infinitely many features from an object while allowing for tractable inference. This is achieved by using the `kernel trick', that is, we express the inner product in the infinite-dimensional feature space as a kernel. The kernel can either be computed exactly for single-layer feedforward networks, or approximated by an iterative algorithm for deep convolutional networks. The corresponding models are referred to as stretched deep networks (SDNs). We show that when the amount of training data is limited, SDNs with random weights drastically outperform fully supervised CNNs with similar architectures. While SDNs perform reasonably well for classification with limited labeled data, they cannot utilize unlabeled data, which is often much easier to obtain. A common approach to utilizing unlabeled data is to couple the classifier with an autoencoder (or its variants), thereby minimizing the reconstruction error in addition to the classification error. We discuss the limitations of decoder-based architectures and propose a model that allows for the utilization of unlabeled data without the need for a decoder. This is achieved by jointly modeling the distribution of data and latent features in a manner that explicitly assigns zero probability to unobserved data.
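For a single-hidden-layer network with ReLU activations and Gaussian random weights, the exact infinite-feature kernel alluded to above is known in the literature as the arc-cosine kernel of degree 1 (Cho & Saul, 2009). The sketch below illustrates that kernel independently; it is not the thesis's SDN code:

```python
import numpy as np

def arccos_kernel(x, y):
    """Arc-cosine kernel of degree 1 (Cho & Saul, 2009): the exact inner
    product between the infinite random-ReLU feature maps of x and y."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * cos_t) / np.pi

x, y = np.array([1.0, 0.5]), np.array([0.2, 1.0])

# Monte-Carlo sanity check: with Gaussian random weights w, the kernel
# equals 2 * E[relu(w.x) * relu(w.y)] under the Cho-Saul convention.
rng = np.random.default_rng(0)
W = rng.standard_normal((200_000, 2))
mc = 2 * np.mean(np.maximum(W @ x, 0) * np.maximum(W @ y, 0))
print(arccos_kernel(x, x))  # ~ ||x||^2 = 1.25
```

The closed form replaces an average over infinitely many random ReLU features with a single trigonometric expression, which is what makes inference tractable despite the infinite feature space.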
The joint probability of the data and the latent features is maximized using a two-step EM-like procedure. Depending on the task, we allow the latent features to be one-hot or real-valued vectors and define a suitable prior on the features. For instance, one-hot features correspond to class labels and are directly used for the unsupervised and semi-supervised classification tasks. For real-valued features, we use hierarchical Bayesian models as priors over the latent features. Hence, the proposed model, which we refer to as the discriminative encoder (or DisCoder), is flexible in the type of latent features that it can capture. The proposed model achieves state-of-the-art performance on several challenging datasets. Having addressed the problem of utilizing unlabeled data for classification, we move to a domain where obtaining labels is far more expensive, namely semantic segmentation of images. Explicitly labeling each pixel of an image with the object that the pixel belongs to is an expensive operation, in terms of time as well as effort. Currently, only a few classes of images have been densely (pixel-level) labeled. Even among these classes, only a few images per class have pixel-level supervision. Models that rely on densely labeled images cannot utilize the much larger set of weakly annotated images available on the web. Moreover, these models cannot learn the segmentation masks for new classes for which there is no densely labeled data. Hence, we propose a model for utilizing weakly labeled data for semantic segmentation of images. This is achieved by generating fake labels for each image, while simultaneously forcing the output of the CNN to satisfy the mean-field constraints imposed by a conditional random field. We show that one can enforce the CRF constraints by forcing the distribution at each pixel to be close to the distribution of its neighbors.
The proposed model is very fast to train and achieves state-of-the-art performance on the popular VOC-2012 dataset for the task of weakly supervised semantic segmentation of images.
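One simple way to force each pixel's distribution to stay close to its neighbors' is a KL penalty against the average neighbor distribution. This is an illustrative assumption on our part, not necessarily the thesis's exact formulation:

```python
import numpy as np

def neighbor_consistency_loss(probs):
    """probs: (H, W, K) array of per-pixel class distributions.
    Penalizes KL(p_ij || mean of the 4-neighbor distributions), a simple
    smoothness term in the spirit of mean-field CRF constraints."""
    padded = np.pad(probs, ((1, 1), (1, 1), (0, 0)), mode="edge")
    neigh = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
             padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    eps = 1e-12
    kl = np.sum(probs * (np.log(probs + eps) - np.log(neigh + eps)), axis=-1)
    return kl.mean()

# A spatially uniform label map incurs zero loss; a noisy one does not.
rng = np.random.default_rng(0)
smooth = np.full((8, 8, 3), 1.0 / 3)
noisy = rng.dirichlet(np.ones(3), size=(8, 8))
print(neighbor_consistency_loss(smooth) < neighbor_consistency_loss(noisy))
```

Minimizing such a term pushes neighboring pixels toward agreement, which is the smoothing effect a pairwise CRF would otherwise have to enforce at inference time.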
APA, Harvard, Vancouver, ISO, and other styles
16

Salakhutdinov, Ruslan. "Learning Deep Generative Models." Thesis, 2009. http://hdl.handle.net/1807/19226.

Full text
Abstract:
Building intelligent systems that are capable of extracting high-level representations from high-dimensional sensory data lies at the core of solving many AI-related tasks, including object recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires models with deep architectures that involve many layers of nonlinear processing. The aim of the thesis is to demonstrate that deep generative models that contain many layers of latent variables and millions of parameters can be learned efficiently, and that the learned high-level feature representations can be successfully applied in a wide spectrum of application domains, including visual object recognition, information retrieval, and classification and regression tasks. In addition, similar methods can be used for nonlinear dimensionality reduction.
APA, Harvard, Vancouver, ISO, and other styles
17

Tran, Dustin. "Probabilistic Programming for Deep Learning." Thesis, 2020. https://doi.org/10.7916/d8-95c9-sj96.

Full text
Abstract:
We propose the idea of deep probabilistic programming, a synthesis of advances for systems at the intersection of probabilistic modeling and deep learning. Such systems enable the development of new probabilistic models and inference algorithms that would otherwise be impossible: enabling unprecedented scales to billions of parameters, distributed and mixed precision environments, and AI accelerators; integration with neural architectures for modeling massive and high-dimensional datasets; and the use of computation graphs for automatic differentiation and arbitrary manipulation of probabilistic programs for flexible inference and model criticism. After describing deep probabilistic programming, we discuss applications in novel variational inference algorithms and deep probabilistic models. First, we introduce the variational Gaussian process (VGP), a Bayesian nonparametric variational family, which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity of the true posterior. Second, we introduce hierarchical implicit models (HIMs). HIMs combine the idea of implicit densities with hierarchical Bayesian modeling, thereby defining models via simulators of data with rich hidden structure.
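The VGP's sample-generation mechanism can be caricatured in one dimension: draw latent inputs, then warp them through a function drawn from a Gaussian process conditioned on learned inducing pairs. Everything below (the RBF kernel, the inducing locations and values) is an illustrative toy, not the thesis's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, ell=1.0):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell**2)

# 1) latent inputs drawn from a simple base distribution
xi = rng.standard_normal(64)
# 2) inducing pairs (in the VGP these would be learned during inference)
s = np.linspace(-2, 2, 10)           # inducing locations
t = np.tanh(2 * s)                   # inducing outputs
# 3) warp the latent inputs through the GP predictive mean at xi
K_ss = rbf(s, s) + 1e-6 * np.eye(10)
K_xs = rbf(xi, s)
mean = K_xs @ np.linalg.solve(K_ss, t)
samples = mean + 0.05 * rng.standard_normal(64)  # approximate posterior draws
print(samples.shape)
```

Because the warping function is itself random and adapted during inference, the family of output distributions is far richer than a fixed-form Gaussian variational family.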
APA, Harvard, Vancouver, ISO, and other styles
18

"Can Knowledge Rich Sentences Help Language Models To Solve Common Sense Reasoning Problems?" Master's thesis, 2019. http://hdl.handle.net/2286/R.I.55573.

Full text
Abstract:
The significance of real-world knowledge for Natural Language Understanding (NLU) has been well known for decades. With advancements in technology, challenging tasks like question answering, text summarization, and machine translation have been made possible by continuous efforts in the field of Natural Language Processing (NLP). Yet, integrating knowledge to answer commonsense questions is still a daunting task. Logical reasoning has been a resort for many problems in NLP and has achieved considerable results in the field, but it is difficult to resolve the ambiguities in a natural language. Co-reference resolution is one of the problems where ambiguity arises due to the semantics of the sentence. Another such problem is cause-and-result statements, which require causal commonsense reasoning to resolve the ambiguity. Modeling these types of problems with rules or logic is not a simple task. State-of-the-art systems addressing these problems use trained neural network models, which are claimed to hold broad knowledge from a huge training corpus. These systems answer the questions by using the knowledge embedded in their trained language model. Although language models embed knowledge from the data, they use occurrences of words and frequencies of co-occurring words to resolve the prevailing ambiguity. This limits the performance of language models on commonsense reasoning tasks, as they generalize the concept rather than answering the problem in its specific context. For example, "The painting in Mark's living room shows an oak tree. It is to the right of a house" is a co-reference resolution problem that requires knowledge. Language models can resolve whether "it" refers to "painting" or "tree"; since "house" and "tree" are two commonly co-occurring words, the models resolve "tree" as the co-referent. On the other hand, "The large ball crashed right through the table. Because it was made of Styrofoam."
to resolve "it", which can be either "table" or "ball", is difficult for a language model, as it requires more information about the problem. In this work, I have built an end-to-end framework that uses knowledge automatically extracted based on the problem. This knowledge is combined with language models through an explicit reasoning module to resolve the ambiguity. The system is built to improve the accuracy of language-model-based approaches to commonsense reasoning, and it achieves state-of-the-art accuracy on the Winograd Schema Challenge.
Dissertation/Thesis
Masters Thesis Computer Science 2019
APA, Harvard, Vancouver, ISO, and other styles
19

Dinh, Laurent. "Reparametrization in deep learning." Thèse, 2018. http://hdl.handle.net/1866/21139.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Tan, Shawn. "Latent variable language models." Thèse, 2018. http://hdl.handle.net/1866/22131.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Almahairi, Amjad. "Advances in deep learning with limited supervision and computational resources." Thèse, 2018. http://hdl.handle.net/1866/23434.

Full text
Abstract:
Deep neural networks are the cornerstone of state-of-the-art systems for a wide range of tasks, including object recognition, language modelling and machine translation. In the last decade, research in the field of deep learning has led to numerous key advances in designing novel architectures and training algorithms for neural networks. However, most success stories in deep learning heavily relied on two main factors: the availability of large amounts of labelled data and massive computational resources. This thesis by articles makes several contributions to advancing deep learning, specifically in problems with limited or no labelled data, or with constrained computational resources. The first article addresses sparsity of labelled data that emerges in the application field of recommender systems. We propose a multi-task learning framework that leverages natural language reviews in improving recommendation. Specifically, we apply neural-network-based methods for learning representations of products from review text, while learning from rating data. We demonstrate that the proposed method can achieve state-of-the-art performance on the Amazon Reviews dataset. The second article tackles computational challenges in training large-scale deep neural networks. We propose a conditional computation network architecture which can adaptively assign its capacity, and hence computations, across different regions of the input. We demonstrate the effectiveness of our model on visual recognition tasks where objects are spatially localized within the input, while maintaining much lower computational overhead than standard network architectures. The third article contributes to the domain of unsupervised learning with the generative adversarial networks paradigm. We introduce a flexible adversarial training framework, in which not only the generator converges to the true data distribution, but also the discriminator recovers the relative density of the data at the optimum. 
We validate our framework empirically by showing that the discriminator is able to accurately estimate the true energy of data while obtaining state-of-the-art quality of samples. Finally, in the fourth article, we address the problem of unsupervised domain translation. We propose a model which can learn flexible, many-to-many mappings across domains from unpaired data. We validate our approach on several image datasets, and we show that it can be effectively applied in semi-supervised learning settings.
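The conditional-computation idea of the second article can be caricatured as gating: route only the most salient input regions through an expensive path and let the rest take a cheap one. The gate, the two paths, and the region split below are illustrative assumptions, not the article's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def cheap(x):
    """Low-capacity path: a pass-through (copy so the input is untouched)."""
    return x.copy()

def expensive(x):
    """High-capacity path: stand-in for extra layers of processing."""
    return np.tanh(x) + x

# 16 input regions, 8 features each; a saliency score gates the routing.
x = rng.standard_normal((16, 8))
saliency = np.abs(x).mean(axis=1)      # stand-in gating score
top = np.argsort(saliency)[-4:]        # only 4 of 16 regions get full compute
out = cheap(x)
out[top] = expensive(x[top])
print(out.shape, len(top))
```

The compute cost then scales with how many regions the gate selects rather than with the raw input size, which is the efficiency argument made in the abstract.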
APA, Harvard, Vancouver, ISO, and other styles