Dissertations / Theses on the topic 'Modèle génératif profond'
Consult the top 32 dissertations / theses for your research on the topic 'Modèle génératif profond.'
Sadok, Samir. "Audiovisual speech representation learning applied to emotion recognition." Electronic Thesis or Diss., CentraleSupélec, 2024. http://www.theses.fr/2024CSUP0003.
Emotions are vital in our daily lives and have become a primary focus of ongoing research. Automatic emotion recognition has gained considerable attention owing to its wide-ranging applications across sectors such as healthcare, education, entertainment, and marketing. This advancement in emotion recognition is pivotal for fostering the development of human-centric artificial intelligence. Supervised emotion recognition systems have improved significantly over traditional machine learning approaches. However, this progress encounters limitations due to the complexity and ambiguous nature of emotions. Acquiring extensive emotionally labeled datasets is costly, time-intensive, and often impractical. Moreover, the subjective nature of emotions results in biased datasets, impacting the learning models' applicability in real-world scenarios. Motivated by how humans learn and conceptualize complex representations from an early age with minimal supervision, this approach demonstrates the effectiveness of leveraging prior experience to adapt to new situations. Unsupervised and self-supervised learning models draw inspiration from this paradigm. Initially, they aim to establish a general representation learned from unlabeled data, akin to the foundational prior experience in human learning. These representations should adhere to criteria like invariance, interpretability, and effectiveness. Subsequently, these learned representations are applied to downstream tasks with limited labeled data, such as emotion recognition. This mirrors the assimilation of new situations in human learning. In this thesis, we aim to propose unsupervised and self-supervised representation learning methods designed explicitly for multimodal and sequential data, and to explore their potential advantages in the context of emotion recognition tasks. The main contributions of this thesis encompass: 1. Developing generative models via unsupervised or self-supervised learning for audiovisual speech representation learning, incorporating joint temporal and multimodal (audiovisual) modeling. 2. Structuring the latent space to enable disentangled representations, enhancing interpretability by controlling human-interpretable latent factors. 3. Validating the effectiveness of our approaches through both qualitative and quantitative analyses, in particular on the emotion recognition task. Our methods facilitate signal analysis, transformation, and generation.
Hadjeres, Gaëtan. "Modèles génératifs profonds pour la génération interactive de musique symbolique." Thesis, Sorbonne université, 2018. http://www.theses.fr/2018SORUS027/document.
This thesis discusses the use of deep generative models for symbolic music generation. We focus on devising interactive generative models that can support new creative processes through a fruitful dialogue between a human composer and a computer. Recent advances in artificial intelligence have led to the development of powerful generative models able to generate musical content without the need for human intervention. I believe that this practice cannot thrive in the future, since human experience and human appreciation are at the crux of artistic production. However, the need for flexible and expressive tools that could enhance content creators' creativity is clear; the development and the potential of such novel A.I.-augmented computer music tools are promising. In this manuscript, I propose novel architectures that put artists back in the loop. The proposed models share the common characteristic that they are devised so that a user can control the generated musical content in a creative way. In order to create user-friendly interaction with these interactive deep generative models, user interfaces were developed. I believe that new compositional paradigms will emerge from the possibilities offered by these enhanced controls. This thesis ends with the presentation of genuine musical projects, such as concerts featuring these new creative tools.
Mehr, Éloi. "Unsupervised Learning of 3D Shape Spaces for 3D Modeling." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS566.
Even though 3D data is becoming increasingly popular, especially with the democratization of virtual and augmented experiences, it remains very difficult to manipulate a 3D shape, even for designers or experts. Given a database containing 3D instances of one or several categories of objects, we want to learn the manifold of plausible shapes in order to develop new intelligent 3D modeling and editing tools. However, this manifold is often much more complex compared to the 2D domain. Indeed, 3D surfaces can be represented using various embeddings, and may also exhibit different alignments and topologies. In this thesis we study the manifold of plausible shapes in the light of the aforementioned challenges, by deepening three different points of view. First of all, we consider the manifold as a quotient space, in order to learn the shapes' intrinsic geometry from a dataset where the 3D models are not co-aligned. Then, we assume that the manifold is disconnected, which leads to a new deep learning model that is able to automatically cluster and learn the shapes according to their topology. Finally, we study the conversion of an unstructured 3D input to an exact geometry, represented as a structured tree of continuous solid primitives.
Lucas, Thomas. "Modèles génératifs profonds : sur-généralisation et abandon de mode." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM049.
This dissertation explores the topic of generative modelling of natural images, which is the task of fitting a data generating distribution. Such models can be used to generate artificial data resembling the true data, or to compress images. Latent variable models, which are at the core of our contributions, seek to capture the main factors of variation of an image into a variable that can be manipulated. In particular we build on two successful latent variable generative models, the generative adversarial network (GAN) and the variational autoencoder (VAE). Recently GANs significantly improved the quality of images generated by deep models, obtaining very compelling samples. Unfortunately these models struggle to capture all the modes of the original distribution, i.e., they do not cover the full variability of the dataset. Conversely, likelihood-based models such as VAEs typically cover the full variety of the data well and provide an objective measure of coverage. However these models produce samples of inferior visual quality that are more easily distinguished from real ones. The work presented in this thesis strives for the best of both worlds: to obtain compelling samples while modelling the full support of the distribution. To achieve that, we focus on i) the optimisation problems used and ii) practical model limitations that hinder performance. The first contribution of this manuscript is a deep generative model that encodes global image structure into latent variables, built on the VAE, and autoregressively models low-level detail. We propose a training procedure relying on an auxiliary loss function to control what information is captured by the latent variables and what information is left to an autoregressive decoder. Unlike previous approaches to such hybrid models, ours does not need to restrict the capacity of the autoregressive decoder to prevent degenerate models that ignore the latent variables. The second contribution builds on the standard GAN model, which trains a discriminator network to provide feedback to a generative network. The discriminator usually assesses the quality of individual samples, which makes it hard to evaluate the variability of the data. Instead we propose to feed the discriminator with batches that mix both true and fake samples, and train it to predict the ratio of true samples in the batch. These batches work as approximations of the distribution of generated images and allow the discriminator to approximate distributional statistics. We introduce an architecture that is well suited to solve this problem efficiently, and show experimentally that our approach reduces mode collapse in GANs on two synthetic datasets, and obtains good results on the CIFAR10 and CelebA datasets. The mutual shortcomings of VAEs and GANs can in principle be addressed by training hybrid models that use both types of objective. In our third contribution, we show that the usual parametric assumptions made in VAEs induce a conflict between them, leading to lackluster performance of hybrid models. We propose a solution based on deep invertible transformations, which trains a feature space in which the usual assumptions can be made without harm. Our approach provides likelihood computations in image space while being able to take advantage of adversarial training. It obtains GAN-like samples that are competitive with fully adversarial models while improving likelihood scores over existing hybrid models at the time of publication, which is a significant advancement.
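The batch-level discriminator idea described in this abstract can be illustrated with a minimal NumPy sketch: instead of scoring single samples, a regressor sees a whole batch mixing real and fake samples and predicts the fraction of real ones. Everything below (the 1-D Gaussians, reducing the "discriminator" to a linear model on the batch mean, the least-squares fit) is an illustrative assumption, not the thesis's neural architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_batch(ratio, size=64):
    """Mix `ratio` real samples (N(1,1)) with fake samples (N(-1,1))."""
    n_real = int(ratio * size)
    real = rng.normal(1.0, 1.0, n_real)
    fake = rng.normal(-1.0, 1.0, size - n_real)
    return np.concatenate([real, fake])

# Collect (batch statistic, true ratio) pairs; here the batch mean is
# the only feature, so a linear regressor suffices for this toy data.
X, y = [], []
for _ in range(2000):
    r = rng.uniform(0, 1)
    X.append(make_batch(r).mean())
    y.append(r)
X, y = np.array(X), np.array(y)

# Least-squares fit: predicted_ratio = a * batch_mean + b
A = np.stack([X, np.ones_like(X)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

est = a * make_batch(0.75).mean() + b
print(round(float(est), 2))  # close to 0.75
```

The point of the sketch is that a batch-level statistic carries distributional information that no per-sample score can: a generator collapsed onto one mode would shift every batch mean and be caught immediately.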
Prang, Mathieu. "Representation learning for symbolic music." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS489.
A key part of the recent success of deep language processing models lies in the ability to learn efficient word embeddings. These methods provide structured spaces of reduced dimensionality with interesting metric relationship properties. These, in turn, can be used as efficient input representations for handling more complex tasks. In this thesis, we focus on the task of learning embedding spaces for polyphonic music in the symbolic domain. To do so, we explore two different approaches. We introduce an embedding model based on a convolutional network with a novel type of self-modulated hierarchical attention, computed at each layer to obtain a hierarchical vision of musical information. Then, we propose another system based on VAEs, a type of auto-encoder that constrains the data distribution of the latent space to be close to a prior distribution. As polyphonic music information is very complex, the design of the input representation is a crucial process. Hence, we introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal. Finally, we show the potential of the resulting embedding spaces through the development of several creative applications used to enhance musical knowledge and expression, through tasks such as melody modification or composer identification.
Franceschi, Jean-Yves. "Apprentissage de représentations et modèles génératifs profonds dans les systèmes dynamiques." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS014.
The recent rise of deep learning has been motivated by numerous scientific breakthroughs, particularly regarding representation learning and generative modeling. However, most of these achievements have been obtained on image or text data, whose evolution through time remains challenging for existing methods. Given their importance for autonomous systems to adapt in a constantly evolving environment, these challenges have been actively investigated in a growing body of work. In this thesis, we follow this line of work and study several aspects of temporality and dynamical systems in deep unsupervised representation learning and generative modeling. Firstly, we present a general-purpose deep unsupervised representation learning method for time series tackling scalability and adaptivity issues arising in practical applications. We then further study in a second part representation learning for sequences by focusing on structured and stochastic spatiotemporal data: videos and physical phenomena. We show in this context that performant temporal generative prediction models help to uncover meaningful and disentangled representations, and conversely. We highlight to this end the crucial role of differential equations in the modeling and embedding of these natural sequences within sequential generative models. Finally, we more broadly analyze in a third part a popular class of generative models, generative adversarial networks, under the scope of dynamical systems. We study the evolution of the involved neural networks with respect to their training time by describing it with a differential equation, allowing us to gain a novel understanding of this generative model.
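The dynamical-systems view of training mentioned in this abstract can be made concrete on a toy example: gradient descent is the explicit Euler discretization of a gradient-flow differential equation. The quadratic loss, step size, and horizon below are illustrative choices, not the thesis's setting.

```python
import numpy as np

# Gradient descent on f(w) = w^2 / 2 versus its continuous-time limit,
# the gradient-flow ODE  dw/dt = -f'(w) = -w,  solved by w(t) = w0 * e^{-t}.
w0, lr, steps = 1.0, 0.01, 500

w = w0
for _ in range(steps):
    w -= lr * w            # one Euler step of the ODE with step size lr

t = lr * steps             # elapsed "training time"
w_ode = w0 * np.exp(-t)
print(round(w, 4), round(w_ode, 4))  # the two trajectories nearly coincide
```

For small step sizes the discrete iterates track the ODE solution closely, which is what makes differential-equation tools applicable to the analysis of training dynamics.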
Grechka, Asya. "Image editing with deep neural networks." Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS683.pdf.
Image editing has a rich history dating back two centuries. That said, "classic" image editing requires strong artistic skills as well as considerable time, often on the scale of hours, to modify an image. In recent years, considerable progress has been made in generative modeling, which has allowed realistic and high-quality image synthesis. However, real image editing is still a challenge, requiring a balance between novel generation and faithfully preserving parts of the original image. In this thesis, we explore different approaches to edit images, leveraging three families of generative networks: GANs, VAEs and diffusion models. First, we study how to use a GAN to edit a real image. While methods exist to modify generated images, they do not generalize easily to real images. We analyze the reasons for this and propose a solution to better project a real image into the GAN's latent space so as to make it editable. Then, we use variational autoencoders with vector quantization to directly obtain a compact image representation (which we could not obtain with GANs) and optimize the latent vector so as to match a desired text input. We aim to constrain this problem, which on its face could be vulnerable to adversarial attacks. We propose a method to choose the hyperparameters while simultaneously optimizing the image quality and the fidelity to the original image. We present a robust evaluation protocol and show the interest of our method. Finally, we approach the problem of image editing from the viewpoint of inpainting. Our goal is to synthesize a part of an image while preserving the rest unmodified. For this, we leverage pre-trained diffusion models and build on their classic inpainting method, replacing, at each denoising step, the part which we do not wish to modify with the noisy real image. However, this method leads to a disharmony between the real and generated parts. We propose an approach based on calculating the gradient of a loss which evaluates the harmonization of the two parts. We guide the denoising process with this gradient.
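The inpainting mechanism described in this abstract (re-injecting a noised copy of the real image into the known region at every denoising step) can be sketched on a toy example. The "denoiser" below is a stand-in function, and the 1-D signal, noise schedule, and mask are all illustrative assumptions; only the mask-and-replace step mirrors the described approach.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "image": the left half is known (kept), the right half inpainted.
x_real = np.linspace(0.0, 1.0, 8)
mask = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # 1 = keep original content

T = 50
alphas = np.linspace(0.999, 0.95, T)
abar = np.cumprod(alphas)

def toy_denoise_step(x, t):
    """Stand-in for a pretrained denoiser: pulls samples toward 0.5.
    A real diffusion model would predict and remove noise here."""
    return x + 0.1 * (0.5 - x)

x = rng.normal(size=8)  # start the reverse process from pure noise
for t in reversed(range(T)):
    x = toy_denoise_step(x, t)
    # Key step: re-inject a *noised* version of the real image in the
    # known region at every denoising step, matching the noise level t.
    noised_real = np.sqrt(abar[t]) * x_real + np.sqrt(1 - abar[t]) * rng.normal(size=8)
    x = mask * noised_real + (1 - mask) * x

print(np.round(x[:4], 2))  # the known region tracks the real image
```

The mismatch one can observe at the mask boundary in such sketches is precisely the "disharmony" the abstract's gradient-guidance contribution addresses.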
Cohen, Max. "Metamodel and bayesian approaches for dynamic systems." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAS003.
In this thesis, we develop deep learning architectures for modelling building energy consumption and air quality. We first present an end-to-end methodology for optimizing energy demand while improving indoor comfort, by substituting the traditionally used physical simulators with a much faster surrogate model. Using historic data, we can ensure that simulations from this metamodel match the real conditions of the buildings. Yet some differences remain, due to unavailable and random factors. We propose to quantify this uncertainty by combining state space models with time series deep learning models. In a first approach, we show how the weights of a model can be finetuned through Sequential Monte Carlo methods, in order to take into account uncertainty on the last layer. We propose a second, generative model with discrete latent states, allowing for a simpler training procedure through variational inference and equivalent performance on a relative humidity forecasting task. Finally, our last work extends these quantized models by proposing a new prior based on diffusion bridges. By learning to corrupt and reconstruct samples from the latent space, our model is able to learn the complex prior distribution, regardless of the nature of the data.
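As a point of reference for the Sequential Monte Carlo machinery mentioned in this abstract, here is a minimal bootstrap particle filter on a toy linear-Gaussian state-space model. The model, noise levels, and particle count are illustrative; the thesis applies SMC to neural network weights, not to this toy system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-Gaussian state-space model:
#   x_t = 0.9 x_{t-1} + N(0, 0.5^2),   y_t = x_t + N(0, 0.3^2)
T, N = 100, 500
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + 0.5 * rng.normal()
    y[t] = x[t] + 0.3 * rng.normal()

# Bootstrap particle filter: propagate, weight by likelihood, resample.
particles = rng.normal(size=N)
est = np.zeros(T)
for t in range(1, T):
    particles = 0.9 * particles + 0.5 * rng.normal(size=N)  # propagate
    logw = -0.5 * ((y[t] - particles) / 0.3) ** 2            # log-likelihood
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = np.sum(w * particles)                           # filtering mean
    idx = rng.choice(N, size=N, p=w)                         # resample
    particles = particles[idx]

rmse = np.sqrt(np.mean((est[1:] - x[1:]) ** 2))
print(round(float(rmse), 2))  # below the observation noise level of 0.3
```

The filtering mean combines the dynamics prior with each noisy observation, which is why it tracks the latent state more accurately than the raw measurements do.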
Cherti, Mehdi. "Deep generative neural networks for novelty generation : a foundational framework, metrics and experiments." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS029/document.
In recent years, significant advances made in deep neural networks enabled the creation of groundbreaking technologies such as self-driving cars and voice-enabled personal assistants. Almost all successes of deep neural networks are about prediction, whereas the initial breakthroughs came from generative models. Today, although we have very powerful deep generative modeling techniques, these techniques are essentially being used for prediction or for generating known objects (i.e., good quality images of known classes): any generated object that is a priori unknown is considered as a failure mode (Salimans et al., 2016) or as spurious (Bengio et al., 2013b). In other words, when prediction seems to be the only possible objective, novelty is seen as an error that researchers have been trying hard to eliminate. This thesis defends the point of view that, instead of trying to eliminate these novelties, we should study them and the generative potential of deep nets to create useful novelty, especially given the economic and societal importance of creating new objects in contemporary societies. The thesis sets out to study novelty generation in relationship with data-driven knowledge models produced by deep generative neural networks. Our first key contribution is the clarification of the importance of representations and their impact on the kind of novelties that can be generated: a key consequence is that a creative agent might need to re-represent known objects to access various kinds of novelty. We then demonstrate that traditional objective functions of statistical learning theory, such as maximum likelihood, are not necessarily the best theoretical framework for studying novelty generation. We propose several other alternatives at the conceptual level. A second key result is the confirmation that current models, with traditional objective functions, can indeed generate unknown objects.
This also shows that even though objectives like maximum likelihood are designed to eliminate novelty, practical implementations do generate novelty. Through a series of experiments, we study the behavior of these models and the novelty they generate. In particular, we propose a new task setup and metrics for selecting good generative models. Finally, the thesis concludes with a series of experiments clarifying the characteristics of models that can exhibit novelty. Experiments show that sparsity, noise level, and restricting the capacity of the net eliminate novelty, and that models that are better at recognizing novelty are also good at generating novelty.
Luc, Pauline. "Apprentissage autosupervisé de modèles prédictifs de segmentation à partir de vidéos." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM024/document.
Predictive models of the environment hold promise for allowing the transfer of recent reinforcement learning successes to many real-world contexts, by decreasing the number of interactions needed with the real world. Video prediction has been studied in recent years as a particular case of such predictive models, with broad applications in robotics and navigation systems. While RGB frames are easy to acquire and hold a lot of information, they are extremely challenging to predict, and cannot be directly interpreted by downstream applications. Here we introduce the novel tasks of predicting the semantic and instance segmentation of future frames. The abstract feature spaces we consider are better suited for recursive prediction and allow us to develop models which convincingly predict segmentations up to half a second into the future. Predictions are more easily interpretable by downstream algorithms and remain rich, spatially detailed and easy to obtain, relying on state-of-the-art segmentation methods. We first focus on the task of semantic segmentation, for which we propose a discriminative approach based on adversarial training. Then, we introduce the novel task of predicting future semantic segmentation, and develop an autoregressive convolutional neural network to address it. Finally, we extend our method to the more challenging problem of predicting future instance segmentation, which additionally segments out individual objects. To deal with a varying number of output labels per image, we develop a predictive model in the space of high-level convolutional image features of the Mask R-CNN instance segmentation model. We are able to produce visually pleasing segmentations at a high resolution for complex scenes involving a large number of instances, and with convincing accuracy up to half a second ahead.
Chali, Samy. "Robustness Analysis of Classifiers Against Out-of-Distribution and Adversarial Inputs." Electronic Thesis or Diss., université Paris-Saclay, 2024. http://www.theses.fr/2024UPAST012.
Many issues addressed by AI involve the classification of complex input data that need to be separated into different classes. The functions that transform the complex input values into a simpler, linearly separable space are obtained either through learning (deep convolutional networks) or by projecting into a high-dimensional space to obtain a 'rich' non-linear representation of the inputs, followed by a linear mapping between the high-dimensional space and the output units, as used in Support Vector Machines (Vapnik's work, 1966-1995). This thesis aims to create an optimized, generic architecture capable of preprocessing data to prepare them for classification with minimal operations required. Additionally, this architecture aims to enhance the model's autonomy by enabling continuous learning, robustness to corrupted data, and the identification of data that the model cannot process.
Chen, Mickaël. "Learning with weak supervision using deep generative networks." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS024.
Many successes of deep learning rely on the availability of massive annotated datasets that can be exploited by supervised algorithms. Obtaining those labels at a large scale, however, can be difficult, or even impossible in many situations. Designing methods that are less dependent on annotations is therefore a major research topic, and many semi-supervised and weakly supervised methods have been proposed. Meanwhile, the recent introduction of deep generative networks provided deep learning methods with the ability to manipulate complex distributions, allowing for breakthroughs in tasks such as image editing and domain adaptation. In this thesis, we explore how these new tools can be useful to further alleviate the need for annotations. Firstly, we tackle the task of performing stochastic predictions. It consists in designing systems for structured prediction that take into account the variability in possible outputs. We propose, in this context, two models. The first one performs predictions on multi-view data with missing views, and the second one predicts possible futures of a video sequence. Then, we study adversarial methods to learn a factorized latent space, in a setting with two explanatory factors of which only one is annotated. We propose models that aim to uncover semantically consistent latent representations for those factors. One model is applied to the conditional generation of motion capture data, and another one to multi-view data. Finally, we focus on the task of image segmentation, which is of crucial importance in computer vision. Building on previously explored ideas, we propose a model for object segmentation that is entirely unsupervised.
Ayed, Ibrahim. "Neural Models for Learning Real World Dynamics and the Neural Dynamics of Learning." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS434.
The work presented in this thesis was initially motivated by the discrepancy between the impressive performances of modern neural networks and the lack of applications to scientific problems for which data abounds. Focusing on evolution problems which are classically modelled through ordinary or partial differential equations (O/PDEs) naturally brought us to consider the more general problem of representing and learning such equations from raw data with neural networks. This was the inception of the first part of our work. The point of view considered in this first part has a natural counterpart: what about the dynamics induced by the trajectories of the NN's weights during training, or by the trajectories of data points within them during inference? Can they be usefully modelled? This question was the core of the second part of our work and, while theoretical tools other than O/PDEs happened to be useful in our analysis, our reasoning and intuition were fundamentally driven by considerations stemming from a dynamical viewpoint.
Besedin, Andrey. "Continual forgetting-free deep learning from high-dimensional data streams." Electronic Thesis or Diss., Paris, CNAM, 2019. http://www.theses.fr/2019CNAM1263.
In this thesis, we propose a new deep-learning-based approach for online classification on streams of high-dimensional data. In recent years, neural networks (NN) have become the primary building block of state-of-the-art methods in various machine learning problems. Most of these methods, however, are designed to solve the static learning problem, where all data are available at once at training time. Performing online deep learning is exceptionally challenging. The main difficulty is that NN-based classifiers usually rely on the assumption that the sequence of data batches used during training is stationary, or in other words, that the distribution of data classes is the same for all batches (the i.i.d. assumption). When this assumption does not hold, neural networks tend to forget the concepts that are temporarily not available in the stream. In the literature, this phenomenon is known as catastrophic forgetting. The approaches we propose in this thesis aim to guarantee the i.i.d. nature of each batch that comes from the stream and to compensate for the lack of historical data. To do this, we train generative models and pseudo-generative models capable of producing synthetic samples from classes that are absent or misrepresented in the stream, and complete the stream's batches with these samples. We test our approaches in an incremental learning scenario and in a specific type of continual learning. Our approaches perform classification on dynamic data streams with accuracy close to the results obtained in the static classification configuration, where all data are available for the duration of the learning. In addition, we demonstrate the ability of our methods to adapt to unseen data classes and new instances of already known data categories, while avoiding forgetting the previously acquired knowledge.
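The batch-completion idea described in this abstract can be sketched as follows. Here the per-class "generators" are simply fitted diagonal Gaussians standing in for the trained generative models, and all names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-class "generators": fitted Gaussians standing in for the trained
# generative models of the approach described above.
generators = {}

def fit_generator(label, data):
    generators[label] = (data.mean(axis=0), data.std(axis=0))

def sample_generator(label, n):
    mu, sigma = generators[label]
    return mu + sigma * rng.normal(size=(n, mu.size))

def complete_batch(batch_x, batch_y, all_labels, per_class):
    """Top up a stream batch with synthetic samples for absent classes,
    so that each training batch looks i.i.d. over the known classes."""
    xs, ys = [batch_x], [batch_y]
    for label in all_labels:
        missing = per_class - int((batch_y == label).sum())
        if missing > 0 and label in generators:
            xs.append(sample_generator(label, missing))
            ys.append(np.full(missing, label))
    return np.concatenate(xs), np.concatenate(ys)

# Class 0 was seen earlier; the current stream batch contains only class 1.
fit_generator(0, rng.normal(loc=-2.0, size=(200, 2)))
bx, by = rng.normal(loc=2.0, size=(10, 2)), np.full(10, 1)
fx, fy = complete_batch(bx, by, all_labels=[0, 1], per_class=10)
print(sorted(np.unique(fy).tolist()), len(fy))  # → [0, 1] 20
```

Training the classifier on such completed batches is what prevents the catastrophic forgetting of classes that temporarily disappear from the stream.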
Marzouki, Meryem. "Approches à base de connaissances pour le test de circuits VLSI : application à la validation de prototypes dans le cadre d'un test sans contact." Phd thesis, Grenoble INPG, 1991. http://tel.archives-ouvertes.fr/tel-00339355.
Martin, Alice. "Deep learning models and algorithms for sequential data problems : applications to language modelling and uncertainty quantification." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAS007.
In this thesis, we develop new models and algorithms to solve deep learning tasks on sequential data, with the aim of tackling the pitfalls of current approaches for learning language models based on neural networks. A first research work develops a new deep generative model for sequential data based on Sequential Monte Carlo methods, which better models diversity in language modelling tasks and better quantifies uncertainty in sequential regression problems. A second research work aims to facilitate the use of SMC techniques within deep learning architectures, by developing a new online smoothing algorithm with reduced computational cost that is applicable to a wider scope of state-space models, including deep generative models. Finally, a third research work proposes the first reinforcement learning approach that learns conditional language models from scratch (i.e. without supervised datasets), based on a truncation mechanism of the natural language action space with a pretrained language model.
Crestel, Léopold. "Neural networks for automatic musical projective orchestration." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS625.
Orchestration is the art of composing a musical discourse over a combinatorial set of instrumental possibilities. For centuries, musical orchestration has only been addressed in an empirical way, as a scientific theory of orchestration remains elusive. In this work, we attempt to build the first system for automatic projective orchestration, relying on machine learning. Hence, we start by formalizing this novel task. We focus our effort on projecting a piano piece onto a full symphonic orchestra, in the style of notable classical composers such as Mozart or Beethoven. The first objective is to design a live orchestration system, which takes as input the sequence of chords played by a pianist and generates its orchestration in real time. Afterwards, we relax the real-time constraint in order to use slower but more powerful models and to generate scores in a non-causal way, which is closer to the writing process of a human composer. By observing a large dataset of orchestral music written by composers, together with its reductions for piano, we hope to capture through statistical learning methods the mechanisms involved in the orchestration of a piano piece. Deep neural networks seem to be a promising lead given their ability to model complex behaviour from a large dataset in an unsupervised way. More specifically, in the challenging context of symbolic music, which is characterized by a high-dimensional target space and few examples, we investigate autoregressive models. At the price of a slower generation process, autoregressive models allow us to account for more complex dependencies between the different elements of the score, which we believe to be of foremost importance in the case of orchestration.
Fissore, Giancarlo. "Generative modeling : statistical physics of Restricted Boltzmann Machines, learning with missing information and scalable training of Linear Flows." Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG028.
Neural network models able to approximate and sample high-dimensional probability distributions are known as generative models. In recent years this class of models has received tremendous attention due to their potential for automatically learning meaningful representations of the vast amount of data that we produce and consume daily. This thesis presents theoretical and algorithmic results pertaining to generative models and is divided into two parts. In the first part, we focus our attention on the Restricted Boltzmann Machine (RBM) and its statistical physics formulation. Historically, statistical physics has played a central role in studying the theoretical foundations of, and providing inspiration for, neural network models. The first neural implementation of an associative memory (Hopfield, 1982) is a seminal work in this context. The RBM can be regarded as a development of the Hopfield model, and it is of particular interest due to its role at the forefront of the deep learning revolution (Hinton et al. 2006). Exploiting its statistical physics formulation, we derive a mean-field theory of the RBM that lets us characterize both its functioning as a generative model and the dynamics of its training procedure. This analysis proves useful in deriving a robust mean-field imputation strategy that makes it possible to use the RBM to learn empirical distributions in the challenging case in which the dataset to model is only partially observed and presents high percentages of missing information. In the second part we consider a class of generative models known as Normalizing Flows (NF), whose distinguishing feature is the ability to model complex high-dimensional distributions by employing invertible transformations of a simple tractable distribution. The invertibility of the transformation allows us to express the probability density through a change of variables, whose optimization by Maximum Likelihood (ML) is rather straightforward but computationally expensive.
The common practice is to impose architectural constraints on the class of transformations used for NF in order to make the ML optimization efficient. Proceeding from geometrical considerations, we propose a stochastic gradient descent optimization algorithm that exploits the matrix structure of fully connected neural networks without imposing any constraints on their structure other than the fixed dimensionality required by invertibility. This algorithm is computationally efficient and can scale to very high-dimensional datasets. We demonstrate its effectiveness in training a multilayer nonlinear architecture employing fully connected layers.
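The change-of-variables formula at the heart of ML training for flows can be sketched for a single invertible fully connected (linear) layer: if x = W z with z drawn from a standard normal, then log p_x(x) = log p_z(W^{-1} x) - log |det W|. The helper below is a minimal illustration of that computation, not the thesis's optimization algorithm:

```python
import numpy as np

def flow_log_density(x, W):
    """Log-density of x = W @ z, z ~ N(0, I), via change of variables:
    log p_x(x) = log p_z(W^{-1} x) - log |det W|."""
    z = np.linalg.solve(W, x)            # invert the linear flow
    d = len(x)
    log_pz = -0.5 * (z @ z + d * np.log(2 * np.pi))  # standard normal base density
    _, logabsdet = np.linalg.slogdet(W)  # log |det W|, computed stably
    return log_pz - logabsdet
```

Maximizing the average of this quantity over a dataset is exactly the ML objective; the expense the abstract mentions comes from the determinant and inverse, which the proposed algorithm sidesteps by exploiting the matrix structure of the layers.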
El, Mahi Imad. "Schémas volumes finis pour la simulation numérique de problèmes à fronts raides en maillages non structurés adaptatifs." Rouen, 1999. http://www.theses.fr/1999ROUES019.
Bordes, Florian. "Learning to sample from noise with deep generative models." Thèse, 2017. http://hdl.handle.net/1866/19370.
Machine learning, and specifically deep learning, has made significant breakthroughs in recent years on a variety of tasks. One well-known application of deep learning is computer vision, where tasks such as detection or classification are considered nearly solved by the community. However, training state-of-the-art models for such tasks requires labels associated with the data we want to classify. A more general goal, inspired by how animal brains learn, is to design algorithms that can extract meaningful features from unlabelled data. Unsupervised learning is one of the research directions that attempts to solve this problem. In this thesis, I present a new way to train a neural network as a generative model capable of generating quality samples (a task akin to imagining). I explain how, starting from noise, it is possible to obtain samples close to the training data. This iterative procedure is called Infusion training and is a novel approach to learning the transition operator of a generative Markov chain. The first chapter presents some background on machine learning and probabilistic models. The second chapter presents generative models that inspired this work. The third and last chapter presents and investigates our novel approach to learning a generative model with Infusion training.
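The generation side of such a Markov chain can be sketched as follows: start from pure noise and repeatedly apply a transition operator while annealing the injected noise. Everything here is a toy assumption, with a hand-coded operator standing in for the learned one that Infusion training would produce:

```python
import numpy as np

def denoising_chain(transition, n_steps, dim, rng=None):
    """Generative Markov chain sketch: start from noise and repeatedly apply
    a transition operator, annealing the injected noise toward zero."""
    rng = np.random.default_rng(rng)
    x = rng.normal(0.0, 1.0, dim)  # initial sample: pure noise
    for t in range(n_steps):
        noise_scale = 1.0 - (t + 1) / n_steps  # less noise as the chain advances
        x = transition(x) + noise_scale * rng.normal(0.0, 0.1, dim)
    return x

# Hypothetical "learned" operator pulling samples toward a target mode at 3.0
target = 3.0
chain_sample = denoising_chain(lambda x: x + 0.5 * (target - x),
                               n_steps=30, dim=5, rng=0)
```

After enough steps the chain concentrates near the data (here, the single mode at 3.0); Infusion training's contribution is the procedure for learning the transition operator itself.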
Kumar, Rithesh. "Improved training of energy-based models." Thèse, 2019. http://hdl.handle.net/1866/22528.
Dinh, Laurent. "Reparametrization in deep learning." Thèse, 2018. http://hdl.handle.net/1866/21139.
Almahairi, Amjad. "Advances in deep learning with limited supervision and computational resources." Thèse, 2018. http://hdl.handle.net/1866/23434.
Deep neural networks are the cornerstone of state-of-the-art systems for a wide range of tasks, including object recognition, language modelling and machine translation. In the last decade, research in the field of deep learning has led to numerous key advances in designing novel architectures and training algorithms for neural networks. However, most success stories in deep learning have relied heavily on two main factors: the availability of large amounts of labelled data and massive computational resources. This thesis by articles makes several contributions to advancing deep learning, specifically in problems with limited or no labelled data, or with constrained computational resources. The first article addresses the sparsity of labelled data that arises in the application field of recommender systems. We propose a multi-task learning framework that leverages natural language reviews to improve recommendation. Specifically, we apply neural-network-based methods for learning representations of products from review text while learning from rating data. We demonstrate that the proposed method can achieve state-of-the-art performance on the Amazon Reviews dataset. The second article tackles computational challenges in training large-scale deep neural networks. We propose a conditional computation network architecture which can adaptively assign its capacity, and hence computations, across different regions of the input. We demonstrate the effectiveness of our model on visual recognition tasks where objects are spatially localized within the input, while maintaining much lower computational overhead than standard network architectures. The third article contributes to the domain of unsupervised learning with the generative adversarial networks paradigm. We introduce a flexible adversarial training framework in which not only does the generator converge to the true data distribution, but the discriminator also recovers the relative density of the data at the optimum.
We validate our framework empirically by showing that the discriminator is able to accurately estimate the true energy of the data while obtaining state-of-the-art sample quality. Finally, in the fourth article, we address the problem of unsupervised domain translation. We propose a model which can learn flexible, many-to-many mappings across domains from unpaired data. We validate our approach on several image datasets, and we show that it can be effectively applied in semi-supervised learning settings.
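The link between a discriminator and a density, which the third article builds on, follows from the classical GAN optimum: a perfectly trained discriminator satisfies D(x) = p_data(x) / (p_data(x) + p_model(x)), so the data density can be recovered from D and the model density. The helper below illustrates only this standard identity, not the article's specific training framework:

```python
import numpy as np

def density_from_discriminator(d_out, model_density):
    """Invert the optimal-discriminator identity
    D(x) = p_data(x) / (p_data(x) + p_model(x))
    to recover p_data(x) = p_model(x) * D(x) / (1 - D(x))."""
    return model_density * d_out / (1.0 - d_out)
```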
Mehri, Soroush. "Sequential modeling, generative recurrent neural networks, and their applications to audio." Thèse, 2016. http://hdl.handle.net/1866/18762.
Tan, Shawn. "Latent variable language models." Thèse, 2018. http://hdl.handle.net/1866/22131.
Ahmed, Faruk. "Generative models for natural images." Thèse, 2017. http://hdl.handle.net/1866/20186.
Sylvain, Tristan. "Locality and compositionality in representation learning for complex visual tasks." Thesis, 2021. http://hdl.handle.net/1866/25594.
The use of deep neural architectures coupled with specific innovations such as adversarial methods, pre-training on large datasets and mutual information estimation has in recent years allowed rapid progress in many complex vision tasks such as zero-shot learning, scene generation, or multi-modal classification. Despite such progress, it is still not clear if current representation learning methods will be enough to attain human-level performance on arbitrary visual tasks, and if not, what direction future research should take. In this thesis, we focus on two properties of representations that seem necessary to achieve good downstream performance in representation learning: locality and compositionality. Locality can be understood as a representation's ability to retain local information. This is relevant in many cases, and specifically benefits computer vision, where natural images inherently feature local information, e.g. relevant patches of an image or multiple objects present in a scene. On the other hand, a compositional representation can be understood as one that arises from a combination of simpler parts. Convolutional neural networks are inherently compositional, and many complex images can be seen as a composition of relevant sub-components: individual objects and attributes in a scene, and semantic attributes in zero-shot learning, are two examples. We believe both properties hold the key to designing better representation learning methods. In this thesis, we present three articles dealing with locality and/or compositionality, and their application to representation learning for complex visual tasks. In the first article, we introduce ways of measuring locality and compositionality for image representations, and demonstrate that local and compositional representations perform better at zero-shot learning.
We also use these two notions as the basis for designing class-matching deep info-max, a novel representation learning algorithm that achieves state-of-the-art performance on our proposed "Zero-shot from scratch" setting, a harder zero-shot setting where external information, e.g. pre-training on other image datasets is not allowed. In the second article, we show that by encouraging a generator to retain local object-level information, using a scene-graph similarity module, we can improve scene generation performance. This model also showcases the importance of compositionality as many components operate individually on each object present. To fully demonstrate the reach of our approach, we perform detailed analysis, and propose a new framework to evaluate scene generation models. Finally, in the third article, we show that encouraging high mutual information between local and global multi-modal representations of 2D and 3D medical images can lead to improvements in image classification and segmentation. This general framework can be applied to a wide variety of settings, and demonstrates the benefits of not only locality, but also of compositionality as multi-modal representations are combined to obtain a more general one.
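Encouraging high mutual information between local and global representations, as in the third article, is commonly done with a contrastive lower bound such as InfoNCE: matched (local, global) pairs are positives and all other pairings in the batch are negatives. The sketch below is a generic, illustrative estimator, not the article's exact objective:

```python
import numpy as np

def infonce_lower_bound(local_feats, global_feats, temperature=0.1):
    """InfoNCE lower bound on the mutual information between local and
    global features; row i of each array comes from the same input."""
    # Normalize, then score every local feature against every global one
    l = local_feats / np.linalg.norm(local_feats, axis=1, keepdims=True)
    g = global_feats / np.linalg.norm(global_feats, axis=1, keepdims=True)
    scores = (l @ g.T) / temperature  # (batch, batch) similarity matrix
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Average log-probability of the matched pairs, shifted by log(batch size)
    return float(np.mean(np.diag(log_softmax)) + np.log(len(l)))
```

When the paired features carry shared information the bound is high; for independent features it hovers near zero, which is what makes it usable as a training signal.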
Lamb, Alexander. "Generative models : a critical review." Thèse, 2018. http://hdl.handle.net/1866/21282.
Mastropietro, Olivier. "Deep Learning for Video Modelling." Thèse, 2017. http://hdl.handle.net/1866/20192.
Serban, Iulian Vlad. "Representation learning for dialogue systems." Thèse, 2019. http://hdl.handle.net/1866/23440.
This thesis presents a series of steps taken towards investigating representation learning (e.g. deep learning) for building dialogue systems and conversational agents. The thesis is split into two general parts. The first part of the thesis investigates representation learning for generative dialogue models. Conditioned on a sequence of turns from a text-based dialogue, these models are tasked with generating the next, appropriate response in the dialogue. This part of the thesis focuses on sequence-to-sequence models, a class of generative deep neural networks. First, we propose the Hierarchical Recurrent Encoder-Decoder model, which is an extension of the vanilla sequence-to-sequence model incorporating the turn-taking structure of dialogues. Second, we propose the Multiresolution Recurrent Neural Network model, which is a stacked sequence-to-sequence model with an intermediate, stochastic representation (a "coarse representation") capturing the abstract semantic content communicated between the dialogue speakers. Third, we propose the Latent Variable Recurrent Encoder-Decoder model, which is a variant of the Hierarchical Recurrent Encoder-Decoder model with latent, stochastic, normally-distributed variables. The latent, stochastic variables are intended for modelling the ambiguity and uncertainty occurring naturally in human language communication. The three models are evaluated and compared on two dialogue response generation tasks: a Twitter response generation task and the Ubuntu technical response generation task. The second part of the thesis investigates representation learning for a real-world reinforcement learning dialogue system. Specifically, this part focuses on the Milabot system built by the Quebec Artificial Intelligence Institute (Mila) for the Amazon Alexa Prize 2017 competition. Milabot is a system capable of conversing with humans on popular small talk topics through both speech and text.
The system consists of an ensemble of natural language retrieval and generation models, including template-based models, bag-of-words models, and variants of the models discussed in the first part of the thesis. This part of the thesis focuses on the response selection task. Given a sequence of turns from a dialogue and a set of candidate responses, the system must select an appropriate response to give the user. A model-based reinforcement learning approach, called the Bottleneck Simulator, is proposed for selecting the appropriate candidate response. The Bottleneck Simulator learns an approximate model of the environment based on observed dialogue trajectories and human crowdsourcing, while utilizing an abstract (bottleneck) state representing high-level discourse semantics. The learned environment model is then employed to learn a reinforcement learning policy through rollout simulations. The learned policy has been evaluated and compared to competing approaches through A/B testing with real-world users, where it was found to yield excellent performance.
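The rollout-simulation step the Bottleneck Simulator relies on can be pictured as plain Monte Carlo policy evaluation inside a learned, abstract-state environment model. The sketch below is a generic illustration under toy assumptions; `sim_transition`, `reward` and `policy` are hypothetical stand-ins for the learned environment model and candidate-selection policy:

```python
import numpy as np

def rollout_value(sim_transition, reward, policy, start_state,
                  n_rollouts=100, horizon=5, gamma=0.9, rng=None):
    """Estimate a policy's discounted return by simulating rollouts
    in a learned (abstract-state) environment model."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_rollouts):
        s, ret, discount = start_state, 0.0, 1.0
        for _ in range(horizon):
            a = policy(s, rng)                  # pick a candidate response
            ret += discount * reward(s, a)      # learned reward estimate
            s = sim_transition(s, a, rng)       # learned transition model
            discount *= gamma
        total += ret
    return total / n_rollouts
```

Because the rollouts run in the simulator rather than with live users, the policy can be improved cheaply before the A/B tests described above.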
Dumoulin, Vincent. "Representation Learning for Visual Data." Thèse, 2018. http://hdl.handle.net/1866/21140.