Dissertations on the topic "Apprentissage profond des représentations"
Explore the top 50 dissertations for research on the topic "Apprentissage profond des représentations".
Moradi, Fard Maziar. "Apprentissage de représentations de données dans un apprentissage non-supervisé." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM053.
Due to the great impact of deep learning on a variety of fields of machine learning, its ability to improve clustering approaches has recently been investigated. At first, deep learning approaches (mostly autoencoders) were used to reduce the dimensionality of the original space and to remove possible noise (and also to learn new data representations). Clustering approaches that utilize deep learning are called Deep Clustering. This thesis focuses on developing Deep Clustering models which can be used for different types of data (e.g., images, text). First, we propose a Deep k-means (DKM) algorithm in which learning data representations (through a deep autoencoder) and cluster representatives (through k-means) is performed jointly. The results of our DKM approach indicate that this framework is able to outperform similar algorithms in Deep Clustering. Indeed, our proposed framework is able to truly and smoothly backpropagate the loss function error through all learnable variables. Moreover, we propose two frameworks named SD2C and PCD2C, which integrate seed words and pairwise constraints, respectively, into end-to-end Deep Clustering frameworks. By utilizing such frameworks, users can see their needs reflected in the clustering. Finally, the results obtained from these frameworks indicate their ability to obtain more tailored results.
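The joint objective described in this abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the linear encoder/decoder, the softmax relaxation of the cluster assignment, and the names `enc_w`, `dec_w`, `reps`, `alpha` and `lam` are all assumptions chosen for illustration. The sketch only evaluates a loss that combines reconstruction error with a differentiable k-means term.

```python
import numpy as np


def soft_assignments(h, reps, alpha):
    """Soft cluster weights from a softmax over negative squared distances.

    As alpha grows, the weights approach the hard nearest-representative
    assignment used by standard k-means.
    """
    # Squared distance from every embedding to every representative: (n, k).
    d2 = ((h[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)
    logits = -alpha * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True), d2


def joint_loss(x, enc_w, dec_w, reps, alpha=1.0, lam=1.0):
    """Reconstruction loss plus a smooth k-means loss in the embedding space."""
    h = x @ enc_w        # hypothetical linear encoder
    x_hat = h @ dec_w    # hypothetical linear decoder
    recon = ((x - x_hat) ** 2).sum(axis=1).mean()
    w, d2 = soft_assignments(h, reps, alpha)
    clustering = (w * d2).sum(axis=1).mean()
    return recon + lam * clustering
```

Because every term is smooth in the encoder weights, the decoder weights and the representatives, an automatic-differentiation framework could in principle optimize all of them together, which is the property the abstract emphasizes.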
Tamaazousti, Youssef. "Vers l’universalité des représentations visuelle et multimodales." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLC038/document.
Because of its key societal, economic and cultural stakes, Artificial Intelligence (AI) is a hot topic. One of its main goals is to develop systems that facilitate the daily life of humans, with applications such as household robots, industrial robots, autonomous vehicles and much more. The rise of AI is largely due to the emergence of tools based on deep neural networks, which make it possible to simultaneously learn the representation of the data (traditionally hand-crafted) and the task to solve (traditionally learned with statistical models). This resulted from the conjunction of theoretical advances, growing computational capacity, and the availability of large amounts of annotated data. A long-standing goal of AI is to design machines inspired by humans, capable of perceiving the world and interacting with humans in an evolutionary way. In this thesis, we categorize the work around AI into the following two learning approaches: (i) Specialization: learn representations from a few specific tasks, with the goal of carrying out very specific tasks (specialized in a certain field) with a very good level of performance; (ii) Universality: learn representations from several general tasks, with the goal of performing as many tasks as possible in different contexts. While specialization has been extensively explored by the deep-learning community, only a few implicit attempts have been made towards universality. Thus, the goal of this thesis is to explicitly address the problem of improving universality with deep-learning methods, for image and text data. We have addressed this topic in two different forms: through the implementation of methods to improve universality (“universalizing methods”), and through the establishment of a protocol to quantify universality.
Concerning universalizing methods, we proposed three technical contributions: (i) in the context of large semantic representations, we proposed a method to reduce redundancy between the detectors through adaptive thresholding and the relations between concepts; (ii) in the context of neural-network representations, we proposed an approach that increases the number of detectors without increasing the amount of annotated data; (iii) in the context of multimodal representations, we proposed a method to preserve the semantics of unimodal representations in multimodal ones. Regarding the quantification of universality, we proposed to evaluate universalizing methods in a transfer-learning scheme, which is relevant for assessing the universal ability of representations. This also led us to propose a new framework as well as new quantitative evaluation criteria for universalizing methods.
Droniou, Alain. "Apprentissage de représentations et robotique développementale : quelques apports de l'apprentissage profond pour la robotique autonome." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066056/document.
This thesis studies the use of deep neural networks to learn high-level representations from raw inputs on robots, based on the "manifold hypothesis".
Moreau, Thomas. "Représentations Convolutives Parcimonieuses -- application aux signaux physiologiques et interpétabilité de l'apprentissage profond." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLN054/document.
Convolutional representations extract recurrent patterns, which leads to the discovery of local structures in a set of signals. They are well suited to analyzing physiological signals, which require interpretable representations in order to understand the relevant information. Moreover, these representations can be linked to deep learning models as a way to bring interpretability to their internal representations. In this dissertation, we describe recent advances on both computational and theoretical aspects of these models. First, we show that Singular Spectrum Analysis can be used to compute convolutional representations. This representation is dense, and we describe an automated procedure to improve its interpretability. We also propose an asynchronous algorithm, called DICOD, based on greedy coordinate descent, to solve convolutional sparse coding for long signals. Our algorithm has super-linear acceleration. In a second part, we focus on the link between representations and neural networks. An extra training step for deep learning, called post-training, is introduced to boost the performance of the trained network by making sure the last layer is optimal. Then, we study the mechanisms that allow sparse coding algorithms to be accelerated with neural networks. We show that this is linked to a factorization of the Gram matrix of the dictionary. Finally, we illustrate the relevance of convolutional representations for physiological signals. Convolutional dictionary learning is used to summarize human walk signals, and Singular Spectrum Analysis is used to remove gaze movement from young infants' oculometric recordings.
Droniou, Alain. "Apprentissage de représentations et robotique développementale : quelques apports de l'apprentissage profond pour la robotique autonome." Electronic Thesis or Diss., Paris 6, 2015. http://www.theses.fr/2015PA066056.
Caron, Stéphane. "Détection d'anomalies basée sur les représentations latentes d'un autoencodeur variationnel." Master's thesis, Université Laval, 2021. http://hdl.handle.net/20.500.11794/69185.
In this master's thesis, we propose a methodology that aims to detect anomalies among complex data, such as images. To do so, we use a specific type of neural network called the variational autoencoder (VAE). This unsupervised deep learning approach allows us to obtain a simple representation of our data, on which we then use the Kullback-Leibler distance to discriminate between anomalies and "normal" observations. To determine whether an image should be considered "abnormal", our approach is based on a proportion of observations to be filtered, which is easier and more intuitive to establish than applying a threshold based on the value of a distance metric. By using our methodology on real complex images, we obtain superior anomaly detection performance in terms of area under the ROC curve (AUC), precision and recall compared to other unsupervised methods. Moreover, we demonstrate that the simplicity of our filtration level allows us to easily adapt the method to datasets having different levels of anomaly contamination.
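The proportion-based filtering described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the anomaly score is taken to be the closed-form Kullback-Leibler divergence between a diagonal Gaussian latent code and a standard normal prior, and the function names and the `proportion` parameter are hypothetical, not taken from the thesis.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for each row of mu/logvar."""
    return 0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar, axis=1)


def flag_top_proportion(scores, proportion):
    """Flag the given proportion of observations with the highest scores."""
    n = len(scores)
    k = int(round(proportion * n))
    flags = np.zeros(n, dtype=bool)
    if k > 0:
        # Indices of the k largest scores.
        flags[np.argsort(scores)[-k:]] = True
    return flags
```

Choosing `proportion` directly, rather than a numeric threshold on the score, mirrors the abstract's argument that an expected share of anomalies is easier to set than a cut-off on a distance value.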
Thomas, Hugues. "Apprentissage de nouvelles représentations pour la sémantisation de nuages de points 3D." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEM048/document.
In recent years, new technologies have allowed the acquisition of large and precise 3D scenes as point clouds. They have opened up new applications, such as self-driving vehicles or infrastructure monitoring, that rely on efficient large-scale point cloud processing. Convolutional deep learning methods cannot be directly used with point clouds. In the case of images, convolutional filters brought the ability to learn new representations, which were previously hand-crafted in older computer vision methods. Following the same line of thought, we present in this thesis a study of hand-crafted representations previously used for point cloud processing. We propose several contributions to serve as a basis for the design of a new convolutional representation for point cloud processing. They include a new definition of multiscale radius neighborhood, a comparison with multiscale k-nearest neighbors, a new active learning strategy, the semantic segmentation of large-scale point clouds, and a study of the influence of density in multiscale representations. Following these contributions, we introduce the Kernel Point Convolution (KPConv), which uses radius neighborhoods and a set of kernel points to play the role of the kernel pixels in image convolution. Our convolutional networks outperform state-of-the-art semantic segmentation approaches in almost any situation. In addition to these strong results, we designed KPConv with great flexibility and a deformable version. To conclude our argumentation, we propose several insights on the representations that our method is able to learn.
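To make the idea of kernel points acting like kernel pixels concrete, here is a minimal NumPy sketch of a convolution evaluated at a single point. It is an illustration, not the thesis's implementation: the linear influence function `max(0, 1 - d / sigma)`, the argument names, and the weight layout `(K, c_in, c_out)` are assumptions of this sketch.

```python
import numpy as np


def kernel_point_conv(center, points, feats, kernel_pts, weights, radius, sigma):
    """Convolve features of the radius neighborhood of `center`.

    points:     (n, 3) point coordinates
    feats:      (n, c_in) input features
    kernel_pts: (K, 3) kernel point positions, relative to the center
    weights:    (K, c_in, c_out) one weight matrix per kernel point
    """
    rel = points - center
    mask = np.linalg.norm(rel, axis=1) <= radius   # radius neighborhood
    rel, f = rel[mask], feats[mask]
    # Distance from each neighbor to each kernel point: (m, K).
    dist = np.linalg.norm(rel[:, None, :] - kernel_pts[None, :, :], axis=2)
    # Assumed linear influence: 1 at the kernel point, 0 beyond sigma.
    influence = np.maximum(0.0, 1.0 - dist / sigma)
    # Sum over neighbors m and kernel points k of influence * (f @ W_k).
    return np.einsum("mk,mc,kcd->d", influence, f, weights)
```

Each neighbor contributes through the weight matrices of the kernel points it lies close to, in the same way that an image pixel contributes through the kernel weight at its offset.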
Mazari, Ahmed. "Apprentissage profond pour la reconnaissance d’actions en vidéos." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS171.
Nowadays, video content is ubiquitous through the popular use of the internet and smartphones, as well as social media. Many daily-life applications, such as video surveillance, video captioning and scene understanding, require sophisticated technologies to process video data. It has become crucially important to develop automatic means to analyze and interpret the large amount of available video data. In this thesis, we are interested in video action recognition, i.e. the problem of assigning action categories to video sequences. This can be seen as a key ingredient in building the next generation of vision systems. It is tackled with AI frameworks, mainly with ML and deep ConvNets. Current ConvNets are increasingly deep and data-hungry, which makes their success dependent on the abundance of labeled training data. ConvNets also rely on (max or average) pooling, which reduces the dimensionality of output layers (and hence attenuates their sensitivity to the availability of labeled data); however, this process may dilute the information of upstream convolutional layers and thereby affect the discrimination power of the trained video representations, especially when the learned action categories are fine-grained.
Franceschi, Jean-Yves. "Apprentissage de représentations et modèles génératifs profonds dans les systèmes dynamiques." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS014.
The recent rise of deep learning has been motivated by numerous scientific breakthroughs, particularly regarding representation learning and generative modeling. However, most of these achievements have been obtained on image or text data, whose evolution through time remains challenging for existing methods. Given their importance for autonomous systems that must adapt to a constantly evolving environment, these challenges have been actively investigated in a growing body of work. In this thesis, we follow this line of work and study several aspects of temporality and dynamical systems in deep unsupervised representation learning and generative modeling. First, we present a general-purpose deep unsupervised representation learning method for time series, tackling scalability and adaptivity issues arising in practical applications. In a second part, we further study representation learning for sequences by focusing on structured and stochastic spatiotemporal data: videos and physical phenomena. We show in this context that high-performing temporal generative prediction models help to uncover meaningful and disentangled representations, and conversely. To this end, we highlight the crucial role of differential equations in the modeling and embedding of these natural sequences within sequential generative models. Finally, in a third part, we more broadly analyze a popular class of generative models, generative adversarial networks, from the perspective of dynamical systems. We study the evolution of the involved neural networks with respect to their training time by describing it with a differential equation, allowing us to gain a novel understanding of this generative model.
Francis, Danny. "Représentations sémantiques d'images et de vidéos." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS605.
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: thanks to new big datasets of annotated images and videos, Deep Neural Networks (DNN) have outperformed other models in most cases. In this thesis, we aim at developing DNN models for automatically deriving semantic representations of images and videos. In particular, we focus on two main tasks: vision-text matching and automatic image/video captioning. The matching task can be addressed by comparing visual objects and texts in a visual space, a textual space or a multimodal space. Based on recent work on capsule networks, we define two novel models to address the vision-text matching problem: Recurrent Capsule Networks and Gated Recurrent Capsules. In image and video captioning, we have to tackle a challenging task in which a visual object has to be analyzed and translated into a textual description in natural language. For that purpose, we propose two novel curriculum learning methods. Moreover, regarding video captioning, analyzing videos requires not only parsing still images but also drawing correspondences through time. We propose a novel Learned Spatio-Temporal Adaptive Pooling method for video captioning that combines spatial and temporal analysis. Extensive experiments on standard datasets demonstrate the value of our models and methods with respect to existing work.
Goh, Hanlin. "Apprentissage de Représentations Visuelles Profondes." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00948376.
Bisot, Victor. "Apprentissage de représentations pour l'analyse de scènes sonores." Electronic Thesis or Diss., Paris, ENST, 2018. http://www.theses.fr/2018ENST0016.
This thesis focuses on the computational analysis of environmental sound scenes and events. The objective of such tasks is to automatically extract information about the context in which a sound has been recorded. Interest in this area of research has been increasing rapidly in the last few years, leading to constant growth in the number of works and proposed approaches. We explore and contribute to the main families of approaches to sound scene and event analysis, going from feature engineering to deep learning. Our work is centered on representation learning techniques based on nonnegative matrix factorization (NMF), which are particularly suited to analysing multi-source environments such as acoustic scenes. As a first approach, we propose a combination of image processing features with the goal of confirming that spectrograms contain enough information to discriminate sound scenes and events. From there, we leave the world of feature engineering and move towards automatically learning the features. The first step we take in that direction is to study the usefulness of matrix factorization for unsupervised feature learning techniques, especially by relying on variants of NMF. Several of the compared approaches indeed allow us to outperform feature engineering approaches to such tasks. Next, we propose to improve the learned representations by introducing the TNMF model, a supervised variant of NMF. The proposed TNMF models and algorithms are based on jointly learning nonnegative dictionaries and classifiers by minimising a target classification cost. The last part of our work highlights the links and the compatibility between NMF and certain deep neural network systems by proposing and adapting neural network architectures to the use of NMF as an input representation. The proposed models allow us to reach state-of-the-art performance on scene classification and overlapping event detection tasks.
Finally, we explore the possibility of jointly learning NMF and neural network parameters, grouping the different stages of our systems into one optimisation problem.
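As background for the unsupervised feature learning step mentioned in this abstract, the following is a minimal sketch of plain NMF with the classic multiplicative updates for the Frobenius-norm objective. It is a generic textbook variant, not the TNMF model or any specific algorithm from the thesis; the function name and its parameters are assumptions for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (n, m) as W (n, rank) @ H (rank, m).

    For a magnitude spectrogram V (frequency x time), the columns of W act
    as a nonnegative dictionary and H holds the activations over time.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative at every step.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In a feature-learning pipeline of the kind the abstract describes, the activations `H` (or projections of new data onto a fixed `W`) would typically serve as the input representation for a classifier.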
Dos, Santos Ludovic. "Representation learning for relational data." Electronic Thesis or Diss., Paris 6, 2017. http://www.theses.fr/2017PA066480.
The increasing use of social and sensor networks generates a large quantity of data that can be represented as complex graphs. Many tasks can be imagined on such data, from information analysis to prediction and retrieval, in which the relations between graph nodes should be informative. In this thesis, we propose different models for three different tasks: graph node classification, relational time series forecasting, and collaborative filtering. All the proposed models use the representation learning framework in its deterministic or Gaussian variant. First, we propose two algorithms for the heterogeneous graph labeling task, one using deterministic representations and the other Gaussian representations. Contrary to other state-of-the-art models, our solution is able to learn edge weights while simultaneously learning the representations and the classifiers. Second, we propose an algorithm for relational time series forecasting in which the observations are correlated not only inside each series but also across the different series. We use Gaussian representations in this contribution. This was an opportunity to see in which ways using Gaussian representations instead of deterministic ones was profitable. Finally, we apply the Gaussian representation learning approach to the collaborative filtering task. This is preliminary work to see whether the properties of Gaussian representations found on the two previous tasks also hold for the ranking task. The goal of this work was then to generalize the approach to more relational data, and not only to bipartite graphs between users and items.
Hafidi, Hakim. "Robust machine learning for Graphs/Networks." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAT004.
This thesis addresses advancements in graph representation learning, focusing on the challenges and opportunities presented by Graph Neural Networks (GNNs). It highlights the significance of graphs in representing complex systems and the necessity of learning node embeddings that capture both node features and graph structure. The study identifies key issues in GNNs, such as their dependence on high-quality labeled data, inconsistent performance across various datasets, and susceptibility to adversarial attacks. To tackle these challenges, the thesis introduces several innovative approaches. Firstly, it employs contrastive learning for node representation, enabling self-supervised learning that reduces reliance on labeled data. Secondly, a Bayesian-based classifier is proposed for node classification, which considers the graph’s structure to enhance accuracy. Lastly, the thesis addresses the vulnerability of GNNs to adversarial attacks by assessing the robustness of the proposed classifier and introducing effective defense mechanisms. These contributions aim to improve both the performance and resilience of GNNs in graph representation learning.
Nguyen, Thanh Tuan. "Représentations efficaces des textures dynamiques." Electronic Thesis or Diss., Toulon, 2020. https://bu.univ-tln.fr/files/userfiles/file/intranet/travuniv/theses/sciences/2020/2020_Nguyen_ThanhTuan.pdf.
Representation of dynamic textures (DTs), known as sequences of moving textures, is a challenge in video analysis for various computer vision applications. This is partly due to the disorientation of motions and to the negative impact of well-known issues on capturing turbulent features: noise, changes of environment, illumination, similarity transformations, etc. In this work, we introduce solutions to deal with the above problems. Accordingly, three streams are proposed for encoding DTs: i) based on dense trajectories extracted from a given video; ii) based on robust responses extracted by moment models; iii) based on filtered outcomes computed by variants of Gaussian-filtering kernels. In parallel, we also propose several discriminative descriptors to capture spatio-temporal features for the above DT encodings. For DT representation based on dense trajectories, we first extract dense trajectories from a given video. Motion points along the paths of the dense trajectories are then encoded by our xLVP operator, an important extension of Local Vector Patterns (LVP) in a completed encoding context, in order to capture directional dense-trajectory-based features for DT representation. For DT description based on moment models, motivated by the moment-image model, we propose a novel model of moment volumes based on statistical information of spherical supporting regions centered at a voxel. These two models are then applied in video analysis to produce moment-based images/volumes. In order to encode the moment-based images, we introduce the CLSP operator, a variant of completed local binary patterns (CLBP). Meanwhile, our xLDP, an important extension of Local Derivative Patterns (LDP) in a completed encoding context, is introduced to capture spatio-temporal features of the moment-volume-based outcomes.
For DT representation based on Gaussian-based filterings, we investigate many kinds of filterings as a pre-processing analysis of a video to produce its filtered outcomes. These outputs are then encoded by discriminative operators to build the corresponding DT descriptors. More concretely, we exploit the Gaussian-based kernel and variants of high-order Gaussian gradients for the filtering analysis. In particular, we introduce a novel filtering kernel (DoDG) based on the difference of Gaussian gradients, which makes it possible to extract robust DoDG-filtered components and construct prominent DoDG-based descriptors of small dimension. In parallel with the Gaussian-based filterings, some novel operators are introduced to meet different contexts of local DT encoding: CAIP, an adaptation of CLBP to fix the close-to-zero problem caused by separately bipolar features; LRP, based on the concept of a square cube of local neighbors sampled at a center voxel; CHILOP, a generalized formulation of CLBP to adequately investigate local relationships of hierarchical supporting regions. Experiments on DT recognition have validated that our proposals perform significantly well in comparison with the state of the art. Some of them achieve performance very close to that of deep-learning approaches, and are expected to be attractive solutions for mobile applications owing to their computational simplicity and the small number of bins in their DT descriptors.
Terreau, Enzo. "Apprentissage de représentations d'auteurs et d'autrices à partir de modèles de langue pour l'analyse des dynamiques d'écriture." Electronic Thesis or Diss., Lyon 2, 2024. http://www.theses.fr/2024LYO20001.
The recent and massive democratization of digital tools has empowered individuals to generate and share information on the web through various means such as blogs, social networks, sharing platforms, and more. The exponential growth of available information, mostly textual data, requires the development of Natural Language Processing (NLP) models to represent it mathematically and subsequently classify, sort, or recommend it. This is the essence of representation learning. It aims to construct a low-dimensional space where the distances between projected objects (words, texts) reflect real-world distances, whether semantic, stylistic, and so on. The proliferation of available data, coupled with the rise in computing power and deep learning, has led to the creation of highly effective language models for word and document embeddings. These models incorporate complex semantic and linguistic concepts while remaining accessible to everyone and easily adaptable to specific tasks or corpora. One can use them to create author embeddings. However, it is challenging to determine the aspects on which a model will focus to bring authors closer together or move them apart. In a literary context, it is preferable for similarities to relate primarily to writing style, which raises several issues. The definition of literary style is vague, and assessing the stylistic difference between two texts and their embeddings is complex. In computational linguistics, approaches aiming to characterize style are mainly statistical, relying on language markers. In light of this, our first contribution is a framework to evaluate the ability of language models to grasp writing style. Beforehand, we review text embedding models in machine learning and deep learning at the word, document, and author levels. We also present the treatment of the notion of literary style in Natural Language Processing, which forms the basis of our method.
Transferring knowledge between black-box large language models and these methods derived from linguistics remains a complex task. Our second contribution aims to reconcile these approaches through a representation learning model focusing on style, VADES (Variational Author and Document Embedding with Style). We compare our model to state-of-the-art ones and analyze their limitations in this context. Finally, we delve into dynamic author and document embeddings. Temporal information is crucial, allowing for a more fine-grained representation of writing dynamics. After presenting the state of the art, we elaborate on our last contribution, B²ADE (Brownian Bridge Author and Document Embedding), which models authors as trajectories. We conclude by outlining several leads for improving our methods and highlighting potential research directions for the future.
Dufumier, Benoit. "Representation learning in neuroimaging : transferring from big healthy data to small clinical cohorts." Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG093.
Psychiatry currently lacks objective quantitative measures to guide the clinician in choosing the proper therapeutic treatment. The physiopathology of mental illnesses such as schizophrenia and bipolar disorder is still poorly understood, but the emergence of large-scale transdiagnostic neuroimaging datasets gives a unique opportunity for studying the neuroanatomical signatures of such diseases. While Deep Learning (DL) models for medical imaging have unlocked unprecedented applications such as image segmentation, their applicability to single-subject prediction problems with neuroanatomical MRI remains limited. In this thesis, we first study the current performance and scaling trend of DL models, for several architectures representative of recent progress in computer vision, as compared to regularized linear models and the kernel Support Vector Machine. We found severe over-fitting on clinical datasets and a scaling trend similar to that of linear models, for the sample sizes currently accessible in clinical research. This over-fitting effect was also due to the bias induced by MRI scanners and acquisition protocols. To tackle the sample size issue, we propose a new method to learn a representation of the brain anatomy of the healthy population on large multi-site cohorts with neural networks, using contrastive learning, an innovative self-supervised framework. When transferring this knowledge to new datasets, we demonstrate an improvement in the classification performance for patients with mental illnesses. We provide a theoretical framework grounding these empirical results, and we show good generalization properties of the model for downstream classification tasks under weaker hypotheses than in the literature. Moreover, as an advancement towards debiased deep models and reproducibility in neuroimaging, we introduce a new large-scale multi-site dataset, OpenBHB, for brain age prediction and site de-biasing, as well as a permanent challenge focused on representation learning.
We offer three pre-processing pipelines to study brain anatomical surface, geometry, and volume in T1 images, as well as a novel way to evaluate the bias in the model's representation.
Mordan, Taylor. "Conception d'architectures profondes pour l'interprétation de données visuelles." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS270.
Nowadays, images are ubiquitous through the use of smartphones and social media. It then becomes necessary to have automatic means of processing them, in order to analyze and interpret the large amount of available data. In this thesis, we are interested in object detection, i.e. the problem of identifying and localizing all objects present in an image. This can be seen as a first step toward a complete visual understanding of scenes. It is tackled with deep convolutional neural networks, under the Deep Learning paradigm. One drawback of this approach is the need for labeled data to learn from. Since precise annotations are time-consuming to produce, bigger datasets can be built with partial labels. We design global pooling functions to work with them and to recover latent information in two cases: learning spatially localized and part-based representations from image- and object-level supervisions respectively. We address the issue of efficiency in end-to-end learning of these representations by leveraging fully convolutional networks. Besides, exploiting additional annotations on available images can be an alternative to having more images, especially in the data-deficient regime. We formalize this problem as a specific kind of multi-task learning with a primary objective to focus on, and design a way to effectively learn from this auxiliary supervision under this framework
Dos, Santos Ludovic. "Representation learning for relational data." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066480/document.
The increasing use of social and sensor networks generates a large quantity of data that can be represented as complex graphs. Many tasks can be imagined on such data, from information analysis to prediction and retrieval, in which the relations between graph nodes should be informative. In this thesis, we proposed different models for three different tasks: graph node classification, relational time series forecasting, and collaborative filtering. All the proposed models use the representation learning framework in its deterministic or Gaussian variant. First, we proposed two algorithms for the heterogeneous graph labeling task, one using deterministic representations and the other Gaussian representations. Contrary to other state-of-the-art models, our solution is able to learn edge weights while simultaneously learning the representations and the classifiers. Second, we proposed an algorithm for relational time series forecasting where the observations are correlated not only inside each series, but also across the different series. We use Gaussian representations in this contribution. This was an opportunity to see in which ways using Gaussian representations instead of deterministic ones was beneficial. Finally, we apply the Gaussian representation learning approach to the collaborative filtering task. This is preliminary work to see whether the properties of Gaussian representations found on the two previous tasks also hold for the ranking task. The goal of this work was then to generalize the approach to more relational data and not only bipartite graphs between users and items
Tran, Khanh-Hung. "Semi-supervised dictionary learning and Semi-supervised deep neural network." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPASP014.
Since the 2010s, machine learning (ML) has been one of the topics that attract a lot of attention from researchers. Many ML models have demonstrated their ability to produce excellent results in various fields such as Computer Vision, Natural Language Processing, and Robotics. However, most of these models use supervised learning, which requires massive annotation. Therefore, the objective of this thesis is to study and propose semi-supervised learning approaches, which have many advantages over supervised learning. Instead of directly applying a semi-supervised classifier to the original representation of the data, we use models that integrate a representation learning stage before the classification stage, to better adapt to the non-linearity of the data. In the first part, we revisit the tools that allow us to build our semi-supervised models. First, we present two types of model that include representation learning in their architecture: dictionary learning and neural networks, as well as the optimization methods for each type of model. Moreover, in the case of neural networks, we describe the problem of adversarial examples. Then, we present the techniques that often accompany semi-supervised learning, such as manifold learning and pseudo-labeling. In the second part, we work on dictionary learning. We outline three general steps for building a semi-supervised model from a supervised model. Then, we propose our semi-supervised model to deal with the classification problem, typically in the case of a low number of training samples (including both labelled and unlabelled samples). On the one hand, we preserve the data structure from the original space in the sparse code space (manifold learning), which acts as a regularization for the sparse codes. On the other hand, we integrate a semi-supervised classifier in the sparse code space.
In addition, we perform sparse coding for test samples while also taking into account the preservation of the data structure. This method improves the accuracy rate compared to other existing methods. In the third part, we work on neural network models. We propose an approach called "manifold attack", which reinforces manifold learning. This approach is inspired by adversarial learning: virtual points that disrupt the manifold learning cost function are found (by maximizing it) while the model parameters are fixed; the model parameters are then updated by minimizing this cost function while these virtual points are fixed. We also provide criteria for limiting the space to which the virtual points belong and a method for initializing them. This approach provides not only an improvement in the accuracy rate but also significant robustness to adversarial examples. Finally, we analyze the similarities and differences, as well as the advantages and disadvantages, of dictionary learning and neural network models. We propose some perspectives on both types of models. In the case of semi-supervised dictionary learning, we propose some techniques inspired by the neural network models. As for neural networks, we propose to integrate manifold attack into generative models
Hay, Julien. "Apprentissage de la représentation du style écrit, application à la recommandation d’articles d’actualité." Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG010.
User modeling is an essential step when it comes to recommending products and offering services automatically. Social networks are a rich and abundant resource of user data (e.g. shared links, posted messages) that makes it possible to model users' interests and preferences. In this thesis, we propose to exploit news articles shared on social networks in order to enrich existing models with a new textual feature: the writing style. This thesis, at the intersection of the fields of natural language processing and recommender systems, focuses on the representation learning of writing style and its application to news recommendation. As a first step, we propose a new representation learning method that aims to project any document into a reference stylometric space. The hypothesis being tested is that such a space can be generalized by a sufficiently large set of reference authors, and that the vector projections of the writings of a "new" author will be stylistically close to the writings of a consistent subset of these reference authors. In a second step, we propose to exploit the stylometric representation for news recommendation by combining it with other representations (e.g. topical, lexical, semantic). We seek to identify the most relevant and complementary characteristics that can yield more relevant and higher-quality article recommendations. The hypothesis that motivated this work is that the reading choices of individuals are not only influenced by the content (e.g. the theme of news articles, the entities mentioned), but also by the form (i.e. the style that can, for example, be descriptive, satirical, composed of personal anecdotes, interviews). The experiments conducted show that not only does writing style play a role in individuals' reading preferences, but also that, when combined with other textual features, it increases the accuracy and quality of recommendations in terms of diversity, novelty and serendipity
Barbano, Carlo Alberto Maria. "Collateral-Free Learning of Deep Representations : From Natural Images to Biomedical Applications." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAT038.
Deep Learning (DL) has become one of the predominant tools for solving a variety of tasks, often with superior performance compared to previous state-of-the-art methods. DL models are often able to learn meaningful and abstract representations of the underlying data. However, it has been shown that they might also learn additional features, which are not necessarily relevant or required for the desired task. This could pose a number of issues, as this additional information can contain bias, noise, or sensitive information that should not be taken into account by the model (e.g. gender, race, age, etc.). We refer to this information as collateral. The presence of collateral information translates into practical issues when deploying DL-based pipelines, especially if they involve private users' data. Learning robust representations that are free of collateral information can be highly relevant for a variety of fields and applications, like medical applications and decision support systems. In this thesis, we introduce the concept of Collateral Learning, which refers to all those instances in which a model learns more information than intended. The aim of Collateral Learning is to bridge the gap between different fields in DL, such as robustness, debiasing, generalization in medical imaging, and privacy preservation. We propose different methods for achieving robust representations free of collateral information. Some of our contributions are based on regularization techniques, while others are represented by novel loss functions. In the first part of the thesis, we lay the foundations of our work, by developing techniques for robust representation learning on natural images. We focus on one of the most important instances of Collateral Learning, namely biased data. Specifically, we focus on Contrastive Learning (CL), and we propose a unified metric learning framework that allows us to both easily analyze existing loss functions, and derive novel ones.
Here, we propose a novel supervised contrastive loss function, ε-SupInfoNCE, and two debiasing regularization techniques, EnD and FairKL, that achieve state-of-the-art performance on a number of standard vision classification and debiasing benchmarks. In the second part of the thesis, we focus on Collateral Learning in medical imaging, specifically on neuroimaging and chest X-ray images. For neuroimaging, we present a novel contrastive learning approach for brain age estimation. Our approach achieves state-of-the-art results on the OpenBHB dataset for age regression and shows increased robustness to the site effect. We also leverage this method to detect unhealthy brain aging patterns, showing promising results in the classification of brain conditions such as Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD). For chest X-ray images (CXR), we target Covid-19 classification, showing how Collateral Learning can effectively hinder the reliability of such models. To tackle this issue, we propose a transfer learning approach that, combined with our regularization techniques, shows promising results on an original multi-site CXR dataset. Finally, we provide some hints about Collateral Learning and privacy preservation in DL models. We show that some of our proposed methods can be effective in preventing certain information from being learned by the model, thus avoiding potential data leakage
Sadok, Samir. "Audiovisual speech representation learning applied to emotion recognition." Electronic Thesis or Diss., CentraleSupélec, 2024. http://www.theses.fr/2024CSUP0003.
Emotions are vital in our daily lives, becoming a primary focus of ongoing research. Automatic emotion recognition has gained considerable attention owing to its wide-ranging applications across sectors such as healthcare, education, entertainment, and marketing. This advancement in emotion recognition is pivotal for fostering the development of human-centric artificial intelligence. Supervised emotion recognition systems have significantly improved over traditional machine learning approaches. However, this progress encounters limitations due to the complexity and ambiguous nature of emotions. Acquiring extensive emotionally labeled datasets is costly, time-intensive, and often impractical. Moreover, the subjective nature of emotions results in biased datasets, impacting the learning models' applicability in real-world scenarios. Motivated by how humans learn and conceptualize complex representations from an early age with minimal supervision, this approach demonstrates the effectiveness of leveraging prior experience to adapt to new situations. Unsupervised or self-supervised learning models draw inspiration from this paradigm. Initially, they aim to establish a general representation learning from unlabeled data, akin to the foundational prior experience in human learning. These representations should adhere to criteria like invariance, interpretability, and effectiveness. Subsequently, these learned representations are applied to downstream tasks with limited labeled data, such as emotion recognition. This mirrors the assimilation of new situations in human learning. In this thesis, we aim to propose unsupervised and self-supervised representation learning methods designed explicitly for multimodal and sequential data and to explore their potential advantages in the context of emotion recognition tasks. The main contributions of this thesis encompass: 1.
Developing generative models via unsupervised or self-supervised learning for audiovisual speech representation learning, incorporating joint temporal and multimodal (audiovisual) modeling. 2. Structuring the latent space to enable disentangled representations, enhancing interpretability by controlling human-interpretable latent factors. 3. Validating the effectiveness of our approaches through both qualitative and quantitative analyses, in particular on the emotion recognition task. Our methods facilitate signal analysis, transformation, and generation
Bordes, Patrick. "Deep Multimodal Learning for Joint Textual and Visual Reasoning." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS370.
In the last decade, the evolution of Deep Learning techniques to learn meaningful data representations for text and images, combined with a significant increase in multimodal data, mainly from social networks and e-commerce websites, has triggered a growing interest in the research community in the joint understanding of language and vision. The challenge at the heart of Multimodal Machine Learning is the intrinsic difference in semantics between language and vision: while vision faithfully represents reality and conveys low-level semantics, language is a human construction carrying high-level reasoning. On the one hand, language can enhance the performance of vision models. The underlying hypothesis is that textual representations contain visual information. We apply this principle to two Zero-Shot Learning (ZSL) tasks. In the first contribution on ZSL, we extend a common assumption in ZSL, which states that textual representations encode information about the visual appearance of objects, by showing that they also encode information about their visual surroundings and their real-world frequency. In a second contribution, we consider the transductive setting in ZSL. We propose a solution to the limitations of current transductive approaches, which assume that the visual space is well-clustered, an assumption that does not hold when the number of unknown classes is high. On the other hand, vision can expand the capacities of language models. We demonstrate this by tackling Visual Question Generation (VQG), which extends the standard Question Generation task by using an image as complementary input, through visual representations derived from Computer Vision
Banville, Hubert. "Enabling real-world EEG applications with deep learning." Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG005.
Our understanding of the brain has improved considerably in the last decades, thanks to groundbreaking advances in the field of neuroimaging. Now, with the invention and wider availability of personal wearable neuroimaging devices, such as low-cost mobile EEG, we have entered an era in which neuroimaging is no longer constrained to traditional research labs or clinics. "Real-world" EEG comes with its own set of challenges, though, ranging from a scarcity of labelled data to unpredictable signal quality and limited spatial resolution. In this thesis, we draw on the field of deep learning to help transform this century-old brain imaging modality from a purely clinical- and research-focused tool, to a practical technology that can benefit individuals in their day-to-day life. First, we study how unlabelled EEG data can be utilized to gain insights and improve performance on common clinical learning tasks using self-supervised learning. We present three such self-supervised approaches that rely on the temporal structure of the data itself, rather than onerously collected labels, to learn clinically-relevant representations. Through experiments on large-scale datasets of sleep and neurological screening recordings, we demonstrate the significance of the learned representations, and show how unlabelled data can help boost performance in a semi-supervised scenario. Next, we explore ways to ensure neural networks are robust to the strong sources of noise often found in out-of-the-lab EEG recordings. Specifically, we present Dynamic Spatial Filtering, an attention mechanism module that allows a network to dynamically focus its processing on the most informative EEG channels while de-emphasizing any corrupted ones.
Experiments on large-scale datasets and real-world data demonstrate that, on sparse EEG, the proposed attention block handles strong corruption better than an automated noise handling approach, and that the predicted attention maps can be interpreted to inspect the functioning of the neural network. Finally, we investigate how weak labels can be used to develop a biomarker of neurophysiological health from real-world EEG. We translate the brain age framework, originally developed using lab and clinic-based magnetic resonance imaging, to real-world EEG data. Using recordings from more than a thousand individuals performing a focused attention exercise or sleeping overnight, we show not only that age can be predicted from wearable EEG, but also that age predictions encode information contained in well-known brain health biomarkers, but not in chronological age. Overall, this thesis brings us a step closer to harnessing EEG for neurophysiological monitoring outside of traditional research and clinical contexts, and opens the door to new and more flexible applications of this technology
Cherti, Mehdi. "Deep generative neural networks for novelty generation : a foundational framework, metrics and experiments." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS029/document.
In recent years, significant advances made in deep neural networks enabled the creation of groundbreaking technologies such as self-driving cars and voice-enabled personal assistants. Almost all successes of deep neural networks are about prediction, whereas the initial breakthroughs came from generative models. Today, although we have very powerful deep generative modeling techniques, these techniques are essentially being used for prediction or for generating known objects (i.e., good quality images of known classes): any generated object that is a priori unknown is considered as a failure mode (Salimans et al., 2016) or as spurious (Bengio et al., 2013b). In other words, when prediction seems to be the only possible objective, novelty is seen as an error that researchers have been trying hard to eliminate. This thesis defends the point of view that, instead of trying to eliminate these novelties, we should study them and the generative potential of deep nets to create useful novelty, especially given the economic and societal importance of creating new objects in contemporary societies. The thesis sets out to study novelty generation in relationship with data-driven knowledge models produced by deep generative neural networks. Our first key contribution is the clarification of the importance of representations and their impact on the kind of novelties that can be generated: a key consequence is that a creative agent might need to rerepresent known objects to access various kinds of novelty. We then demonstrate that traditional objective functions of statistical learning theory, such as maximum likelihood, are not necessarily the best theoretical framework for studying novelty generation. We propose several other alternatives at the conceptual level. A second key result is the confirmation that current models, with traditional objective functions, can indeed generate unknown objects. 
This also shows that even though objectives like maximum likelihood are designed to eliminate novelty, practical implementations do generate novelty. Through a series of experiments, we study the behavior of these models and the novelty they generate. In particular, we propose a new task setup and metrics for selecting good generative models. Finally, the thesis concludes with a series of experiments clarifying the characteristics of models that can exhibit novelty. Experiments show that sparsity, noise level, and restricting the capacity of the net eliminate novelty and that models that are better at recognizing novelty are also good at generating novelty
Sourty, Raphael. "Apprentissage de représentation de graphes de connaissances et enrichissement de modèles de langue pré-entraînés par les graphes de connaissances : approches basées sur les modèles de distillation." Electronic Thesis or Diss., Toulouse 3, 2023. http://www.theses.fr/2023TOU30337.
Natural language processing (NLP) is a rapidly growing field focusing on developing algorithms and systems to understand and manipulate natural language data. The ability to effectively process and analyze natural language data has become increasingly important in recent years as the volume of textual data generated by individuals, organizations, and society as a whole continues to grow significantly. One of the main challenges in NLP is the ability to represent and process knowledge about the world. Knowledge graphs are structures that encode information about entities and the relationships between them. They are a powerful tool for representing knowledge in a structured and formalized way, and they provide a holistic understanding of the underlying concepts and their relationships. The ability to learn knowledge graph representations has the potential to transform NLP and other domains that rely on large amounts of structured data. The work conducted in this thesis aims to explore the concept of knowledge distillation and, more specifically, mutual learning for learning distinct and complementary space representations. Our first contribution is a new framework, called KD-MKB, for learning entities and relations on multiple knowledge bases (KBs). The key objective of multi-graph representation learning is to empower the entity and relation models with different graph contexts that potentially bridge distinct semantic contexts. Our approach is based on the theoretical framework of knowledge distillation and mutual learning. It allows for efficient knowledge transfer between KBs while preserving the relational structure of each knowledge graph. We formalize entity and relation inference between KBs as a distillation loss over posterior probability distributions on aligned knowledge.
Grounded on this finding, we propose and formalize a cooperative distillation framework where a set of KB models are jointly learned by using hard labels from their own context and soft labels provided by peers. Our second contribution is a method for incorporating rich entity information from knowledge bases into pre-trained language models (PLM). We propose an original cooperative knowledge distillation framework to align the masked language modeling pre-training task of language models and the link prediction objective of KB embedding models. By leveraging the information encoded in knowledge bases, our proposed approach provides a new direction to improve the ability of PLM-based slot-filling systems to handle entities
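The combination of hard labels and peer-provided soft labels described in this abstract can be sketched as follows. This is a generic mutual-distillation objective for one model and one peer, not the KD-MKB formulation itself. The names `mutual_distillation_loss` and `alpha`, and the choice of a KL term weighted against cross-entropy, are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Hard-label term: negative log-probability of the true class."""
    return -math.log(probs[target_index])


def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mutual_distillation_loss(own_logits, peer_logits, target_index, alpha=0.5):
    """Hard-label loss plus a soft-label term pulling the model toward its peer.

    alpha is a hypothetical weighting between the two terms.
    """
    own = softmax(own_logits)
    peer = softmax(peer_logits)
    return cross_entropy(own, target_index) + alpha * kl_divergence(peer, own)
```

In a cooperative setting, each model would compute this loss with the other models acting as peers, so that knowledge flows in both directions.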
Tuo, Aboubacar. "Extraction d'événements à partir de peu d'exemples par méta-apprentissage." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG098.
Information Extraction (IE) is a research field with the objective of automatically identifying and extracting structured information within a given domain from unstructured or minimally structured text data. The implementation of such extractions often requires significant human effort, either in the form of rule development or the creation of annotated data for systems based on machine learning. One of the current challenges in information extraction is to develop methods that minimize the costs and development time of these systems whenever possible. This thesis focuses on few-shot event extraction through a meta-learning approach that aims to train IE models from only a few examples. We have redefined the task of event extraction from this perspective, aiming to develop systems capable of quickly adapting to new contexts with a small volume of training data. First, we propose methods to enhance event trigger detection by developing more robust representations for this task. Then, we tackle the specific challenge raised by the "NULL" class (absence of events) within this framework. Finally, we evaluate the effectiveness of our proposals within the broader context of event extraction by extending their application to the extraction of event arguments
Carvalho, Micael. "Deep representation spaces." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS292.
In recent years, Deep Learning techniques have swept the state-of-the-art of many applications of Machine Learning, becoming the new standard approach for them. The architectures issued from these techniques have been used for transfer learning, which extended the power of deep models to tasks that did not have enough data to fully train them from scratch. This thesis' subject of study is the representation spaces created by deep architectures. First, we study properties inherent to them, with particular interest in dimensionality redundancy and precision of their features. Our findings reveal a strong degree of robustness, pointing the path to simple and powerful compression schemes. Then, we focus on refining these representations. We choose to adopt a cross-modal multi-task problem, and design a loss function capable of taking advantage of data coming from multiple modalities, while also taking into account different tasks associated with the same dataset. In order to correctly balance these losses, we also develop a new sampling scheme that only takes into account examples contributing to the learning phase, i.e. those having a positive loss. Finally, we test our approach on a large-scale dataset of cooking recipes and associated pictures. Our method achieves a 5-fold improvement over the state-of-the-art, and we show that the multi-task aspect of our approach promotes a semantically meaningful organization of the representation space, allowing it to perform subtasks never seen during training, like ingredient exclusion and selection. The results we present in this thesis open many possibilities, including feature compression for remote applications, robust multi-modal and multi-task learning, and feature space refinement. For the cooking application, in particular, many of our findings are directly applicable in a real-world context, especially for the detection of allergens, finding alternative recipes due to dietary restrictions, and menu planning
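The idea of only counting examples with a positive loss can be illustrated with a margin-based triplet loss. The sketch below is an assumption-laden illustration rather than the thesis's implementation: the triplet formulation, the function names, and the margin value are hypothetical, since the abstract does not specify the loss.

```python
def triplet_losses(pos_dists, neg_dists, margin=0.3):
    """Hinge triplet loss per example: max(0, margin + d(a, p) - d(a, n))."""
    return [max(0.0, margin + dp - dn) for dp, dn in zip(pos_dists, neg_dists)]


def mean_over_all(losses):
    """Plain average: triplets that are already satisfied dilute the signal."""
    return sum(losses) / len(losses) if losses else 0.0


def mean_over_active(losses):
    """Average only over examples with a positive loss (the 'active' ones)."""
    active = [value for value in losses if value > 0.0]
    return sum(active) / len(active) if active else 0.0
```

With anchor-positive distances `[0.2, 0.9, 0.1]` and anchor-negative distances `[1.0, 0.8, 0.95]`, only the second triplet violates the margin, so the active-only average stays near 0.4 while the plain average drops to about 0.13.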
Ran, Peipei. "Imaging and diagnostic of sub-wavelength micro-structures, from closed-form algorithms to deep learning." Electronic Thesis or Diss., université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG061.
Electromagnetic probing of a gridlike, finite set of infinitely long circular cylindrical dielectric rods affected by missing ones is investigated from time-harmonic single- and multiple-frequency data. Sub-wavelength distances between adjacent rods and sub-wavelength rod diameters are assumed throughout the frequency band of operation, which poses a severe challenge due to the need for super-resolution within the present micro-structure, well beyond the Rayleigh criterion. A wealth of solution methods is investigated, and comprehensive numerical simulations illustrate their pros and cons, complemented by the processing of laboratory-controlled experimental data acquired on a micro-structure prototype in a microwave anechoic chamber. These methods, which differ in the a priori information accounted for and in their consequent versatility, include time-reversal, binary-specialized contrast-source and sparsity-constrained inversions, and convolutional neural networks possibly combined with recurrent ones
Ben-Younes, Hedi. "Multi-modal representation learning towards visual reasoning." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS173.
The quantity of images that populate the Internet is dramatically increasing. It becomes of critical importance to develop the technology for a precise and automatic understanding of visual contents. As image recognition systems are becoming more and more relevant, researchers in artificial intelligence now seek the next generation of vision systems that can perform high-level scene understanding. In this thesis, we are interested in Visual Question Answering (VQA), which consists of building models that answer any natural language question about any image. Because of its nature and complexity, VQA is often considered as a proxy for visual reasoning. Classically, VQA architectures are designed as trainable systems that are provided with images, questions about them and their answers. To tackle this problem, typical approaches involve modern Deep Learning (DL) techniques. In the first part, we focus on developing multi-modal fusion strategies to model the interactions between image and question representations. More specifically, we explore bilinear fusion models and exploit concepts from tensor analysis to provide tractable and expressive factorizations of parameters. These fusion mechanisms are studied under the widely used visual attention framework: the answer to the question is provided by focusing only on the relevant image regions. In the last part, we move away from the attention mechanism and build a more advanced scene understanding architecture where we consider objects and their spatial and semantic relations. All models are thoroughly experimentally evaluated on standard datasets and the results are competitive with the literature
Rohé, Marc-Michel. "Représentation réduite de la segmentation et du suivi des images cardiaques pour l’analyse longitudinale de groupe." Thesis, Université Côte d'Azur (ComUE), 2017. http://www.theses.fr/2017AZUR4051/document.
This thesis presents image-based methods for the analysis of cardiac motion to enable group-wise statistics, automatic diagnosis and longitudinal study. This is achieved by combining advanced medical image processing with machine learning methods and statistical modelling. The first axis of this work is to define an automatic method for the segmentation of the myocardium. We develop a very fast registration method based on convolutional neural networks that is trained to learn inter-subject heart registration. Then, we embed this registration method into a multi-atlas segmentation pipeline. The second axis of this work is focused on the improvement of cardiac motion tracking methods in order to define relevant low-dimensional representations. Two different methods are developed, one relying on Barycentric Subspaces built on reference frames of the sequence, and another based on a reduced order representation of the motion from polyaffine transformations. Finally, in the last axis, we apply the previously defined representation to the problem of diagnosis and longitudinal analysis. We show that these representations encode relevant features allowing the diagnosis of infarcted patients and Tetralogy of Fallot versus controls and the analysis of the evolution through time of the cardiac motion of patients with either cardiomyopathies or obesity. These three axes form an end-to-end framework for the study of cardiac motion starting from the acquisition of the medical images to their automatic analysis. Such a framework could be used for diagnosis and therapy planning in order to improve clinical decision making with a more personalised computer-aided medicine
Coria, Juan Manuel. "Continual Representation Learning in Written and Spoken Language." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG025.
Although machine learning has recently witnessed major breakthroughs, today's models are mostly trained once on a target task and then deployed, rarely (if ever) revisiting their parameters. This problem affects performance after deployment, as task specifications and data may evolve with user needs and distribution shifts. To solve this, continual learning proposes to train models over time as new data becomes available. However, models trained in this way suffer from significant performance loss on previously seen examples, a phenomenon called catastrophic forgetting. Although many studies have proposed different strategies to prevent forgetting, they often rely on labeled data, which is rarely available in practice. In this thesis, we study continual learning for written and spoken language. Our main goal is to design autonomous and self-learning systems able to leverage scarce on-the-job data to adapt to the new environments they are deployed in. Contrary to recent work on learning general-purpose representations (or embeddings), we propose to leverage representations that are tailored to a downstream task. We believe the latter may be easier to interpret and exploit by unsupervised training algorithms like clustering, which are less prone to forgetting. Throughout our work, we improve our understanding of continual learning in a variety of settings, such as the adaptation of a language model to new languages for sequence labeling tasks, or even the adaptation to a live conversation in the context of speaker diarization. We show that task-specific representations allow for effective low-resource continual learning, and that a model's own predictions can be exploited for full self-learning.
Merckling, Astrid. "Unsupervised pretraining of state representations in a rewardless environment." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS141.
This thesis seeks to extend the capabilities of state representation learning (SRL) to help scale deep reinforcement learning (DRL) algorithms to continuous control tasks with high-dimensional sensory observations (such as images). SRL improves the performance of DRL by providing it with better inputs than the input embeddings learned from scratch with end-to-end strategies. Specifically, this thesis addresses the problem of performing state estimation in the manner of deep unsupervised pretraining of state representations without reward. These representations must satisfy certain properties to allow for the correct application of bootstrapping and other decision making mechanisms common to supervised learning, such as being low-dimensional and guaranteeing the local consistency and topology (or connectivity) of the environment, which we seek to achieve through the models pretrained with the two SRL algorithms proposed in this thesis.
Prang, Mathieu. "Representation learning for symbolic music." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS489.
A key part of the recent success of deep language processing models lies in the ability to learn efficient word embeddings. These methods provide structured spaces of reduced dimensionality with interesting metric relationship properties. These, in turn, can be used as efficient input representations for handling more complex tasks. In this thesis, we focus on the task of learning embedding spaces for polyphonic music in the symbolic domain. To do so, we explore two different approaches. We introduce an embedding model based on a convolutional network with a novel type of self-modulated hierarchical attention, which is computed at each layer to obtain a hierarchical vision of musical information. Then, we propose another system based on VAEs, a type of auto-encoder that constrains the data distribution of the latent space to be close to a prior distribution. As polyphonic music information is very complex, the design of the input representation is a crucial process. Hence, we introduce a novel representation of symbolic music data, which transforms a polyphonic score into a continuous signal. Finally, we show the potential of the resulting embedding spaces through the development of several creative applications used to enhance musical knowledge and expression, through tasks such as melody modification or composer identification.
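The constraint that keeps a VAE's latent distribution close to a prior is usually expressed as a Kullback-Leibler term. The sketch below assumes the common choice of a diagonal Gaussian encoder and a standard normal prior, which the abstract does not specify; the function name is hypothetical. It shows the closed form KL(N(mu, sigma^2) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1).

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL divergence from a diagonal Gaussian N(mu, exp(log_var)) to N(0, I).

    mu and log_var are equal-length lists, one entry per latent dimension.
    Assumption: Gaussian encoder and standard normal prior.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


# The penalty vanishes when the encoder output already matches the prior...
kl_match = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# ...and grows as the mean moves away from zero.
kl_shifted = kl_to_standard_normal([1.0], [0.0])
```

In training, a term of this kind is added to the reconstruction loss, so the encoder trades reconstruction accuracy against staying near the prior.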
Engilberge, Martin. "Deep Inside Visual-Semantic Embeddings." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS150.
Nowadays Artificial Intelligence (AI) is omnipresent in our society. The recent development of learning methods based on deep neural networks, also called "Deep Learning", has led to a significant improvement in visual and textual representation models. In this thesis, we aim to further advance image representation and understanding. Revolving around Visual Semantic Embedding (VSE) approaches, we explore different directions. We present relevant background covering image and textual representations and existing multimodal approaches. We propose novel architectures that further improve the retrieval capability of VSE, we extend VSE models to novel applications, and we leverage embedding models to visually ground semantic concepts. Finally, we delve into the learning process, and in particular the loss function, by learning differentiable approximations of ranking-based metrics.
Gainon, de Forsan de Gabriac Clara. "Deep Natural Language Processing for User Representation." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS274.
The last decade has witnessed the impressive expansion of Deep Learning (DL) methods, both in academic research and the private sector. This success can be explained by the ability of DL to model ever more complex entities. In particular, Representation Learning methods focus on building latent representations from heterogeneous data that are versatile and re-usable, notably in Natural Language Processing (NLP). In parallel, the ever-growing number of systems relying on user data brings its own set of challenges. This work proposes methods to leverage the representation power of NLP in order to learn rich and versatile user representations. Firstly, we detail the works and domains associated with this thesis. We study Recommendation. We then go over recent NLP advances and how they can be applied to leverage user-generated texts, before detailing Generative models. Secondly, we present a Recommender System (RS) that is based on the combination of a traditional Matrix Factorization (MF) representation method and a sentiment analysis model. The association of those modules forms a dual model that is trained on user reviews for rating prediction. Experiments show that, on top of improving performance, the model allows us to better understand what the user is really interested in for a given item, as well as to provide explanations for the suggestions made. Finally, we introduce a new task centered on UR: Professional Profile Learning. We thus propose an NLP-based framework to learn and evaluate professional profiles on different tasks, including next job generation.
Vialatte, Jean-Charles. "Convolution et apprentissage profond sur graphes." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2018. http://www.theses.fr/2018IMTA0118/document.
Convolutional neural networks have proven to be the deep learning model that performs best on regularly structured datasets like images or sounds. However, they cannot be applied to datasets with an irregular structure (e.g. sensor networks, citation networks, MRIs). In this thesis, we develop an algebraic theory of convolutions on irregular domains. We construct a family of convolutions that are based on group actions (or, more generally, groupoid actions) that act on the vertex domain and that have properties that depend on the edges. With the help of these convolutions, we propose extensions of convolutional neural networks to graph domains. Our research leads us to propose a generic formulation of the propagation between layers, which we call the neural contraction. From this formulation, we derive many novel neural network models that can be applied to irregular domains. Through benchmarks and experiments, we show that they match state-of-the-art performance and exceed it in some cases.
Katranji, Mehdi. "Apprentissage profond de la mobilité des personnes." Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCA024.
Knowledge of mobility is a major challenge for mobility-organising authorities and for urban planning. Due to the lack of a formal definition of human mobility, the term "people's mobility" is used in this work. The topic is introduced by a description of the ecosystem, considering its actors and applications. The creation of a learning model has prerequisites: an understanding of the typologies of the available data sets, and of their strengths and weaknesses. This state of the art in mobility knowledge is based on the four-step model that has existed and been used since 1970, and ends with the renewal of methodologies in recent years. Our models of people's mobility are then presented. Their common point is the emphasis on the individual, unlike traditional approaches that take the locality as a reference. The models we propose are based on the fact that individuals make decisions according to their perception of the environment. This work ends with a study of deep learning methods based on restricted Boltzmann machines. After a state of the art of this family of models, we look for strategies to make these models viable in applications. This last chapter is our main theoretical contribution, improving the robustness and performance of these models.
Deschaintre, Valentin. "Acquisition légère de matériaux par apprentissage profond." Thesis, Université Côte d'Azur (ComUE), 2019. http://theses.univ-cotedazur.fr/2019AZUR4078.
Whether it is used for entertainment or industrial design, computer graphics is ever more present in our everyday life. Yet, reproducing a real scene appearance in a virtual environment remains a challenging task, requiring long hours from trained artists. A good solution is the acquisition of geometries and materials directly from real world examples, but this often comes at the cost of complex hardware and calibration processes. In this thesis, we focus on lightweight material appearance capture to simplify and accelerate the acquisition process and solve industrial challenges such as result image resolution or calibration. Texture, highlights, and shading are some of many visual cues that allow humans to perceive material appearance in pictures. Designing algorithms able to leverage these cues to recover spatially-varying bi-directional reflectance distribution functions (SVBRDFs) from a few images has challenged computer graphics researchers for decades. We explore the use of deep learning to tackle lightweight appearance capture and make sense of these visual cues. Once trained, our networks are capable of recovering per-pixel normals, diffuse albedo, specular albedo and specular roughness from as few as one picture of a flat surface lit by the environment or a hand-held flash. We show how our method improves its prediction with the number of input pictures to reach high quality reconstructions with up to 10 images (a sweet spot between existing single-image and complex multi-image approaches) and allows the capture of large scale, HD materials. We achieve this goal by introducing several innovations on training data acquisition and network design, bringing clear improvement over the state of the art for lightweight material capture.
Paumard, Marie-Morgane. "Résolution automatique de puzzles par apprentissage profond." Thesis, CY Cergy Paris Université, 2020. http://www.theses.fr/2020CYUN1067.
The objective of this thesis is to develop semantic methods of reassembly in the complicated framework of heritage collections, where some blocks are eroded or missing. The reassembly of archaeological remains is an important task for heritage sciences: it improves the understanding and conservation of ancient vestiges and artifacts. However, some sets of fragments cannot be reassembled with techniques using contour information or visual continuities. It is then necessary to extract semantic information from the fragments and to interpret them. These tasks can be performed automatically thanks to deep learning techniques coupled with a solver, i.e., a constrained decision making algorithm. This thesis proposes two semantic reassembly methods for 2D fragments with erosion and a new dataset and evaluation metrics. The first method, Deepzzle, proposes a neural network followed by a solver. The neural network is composed of two Siamese convolutional networks trained to predict the relative position of two fragments: it is a 9-class classification. The solver uses Dijkstra's algorithm to maximize the joint probability. Deepzzle can address the case of missing and supernumerary fragments, is capable of processing about 15 fragments per puzzle, and has a performance that is 25% better than the state of the art. The second method, Alphazzle, is based on AlphaZero and single-player Monte Carlo Tree Search (MCTS). It is an iterative method that uses deep reinforcement learning: at each step, a fragment is placed on the current reassembly. Two neural networks guide MCTS: an action predictor, which uses the fragment and the current reassembly to propose a strategy, and an evaluator, which is trained to predict the quality of the future result from the current reassembly. Alphazzle takes into account the relationships between all fragments and adapts to puzzles larger than those solved by Deepzzle.
Moreover, Alphazzle is compatible with constraints imposed by a heritage framework: at the end of reassembly, MCTS does not access the reward, unlike AlphaZero. Indeed, the reward, which indicates whether a puzzle is well solved or not, can only be estimated by the algorithm, because only a conservator can be sure of the quality of a reassembly.
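The step of maximizing a joint probability over fragment placements can be illustrated with a small sketch. It relies on one standard fact: maximizing a product of probabilities is the same as minimizing the sum of their negative logarithms, which is what lets a shortest-path solver be used. The code below is not the Deepzzle solver; it swaps in an exhaustive search over permutations, which is only feasible for a handful of fragments, and the probability table is made up for illustration.

```python
import math
from itertools import permutations


def best_assignment(prob):
    """Return (assignment, joint_probability) maximizing the product of
    prob[f][assignment[f]], with each position used at most once.

    prob[f][p] is a hypothetical classifier output: the probability that
    fragment f belongs at position p. Exhaustive search stands in for the
    graph-based solver described in the abstract.
    """
    n = len(prob)
    best_cost, best_perm = math.inf, None
    for perm in permutations(range(n)):
        if any(prob[f][perm[f]] <= 0.0 for f in range(n)):
            continue  # an impossible placement rules out the whole assignment
        # Sum of negative log-probabilities: lower cost = higher joint probability.
        cost = sum(-math.log(prob[f][perm[f]]) for f in range(n))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_perm, math.exp(-best_cost)


# Both fragments individually prefer position 0, but only one can take it.
table = [[0.7, 0.3],
         [0.6, 0.4]]
assignment, joint = best_assignment(table)
```

The toy table shows why a joint criterion matters: picking the most likely position for each fragment independently would place both at position 0, whereas the joint search resolves the conflict.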
Haykal, Vanessa. "Modélisation des séries temporelles par apprentissage profond." Thesis, Tours, 2019. http://www.theses.fr/2019TOUR4019.
Time series prediction is a problem that has been addressed for many years. In this thesis, we have been interested in methods resulting from deep learning. It is well known that if the relationships between the data are temporal, it is difficult to analyze and predict accurately due to non-linear trends and the existence of noise, specifically in financial and electrical series. In this context, we propose a new hybrid noise reduction architecture that models the recursive error series to improve predictions. The learning process simultaneously fuses a convolutional neural network (CNN) and a recurrent long short-term memory network (LSTM). This model is distinguished by its ability to capture globally a variety of hybrid properties: it is able to extract local signal features, to learn long-term and non-linear dependencies, and to have a high noise resistance. The second contribution concerns the limitations of global approaches caused by dynamic switching regimes in the signal. We present a local unsupervised modification of our previous architecture in order to adjust the results by adapting the Hidden Markov Model (HMM). Finally, we were also interested in multi-resolution techniques to improve the performance of the convolutional layers, notably by using the variational mode decomposition method (VMD).
Sors, Arnaud. "Apprentissage profond pour l'analyse de l'EEG continu." Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAS006/document.
The objective of this research is to explore and develop machine learning methods for the analysis of continuous electroencephalogram (EEG). Continuous EEG is an interesting modality for functional evaluation of cerebral state in the intensive care unit and beyond. Today its clinical use remains more limited than it could be because interpretation is still mostly performed visually by trained experts. In this work we develop automated analysis tools based on deep neural models. The subparts of this work hinge around post-anoxic coma prognostication, chosen as pilot application. A small number of long-duration records were performed and available existing data was gathered from CHU Grenoble. Different components of a semi-supervised architecture that addresses the application are imagined, developed, and validated on surrogate tasks. First, we validate the effectiveness of deep neural networks for EEG analysis from raw samples. For this we choose the supervised task of sleep stage classification from single-channel EEG. We use a convolutional neural network adapted for EEG and we train and evaluate the system on the SHHS (Sleep Heart Health Study) dataset. This constitutes the first neural sleep scoring system at this scale (5000 patients). Classification performance reaches or surpasses the state of the art. In real use for most clinical applications, the main challenge is the lack of (and difficulty of establishing) suitable annotations on patterns or short EEG segments. Available annotations are high-level (for example, clinical outcome) and therefore they are few. We investigate how to learn compact EEG representations in an unsupervised/semi-supervised manner. The field of unsupervised learning using deep neural networks is still young. To compare to existing work we start with image data and investigate the use of generative adversarial networks (GANs) for unsupervised adversarial representation learning. The quality and stability of different variants are evaluated.
We then apply Gradient-penalized Wasserstein GANs to EEG sequence generation. The system is trained on single-channel sequences from post-anoxic coma patients and is able to generate realistic synthetic sequences. We also explore and discuss original ideas for learning representations through matching distributions in the output space of representative networks. Finally, multichannel EEG signals have specificities that should be accounted for in characterization architectures. Each EEG sample is an instantaneous mixture of the activities of a number of sources. Based on this statement we propose an analysis system made of a spatial analysis subsystem followed by a temporal analysis subsystem. The spatial analysis subsystem is an extension of source separation methods built with a neural architecture with adaptive recombination weights, i.e. weights that are not learned but depend on features of the input. We show that this architecture learns to perform Independent Component Analysis if it is trained on a measure of non-gaussianity. For temporal analysis, standard (shared) convolutional neural networks applied on separate recomposed channels can be used.
Assis, Youssef. "Détection des anévrismes intracrâniens par apprentissage profond." Electronic Thesis or Diss., Université de Lorraine, 2024. http://www.theses.fr/2024LORR0012.
Intracranial aneurysms are local dilatations of cerebral blood vessels, presenting a significant risk of rupture, which can lead to serious consequences. Early detection of unruptured aneurysms is therefore crucial to prevent potentially fatal complications. However, analyzing medical images to locate these aneurysms is a complex and time-consuming task that requires expertise, and yet remains prone to errors in interpretation. Faced with these challenges, this thesis explores automated methods for the detection of aneurysms, aiming to facilitate the work of radiologists and improve diagnostic efficiency. Our approach focuses on the use of artificial intelligence techniques, particularly deep neural networks, for the detection of aneurysms from time-of-flight magnetic resonance angiography (TOF-MRA) images. Our research work is centered around several main axes. Firstly, due to the scarcity of training data in the medical field, we adopt a rapid, although approximate, annotation method to facilitate data collection. Furthermore, we propose a strategy based on small patches. In association with data synthesis, the samples are multiplied in the training database. By selecting the samples, their distribution is adjusted to facilitate optimization. Secondly, for the automated detection of aneurysms, we investigate various neural network architectures. An initial approach explores image segmentation networks. Then, we propose an innovative architecture inspired by object detection methods. These architectures, especially the latter, lead to competitive results, particularly in terms of sensitivity compared to experts. Thirdly, beyond the detection of aneurysms, we extend our model to estimate the pose of aneurysms in 3D images. This can greatly facilitate their analysis and interpretation in reformatted cross-sectional planes.
A thorough evaluation of the proposed models is systematically carried out, including ablation studies, the use of metrics adapted to the problem of detection, and evaluations conducted by clinical experts, allowing us to assess their potential effectiveness for clinical use. In particular, we highlight the issues related to uncertainty in the annotation of existing databases.
Sheikh, Shakeel Ahmad. "Apprentissage profond pour la détection du bégaiement." Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0005.
Stuttering is the most frequently observed speech impairment and manifests itself in the form of core behaviours. The tedious and time-consuming task of detecting and analyzing the speech patterns of persons who stutter (PWS), with the goal of rectifying them, is often handled manually by speech therapists and is biased towards their subjective beliefs. Moreover, automatic speech recognition (ASR) systems also fail to recognize stuttered speech, which makes it impractical for PWS to access virtual digital assistants such as Siri, Alexa, etc. This thesis tries to develop audio-based stuttering detection (SD) systems that successfully capture different variabilities from stuttering utterances such as speaking styles, age, accents, etc., and learn robust stuttering representations with the aim of providing a fair, consistent, and unbiased assessment of stuttered speech. While most of the existing SD systems use multiple binary classifiers, one for each stutter type, we present a unified multi-class StutterNet capable of detecting multiple stutter types. Approaching the class-imbalance problem in the stuttering domain, we investigated the impact of applying a weighted loss function, and also presented a Multi-contextual (MC) Multi-branch (MB) StutterNet to improve the detection performance of minority classes. Exploiting the speaker information with the assumption that stuttering models should be invariant to meta-data such as speaker information, we present an adversarial multi-task learning (MTL) SD method that learns robust, stutter-discriminative, speaker-invariant representations. Because the paucity of labeled data limits the use of large deep models for capturing different variabilities in the automated SD task, we introduced the first-ever self-supervised learning (SSL) framework to the SD domain.
The SSL framework first trains a feature extractor for a pre-text task using a large quantity of unlabeled non-stuttering audio data to capture these different variabilities, and then applies the learned feature extractor to a downstream SD task using limited labeled stuttering audio data.
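As an illustration of how a weighted loss can counter class imbalance, here is a minimal sketch of inverse-frequency class weights combined with a weighted cross-entropy. The weighting scheme, the class names and the probability values are assumptions chosen for the example and are not taken from the thesis.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so rare classes weigh more.

    n is the number of samples and k the number of distinct classes.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the total weight.

    probs[i] maps class name -> predicted probability for sample i.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Hypothetical, heavily imbalanced toy batch: three fluent clips, one block.
labels = ["fluent", "fluent", "fluent", "block"]
weights = inverse_frequency_weights(labels)

# A classifier that is equally unsure on every clip.
probs = [{"fluent": 0.5, "block": 0.5} for _ in labels]
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights the single minority-class clip contributes as much to the loss as the three majority-class clips together, which is the intended effect of the reweighting.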
Ostertag, Cécilia. "Analyse des pathologies neuro-dégénératives par apprentissage profond." Thesis, La Rochelle, 2022. http://www.theses.fr/2022LAROS003.
Monitoring and predicting the cognitive state of a subject affected by a neuro-degenerative disorder is crucial to provide appropriate treatment as soon as possible. Thus, these patients are followed for several years, as part of longitudinal medical studies. During each visit, a large quantity of data is acquired: risk factors linked to the pathology, medical imagery (MRI or PET scans for example), cognitive test results, sampling of molecules that have been identified as bio-markers, etc. These various modalities give information about the disease's progression; some of them are complementary and others can be redundant. Several deep learning models have been applied to bio-medical data, notably for organ segmentation or pathology diagnosis. This PhD is focused on the design of a deep neural network model for cognitive decline prediction, using multimodal data, here both structural brain MRI images and clinical data. In this thesis we propose an architecture made of sub-modules tailored to each modality: a 3D convolutional network for the brain MRI, and fully connected layers for the quantitative and qualitative clinical data. To predict the patient's evolution, this model takes as input data from two medical visits for each patient. These visits are compared using a siamese architecture. After training and validating this model with Alzheimer's disease as our use case, we look into knowledge transfer to other neuro-degenerative pathologies, and we use transfer learning to adapt our model to Parkinson's disease. Finally, we discuss the choices we made to take into account the temporal aspect of our problem, both during the ground truth creation using the long-term evolution of a cognitive score, and for the choice of using pairs of visits as input instead of longer sequences.
Mensch, Arthur. "Apprentissage de représentations en imagerie fonctionnelle." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS300/document.
Thanks to the advent of functional brain-imaging technologies, cognitive neuroscience is accumulating maps of neural activity responses to specific tasks or stimuli, or of spontaneous activity. In this work, we consider data from functional Magnetic Resonance Imaging (fMRI), which we study in a machine learning setting: we learn a model of brain activity that should generalize to unseen data. After reviewing the standard fMRI data analysis techniques, we propose new methods and models to benefit from the recently released large fMRI data repositories. Our goal is to learn richer representations of brain activity. We first focus on unsupervised analysis of terabyte-scale fMRI data acquired on subjects at rest (resting-state fMRI). We perform this analysis using matrix factorization. We present new methods for running sparse matrix factorization/dictionary learning on hundreds of fMRI records in reasonable time. Our leading approach relies on introducing randomness in stochastic optimization loops and provides a speed-up of an order of magnitude on a variety of settings and datasets. We provide an extended empirical validation of our stochastic subsampling approach, for datasets from fMRI, hyperspectral imaging and collaborative filtering. We derive convergence properties for our algorithm, in a theoretical analysis that reaches beyond the matrix factorization problem. We then turn to work with fMRI data acquired on subjects undergoing behavioral protocols (task fMRI). We investigate how to aggregate data from many source studies, acquired with many different protocols, in order to learn more accurate and interpretable decoding models that predict stimuli or tasks from brain maps. Our multi-study shared-layer model learns to reduce the dimensionality of input brain images while simultaneously learning to decode these images from their reduced representation.
This fosters transfer learning between studies, as we learn the undocumented cognitive aspects that the many fMRI studies have in common. As a consequence, our multi-study model performs better than single-study decoding. Our approach identifies universally relevant representations of brain activity, supported by a few task-optimized networks learned during model fitting. Finally, on a related topic, we show how to use dynamic programming within end-to-end trained deep networks, with applications in natural language processing.
Risser-Maroix, Olivier. "Similarité visuelle et apprentissage de représentations." Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7327.
The objective of this CIFRE thesis is to develop an image search engine, based on computer vision, to assist customs officers. Indeed, we observe, paradoxically, an increase in security threats (terrorism, trafficking, etc.) coupled with a decrease in the number of customs officers. The images of cargoes acquired by X-ray scanners already allow the inspection of a load without requiring it to be opened and searched completely. By automatically proposing similar images, such a search engine would help customs officers in their decision making when faced with infrequent or suspicious visual signatures of products. Thanks to the development of modern artificial intelligence (AI) techniques, our era is undergoing great changes: AI is transforming all sectors of the economy. Some see this advent of "robotization" as the dehumanization of the workforce, or even its replacement. However, reducing the use of AI to the simple search for productivity gains would be reductive. In reality, AI could increase the work capacity of humans rather than compete with them in order to replace them. It is in this context, the birth of Augmented Intelligence, that this thesis takes place. This manuscript, devoted to the question of visual similarity, is divided into two parts. Two practical cases where the collaboration between humans and AI is beneficial are proposed. In the first part, the problem of learning representations for the retrieval of similar images is still under investigation. After implementing a first system similar to those proposed by the state of the art, one of the main limitations is pointed out: the semantic bias. Indeed, the main contemporary methods use image datasets coupled with semantic labels only. The literature considers that two images are similar if they share the same label. This vision of the notion of similarity, however fundamental in AI, is reductive.
It will therefore be questioned in the light of work in cognitive psychology in order to propose an improvement: taking visual similarity into account. This new definition allows a better synergy between the customs officer and the machine. This work is the subject of scientific publications and a patent. In the second part, after having identified the key components allowing the performance of the previously proposed system to be improved, an approach mixing empirical and theoretical research is proposed. This second case, augmented intelligence, is inspired by recent developments in mathematics and physics. First applied to the understanding of an important hyperparameter (temperature), then to a larger task (classification), the proposed method provides an intuition on the importance and role of factors correlated to the studied variable (e.g. hyperparameter, score, etc.). The processing chain thus set up has demonstrated its efficiency by providing a highly explainable solution in line with decades of research in machine learning. These findings will allow the improvement of previously developed solutions.
Nguyen, Thanh Hai. "Some contributions to deep learning for metagenomics." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS102.
Metagenomic data from the human microbiome is a novel source of data for improving diagnosis and prognosis in human diseases. However, making predictions based on individual bacteria abundance is a challenge, since the number of features is much bigger than the number of samples. Hence, we face the difficulties related to high-dimensional data processing, as well as to the high complexity of heterogeneous data. Machine Learning has obtained great achievements on important metagenomics problems linked to OTU-clustering, binning, taxonomic assignment, etc. The contribution of this PhD thesis is two-fold: 1) a feature selection framework for efficient heterogeneous biomedical signature extraction, and 2) a novel deep learning approach for predicting diseases using artificial image representations. The first contribution is an efficient feature selection approach based on the visualization capabilities of Self-Organizing Maps for heterogeneous data fusion. The framework is efficient on real, heterogeneous datasets containing metadata, genes of adipose tissue, and gut flora metagenomic data, with a reasonable classification accuracy compared to the state-of-the-art methods. The second approach is a method to visualize metagenomic data using a simple fill-up method, and also various state-of-the-art dimensionality reduction learning approaches. The new metagenomic data representation can be considered as synthetic images, and used as a novel data set for an efficient deep learning method such as Convolutional Neural Networks. The results show that the proposed methods either achieve the state-of-the-art predictive performance, or outperform it on public rich metagenomic benchmarks.
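A rough sketch of what a "fill-up" image representation might look like is given below. It assumes that fill-up means writing the abundance features, in their given order, row by row into the smallest square grid that can hold them and padding the unused cells with a constant; the abstract does not give these details, so the ordering, the padding value and the function name are all hypothetical.

```python
import math


def fill_up(abundances, pad=0.0):
    """Arrange a 1-D abundance vector into the smallest square grid.

    Features are written row by row in their given order; unused cells
    are filled with `pad`. Assumed layout, for illustration only.
    """
    n = len(abundances)
    if n == 0:
        return []
    side = math.isqrt(n - 1) + 1  # smallest integer with side * side >= n
    cells = list(abundances) + [pad] * (side * side - n)
    return [cells[r * side:(r + 1) * side] for r in range(side)]


# Five hypothetical species abundances become a 3 x 3 synthetic image.
image = fill_up([0.1, 0.2, 0.3, 0.4, 0.5])
```

A grid of this kind can then be fed to a convolutional network in the same way as a single-channel image, which is the use the abstract describes for the synthetic images.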
Cohen-Hadria, Alice. "Estimation de descriptions musicales et sonores par apprentissage profond." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS607.
In Music Information Retrieval (MIR) and voice processing, the use of machine learning tools has become more and more standard in the last few years. In particular, many state-of-the-art systems now rely on the use of Neural Networks. In this thesis, we propose a wide overview of four different MIR and voice processing tasks, using systems built with neural networks. More precisely, we use convolutional neural networks, a class of neural networks designed for images. The first task presented is music structure estimation. For this task, we show how the choice of input representation can be critical when using convolutional neural networks. The second task is singing voice detection. We present how to use a voice detection system to automatically align lyrics and audio tracks. With this alignment mechanism, we have created the largest synchronized audio and speech data set, called DALI. Singing voice separation is the third task. For this task, we present a data augmentation strategy, a way to significantly increase the size of a training set. Finally, we tackle voice anonymization. We present an anonymization method that both obfuscates content and masks the speaker identity, while preserving the acoustic scene.