Theses on the topic "Apprentissage profond supervisé"
Browse the top 50 theses for your research on the topic "Apprentissage profond supervisé".
Moradi, Fard Maziar. "Apprentissage de représentations de données dans un apprentissage non-supervisé". Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM053.
Due to the great impact of deep learning on a variety of fields of machine learning, its ability to improve clustering approaches has recently been investigated. At first, deep learning approaches (mostly autoencoders) were used to reduce the dimensionality of the original space and to remove possible noise (and also to learn new data representations). Clustering approaches that rely on deep learning are called Deep Clustering. This thesis focuses on developing Deep Clustering models which can be used for different types of data (e.g., images, text). First, we propose a Deep k-means (DKM) algorithm in which learning data representations (through a deep autoencoder) and cluster representatives (through k-means) are performed jointly. The results of our DKM approach indicate that this framework is able to outperform similar algorithms in Deep Clustering. Indeed, our proposed framework is able to truly and smoothly backpropagate the loss function error through all learnable variables. Moreover, we propose two frameworks, named SD2C and PCD2C, which integrate seed words and pairwise constraints, respectively, into end-to-end Deep Clustering frameworks. By using such frameworks, users can see their needs reflected in the clustering. Finally, the results obtained from these frameworks indicate their ability to produce more tailored results.
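The joint objective described in this abstract can be illustrated with a small sketch. It assumes, for illustration only, that the hard k-means assignment is relaxed with a softmax over negative squared distances so that the clustering term stays differentiable; the names `joint_loss`, `lam` and `alpha` are hypothetical and the thesis's exact formulation may differ.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_assignments(z, centers, alpha):
    """Softmax over negative squared distances (assumed relaxation).

    Larger alpha pushes the weights towards a hard nearest-center
    assignment; alpha = 0 gives uniform weights.
    """
    d = [sq_dist(z, c) for c in centers]
    m = min(d)  # subtract the minimum for numerical stability
    w = [math.exp(-alpha * (di - m)) for di in d]
    s = sum(w)
    return [wi / s for wi in w]


def joint_loss(x, x_rec, z, centers, lam=1.0, alpha=10.0):
    """Reconstruction error plus lam times a soft k-means term.

    x      : input vector
    x_rec  : autoencoder reconstruction of x
    z      : embedding of x produced by the encoder
    centers: current cluster representatives in embedding space
    """
    reconstruction = sq_dist(x, x_rec)
    a = soft_assignments(z, centers, alpha)
    clustering = sum(ai * sq_dist(z, c) for ai, c in zip(a, centers))
    return reconstruction + lam * clustering
```

In a real model both terms would be minimized together by gradient descent over the encoder, decoder and cluster representatives; here the functions only evaluate the loss for one sample.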
Doan, Tien Tai. "Réalisation d’une aide au diagnostic en orthodontie par apprentissage profond". Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG033.
Accurate processing and diagnosis of dental images is an essential factor determining the success of orthodontic treatment. Many image processing methods have been proposed to address this problem. Those studies mainly work on small datasets of radiographs under laboratory conditions and are not highly applicable as complete products or services. In this thesis, we train deep learning models to diagnose dental problems such as gingivitis and crowded teeth using mobile phone images. We study the feature layers of these models to find the strengths and limitations of each method. Besides training deep learning models, we also embed each of them in a pipeline, including preprocessing and post-processing steps, to create a complete product. To address the lack of training data, we studied a variety of data augmentation methods, especially domain adaptation methods using image-to-image translation models, both supervised and unsupervised, and obtained promising results. Image translation networks are also used to simplify patients' choice of orthodontic appliances by showing them what their teeth could look like during treatment. The generated images are realistic and high-resolution. Researching further into unsupervised image translation neural networks, we propose an unsupervised image-to-image translation model which can manipulate features of objects in the image without requiring additional annotation. Our model outperforms state-of-the-art techniques on multiple image translation applications and is also extended to few-shot learning problems.
Katranji, Mehdi. "Apprentissage profond de la mobilité des personnes". Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCA024.
Knowledge of mobility is a major challenge for mobility-organising authorities and urban planning. Due to the lack of a formal definition of human mobility, the term "people's mobility" is used in this work. The topic is introduced by a description of the ecosystem, considering its actors and applications. The creation of a learning model has prerequisites: an understanding of the typologies of the available data sets, their strengths and weaknesses. This state of the art in mobility knowledge is based on the four-step model that has existed and been used since 1970, ending with the renewal of methodologies in recent years. Our models of people's mobility are then presented. Their common point is the emphasis on the individual, unlike traditional approaches that take the locality as a reference. The models we propose are based on the fact that individuals make decisions according to their perception of the environment. The work ends with a study of deep learning methods based on restricted Boltzmann machines. After a state of the art of this family of models, we look for strategies to make these models viable in applications. This last chapter is our main theoretical contribution, improving the robustness and performance of these models.
Boussik, Amine. "Apprentissage profond non-supervisé : Application à la détection de situations anormales dans l’environnement du train autonome". Electronic Thesis or Diss., Valenciennes, Université Polytechnique Hauts-de-France, 2023. http://www.theses.fr/2023UPHF0040.
The thesis addresses the challenges of monitoring the environment and detecting anomalies, especially obstacles, for an autonomous freight train. Although rail transport was traditionally under human supervision, autonomous trains offer potential advantages in terms of costs, time, and safety. However, their operation in complex environments poses significant safety concerns. Instead of a supervised approach that requires costly and limited annotated data, this research adopts an unsupervised technique, using unlabeled data to detect anomalies based on methods capable of identifying atypical behaviors. Two environmental surveillance models are presented: the first, based on a convolutional autoencoder (CAE), is dedicated to identifying obstacles on the main track; the second, an advanced version incorporating the vision transformer (ViT), focuses on overall environmental surveillance. Both employ unsupervised learning techniques for anomaly detection. The results show that the highlighted method offers relevant insights for monitoring the environment of the autonomous freight train, holding potential to enhance its reliability and safety. The use of unsupervised techniques thus showcases the utility and relevance of their adoption in an application context for the autonomous train.
Bilodeau, Anthony. "Apprentissage faiblement supervisé appliqué à la segmentation d'images de protéines neuronales". Master's thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/39752.
Thesis or dissertation with inserted articles
Honor roll of the Faculté des études supérieures et postdoctorales, 2020-2021
In cell biology, optical microscopy is commonly used to visualize and characterize the presence and morphology of biological structures. Following the acquisition, an expert has to annotate the structures for quantification. This is a difficult task, requiring many hours of sometimes repetitive work, which can result in annotation errors caused by labelling fatigue. Machine learning promises to automate complex tasks from a large set of annotated sample data. My master's project consists of using weakly supervised techniques, where the annotations required for training are reduced and/or less precise, for the segmentation of neural structures. I first tested the use of polygons delimiting the structure of interest for the complex task of segmenting the neuronal protein F-actin in super-resolution microscopy images. The complexity of the task arises from the heterogeneous morphology of neurons, the high number of instances to segment in an image and the presence of many distractors. Despite these difficulties, the use of weak annotations made it possible to quantify a novel change in the conformation of the F-actin protein as a function of neuronal activity. I further simplified the annotation task by requiring only binary labels that indicate the presence of structures in the image, reducing annotation time by a factor of 30. In this way, the algorithm is trained to predict the content of an image and then extracts the semantic characteristics important for recognizing the structure of interest using attention mechanisms. The segmentation accuracy obtained on F-actin images is higher than that of polygonal annotations and equivalent to that of an expert's precise annotations. This new approach should facilitate the quantification of dynamic changes that occur under the microscope in living cells and reduce errors caused by inattention or bias in the selection of regions of interest in microscopy images.
Droniou, Alain. "Apprentissage de représentations et robotique développementale : quelques apports de l'apprentissage profond pour la robotique autonome". Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066056/document.
This thesis studies the use of deep neural networks to learn high-level representations from raw inputs on robots, based on the "manifold hypothesis".
Chen, Hao. "Vers la ré-identification de personnes non-supervisée". Thesis, Université Côte d'Azur, 2022. http://www.theses.fr/2022COAZ4014.
As a core component of intelligent video surveillance systems, person re-identification (ReID) aims at retrieving a person of interest across non-overlapping cameras. Despite significant improvements in supervised ReID, the cumbersome annotation process makes it less scalable in real-world deployments. Moreover, as appearance representations can be affected by noisy factors that vary between domains, such as illumination level and camera properties, person ReID models suffer a large performance drop in the presence of domain gaps. We are particularly interested in designing algorithms that can adapt a person ReID model to a target domain without human supervision. In this context, we mainly focus on designing unsupervised domain adaptation and unsupervised representation learning methods for person ReID. In this thesis, we first explore how to build robust representations by combining both global and local features under the supervised condition. Then, towards an unsupervised domain adaptive ReID system, we propose three unsupervised methods for person ReID: 1) teacher-student knowledge distillation with asymmetric network structures to encourage feature diversity, 2) a joint generative and contrastive learning framework that generates augmented views with a generative adversarial network for contrastive learning, and 3) the exploration of inter-instance relations and the design of relation-aware loss functions for better contrastive-learning-based person ReID. Our methods have been extensively evaluated on mainstream ReID datasets, such as Market-1501, DukeMTMC-reID and MSMT17. The proposed methods significantly outperform previous methods on these datasets, pushing person ReID closer to real-world deployments.
Sahasrabudhe, Mihir. "Unsupervised and weakly supervised deep learning methods for computer vision and medical imaging". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASC010.
The first two contributions of this thesis (Chapters 2 and 3) are models for unsupervised 2D alignment and learning 3D object surfaces, called Deforming Autoencoders (DAE) and Lifting Autoencoders (LAE). These models are capable of identifying a canonical space in order to represent different object properties, for example, appearance in a canonical space, the deformation associated with this appearance that maps it to the image space, and, for human faces, a 3D model of the face, its facial expression, and the angle of the camera. We further illustrate applications of the models to other domains: alignment of lung MRI images in medical image analysis, and alignment of satellite images in remote sensing. In Chapter 4, we concentrate on a problem in medical image analysis: the diagnosis of lymphocytosis. We propose a convolutional network to encode images of blood smears obtained from a patient, followed by an aggregation operation that gathers information from all images into one feature vector, which is used to determine the diagnosis. Our results show that the performance of the proposed models is on par with biologists and can therefore augment their diagnosis.
Mlynarski, Pawel. "Apprentissage profond pour la segmentation des tumeurs cérébrales et des organes à risque en radiothérapie". Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4084.
Medical images play an important role in cancer diagnosis and treatment. Oncologists analyze images to determine the different characteristics of the cancer, to plan the therapy and to observe the evolution of the disease. The objective of this thesis is to propose efficient methods for automatic segmentation of brain tumors and organs at risk in the context of radiotherapy planning, using Magnetic Resonance (MR) images. First, we focus on segmentation of brain tumors using Convolutional Neural Networks (CNN) trained on MRIs manually segmented by experts. We propose a segmentation model having a large 3D receptive field while being efficient in terms of computational complexity, based on a combination of 2D and 3D CNNs. We also address problems related to the joint use of several MRI sequences (T1, T2, FLAIR). Second, we introduce a segmentation model which is trained using weakly-annotated images in addition to fully-annotated images (with voxelwise labels), which are usually available in very limited quantities due to their cost. We show that this mixed level of supervision considerably improves the segmentation accuracy when the number of fully-annotated images is limited. Finally, we propose a methodology for an anatomy-consistent segmentation of organs at risk in the context of radiotherapy of brain tumors. The segmentations produced by our system on a set of MRIs acquired in the Centre Antoine Lacassagne (Nice, France) are evaluated by an experienced radiotherapist.
Geiler, Louis. "Deep learning for churn prediction". Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7333.
The problem of churn prediction has traditionally been a field of study for marketing. However, in the wake of technological advancements, more and more data can be collected to analyze customers' behaviors. This manuscript has been built in this frame, with a particular focus on machine learning. Thus, we first looked at the supervised learning problem. We have demonstrated that logistic regression, random forest and XGBoost taken as an ensemble offer the best results in terms of Area Under the Curve (AUC) among a wide range of traditional machine learning approaches. We have also shown that re-sampling approaches are efficient only in a local setting and not a global one. Subsequently, we aimed at fine-tuning our prediction by relying on customer segmentation. Indeed, some customers can leave a service because of a cost that they deem too high, and other customers due to a problem with customer service. Our approach was enriched with a novel deep neural network architecture, which operates with both auto-encoders and the k-means approach. Going further, we focused on self-supervised learning in the tabular domain. More precisely, the proposed architecture was inspired by the work on the SimCLR approach, where we altered the architecture with the Mean-Teacher model from semi-supervised learning. We showed through the win matrix the superiority of our approach with respect to the state of the art. Ultimately, we have proposed to apply what we have built in this manuscript in an industrial setting, that of Brigad. We have alleviated the company's churn problem with a random forest that we optimized through grid search and threshold optimization. We also proposed to interpret the results with SHAP (SHapley Additive exPlanations).
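The threshold optimization step mentioned in this abstract can be sketched as a search over decision thresholds on held-out classifier scores. The sketch below assumes the F1 score as the quantity to maximize, which the abstract does not specify; `f1_at` and `best_threshold` are hypothetical helper names, and the scores stand in for the churn probabilities of any classifier.

```python
def f1_at(threshold, scores, labels):
    """F1 score when predicting churn (1) for every score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold(scores, labels):
    """Return the candidate threshold with the highest validation F1.

    Only the observed scores need to be tried, because the predictions
    (and hence F1) change only when the threshold crosses one of them.
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at(t, scores, labels))


# Toy validation set: churn scores and true labels (1 = churned).
val_scores = [0.10, 0.40, 0.35, 0.80]
val_labels = [0, 0, 1, 1]
chosen = best_threshold(val_scores, val_labels)
```

The threshold should be chosen on validation data that was not used to fit the model, otherwise the selected value tends to be optimistic.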
Guerry, Joris. "Reconnaissance visuelle robuste par réseaux de neurones dans des scénarios d'exploration robotique. Détecte-moi si tu peux !" Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLX080/document.
The main objective of this thesis is visual recognition for a mobile robot in difficult conditions. We are particularly interested in neural networks, which today achieve the best performance in computer vision. We studied the concept of method selection for the classification of 2D images by using a neural network selector to choose the best available classifier given the observed situation. This strategy works when data can be easily partitioned with respect to available classifiers, which is the case when complementary modalities are used. We therefore used RGB-D data (2.5D), applied in particular to people detection. We propose a combination of independent neural network detectors specific to each modality (color and depth map) based on the same architecture (Faster RCNN). We share intermediate results of the detectors to allow them to complement each other and improve overall performance in difficult situations (luminosity loss or acquisition noise in the depth map). We establish new state-of-the-art scores in the field and offer a more complex and richer data set to the community (ONERA.ROOM). Finally, we made use of the 3D information contained in the RGB-D images through a multi-view method. We defined a strategy for generating 2D virtual views that are consistent with the 3D structure. For a semantic segmentation task, this approach artificially increases the training data for each RGB-D image and accumulates different predictions during the test. We obtain new reference results on the SUNRGBD and NYUDv2 datasets. All this work allowed us to handle 2D, 2.5D and 3D robotic data with neural networks in an original way. Whether for classification, detection or semantic segmentation, we not only validated our approaches on difficult data sets, but also brought the state of the art to a new level of performance.
Chen, Mickaël. "Learning with weak supervision using deep generative networks". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS024.
Many successes of deep learning rely on the availability of massive annotated datasets that can be exploited by supervised algorithms. Obtaining those labels at a large scale, however, can be difficult, or even impossible in many situations. Designing methods that are less dependent on annotations is therefore a major research topic, and many semi-supervised and weakly supervised methods have been proposed. Meanwhile, the recent introduction of deep generative networks provided deep learning methods with the ability to manipulate complex distributions, allowing for breakthroughs in tasks such as image editing and domain adaptation. In this thesis, we explore how these new tools can be useful to further alleviate the need for annotations. Firstly, we tackle the task of performing stochastic predictions. It consists in designing systems for structured prediction that take into account the variability in possible outputs. We propose, in this context, two models. The first one performs predictions on multi-view data with missing views, and the second one predicts possible futures of a video sequence. Then, we study adversarial methods to learn a factorized latent space, in a setting with two explanatory factors of which only one is annotated. We propose models that aim to uncover semantically consistent latent representations for those factors. One model is applied to the conditional generation of motion capture data, and another one to multi-view data. Finally, we focus on the task of image segmentation, which is of crucial importance in computer vision. Building on previously explored ideas, we propose a model for object segmentation that is entirely unsupervised.
Saporta, Antoine. "Domain Adaptation for Urban Scene Segmentation". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS115.
This thesis tackles some of the open scientific challenges of perception systems based on neural networks for autonomous vehicles. This dissertation discusses domain adaptation, a class of tools aiming at minimizing the need for labeled data. Domain adaptation allows generalization to so-called target data that share structures with the labeled so-called source data, which allow supervision, but nevertheless follow a different statistical distribution. First, we study the introduction of privileged information in the source data, for instance, depth labels. The proposed strategy, BerMuDA, bases its domain adaptation on a multimodal representation obtained by bilinear fusion, modeling complex interactions between segmentation and depth. Next, we examine self-supervised learning strategies in domain adaptation, relying on selecting predictions on the unlabeled target data to serve as pseudo-labels. We propose two new selection criteria: first, an entropic criterion with ESL; then, with ConDA, a criterion using an estimate of the true class probability. Finally, the extension of adaptation scenarios to several target domains as well as to a continual learning framework is proposed. Two approaches are presented to extend traditional adversarial methods to multi-target domain adaptation: Multi-Dis. and MTKT. In a continual learning setting in which the target domains are discovered sequentially and without rehearsal, the proposed CTKT approach adapts MTKT to this new problem to tackle catastrophic forgetting.
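The idea of selecting pseudo-labels with an entropic criterion can be sketched as follows. This is a generic illustration that keeps a prediction when the Shannon entropy of its class probabilities is below a fixed threshold; it is not the ESL method itself, and `select_pseudo_labels` and `max_entropy` are hypothetical names.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_pseudo_labels(pred_probs, max_entropy):
    """Keep confident predictions on unlabeled target samples.

    pred_probs : list of per-sample class-probability vectors
    max_entropy: samples whose prediction entropy exceeds this are skipped
    Returns a list of (sample_index, predicted_class) pairs.
    """
    selected = []
    for i, probs in enumerate(pred_probs):
        if entropy(probs) <= max_entropy:
            label = max(range(len(probs)), key=probs.__getitem__)
            selected.append((i, label))
    return selected


# Three target predictions: confident, ambiguous, confident.
target_preds = [
    [0.98, 0.01, 0.01],
    [0.40, 0.30, 0.30],
    [0.05, 0.90, 0.05],
]
pseudo = select_pseudo_labels(target_preds, max_entropy=0.5)
```

In segmentation the same test would typically be applied per pixel, and the retained pseudo-labels would supervise a further round of training on the target domain.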
Blot, Michaël. "Étude de l'apprentissage et de la généralisation des réseaux profonds en classification d'images". Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS412.
Artificial intelligence has been experiencing a resurgence in recent years, owing to the growing ability to collect and store considerable amounts of digitized data. These huge databases allow machine learning algorithms to solve certain tasks through supervised learning. Among digitized data, images remain predominant in the modern environment, and huge datasets have been created. Moreover, image classification has enabled the development of previously neglected models: deep neural networks, or deep learning. This family of algorithms shows a great ability to fit datasets perfectly, even very large ones. Their ability to generalize remains poorly understood, but convolutional networks are today the undisputed state of the art. From both a research and an application point of view, the demands placed on deep learning will keep growing, requiring an effort to push the performance of neural networks to the limit of their capacity. This is the purpose of our research, whose contributions are presented in this thesis. We first looked at the issue of training and considered accelerating it through distributed methods. We then studied architectures in order to improve them without increasing their complexity. Finally, we studied the regularization of network training in particular. We studied a regularization criterion based on information theory that we deployed in two different ways.
Durand, Thibaut. "Weakly supervised learning for visual recognition". Electronic Thesis or Diss., Paris 6, 2017. http://www.theses.fr/2017PA066142.
This thesis studies the problem of image classification, where the goal is to predict whether a semantic category is present in the image, based on its visual content. To analyze complex scenes, it is important to learn localized representations. To limit the cost of annotation during training, we have focused on weakly supervised learning approaches. In this thesis, we propose several models that simultaneously classify and localize objects, using only global labels during training. Weak supervision significantly reduces the cost of full annotation, but it makes learning more challenging. The key issue is how to aggregate local scores (e.g. regions) into a global score (e.g. image). The main contribution of this thesis is the design of new pooling functions for weakly supervised learning. In particular, we propose a “max + min” pooling function, which unifies many pooling functions. We describe how to use this pooling in the Latent Structured SVM framework as well as in convolutional networks. To solve the optimization problems, we present several solvers, some of which allow optimizing a ranking metric such as Average Precision. We experimentally show the interest of our models with respect to state-of-the-art methods, on ten standard image classification datasets, including the large-scale dataset ImageNet.
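A pooling function of the "max + min" kind can be sketched as below. It assumes one plausible parameterization: the mean of the `k_max` highest region scores plus `alpha` times the mean of the `k_min` lowest ones. The parameter names and the averaging are assumptions made for illustration, not necessarily the thesis's exact definition.

```python
def max_min_pooling(region_scores, k_max=1, k_min=1, alpha=1.0):
    """Aggregate local region scores into one image-level score.

    region_scores: per-region scores for one class
    k_max, k_min : number of highest / lowest regions to average (assumed)
    alpha        : weight of the lowest-scoring regions (assumed)
    """
    n = len(region_scores)
    if n == 0 or not (1 <= k_max <= n and 1 <= k_min <= n):
        raise ValueError("need at least one region and 1 <= k <= number of regions")
    ranked = sorted(region_scores, reverse=True)
    top = ranked[:k_max]      # strongest evidence for the class
    bottom = ranked[-k_min:]  # strongest evidence against the class
    return sum(top) / k_max + alpha * sum(bottom) / k_min


scores = [0.2, 1.5, -0.7, 0.9]
combined = max_min_pooling(scores)                    # max + min
only_max = max_min_pooling(scores, alpha=0.0)         # plain max pooling
average = max_min_pooling(scores, k_max=4, alpha=0.0) # plain average pooling
```

Under this parameterization, setting `alpha=0` with `k_max=1` recovers max pooling and `k_max` equal to the number of regions recovers average pooling, which is one way a single function can cover several standard pooling schemes.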
Banville, Hubert. "Enabling real-world EEG applications with deep learning". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG005.
Our understanding of the brain has improved considerably in the last decades, thanks to groundbreaking advances in the field of neuroimaging. Now, with the invention and wider availability of personal wearable neuroimaging devices, such as low-cost mobile EEG, we have entered an era in which neuroimaging is no longer constrained to traditional research labs or clinics. "Real-world" EEG comes with its own set of challenges, though, ranging from a scarcity of labelled data to unpredictable signal quality and limited spatial resolution. In this thesis, we draw on the field of deep learning to help transform this century-old brain imaging modality from a purely clinical- and research-focused tool, to a practical technology that can benefit individuals in their day-to-day life. First, we study how unlabelled EEG data can be utilized to gain insights and improve performance on common clinical learning tasks using self-supervised learning. We present three such self-supervised approaches that rely on the temporal structure of the data itself, rather than onerously collected labels, to learn clinically-relevant representations. Through experiments on large-scale datasets of sleep and neurological screening recordings, we demonstrate the significance of the learned representations, and show how unlabelled data can help boost performance in a semi-supervised scenario. Next, we explore ways to ensure neural networks are robust to the strong sources of noise often found in out-of-the-lab EEG recordings. Specifically, we present Dynamic Spatial Filtering, an attention mechanism module that allows a network to dynamically focus its processing on the most informative EEG channels while de-emphasizing any corrupted ones. 
Experiments on large-scale datasets and real-world data demonstrate that, on sparse EEG, the proposed attention block handles strong corruption better than an automated noise handling approach, and that the predicted attention maps can be interpreted to inspect the functioning of the neural network. Finally, we investigate how weak labels can be used to develop a biomarker of neurophysiological health from real-world EEG. We translate the brain age framework, originally developed using lab and clinic-based magnetic resonance imaging, to real-world EEG data. Using recordings from more than a thousand individuals performing a focused attention exercise or sleeping overnight, we show not only that age can be predicted from wearable EEG, but also that age predictions encode information contained in well-known brain health biomarkers, but not in chronological age. Overall, this thesis brings us a step closer to harnessing EEG for neurophysiological monitoring outside of traditional research and clinical contexts, and opens the door to new and more flexible applications of this technology
Lucas, Thomas. "Modèles génératifs profonds : sur-généralisation et abandon de mode". Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM049.
Texto completoThis dissertation explores the topic of generative modelling of natural images,which is the task of fitting a data generating distribution.Such models can be used to generate artificial data resembling the true data, or to compress images.Latent variable models, which are at the core of our contributions, seek to capture the main factors of variations of an image into a variable that can be manipulated.In particular we build on two successful latent variable generative models, the generative adversarial network (GAN) and Variational autoencoder (VAE) models.Recently GANs significantly improved the quality of images generated by deep models, obtaining very compelling samples.Unfortunately these models struggle to capture all the modes of the original distribution, ie they do not cover the full variability of the dataset.Conversely, likelihood based models such as VAEs typically cover the full variety of the data well and provide an objective measure of coverage.However these models produce samples of inferior visual quality that are more easily distinguished from real ones.The work presented in this thesis strives for the best of both worlds: to obtain compelling samples while modelling the full support of the distribution.To achieve that, we focus on i) the optimisation problems used and ii) practical model limitations that hinder performance.The first contribution of this manuscript is a deep generative model that encodes global image structure into latent variables, built on the VAE, and autoregressively models low level detail.We propose a training procedure relying on an auxiliary loss function to control what information is captured by the latent variables and what information is left to an autoregressive decoder.Unlike previous approaches to such hybrid models, ours does not need to restrict the capacity of the autoregressive decoder to prevent degenerate models that ignore the latent variables.The second contribution builds on the standard GAN 
model, which trains a discriminator network to provide feedback to a generative network. The discriminator usually assesses the quality of individual samples, which makes it hard to evaluate the variability of the data. Instead we propose to feed the discriminator with batches that mix both true and fake samples, and train it to predict the ratio of true samples in the batch. These batches work as approximations of the distribution of generated images and allow the discriminator to approximate distributional statistics. We introduce an architecture that is well suited to solve this problem efficiently, and show experimentally that our approach reduces mode collapse in GANs on two synthetic datasets, and obtains good results on the CIFAR10 and CelebA datasets. The mutual shortcomings of VAEs and GANs can in principle be addressed by training hybrid models that use both types of objective. In our third contribution, we show that usual parametric assumptions made in VAEs induce a conflict between them, leading to lackluster performance of hybrid models. We propose a solution based on deep invertible transformations, that trains a feature space in which usual assumptions can be made without harm. Our approach provides likelihood computations in image space while being able to take advantage of adversarial training. It obtains GAN-like samples that are competitive with fully adversarial models while improving likelihood scores over existing hybrid models at the time of publication.
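The batch-level discriminator target described in this abstract can be illustrated with a minimal sketch. It only shows how a mixed batch and its ratio label might be built; the function name `make_mixed_batch`, the uniform draw of the number of real samples, and the list-based data are assumptions for illustration, not the thesis's actual procedure or architecture.

```python
import random


def make_mixed_batch(real_samples, fake_samples, batch_size, rng=None):
    """Build one batch mixing real and fake samples, plus its ratio target.

    Hypothetical helper: both pools must hold at least `batch_size` items.
    """
    rng = rng or random.Random()
    # Assumption: the number of real samples is drawn uniformly in 0..batch_size.
    n_real = rng.randint(0, batch_size)
    n_fake = batch_size - n_real
    batch = rng.sample(real_samples, n_real) + rng.sample(fake_samples, n_fake)
    # Shuffle so the discriminator cannot rely on sample position.
    rng.shuffle(batch)
    # The regression target is the fraction of real samples in the batch.
    return batch, n_real / batch_size
```

A discriminator trained on such pairs would take the whole batch as input and regress the returned ratio, rather than scoring each sample on its own.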
Abou, Bakr Nachwa. "Reconnaissance et modélisation des actions de manipulation". Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM010.
This thesis addresses the problem of recognition, modelling and description of human activities. We describe results on three problems: (1) the use of transfer learning for simultaneous visual recognition of objects and object states, (2) the recognition of manipulation actions from state transitions, and (3) the interpretation of a series of actions and states as events in a predefined story to construct a narrative description. These results have been developed using food preparation activities as an experimental domain. We start by recognising food classes such as tomatoes and lettuce and food states, such as sliced and diced, during meal preparation. We adapt the VGG network architecture to jointly learn the representations of food items and food states using transfer learning. We model actions as the transformation of object states. We use recognised object properties (state and type) to detect corresponding manipulation actions by tracking object transformations in the video. Experimental performance evaluation for this approach is provided using the 50 Salads and EPIC-Kitchen datasets. We use the resulting action descriptions to construct narrative descriptions for complex activities observed in videos of the 50 Salads dataset.
Durand, Thibaut. "Weakly supervised learning for visual recognition". Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066142/document.
This thesis studies the problem of classification of images, where the goal is to predict if a semantic category is present in the image, based on its visual content. To analyze complex scenes, it is important to learn localized representations. To limit the cost of annotation during training, we have focused on weakly supervised learning approaches. In this thesis, we propose several models that simultaneously classify and localize objects, using only global labels during training. The weak supervision significantly reduces the cost of full annotation, but it makes learning more challenging. The key issue is how to aggregate local scores (e.g. regions) into a global score (e.g. image). The main contribution of this thesis is the design of new pooling functions for weakly supervised learning. In particular, we propose a “max + min” pooling function, which unifies many pooling functions. We describe how to use this pooling in the Latent Structured SVM framework as well as in convolutional networks. To solve the optimization problems, we present several solvers, some of which can optimize a ranking metric such as Average Precision. We experimentally show the benefit of our models with respect to state-of-the-art methods, on ten standard image classification datasets, including the large-scale dataset ImageNet.
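As a rough illustration of the "max + min" idea, the sketch below aggregates per-region scores into one image-level score by adding the highest and the lowest region score. The function name `max_min_pool` and the weighting parameter `alpha` are assumptions introduced here; the abstract does not give the exact formulation used in the thesis.

```python
def max_min_pool(region_scores, alpha=1.0):
    """Aggregate local region scores into one global image score.

    Hypothetical sketch: the strongest positive evidence (max) is combined
    with the strongest negative evidence (min), weighted by `alpha`.
    """
    if not region_scores:
        raise ValueError("region_scores must not be empty")
    return max(region_scores) + alpha * min(region_scores)
```

In this sketch, setting `alpha=0` reduces the function to plain max pooling, which gives one concrete sense in which a single parameterized function can cover several pooling schemes.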
Chareyre, Maxime. "Apprentissage non-supervisé pour la découverte de propriétés d'objets par découplage entre interaction et interprétation". Electronic Thesis or Diss., Université Clermont Auvergne (2021-...), 2023. http://www.theses.fr/2023UCFA0122.
Robots are increasingly used to achieve tasks in controlled environments. However, their use in open environments is still fraught with difficulties. Robotic agents are likely to encounter objects whose behaviour and function they are unaware of. In some cases, they must interact with these elements to carry out their mission by collecting or moving them, but without knowledge of the objects' dynamic properties it is not possible to implement an effective strategy for resolving the mission. In this thesis, we present a method for teaching an autonomous robot a physical interaction strategy with unknown objects, without any a priori knowledge, the aim being to extract information about as many of the object's physical properties as possible from the interactions observed by its sensors. Existing methods for characterising objects through physical interactions do not fully satisfy these criteria. Indeed, the interactions established only provide an implicit representation of the object's dynamics, requiring supervision to identify their properties. Furthermore, the proposed solution is based on unrealistic scenarios without an agent. Our approach differs from the state of the art by proposing a generic method for learning interaction that is independent of the object and its properties, and can therefore be decoupled from the prediction phase. In particular, this leads to a completely unsupervised global pipeline. In the first phase, we propose to learn an interaction strategy with the object via an unsupervised reinforcement learning method, using an intrinsic motivation signal based on the idea of maximising variations in a state vector of the object. The aim is to obtain a set of interactions containing information that is highly correlated with the object's physical properties.
This method has been tested on a simulated robot interacting by pushing and has enabled properties such as the object's mass, shape and friction to be accurately identified. In a second phase, we make the assumption that the true physical properties define a latent space that explains the object's behaviours and that this space can be identified from observations collected through the agent's interactions. We set up a self-supervised prediction task in which we adapt a state-of-the-art architecture to create this latent space. Our simulations confirm that combining the behavioural model with this architecture leads to the emergence of a representation of the object's properties whose principal components are shown to be strongly correlated with the object's physical properties. Once the properties of the objects have been extracted, the agent can use them to improve its efficiency in tasks involving these objects. We conclude this study by highlighting the performance gains achieved by the agent through training via reinforcement learning on a simplified object repositioning task where the properties are perfectly known. All the work carried out in simulation confirms the effectiveness of an innovative method aimed at autonomously discovering the physical properties of an object through the physical interactions of a robot. The prospects for extending this work involve transferring it to a real robot in a cluttered environment.
Chandra, Siddhartha. "Apprentissage Profond pour des Prédictions Structurées Efficaces appliqué à la Classification Dense en Vision par Ordinateur". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLC033/document.
In this thesis we propose a structured prediction technique that combines the virtues of Gaussian Conditional Random Fields (G-CRFs) with Convolutional Neural Networks (CNNs). The starting point of this thesis is the observation that, while being of a limited form, G-CRFs allow us to perform exact Maximum-A-Posteriori (MAP) inference efficiently. We prefer exactness and simplicity over generality and advocate G-CRF based structured prediction in deep learning pipelines. Our proposed structured prediction methods accommodate (i) exact inference, (ii) both short- and long-term pairwise interactions, (iii) rich CNN-based expressions for the pairwise terms, and (iv) end-to-end training alongside CNNs. We devise novel implementation strategies which allow us to overcome memory and computational challenges.
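The reason exact MAP inference is tractable for a Gaussian CRF can be sketched in a few lines of algebra. The symbols below (unary terms $b$, pairwise matrix $A$, prediction vector $x$) are assumed notation for illustration and are not taken from the abstract; the only requirement is that $A$ be symmetric positive definite.

```latex
% Quadratic energy of a Gaussian CRF (assumed notation)
E(x) = \tfrac{1}{2}\, x^{\top} A\, x - b^{\top} x,
\qquad A = A^{\top} \succ 0

% Setting the gradient to zero gives the unique minimizer
\nabla E(x) = A x - b = 0
\quad\Longrightarrow\quad
x^{\ast} = A^{-1} b
```

Under these assumptions, MAP inference amounts to solving the linear system $A x = b$, which standard solvers such as conjugate gradient handle without approximation.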
Leclerc, Sarah Marie-Solveig. "Automatisation de la segmentation sémantique de structures cardiaques en imagerie ultrasonore par apprentissage supervisé". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI121.
The analysis of medical images plays a critical role in cardiology. Ultrasound imaging, as a real-time, low-cost and bedside-applicable modality, is nowadays the most commonly used imaging modality to monitor patient status and perform clinical cardiac diagnosis. However, the semantic segmentation (i.e. the accurate delineation and identification) of heart structures is a difficult task due to the low quality of ultrasound images, characterized in particular by the lack of clear boundaries. To compensate for missing information, the best performing methods before this thesis relied on the integration of prior information on cardiac shape or motion, which in turn reduced the adaptability of the corresponding methods. Furthermore, such approaches require manual identification of key points to be adapted to a given image, which makes the full process difficult to reproduce. In this thesis, we propose several original fully-automatic algorithms for the semantic segmentation of echocardiographic images based on supervised learning approaches, where the resolution of the problem is automatically set up using data previously analyzed by trained cardiologists. From the design of a dedicated dataset and evaluation platform, we prove in this project the clinical applicability of fully-automatic supervised learning methods, in particular deep learning methods, as well as the possibility to improve the robustness by incorporating in the full process the prior automatic detection of regions of interest.
Gal, Viviane. "Vers une nouvelle Interaction Homme Environnement dans les jeux vidéo et pervasifs : rétroaction biologique et états émotionnels : apprentissage profond non supervisé au service de l'affectique". Electronic Thesis or Diss., Paris, CNAM, 2019. http://www.theses.fr/2019CNAM1269.
Living exceptional moments and experiencing thrills, well-being and fulfilment are often part of our dreams or aspirations. We choose various ways to get there, such as games. Whether the player is looking for originality, challenges, discovery, a story, or other goals, emotional states are the purpose of his quest. He stays as long as the game gives him pleasure and sensations. How can we bring them about? We are developing a new human-environment interaction that takes into account and adapts to emotions. We address video or pervasive games and other applications. In pursuing this goal, players should not be bothered by interfaces or by the invasiveness of biosensors. This work raises two questions: (i) Can we discover emotional states based on physiological measurements from contact biosensors? (ii) If so, can these sensors be replaced by remote, non-invasive devices and produce the same results? The models we have developed propose solutions based on unsupervised machine learning methods. We also present remote measurement techniques and explain our future work in a new field we call affectics.
De, La Bourdonnaye François. "Learning sensori-motor mappings using little knowledge : application to manipulation robotics". Thesis, Université Clermont Auvergne (2017-2020), 2018. http://www.theses.fr/2018CLFAC037/document.
The thesis is focused on learning a complex manipulation robotics task using little knowledge. More precisely, the task in question consists in reaching an object with a serial arm and the objective is to learn it without camera calibration parameters, forward kinematics, handcrafted features, or expert demonstrations. Deep reinforcement learning algorithms are well suited to this objective. Indeed, reinforcement learning makes it possible to learn sensori-motor mappings while dispensing with dynamics. Besides, deep learning makes it possible to dispense with handcrafted features for the state space representation. However, it is difficult to specify the objectives of the learned task without requiring human supervision. Some solutions rely on expert demonstrations or shaping rewards to guide robots towards their objective. The latter is generally computed using forward kinematics and handcrafted visual modules. Another class of solutions consists in decomposing the complex task. Learning from easy missions can be used, but this requires the knowledge of a goal state. Decomposing the whole complex task into simpler sub-tasks can also be utilized (hierarchical learning) but does not necessarily imply a lack of human supervision. Alternate approaches which use several agents in parallel to increase the probability of success can be used but are costly. In our approach, we decompose the whole reaching task into three simpler sub-tasks while taking inspiration from human behavior. Indeed, humans first look at an object before reaching it. The first learned task is an object fixation task which is aimed at localizing the object in the 3D space. This is learned using deep reinforcement learning and a weakly supervised reward function. The second task consists in learning jointly end-effector binocular fixations and a hand-eye coordination function. This is also learned using a similar set-up and is aimed at localizing the end-effector in the 3D space.
The third task uses the two prior learned skills to learn to reach an object and uses the same requirements as the two prior tasks: it hardly requires supervision. In addition, without using additional priors, an object reachability predictor is learned in parallel. The main contribution of this thesis is the learning of a complex robotic task with weak supervision
Monnier, Tom. "Unsupervised image analysis by synthesis". Electronic Thesis or Diss., Marne-la-vallée, ENPC, 2023. http://www.theses.fr/2023ENPC0037.
The goal of this thesis is to develop machine learning approaches to analyze collections of images without annotations. Advances in this area hold particular promise for high-impact 3D-related applications (e.g., reconstructing a real-world scene with 3D actionable components for animation movies or video games) where annotating examples to teach the machines is difficult, as well as more micro applications related to specific needs (e.g., analyzing the character evolution from 12th century documents) where spending significant effort on annotating large-scale databases is debatable. The central idea of this dissertation is to build machines that learn to analyze an image collection by synthesizing the images in the collection. Learning analysis models by synthesis is difficult because it requires the design of a learnable image generation system that explicitly exhibits the desired analysis output. To achieve our goal, we present three key contributions. The first contribution of this thesis is a new conceptual approach to category modeling. We propose to represent the category of an image, a 2D object or a 3D shape, with a prototype that is transformed using deep learning to model the different instances within the category. Specifically, we design meaningful parametric transformations (e.g., geometric deformations or colorimetric variations) and use neural networks to predict the transformation parameters necessary to instantiate the prototype for a given image. We demonstrate the effectiveness of this idea to cluster images and reconstruct 3D objects from single-view images. We obtain performances on par with the best state-of-the-art methods which leverage handcrafted features or annotations. The second contribution is a new way to discover elements in a collection of images. We propose to represent an image collection by a set of learnable elements composed together to synthesize the images and optimized by gradient descent.
We first demonstrate the effectiveness of this idea by discovering 2D elements related to semantic objects represented by a large image collection. Our approach has performances similar to the best concurrent methods which synthesize images with neural networks, and ours comes with better interpretability. We also showcase the capability of this idea by discovering 3D elements related to simple primitive shapes given as input a collection of images depicting a scene from multiple viewpoints. Compared to prior works finding primitives in 3D point clouds, we showcase much better qualitative and quantitative performances. The third contribution is more technical and consists in a new formulation to compute differentiable mesh rendering. Specifically, we formulate the differentiable rendering of a 3D mesh as the alpha compositing of the mesh faces in an increasing depth order. Compared to prior works, this formulation is key to enable us to learn 3D meshes without requiring object region annotations. In addition, it allows us to seamlessly introduce the possibility to learn transparent meshes, which we design to model a scene as a composition of a variable number of meshes.
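Alpha compositing in increasing depth order, as mentioned in the third contribution, can be sketched for a single pixel with scalar colors. The function name `composite_front_to_back`, the `(depth, alpha, color)` tuple layout, and the background term are assumptions for illustration; the thesis's renderer operates on mesh faces and is not reproduced here.

```python
def composite_front_to_back(faces, background=0.0):
    """Composite the faces covering one pixel, nearest face first.

    `faces` is a list of (depth, alpha, color) tuples; this layout is a
    hypothetical simplification with scalar colors.
    """
    color = 0.0
    transmittance = 1.0  # fraction of light not yet blocked by nearer faces
    for _depth, alpha, face_color in sorted(faces, key=lambda f: f[0]):
        # Each face contributes its color, attenuated by everything in front.
        color += transmittance * alpha * face_color
        transmittance *= 1.0 - alpha
    # Whatever light remains comes from the background.
    return color + transmittance * background
```

Every operation in the loop is a sum or product of the opacities and colors, so the output stays differentiable with respect to both, which is what makes such a formulation usable for gradient-based learning.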
Rocco, Ignacio. "Neural architectures for estimating correspondences between images". Electronic Thesis or Diss., Université Paris sciences et lettres, 2020. http://www.theses.fr/2020UPSLE060.
The goal of this thesis is to develop methods for establishing correspondences between pairs of images in challenging situations, such as extreme illumination changes, scenes with little texture or with repetitive structures, and matching parts of objects which belong to the same class, but which may have large intra-class appearance differences. In summary, our contributions are the following: (i) we develop a trainable approach for parametric image alignment by means of a siamese network model, (ii) we devise a weakly-supervised training approach, which allows training from real image pairs having only annotation at the level of image-pairs, (iii) we propose the Neighbourhood Consensus Networks which can be used to robustly estimate correspondences in tasks where discrete correspondences are required, and (iv) because the dense formulation of the Neighbourhood Consensus Networks is memory and computationally intensive, we develop a more efficient variant that can reduce the memory requirements and run-time by more than ten times.
Mehr, Éloi. "Unsupervised Learning of 3D Shape Spaces for 3D Modeling". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS566.
Even though 3D data is becoming increasingly popular, especially with the democratization of virtual and augmented experiences, it remains very difficult to manipulate a 3D shape, even for designers or experts. Given a database containing 3D instances of one or several categories of objects, we want to learn the manifold of plausible shapes in order to develop new intelligent 3D modeling and editing tools. However, this manifold is often much more complex than in the 2D domain. Indeed, 3D surfaces can be represented using various embeddings, and may also exhibit different alignments and topologies. In this thesis we study the manifold of plausible shapes in the light of the aforementioned challenges, by exploring three different points of view in depth. First of all, we consider the manifold as a quotient space, in order to learn the shapes’ intrinsic geometry from a dataset where the 3D models are not co-aligned. Then, we assume that the manifold is disconnected, which leads to a new deep learning model that is able to automatically cluster and learn the shapes according to their typology. Finally, we study the conversion of an unstructured 3D input to an exact geometry, represented as a structured tree of continuous solid primitives.
Mordan, Taylor. "Conception d'architectures profondes pour l'interprétation de données visuelles". Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS270.
Nowadays, images are ubiquitous through the use of smartphones and social media. It then becomes necessary to have automatic means of processing them, in order to analyze and interpret the large amount of available data. In this thesis, we are interested in object detection, i.e. the problem of identifying and localizing all objects present in an image. This can be seen as a first step toward a complete visual understanding of scenes. It is tackled with deep convolutional neural networks, under the Deep Learning paradigm. One drawback of this approach is the need for labeled data to learn from. Since precise annotations are time-consuming to produce, bigger datasets can be built with partial labels. We design global pooling functions to work with them and to recover latent information in two cases: learning spatially localized and part-based representations from image- and object-level supervisions respectively. We address the issue of efficiency in end-to-end learning of these representations by leveraging fully convolutional networks. Besides, exploiting additional annotations on available images can be an alternative to having more images, especially in the data-deficient regime. We formalize this problem as a specific kind of multi-task learning with a primary objective to focus on, and design a way to effectively learn from this auxiliary supervision under this framework.
Lerousseau, Marvin. "Weakly Supervised Segmentation and Context-Aware Classification in Computational Pathology". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG015.
Anatomic pathology is the medical discipline responsible for the diagnosis and characterization of diseases through the macroscopic, microscopic, molecular and immunologic inspection of tissues. Modern technologies have made possible the digitization of tissue glass slides into whole slide images, which can themselves be processed by artificial intelligence to enhance the capabilities of pathologists. This thesis presented several novel and powerful approaches that tackle pan-cancer segmentation and classification of whole slide images. Learning segmentation models for whole slide images is challenged by an annotation bottleneck which arises from (i) a shortage of pathologists, (ii) a cumbersome and tedious annotation process, and (iii) major inter-annotator discrepancy. My first line of work tackled pan-cancer tumor segmentation by designing two novel state-of-the-art weakly supervised approaches that exploit slide-level annotations that are fast and easy to obtain. In particular, my second segmentation contribution was a generic and highly powerful algorithm that leverages percentage annotations on a slide basis, without needing any pixel-based annotation. Extensive large-scale experiments showed the superiority of my approaches over weakly supervised and supervised methods for pan-cancer tumor segmentation on a dataset of more than 15,000 unfiltered and extremely challenging whole slide images from snap-frozen tissues. My results indicated the robustness of my approaches to noise and systemic biases in annotations. Digital slides are difficult to classify due to their colossal sizes, which range from millions of pixels to billions of pixels, often weighing more than 500 megabytes. The straightforward use of traditional computer vision is therefore not possible, prompting the use of multiple instance learning, a machine learning paradigm consisting in treating a whole slide image as a set of patches uniformly sampled from it.
Prior to my work, the great majority of multiple instance learning approaches considered patches as independently and identically sampled, i.e. discarded the spatial relationship of patches extracted from a whole slide image. Some approaches exploited such spatial interconnection by leveraging graph-based models, although the true domain of whole slide images is specifically the image domain, which is better suited to convolutional neural networks. I designed a highly powerful and modular multiple instance learning framework that leverages the spatial relationship of patches extracted from a whole slide image by building a sparse map from the patch embeddings, which is then further processed into a whole slide image embedding by a sparse-input convolutional neural network, before being classified by a generic classifier model. My framework essentially bridges the gap between multiple instance learning and fully convolutional classification. I performed extensive experiments on three whole slide image classification tasks, including the pathologist's gold-standard task of subtyping tumors, on a dataset of more than 20,000 whole slide images from public data. Results highlighted the superiority of my approach over all other widespread multiple instance learning methods. Furthermore, while my experiments only investigated my approach with sparse-input convolutional neural networks with two convolutional layers, the results showed that my framework works better as the number of parameters increases, suggesting that more sophisticated convolutional neural networks can easily obtain superior results.
Tamaazousti, Youssef. "Vers l’universalité des représentations visuelle et multimodales". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLC038/document.
Because of its key societal, economic and cultural stakes, Artificial Intelligence (AI) is a hot topic. One of its main goals is to develop systems that facilitate the daily life of humans, with applications such as household robots, industrial robots, autonomous vehicles and much more. The rise of AI is largely due to the emergence of tools based on deep neural networks which make it possible to simultaneously learn the representation of the data (which was traditionally hand-crafted) and the task to solve (traditionally learned with statistical models). This resulted from the conjunction of theoretical advances, growing computational capacity and the availability of large amounts of annotated data. A long-standing goal of AI is to design machines inspired by humans, capable of perceiving the world and interacting with humans, in an evolutionary way. We categorize, in this thesis, the works around AI into the two following learning approaches: (i) Specialization: learn representations from few specific tasks with the goal to be able to carry out very specific tasks (specialized in a certain field) with a very good level of performance; (ii) Universality: learn representations from several general tasks with the goal to perform as many tasks as possible in different contexts. While specialization was extensively explored by the deep-learning community, only a few implicit attempts were made towards universality. Thus, the goal of this thesis is to explicitly address the problem of improving universality with deep-learning methods, for image and text data. We have addressed this topic of universality in two different forms: through the implementation of methods to improve universality (“universalizing methods”); and through the establishment of a protocol to quantify universality.
Concerning universalizing methods, we proposed three technical contributions: (i) in a context of large semantic representations, we proposed a method to reduce redundancy between the detectors through an adaptive thresholding and the relations between concepts; (ii) in the context of neural-network representations, we proposed an approach that increases the number of detectors without increasing the amount of annotated data; (iii) in a context of multimodal representations, we proposed a method to preserve the semantics of unimodal representations in multimodal ones. Regarding the quantification of universality, we proposed to evaluate universalizing methods in a transfer-learning scheme. Indeed, this technical scheme is relevant to assess the universal ability of representations. This also led us to propose a new framework as well as new quantitative evaluation criteria for universalizing methods.
Loiseau, Romain. "Real-World 3D Data Analysis : Toward Efficiency and Interpretability". Electronic Thesis or Diss., Marne-la-vallée, ENPC, 2023. http://www.theses.fr/2023ENPC0028.
This thesis explores new deep-learning approaches for modeling and analyzing real-world 3D data. 3D data processing is helpful for numerous high-impact applications such as autonomous driving, territory management, industry facilities monitoring, forest inventory, and biomass measurement. However, annotating and analyzing 3D data can be demanding. Specifically, matching constraints regarding computing resources or annotation efficiency is often challenging. The difficulty of interpreting and understanding the inner workings of deep learning models can also limit their adoption. The computer vision community has made significant efforts to design methods to analyze 3D data, to perform tasks such as shape classification, scene segmentation, and scene decomposition. Early automated analysis relied on hand-crafted descriptors and incorporated prior knowledge about real-world acquisitions. Modern deep learning techniques demonstrate the best performances but are often computationally expensive, rely on large annotated datasets, and have low interpretability. In this thesis, we propose contributions that address these limitations. The first contribution of this thesis is an efficient deep-learning architecture for analyzing LiDAR sequences in real time. Our approach explicitly considers the acquisition geometry of rotating LiDAR sensors, which many autonomous driving perception pipelines use. Compared to previous work, which considers complete LiDAR rotations individually, our model processes the acquisition in smaller increments. Our proposed architecture achieves accuracy on par with the best methods while reducing processing time by more than five times and model size by more than fifty times. The second contribution is a deep learning method to summarize extensive 3D shape collections with a small set of 3D template shapes. We learn end-to-end a small number of 3D prototypical shapes that are aligned and deformed to reconstruct input point clouds.
The main advantage of our approach is that its representations are in the 3D space and can be viewed and manipulated. They constitute a compact and interpretable representation of 3D shape collections and facilitate annotation, leading to state-of-the-art results for few-shot semantic segmentation. The third contribution further expands unsupervised analysis for parsing large real-world 3D scans into interpretable parts. We introduce a probabilistic reconstruction model to decompose an input 3D point cloud using a small set of learned prototypical shapes. Our network determines the number of prototypes to use to reconstruct each scene. We outperform state-of-the-art unsupervised methods in terms of decomposition accuracy while remaining visually interpretable. We offer significant advantages over existing approaches as our model does not require manual annotations. This thesis also introduces two open-access annotated real-world datasets, HelixNet and the Earth Parser Dataset, acquired with terrestrial and aerial LiDARs, respectively. HelixNet is the largest LiDAR autonomous driving dataset with dense annotations and provides point-level sensor metadata crucial for precisely measuring the latency of semantic segmentation methods. The Earth Parser Dataset consists of seven aerial LiDAR scenes, which can be used to evaluate the performance of 3D processing techniques in diverse environments. We hope that these datasets and reliable methods considering the specificities of real-world acquisitions will encourage further research toward more efficient and interpretable models.
Simon, Etienne. "Deep Learning for Unsupervised Relation Extraction". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS198.
Capturing the interrelations between concepts is fundamental to natural language understanding. It constitutes a bridge between two historically separate approaches of artificial intelligence: the use of symbolic and distributed representations. However, tackling this problem without human supervision poses several issues, and unsupervised models have difficulties echoing the expressive breakthroughs of supervised ones. This thesis addresses two supervision gaps we identified: the problem of regularization of sentence-level discriminative models and the problem of leveraging relational information from dataset-level structures. The first gap arises following the increased use of discriminative approaches, such as deep neural network classifiers, in the supervised setting. These models tend to collapse without supervision. To overcome this limitation, we introduce two relation distribution losses to constrain the relation classifier into a trainable state. The second gap arises from the development of dataset-level (aggregate) approaches. We show that unsupervised models can leverage a large amount of additional information from the structure of the dataset, even more so than supervised models. We close this gap by adapting existing unsupervised methods to capture topological information using graph convolutional networks. Furthermore, we show that we can exploit the mutual information between topological (dataset-level) and linguistic (sentence-level) information to design a new training paradigm for unsupervised relation extraction
Desir, Chesner. "Classification Automatique d'Images, Application à l'Imagerie du Poumon Profond". Phd thesis, Université de Rouen, 2013. http://tel.archives-ouvertes.fr/tel-00879356.
Desir, Chesner. "Classification automatique d'images, application à l'imagerie du poumon profond". Phd thesis, Rouen, 2013. http://www.theses.fr/2013ROUES053.
This thesis deals with automated image classification, applied to images acquired with alveoscopy, a new imaging technique of the distal lung. The aim is to propose and develop a computer-aided diagnosis system, so as to help the clinician analyze these never-before-seen images. Our contributions lie in the development of effective, robust and generic methods to classify images of healthy and pathological patients. Our first classification system is based on a rich and local characterization of the images, an ensemble of random trees approach for classification and a rejection mechanism, providing the medical expert with tools to enhance the reliability of the system. Due to the complexity of alveoscopy images and to the lack of expertise on the pathological cases (unlike healthy cases), we adopt the one-class learning paradigm, which allows a classifier to be learned from healthy data only. We propose a one-class approach that takes advantage of the combining and randomization mechanisms of ensemble methods to respond to common issues such as the curse of dimensionality. Our method is shown to be effective, robust to the dimension, and competitive with or even better than state-of-the-art methods on various public datasets. It has proved to be particularly relevant to our medical problem
Debard, Quentin. "Automatic learning of next generation human-computer interactions". Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI036.
Artificial Intelligence (AI) and Human-Computer Interaction (HCI) are two research fields with relatively little work in common. HCI specialists usually design the way we interact with devices directly from observations and measures of human feedback, manually optimizing the user interface to better fit users' expectations. This process is hard to optimize: ergonomics, intuitiveness and ease of use are key features of a User Interface (UI) that are too complex to be simply modelled from interaction data. This drastically restrains the possible uses of Machine Learning (ML) in this design process. Currently, ML in HCI is mostly applied to gesture recognition and automatic display, e.g. advertisement or item suggestion. It is also used to fine-tune an existing UI, but as of now it does not participate in designing new ways to interact with computers. Our main focus in this thesis is to use ML to develop new design strategies for overall better UIs. We want to use ML to build intelligent (that is, precise, intuitive and adaptive) user interfaces using minimal handcrafting. We propose a novel approach to UI design: instead of letting the user adapt to the interface, we want the interface and the user to adapt mutually to each other. The goal is to reduce human bias in protocol definition while building co-adaptive interfaces able to further fit individual preferences. In order to do so, we will put to use the different mechanisms available in ML to automatically learn behaviors, build representations and make decisions. We will be experimenting on touch interfaces, as these interfaces are widely used and can provide easily interpretable problems. The very first part of our work will focus on processing touch data and using supervised learning to build accurate classifiers of touch gestures. The second part will detail how Reinforcement Learning (RL) can be used to model and learn interaction protocols given user actions. Lastly, we will combine these RL models with unsupervised learning to build a setup allowing for the design of new interaction protocols without the need for real user data
Oquab, Maxime. "Convolutional neural networks : towards less supervision for visual recognition". Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE061.
Convolutional Neural Networks are flexible learning algorithms for computer vision that scale particularly well with the amount of data that is provided for training them. Although these methods had successful applications already in the '90s, they were not used in visual recognition pipelines because of their lesser performance on realistic natural images. It is only after the amount of data and the computational power both reached a critical point that these algorithms revealed their potential during the ImageNet challenge of 2012, leading to a paradigm shift in visual recognition. The first contribution of this thesis is a transfer learning setup with a Convolutional Neural Network for image classification. Using a pre-training procedure, we show that image representations learned in a network generalize to other recognition tasks, and their performance scales up with the amount of data used in pre-training. The second contribution of this thesis is a weakly supervised setup for image classification that can predict the location of objects in complex cluttered scenes, based on a dataset indicating only the presence or absence of objects in training images. The third contribution of this thesis aims at finding possible paths for progress in unsupervised learning with neural networks. We study the recent trend of Generative Adversarial Networks and propose two-sample tests for evaluating models. We investigate possible links with concepts related to causality, and propose a two-sample test method for the task of causal discovery. Finally, building on a recent connection with optimal transport, we investigate what these generative algorithms are learning from unlabeled data
Chafaa, Irched. "Machine learning for beam alignment in mmWave networks". Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG044.
To cope with the ever-increasing mobile data traffic, an envisioned solution for future wireless networks is to exploit the large available spectrum in the millimeter wave (mmWave) band. However, communicating at these high frequencies is very challenging as the transmitted signal suffers from strong attenuation, which leads to a limited propagation range and few multipath components (sparse mmWave channels). Hence, highly directional beams have to be employed to focus the signal energy towards the intended user and compensate for all those losses. Such beams need to be steered appropriately to guarantee a reliable communication link. This is the so-called beam alignment problem, where the beams of the transmitter and the receiver need to be constantly aligned. Moreover, beam alignment policies need to support device mobility and the unpredictable dynamics of the network, which result in significant signaling and training overhead affecting the overall performance. In the first part of the thesis, we formulate the beam alignment problem via the adversarial multi-armed bandit framework, which copes with arbitrary network dynamics including non-stationary or adversarial components. We propose online and adaptive beam alignment policies relying only on one-bit feedback to steer the beams of both nodes of the communication link in a distributed manner. Building on the well-known exponential weights algorithm (EXP3) and by exploiting the sparse nature of mmWave channels, we propose a modified policy (MEXP3), which comes with optimal theoretical guarantees in terms of asymptotic regret. Moreover, for finite horizons, our regret upper bound is tighter than that of the original EXP3, suggesting better performance in practice. We then introduce an additional modification that accounts for the temporal correlation between successive beams and propose another beam alignment policy (NBT-MEXP3). In the second part of the thesis, deep learning tools are investigated to select mmWave beams in an access point to user link. We leverage unsupervised deep learning to exploit the channel knowledge at sub-6 GHz and predict beamforming vectors in the mmWave band; this complex channel-beam mapping is learned from data taken from the DeepMIMO dataset, without ground truth. We also show how to choose an optimal size for our neural network depending on the number of transmit and receive antennas at the access point. Furthermore, we investigate the impact of training data availability and introduce a federated learning (FL) approach to predict the beams of multiple links by sharing only the parameters of the locally trained neural networks (and not the local data). We investigate both synchronous and asynchronous FL methods. Our numerical simulations show the high potential of our approach, especially when the locally available data is scarce or imperfect (noisy). Finally, we compare our proposed deep learning methods with the reinforcement learning methods derived in the first part. Simulations show that choosing an appropriate beam steering method depends on the target application and is a tradeoff between rate performance and computational complexity
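The exponential weights algorithm (EXP3) that the abstract above builds on can be sketched as follows. This is a minimal illustration of textbook EXP3 with binary (one-bit) rewards, not the MEXP3 or NBT-MEXP3 policies of the thesis; the function name `exp3`, the parameters `gamma` and `reward_fn`, and the toy reward in the usage note are assumptions made for this example.

```python
import math
import random


def exp3(num_arms, horizon, reward_fn, gamma=0.1, seed=0):
    """Textbook EXP3 for adversarial bandits (illustrative sketch).

    num_arms  -- number of candidate arms (e.g. candidate beams)
    horizon   -- number of rounds to play
    reward_fn -- hypothetical callback (arm, t) -> reward in [0, 1];
                 a one-bit feedback corresponds to rewards in {0, 1}
    gamma     -- exploration rate in (0, 1]
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    pulls = [0] * num_arms
    for t in range(horizon):
        total = sum(weights)
        # Mix the weight-proportional distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        reward = reward_fn(arm, t)
        # Importance-weighted estimate: unbiased for the pulled arm's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / num_arms)
        # Rescale so the largest weight is 1; probabilities are unchanged
        # and the weights cannot overflow over long horizons.
        largest = max(weights)
        weights = [w / largest for w in weights]
        pulls[arm] += 1
    return pulls
```

As a toy usage, with eight candidate beams of which only beam 3 ever returns a positive acknowledgement, `exp3(8, 2000, lambda arm, t: 1.0 if arm == 3 else 0.0)` ends up pulling beam 3 more often than any other beam.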
Barreau, Baptiste. "Machine Learning for Financial Products Recommendation". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPAST010.
Anticipating clients' needs is crucial to any business; this is particularly true for corporate and institutional banks such as BNP Paribas Corporate and Institutional Banking due to their role in the financial markets. This thesis addresses the problem of future interests prediction in the financial context and focuses on the development of ad hoc algorithms designed for solving specific financial challenges. This manuscript is composed of five chapters:
- Chapter 1 introduces the problem of future interests prediction in the financial world. The goal of this chapter is to provide the reader with all the keys necessary to understand the remainder of this thesis. These keys are divided into three parts: a presentation of the datasets we have at our disposal to solve the future interests prediction problem and their characteristics, an overview of the candidate algorithms to solve this problem, and the development of metrics to monitor the performance of these algorithms on our datasets. This chapter finishes with some of the challenges that we face when designing algorithms to solve the future interests problem in finance, challenges that will be partly addressed in the following chapters;
- Chapter 2 proposes a benchmark of some of the algorithms introduced in Chapter 1 on a real-world dataset from BNP Paribas CIB, along with a discussion of the difficulties encountered when comparing different algorithmic approaches on the same dataset and of ways to tackle them. This benchmark puts into practice classic recommendation algorithms that were considered from a theoretical point of view in the preceding chapter, and provides further intuition on the analysis of the metrics introduced in Chapter 1;
- Chapter 3 introduces a new algorithm, called Experts Network, that is designed to solve the problem of behavioral heterogeneity of investors on a given financial market using a custom-built neural network architecture inspired by mixture-of-experts research. In this chapter, the introduced methodology is tested on three datasets: a synthetic dataset, an open-source one and a real-world dataset from BNP Paribas CIB. The chapter provides further insights into the development of the methodology and ways to extend it;
- Chapter 4 also introduces a new algorithm, called History-augmented Collaborative Filtering, that proposes to augment classic matrix factorization approaches with the information of users' and items' interaction histories. This chapter provides further experiments on the dataset used in Chapter 2, and extends the presented methodology with various ideas. Notably, this chapter presents an adaptation of the methodology to solve the cold-start problem and applies it to a new dataset;
- Chapter 5 brings to light a collection of ideas and algorithms, successful or not, that were tried during the development of this thesis. This chapter finishes with a new algorithm that blends the methodologies introduced in Chapters 3 and 4
Roger, Vincent. "Modélisation de l'indice de sévérité du trouble de la parole à l'aide de méthodes d'apprentissage profond : d'une modélisation à partir de quelques exemples à un apprentissage auto-supervisé via une mesure entropique". Thesis, Toulouse 3, 2022. http://www.theses.fr/2022TOU30180.
People with head and neck cancers have speech difficulties after surgery or radiation therapy. It is important for health practitioners to have a measure that reflects the severity of the speech impairment. To produce this measure, a perceptual study is commonly performed with a group of five to six clinical experts. This process limits the use of this assessment in practice. Thus, the creation of an automatic measure, similar to the severity index, would allow a better follow-up of the patients by making the measure easier to obtain. To build such a measure, we relied on a reading task, which is classically performed. We used the recordings of the C2SI-RUGBI corpus, which includes more than 100 people. This corpus represents about one hour of recording to model the severity index. In this PhD work, a review of state-of-the-art methods on speech, emotion and speaker recognition using little data was undertaken. We then attempted to model severity using transfer learning and deep learning. Since the results were not usable, we turned to the so-called "few shot" techniques (learning from only a few examples). Thus, after promising first attempts at phoneme recognition, we obtained promising results for categorising the severity of patients. Nevertheless, the exploitation of these results for a medical application would require improvements. We therefore performed projections of the data from our corpus. As some score slices were separable using acoustic parameters, we proposed a new entropic measurement method. It is based on self-supervised speech representations learned on the Librispeech corpus (the PASE+ model) and is inspired by the Inception Score (generally used in image processing to evaluate the quality of images generated by models). Our method allows us to produce a score similar to the severity index with a Spearman correlation of 0.87 on the reading task of the cancer corpus. The advantage of our approach is that it does not require data from the C2SI-RUGBI corpus for training. Thus, we can use the whole corpus for the evaluation of our system. The quality of our results has allowed us to consider a use in a clinical environment through an application on a tablet: tests are underway at the Larrey Hospital in Toulouse
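For context, the Inception Score mentioned in the abstract above is commonly defined as below. This is the standard definition from the image-generation literature, given here only as background; it is not the entropic measure proposed in the thesis, and the symbols (a generator distribution p_G, a pretrained classifier giving p(y | x)) are the usual notational assumptions rather than notation taken from the abstract.

```latex
\mathrm{IS}(G) = \exp\Big( \mathbb{E}_{x \sim p_G}\big[ D_{\mathrm{KL}}\big( p(y \mid x) \,\|\, p(y) \big) \big] \Big),
\qquad
p(y) = \mathbb{E}_{x \sim p_G}\big[ p(y \mid x) \big]
```

The score is high when each sample yields a confident (low-entropy) class posterior while the marginal over all samples stays diverse (high-entropy).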
Pasquet, Jérôme. "Modélisation, détection et classification d'objets urbains à partir d’images photographiques aériennes". Thesis, Montpellier, 2016. http://www.theses.fr/2016MONTT283/document.
This thesis deals with the problems of automatic localization and recognition of urban objects in high-definition aerial images. Urban object detection is a challenging problem because urban objects vary in appearance, color and size. Moreover, many urban objects can be very close to each other in an image. Given these characteristics, localizing and automatically recognizing different urban objects is very difficult, and classical image processing algorithms do not perform well. We therefore propose to use a supervised learning approach. First, we built a Support Vector Machine (SVM) network to merge different resolutions in an efficient way. However, this method greatly increases the computational cost. We then proposed to use an "activation path" which reduces the complexity without any loss of efficiency. This path activates the network sequentially and stops the exploration when an urban object has a high probability of detection. In the case of localizations based on a feature extraction step followed by a classification step, this may reduce the computational cost by a factor of 5. Thereafter, we show that we can combine an SVM network with feature maps which have been extracted by a Convolutional Neural Network. Such an architecture, associated with the activation path, increased the performance by 8% on our database while giving a theoretical reduction of the computational costs of up to 97%. We implemented all these new methods so that they can be integrated into the software framework of the Berger-Levrault company, to improve land registry for local communities
Othmani-Guibourg, Mehdi. "Supervised learning for distribution of centralised multiagent patrolling strategies". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS534.
For nearly two decades, patrolling has received significant attention from the multiagent community. Multiagent patrolling (MAP) consists in modelling a patrol task to be optimised as a multiagent system. The problem of optimising a patrol task is to distribute agents as efficiently as possible, in space and time, over the area to patrol, which constitutes a decision-making problem. A range of algorithms based on reactive, cognitive, reinforcement learning, centralised and decentralised strategies, amongst others, have been developed to make such a task ever more efficient. However, the existing patrolling-specific approaches based on supervised learning were still at preliminary stages, although a few works addressed this issue. Central to supervised learning, which is a set of methods and tools that allow inferring new knowledge, is the idea of learning a function mapping any input to an output from a sample of data composed of input-output pairs; learning, in this case, enables the system to generalise to new data never observed before. Until now, the best online MAP strategy, namely without precalculation, has turned out to be a centralised strategy with a coordinator. However, as for any centralised decision process in general, such a strategy is hardly scalable. The purpose of this work is then to develop and implement a new methodology aiming at turning any high-performance centralised strategy into a distributed strategy. Indeed, distributed strategies are by design resilient, more adaptive to changes in the environment, and scalable. In doing so, the centralised decision process, generally represented in MAP by a coordinator, is distributed into patrolling agents by means of supervised learning methods, so that each agent of the resultant distributed strategy tends to capture a part of the algorithm executed by the centralised decision process. The outcome is a new distributed decision-making algorithm based on machine learning. In this dissertation, therefore, such a procedure for distributing a centralised strategy is established, then concretely implemented using several artificial neural network architectures. After presenting the context and motivations of this work, we pose the research problem that guided our study. The main multiagent strategies devised until now as part of MAP are then described, particularly a high-performance coordinated strategy, which is the centralised strategy studied in this work, as well as a simple decentralised strategy used as a reference for decentralised strategies. Some existing strategies based on supervised learning are also described. Thereafter, the model as well as some key concepts of MAP are defined. We also define the methodology laid down to address and study this problem. This methodology comes in the form of a procedure that allows decentralising any centralised strategy by means of supervised learning. Then, the software ecosystem we developed for the needs of this work is described, particularly PyTrol, a discrete-time simulator dedicated to MAP and developed to run MAP simulations, assess strategies and generate data, and MAPTrainer, a framework built on the PyTorch machine learning library and dedicated to machine learning research in the context of MAP
Boniol, Paul. "Detection of anomalies and identification of their precursors in large data series collections". Electronic Thesis or Diss., Université Paris Cité, 2021. http://www.theses.fr/2021UNIP5206.
Extensive collections of data series are becoming a reality in a large number of scientific and social domains. There is, therefore, a growing interest in and need for efficient techniques to analyze and process these data, in fields such as finance, environmental sciences, astrophysics, neurosciences and engineering. Informally, a data series is an ordered sequence of points or values. Once these series are collected and available, users often need to query them. These queries can be simple, such as the selection of a time interval, but also complex, such as similarity search or the detection of anomalies, which are often synonymous with malfunctioning of the system under study, or with a sudden, unusual and likely undesired evolution. This last type of analysis represents a crucial problem for applications in a wide range of domains, all sharing the same objective: to detect anomalies as soon as possible to avoid critical events. Therefore, in this thesis, we address the following three objectives: (i) retrospective unsupervised subsequence anomaly detection in data series; (ii) unsupervised detection of anomalies in data streams; (iii) classification explanation of known anomalies in data series in order to identify possible precursors. This manuscript first presents the industrial context that motivated this thesis, fundamental definitions, a taxonomy of data series, and state-of-the-art anomaly detection methods. We then present our contributions along the three axes mentioned above. First, we describe two original solutions, NormA (which aims to build a weighted set of subsequences that represent the different behaviors of the data series) and Series2Graph (which transforms the data series into a directed graph), for the task of unsupervised detection of anomalous subsequences in static data series. Secondly, we present the SAND method (inspired by NormA) for unsupervised detection of anomalous subsequences in data streams. Thirdly, we address the problem of the supervised identification of precursors. We subdivide this task into two generic problems: the supervised classification of time series and the explanation of this classification's results by identifying discriminative subsequences. Finally, we illustrate the applicability and interest of our developments through an application concerning the identification of undesirable vibration precursors occurring in water supply pumps in the French nuclear power plants of EDF
Douzon, Thibault. "Language models for document understanding". Electronic Thesis or Diss., Lyon, INSA, 2023. http://www.theses.fr/2023ISAL0075.
Every day, countless documents are received and processed by companies worldwide. In an effort to reduce the cost of processing each document, the largest companies have resorted to document automation technologies. In an ideal world, a document can be automatically processed without any human intervention: its content is read, and information is extracted and forwarded to the relevant service. The state-of-the-art techniques have quickly evolved in the last decades, from rule-based algorithms to statistical models. This thesis focuses on machine learning models for document information extraction. Recent advances in model architecture for natural language processing have shown the importance of the attention mechanism. Transformers have revolutionized the field by generalizing the use of attention and by pushing self-supervised pre-training to the next level. In the first part, we confirm that transformers with appropriate pre-training are able to perform document understanding tasks with high performance. We show that, when used as a token classifier for information extraction, transformers learn the task exceptionally efficiently compared to recurrent networks. Transformers only need a small proportion of the training data to reach close to maximum performance. This highlights the importance of self-supervised pre-training for future fine-tuning. In the following part, we design specialized pre-training tasks to better prepare the model for specific data distributions such as business documents. By acknowledging the specificities of business documents, such as their table structure and their over-representation of numeric figures, we are able to target specific skills useful for the model in its future tasks. We show that those new tasks improve the model's downstream performance, even with small models. Using this pre-training approach, we are able to reach the performance of significantly bigger models without any additional cost during fine-tuning or inference. Finally, in the last part, we address one drawback of the transformer architecture, which is its computational cost when used on long sequences. We show that efficient architectures derived from the classic transformer require fewer resources and perform better on long sequences. However, due to how they approximate the attention computation, efficient models suffer from a small but significant performance drop on short sequences compared to classical architectures. This incentivizes the use of different models depending on the input length and enables concatenating multimodal inputs into a single sequence
Cappuzzo, Riccardo. "Deep learning models for tabular data curation". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS047.
Data curation is a pervasive and far-reaching topic, affecting everything from academia to industry. Current solutions rely on manual work by domain users, but they are not adequate. We investigate how to apply deep learning to tabular data curation. We focus our work on developing unsupervised data curation systems and designing curation systems that intrinsically model categorical values in their raw form. We first implement EmbDI to generate embeddings for tabular data, and address the tasks of entity resolution and schema matching. We then turn to the data imputation problem using graph neural networks in a multi-task learning framework called GRIMP
Cherti, Mehdi. "Deep generative neural networks for novelty generation : a foundational framework, metrics and experiments". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS029/document.
In recent years, significant advances made in deep neural networks enabled the creation of groundbreaking technologies such as self-driving cars and voice-enabled personal assistants. Almost all successes of deep neural networks are about prediction, whereas the initial breakthroughs came from generative models. Today, although we have very powerful deep generative modeling techniques, these techniques are essentially being used for prediction or for generating known objects (i.e., good quality images of known classes): any generated object that is a priori unknown is considered as a failure mode (Salimans et al., 2016) or as spurious (Bengio et al., 2013b). In other words, when prediction seems to be the only possible objective, novelty is seen as an error that researchers have been trying hard to eliminate. This thesis defends the point of view that, instead of trying to eliminate these novelties, we should study them and the generative potential of deep nets to create useful novelty, especially given the economic and societal importance of creating new objects in contemporary societies. The thesis sets out to study novelty generation in relationship with data-driven knowledge models produced by deep generative neural networks. Our first key contribution is the clarification of the importance of representations and their impact on the kind of novelties that can be generated: a key consequence is that a creative agent might need to re-represent known objects to access various kinds of novelty. We then demonstrate that traditional objective functions of statistical learning theory, such as maximum likelihood, are not necessarily the best theoretical framework for studying novelty generation. We propose several other alternatives at the conceptual level. A second key result is the confirmation that current models, with traditional objective functions, can indeed generate unknown objects. This also shows that even though objectives like maximum likelihood are designed to eliminate novelty, practical implementations do generate novelty. Through a series of experiments, we study the behavior of these models and the novelty they generate. In particular, we propose a new task setup and metrics for selecting good generative models. Finally, the thesis concludes with a series of experiments clarifying the characteristics of models that can exhibit novelty. Experiments show that sparsity, noise level, and restricting the capacity of the net eliminate novelty and that models that are better at recognizing novelty are also good at generating novelty
Audibert, Julien. "Unsupervised anomaly detection in time-series". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS358.
Anomaly detection in multivariate time series is a major issue in many fields. The increasing complexity of systems and the explosion of the amount of data have made its automation indispensable. This thesis proposes an unsupervised method for anomaly detection in multivariate time series called USAD. However, deep neural network methods suffer from a limitation in their ability to extract features from the data since they only rely on local information. To improve the performance of these methods, this thesis presents a feature engineering strategy that introduces non-local information. Finally, this thesis proposes a comparison of sixteen time series anomaly detection methods to understand whether the explosion in complexity of neural network methods proposed in the current literature is really necessary
Jezequel, Loïc. "Vers une détection d'anomalie unifiée avec une application à la détection de fraude". Electronic Thesis or Diss., CY Cergy Paris Université, 2023. http://www.theses.fr/2023CYUN1190.
Detecting observations that stray from a baseline case is becoming increasingly critical in many applications. It is found in fraud detection, medical imaging, video surveillance or even in manufacturing defect detection, with data ranging from images to sound. Deep anomaly detection was introduced to tackle this challenge by properly modeling the normal class, and considering anything significantly different as anomalous. Given that the anomalous class is not well-defined, classical binary classification is unsuitable and lacks robustness and reliability outside its training domain. Nevertheless, the best-performing anomaly detection approaches still lack generalization to different types of anomalies. Indeed, each method is specialized either on high-scale object anomalies or on low-scale local anomalies. In this context, we first introduce a more generic one-class pretext-task anomaly detector. The model, named OC-MQ, computes an anomaly score by learning to solve a complex pretext task on the normal class. The pretext task is composed of several sub-tasks allowing it to capture a wide variety of visual cues. More specifically, our model is made of two branches each representing discriminative and generative tasks. Nevertheless, an additional anomalous dataset is in reality often available in many applications and can provide harder edge-case anomalous examples. In this light, we explore two approaches for outlier-exposure. First, we generalize the concept of pretext task to outlier-exposure by dynamically learning the pretext task itself with normal and anomalous samples. We propose two models, SadTPS and SadRest, that respectively learn a discriminative pretext task of thin plate transform recognition and a generative task of image restoration.
In addition, we present a new anomaly-distance model SadCLR, where the training of previously unreliable anomaly-distance models is stabilized by adding contrastive regularization on the representation direction. We further enrich existing anomalies by generating several types of pseudo-anomalies. Finally, we extend the two previous approaches to be usable in both one-class and outlier-exposure settings. Firstly, we introduce the AnoMem model, which memorizes a set of multi-scale normal prototypes by using modern Hopfield layers. Anomaly distance estimators are then fitted on the deviations between the input and normal prototypes in a one-class or outlier-exposure manner. Secondly, we generalize learnable pretext tasks to be learned only using normal samples. Our proposed model HEAT adversarially learns the pretext task to be just challenging enough to keep good performance on normal samples, while failing on anomalies. Besides, we choose the recently proposed Busemann distance in the hyperbolic Poincaré ball model to compute the anomaly score. Extensive testing was conducted for each proposed method, varying from coarse and subtle style anomalies to a fraud detection dataset of face presentation attacks with local anomalies. These tests yielded state-of-the-art results, showing the significant success of our methods.
Miech, Antoine. "Large-scale learning from video and natural language". Electronic Thesis or Diss., Université Paris sciences et lettres, 2020. http://www.theses.fr/2020UPSLE059.
The goal of this thesis is to build and train machine learning models capable of understanding the content of videos. Current video understanding approaches mainly rely on large-scale manually annotated video datasets for training. However, collecting and annotating such datasets is cumbersome, expensive and time-consuming. To address this issue, this thesis focuses on leveraging large amounts of readily available but noisy annotations in the form of natural language. In particular, we exploit a diverse corpus of textual metadata such as movie scripts, web video titles and descriptions, or automatically transcribed speech obtained from narrated videos. Training video models on such readily available textual data is challenging, as such annotation is often imprecise or wrong. In this thesis, we introduce learning approaches to deal with weak annotation and design specialized training objectives and neural network architectures.
Merckling, Astrid. "Unsupervised pretraining of state representations in a rewardless environment". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS141.
This thesis seeks to extend the capabilities of state representation learning (SRL) to help scale deep reinforcement learning (DRL) algorithms to continuous control tasks with high-dimensional sensory observations (such as images). SRL improves the performance of DRL by providing it with better inputs than the input embeddings learned from scratch with end-to-end strategies. Specifically, this thesis addresses the problem of performing state estimation in the manner of deep unsupervised pretraining of state representations without reward. These representations must satisfy certain properties to allow for the correct application of bootstrapping and other decision making mechanisms common to supervised learning, such as being low-dimensional and guaranteeing the local consistency and topology (or connectivity) of the environment, which we will seek to achieve through the models pretrained with the two SRL algorithms proposed in this thesis.