Doctoral dissertations on the topic “Auto-supervisé”
The top 36 doctoral dissertations on the topic “Auto-supervisé”.
Decoux, Benoît. "Un modèle connexionniste de vision 3-D : imagettes rétiniennes, convergence stéréoscopique, et apprentissage auto-supervisé de la fusion". Rouen, 1995. http://www.theses.fr/1995ROUES056.
Lefort, Mathieu. "Apprentissage spatial de corrélations multimodales par des mécanismes d'inspiration corticale". PhD thesis, Université Nancy II, 2012. http://tel.archives-ouvertes.fr/tel-00756687.
Lefort, Mathieu. "Apprentissage spatial de corrélations multimodales par des mécanismes d'inspiration corticale". Electronic Thesis or Diss., Université de Lorraine, 2012. http://www.theses.fr/2012LORR0106.
This thesis focuses on unifying multiple modal data flows that may be provided by the sensors of an agent. This unification, inspired by psychological experiments such as the ventriloquist effect, is based on detecting correlations, defined as temporally recurrent spatial patterns that appear in the input flows. Learning the space of input-flow correlations consists of sampling this space and generalizing the learned samples. This thesis proposes several functional paradigms for multimodal data processing, leading to SOMMA (Self-Organizing Maps for Multimodal Association), a connectionist, generic, modular, and cortically inspired architecture. In this model, each modal stimulus is processed in a cortical map. The interconnection of these maps provides unified multimodal data processing. Sampling and generalization of correlations are based on the constrained self-organization of each map. The model is characterized by a gradual emergence of these functional properties: monomodal properties lead to the emergence of multimodal ones, and the learning of correlations in each map precedes the self-organization of these maps. Furthermore, the use of a connectionist architecture with online, unsupervised learning gives the data processing in SOMMA plasticity and robustness, properties that classical artificial intelligence models usually lack.
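SOMMA's constrained self-organization builds on Kohonen's self-organizing map. As a generic illustration only (not the SOMMA model, and without its inter-map coupling), one online SOM update on a 1-D map can be sketched as follows; the function name, map size, learning rate `eta`, and Gaussian neighbourhood width `sigma` are hypothetical choices.

```python
import math

# Illustrative sketch of a plain online Kohonen SOM step; names and defaults are hypothetical.
def som_step(weights, x, eta=0.5, sigma=1.0):
    """One online update on a 1-D map.

    weights: list of prototype vectors (one per map unit), updated in place.
    x: input vector. Returns the index of the best-matching unit (BMU).
    """
    # Best-matching unit: the prototype closest to x in squared Euclidean distance.
    dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x)) for w in weights]
    bmu = min(range(len(weights)), key=dists.__getitem__)
    for k, w in enumerate(weights):
        # Gaussian neighbourhood on the map grid: units close to the BMU move more.
        h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
        for d in range(len(w)):
            w[d] += eta * h * (x[d] - w[d])
    return bmu

weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
bmu = som_step(weights, [1.0, 1.0])
print(bmu)  # 2
```

The neighbourhood term is what produces topological ordering: the winner and its grid neighbours are pulled toward the same input, so nearby units end up representing similar inputs.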
Geiler, Louis. "Deep learning for churn prediction". Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7333.
The problem of churn prediction has traditionally been a field of study for marketing. However, in the wake of technological advances, more and more data can be collected to analyze customers' behaviors. This manuscript was built within this frame, with a particular focus on machine learning. We first looked at the supervised learning problem. We demonstrated that logistic regression, random forest, and XGBoost taken as an ensemble offer the best results in terms of Area Under the Curve (AUC) among a wide range of traditional machine learning approaches. We also showed that re-sampling approaches are effective only in a local setting and not in a global one. Subsequently, we aimed at fine-tuning our predictions by relying on customer segmentation. Indeed, some customers may leave a service because of a cost they deem too high, and others because of a problem with customer service. Our approach was enriched with a novel deep neural network architecture that combines auto-encoders with the k-means approach. Going further, we focused on self-supervised learning in the tabular domain. More precisely, the proposed architecture was inspired by work on the SimCLR approach, which we altered with the Mean-Teacher model from semi-supervised learning. We showed through a win matrix the superiority of our approach over the state of the art. Finally, we proposed applying what we built in this manuscript in an industrial setting, that of Brigad. We alleviated the company's churn problem with a random forest optimized through grid search and threshold optimization. We also proposed interpreting the results with SHAP (SHapley Additive exPlanations).
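SimCLR trains an encoder with a contrastive loss (NT-Xent) that pulls the two augmented views of the same sample together and pushes all other samples in the batch away. The plain-Python sketch below computes that loss for a small batch of already-encoded views; it illustrates SimCLR's objective in general, not the thesis's Mean-Teacher variant, and the function names and temperature `tau` are assumptions.

```python
import math

# Illustrative sketch of the standard NT-Xent loss; names and tau are hypothetical.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent(z1, z2, tau=0.5):
    """z1[i] and z2[i] are embeddings of two augmentations of sample i."""
    z = z1 + z2                      # 2N embeddings in one list
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)        # index of the positive view of sample i
        pos = math.exp(cosine(z[i], z[j]) / tau)
        # Denominator: every other embedding in the batch (positive included).
        denom = sum(math.exp(cosine(z[i], z[k]) / tau) for k in range(2 * n) if k != i)
        total += -math.log(pos / denom)
    return total / (2 * n)

aligned = nt_xent([[1, 0], [0, 1]], [[1, 0.1], [0.1, 1]])
swapped = nt_xent([[1, 0], [0, 1]], [[0.1, 1], [1, 0.1]])
```

When each view is paired with its true counterpart (`aligned`), the loss is lower than when the pairing is wrong (`swapped`), which is the signal the encoder is trained on.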
Zaiem, Mohamed Salah. "Informed Speech Self-supervised Representation Learning". Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAT009.
Feature learning has been driving machine learning advances, with recently proposed methods progressively getting rid of handcrafted parts in the transformations from inputs to desired labels. Self-supervised learning has emerged in this context, allowing unlabeled data to be exploited for better performance on low-label tasks. The first part of my doctoral work aims to motivate the choices in the self-supervised speech pipelines that learn the unsupervised representations. In this thesis, I first show how conditional-independence-based scoring can be used to efficiently and optimally select pretraining tasks tailored for the best performance on a target task. The second part of my doctoral work studies the evaluation and usage of pretrained self-supervised representations. First, I explore the robustness of current speech self-supervision benchmarks to changes in the downstream modeling choices. Second, I propose fine-tuning approaches for better efficiency and generalization.
Jouffroy, Emma. "Développement de modèles non supervisés pour l'obtention de représentations latentes interprétables d'images". Electronic Thesis or Diss., Bordeaux, 2024. http://www.theses.fr/2024BORD0050.
The Laser Megajoule (LMJ) is a large research facility that simulates pressure and temperature conditions similar to those found in stars. During experiments, diagnostics are guided into an experimental chamber for precise positioning. To minimize the risks associated with human error in such an experimental context, the automation of an anti-collision system is envisaged. This involves designing machine learning tools that offer reliable decision levels based on the interpretation of images from cameras positioned in the chamber. Our research focuses on probabilistic generative neural methods, in particular variational auto-encoders (VAEs). This class of models was chosen because it potentially gives access to a latent space directly linked to the properties of the objects making up the observed scene. The major challenge is to design deep network models that effectively give access to such a fully informative and interpretable representation, with a view to system reliability. If such a representation can be recovered, the probabilistic formalism intrinsic to VAEs then allows an analysis of the uncertainty of the encoded information.
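The probabilistic formalism referred to above comes from the standard VAE objective, in which the encoder outputs a Gaussian posterior over the latent space. The sketch below shows two generic VAE ingredients, assuming a diagonal Gaussian posterior and a standard normal prior: the reparameterized latent sample and the closed-form KL term of the evidence lower bound. It is not the thesis's model; the function names are hypothetical.

```python
import math
import random

# Illustrative sketch of standard VAE building blocks; function names are hypothetical.
def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Isolating the randomness in eps is the reparameterization trick: in an
    autodiff framework, gradients can then flow through mu and logvar.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))
```

The per-dimension variance `exp(logvar)` is one generic place where a VAE exposes uncertainty about the encoded information: a dimension whose posterior stays close to the prior carries little information about the input.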
Roger, Vincent. "Modélisation de l'indice de sévérité du trouble de la parole à l'aide de méthodes d'apprentissage profond : d'une modélisation à partir de quelques exemples à un apprentissage auto-supervisé via une mesure entropique". Thesis, Toulouse 3, 2022. http://www.theses.fr/2022TOU30180.
People with head and neck cancers have speech difficulties after surgery or radiation therapy. It is important for health practitioners to have a measure that reflects the severity of speech impairment. To produce this measure, a perceptual study is commonly performed with a group of five to six clinical experts. This process limits the use of this assessment in practice. Thus, an automatic measure similar to the severity index would allow better patient follow-up by making the index easier to obtain. To build such a measure, we relied on a reading task, which is classically performed. We used the recordings of the C2SI-RUGBI corpus, which includes more than 100 people. This corpus provides about one hour of recordings for modeling the severity index. In this PhD work, a review of state-of-the-art methods for speech, emotion, and speaker recognition using little data was undertaken. We then attempted to model severity using transfer learning and deep learning. Since the results were not usable, we turned to so-called "few-shot" techniques (learning from only a few examples). After encouraging first attempts at phoneme recognition, we obtained promising results for categorizing the severity of patients. Nevertheless, exploiting these results for a medical application would require improvements. We therefore performed projections of the data from our corpus. As some score ranges were separable using acoustic parameters, we proposed a new entropic measurement method. This method, inspired by the Inception Score (generally used in image processing to evaluate the quality of images generated by models), relies on self-supervised speech representations learned on the Librispeech corpus with the PASE+ model. Our method produces a score similar to the severity index, with a Spearman correlation of 0.87 on the reading task of the cancer corpus.
The advantage of our approach is that it does not require data from the C2SI-RUGBI corpus for training. Thus, the whole corpus can be used to evaluate our system. The quality of our results has allowed us to consider clinical use through a tablet application: tests are underway at the Larrey Hospital in Toulouse.
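The Inception Score that inspired this measure is computed from a classifier's predicted class distributions p(y|x) as IS = exp(E_x[KL(p(y|x) ‖ p(y))]), where p(y) is the marginal over all inputs. The sketch below computes that standard image-domain score from a list of probability vectors; how the thesis adapts it to PASE+ speech representations is not described in the abstract, so this only shows the formula it starts from, and the function name is hypothetical.

```python
import math

# Illustrative sketch of the standard Inception Score; the function name is hypothetical.
def inception_score(probs):
    """probs: list of per-sample class distributions p(y|x), each summing to 1."""
    n = len(probs)
    k = len(probs[0])
    marginal = [sum(p[c] for p in probs) / n for c in range(k)]   # p(y)
    # Mean KL divergence between each conditional p(y|x) and the marginal p(y);
    # terms with p[c] == 0 contribute 0 by the usual 0 * log 0 = 0 convention.
    kl_mean = sum(
        sum(p[c] * math.log(p[c] / marginal[c]) for c in range(k) if p[c] > 0)
        for p in probs
    ) / n
    return math.exp(kl_mean)
```

The score is high when each prediction is confident (low-entropy p(y|x)) while the predictions are diverse overall (high-entropy p(y)): two confident predictions on different classes give 2.0, and uniform predictions give 1.0.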
Sarazin, Tugdual. "Apprentissage massivement distribué dans un environnement Big Data". Thesis, Sorbonne Paris Cité, 2018. http://www.theses.fr/2018USPCD050.
In recent years, the amount of data analyzed by companies and research laboratories has increased sharply, opening the era of Big Data. However, these raw data are frequently uncategorized and difficult to use. This thesis aims to improve and ease the preprocessing and understanding of these large amounts of data by using unsupervised machine learning algorithms. The first part of this thesis is dedicated to a state of the art of clustering and biclustering algorithms and to an introduction to big data technologies. It then introduces the design of a clustering algorithm based on the Self-Organizing Map [Kohonen, 2001] for a big data environment. Our algorithm (SOM-MR) provides the same advantages as the original algorithm, namely the creation of data visualization maps based on data clusters. Moreover, it uses the Spark platform, which enables it to process large amounts of data in a short time. Thanks to the popularity of this platform, it easily fits into many data mining environments. We demonstrated this in our project “Square Predict”, carried out in partnership with Axa insurance. The aim of this project was to provide a real-time data analysis platform in order to estimate the severity of natural disasters or improve knowledge of residential risks. Throughout this project, we proved the efficiency of our algorithm through its capacity to analyze, and create visualizations from, a large volume of data coming from social networks and open data. The second part of this work is dedicated to a new biclustering algorithm. Biclustering consists of clustering observations and variables at the same time. In this contribution, we put forward a new biclustering approach based on the self-organizing maps algorithm that can scale to large amounts of data (BiTM-MR). To reach this goal, this algorithm is also based on the Spark platform.
It brings out more information than the SOM-MR algorithm because, besides producing groups of observations, it also associates variables with these groups, thus creating biclusters of variables and observations.
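The abstract does not say how SOM-MR splits the work across the cluster. One standard way to distribute SOM training in a MapReduce or Spark setting is the batch SOM: each data partition computes neighbourhood-weighted sums locally (map), the partial sums are added (reduce), and each prototype is replaced by the resulting weighted mean. The plain-Python sketch below illustrates only that decomposition; it does not use Spark, is not the SOM-MR implementation, and its function names are hypothetical.

```python
import math

# Illustrative batch-SOM sketch split into map and reduce steps; names are hypothetical.
def bmu(weights, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))

def map_partition(weights, partition, sigma):
    """Per-partition numerators and denominators of the batch SOM update."""
    m, d = len(weights), len(weights[0])
    num = [[0.0] * d for _ in range(m)]
    den = [0.0] * m
    for x in partition:
        b = bmu(weights, x)
        for k in range(m):
            h = math.exp(-((k - b) ** 2) / (2 * sigma ** 2))   # 1-D map neighbourhood
            den[k] += h
            for j in range(d):
                num[k][j] += h * x[j]
    return num, den

def reduce_and_update(weights, partials):
    """Sum partial results from all partitions and recompute every prototype."""
    m, d = len(weights), len(weights[0])
    num = [[sum(p[0][k][j] for p in partials) for j in range(d)] for k in range(m)]
    den = [sum(p[1][k] for p in partials) for k in range(m)]
    # A unit with zero total weight keeps its previous prototype.
    return [[num[k][j] / den[k] if den[k] > 0 else weights[k][j] for j in range(d)]
            for k in range(m)]

weights = [[0.0], [1.0]]
data = [[0.1], [0.2], [0.9], [1.1]]
one = reduce_and_update(weights, [map_partition(weights, data, 0.5)])
two = reduce_and_update(weights, [map_partition(weights, data[:2], 0.5),
                                  map_partition(weights, data[2:], 0.5)])
```

Because the map step only produces sums, the update does not depend on how the data are partitioned (up to floating-point rounding), which is what makes this formulation convenient for distributed platforms.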
Luce-Vayrac, Pierre. "Open-Ended Affordance Discovery in Robotics Using Pertinent Visual Features". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS670.
Scene understanding is a challenging problem in computer vision and robotics. It is traditionally addressed as an observation-only process, in which the robot acquires data on its environment through its exteroceptive sensors and processes it with specific algorithms (for example, deep neural networks in modern approaches) to produce an interpretation: 'This is a chair because this looks like a chair'. For a robot to operate properly in its environment, it needs to understand it: it needs to make sense of it in relation to its motivations and its action capacities. We believe that scene understanding requires interaction with the environment, wherein perception, action, and proprioception are integrated. The work described in this thesis explores this avenue, which is inspired by work in psychology and neuroscience showing the strong link between action and perception. The concept of affordance was introduced by James J. Gibson in 1977. It states that animals tend to perceive their environment through what they can accomplish with it (what it affords them), rather than solely through its intrinsic properties: 'This is a chair because I can sit on it.' A variety of approaches study affordances in robotics, and they largely agree on representing an affordance as a triplet (effect, (action, entity)), such that the effect is generated when the action is exerted on the entity. However, most authors use predefined features to describe the environment. We argue that building affordances on predefined features actually defeats their purpose, by limiting them to the perceptual subspace generated by these features. Furthermore, we contend that it is impracticable to predefine a set of features general enough to describe entities in open-ended environments. In this thesis, we propose and develop an approach that enables a robot to learn affordances while simultaneously building relevant features describing the environment.
To bootstrap affordance discovery, we use a classical interaction loop. The robot executes a sequence of motor controls (action a) on a part of the environment ('object' o) described using a predefined set of initial features (color and size) and observes the result (effect e). By repeating this process, a dataset of (e, (a, o)) instances is built. This dataset is then used to train a predictive model of the affordance. To learn a new feature, the same loop is used, but instead of a predefined set of descriptors of o, we use a deep convolutional neural network (CNN). The raw data (2D images) of o are used as input and the effect e as the expected output. The action is implicit, as a different CNN is trained for each specific action. The training is self-supervised, as the interaction data are produced by the robot itself. To predict the affordance correctly, the network must extract features that are directly relevant to the environment and to the motor capabilities of the robot. Any feature learned by the method can then be added to the initial set of descriptors. To achieve open-ended learning, whenever the agent executes the same action on two apparently similar objects (with respect to the currently used set of features) but does not observe the same effect, it has to assume that it lacks the relevant features to distinguish those objects with regard to this action; hence it needs to discover and learn these new features to reduce ambiguity. The robot uses the same approach to enrich its descriptor set. Several experiments on a real robotic setup showed that we can reach predictive performance similar to that of classical approaches using predefined descriptors, while avoiding their limitations.
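The open-ended trigger described above can be sketched as a check over the collected (e, (a, o)) instances: if the same action on two objects with (nearly) identical current descriptors produced different effects, the current features cannot separate those objects for that action. The function name, the tolerance `tol`, and the numeric encoding of objects as feature tuples are hypothetical illustrations, not the thesis's code.

```python
from itertools import combinations

# Illustrative sketch; the function name and feature encoding are hypothetical.
def ambiguous_pairs(instances, tol=1e-6):
    """instances: list of (effect, (action, features)) tuples.

    Returns index pairs where the same action on objects that look identical
    under the current features (within tol) led to different effects, which
    signals that a new, more discriminative feature must be learned.
    """
    out = []
    for (i, (e1, (a1, f1))), (j, (e2, (a2, f2))) in combinations(enumerate(instances), 2):
        same_look = all(abs(x - y) <= tol for x, y in zip(f1, f2))
        if a1 == a2 and same_look and e1 != e2:
            out.append((i, j))
    return out

data = [
    ("rolls", ("push", (0.8, 0.1))),    # hypothetical numeric encoding of (colour, size)
    ("stays", ("push", (0.8, 0.1))),    # same look, same action, different effect
    ("lifted", ("grasp", (0.8, 0.1))),  # different action: not compared as ambiguous
]
print(ambiguous_pairs(data))  # [(0, 1)]
```

A flagged pair is exactly the situation in which the thesis trains a new CNN feature for that action; the check itself only tells the agent that its current descriptor set is insufficient.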
Chareyre, Maxime. "Apprentissage non-supervisé pour la découverte de propriétés d'objets par découplage entre interaction et interprétation". Electronic Thesis or Diss., Université Clermont Auvergne (2021-...), 2023. http://www.theses.fr/2023UCFA0122.
Robots are increasingly used to achieve tasks in controlled environments. However, their use in open environments is still fraught with difficulties. Robotic agents are likely to encounter objects whose behaviour and function they are unaware of. In some cases, the agent must interact with these elements to carry out its mission by collecting or moving them, but without knowledge of their dynamic properties it cannot implement an effective strategy for accomplishing the mission. In this thesis, we present a method for teaching an autonomous robot a physical interaction strategy with unknown objects, without any a priori knowledge, the aim being to extract information about as many of the object's physical properties as possible from the interactions observed by its sensors. Existing methods for characterising objects through physical interactions do not fully satisfy these criteria. Indeed, the interactions established only provide an implicit representation of the object's dynamics, requiring supervision to identify their properties. Furthermore, the proposed solutions rely on unrealistic scenarios without an agent. Our approach differs from the state of the art by proposing a generic method for learning interaction that is independent of the object and its properties, and can therefore be decoupled from the prediction phase. In particular, this leads to a completely unsupervised global pipeline. In the first phase, we propose to learn an interaction strategy with the object via an unsupervised reinforcement learning method, using an intrinsic motivation signal based on the idea of maximising variations in a state vector of the object. The aim is to obtain a set of interactions containing information that is highly correlated with the object's physical properties.
This method has been tested on a simulated robot interacting by pushing and has enabled properties such as the object's mass, shape, and friction to be accurately identified. In a second phase, we assume that the true physical properties define a latent space that explains the object's behaviours and that this space can be identified from observations collected through the agent's interactions. We set up a self-supervised prediction task in which we adapt a state-of-the-art architecture to create this latent space. Our simulations confirm that combining the behavioural model with this architecture leads to the emergence of a representation of the object's properties whose principal components are strongly correlated with the object's physical properties. Once the properties of the objects have been extracted, the agent can use them to improve its efficiency in tasks involving these objects. We conclude this study by highlighting the performance gains achieved by the agent, trained via reinforcement learning, on a simplified object repositioning task where the properties are perfectly known. All the work carried out in simulation confirms the effectiveness of an innovative method aimed at autonomously discovering the physical properties of an object through a robot's physical interactions. Prospects for extending this work include transferring it to a real robot in a cluttered environment.
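The intrinsic motivation signal of the first phase is described only as maximising variations in a state vector of the object. One simple reading of that idea, assumed here purely for illustration, is a per-step reward equal to the Euclidean norm of the change in the object's state, so that interactions which move, rotate, or otherwise change the object are rewarded. The function names, the discount `gamma`, and the contents of the state vector are hypothetical.

```python
import math

# Illustrative sketch of a variation-based intrinsic reward; all names are hypothetical.
def variation_reward(prev_state, next_state):
    """Intrinsic reward: Euclidean norm of the change in the object's state vector."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(prev_state, next_state)))

def episode_return(states, gamma=0.99):
    """Discounted sum of intrinsic rewards along a trajectory of object states."""
    return sum(gamma ** t * variation_reward(s, s_next)
               for t, (s, s_next) in enumerate(zip(states, states[1:])))
```

Under this reward an interaction that leaves the object untouched earns nothing, so a policy maximising it is pushed toward interactions whose outcomes reveal the object's dynamics; whether the thesis uses this exact norm is not stated.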
Schutz, Georges. "Adaptations et applications de modèles mixtes de réseaux de neurones à un processus industriel". PhD thesis, Université Henri Poincaré - Nancy I, 2006. http://tel.archives-ouvertes.fr/tel-00115770.
[…] artificial neural networks to improve the control of complex industrial processes, which are characterized in particular by their temporal aspect. The main motivations for processing time series are data volume reduction, indexing for similarity search, sequence localization, knowledge extraction (data mining), and prediction. The industrial process chosen is an electric arc furnace for the production of liquid steel in Luxembourg. Our approach is a predictive control concept and relies on unsupervised learning methods with the aim of knowledge extraction. Our coding method is based on primitive shapes that make up the signals. These shapes, which form a coding alphabet, are extracted by an unsupervised method, Kohonen self-organizing maps (SOM). A method for validating the coding alphabets accompanies the approach. An important topic addressed in this research is the similarity of time series. The proposed method is unsupervised and can handle sequences of varying lengths.
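The coding step described above turns a signal into a sequence of symbols drawn from an alphabet of primitive shapes. Assuming the alphabet has already been learned (for example as SOM prototypes) and that signals are cut into fixed-length, non-overlapping windows, encoding reduces to a nearest-prototype lookup. The window length, the toy alphabet, and the function name are hypothetical, not the thesis's settings.

```python
# Illustrative sketch of nearest-prototype symbolic coding; names are hypothetical.
def encode(signal, alphabet, width):
    """Map each non-overlapping window of `signal` to the index of its nearest shape.

    alphabet: list of prototype windows (each of length `width`), e.g. SOM prototypes.
    A trailing window shorter than `width` is dropped.
    """
    codes = []
    for start in range(0, len(signal) - width + 1, width):
        window = signal[start:start + width]
        dists = [sum((w - p) ** 2 for w, p in zip(window, proto)) for proto in alphabet]
        codes.append(min(range(len(alphabet)), key=dists.__getitem__))
    return codes

alphabet = [[0, 0, 0], [0, 1, 2], [2, 1, 0]]   # flat, rising, falling
print(encode([0, 0, 0, 0, 1, 2, 2, 1, 0, 5], alphabet, 3))  # [0, 1, 2]
```

Each window of `width` samples becomes a single symbol, which is one way such a coding can serve the data-volume-reduction and similarity-search motivations listed above.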
Belhadj, Djedjiga. "Multi-GAT semi-supervisé pour l’extraction d’informations et son adaptation au chiffrement homomorphe". Electronic Thesis or Diss., Université de Lorraine, 2024. http://www.theses.fr/2024LORR0023.
This thesis is being carried out as part of the BPI DeepTech project, in collaboration with the company Fair&Smart, which is primarily concerned with protecting personal data in accordance with the General Data Protection Regulation (GDPR). In this context, we propose a deep neural model for extracting information from semi-structured administrative documents (SSDs). Due to the lack of public training datasets, we propose an artificial SSD generator that can produce several classes of documents with wide variation in content and layout. Documents are generated using random variables to manage content and layout, while respecting constraints aimed at ensuring their similarity to real documents. Metrics were introduced to evaluate the content and layout diversity of the generated SSDs. The evaluation showed that the generated datasets for three SSD types (payslips, receipts, and invoices) present a high level of diversity, thus avoiding overfitting when training information extraction systems. Given the specific format of SSDs, which consist mainly of word pairs (keyword, information) located in spatially close neighborhoods, the document is modeled as a graph in which nodes represent words and edges represent neighborhood connections. The graph is fed into a multi-layer graph attention network (Multi-GAT), which applies the multi-head attention mechanism to learn the importance of each word's neighbors in order to classify it better. A first version of this model was used in supervised mode and obtained an F1 score of 96% on two generated invoice and payslip datasets, and 89% on a real receipt dataset (SROIE). We then enriched the Multi-GAT with a multimodal embedding of word-level information (textual, visual, and positional) and combined it with a variational graph auto-encoder (VGAE). This model operates in semi-supervised mode, learning from labeled and unlabeled data simultaneously.
To further optimize graph node classification, we proposed a semi-VGAE whose encoder shares its first layers with the Multi-GAT classifier. This is reinforced by a proposed VGAE loss function managed by the classification loss. Using a small unlabeled dataset, we improved the F1 score obtained on a generated invoice dataset by over 3%. Since the model is intended to operate in a protected environment, we adapted its architecture to homomorphic encryption. We studied a method for reducing the dimensionality of the Multi-GAT model, and then proposed a polynomial approximation approach for the model's nonlinear functions. To reduce the dimensionality of the model, we proposed a multimodal feature fusion method that requires few additional parameters and reduces the dimensions of the model while improving its performance. For the encryption adaptation, we studied low-degree polynomial approximations of nonlinear functions, using knowledge distillation and fine-tuning techniques to better adapt the model to the new approximations. We reduced the approximation loss by around 3% on two invoice datasets and one payslip dataset, and by 5% on SROIE.
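Homomorphic encryption schemes typically evaluate only additions and multiplications, which is why nonlinear activations are replaced by low-degree polynomials. The sketch below shows that replacement under stated assumptions: a least-squares fit of a degree-2 polynomial to ReLU on a grid over [-1, 1], evaluated with Horner's rule. The thesis's actual target functions, input interval, degree, and its distillation-based fine-tuning are not reproduced here, and the function names are hypothetical.

```python
# Illustrative sketch of a low-degree polynomial stand-in for ReLU; names are hypothetical.
def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (small degrees only)."""
    n = degree + 1
    # Normal equations: M[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    m = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
            b[r] -= f * b[col]
    # Back-substitution on the upper-triangular system.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        coeffs[r] = (b[r] - sum(m[r][c] * coeffs[c] for c in range(r + 1, n))) / m[r][r]
    return coeffs  # coeffs[k] multiplies x**k

def poly(coeffs, x):
    """Evaluate with additions and multiplications only (Horner's rule)."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

xs = [i / 100 - 1 for i in range(201)]      # grid on [-1, 1]
relu = [max(0.0, x) for x in xs]
c = fit_poly(xs, relu, 2)
max_err = max(abs(poly(c, x) - r) for x, r in zip(xs, relu))
```

The fitted polynomial is only accurate on the interval it was fitted on, which is one reason such approximations are typically paired with fine-tuning or distillation so that the network's activations stay within that range.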
Khacef, Lyes. "Exploration du calcul bio-inspiré avec des architectures neuromorphiques auto-organisées". Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4085.
The brain's cortical plasticity is one of the main features that enable our capability to learn and adapt to our environment. Indeed, the cerebral cortex can self-organize through two distinct forms of plasticity: structural plasticity, which creates (sprouting) or cuts (pruning) synaptic connections between neurons, and synaptic plasticity, which modifies synaptic connection strength. These mechanisms very likely underlie an extremely interesting characteristic of human brain development: multimodal association. Despite the diversity of sensory modalities such as sight, sound, and touch, the brain arrives at the same concepts. Moreover, biological observations show that one modality can activate the internal representation of another modality when both are correlated. To model such behavior, Edelman and Damasio proposed the Reentry and the Convergence-Divergence Zone frameworks, respectively, in which bidirectional neural communication can lead to both multimodal fusion (convergence) and inter-modal activation (divergence). Nevertheless, these theoretical frameworks do not provide a computational model at the neuron level. The objective of this thesis is first to explore the foundations of brain-inspired self-organization in terms of (1) multimodal unsupervised learning, (2) massively parallel, distributed, and local computing, and (3) extremely energy-efficient processing. Based on these guidelines and a review of the neural models in the literature, we choose the Self-Organizing Map (SOM) proposed by Kohonen as the main component of our system.
We introduce the Iterative Grid, a fully distributed architecture with local connectivity among hardware neurons, which enables cellular computing in the SOM and thus a system that is scalable in terms of processing time and connectivity complexity. Then, we assess the performance of the SOM on the problem of post-labeled unsupervised learning: no labels are available during training, and then very few labels are available for naming the SOM neurons. We propose and compare different labeling methods to minimize the number of labels while keeping the best accuracy. We compare our performance to a different approach using Spiking Neural Networks (SNNs) with Spike-Timing-Dependent Plasticity (STDP) learning. Next, we propose to improve the SOM's performance by using extracted features instead of raw data. We conduct a comparative study of SOM classification accuracy with unsupervised feature extraction on the MNIST dataset using two approaches: a machine learning approach with Sparse Convolutional Auto-Encoders trained by gradient-based learning, and a neuroscience approach with SNNs using STDP learning. To prove the SOM's ability to handle more complex datasets, we use transfer learning on the mini-ImageNet few-shot classification benchmark, exploiting a Wide Residual Network backbone trained on a base dataset as a feature extractor, and then use the SOM to classify the features obtained from the target dataset. Finally, we move to the multimodal association mechanism. We build the Reentrant SOM (ReSOM), a brain-inspired neural system based on the Reentry principles using SOMs and Hebbian-like learning. We propose and compare different computational methods for multimodal unsupervised learning and inference, then quantify the gain of both convergence and divergence mechanisms on three multimodal datasets. The divergence mechanism is used to label one modality based on the other, while the convergence mechanism is used to improve the overall accuracy of the system.
We compare our results to SNNs with STDP learning and different fusion strategies, then show the gain from the so-called hardware plasticity induced by our model, where the system's topology is not fixed by the user but learned through the system's experience via self-organization.
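The post-labeled setting described above can be sketched as follows: after unsupervised training, each SOM neuron receives the majority label of the few labeled samples for which it is the best-matching unit, and a new input is classified by the label of its best-matching neuron. This majority-vote rule is only one simple labeling method, assumed here for illustration (the thesis compares several), and the function names are hypothetical.

```python
from collections import Counter

# Illustrative sketch of majority-vote post-labeling of SOM neurons; names are hypothetical.
def bmu(weights, x):
    """Index of the neuron whose prototype is closest to x."""
    return min(range(len(weights)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(weights[k], x)))

def label_neurons(weights, labeled):
    """labeled: list of (vector, label). Returns {neuron index: majority label}."""
    votes = {}
    for x, y in labeled:
        votes.setdefault(bmu(weights, x), Counter())[y] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}

def classify(weights, neuron_labels, x):
    """Label of the best-matching neuron, or None if that neuron never received a label."""
    return neuron_labels.get(bmu(weights, x))

weights = [[0.0, 0.0], [1.0, 1.0]]   # prototypes from a (hypothetical) trained SOM
labels = label_neurons(weights, [([0.1, 0.0], "a"), ([0.9, 1.0], "b")])
print(classify(weights, labels, [0.2, 0.1]))  # a
```

With very few labels, some neurons may never be a best-matching unit for a labeled sample and stay unlabeled (`None` here); handling those neurons is one of the places where labeling methods differ.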
Li, Chuyuan. "Facing Data Scarcity in Dialogues for Discourse Structure Discovery and Prediction". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0107.
A document is more than a random combination of sentences. It is, instead, a cohesive entity where sentences interact with each other to create a coherent structure and convey specific communicative goals. The field of discourse examines the organization of sentences within a document, aiming to reveal its underlying structural information. Discourse analysis plays a crucial role in Natural Language Processing (NLP) and has demonstrated its usefulness in various downstream applications such as summarization and question answering. Existing research efforts have focused on automatically extracting discourse structures through tasks such as discourse relation identification and discourse parsing. However, these data-driven methods have predominantly been applied to monologue scenarios, limiting the availability and generalizability of discourse parsers for dialogues. In this thesis, we address this challenging problem, discourse analysis in dialogues, which presents unique difficulties due to the scarcity of suitable annotated data. We approach discourse analysis along two research lines: “Discourse Feature Discovery” and “Discourse Structure Prediction”. In the first research line, we conduct experiments to investigate linguistic markers, both lexical and non-lexical, in text classification tasks. We are particularly interested in the context of mental disorder identification, since it reflects a realistic scenario. To address the issue of data sparsity, we propose techniques for enhancing data representation and feature engineering. Our results demonstrate that non-lexical and discourse-level (even if shallow) features are reliable indicators for developing more general and robust classifiers. In the second research line, our objective is to directly predict the discourse structure of a given document. We adopt the Segmented Discourse Representation Theory (SDRT) framework, which represents a document as a graph.
The task of extracting this graph-like structure using machine learning techniques is commonly known as discourse parsing. Taking inspiration from recent studies that investigate the inner workings of Transformer-based models (“BERTology”), we leverage discourse information encoded in Pre-trained Language Models (PLMs) such as Bidirectional Encoder Representations from Transformers (BERT) and propose innovative extraction methods that require minimal supervision. Our discourse parsing approach involves two steps: first, we predict the discourse structure, and then we identify the relations within the structure. This two-stage process allows a comprehensive analysis of the parser's performance at each stage. Using self-supervised learning strategies, our parser achieves encouraging results for full parsing. We conduct extensive analyses to evaluate the parser's performance across different discourse structures and propose directions for future improvements.
Mehr, Éloi. "Unsupervised Learning of 3D Shape Spaces for 3D Modeling". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS566.
Even though 3D data is becoming increasingly popular, especially with the democratization of virtual and augmented reality experiences, it remains very difficult to manipulate a 3D shape, even for designers or experts. Given a database containing 3D instances of one or several categories of objects, we want to learn the manifold of plausible shapes in order to develop new intelligent 3D modeling and editing tools. However, this manifold is often much more complex than in the 2D domain. Indeed, 3D surfaces can be represented using various embeddings and may also exhibit different alignments and topologies. In this thesis, we study the manifold of plausible shapes in light of these challenges, from three different points of view. First, we consider the manifold as a quotient space, in order to learn the shapes' intrinsic geometry from a dataset in which the 3D models are not co-aligned. Then, we assume that the manifold is disconnected, which leads to a new deep learning model that is able to automatically cluster and learn the shapes according to their typology. Finally, we study the conversion of an unstructured 3D input into an exact geometry, represented as a structured tree of continuous solid primitives.
Banville, Hubert. "Enabling real-world EEG applications with deep learning". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG005.
Our understanding of the brain has improved considerably in the last decades, thanks to groundbreaking advances in the field of neuroimaging. Now, with the invention and wider availability of personal wearable neuroimaging devices, such as low-cost mobile EEG, we have entered an era in which neuroimaging is no longer constrained to traditional research labs or clinics. "Real-world" EEG comes with its own set of challenges, though, ranging from a scarcity of labelled data to unpredictable signal quality and limited spatial resolution. In this thesis, we draw on the field of deep learning to help transform this century-old brain imaging modality from a purely clinical and research tool into a practical technology that can benefit individuals in their day-to-day lives. First, we study how unlabelled EEG data can be used, through self-supervised learning, to gain insights and improve performance on common clinical learning tasks. We present three such self-supervised approaches that rely on the temporal structure of the data itself, rather than on onerously collected labels, to learn clinically relevant representations. Through experiments on large-scale datasets of sleep and neurological screening recordings, we demonstrate the significance of the learned representations and show how unlabelled data can help boost performance in a semi-supervised scenario. Next, we explore ways to make neural networks robust to the strong sources of noise often found in out-of-the-lab EEG recordings. Specifically, we present Dynamic Spatial Filtering, an attention module that allows a network to dynamically focus its processing on the most informative EEG channels while de-emphasizing corrupted ones.
Experiments on large-scale datasets and real-world data demonstrate that, on sparse EEG, the proposed attention block handles strong corruption better than an automated noise-handling approach, and that the predicted attention maps can be interpreted to inspect the functioning of the neural network. Finally, we investigate how weak labels can be used to develop a biomarker of neurophysiological health from real-world EEG. We translate the brain age framework, originally developed using lab- and clinic-based magnetic resonance imaging, to real-world EEG data. Using recordings from more than a thousand individuals performing a focused attention exercise or sleeping overnight, we show not only that age can be predicted from wearable EEG, but also that age predictions encode information contained in well-known brain health biomarkers but not in chronological age. Overall, this thesis brings us a step closer to harnessing EEG for neurophysiological monitoring outside of traditional research and clinical contexts, and opens the door to new and more flexible applications of this technology.
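The core idea behind an attention module over EEG channels, producing softmax weights so that corrupted channels contribute less, can be sketched without a neural network. In this sketch the per-channel score is a hypothetical hand-crafted rule (distance of a channel's log-variance from the median), standing in for the learned scoring network of Dynamic Spatial Filtering; it is not the thesis's module.

```python
import math
from statistics import median, pvariance
from typing import List


def channel_attention(x: List[List[float]], temperature: float = 1.0) -> List[float]:
    """Softmax attention weights over EEG channels.

    x[c] is the signal of channel c. The score is a hypothetical stand-in for
    a learned network: channels whose log-variance is far from the median
    (flat or very noisy channels) get low scores.
    """
    logvar = [math.log(pvariance(ch) + 1e-12) for ch in x]
    ref = median(logvar)
    scores = [-abs(v - ref) / temperature for v in logvar]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    return [e / total for e in exps]


def reweight(x: List[List[float]], w: List[float]) -> List[List[float]]:
    """Scale each channel by its weight times the channel count (uniform w is a no-op)."""
    n = len(x)
    return [[n * wc * v for v in ch] for ch, wc in zip(x, w)]
```

With three clean channels and one channel whose amplitude is a hundred times larger, the outlying channel receives a weight close to zero while the clean channels share the rest.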
Chéhab, L'Émir Omar. "Advances in Self-Supervised Learning : applications to neuroscience and sample-efficiency". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG079.
Self-supervised learning has gained popularity as a method for learning from unlabeled data. Essentially, it involves creating and then solving a prediction task using the data, such as reordering shuffled data. In recent years, this approach has been successful in training neural networks to learn useful representations from data without any labels. However, our understanding of what is actually being learned, and how well it is learned, is still somewhat limited. This document contributes to our understanding of self-supervised learning in these two key aspects. Empirically, we address the question of what is learned. We design prediction tasks specifically tailored to learning from brain recordings with magnetoencephalography (MEG) or electroencephalography (EEG). These prediction tasks share a common objective: recognizing temporal structure within the brain data. Our results show that representations learned by solving these tasks contain interpretable cognitive and clinical neurophysiological features. Theoretically, we explore the quality of the learning procedure. Our focus is on a specific category of prediction tasks: binary classification. We extend prior research that has highlighted the utility of binary classification for statistical inference, although it may trade some statistical efficiency for computational efficiency. Our contributions aim to improve statistical efficiency. We theoretically analyze the statistical estimation error and identify situations in which it can be provably reduced. Specifically, we characterize optimal hyperparameters of the binary classification task and also prove that the popular heuristic of "annealing" can lead to more efficient estimation, even in high dimensions.
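The notion of creating a prediction task from the data itself, for instance recognizing whether data has been shuffled, can be sketched as a label generator for an order-verification task. This assumes a single-channel recording; the window length, the three-window layout and the 50/50 class balance are illustrative choices, not the tasks designed in the thesis.

```python
import itertools
import random
from typing import List, Sequence, Tuple

# All orderings of three windows except the temporal one.
_SHUFFLES = [p for p in itertools.permutations(range(3)) if p != (0, 1, 2)]


def order_pretext_samples(
    signal: Sequence[float],
    win: int,
    n_samples: int,
    rng: random.Random,
) -> List[Tuple[List[List[float]], int]]:
    """Build a binary pretext task from an unlabeled 1-D signal.

    Each sample holds three non-overlapping windows. Label 1: windows in
    temporal order. Label 0: the same windows in a shuffled order.
    No human annotation is involved.
    """
    n_windows = len(signal) // win
    if n_windows < 3:
        raise ValueError("signal too short for three windows")
    samples = []
    for _ in range(n_samples):
        starts = sorted(rng.sample(range(n_windows), 3))
        windows = [list(signal[s * win:(s + 1) * win]) for s in starts]
        if rng.random() < 0.5:
            samples.append((windows, 1))
        else:
            perm = rng.choice(_SHUFFLES)
            samples.append(([windows[i] for i in perm], 0))
    return samples
```

A classifier trained on these pairs can only succeed by picking up temporal structure in the signal, which is the kind of representation the abstract aims to learn.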
Ozcelik, Furkan. "Déchiffrer le langage visuel du cerveau : reconstruction d'images naturelles à l'aide de modèles génératifs profonds à partir de signaux IRMf". Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSES073.
The great minds of humanity have always been curious about the nature of mind, brain, and consciousness. Through physical and thought experiments, they tried to tackle challenging questions about visual perception. As neuroimaging techniques were developed, neural encoding and decoding techniques provided a profound understanding of how we process visual information. Advances in Artificial Intelligence and Deep Learning have also influenced neuroscientific research. With the emergence of deep generative models such as Variational Autoencoders (VAE), Generative Adversarial Networks (GAN) and Latent Diffusion Models (LDM), researchers have also applied these models to neural decoding tasks such as the visual reconstruction of perceived stimuli from neuroimaging data. This thesis provides two frameworks for reconstructing perceived stimuli from neuroimaging data, particularly fMRI data, using deep generative models. These frameworks focus on different aspects of the visual reconstruction task than their predecessors, and hence may bring valuable outcomes for future studies. The first study of the thesis (described in Chapter 2) uses a particular generative model called IC-GAN to capture both the semantic and realistic aspects of the visual reconstruction. The second study (described in Chapter 3) brings a new perspective on visual reconstruction by fusing decoded information from different modalities (e.g. text and image) using recent latent diffusion models. These studies achieved state-of-the-art results on their benchmarks, producing high-fidelity reconstructions of different attributes of the stimuli. In both studies, we propose region-of-interest (ROI) analyses to understand the functional properties of specific visual regions using our neural decoding models.
Statistical relations between ROIs and decoded latent features show that while early visual areas carry more information about low-level features (which focus on the layout and orientation of objects), higher visual areas are more informative about high-level semantic features. We also observed that ROI-optimal images generated with these visual reconstruction frameworks capture functional selectivity properties of the ROIs that have been examined in many prior neuroscientific studies. By providing the results of two visual reconstruction frameworks and ROI analyses, this thesis aims to offer valuable insights for future studies in neural decoding, visual reconstruction, and neuroscientific exploration using deep learning models. Its findings and contributions may help researchers working in cognitive neuroscience and have implications for brain-computer interface applications.
Bojko, Adrian. "Self-supervised Dynamic SLAM : Tackling Consensus Inversions". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG031.
The ability to self-localize is essential for autonomous vehicles, robots, mixed reality and, more generally, systems that interact with their environment. When maps are not available, SLAM (Simultaneous Localization and Mapping) algorithms create a map of the environment and at the same time locate the system within it. A popular sensor is the camera, which passively provides a visual representation of the environment at low cost; for this reason, it is the sensor we use in this thesis. SLAM in dynamic environments, or Dynamic SLAM, is challenging because the algorithm must continuously perceive which parts of the image are fixed with respect to the frame of reference the user wants, usually the ground. Problems arise when the assumptions SLAM algorithms rely on become invalid. A notable case is the Motion Consensus Inversion (MCI): when most of an image is made up of moving objects, the SLAM algorithm does not use the correct frame of reference and fails. Another is excessive masking: some SLAM algorithms remove from images (i.e., mask) all objects that might be dynamic, even if they are not moving, and consequently fail if images become empty. More generally, the user may need to use a SLAM algorithm in an unsupported context. In fact, the gap between what the user needs and what SLAM algorithms do is a blind spot in SLAM research and the cause of issues like motion consensus inversions, which are themselves seldom discussed in the literature. Hence, instead of making a more general SLAM algorithm, we propose a SLAM algorithm that adapts to new environments through automated self-supervised training: it automatically learns which parts of a scene may not be fixed with respect to the user's desired frame of reference, and when they are fixed or not.
The user provides unlabeled training videos, and our SLAM automatically learns what to do from them. In the first part of this document, we present the state of the art of algorithms for SLAM and Dynamic SLAM, reference datasets and metrics. We detail the challenges of Dynamic SLAM and robustness evaluation. Current SLAM datasets and metrics are also subject to the user-need gap, so we propose our own. Our datasets are the first to explicitly include video sequences with motion consensus inversions or excessive masking, and our metric is more general than the usual accuracy metrics, which are misleading in very difficult scenarios. In the second part, we explore the relation between image features and SLAM performance, and from this work we present a novel self-supervised Dynamic SLAM that learns which objects to mask, using SLAM outliers. Outliers are features rejected during the standard SLAM process: we observed that outliers on moving objects have unique properties in easy dynamic sequences. Thus, we locate dynamic objects using outliers and learn to segment them, so that we can mask dynamic objects in sequences of any difficulty at runtime. Finally, we present a self-supervised approach that learns when to mask objects: Dynamic SLAM with Temporal Masking. Leveraging an existing method to mask objects, it automatically learns when to mask objects of certain classes. It automatically annotates every frame of the training sequences with masking decisions (whether or not to mask objects), then learns the circumstances that led to these decisions with a memory-based network. Unlike other SLAM algorithms, we make no geometric assumptions. At runtime, the memory-based approach prevents motion consensus inversions and excessive masking, which is hardly possible when relying on geometric methods. The results of this thesis show that a self-supervised Dynamic SLAM is a viable approach to tackle motion consensus inversions.
More generally, self-supervision is the key to making SLAM adapt to user needs. We surpassed the state of the art in terms of robustness, in addition to clarifying blind spots of the literature in Dynamic SLAM robustness evaluation.
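The observation that SLAM outliers concentrate on moving objects in easy dynamic sequences can be illustrated with a per-object outlier ratio. This minimal sketch assumes features have already been assigned to object instances; the thresholds and the rule itself are hypothetical. In the thesis, such detections serve to learn a segmentation of dynamic objects, a training step this sketch does not cover.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def dynamic_objects(
    features: Iterable[Tuple[Optional[int], bool]],
    min_features: int = 5,
    outlier_ratio: float = 0.5,
) -> Set[int]:
    """Flag objects whose tracked features are mostly SLAM outliers.

    Each feature is (object_id, is_outlier); object_id is None for features
    that lie on no segmented object. Both thresholds are hypothetical.
    """
    total: Dict[int, int] = defaultdict(int)
    outliers: Dict[int, int] = defaultdict(int)
    for obj, is_outlier in features:
        if obj is None:
            continue  # background features say nothing about objects
        total[obj] += 1
        outliers[obj] += int(is_outlier)
    return {
        obj for obj, n in total.items()
        # require enough evidence before calling an object dynamic
        if n >= min_features and outliers[obj] / n >= outlier_ratio
    }
```

The `min_features` guard keeps objects with only a handful of tracked points from being flagged on weak evidence.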
Jezequel, Loïc. "Vers une détection d'anomalie unifiée avec une application à la détection de fraude". Electronic Thesis or Diss., CY Cergy Paris Université, 2023. http://www.theses.fr/2023CYUN1190.
Detecting observations that stray from a baseline case is becoming increasingly critical in many applications. It arises in fraud detection, medical imaging, video surveillance and even manufacturing defect detection, with data ranging from images to sound. Deep anomaly detection was introduced to tackle this challenge by properly modeling the normal class and considering anything significantly different as anomalous. Because the anomalous class is not well defined, classical binary classification is unsuitable and lacks robustness and reliability outside its training domain. Nevertheless, the best-performing anomaly detection approaches still lack generalization to different types of anomalies: each method is specialized either on high-scale object anomalies or on low-scale local anomalies. In this context, we first introduce a more generic one-class pretext-task anomaly detector. The model, named OC-MQ, computes an anomaly score by learning to solve a complex pretext task on the normal class. The pretext task is composed of several sub-tasks, allowing it to capture a wide variety of visual cues. More specifically, our model is made of two branches, representing discriminative and generative tasks respectively. However, in many applications an additional anomalous dataset is in practice available and can provide harder edge-case anomalous examples. In this light, we explore two approaches for outlier exposure. First, we generalize the concept of pretext task to outlier exposure by dynamically learning the pretext task itself with normal and anomalous samples. We propose two models, SadTPS and SadRest, which respectively learn a discriminative pretext task of thin-plate transform recognition and a generative task of image restoration.
In addition, we present a new anomaly-distance model, SadCLR, in which the training of previously unreliable anomaly-distance models is stabilized by adding contrastive regularization on the representation direction. We further enrich existing anomalies by generating several types of pseudo-anomalies. Finally, we extend the two previous approaches so that they can be used in both one-class and outlier-exposure settings. First, we introduce the AnoMem model, which memorizes a set of multi-scale normal prototypes using modern Hopfield layers. Anomaly distance estimators are then fitted on the deviations between the input and the normal prototypes in a one-class or outlier-exposure manner. Second, we generalize learnable pretext tasks so that they can be learned using only normal samples. Our proposed model HEAT adversarially learns the pretext task to be just challenging enough to keep good performance on normal samples while failing on anomalies. In addition, we use the recently proposed Busemann distance in the hyperbolic Poincaré ball model to compute the anomaly score. Extensive testing was conducted for each proposed method, ranging from coarse and subtle style anomalies to a fraud detection dataset of face presentation attacks with local anomalies. These tests yielded state-of-the-art results, demonstrating the effectiveness of our methods.
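The abstract computes anomaly scores with the Busemann distance in the Poincaré ball model. For an ideal point p on the boundary sphere (||p|| = 1) and an embedding z strictly inside the unit ball, the Busemann function has the closed form B_p(z) = log(||p - z||^2 / (1 - ||z||^2)). How HEAT turns this quantity into a score (for example, which ideal point stands for the normal class) is not stated in the abstract, so this sketch only evaluates the function itself.

```python
import math
from typing import Sequence


def busemann(p: Sequence[float], z: Sequence[float]) -> float:
    """Busemann function of ideal point p at z in the Poincare ball.

    B_p(z) = log(||p - z||^2 / (1 - ||z||^2)). It equals 0 at the origin,
    tends to -inf as z approaches p and grows as z moves away from p.
    """
    norm_z2 = sum(v * v for v in z)
    if norm_z2 >= 1.0:
        raise ValueError("z must lie strictly inside the unit ball")
    diff2 = sum((a - b) ** 2 for a, b in zip(p, z))
    return math.log(diff2 / (1.0 - norm_z2))
```

For p = (1, 0), the point z = (0.5, 0) gives log(0.25 / 0.75) = -log 3, and its mirror image z = (-0.5, 0) gives +log 3.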
Zheng, Léon. "Frugalité en données et efficacité computationnelle dans l'apprentissage profond". Electronic Thesis or Diss., Lyon, École normale supérieure, 2024. http://www.theses.fr/2024ENSL0009.
This thesis focuses on two challenges of frugality and efficiency in modern deep learning: data frugality and computational resource efficiency. First, we study self-supervised learning, a promising approach in computer vision that does not require data annotations for learning representations. In particular, we propose a unification of several self-supervised objective functions under a framework based on rotation-invariant kernels, which opens up prospects for reducing the computational cost of these objective functions. Second, given that matrix multiplication is the predominant operation in deep neural networks, we focus on the construction of fast algorithms that allow matrix-vector multiplication with nearly linear complexity. More specifically, we examine the problem of sparse matrix factorization under the constraint of butterfly sparsity, a structure common to several fast transforms such as the discrete Fourier transform. The thesis establishes new theoretical guarantees for butterfly factorization algorithms and explores the potential of butterfly sparsity to reduce the computational costs of neural networks during training or inference. In particular, we explore the efficiency of GPU implementations of butterfly sparse matrix multiplication, with the goal of truly accelerating sparse neural networks.
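Butterfly sparsity is the structure behind fast transforms. For N a power of two, the N-point DFT matrix factors into a bit-reversal permutation followed by log2(N) sparse butterfly factors, each with two nonzeros per row. The sketch below applies that classic radix-2 factorization and checks it against the dense DFT. It only illustrates the structure; it is not one of the butterfly factorization algorithms studied in the thesis.

```python
import cmath
from typing import List


def dft(x: List[complex]) -> List[complex]:
    """Dense O(N^2) DFT: y[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


def bit_reverse(x: List[complex]) -> List[complex]:
    """Permutation factor: move entry i to the index with i's bits reversed."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r = 0
        for b in range(bits):
            if (i >> b) & 1:
                r |= 1 << (bits - 1 - b)
        out[r] = x[i]
    return out


def butterfly_fft(x: List[complex]) -> List[complex]:
    """DFT as a bit-reversal permutation followed by log2(N) butterfly factors.

    Each stage mixes pairs (a, b) into (a + w*b, a - w*b): a sparse factor
    with two nonzeros per row, so the total cost is O(N log N).
    """
    n = len(x)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a power of two")
    y = bit_reverse(x)
    size = 2
    while size <= n:  # one butterfly factor per stage
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                a, b = y[start + k], y[start + k + half]
                y[start + k], y[start + k + half] = a + w * b, a - w * b
        size *= 2
    return y
```

Each stage above is one sparse matrix-vector product, which is why a learned product of butterfly factors can replace a dense layer at nearly linear cost.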
Marsal, Rémi. "Motion analysis in videos with deep self-supervised learning". Electronic Thesis or Diss., Sorbonne université, 2024. http://www.theses.fr/2024SORUS137.
This thesis explores self-supervised learning methods based on motion in videos to reduce reliance on costly annotated datasets for the tasks of optical flow and monocular depth estimation. In the absence of ground truth, both tasks are mainly learned with an image reconstruction loss, which relies on the brightness constancy hypothesis. In practice, this assumption may not hold because of brightness changes caused by moving shadows or non-Lambertian surfaces, which makes some reconstructions impossible. On the one hand, solutions can be implemented to limit the impact of these brightness changes. Thus, our first contribution improves the performance of self-supervised optical flow estimation methods thanks to an auxiliary neural network designed to compensate for any brightness change at the training stage only, so that the running time at inference is not affected. On the other hand, since the limits of the reconstruction loss leave some cases poorly supervised, and therefore difficult for a depth estimation neural network to estimate, they are a source of aleatoric uncertainty that can be estimated. In our second contribution, we show that our new probabilistic formulation of the problem of self-supervised learning of monocular depth provides both better depth and better uncertainty predictions.
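The reconstruction loss built on brightness constancy, and the idea of treating its failures as aleatoric uncertainty, can be sketched on 1-D "images" with integer flow. The Laplace likelihood used here is an assumption, a common choice in self-supervised depth work; it is not necessarily the probabilistic formulation proposed in the thesis.

```python
import math
from typing import List


def warp_residuals(i1: List[float], i2: List[float], flow: List[int]) -> List[float]:
    """Brightness-constancy residuals r(x) = I1(x) - I2(x + flow(x)).

    1-D images and integer flow keep the sketch short; pixels warped
    outside the second image are skipped.
    """
    return [i1[x] - i2[x + u] for x, u in enumerate(flow) if 0 <= x + u < len(i2)]


def photometric_l1(res: List[float]) -> float:
    """Mean absolute reconstruction error."""
    if not res:
        raise ValueError("no valid pixels")
    return sum(abs(r) for r in res) / len(res)


def laplace_nll(res: List[float], log_b: List[float]) -> float:
    """Mean negative log-likelihood of residuals under Laplace(0, b) per pixel.

    -log p(r) = |r| / b + log b + log 2. A large predicted scale b
    down-weights residuals that brightness constancy cannot explain,
    at the price of the log b penalty.
    """
    return sum(abs(r) * math.exp(-lb) + lb + math.log(2.0)
               for r, lb in zip(res, log_b)) / len(res)
```

When the true shift is known, the residuals vanish; with a wrong flow they do not, and a network predicting `log_b` can lower the loss on pixels it cannot reconstruct by raising their predicted uncertainty.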
Robert, Thomas. "Improving Latent Representations of ConvNets for Visual Understanding". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS343.
For a decade now, convolutional deep neural networks have demonstrated their ability to produce excellent results in computer vision. These models achieve this by transforming the input image into a series of latent representations. In this thesis, we work on improving the "quality" of the latent representations of ConvNets for different tasks. First, we work on regularizing those representations to increase their robustness to intra-class variations and thus improve their performance for classification. To do so, we develop a loss based on information-theoretic metrics to decrease the entropy conditioned on the class. Then, we propose to structure the information in two complementary latent spaces, resolving a conflict between the invariance of the representations and the reconstruction task. This structure relaxes the constraint imposed by classical architectures, yielding better results in the context of semi-supervised learning. Finally, we address the problem of disentangling, i.e. explicitly separating and representing independent factors of variation of the dataset. We pursue our work on structuring the latent spaces and use adversarial costs to ensure an effective separation of the information. This improves the quality of the representations and enables semantic image editing.
Gillard, Tristan. "Auto-organisation multi-échelle pour l’émergence de comportements sensorimoteurs coordonnés". Electronic Thesis or Diss., Université de Lorraine, 2022. http://www.theses.fr/2022LORR0353.
Non-associative learning is widely observed throughout phylogeny and appears to be fundamental for the adaptation and, thus, the survival of living organisms. This thesis explores adaptation mechanisms inspired by these forms of non-associative learning. We propose three computational models of habituation, three models of site-specific sensitization and one model of pseudo-conditioning. We develop these models within the framework of the Iterant Deformable Sensorimotor Medium (IDSM), a recently developed abstract model of sensorimotor behavior formation. The characteristics of the presented models are studied and analyzed in light of our long-term goal of investigating new unsupervised learning mechanisms for autonomous artificial agents.
Shahid, Mustafizur Rahman. "Deep learning for Internet of Things (IoT) network security". Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAS003.
The growing Internet of Things (IoT) introduces new security challenges for network activity monitoring. Most IoT devices are vulnerable because of a lack of security awareness among device manufacturers and end users. As a consequence, they have become prime targets for malware developers who want to turn them into bots. Unlike general-purpose devices, an IoT device is designed to perform very specific tasks. Hence, its networking behavior is very stable and predictable, making it well suited to data analysis techniques. Therefore, the first part of this thesis focuses on leveraging recent advances in the field of deep learning to develop network monitoring tools for the IoT. Two types of network monitoring tools are explored: IoT device type recognition systems and IoT network Intrusion Detection Systems (NIDS). For IoT device type recognition, supervised machine learning algorithms are trained to classify network traffic and determine which IoT device the traffic belongs to. The IoT NIDS consists of a set of autoencoders, each trained for a different IoT device type. The autoencoders learn the legitimate networking behavior profile and detect any deviation from it. Experiments using network traffic data produced by a smart home show that the proposed models achieve high performance. Despite these promising results, training and testing machine learning-based network monitoring systems requires a tremendous amount of IoT network traffic data, yet very few IoT network traffic datasets are publicly available. Physically operating thousands of real IoT devices can be very costly and can raise privacy concerns. In the second part of this thesis, we propose to leverage Generative Adversarial Networks (GAN) to generate bidirectional flows that look as if they were produced by a real IoT device. A bidirectional flow consists of the sequence of the sizes of its individual packets along with a duration.
Hence, in addition to generating packet-level features, namely the sizes of individual packets, our generator implicitly learns to comply with flow-level characteristics, such as the total number of packets and bytes in a bidirectional flow or the total duration of the flow. Experimental results using data produced by a smart speaker show that our method generates high-quality, realistic-looking synthetic bidirectional flows.
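The flow representation used for generation (a sequence of packet sizes plus a duration) and the flow-level characteristics a generator must respect implicitly can be made concrete with a small data structure. The direction sign convention and the 1514-byte size limit are assumptions for illustration, not details taken from the thesis.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BidirectionalFlow:
    """Per-packet sizes plus a total duration.

    Sign convention (an assumption of this sketch): positive sizes are
    packets sent by the device, negative sizes are packets it receives.
    """
    packet_sizes: List[int]
    duration: float  # seconds

    @property
    def n_packets(self) -> int:
        """Flow-level characteristic: total number of packets."""
        return len(self.packet_sizes)

    @property
    def total_bytes(self) -> int:
        """Flow-level characteristic: total bytes in both directions."""
        return sum(abs(s) for s in self.packet_sizes)

    @property
    def bytes_out(self) -> int:
        """Bytes sent by the device only."""
        return sum(s for s in self.packet_sizes if s > 0)


def is_consistent(flow: BidirectionalFlow, max_packet: int = 1514) -> bool:
    """Basic sanity checks a generated flow should pass (hypothetical limits)."""
    return (
        flow.n_packets > 0
        and flow.duration >= 0
        and all(0 < abs(s) <= max_packet for s in flow.packet_sizes)
    )
```

Because the flow-level quantities are derived from the packet sizes rather than generated separately, a generator producing only packet sizes and a duration can never emit a flow whose totals contradict its packets.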
Denize, Julien. "Self-supervised representation learning and applications to image and video analysis". Electronic Thesis or Diss., Normandie, 2023. http://www.theses.fr/2023NORMIR37.
In this thesis, we develop approaches to perform self-supervised learning for image and video analysis. Self-supervised representation learning makes it possible to pretrain neural networks to learn general concepts without labels, before specializing them on downstream tasks faster and with few annotations. We present three contributions to self-supervised image and video representation learning. First, we introduce the theoretical paradigm of soft contrastive learning and its practical implementation, called Similarity Contrastive Estimation (SCE), which connects contrastive and relational learning for image representation. Second, SCE is extended to global temporal video representation learning. Lastly, we propose COMEDIAN, a pipeline for local-temporal video representation learning with transformers. These contributions achieved state-of-the-art results on multiple benchmarks and led to several published academic and technical contributions.
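SCE builds on contrastive learning. For reference, the standard contrastive (InfoNCE) objective is sketched below for a single anchor. This is the classic hard-target loss; according to the abstract, SCE connects this contrastive view with relational learning through a soft objective, which is not reproduced here.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    tau: float = 0.1,
) -> float:
    """InfoNCE loss for one anchor.

    Cross-entropy of picking the positive among {positive} + negatives,
    with cosine similarity divided by the temperature tau as logits.
    """
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))  # stable log-sum-exp
    return log_sum - logits[0]
```

The loss is small when the positive view is the most similar candidate and large when a negative is closer to the anchor than the positive is.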
Martinroche, Guillaume. "Quantification et caractérisation des maladies auto-immunes et allergiques à l'aide de méthodes d'apprentissage profond". Electronic Thesis or Diss., Bordeaux, 2024. http://www.theses.fr/2024BORD0154.
Diagnostic tools based on artificial intelligence (AI) that can integrate several types of data will be crucial in the coming years to help practitioners provide more personalized, precision medicine for patients. Autoimmune and allergic diseases are perfect examples of complex, multi-parametric diagnostics that could benefit from such tools. Antinuclear antibodies (ANA) on human epithelial cells (HEp-2) are important biomarkers for the screening and diagnosis of autoimmune diseases. For the harmonization of biological practices and clinical management, automatic reading and classification of ANA immunofluorescence patterns in HEp-2 images according to the nomenclature recommended by the International Consensus on Antinuclear Antibody Patterns (ICAP) appears to be a growing requirement. In our study, an automatic classification system for Indirect Immunofluorescence (IIF) patterns of HEp-2 cell images was developed using a supervised learning methodology, based on a complete collection of HEp-2 cell images from Bordeaux University Hospital labelled according to ICAP recommendations and local practices. The system consists of a classifier for nuclear patterns only (16 patterns, with recognition of up to two aspects per image) and a second classifier for cytoplasmic aspects only. By contributing to the automation of ANA testing in medical biology laboratories, this system will enable reflex quantitative tests targeted at a few autoantibodies, ultimately facilitating the efficient and accurate diagnosis of autoimmune diseases. Allergen microarrays enable the simultaneous detection of up to 300 specific IgE antibodies and are part of a bottom-up diagnostic approach in which, starting from the broadest possible analysis, we then seek to determine which allergen(s) is (are) likely to explain the patient's symptoms.
However, the mass of data produced by this single analysis exceeds the analytical capacity of the average user, and the large number of results obtained simultaneously can mask those that are truly clinically relevant. A database of 4271 patients (Société Française d'Allergologie) was created, including allergen microarray data and twenty-five demographic and clinical variables. This database allowed the development of the first models capable of predicting patients' allergic profiles through an international data challenge. The best F1-scores were around 80%. A more comprehensive tool adapted to daily practice is currently under development. Based essentially on microarray data and very few clinical and demographic variables, it will provide clinicians with a probability of molecular allergy by protein family, thus limiting diagnostic delays and the need for oral provocation tests. Diagnostic tools using so-called AI technologies are helping to improve the efficiency of current techniques by removing bottlenecks in repetitive, low-value-added tasks. These tools are generally poorly perceived by practitioners, who feel that they are losing their expertise and even that they are being replaced by algorithms. This impression is particularly strong in medical biology, where this improvement directly affects the role of the medical biologist. To better understand this, we took a closer look at the relationship of trust, if there can be one, between the practitioner and the diagnostic tool. The concepts of reliability and veracity were discussed. A survey of medical biologists working on the analysis of HEp-2 cell aspects revealed a certain reticence, for reasons linked to performance scores and unfamiliarity with the systems. The deployment of, and commitment to, similar strategies in biological hematology shows real interest once performance has been established.
The development of two diagnostic tools for autoimmune and allergic diseases is laying the foundations for improved results and lasting integration into a more personalized, precision medicine.
Douzon, Thibault. "Language models for document understanding". Electronic Thesis or Diss., Lyon, INSA, 2023. http://www.theses.fr/2023ISAL0075.
Every day, countless documents are received and processed by companies worldwide. In an effort to reduce the cost of processing each document, the largest companies have resorted to document automation technologies. In an ideal world, a document can be processed automatically without any human intervention: its content is read, and information is extracted and forwarded to the relevant service. State-of-the-art techniques have evolved quickly over the last decades, from rule-based algorithms to statistical models. This thesis focuses on machine learning models for document information extraction. Recent advances in model architecture for natural language processing have shown the importance of the attention mechanism. Transformers have revolutionized the field by generalizing the use of attention and by pushing self-supervised pre-training to the next level. In the first part, we confirm that transformers with appropriate pre-training are able to perform document understanding tasks with high performance. We show that, when used as token classifiers for information extraction, transformers learn the task exceptionally efficiently compared to recurrent networks. Transformers need only a small proportion of the training data to come close to maximum performance. This highlights the importance of self-supervised pre-training for subsequent fine-tuning. In the following part, we design specialized pre-training tasks to better prepare the model for specific data distributions such as business documents. By accounting for the specificities of business documents, such as their table structure and their over-representation of numeric figures, we are able to target specific skills useful for the model in its future tasks. We show that these new tasks improve the model's downstream performance, even with small models.
Using this pre-training approach, we are able to reach the performance of significantly bigger models without any additional cost during fine-tuning or inference. Finally, in the last part, we address one drawback of the transformer architecture: its computational cost on long sequences. We show that efficient architectures derived from the classic transformer require fewer resources and perform better on long sequences. However, due to how they approximate the attention computation, efficient models suffer from a small but significant performance drop on short sequences compared to classical architectures. This motivates using different models depending on the input length and makes it possible to concatenate multimodal inputs into a single sequence.
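Using a transformer as a token classifier for information extraction means predicting one tag per token and then grouping the tags into fields. The BIO tagging scheme, the lenient handling of stray `I-` tags, and the field names in the example are assumptions for illustration; the abstract does not specify the labeling scheme.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (field, text) spans.

    'B-X' starts a field X, 'I-X' continues it, 'O' is outside any field.
    An 'I-X' with no open X span starts a new span (a common lenient rule).
    """
    spans: List[Tuple[str, List[str]]] = []
    current: Optional[Tuple[str, List[str]]] = None
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            current = None
            continue
        prefix, field = tag.split("-", 1)
        if prefix == "B" or current is None or current[0] != field:
            current = (field, [tok])
            spans.append(current)
        else:
            current[1].append(tok)  # extends the span already stored in `spans`
    return [(field, " ".join(toks)) for field, toks in spans]


if __name__ == "__main__":
    tokens = ["Invoice", "no", "123", "Total", ":", "45.00", "EUR"]
    tags = ["O", "O", "B-number", "O", "O", "B-total", "I-total"]
    print(bio_to_spans(tokens, tags))  # [('number', '123'), ('total', '45.00 EUR')]
```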
Gotab, Pierre. "Classification automatique pour la compréhension de la parole : vers des systèmes semi-supervisés et auto-évolutifs". Phd thesis, Université d'Avignon, 2012. http://tel.archives-ouvertes.fr/tel-00858980.
Guillaumin, Matthieu. "Données multimodales pour l'analyse d'image". Phd thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM048.
This dissertation delves into the use of textual metadata for image understanding. We seek to exploit this additional textual information as weak supervision to improve the learning of recognition models. There is recent and growing interest in methods that exploit such data because they can potentially alleviate the need for manual annotation, which is costly and time-consuming. We focus on two types of visual data with associated textual information. First, we exploit news images that come with descriptive captions to address several face-related tasks, including face verification, which is the task of deciding whether two images depict the same individual, and face naming, the problem of associating faces in a data set with their correct names. Second, we consider data consisting of images with user tags. We explore models for automatically predicting tags for new images, i.e. image auto-annotation, which can also be used for keyword-based image search. We also study a multimodal semi-supervised learning scenario for image categorisation. In this setting, the tags are assumed to be present in both labelled and unlabelled training data, while they are absent from the test data. Our work builds on the observation that most of these tasks can be solved if perfectly adequate similarity measures are used. We therefore introduce novel approaches that involve metric learning, nearest neighbour models and graph-based methods to learn task-specific similarities from the visual and textual data. For faces, our similarities focus on the identities of the individuals while, for images, they address more general semantic visual concepts. Experimentally, our approaches achieve state-of-the-art results on several standard and challenging data sets. On both types of data, we clearly show that learning with additional textual information improves the performance of visual recognition systems.
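The following is a minimal sketch of nearest-neighbour tag prediction. It uses plain Euclidean distance in place of the learned, task-specific metrics the thesis introduces. A new image receives the tags that win a distance-weighted vote among its k nearest training images. The function name `predict_tags`, the 1 / (1 + distance) weighting and the 0.5 threshold are hypothetical choices for illustration.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Plain Euclidean distance; a learned metric would replace this."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_tags(query, train_features, train_tags, k=3, threshold=0.5):
    """Return the tags whose normalised, distance-weighted vote exceeds threshold.

    train_features: list of feature vectors, one per training image.
    train_tags: list of tag sets, aligned with train_features.
    """
    # Indices of the k training images closest to the query.
    nearest = sorted(range(len(train_features)),
                     key=lambda i: euclidean(query, train_features[i]))[:k]
    # Closer neighbours get larger weights; weights are normalised to sum to 1.
    weights = {i: 1.0 / (1.0 + euclidean(query, train_features[i])) for i in nearest}
    total = sum(weights.values())
    scores = defaultdict(float)
    for i in nearest:
        for tag in train_tags[i]:
            scores[tag] += weights[i] / total
    return sorted(tag for tag, score in scores.items() if score > threshold)
```

Swapping in a learned metric would change only `euclidean`; the voting step would stay the same.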
Guillaumin, Matthieu. "Données multimodales pour l'analyse d'image". Phd thesis, Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00522278/en/.
Ghemmogne, Fossi Leopold. "Gestion des règles basée sur l'indice de puissance pour la détection de fraude : Approches supervisées et semi-supervisées". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI079.
This thesis deals with the detection of credit card fraud. According to the European Central Bank, the value of card fraud in 2016 amounted to 1.8 billion euros. The challenge for institutions is to reduce these frauds. In general, fraud detection systems consist of an automatic system built with "if-then" rules that check all incoming transactions and trigger an alert if a transaction is considered suspicious. A group of experts checks the alert and decides whether it is a true alert. The criteria used to select the rules that are kept operational are mainly based on the individual performance of each rule. This approach ignores the non-additivity of the rules. We propose a new approach using power indices. This approach assigns each rule a normalized score that quantifies its influence on the overall performance of the group. The indices we use are the Shapley value and the Banzhaf value. Their applications are 1) decision support for keeping or deleting a rule; 2) selection of the number k of best-ranked rules, in order to work with a more compact set. Using real credit card fraud data, we show that: 1) this approach performs better than one that evaluates the rules in isolation; 2) the performance of the full set of rules can be achieved by keeping one-tenth of the rules. We observe that this application can be considered a feature selection task, and we show that our approach is comparable to current feature selection algorithms. It has an advantage in rule management because it assigns a standard score to each rule, which is not the case for most algorithms, as they focus only on an overall solution. We propose a new version of the Banzhaf value, namely k-Banzhaf, which outperforms the original in terms of computing time while achieving comparable performance. Finally, we implement a self-learning process to reinforce learning in a machine learning algorithm.
We compare these algorithms with our power indices for ranking credit card fraud data. In conclusion, we observe that feature selection based on power indices achieves results comparable to those of the other algorithms in the self-learning process.
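Exact Shapley and Banzhaf values, computed over a small hypothetical rule set, illustrate the non-additivity argument. The performance of a set of rules is assumed here to be the number of distinct frauds it catches; the abstract does not give the thesis's actual performance measure. The k-Banzhaf variant is not reproduced. Exact computation enumerates all 2^(n-1) coalitions for each rule, so it only suits small rule sets.

```python
from itertools import combinations
from math import factorial


def coalition_value(rules, subset):
    """Hypothetical performance measure: number of distinct frauds caught."""
    caught = set()
    for r in subset:
        caught |= rules[r]
    return len(caught)


def banzhaf(rules):
    """Average marginal contribution of each rule over all coalitions of the others."""
    names = list(rules)
    n = len(names)
    values = {}
    for i in names:
        others = [r for r in names if r != i]
        total = 0
        for size in range(len(others) + 1):
            for s in combinations(others, size):
                total += coalition_value(rules, s + (i,)) - coalition_value(rules, s)
        values[i] = total / 2 ** (n - 1)
    return values


def shapley(rules):
    """Marginal contributions weighted by |S|! (n - |S| - 1)! / n!."""
    names = list(rules)
    n = len(names)
    values = {}
    for i in names:
        others = [r for r in names if r != i]
        total = 0.0
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (coalition_value(rules, s + (i,)) - coalition_value(rules, s))
        values[i] = total
    return values
```

Take the rules `{"A": {1, 2, 3}, "B": {1, 2, 3}, "C": {4}}`. Rules A and B each catch three frauds in isolation, but because they catch the same three, each receives only 1.5 under both indices. The smaller but complementary rule C keeps its full contribution of 1.0. Ranking by isolated performance would keep A and B together; ranking by power index exposes their redundancy.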
Wandeto, John Mwangi. "Self-organizing map quantization error approach for detecting temporal variations in image sets". Thesis, Strasbourg, 2018. http://www.theses.fr/2018STRAD025/document.
This thesis proposes a new approach to image processing, dubbed SOM-QE, which exploits the quantization error (QE) of self-organizing maps (SOMs). SOMs produce low-dimensional, discrete representations of high-dimensional input data. QE is determined from the results of the SOM's unsupervised learning process and the input data. SOM-QE values computed from a time series of images can be used as an indicator of changes in the series. To set up a SOM, the map size, the neighbourhood distance, the learning rate and the number of iterations of the learning process are determined. The combination of these parameters that gives the lowest QE is taken to be the optimal parameter set and is used to transform the dataset. This has been the conventional use of QE. The novelty of the SOM-QE technique is fourfold. First, in its usage: SOM-QE employs a SOM to determine QE for different images, typically in a time-series dataset, unlike the traditional usage where different SOMs are applied to one dataset. Second, the SOM-QE value is introduced as a measure of uniformity within the image. Third, the SOM-QE value becomes a special, unique label for the image within the dataset, and fourth, this label is used to track changes that occur in subsequent images of the same scene. Thus, SOM-QE provides a measure of variation within an image at a given instant and, when compared with the values from subsequent images of the same scene, reveals a transient visualization of changes in the scene under study. In this research, the approach was applied to artificial, medical and geographic imagery to demonstrate its performance. Changes that occur in geographic scenes of interest, such as new buildings being put up in a city, or lesions receding in medical images, are of interest to scientists and engineers.
The SOM-QE technique provides a new way to automatically detect growth in urban spaces or the progression of diseases, giving timely information for appropriate planning or treatment. This work demonstrates that SOM-QE can capture very small changes in images. Results also confirm that it is fast and less computationally expensive in discriminating between changed and unchanged content in large image datasets. Pearson's correlation confirmed statistically significant correlations between SOM-QE values and the actual ground-truth data. On evaluation, this technique performed better than other existing approaches. This work is important as it introduces a new way of looking at fast, automatic change detection, even when dealing with small local changes within images. It also introduces a new method of determining QE, and the data it generates can be used to predict changes in a time-series dataset.
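The quantization error is the mean distance between each input vector and its best-matching unit (the closest prototype in the SOM). The sketch below trains a tiny one-dimensional SOM in pure Python and computes QE on pixel vectors. It assumes, as one reading of the abstract, that a single SOM trained on a reference image is then applied unchanged to every image in the series. The map topology, the parameter values and the functions `train_som` and `quantization_error` are illustrative, not the thesis's settings.

```python
import math
import random


def train_som(data, n_units=4, iterations=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM (a chain of n_units prototype vectors) on data."""
    rng = random.Random(seed)
    # Initialise prototypes with randomly chosen input vectors (copied).
    codebook = [list(rng.choice(data)) for _ in range(n_units)]
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 1e-3)   # shrinking neighbourhood
        x = rng.choice(data)
        winner = min(range(n_units), key=lambda j: math.dist(x, codebook[j]))
        for j, w in enumerate(codebook):
            # Gaussian neighbourhood on the chain: units near the winner move more.
            h = math.exp(-((j - winner) ** 2) / (2.0 * sigma ** 2))
            for d in range(len(w)):
                w[d] += lr * h * (x[d] - w[d])
    return codebook


def quantization_error(data, codebook):
    """Mean distance from each input vector to its best-matching unit."""
    return sum(min(math.dist(x, w) for w in codebook) for x in data) / len(data)
```

An image identical to the reference yields QE = 0, while an image in which some pixels have changed yields a larger QE. Tracking this value across a time series gives the change signal the abstract describes.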
Racah, Evan. "Unsupervised representation learning in interactive environments". Thesis, 2019. http://hdl.handle.net/1866/23788.
Extracting a representation of all the high-level factors of an agent's state from low-level sensory information is an important but challenging task in machine learning. In this thesis, we explore several unsupervised approaches for learning these state representations. We apply and analyze existing unsupervised representation learning methods in reinforcement learning environments, and we contribute our own evaluation benchmark and a novel state representation learning method. In the first chapter, we give an overview of unsupervised representation learning and its motivation, both for machine learning in general and for reinforcement learning. We then introduce a relatively new subfield of representation learning: self-supervised learning. Next, we cover two core representation learning approaches, generative methods and discriminative methods. Specifically, we focus on a collection of discriminative representation learning methods called contrastive unsupervised representation learning (CURL) methods. We close the first chapter by detailing various approaches for evaluating the usefulness of representations. In the second chapter, we present a workshop paper in which we evaluate a handful of off-the-shelf self-supervised methods on reinforcement learning problems. We find that the performance of these representations depends heavily on the dynamics and visual structure of the environment, and we therefore conclude that a more systematic study of environments and methods is required. Our third chapter covers our second article, Unsupervised State Representation Learning in Atari, in which we carry out a more thorough study of representation learning methods in RL, as motivated by the second chapter. To facilitate a more thorough evaluation of representations in RL, we introduce a benchmark of 22 fully labelled Atari games.
In addition, we choose the representation learning methods for comparison more systematically, focusing on comparing generative methods with contrastive methods instead of the less systematically chosen off-the-shelf methods of the second chapter. Finally, we introduce a new contrastive method, ST-DIM, which excels on the 22 Atari games.
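ST-DIM belongs to the family of contrastive methods trained with an InfoNCE-style objective. The sketch below covers only that generic objective, not the full ST-DIM loss, which also works with spatial (local feature-map) representations. Each anchor encoding is scored against every candidate in the batch with a dot product. The loss is the negative log-softmax probability assigned to the true positive. Pairing consecutive frames as positives is an illustrative assumption.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, positives):
    """Mean InfoNCE loss over a batch of encodings.

    anchors[i] and positives[i] form a positive pair (for example, encodings
    of consecutive frames x_t and x_{t+1}); every positives[j] with j != i
    acts as a negative for anchors[i].
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) for p in positives]
        m = max(logits)
        # log-sum-exp computed stably by factoring out the max logit.
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the positive as the correct class: -log softmax_i.
        total += log_sum_exp - logits[i]
    return total / len(anchors)
```

The loss is low when each anchor is most similar to its own positive. It equals log(batch size) when the encoder cannot tell candidates apart.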
Schwarzer, Max. "Data-efficient reinforcement learning with self-predictive representations". Thesis, 2020. http://hdl.handle.net/1866/25105.
Data efficiency remains a key challenge in deep reinforcement learning. Modern techniques have been shown to attain high performance in extremely complex tasks, including strategy games such as StarCraft, chess, shogi and Go, as well as challenging visual domains such as Atari games. However, doing so generally requires enormous amounts of interaction data, which limits how broadly reinforcement learning can be applied. In this thesis, we propose SPR, a method that draws on recent advances in self-supervised representation learning and is designed to enhance the data efficiency of deep reinforcement learning agents. We evaluate this method on the Arcade Learning Environment (Atari games) and show that it dramatically improves performance with limited computational overhead. When given roughly the same amount of learning time as human testers, a reinforcement learning agent augmented with SPR achieves super-human performance on 7 out of 26 games, an increase of 350% over the previous state of the art, while also strongly improving mean and median performance. We also evaluate this method on a set of continuous control tasks, showing substantial improvements over previous methods. Chapter 1 introduces the concepts necessary to understand the work presented, including overviews of deep reinforcement learning and self-supervised representation learning. Chapter 2 contains a detailed description of our contributions towards leveraging self-supervised representation learning to improve data efficiency in reinforcement learning. Chapter 3 draws conclusions from this work, including a number of proposals for future work.
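In the published description of SPR, the agent predicts its own latent representations several steps into the future. These predictions are trained to agree with a target encoder's representations of the observed future frames through a negative cosine-similarity loss. The target encoder's parameters are kept as an exponential moving average (EMA) of the online encoder's. The sketch below shows only those two pieces on plain Python lists. Encoders, the transition model and the projection heads are omitted, and the function names are hypothetical.

```python
import math


def normalize(v, eps=1e-12):
    """Scale v to unit L2 norm (eps guards against division by zero)."""
    norm = max(math.sqrt(sum(x * x for x in v)), eps)
    return [x / norm for x in v]


def self_predictive_loss(predicted, targets):
    """Sum over future steps k of -cos(predicted[k], targets[k]).

    predicted: projected latents produced by the transition model for steps 1..K.
    targets: target-encoder latents of the actually observed future frames.
    In training, no gradient would flow into targets.
    """
    loss = 0.0
    for p, t in zip(predicted, targets):
        p_hat, t_hat = normalize(p), normalize(t)
        loss -= sum(a * b for a, b in zip(p_hat, t_hat))
    return loss


def ema_update(target_params, online_params, tau):
    """Element-wise target <- tau * target + (1 - tau) * online."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]
```

Because the loss compares normalized vectors, only the direction of each predicted latent matters. A perfect K-step prediction therefore reaches the minimum of -K regardless of vector scale.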
Lajoie, Isabelle. "Apprentissage de représentations sur-complètes par entraînement d’auto-encodeurs". Thesis, 2009. http://hdl.handle.net/1866/3768.
Progress in machine learning allows computational systems to address increasingly complex tasks associated with vision, audio signals or natural language processing. Among the existing models, we find the artificial neural network (ANN), whose popularity increased suddenly with the recent breakthrough of Hinton et al. [22]. That breakthrough consists of using restricted Boltzmann machines (RBMs) to perform an unsupervised, layer-by-layer pre-training initialization of a deep belief network (DBN), which enables the subsequent successful supervised training of such an architecture. Since this discovery, researchers have studied the efficiency of other, similar pre-training strategies, such as stacking traditional auto-encoders (SAE) [5, 38] and stacking denoising auto-encoders (SDAE) [44]. This is the context in which the present study started. We begin with a brief introduction to basic machine learning principles and to the pre-training methods used so far with RBM, AE and DAE modules. We then performed a series of experiments to deepen our understanding of pre-training with SDAE, explored its various properties, and explored variations of the DAE algorithm as alternative strategies to initialize deep networks. We evaluated the sensitivity to the noise level and the influence of the number of layers and the number of hidden units on the generalization error obtained with SDAE. We experimented with other noise types and observed improved performance on the supervised task with salt-and-pepper noise (PS) or Gaussian noise (GS). These noise types are better justified than the one used until now, masking noise (MN). Moreover, modifying the algorithm to place emphasis on reconstructing the corrupted components during the unsupervised training of each DAE showed encouraging performance improvements.
Our work also revealed that a DAE is capable of learning, on natural images, filters similar to those found in V1 cells of the visual cortex, which are essentially edge detectors. In addition, we were able to verify that the representations learned by SDAE are very good features to feed to a linear or Gaussian support vector machine (SVM), considerably enhancing its generalization performance. We also observed that, like DBN and unlike SAE, SDAE has the potential to be used as a good generative model. Finally, we opened the door to novel pre-training strategies and discovered the potential of one of them: stacking renoising auto-encoders (SRAE).
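The three corruption processes compared in the abstract, and the emphasis on reconstructing corrupted components, can be sketched in pure Python. This is a minimal illustration: the noise fractions, the weights `alpha` (corrupted components) and `beta` (uncorrupted components), and all function names are assumptions, and the auto-encoder itself is omitted. A DAE would receive the corrupted vector as input and be trained to reconstruct the clean one.

```python
import random


def masking_noise(x, frac, rng):
    """Masking noise (MN): set a random fraction of components to 0."""
    idx = set(rng.sample(range(len(x)), int(round(frac * len(x)))))
    return [0.0 if i in idx else v for i, v in enumerate(x)], idx


def salt_and_pepper(x, frac, rng, low=0.0, high=1.0):
    """Salt-and-pepper noise (PS): set a random fraction to low or high at random."""
    idx = set(rng.sample(range(len(x)), int(round(frac * len(x)))))
    return [rng.choice((low, high)) if i in idx else v for i, v in enumerate(x)], idx


def gaussian_noise(x, sigma, rng):
    """Additive isotropic Gaussian noise (GS) on every component."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def emphasized_squared_error(clean, reconstruction, corrupted_idx, alpha=1.0, beta=0.5):
    """Squared reconstruction error, weighted by alpha on corrupted components
    and by beta on the others, so that alpha > beta emphasizes denoising."""
    loss = 0.0
    for i, (c, r) in enumerate(zip(clean, reconstruction)):
        w = alpha if i in corrupted_idx else beta
        loss += w * (c - r) ** 2
    return loss
```

For example, `masking_noise([0.5] * 10, 0.3, random.Random(0))` zeroes three components and also returns their indices, which `emphasized_squared_error` needs in order to weight those positions more heavily.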