Dissertations / Theses on the topic 'Réseaux de neurones profonds parcimonieux'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Réseaux de neurones profonds parcimonieux.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Le, Quoc Tung. "Algorithmic and theoretical aspects of sparse deep neural networks." Electronic Thesis or Diss., Lyon, École normale supérieure, 2023. http://www.theses.fr/2023ENSL0105.
Sparse deep neural networks offer a compelling practical opportunity to reduce the cost of training, inference and storage, which are growing exponentially in the state of the art of deep learning. In this presentation, we will introduce an approach to study sparse deep neural networks through the lens of another related problem: sparse matrix factorization, i.e., the problem of approximating a (dense) matrix by the product of (multiple) sparse factors. In particular, we identify and investigate in detail some theoretical and algorithmic aspects of a variant of sparse matrix factorization named fixed support matrix factorization (FSMF) in which the set of non-zero entries of sparse factors are known. Several fundamental questions of sparse deep neural networks such as the existence of optimal solutions of the training problem or topological properties of its function space can be addressed using the results of (FSMF). In addition, by applying the results of (FSMF), we also study the butterfly parametrization, an approach that consists of replacing (large) weight matrices by the products of extremely sparse and structured ones in sparse deep neural networks
Nono, Wouafo Hugues Gérald. "Architectures matérielles numériques intégrées et réseaux de neurones à codage parcimonieux." Thesis, Lorient, 2016. http://www.theses.fr/2016LORIS394/document.
Nowadays, artificial neural networks are widely used in many applications such as image and signal processing. Recently, a new model of neural network was proposed to design associative memories, the GBNN (Gripon-Berrou Neural Network). This model offers a storage capacity exceeding those of Hopfield networks when the information to be stored has a uniform distribution. Methods improving performance for non-uniform distributions and hardware architectures implementing the GBNN networks were proposed. However, on one hand, these solutions are very expensive in terms of hardware resources and on the other hand, the proposed architectures can only implement fixed size networks and are not scalable. The objectives of this thesis are: (1) to design GBNN inspired models outperforming the state of the art, (2) to propose architectures cheaper than existing solutions and (3) to design a generic architecture implementing the proposed models and able to handle various sizes of networks. The results of these works are exposed in several parts. Initially, the concept of clone based neural networks and its variants are presented. These networks offer better performance than the state of the art for the same memory cost when a non-uniform distribution of the information to be stored is considered. The hardware architecture optimizations are then introduced to significantly reduce the cost in terms of resources. Finally, a generic scalable architecture able to handle various sizes of networks is proposed
Chabot, Florian. "Analyse fine 2D/3D de véhicules par réseaux de neurones profonds." Thesis, Université Clermont Auvergne (2017-2020), 2017. http://www.theses.fr/2017CLFAC018/document.
In this thesis, we are interested in fine-grained analysis of vehicle from an image. We define fine-grained analysis as the following concepts : vehicle detection in the image, vehicle viewpoint (or orientation) estimation, vehicle visibility characterization, vehicle 3D localization and make and model recognition. The design of reliable solutions for fine-grained analysis of vehicle open the door to multiple applications in particular for intelligent transport systems as well as video surveillance systems. In this work, we propose several contributions allowing to address partially or wholly this issue. Proposed approaches are based on joint deep learning technologies and 3D models. In a first section, we deal with make and model classification keeping in mind the difficulty to create training data. In a second section, we investigate a novel method for both vehicle detection and fine-grained viewpoint estimation based on local apparence features and geometric spatial coherence. It uses models learned only on synthetic data. Finally, in a third section, a complete system for fine-grained analysis is proposed. It is based on the multi-task concept. Throughout this report, we provide quantitative and qualitative results. On several aspects related to vehicle fine-grained analysis, this work allowed to outperform state of the art methods
Simonnet, Edwin. "Réseaux de neurones profonds appliqués à la compréhension de la parole." Thesis, Le Mans, 2019. http://www.theses.fr/2019LEMA1006/document.
This thesis is a part of the emergence of deep learning and focuses on spoken language understanding assimilated to the automatic extraction and representation of the meaning supported by the words in a spoken utterance. We study a semantic concept tagging task used in a spoken dialogue system and evaluated with the French corpus MEDIA. For the past decade, neural models have emerged in many natural language processing tasks through algorithmic advances or powerful computing tools such as graphics processors. Many obstacles make the understanding task complex, such as the difficult interpretation of automatic speech transcriptions, as many errors are introduced by the automatic recognition process upstream of the comprehension module. We present a state of the art describing spoken language understanding and then supervised automatic learning methods to solve it, starting with classical systems and finishing with deep learning techniques. The contributions are then presented along three axes. First, we develop an efficient neural architecture consisting of a bidirectional recurrent network encoder-decoder with attention mechanism. Then we study the management of automatic recognition errors and solutions to limit their impact on our performances. Finally, we envisage a disambiguation of the comprehension task making the systems more efficient
Metz, Clément. "Codages optimisés pour la conception d'accélérateurs matériels de réseaux de neurones profonds." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPAST190.
Neural networks are an important component of machine learning tools because of their wide range of applications (health, energy, defence, finance, autonomous navigation, etc.). The performance of neural networks is greatly influenced by the complexity of their architecture in terms of the number of layers, neurons and connections. But the training and inference of ever-larger networks translates to greater demands on hardware resources and longer computing times. Conversely, their portability is limited on embedded systems with low memory and/or computing capacity.The aim of this thesis is to study and design methods for reducing the hardware footprint of neural networks while preserving their performance as much as possible. We restrict ourselves to convolution networks dedicated to computer vision by studying the possibilities offered by quantization. Quantization aims to reduce the hardware footprint, in terms of memory, bandwidth and computation operators, by reducing the number of bits in the network parameters and activations.The contributions of this thesis consist of a new post-training quantization method based on the exploitation of spatial correlations of network parameters, an approach facilitating the learning of very highly quantized networks, and a method aiming to combine mixed precision quantization and lossless entropy coding.The contents of this thesis are essentially limited to algorithmic aspects, but the research orientations were strongly influenced by the requirement for hardware feasibility of our solutions
Huet, Romain. "Codage neural parcimonieux pour un système de vision." Thesis, Lorient, 2017. http://www.theses.fr/2017LORIS439/document.
The neural networks have gained a renewed interest through the deep learning paradigm. Whilethe so called optimised neural nets, by optimising the parameters necessary for learning, require massive computational resources, we focus here on neural nets designed as addressable content memories, or neural associative memories. The challenge consists in realising operations, traditionally obtained through computation, exclusively with neural memory in order to limit the need in computational resources. In this thesis, we study an associative memory based on cliques, whose sparse neural coding optimises the data diversity encoded in the network. This large diversity allows the clique based network to be more efficient in messages retrieval from its memory than other neural associative memories. The associative memories are known for their incapacity to identify without ambiguities the messages stored in a saturated memory. Indeed, depending of the information present in the network and its encoding, a memory can fail to retrieve a desired result. We are interested in tackle this issue and propose several contributions in order to reduce the ambiguities in the cliques based neural network. Besides, these cliques based nets are unable to retrieve an information within their memories if the message is unknown. We propose a solution to this problem through a new associative memory based on cliques which preserves the initial network's corrective ability while being able to hierarchise the information. The hierarchy relies on a surjective and bidirectional transition to generalise an unknown input with an approximation of learnt information. The associative memories' experimental validation is usually based on low dimension artificial dataset. In the computer vision context, we report here the results obtained with real datasets used in the state-of-the-art, such as MNIST, Yale or CIFAR
Chollet, Paul. "Traitement parcimonieux de signaux biologiques." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2017. http://www.theses.fr/2017IMTA0024/document.
Body area sensor networks gained great focused through the promiseof better quality and cheaper medical care system. They are used todetect anomalies and treat them as soon as they arise. Sensors are under heavy constraints such as reliability, sturdiness, size and power consumption. This thesis analyzes the operations perform by a body area sensor network. The different energy requirements are evaluated in order to choose the focus of the research to improve the battery life of the sensors. A sensor for arrhythmia detection is proposed. It includes some signal processing through a clique-based neural network. The system simulations allow a classification between three types of arrhythmia with 95 % accuracy. The prototype, based on a 65 nm CMOS mixed signal circuit, requires only 1.4 μJ. To further reduce energy consumption, a new sensing method is used. A converter architecture is proposed for heart beat acquisition. Simulations and estimation show a 1.18 nJ energy requirement for parameter acquisition while offering 98 % classification accuracy. This work leads the way to the development of low energy sensor with a lifetime battery life
Ducoffe, Mélanie. "Active learning et visualisation des données d'apprentissage pour les réseaux de neurones profonds." Thesis, Université Côte d'Azur (ComUE), 2018. http://www.theses.fr/2018AZUR4115/document.
Our work is presented in three separate parts which can be read independently. Firstly we propose three active learning heuristics that scale to deep neural networks: We scale query by committee, an ensemble active learning methods. We speed up the computation time by sampling a committee of deep networks by applying dropout on the trained model. Another direction was margin-based active learning. We propose to use an adversarial perturbation to measure the distance to the margin. We also establish theoretical bounds on the convergence of our Adversarial Active Learning strategy for linear classifiers. Some inherent properties of adversarial examples opens up promising opportunity to transfer active learning data from one network to another. We also derive an active learning heuristic that scales to both CNN and RNN by selecting the unlabeled data that minimize the variational free energy. Secondly, we focus our work on how to fasten the computation of Wasserstein distances. We propose to approximate Wasserstein distances using a Siamese architecture. From another point of view, we demonstrate the submodular properties of Wasserstein medoids and how to apply it in active learning. Eventually, we provide new visualization tools for explaining the predictions of CNN on a text. First, we hijack an active learning strategy to confront the relevance of the sentences selected with active learning to state-of-the-art phraseology techniques. These works help to understand the hierarchy of the linguistic knowledge acquired during the training of CNNs on NLP tasks. Secondly, we take advantage of deconvolution networks for image analysis to present a new perspective on text analysis to the linguistic community that we call Text Deconvolution Saliency
Mathieu, Félix. "Traitement de la phase des signaux audio dans les réseaux de neurones profonds." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAT046.
The task of separating sound sources in an audio recording requires particular attention. The advent of deep neural networks has improved this task at the expense of increased computational complexity and algorithmic opacity. Interferences induced by these algorithms, whether parasitic or structured, can disrupt the understanding of the signal, especially in the context of voice reproduction. These issues become particularly pronounced during real-time discussions, necessitating performance metrics to evaluate source separation models. Criteria include the quality of reconstructing individual tracks, intelligibility of vocal signals, resilience to interferences, and other aspects such as reducing computational costs and improving interpretability of treatments. This thesis aims to enhance the interpretability of these models while mitigating their computational costs, with a specific focus on modeling the phase of signals. The current challenge lies in finding an appropriate model for this crucial component, essential for understanding audio signals. We will explore strategies such as using complex-valued models, phase-invariant representations, and models allowing abstraction from the phase component. The ultimate goal is to achieve significant advancements in modeling signal phase within deep neural networks, while preserving or reducing computational costs and enhancing interpretability of existing algorithmic decisions
Sarr, Jean Michel Amath. "Étude de l’augmentation de données pour la robustesse des réseaux de neurones profonds." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS072.
In this thesis, we considered the problem of the robustness of neural networks. That is, we have considered the case where the learning set and the deployment set are not independently and identically distributed from the same source. This hypothesis is called : the i.i.d hypothesis. Our main research axis has been data augmentation. Indeed, an extensive literature review and preliminary experiments showed us the regularization potential of data augmentation. Thus, as a first step, we sought to use data augmentation to make neural networks more robust to various synthetic and natural dataset shifts. A dataset shift being simply a violation of the i.i.d assumption. However, the results of this approach have been mixed. Indeed, we observed that in some cases the augmented data could lead to performance jumps on the deployment set. But this phenomenon did not occur every time. In some cases, the augmented data could even reduce performance on the deployment set. In our conclusion, we offer a granular explanation for this phenomenon. Better use of data augmentation toward neural network robustness is to generate stress tests to observe a model behavior when various shift occurs. Then, to use that information to estimate the error on the deployment set of interest even without labels, we call this deployment error estimation. Furthermore, we show that the use of independent data augmentation can improve deployment error estimation. We believe that this use of data augmentation will allow us to better quantify the reliability of neural networks when deployed on new unknown datasets
Mohamed, Moussa Elmokhtar. "Conversion d’écriture hors-ligne en écriture en-ligne et réseaux de neurones profonds." Electronic Thesis or Diss., Nantes Université, 2024. http://www.theses.fr/2024NANU4001.
This thesis focuses on the conversion of static images of offline handwriting into temporal signals of online handwriting. Our goal is to extend neural networks beyond the scale of images of isolated letters and as well to generalize to other complex types of content. The thesis explores two distinct neural network-based approaches, the first approach is a fully convolutional multitask UNet-based network, inspired by the method of [ZYT18]. This approach demonstrated good results for skeletonization but suboptimal stroke extrac- tion. Partly due to the inherent temporal mod- eling limitations of CNN architecture. The second approach builds on the pre- vious skeletonization model to extract sub- strokes and proposes a sub-stroke level modeling with Transformers, consisting of a sub- stroke embedding transformer (SET) and a sub-stroke ordering transformer (SORT) to or- der the different sub-strokes as well as pen up predictions. This approach outperformed the state of the art on text lines and mathematical equations databases and addressed several limitations identified in the literature. These advancements have expanded the scope of offline-to-online conversion to include entire text lines and generalize to bidimensional content, such as mathematical equations
Langlois, Julien. "Vision industrielle et réseaux de neurones profonds : application au dévracage de pièces plastiques industrielles." Thesis, Nantes, 2019. http://www.theses.fr/2019NANT4010/document.
This work presents a pose estimation method from a RGB image of industrial parts placed in a bin. In a first time, neural networks are used to segment a certain number of parts in the scene. After applying an object mask to the original image, a second network is inferring the local depth of the part. Both the local pixel coordinates of the part and the local depth are used in two networks estimating the orientation of the object as a quaternion and its translation on the Z axis. Finally, a registration module working on the back-projected local depth and the 3D model of the part is refining the pose inferred from the previous networks. To deal with the lack of annotated real images in an industrial context, an data generation process is proposed. By using various light parameters, the dataset versatility allows to anticipate multiple challenging exploitation scenarios within an industrial environment
Ogier, du Terrail Jean. "Réseaux de neurones convolutionnels profonds pour la détection de petits véhicules en imagerie aérienne." Thesis, Normandie, 2018. http://www.theses.fr/2018NORMC276/document.
The following manuscript is an attempt to tackle the problem of small vehicles detection in vertical aerial imagery through the use of deep learning algorithms. The specificities of the matter allows the use of innovative techniques leveraging the invariance and self similarities of automobiles/planes vehicles seen from the sky.We will start by a thorough study of single shot detectors. Building on that we will examine the effect of adding multiple stages to the detection decision process. Finally we will try to come to grips with the domain adaptation problem in detection through the generation of better looking synthetic data and its use in the training process of these detectors
Mercadier, Yves. "Classification automatique de textes par réseaux de neurones profonds : application au domaine de la santé." Thesis, Montpellier, 2020. http://www.theses.fr/2020MONTS068.
This Ph.D focuses on the analysis of textual data in the health domain and in particular on the supervised multi-class classification of data from biomedical literature and social media.One of the major difficulties when exploring such data by supervised learning methods is to have a sufficient number of data sets for models training. Indeed, it is generally necessary to label manually the data before performing the learning step. The large size of the data sets makes this labellisation task very expensive, which should be reduced with semi-automatic systems.In this context, active learning, in which the Oracle intervenes to choose the best examples to label, is promising. The intuition is as follows: by choosing the smartly the examples and not randomly, the models should improve with less effort for the oracle and therefore at lower cost (i.e. with less annotated examples). In this PhD, we will evaluate different active learning approaches combined with recent deep learning models.In addition, when small annotated data set is available, one possibility of improvement is to artificially increase the data quantity during the training phase, by automatically creating new data from existing data. More precisely, we inject knowledge by taking into account the invariant properties of the data with respect to certain transformations. The augmented data can thus cover an unexplored input space, avoid overfitting and improve the generalization of the model. In this Ph.D, we will propose and evaluate a new approach for textual data augmentation.These two contributions will be evaluated on different textual datasets in the medical domain
Caubriere, Antoine. "Du signal au concept : réseaux de neurones profonds appliqués à la compréhension de la parole." Thesis, Le Mans, 2021. https://tel.archives-ouvertes.fr/tel-03177996.
This thesis is part of the deep learning applied to spoken language understanding. Until now, this task was performed through a pipeline of components implementing, for example, a speech recognition system, then different natural language processing, before involving a language understanding system on enriched automatic transcriptions. Recently, work in the field of speech recognition has shown that it is possible to produce a sequence of words directly from the acoustic signal. Within the framework of this thesis, the aim is to exploit these advances and extend them to design a system composed of a single neural model fully optimized for the spoken language understanding task, from signal to concept. First, we present a state of the art describing the principles of deep learning, speech recognition, and speech understanding. Then, we describe the contributions made along three main axes. We propose a first system answering the problematic posed and apply it to a task of named entities recognition. Then, we propose a transfer learning strategy guided by a curriculum learning approach. This strategy is based on the generic knowledge learned to improve the performance of a neural system on a semantic concept extraction task. Then, we perform an analysis of the errors produced by our approach, while studying the functioning of the proposed neural architecture. Finally, we set up a confidence measure to evaluate the reliability of a hypothesis produced by our system
Nugraha, Aditya Arie. "Réseaux de neurones profonds pour la séparation des sources et la reconnaissance robuste de la parole." Electronic Thesis or Diss., Université de Lorraine, 2017. http://www.theses.fr/2017LORR0212.
This thesis addresses the problem of multichannel audio source separation by exploiting deep neural networks (DNNs). We build upon the classical expectation-maximization (EM) based source separation framework employing a multichannel Gaussian model, in which the sources are characterized by their power spectral densities and their source spatial covariance matrices. We explore and optimize the use of DNNs for estimating these spectral and spatial parameters. Employing the estimated source parameters, we then derive a time-varying multichannel Wiener filter for the separation of each source. We extensively study the impact of various design choices for the spectral and spatial DNNs. We consider different cost functions, time-frequency representations, architectures, and training data sizes. Those cost functions notably include a newly proposed task-oriented signal-to-distortion ratio cost function for spectral DNNs. Furthermore, we present a weighted spatial parameter estimation formula, which generalizes the corresponding exact EM formulation. On a singing-voice separation task, our systems perform remarkably close to the current state-of-the-art method and provide up to 2 dB improvement of the source-to-interference ratio. On a speech enhancement task, our systems outperforms the state-of-the-art GEV-BAN beamformer by 14%, 7%, and 1% relative word error rate improvement on 6-channel, 4-channel, and 2-channel data, respectively
Moysset, Bastien. "Détection, localisation et typage de texte dans des images de documents hétérogènes par Réseaux de Neurones Profonds." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSEI044/document.
Being able to automatically read the texts written in documents, both printed and handwritten, makes it possible to access the information they convey. In order to realize full page text transcription, the detection and localization of the text lines is a crucial step. Traditional methods tend to use image processing based approaches, but they hardly generalize to very heterogeneous datasets. In this thesis, we propose to use a deep neural network based approach. We first propose a mono-dimensional segmentation of text paragraphs into lines that uses a technique inspired by the text recognition models. The connexionist temporal classification (CTC) method is used to implicitly align the sequences. Then, we propose a neural network that directly predicts the coordinates of the boxes bounding the text lines. Adding a confidence prediction to these hypothesis boxes enables to locate a varying number of objects. We propose to predict the objects locally in order to share the network parameters between the locations and to increase the number of different objects that each single box predictor sees during training. This compensates the rather small size of the available datasets. In order to recover the contextual information that carries knowledge on the document layout, we add multi-dimensional LSTM recurrent layers between the convolutional layers of our networks. We propose three full page text recognition strategies that tackle the need of high preciseness of the text line position predictions. We show on the heterogeneous Maurdor dataset how our methods perform on documents that can be printed or handwritten, in French, English or Arabic and we favourably compare to other state of the art methods. Visualizing the concepts learned by our neurons enables to underline the ability of the recurrent layers to convey the contextual information
Cîrstea, Bogdan-Ionut. "Contribution à la reconnaissance de l'écriture manuscrite en utilisant des réseaux de neurones profonds et le calcul quantique." Electronic Thesis or Diss., Paris, ENST, 2018. http://www.theses.fr/2018ENST0059.
In this thesis, we provide several contributions from the fields of deep learning and quantum computation to handwriting recognition. We begin by integrating some of the more recent deep learning techniques (such as dropout, batch normalization and different activation functions) into convolutional neural networks and show improved performance on the well-known MNIST dataset. We then propose Tied Spatial Transformer Networks (TSTNs), a variant of Spatial Transformer Networks (STNs) with shared weights, as well as different training variants of the TSTN. We show improved performance on a distorted variant of the MNIST dataset. In another work, we compare the performance of Associative Long Short-Term Memory (ALSTM), a recently introduced recurrent neural network (RNN) architecture, against Long Short-Term Memory (LSTM), on the Arabic handwriting recognition IFN-ENIT dataset. Finally, we propose a neural network architecture, which we name a hybrid classical-quantum neural network, which can integrate and take advantage of quantum computing. While our simulations are performed using classical computation (on a GPU), our results on the Fashion-MNIST dataset suggest that exponential improvements in computational requirements might be achievable, especially for recurrent neural networks trained for sequence classification
Muliukov, Artem. "Étude croisée des cartes auto-organisatrices et des réseaux de neurones profonds pour l'apprentissage multimodal inspiré du cerveau." Electronic Thesis or Diss., Université Côte d'Azur, 2024. https://intranet-theses.unice.fr/2024COAZ4008.
Cortical plasticity is one of the main features that enable our capability to learn and adapt in our environment. Indeed, the cerebral cortex has the ability to self-organize itself through two distinct forms of plasticity: the structural plasticity and the synaptic plasticity. These mechanisms are very likely at the basis of an extremely interesting characteristic of the human brain development: the multimodal association. The brain uses spatio-temporal correlations between several modalities to structure the data and create sense from observations. Moreover, biological observations show that one modality can activate the internal representation of another modality when both are correlated. To model such a behavior, Edelman and Damasio proposed respectively the Reentry and the Convergence Divergence Zone frameworks where bi-directional neural communications can lead to both multimodal fusion (convergence) and inter-modal activation (divergence). Nevertheless, these frameworks do not provide a computational model at the neuron level, and only few works tackle this issue of bio-inspired multimodal association which is yet necessary for a complete representation of the environment especially when targeting autonomous and embedded intelligent systems. In this doctoral project, we propose to pursue the exploration of brain-inspired computational models of self-organization for multimodal unsupervised learning in neuromorphic systems. These neuromorphic architectures get their energy-efficient from the bio-inspired models they support, and for that reason we only consider in our work learning rules based on local and distributed processing
Halnaut, Adrien. "Méthodes et outils d’analyse visuelle pour la compréhension, l’optimisation et l’élaboration de modèles de réseaux de neurones profonds." Electronic Thesis or Diss., Bordeaux, 2024. http://www.theses.fr/2024BORD0042.
Deep‐learning methods are widely used in a variety of research and industrial domains, especially in the data classification task. However, this technology is often notoriously compared to a “black box”. The user can understand input and output data of the network, but has little to no knowledge about its internal processing. This aspect of neural networks makes difficult to justify their predictions. Explainability and Interpretability of deep neural networks is a research domain merging with a variety of scientific communities. Its goal is to make easier the understanding of neural networks for both users and experts. Information visualization is one of the techniques used to answer this need. It consists in building tools which make easier the understanding and the analysis of usually high dimensional datasets, using visual abstractions and interactions. In this thesis, we make use of data extracted from the output of each layer of the neural network to interpret the model decisions using visualization methods. First, we show it is possible to visualize groups of samples processed similarly by the network using a Sankey diagram. This method asks for large data processing, which we enable by using machine clusters infrastructures used in BigData operations. In order to study more complex scenarios, involving larger datasets and heavier network architectures, we develop compact data visualization methods. We propose two approaches: the first one implies representation of data proximities using data reduction to the ℕ space, the other one implies post‐processing to ℝ�� → ℝ2 data projections to build compact grids of data. In order to evaluate the performances of these projection methods, we propose a user study protocol. Its goal is to measure the suitability of visualization methods in tasks related to the understanding of high‐dimensional data. Finally, we carry out an evaluation following this protocol to compare the efficiency between compact data visualization and scatter plot visualization. This evaluation is conducted using state of the art methods t‐SNE and Self‐Sorting Maps
Lauly, Stanislas. "Exploration des réseaux de neurones à base d'autoencodeur dans le cadre de la modélisation des données textuelles." Thèse, Université de Sherbrooke, 2016. http://hdl.handle.net/11143/9461.
Dupont, Robin. "Deep Neural Network Compression for Visual Recognition." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS565.
Thanks to the miniaturisation of electronics, embedded devices have become ubiquitous since the 2010s, performing various tasks around us. As their usage expands, there's an increasing demand for efficient data processing and decision-making. Deep neural networks are apt tools for this, but they are often too large and intricate for embedded systems. Therefore, methods to compress these networks without affecting their performance are crucial. This PhD thesis introduces two methods focused on pruning to compress networks, maintaining accuracy. The thesis first details a budget-aware method for compressing large neural networks using weight reparametrisation and a budget loss, eliminating the need for fine-tuning. Traditional pruning methods often use post-training indicators to cut weights, ignoring desired pruning rates. Our method incorporates a budget loss, directing pruning during training, enabling simultaneous topology and weight optimisation. By soft-pruning smaller weights via reparametrisation, we reduce accuracy loss compared to standard pruning. We validate our method on several datasets and architectures. Later, the thesis examines extracting efficient subnetworks without weight training. We aim to discern the optimal subnetwork topology within a large network, bypassing weight optimisation yet ensuring strong performance. This is realized with our Arbitrarily Shifted Log Parametrisation, a differentiable method for discrete topology sampling, facilitating masks' training to denote weight selection probability. Additionally, a weight recalibration technique, Smart Rescale, is presented. It boosts extracted subnetworks' performance and hastens their training. Our method identifies the best pruning rate in a single training cycle, averting exhaustive hyperparameter searches and various rate training. Through extensive tests, our technique consistently surpasses similar state-of-the-art methods, creating streamlined networks that achieve high sparsity without notable accuracy drops
Franceschi, Jean-Yves. "Apprentissage de représentations et modèles génératifs profonds dans les systèmes dynamiques." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS014.
The recent rise of deep learning has been motivated by numerous scientific breakthroughs, particularly regarding representation learning and generative modeling. However, most of these achievements have been obtained on image or text data, whose evolution through time remains challenging for existing methods. Given their importance for autonomous systems to adapt in a constantly evolving environment, these challenges have been actively investigated in a growing body of work. In this thesis, we follow this line of work and study several aspects of temporality and dynamical systems in deep unsupervised representation learning and generative modeling. Firstly, we present a general-purpose deep unsupervised representation learning method for time series tackling scalability and adaptivity issues arising in practical applications. We then further study in a second part representation learning for sequences by focusing on structured and stochastic spatiotemporal data: videos and physical phenomena. We show in this context that performant temporal generative prediction models help to uncover meaningful and disentangled representations, and conversely. We highlight to this end the crucial role of differential equations in the modeling and embedding of these natural sequences within sequential generative models. Finally, we more broadly analyze in a third part a popular class of generative models, generative adversarial networks, under the scope of dynamical systems. We study the evolution of the involved neural networks with respect to their training time by describing it with a differential equation, allowing us to gain a novel understanding of this generative model
Hardy, Corentin. "Contribution au développement de l’apprentissage profond dans les systèmes distribués." Thesis, Rennes 1, 2019. http://www.theses.fr/2019REN1S020/document.
Deep learning enables the development of a growing number of services. However, it requires large training databases and a lot of computing power. In order to reduce the costs of this deep learning, we propose a distributed computing setup to enable collaborative learning. Future users can participate with their devices and their data without moving private data in datacenters. We propose methods to train deep neural network in this distibuted system context
Tafforeau, Jérémie. "Modèle joint pour le traitement automatique de la langue : perspectives au travers des réseaux de neurones." Thesis, Aix-Marseille, 2017. http://www.theses.fr/2017AIXM0430/document.
NLP researchers has identified different levels of linguistic analysis. This lead to a hierarchical division of the various tasks performed in order to analyze a text statement. The traditional approach considers task-specific models which are subsequently arranged in cascade within processing chains (pipelines). This approach has a number of limitations: the empirical selection of models features, the errors accumulation in the pipeline and the lack of robusteness to domain changes. These limitations lead to particularly high performance losses in the case of non-canonical language with limited data available such as transcriptions of conversations over phone. Disfluencies and speech-specific syntactic schemes, as well as transcription errors in automatic speech recognition systems, lead to a significant drop of performances. It is therefore necessary to develop robust and flexible systems. We intend to perform a syntactic and semantic analysis using a deep neural network multitask model while taking into account the variations of domain and/or language registers within the data
Dvornik, Mikita. "Learning with Limited Annotated Data for Visual Understanding." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM050.
The ability of deep-learning methods to excel in computer vision highly depends on the amount of annotated data available for training. For some tasks, annotation may be too costly and labor intensive, thus becoming the main obstacle to better accuracy. Algorithms that learn from data automatically, without human supervision, perform substantially worse than their fully-supervised counterparts. Thus, there is a strong motivation to work on effective methods for learning with limited annotations. This thesis proposes to exploit prior knowledge about the task and develops more effective solutions for scene understanding and few-shot image classification.Main challenges of scene understanding include object detection, semantic and instance segmentation. Similarly, all these tasks aim at recognizing and localizing objects, at region- or more precise pixel-level, which makes the annotation process difficult. The first contribution of this manuscript is a Convolutional Neural Network (CNN) that performs both object detection and semantic segmentation. We design a specialized network architecture, that is trained to solve both problems in one forward pass, and operates in real-time. Thanks to the multi-task training procedure, both tasks benefit from each other in terms of accuracy, with no extra labeled data.The second contribution introduces a new technique for data augmentation, i.e., artificially increasing the amount of training data. It aims at creating new scenes by copy-pasting objects from one image to another, within a given dataset. Placing an object in a right context was found to be crucial in order to improve scene understanding performance. We propose to model visual context explicitly using a CNN that discovers correlations between object categories and their typical neighborhood, and then proposes realistic locations for augmentation. Overall, pasting objects in ``right'' locations allows to improve object detection and segmentation performance, with higher gains in limited annotation scenarios.For some problems, the data is extremely scarce, and an algorithm has to learn new concepts from a handful of examples. Few-shot classification consists of learning a predictive model that is able to effectively adapt to a new class, given only a few annotated samples. While most current methods concentrate on the adaptation mechanism, few works have tackled the problem of scarce training data explicitly. In our third contribution, we show that by addressing the fundamental high-variance issue of few-shot learning classifiers, it is possible to significantly outperform more sophisticated existing techniques. Our approach consists of designing an ensemble of deep networks to leverage the variance of the classifiers, and introducing new strategies to encourage the networks to cooperate, while encouraging prediction diversity. By matching different networks outputs on similar input images, we improve model accuracy and robustness, comparing to classical ensemble training. Moreover, a single network obtained by distillation shows similar to the full ensemble performance and yields state-of-the-art results with no computational overhead at test time
Andreux, Mathieu. "Foveal autoregressive neural time-series modeling." Electronic Thesis or Diss., Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE073.
This dissertation studies unsupervised time-series modelling. We first focus on the problem of linearly predicting future values of a time-series under the assumption of long-range dependencies, which requires to take into account a large past. We introduce a family of causal and foveal wavelets which project past values on a subspace which is adapted to the problem, thereby reducing the variance of the associated estimators. We then investigate under which conditions non-linear predictors exhibit better performances than linear ones. Time-series which admit a sparse time-frequency representation, such as audio ones, satisfy those requirements, and we propose a prediction algorithm using such a representation. The last problem we tackle is audio time-series synthesis. We propose a new generation method relying on a deep convolutional neural network, with an encoder-decoder architecture, which allows to synthesize new realistic signals. Contrary to state-of-the-art methods, we explicitly use time-frequency properties of sounds to define an encoder with the scattering transform, while the decoder is trained to solve an inverse problem in an adapted metric
Oyallon, Edouard. "Analyzing and introducing structures in deep convolutional neural networks." Thesis, Paris Sciences et Lettres (ComUE), 2017. http://www.theses.fr/2017PSLEE060.
This thesis studies empirical properties of deep convolutional neural networks, and in particular the Scattering Transform. Indeed, the theoretical analysis of the latter is hard and until now remains a challenge: successive layers of neurons have the ability to produce complex computations, whose nature is still unknown, thanks to learning algorithms whose convergence guarantees are not well understood. However, those neural networks are outstanding tools to tackle a wide variety of difficult tasks, like image classification or more formally statistical prediction. The Scattering Transform is a non-linear mathematical operator whose properties are inspired by convolutional networks. In this work, we apply it to natural images, and obtain competitive accuracies with unsupervised architectures. Cascading a supervised neural networks after the Scattering permits to compete on ImageNet2012, which is the largest dataset of labeled images available. An efficient GPU implementation is provided. Then, this thesis focuses on the properties of layers of neurons at various depths. We show that a progressive dimensionality reduction occurs and we study the numerical properties of the supervised classification when we vary the hyper parameters of the network. Finally, we introduce a new class of convolutional networks, whose linear operators are structured by the symmetry groups of the classification task
Chevalier, Marion. "Résolution variable et information privilégiée pour la reconnaissance d'images." Electronic Thesis or Diss., Paris 6, 2016. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2016PA066726.pdf.
Image classification has a prominent interest in numerous visual recognition tasks, particularly for vehicle recognition in airborne systems, where the images have a low resolution because of the large distance between the system and the observed scene. During the training phase, complementary data such as knowledge on the position of the system or high-resolution images may be available. In our work, we focus on the task of low-resolution image classification while taking into account supplementary information during the training phase. We first show the interest of deep convolutional networks for the low-resolution image recognition, especially by proposing an architecture learned on the targeted data. On the other hand, we rely on the framework of learning using privileged information to benefit from the complementary training data, here the high-resolution versions of the images. We propose two novel methods for integrating privileged information in the learning phase of neural networks. Our first model relies on these complementary data to compute an absolute difficulty level, assigning a large weight to the most easily recognized images. Our second model introduces a similarity constraint between the networks learned on each type of data. We experimentally validate our models on several application cases, especially in a fine-grained oriented context and on a dataset containing annotation noise
Letard, Mathilde. "Environnemental knowledge extraction from topo-bathymetric lidar : machine learning and deep neural networds for point clouds and waveforms." Electronic Thesis or Diss., Université de Rennes (2023-....), 2023. http://www.theses.fr/2023URENB072.
Land-water interfaces face escalating threats from climate change and human activities, necessitating systematic observation to comprehend and effectively address these challenges. Nevertheless, constraints associated with the presence of water hinder the uninterrupted observation of submerged and emerged areas. Topo-bathymetric lidar remote sensing emerges as a suitable solution, ensuring a continuous representation of landwater zones through 3D point clouds and 1D waveforms. However, fully harnessing the potential of this data requires tools specifically crafted to address its unique characteristics. This thesis introduces methodologies for extracting environmental knowledge from topobathymetric lidar surveys. Initially, we introduce methods for classifying land and seabed covers using bi-spectral point clouds or waveform features. Subsequently, we employ deep neural networks for semantic segmentation, component detection and classification, and the estimation of water physical parameters based on bathymetric waveforms. Leveraging radiative transfer models, these approaches alleviate the need for manual waveform labeling, thereby enhancing waveform processing in challenging settings like extremely shallow or turbid waters
Chevalier, Marion. "Résolution variable et information privilégiée pour la reconnaissance d'images." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066726/document.
Image classification has a prominent interest in numerous visual recognition tasks, particularly for vehicle recognition in airborne systems, where the images have a low resolution because of the large distance between the system and the observed scene. During the training phase, complementary data such as knowledge on the position of the system or high-resolution images may be available. In our work, we focus on the task of low-resolution image classification while taking into account supplementary information during the training phase. We first show the interest of deep convolutional networks for the low-resolution image recognition, especially by proposing an architecture learned on the targeted data. On the other hand, we rely on the framework of learning using privileged information to benefit from the complementary training data, here the high-resolution versions of the images. We propose two novel methods for integrating privileged information in the learning phase of neural networks. Our first model relies on these complementary data to compute an absolute difficulty level, assigning a large weight to the most easily recognized images. Our second model introduces a similarity constraint between the networks learned on each type of data. We experimentally validate our models on several application cases, especially in a fine-grained oriented context and on a dataset containing annotation noise
Caglayan, Ozan. "Multimodal Machine Translation." Thesis, Le Mans, 2019. http://www.theses.fr/2019LEMA1016/document.
Machine translation aims at automatically translating documents from one language to another without human intervention. With the advent of deep neural networks (DNN), neural approaches to machine translation started to dominate the field, reaching state-ofthe-art performance in many languages. Neural machine translation (NMT) also revived the interest in interlingual machine translation due to how it naturally fits the task into an encoder-decoder framework which produces a translation by decoding a latent source representation. Combined with the architectural flexibility of DNNs, this framework paved the way for further research in multimodality with the objective of augmenting the latent representations with other modalities such as vision or speech, for example. This thesis focuses on a multimodal machine translation (MMT) framework that integrates a secondary visual modality to achieve better and visually grounded language understanding. I specifically worked with a dataset containing images and their translated descriptions, where visual context can be useful forword sense disambiguation, missing word imputation, or gender marking when translating from a language with gender-neutral nouns to one with grammatical gender system as is the case with English to French. I propose two main approaches to integrate the visual modality: (i) a multimodal attention mechanism that learns to take into account both sentence and convolutional visual representations, (ii) a method that uses global visual feature vectors to prime the sentence encoders and the decoders. Through automatic and human evaluation conducted on multiple language pairs, the proposed approaches were demonstrated to be beneficial. Finally, I further show that by systematically removing certain linguistic information from the input sentences, the true strength of both methods emerges as they successfully impute missing nouns, colors and can even translate when parts of the source sentences are completely removed
Rosar, Kós Lassance Carlos Eduardo. "Graphs for deep learning representations." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2020. http://www.theses.fr/2020IMTA0204.
In recent years, Deep Learning methods have achieved state of the art performance in a vast range of machine learning tasks, including image classification and multilingual automatic text translation. These architectures are trained to solve machine learning tasks in an end-to-end fashion. In order to reach top-tier performance, these architectures often require a very large number of trainable parameters. There are multiple undesirable consequences, and in order to tackle these issues, it is desired to be able to open the black boxes of deep learning architectures. Problematically, doing so is difficult due to the high dimensionality of representations and the stochasticity of the training process. In this thesis, we investigate these architectures by introducing a graph formalism based on the recent advances in Graph Signal Processing (GSP). Namely, we use graphs to represent the latent spaces of deep neural networks. We showcase that this graph formalism allows us to answer various questions including: ensuring generalization abilities, reducing the amount of arbitrary choices in the design of the learning process, improving robustness to small perturbations added to the inputs, and reducing computational complexity
Blier, Léonard. "Some Principled Methods for Deep Reinforcement Learning." Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG040.
This thesis develops and studies some principled methods for Deep Learning (DL) and deep Reinforcement Learning (RL).In Part II, we study the efficiency of DL models from the context of the Minimum Description Length principle, which formalize Occam's razor, and holds that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. Surprisingly, we demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding, hence showing that DL approaches are well principled from this information theory viewpoint.In Part III, we tackle two limitations of standard approaches in DL and RL, and develop principled methods, improving robustness empirically.The first one concerns optimisation of deep learning models with SGD, and the cost of finding the optimal learning rate, which prevents using a new method out of the box without hyperparameter tuning. When design a principled optimisation method for DL, 'All Learning Rates At Once' : each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude. Perhaps surprisingly, Alrao performs close to SGD with an optimally tuned learning rate, for various architectures and problems.The second one tackles near continuous-time RL environments (such as robotics, control environment, …) : we show that time discretization (number of action per second) in as a critical factor, and that empirically, Q-learning-based approaches collapse with small time steps. Formally, we prove that Q-learning does not exist in continuous time. We detail a principled way to build an off-policy RL algorithm that yields similar performances over a wide range of time discretizations, and confirm this robustness empirically.The main part of this thesis, (Part IV), studies the Successor States Operator in RL, and how it can improve sample efficiency of policy evaluation. In an environment with a very sparse reward, learning the value function is a hard problem. At the beginning of training, no learning will occur until a reward is observed. This highlight the fact that not all the observed information is used. Leveraging this information might lead to better sample efficiency. The Successor State Operator is an object that expresses the value functions of all possible reward functions for a given, fixed policy. Learning the successor state operator can be done without reward signals, and can extract information from every observed transition, illustrating an unsupervised reinforcement learning approach.We offer a formal treatment of these objects in both finite and continuous spaces with function approximators. We present several learning algorithms and associated results. Similarly to the value function, the successor states operator satisfies a Bellman equation. Additionally, it also satisfies two other fixed point equations: a backward Bellman equation and a Bellman-Newton equation, expressing path compositionality in the Markov process. These new relation allow us to generalize from observed trajectories in several ways, potentially leading to more sample efficiency. Every of these equations lead to corresponding algorithms for any function approximators such as neural networks.Finally, (Part V) the study of the successor states operator and its algorithms allow us to derive unbiased methods in the setting of multi-goal RL, dealing with the issue of extremely sparse rewards. We additionally show that the popular Hindsight Experience Replay algorithm, known to be biased, is actually unbiased in the large class of deterministic environments
Hocquet, Guillaume. "Class Incremental Continual Learning in Deep Neural Networks." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPAST070.
We are interested in the problem of continual learning of artificial neural networks in the case where the data are available for only one class at a time. To address the problem of catastrophic forgetting that restrain the learning performances in these conditions, we propose an approach based on the representation of the data of a class by a normal distribution. The transformations associated with these representations are performed using invertible neural networks, which can be trained with the data of a single class. Each class is assigned a network that will model its features. In this setting, predicting the class of a sample corresponds to identifying the network that best fit the sample. The advantage of such an approach is that once a network is trained, it is no longer necessary to update it later, as each network is independent of the others. It is this particularly advantageous property that sets our method apart from previous work in this area. We support our demonstration with experiments performed on various datasets and show that our approach performs favorably compared to the state of the art. Subsequently, we propose to optimize our approach by reducing its impact on memory by factoring the network parameters. It is then possible to significantly reduce the storage cost of these networks with a limited performance loss. Finally, we also study strategies to produce efficient feature extractor models for continual learning and we show their relevance compared to the networks traditionally used for continual learning
Maczyta, Léo. "Dynamic visual saliency in image sequences." Thesis, Rennes 1, 2020. http://www.theses.fr/2020REN1S046.
Our thesis research is concerned with the estimation of motion saliency in image sequences. First, we have defined an original method to detect frames in which a salient motion is present. For this, we propose a framework relying on a deep neural network, and on the compensation of the dominant camera motion. Second, we have designed a method for estimating motion saliency maps. This method requires no learning. The motion saliency cue is obtained by an optical flow inpainting step, followed by a comparison with the initial flow. Third, we consider the problem of trajectory saliency estimation to handle progressive saliency over time. We have built a weakly supervised framework based on a recurrent auto-encoder that represents trajectories with latent codes. Performance of the three methods was experimentally assessed on real video datasets
Hu, Xu. "Towards efficient learning of graphical models and neural networks with variational techniques." Thesis, Paris Est, 2019. http://www.theses.fr/2019PESC1037.
In this thesis, I will mainly focus on variational inference and probabilistic models. In particular, I will cover several projects I have been working on during my PhD about improving the efficiency of AI/ML systems with variational techniques. The thesis consists of two parts. In the first part, the computational efficiency of probabilistic graphical models is studied. In the second part, several problems of learning deep neural networks are investigated, which are related to either energy efficiency or sample efficiency
Manenti, Céline. "Découverte d'unités linguistiques à l'aide de méthodes d'apprentissage non supervisé." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30074.
The discovery of elementary linguistic units (phonemes, words) only from sound recordings is an unresolved problem that arouses a strong interest from the community of automatic speech processing, as evidenced by the many recent contributions of the state of the art. During this thesis, we focused on using neural networks to answer the problem. We approached the problem using neural networks in a supervised, poorly supervised and multilingual manner. We have developed automatic phoneme segmentation and phonetic classification tools based on convolutional neural networks. The automatic segmentation tool obtained 79% F-measure on the BUCKEYE conversational speech corpus. This result is similar to a human annotator according to the inter-annotator agreement provided by the creators of the corpus. In addition, it does not need a lot of data (about ten minutes per speaker and 5 different speakers) to be effective. In addition, it is portable to other languages (especially for poorly endowed languages such as xitsonga). The phonetic classification system makes it possible to set the various parameters and hyperparameters that are useful for an unsupervised scenario. In the unsupervised context, the neural networks (Auto-Encoders) allowed us to generate new parametric representations, concentrating the information of the input frame and its neighboring frames. We studied their utility for audio compression from the raw signal, for which they were effective (low RMS, even at 99% compression). We also carried out an innovative pre-study on a different use of neural networks, to generate vectors of parameters not from the outputs of the layers but from the values of the weights of the layers. These parameters are designed to mimic Linear Predictive Coefficients (LPC). In the context of the unsupervised discovery of phoneme-like units (called pseudo-phones in this memory) and the generation of new phonetically discriminative parametric representations, we have coupled a neural network with a clustering tool (k-means ). The iterative alternation of these two tools allowed the generation of phonetically discriminating parameters for the same speaker: low rates of intra-speaker ABx error of 7.3% for English, 8.5% for French and 8 , 4% for Mandarin were obtained. These results allow an absolute gain of about 4% compared to the baseline (conventional parameters MFCC) and are close to the best current approaches (1% more than the winner of the Zero Resource Speech Challenge 2017). The inter-speaker results vary between 12% and 15% depending on the language, compared to 21% to 25% for MFCCs
Delecraz, Sébastien. "Approches jointes texte/image pour la compréhension multimodale de documents." Thesis, Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0634/document.
The human faculties of understanding are essentially multimodal. To understand the world around them, human beings fuse the information coming from all of their sensory receptors. Most of the documents used in automatic information processing contain multimodal information, for example text and image in textual documents or image and sound in video documents, however the processings used are most often monomodal. The aim of this thesis is to propose joint processes applying mainly to text and image for the processing of multimodal documents through two studies: one on multimodal fusion for the speaker role recognition in television broadcasts, the other on the complementarity of modalities for a task of linguistic analysis on corpora of images with captions. In the first part of this study, we interested in audiovisual documents analysis from news television channels. We propose an approach that uses in particular deep neural networks for representation and fusion of modalities. In the second part of this thesis, we are interested in approaches allowing to use several sources of multimodal information for a monomodal task of natural language processing in order to study their complementarity. We propose a complete system of correction of prepositional attachments using visual information, trained on a multimodal corpus of images with captions
Benamar, Alexandra. "Évaluation et adaptation de plongements lexicaux au domaine à travers l'exploitation de connaissances syntaxiques et sémantiques." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG035.
Word embeddings have established themselves as the most popular representation in NLP. To achieve good performance, they require training on large data sets mainly from the general domain and are frequently finetuned for specialty data. However, finetuning is a resource-intensive practice and its effectiveness is controversial.In this thesis, we evaluate the use of word embedding models on specialty corpora and show that proximity between the vocabularies of the training and application data plays a major role in the representation of out-of-vocabulary terms. We observe that this is mainly due to the initial tokenization of words and propose a measure to compute the impact of the tokenization of words on their representation. To solve this problem, we propose two methods for injecting linguistic knowledge into representations generated by Transformers: one at the data level and the other at the model level. Our research demonstrates that adding syntactic and semantic context can improve the application of self-supervised models to specialty domains, both for vocabulary representation and for NLP tasks.The proposed methods can be used for any language with linguistic information or external knowledge available. The code used for the experiments has been published to facilitate reproducibility and measures have been taken to limit the environmental impact by reducing the number of experiments
Botella, Christophe. "Méthodes statistiques pour la modélisation de la distribution spatiale des espèces végétales à partir de grandes masses d’observations incertaines issues de programmes de sciences citoyennes." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS135.
Human botanical expertise is becoming too scarce to provide the field data needed to monitor plant biodiversity. The use of geolocated botanical observations from major citizen science projects, such as Pl@ntNet, opens interesting paths for a temporal monitoring of plant species distribution. Pl@ntNet provides automatically identified flora observations, a confidence score, and can thus be used for species distribution models (SDM). They enable to monitor the distribution of invasive or rare plants, as well as the effects of global changes on species, if we can (i) take into account identification uncertainty, (ii) correct for spatial sampling bias, and (iii) predict species abundances accurately at a fine spatial grain.First, we ask ourselves if we can estimate realistic distributions of invasive plant species on automatically identified occurrences of Pl@ntNet, and what is the effect of filtering with a confidence score threshold. Filtering improves predictions when the confidence level increases until the sample size is limiting. The predicted distributions are generally consistent with expert data, but also indicate urban areas of abundance due to ornamental cultivation and new areas of presence.Next, we studied the correction of spatial sampling bias in SDMs based on presences only. We first mathematically analyzed the bias when the occurrences of a target group of species (Target Group Background, TGB) are used as background points, and compared this bias with that of a spatially uniform selection of base points. We then show that the bias of TGB is due to the variation in the cumulative abundance of target group species in the environmental space, which is difficult to control. We can alternatively jointly model the global observation effort with the abundances of several species. We model the observation effort as a step spatial function defined on a mesh of geographical cells. The addition of massively observed species to the model then reduces the variance in the estimation of the observation effort and thus on the models of the other species.Finally, we propose a new type of SDM based on convolutional neural networks using environmental images as input variables. These models can capture complex spatial patterns of several environmental variables. We propose to share the architecture of the neural network between several species in order to extract common high-level predictors and regularize the model. Our results show that this model outperforms existing SDMs, that performance is improved by simultaneously predicting many species, and this is confirmed by two cooperative SDM evaluation campaigns conducted on independent data sets. This supports the hypothesis that there are common environmental models describing the distribution of many species.Our results support the use of Pl@ntnet occurrences for monitoring plant invasions. Joint modelling of multiple species and observation effort is a promising strategy that transforms the bias problem into a more controllable estimation variance problem. However, the effect of certain factors, such as the level of anthropization, on species abundance is difficult to separate from the effect on observation effort with occurrence data. This can be solved by additional protocolled data collection. The deep learning methods developed show good performance and could be used to deploy spatial species prediction services
Delecraz, Sébastien. "Approches jointes texte/image pour la compréhension multimodale de documents." Electronic Thesis or Diss., Aix-Marseille, 2018. http://www.theses.fr/2018AIXM0634.
The human faculties of understanding are essentially multimodal. To understand the world around them, human beings fuse the information coming from all of their sensory receptors. Most of the documents used in automatic information processing contain multimodal information, for example text and image in textual documents or image and sound in video documents, however the processings used are most often monomodal. The aim of this thesis is to propose joint processes applying mainly to text and image for the processing of multimodal documents through two studies: one on multimodal fusion for the speaker role recognition in television broadcasts, the other on the complementarity of modalities for a task of linguistic analysis on corpora of images with captions. In the first part of this study, we interested in audiovisual documents analysis from news television channels. We propose an approach that uses in particular deep neural networks for representation and fusion of modalities. In the second part of this thesis, we are interested in approaches allowing to use several sources of multimodal information for a monomodal task of natural language processing in order to study their complementarity. We propose a complete system of correction of prepositional attachments using visual information, trained on a multimodal corpus of images with captions
Hu, Kaitong. "Jeux différentiels stochastiques non-Markoviens etdynamiques de Langevin à champ-moyen." Thesis, Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAX005.
Two independent subjects are studied in this thesis, the first of which consists of two distinct problems.In the first part, we begin with the Principal-Agent problem in degenerate systems, which appear naturally in partially observed random environment in which the Agent and the Principal can only observe one part of the system. Our approach is based on the stochastic maximum principle, the goal of which is to extend the existing results using dynamic programming principle to the degenerate case. We first solve the Principal's problem in an enlarged set of contracts given by the first order condition of the Agent's problem in form of a path-dependent forward-backward stochastic differential equation (abbreviated FBSDE). Afterward, we use the sufficient condition of the Agent's problem to verify that the previously obtained optimal contract is indeed implementable. Meanwhile, a parallel study is devoted to the wellposedness of path-dependent FBSDEs in the chapter IV. We generalize the decoupling field method to the case where the coefficients of the equations can depend on the whole path of the forward process and show the stability property of this type of FBSDEs. Finally, we study the Principal-Agent problem with multiple Principals. The Agent can only work for one Principal at a time and therefore needs to solve an optimal switching problem. By using randomization, we show that the value function of the Agent's problem and his optimal control are given by an Itô process. This representation allows us to solve the Principal's problem in the mean-field case when there is an infinite number of Principals. We justify the mean-field formulation using an argument of backward propagation of chaos.The second part of the thesis consists of chapter V and VI. The motivation of this work is to give a rigorous theoretical underpinning for the convergence of gradient-descent type of algorithms frequently used in non-convex optimization problems like calibrating a deep neural network.For one-layer neural networks, the key insight is to convexify the problem by lifting it to the measure space. We show that the corresponding energy function has a unique minimiser which can be characterized by some first order condition using derivatives in measure space. We present a probabilistic analysis of the long-time behavior of the mean-field Langevin dynamics, which have a gradient flow structure in 2-Wasserstein metric. By using a generalization of LaSalle's invariance principle, we show that the flow of marginal laws induced by the mean-field Langevin dynamics converges to the stationary distribution, which is exactly the minimiser of the energy function.As for deep neural networks, we model them as some continuous-time optimal control problems. Firstly, we find the first order condition by using Pontryagin maximum principle, which later helps us find the associated mean-field Langevin system, the invariant measure of which is again the minimiser of the optimal control problem. As last, by using the reflection coupling, we show that the marginal distribution of the mean-field Langevin system converges to the unique invariant measure exponentially
Rivet, Julie. "Non-iterative methods for image improvement in digital holography of the retina." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS246.
With the increase of the number of people with moderate to severe visual impairment, monitoring and treatment of vision disorders have become major issues in medicine today. At the Quinze-Vingts national ophthalmology hospital in Paris, two optical benches have been settled in recent years to develop two real-time digital holography techniques for the retina: holographic optical coherence tomography (OCT) and laser Doppler holography. The first reconstructs three-dimensional images, while the second allows visualization of blood flow in vessels. Besides problems inherent to the imaging system itself, optical devices are subject to external disturbance, bringing also difficulties in imaging and loss of accuracy. The main obstacles these technologies face are eye motion and eye aberrations.In this thesis, we have introduced several methods for image quality improvement in digital holography, and validated them experimentally. The resolution of holographic images has been improved by robust non-iterative methods: lateral and axial tracking and compensation of translation movements, and measurement and compensation of optical aberrations. This allows us to be optimistic that structures on holographic images of the retina will be more visible and sharper, which could ultimately provide very valuable information to clinicians
Resmerita, Diana. "Compression pour l'apprentissage en profondeur." Thesis, Université Côte d'Azur, 2022. http://www.theses.fr/2022COAZ4043.
Autonomous cars are complex applications that need powerful hardware machines to be able to function properly. Tasks such as staying between the white lines, reading signs, or avoiding obstacles are solved by using convolutional neural networks (CNNs) to classify or detect objects. It is highly important that all the networks work in parallel in order to transmit all the necessary information and take a common decision. Nowadays, as the networks improve, they also have become bigger and more computational expensive. Deploying even one network becomes challenging. Compressing the networks can solve this issue. Therefore, the first objective of this thesis is to find deep compression methods in order to cope with the memory and computational power limitations present on embedded systems. The compression methods need to be adapted to a specific processor, Kalray's MPPA, for short term implementations. Our contributions mainly focus on compressing the network post-training for storage purposes, which means compressing the parameters of the network without retraining or changing the original architecture and the type of the computations. In the context of our work, we decided to focus on quantization. Our first contribution consists in comparing the performances of uniform quantization and non-uniform quantization, in order to identify which of the two has a better rate-distortion trade-off and could be quickly supported in the company. The company's interest is also directed towards finding new innovative methods for future MPPA generations. Therefore, our second contribution focuses on comparing standard floating-point representations (FP32, FP16) to recently proposed alternative arithmetical representations such as BFloat16, msfp8, Posit8. The results of this analysis were in favor for Posit8. This motivated the company Kalray to conceive a decompressor from FP16 to Posit8. Finally, since many compression methods already exist, we decided to move to an adjacent topic which aims to quantify theoretically the effects of quantization error on the network's accuracy. This is the second objective of the thesis. We notice that well-known distortion measures are not adapted to predict accuracy degradation in the case of inference for compressed neural networks. We define a new distortion measure with a closed form which looks like a signal-to-noise ratio. A set of experiments were done using simulated data and small networks, which show the potential of this distortion measure
Carbajal, Guillaume. "Apprentissage profond bout-en-bout pour le rehaussement de la parole." Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0017.
This PhD falls within the development of hands-free telecommunication systems, more specifically smart speakers in domestic environments. The user interacts with another speaker at a far-end point and can be typically a few meters away from this kind of system. The microphones are likely to capture sounds of the environment which are added to the user's voice, such background noise, acoustic echo and reverberation. These types of distortion degrade speech quality, intelligibility and listening comfort for the far-end speaker, and must be reduced. Filtering methods can reduce individually each of these types of distortion. Reducing all of them implies combining the corresponding filtering methods. As these methods interact with each other which can deteriorate the user's speech, they must be jointly optimized. First of all, we introduce an acoustic echo reduction approach which combines an echo cancellation filter with a residual echo postfilter designed to adapt to the echo cancellation filter. To do so, we propose to estimate the postfilter coefficients using the short term spectra of multiple known signals, including the output of the echo cancellation filter, as inputs to a neural network. We show that this approach improves the performance and the robustness of the postfilter in terms of echo reduction, while limiting speech degradation, on several scenarios in real conditions. Secondly, we describe a joint approach for multichannel reduction of echo, reverberation and noise. We propose to simultaneously model the target speech and undesired residual signals after echo cancellation and dereveberation in a probabilistic framework, and to jointly represent their short-term spectra by means of a recurrent neural network. We develop a block-coordinate ascent algorithm to update the echo cancellation and dereverberation filters, as well as the postfilter that reduces the undesired residual signals. We evaluate our approach on real recordings in different conditions. We show that it improves speech quality and reduction of echo, reverberation and noise compared to a cascade of individual filtering methods and another joint reduction approach. Finally, we present an online version of our approach which is suitable for time-varying acoustic conditions. We evaluate the perceptual quality achieved on real examples where the user moves during the conversation
Kalinicheva, Ekaterina. "Unsupervised satellite image time series analysis using deep learning techniques." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS335.
This thesis presents a set of unsupervised algorithms for satellite image time series (SITS) analysis. Our methods exploit machine learning algorithms and, in particular, neural networks to detect different spatio-temporal entities and their eventual changes in the time.In our thesis, we aim to identify three different types of temporal behavior: no change areas, seasonal changes (vegetation and other phenomena that have seasonal recurrence) and non-trivial changes (permanent changes such as constructions or demolishment, crop rotation, etc). Therefore, we propose two frameworks: one for detection and clustering of non-trivial changes and another for clustering of “stable” areas (seasonal changes and no change areas). The first framework is composed of two steps which are bi-temporal change detection and the interpretation of detected changes in a multi-temporal context with graph-based approaches. The bi-temporal change detection is performed for each pair of consecutive images of the SITS and is based on feature translation with autoencoders (AEs). At the next step, the changes from different timestamps that belong to the same geographic area form evolution change graphs. The graphs are then clustered using a recurrent neural networks AE model to identify different types of change behavior. For the second framework, we propose an approach for object-based SITS clustering. First, we encode SITS with a multi-view 3D convolutional AE in a single image. Second, we perform a two steps SITS segmentation using the encoded SITS and original images. Finally, the obtained segments are clustered exploiting their encoded descriptors
Ben, Naceur Mostefa. "Deep Neural Networks for the segmentation and classification in Medical Imaging." Thesis, Paris Est, 2020. http://www.theses.fr/2020PESC2014.
Nowadays, getting an efficient segmentation of Glioblastoma Multiforme (GBM) braintumors in multi-sequence MRI images as soon as possible, gives an early clinical diagnosis, treatment, and follow-up. The MRI technique is designed specifically to provide radiologists with powerful visualization tools to analyze medical images, but the challenge lies more in the information interpretation of radiological images with clinical and pathologies data and their causes in the GBM tumors. This is why quantitative research in neuroimaging often requires anatomical segmentation of the human brain from MRI images for the detection and segmentation of brain tumors. The objective of the thesis is to propose automatic Deep Learning methods for brain tumors segmentation using MRI images.First, we are mainly interested in the segmentation of patients’ MRI images with GBMbrain tumors using Deep Learning methods, in particular, Deep Convolutional NeuralNetworks (DCNN). We propose two end-to-end DCNN-based approaches for fully automaticbrain tumor segmentation. The first approach is based on the pixel-wise techniquewhile the second one is based on the patch-wise technique. Then, we prove that thelatter is more efficient in terms of segmentation performance and computational benefits. We also propose a new guided optimization algorithm to optimize the suitable hyperparameters for the first approach. Second, to enhance the segmentation performance of the proposed approaches, we propose new segmentation pipelines of patients’ MRI images, where these pipelines are based on deep learned features and two stages of training. We also address problems related to unbalanced data in addition to false positives and false negatives to increase the model segmentation sensitivity towards the tumor regions and specificity towards the healthy regions. Finally, the segmentation performance and the inference time of the proposed approaches and pipelines are reported along with state-of-the-art methods on a public dataset annotated by radiologists and approved by neuroradiologists
Bisot, Victor. "Apprentissage de représentations pour l'analyse de scènes sonores." Electronic Thesis or Diss., Paris, ENST, 2018. http://www.theses.fr/2018ENST0016.
This thesis work focuses on the computational analysis of environmental sound scenes and events. The objective of such tasks is to automatically extract information about the context in which a sound has been recorded. The interest for this area of research has been rapidly increasing in the last few years leading to a constant growth in the number of works and proposed approaches. We explore and contribute to the main families of approaches to sound scene and event analysis, going from feature engineering to deep learning. Our work is centered at representation learning techniques based on nonnegative matrix factorization, which are particularly suited to analyse multi-source environments such as acoustic scenes. As a first approach, we propose a combination of image processing features with the goal of confirming that spectrograms contain enough information to discriminate sound scenes and events. From there, we leave the world of feature engineering to go towards automatically learning the features. The first step we take in that direction is to study the usefulness of matrix factorization for unsupervised feature learning techniques, especially by relying on variants of NMF. Several of the compared approaches allow us indeed to outperform feature engineering approaches to such tasks. Next, we propose to improve the learned representations by introducing the TNMF model, a supervised variant of NMF. The proposed TNMF models and algorithms are based on jointly learning nonnegative dictionaries and classifiers by minimising a target classification cost. The last part of our work highlights the links and the compatibility between NMF and certain deep neural network systems by proposing and adapting neural network architectures to the use of NMF as an input representation. The proposed models allow us to get state of the art performance on scene classification and overlapping event detection tasks. Finally we explore the possibility of jointly learning NMF and neural networks parameters, grouping the different stages of our systems in one optimisation problem
Jaureguiberry, Xabier. "Fusion pour la séparation de sources audio." Thesis, Paris, ENST, 2015. http://www.theses.fr/2015ENST0030/document.
Underdetermined blind source separation is a complex mathematical problem that can be satisfyingly resolved for some practical applications, providing that the right separation method has been selected and carefully tuned. In order to automate this selection process, we propose in this thesis to resort to the principle of fusion which has been widely used in the related field of classification yet is still marginally exploited in source separation. Fusion consists in combining several methods to solve a given problem instead of selecting a unique one. To do so, we introduce a general fusion framework in which a source estimate is expressed as a linear combination of estimates of this same source given by different separation algorithms, each source estimate being weighted by a fusion coefficient. For a given task, fusion coefficients can then be learned on a representative training dataset by minimizing a cost function related to the separation objective. To go further, we also propose two ways to adapt the fusion coefficients to the mixture to be separated. The first one expresses the fusion of several non-negative matrix factorization (NMF) models in a Bayesian fashion similar to Bayesian model averaging. The second one aims at learning time-varying fusion coefficients thanks to deep neural networks. All proposed methods have been evaluated on two distinct corpora. The first one is dedicated to speech enhancement while the other deals with singing voice extraction. Experimental results show that fusion always outperform simple selection in all considered cases, best results being obtained by adaptive time-varying fusion with neural networks