Theses on the topic "Apprentissages profond"
See the top 50 dissertations (graduate or doctoral theses) for research on the topic "Apprentissages profond".
Hassanaly, Ravi. "Pseudo-healthy image reconstruction with deep generative models for the detection of dementia-related anomalies". Electronic Thesis or Diss., Sorbonne université, 2024. http://www.theses.fr/2024SORUS118.
Neuroimaging has become an essential tool in the study of markers of Alzheimer's disease. However, analyzing complex multimodal brain images remains a major challenge for clinicians. To overcome this difficulty, deep learning methods have emerged as a promising solution for the automatic and robust analysis of neuroimaging data. In this thesis, we explore the use of deep generative models for the detection of anomalies associated with dementia in 18F-fluorodeoxyglucose positron emission tomography (FDG PET) data. Our method is based on the principle of pseudo-healthy reconstruction, where we train a generative model to reconstruct healthy images from pathological data. This approach has the advantage of not requiring annotated data, which are time-consuming and costly to acquire, as well as being generalizable to different types of anomalies. We chose to implement a variational autoencoder (VAE), a simple model that has nonetheless proved its worth in the field of deep learning. However, assessing the performance of our generative models without labeled data or ground truth anomaly maps leads to an incomplete evaluation. To solve this issue, we have introduced an evaluation framework based on the simulation of hypometabolism on FDG PET images. Thus, by creating pairs of healthy and diseased images, we are able to assess the model's ability to reconstruct pseudo-healthy images. In addition, this methodology has enabled us to define new metrics for assessing the quality of reconstructions obtained from generative models. The evaluation framework allowed us to carry out a comparative study on twenty VAE variants in the context of FDG PET pseudo-healthy reconstruction. The proposed benchmark enabled us to identify the best-performing models for detecting dementia-related anomalies. Finally, several significant contributions have been made to open-source software. A PET image processing pipeline has been integrated into the Clinica software. In addition, this thesis gave rise to numerous contributions to the development of the ClinicaDL software, including its improvement, the addition of new functionalities, software maintenance and participation in project management.
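The evaluation idea in the abstract above (simulate hypometabolism on a healthy image, then compare the input with a pseudo-healthy reconstruction) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the tiny 2D image, the `reduction` parameter, and the fact that the known healthy image stands in for the output of a trained generative model. It is not the thesis's pipeline.

```python
# Toy sketch: images are nested lists, and the "reconstruction" is simply the
# known healthy image (a hypothetical stand-in for a trained generative model).

def simulate_hypometabolism(image, mask, reduction=0.3):
    """Lower the intensity by the fraction `reduction` wherever mask is 1."""
    return [
        [pixel * (1.0 - reduction * m) for pixel, m in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


def anomaly_map(image, pseudo_healthy):
    """Per-pixel residual: positive where the input is below the reconstruction."""
    return [
        [rec - pix for pix, rec in zip(row, rec_row)]
        for row, rec_row in zip(image, pseudo_healthy)
    ]


healthy = [[1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 1.0]]
mask = [[0, 0, 0], [0, 1, 1], [0, 0, 0]]

diseased = simulate_hypometabolism(healthy, mask, reduction=0.3)
residual = anomaly_map(diseased, healthy)
```

Because the healthy and diseased images form a known pair, the residual can be compared directly with the simulated mask, which is what makes a quantitative evaluation possible without annotated anomalies.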
Béthune, Louis. "Apprentissage profond avec contraintes Lipschitz". Electronic Thesis or Diss., Université de Toulouse (2023-....), 2024. http://www.theses.fr/2024TLSES014.
This thesis explores the characteristics and applications of Lipschitz networks in machine learning tasks. First, the framework of "optimization as a layer" is presented, showcasing various applications, including the parametrization of Lipschitz-constrained layers. Then, the expressiveness of these networks in classification tasks is investigated, revealing an accuracy/robustness tradeoff controlled by entropic regularization of the loss, accompanied by generalization guarantees. Subsequently, the research delves into the utilization of signed distance functions as a solution to a regularized optimal transport problem, showcasing their efficacy in robust one-class learning and the construction of neural implicit surfaces. Next, the thesis demonstrates the adaptability of the back-propagation algorithm to propagate bounds instead of vectors, enabling differentially private training of Lipschitz networks without incurring runtime and memory overhead. Finally, it goes beyond Lipschitz constraints and explores the use of convexity constraints for multivariate quantiles.
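As a rough illustration of what a Lipschitz-constrained layer means, the sketch below rescales a linear layer by its spectral norm so that it cannot expand distances. Spectral normalization is used here only as one common technique; the abstract does not say which parametrization the thesis uses, and all function names are hypothetical.

```python
import math
import random


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def spectral_norm(matrix, iterations=200, seed=0):
    """Estimate the largest singular value by power iteration on W^T W."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in matrix[0]]
    wt = transpose(matrix)
    for _ in range(iterations):
        v = matvec(wt, matvec(matrix, v))
        n = norm(v)
        v = [x / n for x in v]
    # The estimate approaches the true value from below.
    return norm(matvec(matrix, v))


def make_1_lipschitz(matrix):
    """Divide the weights by the estimated spectral norm."""
    sigma = spectral_norm(matrix)
    return [[w / sigma for w in row] for row in matrix]


weights = [[3.0, 0.0], [0.0, 1.0]]
constrained = make_1_lipschitz(weights)
```

After rescaling, the distance between two outputs of the layer is at most the distance between the corresponding inputs, up to the accuracy of the power-iteration estimate.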
Vialatte, Jean-Charles. "Convolution et apprentissage profond sur graphes". Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2018. http://www.theses.fr/2018IMTA0118/document.
Convolutional neural networks have proven to be the deep learning model that performs best on regularly structured datasets like images or sounds. However, they cannot be applied to datasets with an irregular structure (e.g. sensor networks, citation networks, MRIs). In this thesis, we develop an algebraic theory of convolutions on irregular domains. We construct a family of convolutions that are based on group actions (or, more generally, groupoid actions) that act on the vertex domain and that have properties that depend on the edges. With the help of these convolutions, we propose extensions of convolutional neural networks to graph domains. Our research leads us to propose a generic formulation of the propagation between layers, which we call the neural contraction. From this formulation, we derive many novel neural network models that can be applied on irregular domains. Through benchmarks and experiments, we show that they attain state-of-the-art performance and, in some cases, surpass it.
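To make the notion of propagation between layers on a graph concrete, here is a minimal sketch of one widely used rule: each node averages its own features with those of its neighbours, then a weight matrix mixes the channels. This is a generic mean-aggregation layer chosen for illustration, not the neural contraction defined in the thesis, and the names are hypothetical.

```python
def graph_conv(adjacency, features, weights):
    """One propagation step: neighbourhood mean (with self-loop), then channel mixing."""
    n = len(adjacency)
    n_in = len(features[0])
    n_out = len(weights[0])
    out = []
    for i in range(n):
        # Neighbours of node i, plus node i itself (self-loop).
        neighbours = [j for j in range(n) if adjacency[i][j] or j == i]
        agg = [
            sum(features[j][c] for j in neighbours) / len(neighbours)
            for c in range(n_in)
        ]
        out.append([
            sum(agg[c] * weights[c][k] for c in range(n_in))
            for k in range(n_out)
        ])
    return out


# Path graph 0 - 1 - 2 with one scalar feature per node.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [2.0], [3.0]]
weights = [[1.0]]  # identity mixing, so the output is the neighbourhood mean

hidden = graph_conv(adjacency, features, weights)
```

Unlike a convolution on an image grid, the neighbourhood here has no fixed size or ordering, which is the irregularity the abstract refers to.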
Terreau, Enzo. "Apprentissage de représentations d'auteurs et d'autrices à partir de modèles de langue pour l'analyse des dynamiques d'écriture". Electronic Thesis or Diss., Lyon 2, 2024. http://www.theses.fr/2024LYO20001.
The recent and massive democratization of digital tools has empowered individuals to generate and share information on the web through various means such as blogs, social networks, sharing platforms, and more. The exponential growth of available information, mostly textual data, requires the development of Natural Language Processing (NLP) models to mathematically represent it and subsequently classify, sort, or recommend it. This is the essence of representation learning. It aims to construct a low-dimensional space where the distances between projected objects (words, texts) reflect real-world distances, whether semantic, stylistic, and so on. The proliferation of available data, coupled with the rise in computing power and deep learning, has led to the creation of highly effective language models for word and document embeddings. These models incorporate complex semantic and linguistic concepts while remaining accessible to everyone and easily adaptable to specific tasks or corpora. One can use them to create author embeddings. However, it is challenging to determine the aspects on which a model will focus to bring authors closer or move them apart. In a literary context, it is preferable for similarities to primarily relate to writing style, which raises several issues. The definition of literary style is vague, and assessing the stylistic difference between two texts and their embeddings is complex. In computational linguistics, approaches aiming to characterize it are mainly statistical, relying on language markers. In light of this, our first contribution is a framework to evaluate the ability of language models to grasp writing style. We will have previously elaborated on text embedding models in machine learning and deep learning, at the word, document, and author levels. We will also have presented the treatment of the notion of literary style in Natural Language Processing, which forms the basis of our method.
Transferring knowledge between black-box large language models and these methods derived from linguistics remains a complex task. Our second contribution aims to reconcile these approaches through a representation learning model focusing on style, VADES (Variational Author and Document Embedding with Style). We compare our model to state-of-the-art ones and analyze their limitations in this context. Finally, we delve into dynamic author and document embeddings. Temporal information is crucial, allowing for a more fine-grained representation of writing dynamics. After presenting the state of the art, we elaborate on our last contribution, B²ADE (Brownian Bridge Author and Document Embedding), which models authors as trajectories. We conclude by outlining several leads for improving our methods and highlighting potential research directions for the future.
Katranji, Mehdi. "Apprentissage profond de la mobilité des personnes". Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCA024.
Knowledge of mobility is a major challenge for mobility-organising authorities and for urban planning. Due to the lack of a formal definition of human mobility, the term "people's mobility" will be used in this work. The topic is introduced by a description of the ecosystem, considering its actors and applications. The creation of a learning model has prerequisites: an understanding of the typologies of the available data sets, with their strengths and weaknesses. This state of the art in mobility knowledge is based on the four-step model that has existed and been used since 1970, and ends with the renewal of methodologies in recent years. Our models of people's mobility are then presented. Their common point is the emphasis on the individual, unlike traditional approaches that take the locality as a reference. The models we propose are based on the fact that individuals' decision-making is based on their perception of the environment. This work ends with a study of deep learning methods based on restricted Boltzmann machines. After a state of the art of this family of models, we look for strategies to make these models viable in real-world applications. This last chapter is our main theoretical contribution: it improves the robustness and performance of these models.
Deschaintre, Valentin. "Acquisition légère de matériaux par apprentissage profond". Thesis, Université Côte d'Azur (ComUE), 2019. http://theses.univ-cotedazur.fr/2019AZUR4078.
Whether it is used for entertainment or industrial design, computer graphics is ever more present in our everyday life. Yet, reproducing a real scene appearance in a virtual environment remains a challenging task, requiring long hours from trained artists. A good solution is the acquisition of geometries and materials directly from real world examples, but this often comes at the cost of complex hardware and calibration processes. In this thesis, we focus on lightweight material appearance capture to simplify and accelerate the acquisition process and solve industrial challenges such as result image resolution or calibration. Texture, highlights, and shading are some of many visual cues that allow humans to perceive material appearance in pictures. Designing algorithms able to leverage these cues to recover spatially-varying bi-directional reflectance distribution functions (SVBRDFs) from a few images has challenged computer graphics researchers for decades. We explore the use of deep learning to tackle lightweight appearance capture and make sense of these visual cues. Once trained, our networks are capable of recovering per-pixel normals, diffuse albedo, specular albedo and specular roughness from as few as one picture of a flat surface lit by the environment or a hand-held flash. We show how our method improves its prediction with the number of input pictures to reach high quality reconstructions with up to 10 images (a sweet spot between existing single-image and complex multi-image approaches) and allows the capture of large scale, HD materials. We achieve this goal by introducing several innovations on training data acquisition and network design, bringing clear improvement over the state of the art for lightweight material capture.
Paumard, Marie-Morgane. "Résolution automatique de puzzles par apprentissage profond". Thesis, CY Cergy Paris Université, 2020. http://www.theses.fr/2020CYUN1067.
The objective of this thesis is to develop semantic methods of reassembly in the complicated framework of heritage collections, where some blocks are eroded or missing. The reassembly of archaeological remains is an important task for heritage sciences: it improves the understanding and conservation of ancient vestiges and artifacts. However, some sets of fragments cannot be reassembled with techniques using contour information or visual continuities. It is then necessary to extract semantic information from the fragments and to interpret them. These tasks can be performed automatically thanks to deep learning techniques coupled with a solver, i.e., a constrained decision making algorithm. This thesis proposes two semantic reassembly methods for 2D fragments with erosion and a new dataset and evaluation metrics. The first method, Deepzzle, proposes a neural network followed by a solver. The neural network is composed of two Siamese convolutional networks trained to predict the relative position of two fragments: it is a 9-class classification. The solver uses Dijkstra's algorithm to maximize the joint probability. Deepzzle can address the case of missing and supernumerary fragments, is capable of processing about 15 fragments per puzzle, and has a performance that is 25% better than the state of the art. The second method, Alphazzle, is based on AlphaZero and single-player Monte Carlo Tree Search (MCTS). It is an iterative method that uses deep reinforcement learning: at each step, a fragment is placed on the current reassembly. Two neural networks guide MCTS: an action predictor, which uses the fragment and the current reassembly to propose a strategy, and an evaluator, which is trained to predict the quality of the future result from the current reassembly. Alphazzle takes into account the relationships between all fragments and adapts to puzzles larger than those solved by Deepzzle.
Moreover, Alphazzle is compatible with constraints imposed by a heritage framework: at the end of reassembly, MCTS does not access the reward, unlike AlphaZero. Indeed, the reward, which indicates if a puzzle is well solved or not, can only be estimated by the algorithm, because only a conservator can be sure of the quality of a reassembly.
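The abstract above mentions a solver that uses Dijkstra's algorithm to maximize a joint probability. A standard way to do this is to turn each probability p into a cost of -log(p), so that the shortest path is the one with the largest product of probabilities. The sketch below shows only that generic transformation on a hypothetical toy graph; how Deepzzle actually builds its search graph is not stated in the abstract.

```python
import heapq
import math


def most_probable_path(graph, source, target):
    """graph[u] = {v: probability in (0, 1]}; returns (probability, path)."""
    best = {source: 0.0}  # accumulated -log probability (non-negative)
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, prob in graph.get(node, {}).items():
            if prob <= 0.0:
                continue  # impossible transition
            new_cost = cost - math.log(prob)
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return math.exp(-best[target]), path


# Hypothetical toy graph: a-b-d has probability 0.45, a-c-d has 0.495.
toy_graph = {"a": {"b": 0.9, "c": 0.5}, "b": {"d": 0.5}, "c": {"d": 0.99}}
probability, path = most_probable_path(toy_graph, "a", "d")
```

The greedy first step (a to b, probability 0.9) does not lead to the best overall path, which is why a global search over accumulated costs is needed.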
Haykal, Vanessa. "Modélisation des séries temporelles par apprentissage profond". Thesis, Tours, 2019. http://www.theses.fr/2019TOUR4019.
Time series prediction is a problem that has been addressed for many years. In this thesis, we have been interested in methods resulting from deep learning. It is well known that if the relationships between the data are temporal, it is difficult to analyze and predict accurately due to non-linear trends and the existence of noise, specifically in financial and electrical series. In this context, we propose a new hybrid noise reduction architecture that models the recursive error series to improve predictions. The learning process fuses simultaneously a convolutional neural network (CNN) and a recurrent long short-term memory network (LSTM). This model is distinguished by its ability to capture globally a variety of hybrid properties, where it is able to extract local signal features, to learn long-term and non-linear dependencies, and to have a high noise resistance. The second contribution concerns the limitations of the global approaches because of the dynamic switching regimes in the signal. We present a local unsupervised modification of our previous architecture in order to adjust the results by adapting the Hidden Markov Model (HMM). Finally, we were also interested in multi-resolution techniques to improve the performance of the convolutional layers, notably by using the variational mode decomposition method (VMD).
Sors, Arnaud. "Apprentissage profond pour l'analyse de l'EEG continu". Thesis, Université Grenoble Alpes (ComUE), 2018. http://www.theses.fr/2018GREAS006/document.
The objective of this research is to explore and develop machine learning methods for the analysis of continuous electroencephalogram (EEG). Continuous EEG is an interesting modality for functional evaluation of cerebral state in the intensive care unit and beyond. Today its clinical use remains more limited than it could be because interpretation is still mostly performed visually by trained experts. In this work we develop automated analysis tools based on deep neural models. The subparts of this work hinge around post-anoxic coma prognostication, chosen as pilot application. A small number of long-duration records were performed and available existing data was gathered from CHU Grenoble. Different components of a semi-supervised architecture that addresses the application are imagined, developed, and validated on surrogate tasks. First, we validate the effectiveness of deep neural networks for EEG analysis from raw samples. For this we choose the supervised task of sleep stage classification from single-channel EEG. We use a convolutional neural network adapted for EEG and we train and evaluate the system on the SHHS (Sleep Heart Health Study) dataset. This constitutes the first neural sleep scoring system at this scale (5000 patients). Classification performance reaches or surpasses the state of the art. In real use for most clinical applications, the main challenge is the lack of (and difficulty of establishing) suitable annotations on patterns or short EEG segments. Available annotations are high-level (for example, clinical outcome) and therefore they are few. We search how to learn compact EEG representations in an unsupervised/semi-supervised manner. The field of unsupervised learning using deep neural networks is still young. To compare to existing work we start with image data and investigate the use of generative adversarial networks (GANs) for unsupervised adversarial representation learning.
The quality and stability of different variants are evaluated. We then apply Gradient-penalized Wasserstein GANs on EEG sequence generation. The system is trained on single channel sequences from post-anoxic coma patients and is able to generate realistic synthetic sequences. We also explore and discuss original ideas for learning representations through matching distributions in the output space of representative networks. Finally, multichannel EEG signals have specificities that should be accounted for in characterization architectures. Each EEG sample is an instantaneous mixture of the activities of a number of sources. Based on this statement we propose an analysis system made of a spatial analysis subsystem followed by a temporal analysis subsystem. The spatial analysis subsystem is an extension of source separation methods built with a neural architecture with adaptive recombination weights, i.e. weights that are not learned but depend on features of the input. We show that this architecture learns to perform Independent Component Analysis if it is trained on a measure of non-gaussianity. For temporal analysis, standard (shared) convolutional neural networks applied on separate recomposed channels can be used.
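The abstract above refers to training on "a measure of non-gaussianity". One classical such measure is excess kurtosis, which is zero for a Gaussian distribution and non-zero for many other distributions. The sketch below computes it on synthetic samples; it is an assumption that kurtosis is a suitable example here, since the abstract does not name the measure used in the thesis.

```python
import random


def excess_kurtosis(samples):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    variance = sum(c * c for c in centred) / n
    fourth = sum(c ** 4 for c in centred) / n
    return fourth / variance ** 2 - 3.0


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
uniform = [rng.uniform(-1.0, 1.0) for _ in range(20000)]

gaussian_score = excess_kurtosis(gaussian)  # close to 0
uniform_score = excess_kurtosis(uniform)    # close to -1.2 (theoretical value)
```

A mixture of independent sources tends to look more Gaussian than the sources themselves, so pushing the magnitude of such a score up on the unmixed outputs is one route to separating the sources.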
Sheikh, Shakeel Ahmad. "Apprentissage profond pour la détection du bégaiement". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0005.
Stuttering is the speech disorder most frequently observed among speech impairments, and it manifests in the form of core behaviours. The tedious and time-consuming task of detecting and analyzing the speech patterns of people who stutter (PWS), with the goal of rectifying them, is often handled manually by speech therapists and is biased towards their subjective beliefs. Moreover, automatic speech recognition (ASR) systems also fail to recognize stuttered speech, which makes it impractical for PWS to access virtual digital assistants such as Siri, Alexa, etc. This thesis tries to develop audio-based stuttering detection (SD) systems that successfully capture different variabilities from stuttering utterances, such as speaking styles, age, accents, etc., and learn robust stuttering representations with the aim of providing a fair, consistent, and unbiased assessment of stuttered speech. While most of the existing SD systems use multiple binary classifiers for each stutter type, we present a unified multi-class StutterNet capable of detecting multiple stutter types. To address the class-imbalance problem in the stuttering domain, we investigated the impact of applying a weighted loss function, and also presented the Multi-contextual (MC) Multi-branch (MB) StutterNet to improve the detection performance of minority classes. Exploiting the speaker information, with the assumption that stuttering models should be invariant to meta-data such as speaker information, we present an adversarial multi-task learning (MTL) SD method that learns robust, stutter-discriminative, speaker-invariant representations. Because the paucity of labeled data limits the use of large deep models for capturing different variabilities in the automated SD task, we introduced the first-ever self-supervised learning (SSL) framework to the SD domain.
The SSL framework first trains a feature extractor for a pre-text task using a large quantity of unlabeled non-stuttering audio data to capture these different variabilities, and then applies the learned feature extractor to a downstream SD task using limited labeled stuttering audio data.
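The weighted loss mentioned in the abstract above can be illustrated with a class-weighted cross-entropy, where rare classes receive larger weights. The inverse-frequency weighting and the class names below are assumptions for illustration; the abstract does not specify the weighting scheme used in the thesis.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (number of classes * class count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def weighted_cross_entropy(probabilities, labels, weights):
    """probabilities[i][cls] is the predicted probability of class cls for sample i."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probabilities, labels))
    return total / sum(weights[y] for y in labels)


# Hypothetical, imbalanced label set: three fluent samples, one block.
labels = ["fluent", "fluent", "fluent", "block"]
weights = inverse_frequency_weights(labels)

# An uninformative classifier that always predicts 50/50.
predictions = [{"fluent": 0.5, "block": 0.5} for _ in labels]
loss = weighted_cross_entropy(predictions, labels, weights)
```

With these weights, a mistake on the single minority-class sample contributes as much to the loss as mistakes on all three majority-class samples together.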
Assis, Youssef. "Détection des anévrismes intracrâniens par apprentissage profond". Electronic Thesis or Diss., Université de Lorraine, 2024. http://www.theses.fr/2024LORR0012.
Intracranial aneurysms are local dilatations of cerebral blood vessels, presenting a significant risk of rupture, which can lead to serious consequences. Early detection of unruptured aneurysms is therefore crucial to prevent potentially fatal complications. However, analyzing medical images to locate these aneurysms is a complex and time-consuming task that requires expertise and yet remains prone to errors in interpretation. Faced with these challenges, this thesis explores automated methods for the detection of aneurysms, aiming to facilitate the work of radiologists and improve diagnostic efficiency. Our approach focuses on the use of artificial intelligence techniques, particularly deep neural networks, for the detection of aneurysms from time-of-flight magnetic resonance angiography (TOF-MRA) images. Our research work is centered around several main axes. Firstly, due to the scarcity of training data in the medical field, we adopt a rapid, although approximate, annotation method to facilitate data collection. Furthermore, we propose a strategy based on small patches. In association with data synthesis, the samples are multiplied in the training database. By selecting the samples, their distribution is adjusted to facilitate optimization. Secondly, for the automated detection of aneurysms, we investigate various neural network architectures. An initial approach explores image segmentation networks. Then, we propose an innovative architecture inspired by object detection methods. These architectures, especially the latter, lead to competitive results, particularly in terms of sensitivity compared to experts. Thirdly, beyond the detection of aneurysms, we extend our model to estimate the pose of aneurysms in 3D images. This can greatly facilitate their analysis and interpretation in reformatted cross-sectional planes.
A thorough evaluation of the proposed models is systematically carried out, including ablation studies, the use of metrics adapted to the problem of detection, and evaluations conducted by clinical experts, allowing us to assess their potential effectiveness for clinical use. In particular, we highlight the issues related to uncertainty in the annotation of existing databases.
Moradi, Fard Maziar. "Apprentissage de représentations de données dans un apprentissage non-supervisé". Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM053.
Due to the great impact of deep learning on a variety of fields of machine learning, its ability to improve clustering approaches has recently been investigated. At first, deep learning approaches (mostly Autoencoders) have been used to reduce the dimensionality of the original space and to remove possible noise (and also to learn new data representations). Clustering approaches that utilize deep learning in this way are called Deep Clustering. This thesis focuses on developing Deep Clustering models which can be used for different types of data (e.g., images, text). First we propose a Deep k-means (DKM) algorithm where learning data representations (through a deep Autoencoder) and cluster representatives (through the k-means) are performed in a joint way. The results of our DKM approach indicate that this framework is able to outperform similar algorithms in Deep Clustering. Indeed, our proposed framework is able to truly and smoothly backpropagate the loss function error through all learnable variables. Moreover, we propose two frameworks named SD2C and PCD2C which are able to integrate respectively seed words and pairwise constraints into end-to-end Deep Clustering frameworks. In fact, by utilizing such frameworks, the users can observe the reflection of their needs in clustering. Finally, the results obtained from these frameworks indicate their ability to obtain more tailored results.
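One way to picture a jointly trained autoencoder and k-means objective is a loss with two terms: a reconstruction error and a clustering term that stays smooth by using softmax-weighted distances to the centroids instead of a hard nearest-centroid assignment. The sketch below is a plain-Python illustration under that assumption; the parameters `alpha` (softmax sharpness) and `lam` (trade-off) are hypothetical names, and the abstract does not give the exact DKM formulation.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_kmeans_term(embedding, centroids, alpha=10.0):
    """Softmax-weighted distance to the centroids.

    As alpha grows, this approaches the distance to the nearest centroid.
    """
    distances = [squared_distance(embedding, c) for c in centroids]
    shift = min(distances)  # subtract the minimum for numerical stability
    scores = [math.exp(-alpha * (d - shift)) for d in distances]
    normaliser = sum(scores)
    return sum(s / normaliser * d for s, d in zip(scores, distances))


def joint_loss(inputs, reconstructions, embeddings, centroids, lam=1.0, alpha=10.0):
    """Reconstruction error plus lam times the soft clustering term."""
    recon = sum(squared_distance(x, r) for x, r in zip(inputs, reconstructions))
    cluster = sum(soft_kmeans_term(h, centroids, alpha) for h in embeddings)
    return recon + lam * cluster


centroids = [[0.0, 0.0], [3.0, 4.0]]
sharp = soft_kmeans_term([0.0, 0.0], centroids, alpha=10.0)  # nearly 0
flat = soft_kmeans_term([0.0, 0.0], centroids, alpha=0.0)    # mean distance
```

Because every operation in the clustering term is differentiable, gradients can reach both the embeddings and the centroids, which is the property the abstract highlights.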
Ostertag, Cécilia. "Analyse des pathologies neuro-dégénératives par apprentissage profond". Thesis, La Rochelle, 2022. http://www.theses.fr/2022LAROS003.
Monitoring and predicting the cognitive state of a subject affected by a neuro-degenerative disorder is crucial to provide appropriate treatment as soon as possible. Thus, these patients are followed for several years, as part of longitudinal medical studies. During each visit, a large quantity of data is acquired: risk factors linked to the pathology, medical imaging (MRI or PET scans for example), cognitive test results, sampling of molecules that have been identified as bio-markers, etc. These various modalities give information about the disease's progression; some of them are complementary and others can be redundant. Several deep learning models have been applied to bio-medical data, notably for organ segmentation or pathology diagnosis. This PhD is focused on the design of a deep neural network model for cognitive decline prediction, using multimodal data, here both structural brain MRI images and clinical data. In this thesis we propose an architecture made of sub-modules tailored to each modality: a 3D convolutional network for the brain MRI, and fully connected layers for the quantitative and qualitative clinical data. To predict the patient's evolution, this model takes as input data from two medical visits for each patient. These visits are compared using a siamese architecture. After training and validating this model with Alzheimer's disease as our use case, we look into knowledge transfer to other neuro-degenerative pathologies, and we use transfer learning to adapt our model to Parkinson's disease. Finally, we discuss the choices we made to take into account the temporal aspect of our problem, both during the ground truth creation using the long-term evolution of a cognitive score, and for the choice of using pairs of visits as input instead of longer sequences.
Mazari, Ahmed. "Apprentissage profond pour la reconnaissance d’actions en vidéos". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS171.
Nowadays, video content is ubiquitous through the popular use of the internet and smartphones, as well as social media. Many daily life applications such as video surveillance and video captioning, as well as scene understanding, require sophisticated technologies to process video data. It becomes of crucial importance to develop automatic means to analyze and to interpret the large amount of available video data. In this thesis, we are interested in video action recognition, i.e. the problem of assigning action categories to sequences of videos. This can be seen as a key ingredient to build the next generation of vision systems. It is tackled with AI frameworks, mainly with ML and Deep ConvNets. Current ConvNets are increasingly deep and data-hungry, which makes their success dependent on the abundance of labeled training data. ConvNets also rely on (max or average) pooling, which reduces the dimensionality of output layers (and hence attenuates their sensitivity to the availability of labeled data); however, this process may dilute the information of upstream convolutional layers and thereby affect the discrimination power of the trained video representations, especially when the learned action categories are fine-grained.
Cohen-Hadria, Alice. "Estimation de descriptions musicales et sonores par apprentissage profond". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS607.
In Music Information Retrieval (MIR) and voice processing, the use of machine learning tools has become more and more standard in the last few years. In particular, many state-of-the-art systems now rely on the use of Neural Networks. In this thesis, we propose a wide overview of four different MIR and voice processing tasks, using systems built with neural networks. More precisely, we will use convolutional neural networks, a class of neural networks designed for images. The first task presented is music structure estimation. For this task, we will show how the choice of input representation can be critical when using convolutional neural networks. The second task is singing voice detection. We will present how to use a voice detection system to automatically align lyrics and audio tracks. With this alignment mechanism, we have created the largest synchronized audio and speech data set, called DALI. Singing voice separation is the third task. For this task, we will present a data augmentation strategy, a way to significantly increase the size of a training set. Finally, we tackle voice anonymization. We will present an anonymization method that both obfuscates content and masks the speaker identity, while preserving the acoustic scene.
Trabelsi, Anis. "Robustesse aux attaques en authentification digitale par apprentissage profond". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS580.
The identity of people on the Internet is becoming a major security issue. Since the Basel agreements, banking institutions have integrated the verification of people's identity, or Know Your Customer (KYC), in their registration process. With the dematerialization of banks, this procedure has become e-KYC or remote KYC, which works remotely through the user's smartphone. Similarly, remote identity verification has become the standard for enrollment in electronic signature tools. New regulations are emerging to secure this approach; for example, in France, the PVID framework regulates the remote acquisition of identity documents and people's faces under the eIDAS regulation. This is required because a new type of digital crime is emerging: deep identity theft. With new deep learning tools, imposters can change their appearance to look like someone else in real time. Imposters can then perform all the common actions required in a remote registration without being detected by identity verification algorithms. Today, smartphone applications and tools for a more limited audience exist, allowing imposters to easily transform their appearance in real time. There are even methods to spoof an identity based on a single image of the victim's face. The objective of this thesis is to study the vulnerabilities of remote identity authentication systems against new attacks in order to propose solutions based on deep learning to make the systems more robust.
Bertrand, Hadrien. "Optimisation d'hyper-paramètres en apprentissage profond et apprentissage par transfert : applications en imagerie médicale". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT001/document.
In the last few years, deep learning has irrevocably changed the field of computer vision. Faster, giving better results, and requiring a lower degree of expertise to use than traditional computer vision methods, deep learning has become ubiquitous in every imaging application. This includes medical imaging applications. At the beginning of this thesis, there was still a strong lack of tools and understanding of how to build efficient neural networks for specific tasks. Thus this thesis first focused on the topic of hyper-parameter optimization for deep neural networks, i.e. methods for automatically finding efficient neural networks on specific tasks. The thesis includes a comparison of different methods, a performance improvement of one of these methods, Bayesian optimization, and the proposal of a new method of hyper-parameter optimization combining two existing methods: Bayesian optimization and Hyperband. From there, we used these methods for medical imaging applications such as the classification of field-of-view in MRI, and the segmentation of the kidney in 3D ultrasound images across two populations of patients. This last task required the development of a new transfer learning method based on the modification of the source network by adding new geometric and intensity transformation layers. Finally, this thesis loops back to older computer vision methods, and we propose a new segmentation algorithm combining template deformation and deep learning. We show how to use a neural network to predict global and local transformations without requiring the ground truth of these transformations. The method is validated on the task of kidney segmentation in 3D ultrasound images.
Bertrand, Hadrien. "Optimisation d'hyper-paramètres en apprentissage profond et apprentissage par transfert : applications en imagerie médicale". Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT001.
Goh, Hanlin. "Apprentissage de Représentations Visuelles Profondes". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2013. http://tel.archives-ouvertes.fr/tel-00948376.
Moukari, Michel. "Estimation de profondeur à partir d'images monoculaires par apprentissage profond". Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC211/document.
Computer vision is a branch of artificial intelligence whose purpose is to enable a machine to analyze, process and understand the content of digital images. Scene understanding in particular is a major issue in computer vision. It relies on a semantic and structural characterization of the image, on the one hand to describe its content and, on the other hand, to understand its geometry. However, while real space is three-dimensional, the image representing it is two-dimensional. Part of the 3D information is thus lost during the process of image formation, and it is therefore nontrivial to describe the geometry of a scene from 2D images of it. There are several ways to retrieve the depth information lost in the image. In this thesis we are interested in estimating a depth map given a single image of the scene. In this case, the depth information corresponds, for each pixel, to the distance between the camera and the object represented in this pixel. The automatic estimation of a distance map of the scene from an image is indeed a critical algorithmic building block in a very large number of domains, in particular that of autonomous vehicles (obstacle detection, navigation aids). Although the problem of estimating depth from a single image is difficult and inherently ill-posed, we know that humans can judge distances with one eye. This capacity is not innate but acquired, and it is made possible mostly by the identification of cues reflecting prior knowledge of the surrounding objects. Moreover, we know that learning algorithms can extract these cues directly from images. We are particularly interested in statistical learning methods based on deep neural networks, which have recently led to major breakthroughs in many fields, and we study the case of monocular depth estimation.
Vielzeuf, Valentin. "Apprentissage neuronal profond pour l'analyse de contenus multimodaux et temporels". Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC229/document.
Our perception is by nature multimodal, i.e. it appeals to many of our senses. To solve certain tasks, it is therefore relevant to use different modalities, such as sound or image. This thesis focuses on this notion in the context of deep learning. To this end, it seeks to answer a particular question: how can the different modalities be merged within a deep neural network? We first study a concrete application problem: the automatic recognition of emotion in audio-visual content. This leads us to different considerations concerning the modeling of emotions and more particularly of facial expressions. We thus propose an analysis of the representations of facial expression learned by a deep neural network. In addition, we observe that each multimodal problem appears to require a different fusion strategy. This is why we propose and validate two methods to automatically obtain an efficient fusion neural architecture for a given multimodal problem. The first is based on a central fusion network and aims to preserve an easy interpretation of the adopted fusion strategy, while the second adapts a neural architecture search method to the case of multimodal fusion, exploring a greater number of strategies and therefore achieving better performance. Finally, we are interested in a multimodal view of knowledge transfer. We detail a non-traditional method to transfer knowledge from several sources, i.e. from several pre-trained models. To that end, a more general neural representation is obtained from a single model, which brings together the knowledge contained in the pre-trained models and leads to state-of-the-art performance on a variety of facial analysis tasks.
Kaabi, Rabeb. "Apprentissage profond et traitement d'images pour la détection de fumée". Electronic Thesis or Diss., Toulon, 2020. http://www.theses.fr/2020TOUL0017.
This thesis deals with the problem of forest fire detection using image processing and machine learning tools. A forest fire is a fire that spreads over a wooded area. It can be of natural origin (due to lightning or a volcanic eruption) or human origin. Around the world, the impact of forest fires on many aspects of our daily lives and on the entire ecosystem is becoming more and more apparent. Many methods have been shown to be effective in detecting forest fires. The originality of the present work lies in the early detection of fires through the detection of forest smoke and the classification of smoky and non-smoky regions using deep learning and image processing tools. A set of pre-processing techniques helped us build a large database, which then allowed us to test the robustness of the proposed model, based on a deep belief network, and to evaluate its performance by calculating the following metrics: IoU, accuracy, recall and F1 score. Finally, the proposed algorithm is tested on several images in order to validate its efficiency. The results of our algorithm have been compared with those of state-of-the-art methods (Deep CNN, SVM...) and proved very good. The proposed methods gave an average classification accuracy of about 96.5% for the early detection of smoke.
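The four metrics named in the abstract above can be illustrated with a minimal sketch. It assumes binary smoke masks flattened to lists of 0/1 values; the function name `segmentation_metrics` and the toy masks are hypothetical and are not taken from the thesis, whose exact evaluation protocol is not described here.

```python
def segmentation_metrics(pred, truth):
    """Compute IoU, accuracy, recall and F1 for two flat binary masks (1 = smoke)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    # Count the four confusion-matrix cells.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero on degenerate masks.
        return num / den if den else 0.0

    return {
        "iou": ratio(tp, tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy example (hypothetical data): 8 pixels, 2 true positives,
# 1 false positive, 1 false negative, 4 true negatives.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
truth = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)
print(metrics)
```

IoU ignores true negatives, so on images where smoke covers only a small region it is usually a stricter measure than accuracy.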
Antipov, Grigory. "Apprentissage profond pour la description sémantique des traits visuels humains". Thesis, Paris, ENST, 2017. http://www.theses.fr/2017ENST0071/document.
The recent progress in artificial neural networks (rebranded as deep learning) has significantly boosted the state of the art in numerous domains of computer vision. In this PhD study, we explore how deep learning techniques can help in the analysis of gender and age from a human face. In particular, two complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes. Firstly, we conduct a comprehensive study which results in an empirical formulation of a set of principles for optimal design and training of gender recognition and age estimation Convolutional Neural Networks (CNNs). As a result, we obtain state-of-the-art CNNs for gender/age prediction according to the three most popular benchmarks, and win an international competition on apparent age estimation. On a very challenging internal dataset, our best models reach 98.7% gender classification accuracy and an average age estimation error of 4.26 years. In order to address the problem of synthesis and editing of human faces, we design and train GA-cGAN, the first Generative Adversarial Network (GAN) which can generate synthetic faces of high visual fidelity within required gender and age categories. Moreover, we propose a novel method which allows employing GA-cGAN for gender swapping and aging/rejuvenation without losing the original identity in synthetic faces. Finally, in order to show the practical interest of the designed face editing method, we apply it to improve the accuracy of off-the-shelf face verification software in a cross-age evaluation scenario.
Doan, Tien Tai. "Réalisation d’une aide au diagnostic en orthodontie par apprentissage profond". Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG033.
Accurate processing and diagnosis of dental images is an essential factor determining the success of orthodontic treatment. Many image processing methods have been proposed to address this problem. Those studies mainly work on small datasets of radiographs under laboratory conditions and are not highly applicable as complete products or services. In this thesis, we train deep learning models to diagnose dental problems such as gingivitis and crowded teeth using images taken with mobile phones. We study the feature layers of these models to find the strengths and limitations of each method. Besides training deep learning models, we also embed each of them in a pipeline, including preprocessing and post-processing steps, to create a complete product. To address the lack of training data, we studied a variety of methods for data augmentation, especially domain adaptation methods using image-to-image translation models, both supervised and unsupervised, and obtained promising results. Image translation networks are also used to simplify patients' choice of orthodontic appliances by showing them what their teeth could look like during treatment. The generated images are realistic and high-resolution. Researching further into unsupervised image translation neural networks, we propose an unsupervised image-to-image translation model which can manipulate features of objects in the image without requiring additional annotation. Our model outperforms state-of-the-art techniques on multiple image translation applications and is also extended to few-shot learning problems.
Antipov, Grigory. "Apprentissage profond pour la description sémantique des traits visuels humains". Electronic Thesis or Diss., Paris, ENST, 2017. http://www.theses.fr/2017ENST0071.
Israilov, Sardor. "De l'identification basée apprentissage profond à la commande basée modèle". Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4003.
Fish swimming remains a complex subject that is not yet fully understood due to the intersection of biology and fluid dynamics. Through years of evolution, organisms in nature have perfected their biological mechanisms to navigate efficiently in their environment and adapt to particular situations. Throughout history, mankind has been inspired by nature to innovate and develop nature-like systems. Biomimetic robotic fish, in particular, have a number of applications in the real world, and their control is yet to be optimized. Deep Reinforcement Learning has shown excellent results in the control of robotic systems whose dynamics are too complex to be fully modeled and analyzed. In this thesis, we explored new avenues for the control of a biomimetic fish via reinforcement learning to effectively maximize thrust and speed. However, to fully comprehend the newly emerged data-based algorithms, we first studied the application of these methods on a standard benchmark of control theory, the inverted pendulum on a cart. We demonstrated that deep Reinforcement Learning could control the system without any prior knowledge of it, achieving performance comparable to traditional model-based control theory methods. In the third chapter, we focus on the undulatory swimming of a robotic fish, exploring various objectives and information sources for control. Our studies indicate that the thrust force of a robotic fish can be optimized using inputs from both force sensors and cameras as feedback for control. Our findings demonstrate that a square wave control with a particular frequency maximizes the thrust, and we rationalize this using the Pontryagin Maximum Principle. An appropriate model is established that shows excellent agreement between simulation and experimental results. Subsequently, we concentrate on the speed maximization of a robotic fish, both in several virtual environments and in experiments using visual data.
Once again, we find that deep Reinforcement Learning can find an excellent swimming gait with a square wave control that maximizes the swimming speed
Ganaye, Pierre-Antoine. "A priori et apprentissage profond pour la segmentation en imagerie cérébrale". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI100.
Medical imaging is a vast field guided by advances in instrumentation, acquisition techniques and image processing. Advances in these major disciplines all contribute to a better understanding of both physiological and pathological phenomena. In parallel, access to broader imaging databases, combined with the development of computing power, has fostered the development of machine learning methodologies for automatic image processing, including approaches based on deep neural networks. Among the applications where deep neural networks provide solutions, we find image segmentation, which consists in locating and delimiting in an image regions with specific properties that will be associated with the same structure. Despite many recent studies in deep learning based segmentation, learning the parameters of a neural network is still guided by quantitative performance measures that do not include high-level knowledge of anatomy. The objective of this thesis is to develop methods to integrate a priori knowledge into deep neural networks, targeting the segmentation of brain structures in MRI. Our first contribution proposes a strategy for integrating the spatial position of the patch to be classified, to improve the discriminating power of the segmentation model. This first work considerably corrects segmentation errors that are far away from the anatomical reality, also improving the overall quality of the results. Our second contribution focuses on a methodology to constrain adjacency relationships between anatomical structures, directly while learning network parameters, in order to reinforce the realism of the produced segmentations. Our experiments conclude that the proposed constraint corrects non-admitted adjacencies, thus improving the anatomical consistency of the segmentations produced by the neural network.
Routhier, Etienne. "Conception de séquences génomiques artificielles chez la levure par apprentissage profond". Thesis, Sorbonne université, 2021. http://www.theses.fr/2021SORUS465.
Recent technological advances in the field of biotechnologies, such as CRISPR and de novo DNA oligonucleotide synthesis, now make it possible to modify genomes precisely and extensively. Projects aiming to design partially or completely synthetic genomes, in particular yeast genomes, have been developed by taking advantage of these technologies. However, to achieve this goal it is necessary to control the activity of artificial sequences, which remains a challenge today. Fortunately, the recent emergence of deep learning methodologies able to recognize the genomic function associated with a DNA sequence seems to provide a powerful tool for anticipating the activity of synthetic genomes and facilitating their design. In this perspective, we propose to use deep learning methodologies in order to design synthetic yeast sequences controlling the local structure of the genome. In particular, I will present the methodology we have developed in order to design synthetic sequences that precisely position nucleosomes (a molecule determining the structure of DNA at the lowest scale) in yeast. I will also show that this methodology opens up the prospect of designing sequences controlling the immediately higher level of structure: loops. The design of sequences controlling the local structure makes it possible to precisely identify the determinants of this structure.
Etienne, Caroline. "Apprentissage profond appliqué à la reconnaissance des émotions dans la voix". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS517.
This thesis deals with the application of artificial intelligence to the automatic classification of audio sequences according to the emotional state of the customer during a commercial phone call. The goal is to improve on existing data preprocessing and machine learning models, and to suggest a model that is as efficient as possible on the reference IEMOCAP audio dataset. We draw from previous work on deep neural networks for automatic speech recognition, and extend it to the speech emotion recognition task. We are therefore interested in End-to-End neural architectures to perform the classification task including an autonomous extraction of acoustic features from the audio signal. Traditionally, the audio signal is preprocessed using paralinguistic features, as part of an expert approach. We choose a naive approach for data preprocessing that does not rely on specialized paralinguistic knowledge, and compare it with the expert approach. In this approach, the raw audio signal is transformed into a time-frequency spectrogram by using a short-term Fourier transform. In order to apply a neural network to a prediction task, a number of aspects need to be considered. On the one hand, the best possible hyperparameters must be identified. On the other hand, biases present in the database should be minimized (non-discrimination), for example by adding data and taking into account the characteristics of the chosen dataset. We study these aspects in order to develop an End-to-End neural architecture that combines convolutional layers specialized in the modeling of visual information with recurrent layers specialized in the modeling of temporal information. We propose a deep supervised learning model, competitive with the current state-of-the-art when trained on the IEMOCAP dataset, justifying its use for the rest of the experiments. 
This classification model consists of a four-layer convolutional neural network and a bidirectional long short-term memory recurrent neural network (BLSTM). Our model is evaluated on two English audio databases proposed by the scientific community: IEMOCAP and MSP-IMPROV. A first contribution is to show that, with a deep neural network, we obtain high performance on IEMOCAP, and that the results are promising on MSP-IMPROV. Another contribution of this thesis is a comparative study of the output values of the layers of the convolutional module and the recurrent module according to the data preprocessing method used: spectrograms (naive approach) or paralinguistic indices (expert approach). We analyze the data according to their emotion class using the Euclidean distance, a deterministic proximity measure. We try to understand the characteristics of the emotional information extracted autonomously by the network. The idea is to contribute to research focused on the understanding of deep neural networks used in speech emotion recognition and to bring more transparency and explainability to these systems, whose decision-making mechanism is still poorly understood.
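The spectrogram preprocessing mentioned in the abstract above (a short-term Fourier transform of the raw audio signal) can be sketched as follows. This is only an illustration under stated assumptions: the frame length, hop size, Hann window, naive DFT, and the name `stft_magnitude` are choices made here for readability and are not the parameters or code used in the thesis.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum per Hann-windowed frame of the signal."""
    # Periodic Hann window (assumed here; other windows are equally valid).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive DFT for clarity; a real pipeline would use an FFT.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical test signal: a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone_hz = 1000
signal = [math.sin(2 * math.pi * tone_hz * n / sample_rate) for n in range(256)]

spec = stft_magnitude(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64  # bin index converted to hertz
print(len(spec), "frames,", len(spec[0]), "bins, peak at", peak_hz, "Hz")
```

Each row of `spec` is one time step and each column one frequency bin, which is the image-like layout that convolutional layers can then consume.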
Carbajal, Guillaume. "Apprentissage profond bout-en-bout pour le rehaussement de la parole". Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0017.
This PhD falls within the development of hands-free telecommunication systems, more specifically smart speakers in domestic environments. The user interacts with another speaker at a far-end point and is typically a few meters away from this kind of system. The microphones are likely to capture sounds of the environment which are added to the user's voice, such as background noise, acoustic echo and reverberation. These types of distortion degrade speech quality, intelligibility and listening comfort for the far-end speaker, and must be reduced. Filtering methods can reduce each of these types of distortion individually. Reducing all of them implies combining the corresponding filtering methods. As these methods interact with each other, which can deteriorate the user's speech, they must be jointly optimized. First of all, we introduce an acoustic echo reduction approach which combines an echo cancellation filter with a residual echo postfilter designed to adapt to the echo cancellation filter. To do so, we propose to estimate the postfilter coefficients using the short-term spectra of multiple known signals, including the output of the echo cancellation filter, as inputs to a neural network. We show that this approach improves the performance and the robustness of the postfilter in terms of echo reduction, while limiting speech degradation, in several scenarios under real conditions. Secondly, we describe a joint approach for multichannel reduction of echo, reverberation and noise. We propose to simultaneously model the target speech and undesired residual signals after echo cancellation and dereverberation in a probabilistic framework, and to jointly represent their short-term spectra by means of a recurrent neural network. We develop a block-coordinate ascent algorithm to update the echo cancellation and dereverberation filters, as well as the postfilter that reduces the undesired residual signals. 
We evaluate our approach on real recordings in different conditions. We show that it improves speech quality and reduction of echo, reverberation and noise compared to a cascade of individual filtering methods and another joint reduction approach. Finally, we present an online version of our approach which is suitable for time-varying acoustic conditions. We evaluate the perceptual quality achieved on real examples where the user moves during the conversation
Bouindour, Samir. "Apprentissage profond appliqué à la détection d'événements anormaux dans les flux vidéos". Electronic Thesis or Diss., Troyes, 2019. http://www.theses.fr/2019TROY0036.
The use of surveillance cameras has increased considerably in recent years. This proliferation poses a major societal problem, which is the exploitation of the generated video streams. Currently, most of these data are being analyzed by human operators. However, several studies question the relevance of this approach. It is time-consuming and laborious for an operator to monitor surveillance videos for long time periods. Given recent advances in computer vision, particularly through deep learning, one solution to this problem consists in the development of intelligent systems that can support the human operator in the exploitation of this data. These intelligent systems will aim to model the normal behaviours of a monitored scene and detect any deviant event that could lead to a security breach. Within the context of this thesis entitled "Deep learning applied to the detection of abnormal events in video streams", we propose to develop algorithms based on deep learning for the detection and localization of abnormal video events that may reflect dangerous situations. The purpose is to extract robust spatial and temporal descriptors and define classification algorithms adapted to detect suspicious behaviour with the minimum possible number of false alarms, while ensuring a high detection rate.
Dahmani, Sara. "Synthèse audiovisuelle de la parole expressive : modélisation des émotions par apprentissage profond". Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0137.
The work of this thesis concerns the modeling of emotions for expressive audiovisual text-to-speech synthesis. Today, the results of text-to-speech synthesis systems are of good quality; however, audiovisual synthesis remains an open issue and expressive synthesis is even less studied. As part of this thesis, we present an emotion modeling method which is malleable and flexible, and allows us to mix emotions as we mix shades on a palette of colors. In the first part, we present and study two expressive corpora that we have built. The recording strategy and the expressive content of these corpora are analyzed to validate their use for the purpose of audiovisual speech synthesis. In the second part, we present two neural architectures for speech synthesis. We used these two architectures to model three aspects of speech: 1) the duration of sounds, 2) the acoustic modality and 3) the visual modality. First, we use a fully connected architecture. This architecture allowed us to study the behavior of neural networks when dealing with different contextual and linguistic descriptors. We were also able to analyze, with objective measures, the network's ability to model emotions. The second neural architecture proposed is a variational auto-encoder. This architecture is able to learn a latent representation of emotions without using emotion labels. After analyzing the latent space of emotions, we presented a procedure for structuring it in order to move from a discrete representation of emotions to a continuous one. We were able to validate, through perceptual experiments, the ability of our system to generate emotions, nuances of emotions and mixtures of emotions for expressive audiovisual text-to-speech synthesis.
Deschamps, Sébastien. "Apprentissage actif profond pour la reconnaissance visuelle à partir de peu d’exemples". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS199.
Automatic image analysis has improved the exploitation of image sensors, with data coming from different sensors such as phone cameras, surveillance cameras, satellite imagers or even drones. Deep learning achieves excellent results in image analysis applications where large amounts of annotated data are available, but learning a new image classifier from scratch is a difficult task. Most image classification methods are supervised, requiring annotations, which is a significant investment. Different frugal learning solutions (with few annotated examples) exist, including transfer learning, active learning, semi-supervised learning or meta-learning. The goal of this thesis is to study these frugal learning solutions for visual recognition tasks, namely image classification and change detection in satellite images. The classifier is trained iteratively by starting with only a few annotated samples, and asking the user to annotate as little data as possible to obtain satisfactory performance. Deep active learning was initially studied with other methods and suited our operational problem the most, so we chose this solution. In this thesis, we have developed an interactive approach, where we ask the most informative questions about the relevance of the data to an oracle (annotator). Based on its answers, a decision function is iteratively updated. We model the probability that the samples are relevant, by minimizing an objective function capturing the representativeness, diversity and ambiguity of the data. Data with high probability are then selected for annotation. We have improved this approach, using reinforcement learning to dynamically and accurately weight the importance of representativeness, diversity and ambiguity of the data in each active learning cycle. 
Finally, our last approach consists of a display model that selects the most representative and diverse virtual examples, which adversely challenge the learned model, in order to obtain a highly discriminative model in subsequent iterations of active learning. The good results obtained against the different baselines and the state of the art in the tasks of satellite image change detection and image classification have demonstrated the relevance of the proposed frugal learning models, and have led to various publications (Sahbi et al. 2021; Deschamps and Sahbi 2022b; Deschamps and Sahbi 2022a; Sahbi and Deschamps 2022).
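As a rough illustration of the kind of selection step this abstract describes, the sketch below greedily picks a batch of unlabeled samples that are both ambiguous (predicted probability near 0.5) and diverse (far from samples already picked). It is not the method of the thesis: the function names `entropy`, `distance` and `select_batch`, the `diversity_weight` parameter and the greedy heuristic are all assumptions introduced here, and representativeness is left out for brevity.

```python
import math

def entropy(p):
    """Binary entropy of a predicted probability; highest at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_batch(features, probs, batch_size, diversity_weight=1.0):
    """Greedily pick samples that are ambiguous and far from those already picked.

    Hypothetical heuristic: score = ambiguity + diversity_weight * diversity.
    """
    chosen = []
    candidates = set(range(len(features)))
    while candidates and len(chosen) < batch_size:
        def score(i):
            ambiguity = entropy(probs[i])
            # Distance to the closest already-chosen sample (0 if none chosen yet).
            diversity = min(
                (distance(features[i], features[j]) for j in chosen),
                default=0.0,
            )
            return ambiguity + diversity_weight * diversity
        # Iterate in index order so that ties are broken deterministically.
        best = max(sorted(candidates), key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Toy pool: two samples near the origin, two near (5, 5).
pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1)]
model_probs = [0.5, 0.5, 0.9, 0.52]
print(select_batch(pool, model_probs, batch_size=2))
```

In an active learning cycle, the returned indices would be sent to the annotator, the classifier retrained on the enlarged labeled set, and the selection repeated.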
Philip, Julien. "Édition et rendu à base d’images multi-vues par apprentissage profond et optimisation". Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4048.
Computer-generated imagery (CGI) occupies a growing place in our everyday environment. Whether in video games or movies, CGI techniques are constantly improving in quality, but they also require ever more high-quality artistic content, which takes an increasing amount of time to create. With the emergence of virtual and augmented reality often comes the need to render or re-render assets that exist in our world. To allow widespread use of CGI in applications such as telepresence or virtual visits, the need for manual artistic replication of assets must be removed from the process. This can be done with the help of Image-Based Rendering (IBR) techniques that allow scenes or objects to be rendered in a free-viewpoint manner from a set of sparse input photographs. While this process requires little to no artistic work, it also does not allow for artistic control or editing of scene content. In this dissertation, we explore Multi-view Image Editing and Rendering. To allow casually captured scenes to be rendered with content alterations such as object removal, lighting editing, or scene compositing, we leverage optimization techniques and modern deep learning. We design our methods to take advantage of all the information present in multi-view content while handling specific constraints such as multi-view coherency. For object removal, we introduce a new plane-based multi-view inpainting algorithm. Planes are a simple yet effective way to fill geometry, and they naturally enforce multi-view coherency because inpainting is computed in a shared rectified texture space, allowing us to correctly respect perspective. We demonstrate instance-based object removal at the scale of a street in scenes composed of several hundred images. 
We next address outdoor relighting with a learning-based algorithm that efficiently allows the illumination in a scene to be changed, while removing and synthesizing cast shadows for any given sun position and accounting for global illumination. An approximate geometric proxy built using multi-view stereo is used to generate illumination- and shadow-related image buffers that guide a neural network. We train this network on a set of synthetic scenes, allowing full supervision of the learning pipeline. Careful data augmentation allows our network to transfer to real scenes and provides state-of-the-art relighting results. We also demonstrate the capacity of this network to be used to compose real scenes captured under different lighting conditions and orientation. We then present contributions to image-based rendering quality. We discuss how our carefully designed depth-map meshing and simplification algorithm improves the rendering performance and quality of a new learning-based IBR method. Finally, we present a method that combines relighting, IBR, and material analysis. To enable relightable IBR with accurate glossy effects, we extract both material appearance variations and qualitative texture information from multi-view content in the form of several IBR heuristics. We further combine them with path-traced irradiance images that specify the input and target lighting. This combination allows a neural network to be trained to implicitly extract material properties and produce realistic-looking relit viewpoints. Separating diffuse and specular supervision is crucial in obtaining high-quality output.
Frizzi, Sebastien. "Apprentissage profond en traitement d'images : application pour la détection de fumée et feu". Electronic Thesis or Diss., Toulon, 2021. http://www.theses.fr/2021TOUL0007.
Researchers have found a strong correlation between hot summers and the frequency and intensity of forest fires. Global warming due to greenhouse gases such as carbon dioxide is increasing the temperature in some parts of the world. Fires release large amounts of greenhouse gases, causing an increase in the earth's average temperature, which in turn causes an increase in forest fires. Fires destroy millions of hectares of forest areas and ecosystems sheltering numerous species, and have a significant cost for our societies. The prevention and control of fires must be a priority to stop this infernal spiral. In this context, smoke detection is very important because it is the first clue of an incipient fire. Fire and especially smoke are difficult objects to detect in visible images due to their complexity in terms of shape, color and texture. However, deep learning coupled with video surveillance can achieve this goal. Convolutional neural network (CNN) architectures are able to detect smoke and fire in RGB images with very good accuracy. Moreover, these structures can segment smoke as well as fire in real time. The richness of the deep network's training database is a very important element for good generalization. This manuscript presents different deep architectures based on convolutional networks to detect and localize smoke and fire in video images in the visible domain.
Guesdon, Romain. "Estimation de poses humaines par apprentissage profond : application aux passagers des véhicules autonomes". Electronic Thesis or Diss., Lyon 2, 2024. http://www.theses.fr/2024LYO20002.
Research into autonomous cars has made great strides in recent decades, focusing particularly on analysis of the external environment and driving-related tasks. This has led to a significant increase in the autonomy of private vehicles. In this new context, it is relevant to take an interest in the passengers of these autonomous vehicles and to study their behavior in the face of this revolution in the means of transport. The AURA AutoBehave project has been set up to explore these issues in greater depth. This project brings together several laboratories conducting research in different scientific disciplines linked to this theme, such as computer vision, biomechanics, emotions, and transport economics. This thesis, carried out at the LIRIS laboratory, is part of this project; in it we focus on methods for estimating the human poses of passengers using deep learning. We first looked at state-of-the-art solutions and developed both a dataset and a metric better suited to the constraints of our context. We also studied the visibility of the keypoints to help estimate the pose. We then tackled the problem of domain generalisation for pose estimation to propose an efficient solution under unknown conditions. Thus, we focused on the generation of synthetic passenger data for pose estimation. Among other things, we studied the application of generative networks and 3D modeling methods to our problem. We have used this data to propose different training strategies and two new network architectures. The proposed fusion approach, associated with the training strategies, makes it possible to take advantage of both generic and specific datasets to improve the generalisation capabilities of pose estimation methods inside a car, particularly on the lower body.
Peiffer, Elsa. "Implications des structures cérébrales profondes dans les apprentissages procéduraux". Lyon 1, 2000. http://www.theses.fr/2000LYO1T267.
Franceschi, Jean-Yves. "Apprentissage de représentations et modèles génératifs profonds dans les systèmes dynamiques". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS014.
The recent rise of deep learning has been motivated by numerous scientific breakthroughs, particularly regarding representation learning and generative modeling. However, most of these achievements have been obtained on image or text data, whose evolution through time remains challenging for existing methods. Given their importance for autonomous systems to adapt in a constantly evolving environment, these challenges have been actively investigated in a growing body of work. In this thesis, we follow this line of work and study several aspects of temporality and dynamical systems in deep unsupervised representation learning and generative modeling. Firstly, we present a general-purpose deep unsupervised representation learning method for time series, tackling scalability and adaptivity issues arising in practical applications. In a second part, we further study representation learning for sequences by focusing on structured and stochastic spatiotemporal data: videos and physical phenomena. We show in this context that high-performing temporal generative prediction models help to uncover meaningful and disentangled representations, and conversely. To this end, we highlight the crucial role of differential equations in the modeling and embedding of these natural sequences within sequential generative models. Finally, in a third part, we more broadly analyze a popular class of generative models, generative adversarial networks, through the lens of dynamical systems. We study the evolution of the involved neural networks with respect to their training time by describing it with a differential equation, allowing us to gain a novel understanding of this generative model.
Zhang, Jian. "Modèles de Mobilité de Véhicules par Apprentissage Profond dans les Systèmes de Tranport Intelligents". Thesis, Ecole centrale de Lille, 2018. http://www.theses.fr/2018ECLI0015/document.
Intelligent transportation systems have attracted great research interest in recent years. Although realistic traffic simulation plays an important role, it has not received enough attention. This thesis is devoted to studying traffic simulation at the microscopic level and proposes corresponding vehicular mobility models. Using deep learning methods, these mobility models have been shown to represent real-world vehicles with promising credibility. Firstly, a data-driven, neural-network-based mobility model is proposed. This model is derived from real-world trajectory data and allows local vehicle behaviors to be mimicked. By analyzing the performance of this basic learning-based mobility model, we indicate that an improvement is possible and we propose its specification. A hidden Markov model (HMM) is then introduced. Preparing this integration requires an examination of traditional dynamics-based mobility models and a method for adapting "classical" models to our situation. Finally, the enhanced model is presented, and a sophisticated scenario simulation is built with it to validate the theoretical results. The performance of our mobility model is promising, and implementation issues are also discussed.
Jneid, Khoder. "Apprentissage par Renforcement Profond pour l'Optimisation du Contrôle et de la Gestion des Bâtiment". Electronic Thesis or Diss., Université Grenoble Alpes, 2023. http://www.theses.fr/2023GRALM062.
Heating, ventilation, and air-conditioning (HVAC) systems account for high energy consumption in buildings. Conventional approaches used to control HVAC systems rely on rule-based control (RBC), which consists of predefined rules set by an expert. Model-predictive control (MPC), widely explored in the literature, is not adopted in industry because it is a model-based approach that first requires building models of the building for use in the optimization phase, which is time-consuming and expensive. In this PhD, we investigate reinforcement learning (RL) to optimize the energy consumption of HVAC systems while maintaining good thermal comfort and good air quality. Specifically, we focus on model-free RL algorithms that learn through interaction with the environment (the building, including the HVAC system) and thus do not require accurate models of the environment. In addition, online approaches are considered. The main challenge of online model-free RL is the number of days necessary for the algorithm to acquire enough data and action feedback to start acting properly. Hence, the research subject of the PhD is boosting model-free RL algorithms so that they converge faster, making them applicable in real-world applications such as HVAC control. Two approaches have been explored during the PhD to achieve our objective: the first combines RBC with value-based RL, and the second combines fuzzy rules with policy-based RL. Both approaches aim to speed up the convergence of RL by guiding the RL policy, but they are completely different. The first approach exploits RBC rules during training, while in the second approach the fuzzy rules are injected directly into the policy. Tests are performed on a simulated office during winter. This simulated office is a replica of a real office at Grenoble INP.
Furnon, Nicolas. "Apprentissage profond pour le rehaussement de la parole dans les antennes acoustiques ad-hoc". Electronic Thesis or Diss., Université de Lorraine, 2021. http://www.theses.fr/2021LORR0277.
More and more devices we use in our daily life are embedded with one or more microphones so that they can be voice controlled. Put together, these devices can form a so-called ad-hoc microphone array (AHMA). A speech enhancement step is often applied to the recorded signals to optimise the execution of the voice commands. To this effect, AHMAs are of high interest because of their flexible usage, their wide spatial coverage and the diversity of their recordings. However, it is challenging to exploit the potential of AHMAs because the devices that compose them may move and have limited power and bandwidth capacity. Because of these limits, the speech enhancement solutions deployed in "classic" microphone arrays, which rely on a fusion center and high processing loads, cannot be afforded. This thesis combines the modelling power of deep neural networks (DNNs) with the flexibility of use of AHMAs. To this end, we introduce a distributed speech enhancement system, which does not rely on a fusion center. So-called compressed signals are sent among the nodes and convey the spatial information recorded by the whole AHMA, while reducing the bandwidth requirements. DNNs are used to estimate the coefficients of a multichannel Wiener filter. We conduct an empirical analysis of this system, both on synthesized and real data, in order to validate its efficiency and to highlight the benefits of jointly using DNNs and distributed speech enhancement algorithms. We show that our system performs comparably to a state-of-the-art solution, while being more flexible and significantly reducing the computation cost. Besides, we develop our solution to adapt it to the typical usage conditions of AHMAs. We study its behaviour when the number of devices in the AHMA varies. We introduce and compare a spatial attention mechanism and a self-attention mechanism. Both mechanisms make our system robust to a varying number of devices. 
We show that the weights of the self-attention mechanism reveal the utility of the information carried by each signal. We also analyse our system when the signals recorded by different devices are not synchronised. We propose a solution to improve its performance in such conditions by introducing a temporal attention mechanism. We show that this mechanism can help estimate the sampling time offset between the devices of the AHMA. Lastly, we show that our system is also efficient for source separation. It can efficiently process the spatial information recorded by the whole AHMA in a typical meeting scenario and alleviate the need for a complex DNN architecture.
Khodji, Hiba. "Apprentissage profond et transfert de connaissances pour la détection d'erreurs dans les séquences biologiques". Electronic Thesis or Diss., Strasbourg, 2023. http://www.theses.fr/2023STRAD058.
The widespread use of high-throughput technologies in the biomedical field, notably the new generation of genome sequencing technologies, is producing massive amounts of data. Multiple Sequence Alignment (MSA) serves as a fundamental tool for the analysis of this data, with applications including genome annotation, protein structure and function prediction, and understanding evolutionary relationships. However, the accuracy of MSA is often compromised due to factors such as unreliable alignment algorithms, inaccurate gene prediction, or incomplete genome sequencing. This thesis addresses the issue of data quality assessment by leveraging deep learning techniques. We propose novel models based on convolutional neural networks for the identification of errors in visual representations of MSAs. Our primary objective is to assist domain experts in their research studies, where the accuracy of MSAs is crucial. Therefore, we focused on providing reliable explanations for our model predictions by harnessing the potential of explainable artificial intelligence (XAI). In particular, we leveraged visual explanations as a foundation for a transfer learning framework that aims essentially to improve a model's ability to focus on underlying features in an input. Finally, we proposed novel evaluation metrics designed to assess this ability. Initial findings suggest that our approach achieves a good balance between model complexity, performance, and explainability, and could be leveraged in domains where data availability is limited and the need for comprehensive result explanation is paramount.
Zheng, Léon. "Frugalité en données et efficacité computationnelle dans l'apprentissage profond". Electronic Thesis or Diss., Lyon, École normale supérieure, 2024. http://www.theses.fr/2024ENSL0009.
This thesis focuses on two challenges of frugality and efficiency in modern deep learning: data frugality and computational resource efficiency. First, we study self-supervised learning, a promising approach in computer vision that does not require data annotations for learning representations. In particular, we propose a unification of several self-supervised objective functions under a framework based on rotation-invariant kernels, which opens up prospects for reducing the computational cost of these objective functions. Second, given that matrix multiplication is the predominant operation in deep neural networks, we focus on the construction of fast algorithms that allow matrix-vector multiplication with nearly linear complexity. More specifically, we examine the problem of sparse matrix factorization under the constraint of butterfly sparsity, a structure common to several fast transforms such as the discrete Fourier transform. The thesis establishes new theoretical guarantees for butterfly factorization algorithms, and explores the potential of butterfly sparsity to reduce the computational costs of neural networks during their training or inference phase. In particular, we explore the efficiency of GPU implementations for butterfly sparse matrix multiplication, with the goal of truly accelerating sparse neural networks.
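To make the notion of butterfly sparsity concrete, the sketch below applies the discrete Fourier transform as a bit-reversal permutation followed by log2(n) butterfly factors, each of which combines only two inputs per output. This is the standard radix-2 Cooley-Tukey scheme, shown for illustration only; it is not one of the factorization algorithms of the thesis, and the function names are hypothetical.

```python
import cmath

def bit_reverse_permute(x):
    """Reorder x by bit-reversed index (a permutation, so O(n) work)."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        out[j] = x[i]
    return out

def apply_butterfly_factor(x, half):
    """Apply one butterfly factor: each output mixes exactly two inputs."""
    n = len(x)
    out = [0j] * n
    for start in range(0, n, 2 * half):
        for k in range(half):
            twiddle = cmath.exp(-2j * cmath.pi * k / (2 * half))
            a = x[start + k]
            b = twiddle * x[start + k + half]
            out[start + k] = a + b
            out[start + k + half] = a - b
    return out

def dft_via_butterflies(x):
    """DFT of a power-of-two-length sequence using log2(n) sparse factors."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = bit_reverse_permute([complex(v) for v in x])
    half = 1
    while half < n:
        y = apply_butterfly_factor(y, half)  # O(n) work per factor
        half *= 2
    return y

def naive_dft(x):
    """Dense O(n^2) matrix-vector product, used only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]

signal = [1, 2, 3, 4, 0, -1, 2.5, 7]
fast = dft_via_butterflies(signal)
slow = naive_dft(signal)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # difference is at rounding level
```

Each factor costs O(n) operations and there are log2(n) of them, which is where the nearly linear O(n log n) cost of the matrix-vector product comes from, compared with O(n^2) for the dense product.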
Mollaret, Sébastien. "Artificial intelligence algorithms in quantitative finance". Thesis, Paris Est, 2021. http://www.theses.fr/2021PESC2002.
Artificial intelligence has become more and more popular in quantitative finance, given the increase in computing capacity as well as the complexity of models, and has led to many financial applications. In this thesis, we have explored three different applications to solve financial derivatives challenges, from model selection to model calibration and pricing. In Part I, we focus on a regime-switching model to price equity derivatives. The model parameters are estimated using the Expectation-Maximization (EM) algorithm, and a local volatility component is added to fit vanilla option prices using the particle method. In Part II, we then use deep neural networks to calibrate a stochastic volatility model, where the volatility is modelled as the exponential of an Ornstein-Uhlenbeck process, by approximating the mapping between model parameters and corresponding implied volatilities offline. Once the expensive approximation has been performed offline, the calibration reduces to a standard and fast optimization problem. In Part III, we finally use deep neural networks to price American options on large baskets in order to overcome the curse of dimensionality. Different methods are studied: a Longstaff-Schwartz approach, where we approximate the continuation values, and a stochastic control approach, where we solve the pricing partial differential equation by reformulating the problem as a stochastic control problem using the non-linear Feynman-Kac formula.
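For readers unfamiliar with the volatility model mentioned in Part II, the following minimal sketch simulates a volatility path defined as the exponential of an Ornstein-Uhlenbeck process. The abstract does not give the parameterization, so the mean-reversion form, the parameter names `kappa`, `theta` and `nu`, and the Euler-Maruyama discretization are assumptions made here for illustration.

```python
import math
import random

def simulate_exp_ou_volatility(y0, kappa, theta, nu, dt, n_steps, seed=0):
    """Simulate sigma_t = exp(Y_t) with dY_t = kappa * (theta - Y_t) dt + nu dW_t.

    Assumed parameterization: kappa is the mean-reversion speed, theta the
    long-run level of the log-volatility, nu the volatility of log-volatility.
    Uses a plain Euler-Maruyama scheme with time step dt.
    """
    rng = random.Random(seed)
    y = y0
    vols = [math.exp(y)]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        y += kappa * (theta - y) * dt + nu * dw
        vols.append(math.exp(y))  # exponential keeps the volatility positive
    return vols

# One year of daily steps, starting at 30% volatility, reverting towards 20%.
path = simulate_exp_ou_volatility(
    y0=math.log(0.3), kappa=2.0, theta=math.log(0.2), nu=0.5,
    dt=1 / 252, n_steps=252,
)
print(len(path), min(path) > 0)
```

Taking the exponential guarantees a strictly positive volatility, which is one practical reason for modelling the logarithm of volatility with a mean-reverting process.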
Mlynarski, Pawel. "Apprentissage profond pour la segmentation des tumeurs cérébrales et des organes à risque en radiothérapie". Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4084.
Medical images play an important role in cancer diagnosis and treatment. Oncologists analyze images to determine the different characteristics of the cancer, to plan the therapy and to observe the evolution of the disease. The objective of this thesis is to propose efficient methods for automatic segmentation of brain tumors and organs at risk in the context of radiotherapy planning, using Magnetic Resonance (MR) images. First, we focus on segmentation of brain tumors using Convolutional Neural Networks (CNNs) trained on MRIs manually segmented by experts. We propose a segmentation model that has a large 3D receptive field while being efficient in terms of computational complexity, based on a combination of 2D and 3D CNNs. We also address problems related to the joint use of several MRI sequences (T1, T2, FLAIR). Second, we introduce a segmentation model which is trained using weakly-annotated images in addition to fully-annotated images (with voxelwise labels), which are usually available in very limited quantities due to their cost. We show that this mixed level of supervision considerably improves the segmentation accuracy when the number of fully-annotated images is limited. Finally, we propose a methodology for an anatomy-consistent segmentation of organs at risk in the context of radiotherapy of brain tumors. The segmentations produced by our system on a set of MRIs acquired in the Centre Antoine Lacassagne (Nice, France) are evaluated by an experienced radiotherapist.
Zheng, Qiao. "Apprentissage profond pour la segmentation robuste et l’analyse explicable des images cardiaques volumiques et dynamiques". Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4013.
Cardiac MRI is widely used by cardiologists as it allows rich information to be extracted from images. However, if done manually, the information extraction process is tedious and time-consuming. Given the advances in artificial intelligence, I develop deep learning methods to automate several essential tasks in cardiac MRI analysis. First, I propose a method based on convolutional neural networks to perform cardiac segmentation on short-axis MRI image stacks. In this method, since the predicted segmentation of a slice depends on the already existing segmentation of an adjacent slice, 3D consistency and robustness are explicitly enforced. Second, I develop a method to classify cardiac pathologies, with a novel deep learning approach to extract image-derived features that characterize the shape and motion of the heart. In particular, the classification model is explainable, simple and flexible. Last but not least, the same feature extraction method is applied to an exceptionally large dataset (UK Biobank). Unsupervised cluster analysis is then performed on the extracted features in search of their further relation with cardiac pathology characterization. To conclude, I discuss several possible extensions of my research.
De, Bois Maxime. "Apprentissage profond sous contraintes biomédicales pour la prédiction de la glycémie future de patients diabétiques". Electronic Thesis or Diss., université Paris-Saclay, 2020. http://www.theses.fr/2020UPASG065.
Despite its recent successes in computer vision or machine translation, the use of deep learning in the biomedical field faces many challenges. Among them are the difficulty of accessing data in sufficient quantity and quality, as well as the need for interoperable and interpretable models. In this thesis, we are interested in these different issues from the perspective of creating models that predict the future glucose values of diabetic patients. Such models would allow patients to anticipate daily glucose variations, helping them regulate their glucose in order to avoid states of hypoglycemia or hyperglycemia. To this end, we use three datasets. While the first was collected during this thesis on several type-2 diabetic patients, the other two are composed of type-1 diabetic patients, both real and virtual. Across the studies, we use each patient's past glucose, insulin, and carbohydrate data to build personalized models that predict the patient's glucose values 30 minutes into the future. First, we carry out a detailed state-of-the-art analysis by building an open-source benchmark of glucose-predictive models. While deep models are promising, we highlight the difficulty they have in making predictions that are at the same time accurate and safe for the patient. In order to improve the clinical acceptability of the models, we investigate the integration of clinical constraints within the training of the models. We propose new cost functions that enhance the coherence of successive predictions. In addition, they enable the training to focus on clinically dangerous errors. We explore their practical use through an algorithm that enables the training of a model maximizing the precision of the predictions while respecting the clinical constraints set beforehand. Then, we study the use of transfer learning to improve the performance of glucose-predictive models. It eases the learning of personalized models by reusing the knowledge learned on other patients. 
In particular, we propose the adversarial multi-source transfer learning framework. It significantly improves the performance of the models by allowing the learning of a priori knowledge which is more general, being agnostic of the patients that are the source of the transfer. We investigate different transfer scenarios through the use of our three datasets. We show that it is possible to transfer knowledge using data coming from different experimental devices, from patients with different types of diabetes, and also from virtual patients. Finally, we are interested in improving the interpretability of deep models through the attention mechanism. In particular, we explore the use of a deep and interpretable model for the prediction of glucose. It implements a double attention mechanism enabling the estimation of the contribution of each input variable of the model to the final prediction. We empirically show the value of such a model for the prediction of glucose by analyzing its behavior in the computation of its predictions.
Droniou, Alain. "Apprentissage de représentations et robotique développementale : quelques apports de l'apprentissage profond pour la robotique autonome". Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066056/document.
This thesis studies the use of deep neural networks to learn high-level representations from raw inputs on robots, based on the "manifold hypothesis".
Droniou, Alain. "Apprentissage de représentations et robotique développementale : quelques apports de l'apprentissage profond pour la robotique autonome". Electronic Thesis or Diss., Paris 6, 2015. http://www.theses.fr/2015PA066056.
This thesis studies the use of deep neural networks to learn high-level representations from raw inputs on robots, based on the "manifold hypothesis".
Bouchama, Lyes. "Apport des techniques d'apprentissage (profond) à la microscopie holographique pour applications médicales". Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAS022.
This research is part of the Télécom SudParis (TSP) and TRIBVN/T-life strategic partnership, dedicated to the development of new approaches in optical microscopy, coupled with artificial intelligence, to identify, predict and monitor hematological and parasitological pathologies. In this regard, we developed a prototype microscope based on a computational imaging principle with a synthetic aperture, following the FPM (Fourier Ptychographic Microscopy) approach. This approach makes it possible to overcome the resolution limits of conventional optics or, equivalently, to access very large fields of view (from 4 to 25 times larger) at fixed resolution. It also enables us to diversify the nature of the data acquired (with phase recording in addition to intensity data). However, despite its promise, the technology faces challenges in widespread adoption and commercialization within the microscopy field, primarily due to constraints such as the time-intensive process required for image acquisition and reconstruction to achieve optimal quality. The research conducted in this thesis has led to substantial advancements in overcoming certain limitations of this technology, leveraging models based on neural networks. We have proposed an efficient automatic refocusing of bimodal images over large fields of view, thanks to post-processing based on a U-Net. We have also proposed an original approach combining statistical learning and physics-driven optimization to reduce image acquisition and reconstruction times. These frameworks have demonstrated their efficacy by yielding more precise and discriminating diagnoses in the fields of parasitology and haematology. The potential applications of these contributions go far beyond the field of FPM, opening up perspectives in various other fields of computational imaging.