Dissertations on the topic "Réseaux convolutionnels"
Format your source in APA, MLA, Chicago, Harvard, and other styles
Browse the top 28 dissertations for your research on the topic "Réseaux convolutionnels".
Next to every entry in the bibliography you will find an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, and so on.
You can also download the full text of the publication as a .pdf file and read its abstract online, whenever these are available in the metadata.
Browse dissertations from a wide variety of disciplines and compile your bibliography correctly.
Bietti, Alberto. "Méthodes à noyaux pour les réseaux convolutionnels profonds." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM051.
The increased availability of large amounts of data, from images in social networks, speech waveforms from mobile devices, and large text corpora, to genomic and medical data, has led to a surge of machine learning techniques. Such methods exploit statistical patterns in these large datasets to make accurate predictions on new data. In recent years, deep learning systems have emerged as a remarkably successful class of machine learning algorithms, which rely on gradient-based methods for training multi-layer models that process data in a hierarchical manner. These methods have been particularly successful in tasks where the data consists of natural signals such as images or audio; this includes visual recognition, object detection or segmentation, and speech recognition. For such tasks, deep learning methods often yield the best known empirical performance; yet, the high dimensionality of the data and the large number of parameters of these models make them challenging to understand theoretically. Their success is often attributed in part to their ability to exploit useful structure in natural signals, such as local stationarity or invariance, for instance through choices of network architectures with convolution and pooling operations. However, such properties are still poorly understood from a theoretical standpoint, leading to a growing gap between the theory and practice of machine learning. This thesis aims to bridge this gap by studying spaces of functions which arise from given network architectures, with a focus on the convolutional case. Our study relies on kernel methods, by considering reproducing kernel Hilbert spaces (RKHSs) associated to certain kernels that are constructed hierarchically based on a given architecture. This allows us to precisely study smoothness, invariance, stability to deformations, and approximation properties of functions in the RKHS. These representation properties are also linked with optimization questions when training deep networks with gradient methods in some over-parameterized regimes where such kernels arise. They also suggest new practical regularization strategies for obtaining better generalization performance on small datasets, and state-of-the-art performance for adversarial robustness on image tasks.
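As a loose illustration of how such hierarchical kernels can be built, the sketch below composes a normalized patch kernel with a pointwise nonlinearity and local pooling on 1D signals, in the spirit of convolutional kernel networks. The nonlinearity, patch size and pooling width are illustrative assumptions, not the construction analyzed in the thesis.

```python
import numpy as np

def patches(x, size):
    """All contiguous patches of length `size` from a 1D signal."""
    return np.stack([x[i:i + size] for i in range(len(x) - size + 1)])

def kernel_layer(Kxz, kxx, kzz, kappa=lambda u: np.exp(u - 1.0)):
    """Normalize patch inner products and apply a pointwise nonlinearity,
    the basic building block of hierarchically constructed kernels."""
    norm = np.sqrt(np.outer(kxx, kzz))               # ||x_i|| ||z_j|| per patch pair
    return norm * kappa(Kxz / np.maximum(norm, 1e-12))

def conv_kernel(x, z, size=5, pool=4):
    Px, Pz = patches(x, size), patches(z, size)
    K = kernel_layer(Px @ Pz.T, (Px * Px).sum(1), (Pz * Pz).sum(1))
    m = (K.shape[0] // pool) * pool                  # average pooling: local invariance
    K = K[:m, :m].reshape(m // pool, pool, m // pool, pool).mean(axis=(1, 3))
    return K.mean()                                  # global pooling -> scalar kernel value

rng = np.random.default_rng(0)
x, z = rng.standard_normal(64), rng.standard_normal(64)
print(conv_kernel(x, z))
```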
Ogier du Terrail, Jean. "Réseaux de neurones convolutionnels profonds pour la détection de petits véhicules en imagerie aérienne." Thesis, Normandie, 2018. http://www.theses.fr/2018NORMC276/document.
The following manuscript is an attempt to tackle the problem of small vehicle detection in vertical aerial imagery through the use of deep learning algorithms. The specificities of the matter allow the use of innovative techniques leveraging the invariance and self-similarities of vehicles (automobiles, planes) seen from the sky. We will start with a thorough study of single-shot detectors. Building on that, we will examine the effect of adding multiple stages to the detection decision process. Finally, we will try to come to grips with the domain adaptation problem in detection through the generation of better-looking synthetic data and its use in the training process of these detectors.
Morvan, Ludivine. "Prédiction de la progression du myélome multiple par imagerie TEP : Adaptation des forêts de survie aléatoires et de réseaux de neurones convolutionnels." Thesis, Ecole centrale de Nantes, 2021. http://www.theses.fr/2021ECDN0045.
The aim of this work is to provide a model for survival prediction and biomarker identification in the context of multiple myeloma (MM) using PET (Positron Emission Tomography) imaging and clinical data. This PhD is divided into two parts: the first part provides a model based on Random Survival Forests (RSF); the second part is based on the adaptation of deep learning to survival analysis and to our data. The main contributions are the following: 1) production of a model based on RSF and PET images allowing the prediction of a risk group for multiple myeloma patients; 2) determination of biomarkers using this model; 3) demonstration of the interest of PET radiomics; 4) extension of the state of the art of methods for adapting deep learning to a small database and small images; 5) study of the cost functions used in survival analysis. In addition, we are, to our knowledge, the first to investigate the use of RSFs in the context of MM and PET images, to use self-supervised pre-training with PET images, and, with a survival task, to adapt the triplet cost function to survival and to fit a convolutional neural network to MM survival from PET lesions.
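For readers unfamiliar with Random Survival Forests, the sketch below fits one with the scikit-survival library on synthetic features standing in for PET radiomics; the feature matrix, follow-up times and hyperparameters are all placeholder assumptions, not the thesis pipeline.

```python
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 30))             # stand-in for radiomic features
time = rng.exponential(24.0, size=120)         # follow-up time, e.g. months
event = rng.random(120) < 0.6                  # True = progression observed
y = Surv.from_arrays(event=event, time=time)   # structured survival target

rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=10, random_state=0)
rsf.fit(X, y)

risk = rsf.predict(X)                          # higher score = higher estimated risk
high_risk = risk > np.median(risk)             # e.g. split patients into two risk groups
print("high-risk patients:", int(high_risk.sum()))
```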
Guedria, Soulaimane. "Une plateforme d'apprentissage profond à base de composants qui passe à l'échelle : une application aux réseaux de neurones convolutionnels pour la segmentation en imagerie médicale." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM023.
Deep neural networks (DNNs), and particularly convolutional neural networks (CNNs) trained on large datasets, are achieving great success across a plethora of paramount applications. They have been providing powerful solutions and revolutionizing medicine, particularly in the medical image analysis field. However, the deep learning field comes with multiple challenges: (1) training convolutional neural networks is a computationally intensive and time-consuming task; (2) introducing parallelism to CNNs in practice is a tedious, repetitive and error-prone process; and (3) there is currently no broad study of the generalizability and reproducibility of CNN parallelism techniques on concrete medical imaging segmentation applications. Within this context, the present PhD thesis aims to tackle the aforementioned challenges. To achieve this goal, we conceived, implemented and validated an all-in-one scalable and component-based deep learning parallelism platform for medical imaging segmentation. First, we introduce R2D2, an end-to-end scalable deep learning toolkit for medical imaging segmentation. R2D2 proposes a set of new distributed versions of widely used deep learning architectures (FCN and U-Net) in order to speed up building new distributed deep learning models and reduce the gap between researchers and talent-intensive deep learning. Next, this thesis also introduces Auto-CNNp, a component-based software framework to automate CNN parallelism by encapsulating and hiding typical CNN parallelization routine tasks within a backbone structure, while remaining extensible for user-specific customization. The evaluation results of our proposed automated component-based approach are promising: they show that a significant speedup in the CNN parallelization task has been achieved at the cost of a negligible framework execution time, compared to the manual parallelization strategy. The couple of software solutions previously introduced (R2D2 and Auto-CNNp) led us to conduct a thorough and practical analysis of the generalizability of CNN parallelism techniques to imaging segmentation applications. Concurrently, we performed an in-depth literature review aiming to identify the sources of variability and to study reproducibility issues of the deep learning training process for particular CNN training configurations applied to medical imaging segmentation. We also draw a set of good-practice recommendations aiming to alleviate the aforementioned reproducibility issues for the training of medical imaging segmentation DNNs. Finally, we make a number of observations based on a broad analysis of the results of the CNN parallelism experimental study, which led us to propose guidelines and recommendations for scaling up CNNs for segmentation applications. We succeeded in eliminating the accuracy loss with scale for the U-Net CNN architecture and in alleviating the accuracy degradation for the FCN CNN architecture.
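R2D2 and Auto-CNNp are the thesis' own tools; as a generic stand-in, the sketch below shows the usual way of data-parallelizing a segmentation CNN in PyTorch with DistributedDataParallel, where gradients are all-reduced across one process per GPU. The model, dataset and hyperparameters are placeholders.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train(rank, world_size, model, dataset, epochs=1):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)  # one process per GPU
    torch.cuda.set_device(rank)
    model = DDP(model.cuda(rank), device_ids=[rank])
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=8, sampler=sampler)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(epochs):
        sampler.set_epoch(epoch)                    # reshuffle the data shards each epoch
        for images, masks in loader:
            opt.zero_grad()
            loss = loss_fn(model(images.cuda(rank)), masks.cuda(rank))
            loss.backward()                         # DDP all-reduces gradients here
            opt.step()
    dist.destroy_process_group()
```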
Douillard, Arthur. "Continual Learning for Computer Vision." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS165.
I first review the existing methods based on regularization for continual learning. While regularizing a model's probabilities is very efficient to reduce forgetting in large-scale datasets, there are few works considering constraints on intermediate features. I cover in this chapter two contributions aiming to regularize directly the latent space of ConvNets. The first one, PODNet, aims to reduce the drift of spatial statistics between the old and new model, which in effect drastically reduces forgetting of old classes while enabling efficient learning of new classes. I show in a second part a complementary method where we avoid pre-emptive forgetting by allocating locations in the latent space for yet-unseen future classes. Then, I describe a recent application of class-incremental learning (CIL) to semantic segmentation. I show that the very nature of continual semantic segmentation (CSS) offers new specific challenges, namely forgetting on large images and a background shift. We tackle the first problem by extending our distillation loss introduced in the previous chapter to multiple scales. The second problem is solved by an efficient pseudo-labeling strategy. Finally, we consider the common rehearsal learning, but applied this time to CSS. I show that it cannot be used naively because of memory complexity, and design a light-weight rehearsal that is even more efficient. Finally, I consider a completely different approach to continual learning: dynamic networks, where the parameters are extended during training to adapt to new tasks. Previous works in this domain are hard to train and often suffer from parameter-count explosion. For the first time in continual computer vision, we propose to use the Transformer architecture: the model dimension is mostly fixed and shared across tasks, except for an expansion of learned task tokens. With an encoder/decoder strategy where the decoder forward pass is specialized by a task token, we show state-of-the-art robustness to forgetting while our memory and computational complexities barely grow.
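The sketch below gives a PODNet-style spatial distillation term: feature maps of the old and new models are pooled along height and along width, and the pooled statistics are matched in L2. It is a simplified reading of the idea, not the exact published loss.

```python
import torch.nn.functional as F

def pod_spatial_loss(feats_old, feats_new):
    """PODNet-style spatial distillation sketch: for each layer, match width- and
    height-pooled statistics of old and new feature maps (B, C, H, W) in L2."""
    loss = 0.0
    for a, b in zip(feats_old, feats_new):
        for dim in (2, 3):                          # pool over height, then over width
            pa = F.normalize(a.sum(dim=dim).flatten(1), p=2, dim=1)
            pb = F.normalize(b.sum(dim=dim).flatten(1), p=2, dim=1)
            loss = loss + (pa - pb).pow(2).sum(dim=1).mean()
    return loss / len(feats_old)
```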
Chevalier, Marion. "Résolution variable et information privilégiée pour la reconnaissance d'images." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066726/document.
Image classification is of prominent interest in numerous visual recognition tasks, particularly for vehicle recognition in airborne systems, where the images have a low resolution because of the large distance between the system and the observed scene. During the training phase, complementary data such as knowledge of the position of the system or high-resolution images may be available. In our work, we focus on the task of low-resolution image classification while taking supplementary information into account during the training phase. We first show the interest of deep convolutional networks for low-resolution image recognition, especially by proposing an architecture learned on the targeted data. On the other hand, we rely on the framework of learning using privileged information to benefit from the complementary training data, here the high-resolution versions of the images. We propose two novel methods for integrating privileged information into the learning phase of neural networks. Our first model relies on these complementary data to compute an absolute difficulty level, assigning a large weight to the most easily recognized images. Our second model introduces a similarity constraint between the networks learned on each type of data. We experimentally validate our models on several application cases, especially in a fine-grained context and on a dataset containing annotation noise.
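A minimal sketch of the second model's idea follows: a classification loss on the low-resolution stream plus a similarity constraint pulling its features towards those of a network seeing the high-resolution (privileged) images. The assumption that both networks return (logits, features) and the weight lam are illustrative, not the thesis' exact formulation.

```python
import torch
import torch.nn.functional as F

def lupi_loss(student, teacher, x_lr, x_hr, labels, lam=0.1):
    """Cross-entropy on the low-resolution stream plus a similarity constraint
    pulling its features towards the high-resolution (privileged) stream."""
    logits, feat_lr = student(x_lr)       # assumed to return (logits, features)
    with torch.no_grad():
        _, feat_hr = teacher(x_hr)        # privileged network, frozen here
    return F.cross_entropy(logits, labels) + lam * F.mse_loss(feat_lr, feat_hr)
```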
Saxena, Shreyas. "Apprentissage de représentations pour la reconnaissance visuelle." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM080/document.
In this dissertation, we propose methods and data-driven machine learning solutions which address and benefit from the recent overwhelming growth of digital media content. First, we consider the problem of improving the efficiency of image retrieval. We propose a coordinated local metric learning (CLML) approach which learns local Mahalanobis metrics and integrates them in a global representation where the ℓ2 distance can be used. This allows for data visualization in a single view and the use of efficient ℓ2-based retrieval methods. Our approach can be interpreted as learning a linear projection on top of an explicit high-dimensional embedding of a kernel. This interpretation allows for the use of existing frameworks for Mahalanobis metric learning for learning local metrics in a coordinated manner. Our experiments show that CLML improves over previous global and local metric learning approaches for the task of face retrieval. Second, we present an approach to leverage the success of CNN models for visible-spectrum face recognition to improve heterogeneous face recognition, e.g., recognition of near-infrared images from visible-spectrum training images. We explore different metric learning strategies over features from the intermediate layers of the networks, to reduce the discrepancies between the different modalities. In our experiments we found that the depth of the optimal features for a given modality is positively correlated with the domain shift between the source domain (CNN training data) and the target domain. Experimental results show that we can use CNNs trained on visible-spectrum images to obtain results that improve over the state of the art for heterogeneous face recognition with near-infrared images and sketches. Third, we present convolutional neural fabrics for exploring the discrete and exponentially large CNN architecture space in an efficient and systematic manner. Instead of aiming to select a single optimal architecture, we propose a "fabric" that embeds an exponentially large number of architectures. The fabric consists of a 3D trellis that connects response maps at different layers, scales, and channels with a sparse homogeneous local connectivity pattern. The only hyperparameters of the fabric (the number of channels and layers) are not critical for performance. The acyclic nature of the fabric allows us to use backpropagation for learning. Learning can thus efficiently configure the fabric to implement each one of exponentially many architectures and, more generally, ensembles of all of them. While scaling linearly in terms of computation and memory requirements, the fabric leverages exponentially many chain-structured architectures in parallel by massively sharing weights between them. We present benchmark results competitive with the state of the art for image classification on MNIST and CIFAR10, and for semantic segmentation on the Part Labels dataset.
Tang, Yuxing. "Weakly supervised learning of deformable part models and convolutional neural networks for object detection." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEC062/document.
In this dissertation we address the problem of weakly supervised object detection, wherein the goal is to recognize and localize objects in weakly labeled images where object-level annotations are incomplete during training. To this end, we propose two methods which learn two different models for the objects of interest. In our first method, we propose a model enhancing the weakly supervised Deformable Part-based Models (DPMs) by emphasizing the importance of the location and size of the initial class-specific root filter. We first compute a candidate pool that represents the potential locations of the object as this root filter estimate, by exploring the generic objectness measurement (region proposals) to combine the most salient regions and "good" region proposals. We then propose learning the latent class label of each candidate window as a binary classification problem, by training category-specific classifiers used to coarsely classify a candidate window into either a target object or a non-target class. Furthermore, we improve detection by incorporating contextual information from image classification scores. Finally, we design a flexible enlarging-and-shrinking post-processing procedure to modify the DPM outputs, which can effectively match the approximate object aspect ratios and further improve final accuracy. Second, we investigate how knowledge about object similarities from both visual and semantic domains can be transferred to adapt an image classifier to an object detector in a semi-supervised setting on a large-scale database, where a subset of object categories is annotated with bounding boxes. We propose to transform deep Convolutional Neural Network (CNN)-based image-level classifiers into object detectors by modeling the differences between the two on categories with both image-level and bounding-box annotations, and transferring this information to convert classifiers into detectors for categories without bounding-box annotations. We have evaluated both our approaches extensively on several challenging detection benchmarks, e.g., PASCAL VOC, ImageNet ILSVRC and Microsoft COCO. Both our approaches compare favorably to the state of the art and show significant improvement over several other recent weakly supervised detection methods.
Tsogkas, Stavros. "Mid-level representations for modeling objects." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLC012/document.
In this thesis we propose the use of mid-level representations, and in particular i) medial axes, ii) object parts, and iii) convolutional features, for modelling objects. The first part of the thesis deals with detecting medial axes in natural RGB images. We adopt a learning approach, utilizing colour, texture and spectral clustering features, to build a classifier that produces a dense probability map for symmetry. Multiple Instance Learning (MIL) allows us to treat scale and orientation as latent variables during training, while a variation based on random forests offers significant gains in terms of running time. In the second part of the thesis we focus on object part modeling using both hand-crafted and learned feature representations. We develop a coarse-to-fine, hierarchical approach that uses probabilistic bounds on part scores to decrease the computational cost of mixture models with a large number of HOG-based templates. These efficiently computed probabilistic bounds allow us to quickly discard large parts of the image, and to evaluate the exact convolution scores only at promising locations. Our approach achieves a 4×–5× speedup over the naive approach with minimal loss in performance. We also employ convolutional features to improve object detection. We use a popular CNN architecture to extract responses from an intermediate convolutional layer. We integrate these responses in the classic DPM pipeline, replacing hand-crafted HOG features, and observe a significant boost in detection performance (~14.5% increase in mAP). In the last part of the thesis we experiment with fully convolutional neural networks for the segmentation of object parts. We re-purpose a state-of-the-art CNN to perform fine-grained semantic segmentation of object parts and use a fully connected CRF as a post-processing step to obtain sharp boundaries. We also inject prior shape information into our model through a Restricted Boltzmann Machine, trained on ground-truth segmentations. Finally, we train a new fully convolutional architecture from a random initialization, to segment different parts of the human brain in magnetic resonance image data. Our methods achieve state-of-the-art results on both types of data.
Laifa, Oumeima. "A joint discriminative-generative approach for tumour angiogenesis assessment in computational pathology." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS230.
Angiogenesis is the process through which new blood vessels are formed from pre-existing ones. During angiogenesis, tumour cells secrete growth factors that activate the proliferation and migration of endothelial cells and stimulate overproduction of the vascular endothelial growth factor (VEGF). The fundamental role of vascular supply in tumour growth and anti-cancer therapies makes the evaluation of angiogenesis crucial in assessing the effect of anti-angiogenic therapies as a promising anti-cancer strategy. In this study, we establish a quantitative and qualitative panel to evaluate tumour blood vessel structures on non-invasive fluorescence images and histopathological slides across the full tumour, in order to identify architectural features and quantitative measurements that are often associated with the prediction of therapeutic response. We develop a Markov Random Field (MRF) and watershed framework to segment blood vessel structures and tumour micro-environment components, to quantitatively assess the effect of the anti-angiogenic drug Pazopanib on the tumour vasculature and the tumour micro-environment interaction. The anti-angiogenic agent Pazopanib showed a direct effect on the tumour vascular network via the endothelial cells crossing the whole tumour. Our results show a specific relationship between apoptotic neovascularization and nucleus density in murine tumours treated with Pazopanib. Then, a qualitative evaluation of tumour blood vessel structures is performed on whole-slide images, known to be very heterogeneous. We develop a discriminative-generative neural network model, based on both a learning-driven convolutional neural network (CNN) and a rule-based knowledge model, the Marked Point Process (MPP), to segment blood vessels in very heterogeneous images using very few annotated data compared to the state of the art. We detail the intuition and the design behind the discriminative-generative model, and we analyze its similarity with Generative Adversarial Networks (GANs). Finally, we evaluate the performance of the proposed model on histopathology slides and synthetic data. The limits of this promising framework, as well as its perspectives, are discussed.
Haj, Hassan Hawraa. "Détection et classification temps réel de biocellules anormales par technique de segmentation d’images." Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0043.
Developing methods to help diagnosis through the real-time detection of abnormal cells (which can be considered as cancer cells) via bio-image processing is one of the most important research directions in information science and technology. Our work has been concerned with developing automatic reading procedures for normal and abnormal bio-image tissues. The first step of our work is therefore to detect a certain type of abnormal bio-image associated with many types of cancer evolution within a microscopic multispectral image, i.e., an image of the same scene acquired at many wavelengths. We use a new segmentation method that reforms itself in an iterative, adaptive way to localize and cover the real cell contour, building on several segmentation techniques. It is based on colour intensity and can be applied to sequences of objects in the image. This work then presents a classification of the abnormal tissues using a convolutional neural network (CNN), applied to the microscopic images segmented using the snake method, which gives a high-performance result compared to the other segmentation methods. This classification method reaches high performance values: 100% accuracy for training and 99.168% for testing. The method was compared to different published approaches using different feature-extraction schemes and proved its high performance with respect to these other methods. As future work, we aim to validate our approach on larger datasets, and to explore different CNN architectures and the optimization of the hyper-parameters in order to increase performance; it will be applied to relevant medical imaging tasks, including computer-aided diagnosis.
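For reference, a classic snake (active contour) call with scikit-image looks like the sketch below; the test image and the alpha/beta/gamma values come from the standard library example, not from the thesis.

```python
import numpy as np
from skimage import data
from skimage.filters import gaussian
from skimage.segmentation import active_contour

img = data.coins()                               # stand-in for a microscopy channel
s = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([120 + 40 * np.sin(s),    # initial circle (row, col) around a cell
                        180 + 40 * np.cos(s)])

snake = active_contour(gaussian(img, sigma=3, preserve_range=False),
                       init, alpha=0.015, beta=10.0, gamma=0.001)
print(snake.shape)                               # (200, 2): refined contour points
```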
Zhang, Wuming. "Towards non-conventional face recognition : shadow removal and heterogeneous scenario." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEC030/document.
In recent years, biometrics have received substantial attention due to the ever-growing need for automatic individual authentication. Among the various physiological biometric traits, the face offers unmatched advantages over others, such as fingerprints and iris, because it is natural, non-intrusive and easily understandable by humans. Nowadays, conventional face recognition techniques have attained quasi-perfect performance in a highly constrained environment wherein poses, illuminations, expressions and other sources of variation are strictly controlled. However, these approaches are always confined to restricted application fields, because non-ideal imaging environments are frequently encountered in practical cases. To adaptively address these challenges, this dissertation focuses on the unconstrained face recognition problem, where face images exhibit more variability in illumination. Moreover, another major question is how to leverage limited 3D shape information to work jointly with 2D-based techniques in a heterogeneous face recognition system. To deal with the problem of varying illumination, we explicitly build the underlying reflectance model which characterizes interactions between the skin surface, the lighting source and the camera sensor, and elaborate the formation of face colour. With this physics-based image formation model, an illumination-robust representation, namely the Chromaticity Invariant Image (CII), is proposed, which can subsequently help reconstruct shadow-free and photo-realistic colour face images. Since this shadow removal process is achieved in colour space, the approach can be combined with existing grey-scale lighting normalization techniques to further improve face recognition performance. The experimental results on two benchmark databases, CMU-PIE and FRGC Ver2.0, demonstrate the generalization ability and robustness of our approach to lighting variations. We further explore the effective and creative use of 3D data in heterogeneous face recognition. In such a scenario, the 3D face is merely available in the gallery set and not in the probe set, as one would encounter in real-world applications. Two Convolutional Neural Networks (CNNs) are constructed for this purpose. The first CNN is trained to extract discriminative features of 2D/3D face images for direct heterogeneous comparison, while the second CNN combines an encoder-decoder structure, namely U-Net, and a Conditional Generative Adversarial Network (CGAN) to reconstruct the depth face image from its 2D counterpart. Specifically, the recovered depth face images can be fed to the first CNN as well for 3D face recognition, leading to a fusion scheme which achieves gains in recognition performance. We have evaluated our approach extensively on the challenging FRGC 2D/3D benchmark database. The proposed method compares favorably to the state of the art and shows significant improvement with the fusion scheme.
Tarando, Sebastian Roberto. "Quantitative follow-up of pulmonary diseases using deep learning models." Thesis, Evry, Institut national des télécommunications, 2018. http://www.theses.fr/2018TELE0008/document.
Infiltrative lung diseases (ILDs) comprise a large group of irreversible lung disorders which require regular follow-up with computed tomography (CT) imaging. A quantitative assessment is mandatory to establish the (regional) disease progression and/or the therapeutic impact. This implies the development of automated computer-aided diagnosis (CAD) tools for pathological lung tissue segmentation, a problem addressed as pixel-based texture classification. Traditionally, such classification relies on a two-dimensional analysis of axial CT images by means of handcrafted features. Recently, the use of deep learning techniques, especially Convolutional Neural Networks (CNNs) for visual tasks, has shown great improvements with respect to handcrafted heuristics-based methods. However, the limitations of "classic" CNN architectures when applied to texture-based datasets have been demonstrated, due to their inherently higher dimension compared to handwritten digits or other object recognition datasets, implying the need to redesign the network or enrich the system to learn meaningful textural features from the input data. This work addresses an automated quantitative assessment of different disorders based on lung texture classification. The proposed approach exploits a cascade of CNNs (specially redesigned for texture categorization) for a hierarchical classification, and a specific preprocessing of the input data based on locally connected filtering (applied to the lung images to attenuate the vessel densities while preserving the high opacities related to pathologies). The classification targeting the whole lung parenchyma achieves an average accuracy of 84% (75.8% for normal, 90% for emphysema and fibrosis, 81.5% for ground glass).
Cohen-Hadria, Alice. "Estimation de descriptions musicales et sonores par apprentissage profond." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS607.
In Music Information Retrieval (MIR) and voice processing, the use of machine learning tools has become more and more standard over the last few years. In particular, many state-of-the-art systems now rely on the use of neural networks. In this thesis, we propose a wide overview of four different MIR and voice processing tasks, using systems built with neural networks. More precisely, we use convolutional neural networks, a class of neural networks designed for images. The first task presented is music structure estimation. For this task, we show how the choice of input representation can be critical when using convolutional neural networks. The second task is singing voice detection. We present how to use a voice detection system to automatically align lyrics and audio tracks. With this alignment mechanism, we have created the largest synchronized audio and lyrics dataset, called DALI. Singing voice separation is the third task. For this task, we present a data augmentation strategy, a way to significantly increase the size of a training set. Finally, we tackle voice anonymization. We present an anonymization method that both obfuscates content and masks the speaker identity, while preserving the acoustic scene.
Nguyen, Thanh Hai. "Some contributions to deep learning for metagenomics." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS102.
Metagenomic data from the human microbiome is a novel source of data for improving diagnosis and prognosis in human diseases. However, making predictions based on individual bacterial abundances is a challenge, since the number of features is much bigger than the number of samples. Hence, we face the difficulties related to high-dimensional data processing, as well as the high complexity of heterogeneous data. Machine learning has obtained great achievements on important metagenomics problems linked to OTU clustering, binning, taxonomic assignment, etc. The contribution of this PhD thesis is multi-fold: 1) a feature selection framework for efficient heterogeneous biomedical signature extraction, and 2) a novel deep learning approach for predicting diseases using artificial image representations. The first contribution is an efficient feature selection approach based on the visualization capabilities of Self-Organizing Maps for heterogeneous data fusion. The framework is efficient on a real and heterogeneous dataset containing metadata, genes of adipose tissue, and gut flora metagenomic data, with a reasonable classification accuracy compared to state-of-the-art methods. The second approach is a method to visualize metagenomic data using a simple fill-up method, as well as various state-of-the-art dimensionality reduction approaches. The new metagenomic data representations can be considered as synthetic images and used as a novel dataset for an efficient deep learning method such as Convolutional Neural Networks. The results show that the proposed methods either achieve state-of-the-art predictive performance, or outperform it on public rich metagenomic benchmarks.
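The fill-up idea can be sketched in a few lines: the abundance vector, in a fixed ordering shared by all samples, is written row by row into the smallest square image and zero-padded, and the resulting 2D array is what a CNN consumes. The ordering and sizes below are illustrative assumptions.

```python
import numpy as np

def fill_up(abundances):
    """Write the species-abundance vector, in a fixed ordering shared by all
    samples (e.g. phylogenetic), row by row into the smallest square image."""
    v = np.asarray(abundances, dtype=float)
    side = int(np.ceil(np.sqrt(v.size)))
    img = np.zeros(side * side)
    img[:v.size] = v                       # remaining cells stay zero-padded
    return img.reshape(side, side)         # this 2D "image" is the CNN input

x = np.random.default_rng(0).random(300)   # 300 bacterial species abundances
print(fill_up(x).shape)                    # (18, 18)
```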
Weinzaepfel, Philippe. "Le mouvement en action : estimation du flot optique et localisation d'actions dans les vidéos." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM013/document.
With the recent overwhelming growth of digital video content, automatic video understanding has become an increasingly important issue. This thesis introduces several contributions on two automatic video understanding tasks: optical flow estimation and human action localization. Optical flow estimation consists in computing the displacement of every pixel in a video and faces several challenges, including large non-rigid displacements, occlusions and motion boundaries. We first introduce an optical flow approach based on a variational model that incorporates a new matching method. The proposed matching algorithm is built upon a hierarchical multi-layer correlational architecture and effectively handles non-rigid deformations and repetitive textures. It improves the flow estimation in the presence of significant appearance changes and large displacements. We also introduce a novel scheme for estimating optical flow based on a sparse-to-dense interpolation of matches while respecting edges. This method leverages an edge-aware geodesic distance tailored to respect motion boundaries and to handle occlusions. Furthermore, we propose a learning-based approach for detecting motion boundaries. Motion boundary patterns are predicted at the patch level using structured random forests. We experimentally show that our approach outperforms the flow-gradient baseline on both synthetic data and real-world videos, including an introduced dataset with consumer videos. Human action localization consists in recognizing the actions that occur in a video, such as 'drinking' or 'phoning', as well as their temporal and spatial extent. We first propose a novel approach based on Deep Convolutional Neural Networks. The method extracts class-specific tubes leveraging recent advances in detection and tracking. Tube description is enhanced by spatio-temporal local features. Temporal detection is performed using a sliding-window scheme inside each tube. Our approach outperforms the state of the art on challenging action localization benchmarks. Second, we introduce a weakly supervised action localization method, i.e., one which does not require bounding-box annotation. Action proposals are computed by extracting tubes around the humans. This is performed using a human detector robust to unusual poses and occlusions, which is learned on a human pose benchmark. A high recall is reached with only a few human tubes, allowing to effectively apply Multiple Instance Learning. Furthermore, we introduce a new dataset for human action localization. It overcomes the limitations of existing benchmarks, such as the diversity and the duration of the videos. Our weakly supervised approach obtains results close to fully supervised ones while significantly reducing the required amount of annotations.
Varol, Gül. "Learning human body and human action representations from visual data." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE029.
The focus of visual content is often people. Automatic analysis of people from visual data is therefore of great importance for numerous applications in content search, autonomous driving, surveillance, health care, and entertainment. The goal of this thesis is to learn visual representations for human understanding. Particular emphasis is given to two closely related areas of computer vision: human body analysis and human action recognition. In summary, our contributions are the following: (i) we generate photo-realistic synthetic data for people that allows training CNNs for human body analysis, (ii) we propose a multi-task architecture to recover a volumetric body shape from a single image, (iii) we study the benefits of long-term temporal convolutions for human action recognition using 3D CNNs, and (iv) we incorporate similarity training in multi-view videos to design view-independent representations for action recognition.
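Contribution (iii) rests on 3D convolutions over long clips; the sketch below is a minimal 3D CNN in that spirit, with made-up layer sizes and class count, not the architecture evaluated in the thesis.

```python
import torch
import torch.nn as nn

class LTC(nn.Module):
    """Minimal 3D-CNN sketch in the spirit of long-term temporal convolutions:
    space-time convolutions applied to long clips (e.g. 60 frames)."""
    def __init__(self, n_classes=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                     # pool space first, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Linear(64, n_classes)

    def forward(self, clip):                             # clip: (B, 3, T, H, W)
        return self.fc(self.features(clip).flatten(1))

clip = torch.randn(2, 3, 60, 58, 58)                     # 60-frame clip
print(LTC()(clip).shape)                                 # torch.Size([2, 12])
```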
Grard, Matthieu. "Generic instance segmentation for object-oriented bin-picking." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEC015.
Referred to as robotic random bin-picking, a fast-expanding industrial task consists in robotizing the unloading of many object instances piled up in bulk, one at a time, for further processing such as kitting or part assembling. However, explicit object models are not always available in many bin-picking applications, especially in the food and automotive industries. Furthermore, object instances are often subject to intra-class variations, for example due to elastic deformations. Object pose estimation techniques, which require an explicit model and assume rigid transformations, are therefore not suitable in such contexts. The alternative approach, which consists in detecting grasps without an explicit notion of object, proves hardly efficient when the object geometry makes bulk instances prone to occlusion and entanglement. These approaches also typically rely on a multi-view scene reconstruction that may be unfeasible due to transparent and shiny textures, or that critically reduces the time frame for image processing in high-throughput robotic applications. In collaboration with Siléane, a French company in industrial robotics, we thus aim at developing a learning-based solution for localizing the most affordable instance of a pile from a single image, in open loop, without explicit object models. In the context of industrial bin-picking, our contribution is two-fold. First, we propose a novel fully convolutional network (FCN) for jointly delineating instances and inferring the spatial layout at their boundaries. Indeed, the state-of-the-art methods for such a task rely on two independent streams for boundaries and occlusions respectively, whereas occlusions often cause boundaries. Specifically, the mainstream approach, which consists in isolating instances in boxes before detecting boundaries and occlusions, fails in bin-picking scenarios, as a rectangular region often includes several instances. By contrast, our box proposal-free architecture recovers fine instance boundaries, augmented with their occluding side, from a unified scene representation. As a result, the proposed network outperforms the two-stream baselines on synthetic data and public real-world datasets. Second, as FCNs require large training datasets that are not available in bin-picking applications, we propose a simulation-based pipeline for generating training images using physics and rendering engines. Specifically, piles of instances are simulated and rendered with their ground-truth annotations from sets of texture images and meshes, to which multiple random deformations are applied. We show that the proposed synthetic data is plausible for real-world applications in the sense that it enables the learning of deep representations transferable to real data. Through extensive experiments on a real-world robotic setup, our synthetically trained network outperforms the industrial baseline while achieving real-time performance. The proposed approach thus establishes a new baseline for model-free object-oriented bin-picking.
Bailly, Adeline. "Classification de séries temporelles avec applications en télédétection." Thesis, Rennes 2, 2018. http://www.theses.fr/2018REN20021/document.
Time Series Classification (TSC) has received an important amount of interest over the past years due to many real-life applications. In this PhD, we create new algorithms for TSC, with a particular emphasis on Remote Sensing (RS) time series data. We first propose the Dense Bag-of-Temporal-SIFT-Words (D-BoTSW) method that uses dense local features based on SIFT features for 1D data. Extensive experiments show that D-BoTSW significantly outperforms nearly all compared standalone baseline classifiers. Then, we propose an enhancement of the Learning Time Series Shapelets (LTS) algorithm called Adversarially-Built Shapelets (ABS), based on the introduction of adversarial time series during the learning process. Adversarial time series provide an additional regularization benefit for the shapelets, and experiments show a performance improvement between the baseline and our proposed framework. Due to the lack of available RS time series datasets, we also present and experiment on two remote sensing time series datasets called TiSeLaC and Brazilian-Amazon.
Moukari, Michel. "Estimation de profondeur à partir d'images monoculaires par apprentissage profond." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC211/document.
Computer vision is a branch of artificial intelligence whose purpose is to enable a machine to analyze, process and understand the content of digital images. Scene understanding in particular is a major issue in computer vision. It goes through a semantic and structural characterization of the image, on the one hand to describe its content and, on the other hand, to understand its geometry. However, while the real space is three-dimensional, the image representing it is two-dimensional. Part of the 3D information is thus lost during the process of image formation, and it is therefore non-trivial to describe the geometry of a scene from 2D images of it. There are several ways to retrieve the depth information lost in the image. In this thesis we are interested in estimating a depth map given a single image of the scene. In this case, the depth information corresponds, for each pixel, to the distance between the camera and the object represented by this pixel. The automatic estimation of a distance map of the scene from an image is indeed a critical algorithmic brick in a very large number of domains, in particular that of autonomous vehicles (obstacle detection, navigation aids). Although the problem of estimating depth from a single image is a difficult and inherently ill-posed problem, we know that humans can appreciate distances with one eye. This capacity is not innate but acquired, and made possible mostly thanks to the identification of cues reflecting prior knowledge of the surrounding objects. Moreover, we know that learning algorithms can extract these cues directly from images. We are particularly interested in statistical learning methods based on deep neural networks, which have recently led to major breakthroughs in many fields, and we study the case of monocular depth estimation.
Martin, Victor. "Computing methods for facial aging prevention and prediction." Thesis, CentraleSupélec, 2019. http://www.theses.fr/2019CSUP0014.
The use of computer simulation to understand how human faces age has been a growing area of research for decades. It has been applied to the search for missing children as well as to the fields of entertainment, cosmetics and dermatology research. Our objective is to elaborate a model of the age-related changes in facial cues which affect the perception of age, so that we may better predict them. In this work, a new framework to make a face age is proposed: the Wrinkle-Oriented Active Appearance Model. First, faces are decomposed in terms of appearance and shape using an Active Appearance Model. In addition, the wrinkles in each face are transformed into appearance and shape parameters. A new, effective way to model the distribution of wrinkle parameters in a face is introduced. Finally, it is shown that artificially aged faces produced by the system influence age perception better than those produced by two other systems. This framework is a first step in the construction of a more accurate facial aging system. In addition, a new health estimation system using a convolutional neural network is introduced. This system is able to estimate how a face is perceived in terms of health by humans. It is shown how this tool reacts in the same way as health perception by humans. Finally, the impact on health perception of specific facial features that have never been studied before is established.
Martineau, Maxime. "Deep learning onto graph space : application to image-based insect recognition." Thesis, Tours, 2019. http://www.theses.fr/2019TOUR4024.
The goal of this thesis is to investigate insect recognition as an image-based pattern recognition problem. Although this problem has been extensively studied over the previous three decades, one element was, to the best of our knowledge, still to be experimented with as of 2017: deep approaches. Therefore, one contribution consists in determining to what extent deep convolutional neural networks (CNNs) can be applied to image-based insect recognition. Graph-based representations and methods have also been tested. Two attempts are presented: the former consists in designing a graph-perceptron classifier; the latter graph-based work in this thesis defines convolution on graphs in order to build graph convolutional neural networks. The last chapter of the thesis deals with applying most of the aforementioned methods to insect image recognition problems. Two datasets are proposed. The first one consists of lab-based images with a constant background. The second one is generated by taking an ImageNet subset and is composed of field-based images. CNNs with transfer learning are the most successful method applied to these datasets.
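As one common way to define convolution on graphs, the sketch below implements a Kipf-and-Welling-style graph convolution layer; it illustrates the general idea rather than the operator defined in the thesis.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph convolution layer: H' = relu(D^-1/2 (A + I) D^-1/2 H W),
    where A is the adjacency matrix and H the node-feature matrix."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, H, A):
        A_hat = A + torch.eye(A.size(0))          # add self-loops
        d = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(d.rsqrt())        # D^-1/2
        return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ self.lin(H))

A = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])  # a 3-node path graph
H = torch.randn(3, 8)                                         # node features
print(GraphConv(8, 4)(H, A).shape)                            # torch.Size([3, 4])
```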
Rapadamnaba, Robert. "Uncertainty analysis, sensitivity analysis, and machine learning in cardiovascular biomechanics." Thesis, Montpellier, 2020. http://www.theses.fr/2020MONTS058.
This thesis follows on from a recent study conducted by researchers from the University of Montpellier, whose aim was to propose to the scientific community an inversion procedure capable of noninvasively estimating patient-specific blood pressure in cerebral arteries. Its first objective is, on the one hand, to examine the accuracy and robustness of the inversion procedure proposed by these researchers with respect to various sources of uncertainty related to the models used, the formulated assumptions and the patient-specific clinical data, and on the other hand, to set a stopping criterion for the ensemble-Kalman-filter-based algorithm used in their inversion procedure. For this purpose, an uncertainty analysis and several sensitivity analyses are carried out. The second objective is to illustrate how machine learning, mainly focusing on convolutional neural networks, can be a very good alternative to the time-consuming and costly inversion procedure implemented by these researchers for cerebral blood pressure estimation. An approach taking into account the uncertainties related to patient-specific medical image processing and to the blood flow model assumptions, such as assumptions about boundary conditions and physical and physiological parameters, is first presented to quantify uncertainties in the inversion procedure outcomes. Uncertainties related to medical image segmentation are modelled using a Gaussian distribution, and uncertainties related to the choice of modeling assumptions are analyzed by considering several possible hypothesis choice scenarios. From this approach, it emerges that the uncertainties on the procedure results are of the same order of magnitude as those related to segmentation errors. Furthermore, this analysis shows that the procedure outcomes are very sensitive to the assumptions made about the model boundary conditions. In particular, the choice of symmetrical Windkessel boundary conditions for the model proves to be the most relevant for the case of the patient under study. Next, an approach for ranking the parameters estimated during the inversion procedure in order of importance, and for setting a stopping criterion for the algorithm used in the inversion procedure, is presented. The results of this strategy show, on the one hand, that most of the model proximal resistances are the most important parameters for blood flow estimation in the internal carotid arteries and, on the other hand, that the inversion algorithm can be stopped as soon as a certain reasonable convergence threshold for the most influential parameter is reached. Finally, a new numerical platform, based on machine learning and allowing the patient-specific blood pressure in the cerebral arteries to be estimated much faster than with the inversion procedure but with the same accuracy, is presented. The application of this platform to the patient-specific data used in the inversion procedure provides a noninvasive and real-time estimate of patient-specific cerebral pressure consistent with the inversion procedure estimation.
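The inversion procedure relies on the ensemble Kalman filter; the toy sketch below shows one stochastic EnKF analysis step for a linear observation operator, with made-up dimensions standing in for the haemodynamic model parameters and measurements.

```python
import numpy as np

def enkf_update(ensemble, H, y, obs_cov, rng):
    """One stochastic ensemble-Kalman-filter analysis step: each ensemble member
    (a vector of model parameters, e.g. proximal resistances) is nudged towards
    the perturbed observations (e.g. measured flow rates)."""
    n, N = ensemble.shape                        # n parameters, N members
    X = ensemble - ensemble.mean(axis=1, keepdims=True)
    Yp = H @ ensemble                            # predicted observations
    Y = Yp - Yp.mean(axis=1, keepdims=True)
    K = (X @ Y.T) @ np.linalg.inv(Y @ Y.T / (N - 1) + obs_cov) / (N - 1)
    perturbed = y[:, None] + rng.multivariate_normal(
        np.zeros(len(y)), obs_cov, size=N).T
    return ensemble + K @ (perturbed - Yp)       # analysis ensemble

rng = np.random.default_rng(0)
ens = rng.normal(1.0, 0.3, size=(5, 50))         # 5 parameters, 50 members
H = rng.random((3, 5))                           # linear observation operator
y = np.array([1.2, 0.8, 1.0])                    # observed data
ens = enkf_update(ens, H, y, 0.05 * np.eye(3), rng)
print(ens.mean(axis=1))                          # posterior parameter estimate
```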
Liu, Chenguang. "Low level feature detection in SAR images." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT015.
In this thesis we develop low-level feature detectors for Synthetic Aperture Radar (SAR) images to facilitate the joint use of SAR and optical data. Line segments and edges are very important low-level features in images, which can be used for many applications like image analysis, image registration and object detection. Contrary to the availability of many efficient low-level feature detectors dedicated to optical images, there are very few efficient line segment detectors and edge detectors for SAR images, mostly because of the strong multiplicative noise. In this thesis we develop a generic line segment detector and an efficient edge detector for SAR images. The proposed line segment detector, named LSDSAR, is based on a Markovian a contrario model and the Helmholtz principle, where line segments are validated according to their meaningfulness. More specifically, a line segment is validated if its expected number of occurrences in a random image under the hypothesis of the Markovian a contrario model is small. Contrary to the usual a contrario approaches, the Markovian a contrario model allows strong filtering in the gradient computation step, since dependencies between local orientations of neighbouring pixels are permitted thanks to the use of a first-order Markov chain. The proposed Markovian a contrario model-based line segment detector LSDSAR benefits from the accuracy and efficiency of the new definition of the background model; indeed, many true line segments in SAR images are detected with a control of the number of false detections. Moreover, very little parameter tuning is required in the practical applications of LSDSAR. The second work of this thesis is a deep-learning-based edge detector for SAR images. The contributions of the proposed edge detector are twofold: 1) under the hypothesis that both optical images and real SAR images can be divided into piecewise constant areas, we propose to simulate a SAR dataset using an optical dataset; 2) we propose to train a classical CNN (convolutional neural network) edge detector, HED, directly on the gradient fields of images. This, by using an adequate method to compute the gradient, enables SAR images at test time to have statistics similar to the training set as inputs to the network. More precisely, the gradient distribution for all homogeneous areas is the same, and the gradient distribution for two homogeneous areas across boundaries depends only on the ratio of their mean intensity values. The proposed method, GRHED, significantly improves the state of the art, especially in very noisy cases such as 1-look images.
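The key trick here is a gradient whose statistics depend only on ratios of local means; the toy sketch below illustrates that principle with a log-ratio of window means on each side of a pixel. It is a simplified stand-in, not the gradient operator actually used in the thesis.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def ratio_gradient(img, half=4, eps=1e-6):
    """Toy ratio-based gradient for SAR: at each pixel, compare the mean intensity
    of a window just before it with one just after it via a log-ratio, so the
    response depends on the ratio of means only (robust to multiplicative speckle)."""
    def side_means(a, axis):
        m = uniform_filter1d(a, size=half, axis=axis)        # centered local means
        shift = half // 2 + 1                                # move window off-center
        return np.roll(m, -shift, axis=axis), np.roll(m, shift, axis=axis)
    r1, l1 = side_means(img, axis=1)
    r0, l0 = side_means(img, axis=0)
    return np.hypot(np.log((r1 + eps) / (l1 + eps)),
                    np.log((r0 + eps) / (l0 + eps)))

rng = np.random.default_rng(0)
scene = np.ones((64, 64)); scene[:, 32:] = 5.0               # two homogeneous areas
speckled = scene * rng.gamma(1.0, 1.0, scene.shape)          # 1-look multiplicative noise
print(ratio_gradient(speckled).mean(axis=0).argmax())        # peaks near column 32
```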
Lemieux, Simon. "Espaces de timbre générés par des réseaux profonds convolutionnels." Thèse, 2011. http://hdl.handle.net/1866/6294.
This thesis presents a novel way of modelling timbre using machine learning algorithms. More precisely, we have attempted to build a timbre space by extracting audio features using deep convolutional Boltzmann machines. We first present an overview of machine learning, with an emphasis on convolutional Boltzmann machines as well as the models from which they are derived. We also present a summary of the literature relevant to timbre spaces and highlight their limitations, such as the small number of timbres used to build them. To address this problem, we have developed a sound generation tool that can generate as many sounds as we wish. At the system's core are plug-ins that are parameterizable and that we can combine to create a virtually infinite range of sounds. We use it to build a massive, randomly generated timbre dataset that is made up of real and synthesized instruments. We then train deep convolutional Boltzmann machines on those timbres in an unsupervised way and use the produced feature space as a timbre space. The timbre space we obtain is better than a similar space built using MFCCs, in the sense that the distance between two timbres in that space is more similar to the one perceived by a human listener. However, we are far from reaching the performance of a human. We finish by proposing possible improvements that could be tried to close this performance gap.
Zhang, Ying. "Sequence to sequence learning and its speech applications." Thèse, 2018. http://hdl.handle.net/1866/21287.
Повний текст джерелаGouiaa, Rafik. "Reconnaissance de postures humaines par fusion de la silhouette et de l'ombre dans l'infrarouge." Thèse, 2017. http://hdl.handle.net/1866/19538.
Human posture recognition (HPR) from video sequences is one of the major active research areas in computer vision. It is one step of the global process of human activity recognition (HAR) for behaviour analysis. Many HPR application systems have been developed, including video surveillance, human-machine interaction, and video retrieval. Generally, applications related to HPR can be achieved using mainly two approaches: single camera or multiple cameras. Despite the interesting performance achieved by multi-camera systems, their complexity and the huge amount of information to be processed greatly limit their widespread use for HPR. The main goal of this thesis is to simplify the multi-camera system by replacing a camera with a light source. In fact, a light source can be seen as a virtual camera which generates a cast-shadow image representing the silhouette of the person blocking the light. Our system consists of a single camera and one or more infrared light sources. Despite some technical difficulties in cast-shadow segmentation and cast-shadow deformation due to walls and furniture, several advantages can be achieved by using our system. Indeed, we can avoid the synchronization and calibration problems of multiple cameras and reduce the cost of the system and the amount of processed data by replacing a camera with a light source. We introduce two different approaches in order to automatically recognize human postures. The first approach directly combines the person's silhouette and cast-shadow information, and uses a 2D silhouette descriptor in order to extract discriminative features useful for HPR. The second approach is inspired by the shape-from-silhouette technique: it reconstructs the visual hull of the posture using a set of cast-shadow silhouettes and extracts informative features through a 3D shape descriptor. Using these approaches, our goal is to prove the utility of combining the person's silhouette and cast-shadow information for recognizing elementary human postures (stand, bend, crouch, fall, ...). The proposed system can be used for video surveillance of uncluttered areas such as a corridor in a seniors' residence (for example, for the detection of falls) or in a company (for security). Its low cost may allow greater use of video surveillance for the benefit of society.
Touati, Redha. "Détection de changement en imagerie satellitaire multimodale." Thèse, 2019. http://hdl.handle.net/1866/22662.
The purpose of this research is the study of temporal change detection between two (or more) multimodal satellite images, i.e., images of two different imaging modalities acquired by two heterogeneous sensors, giving for the same scene two images encoded differently depending on the nature of the sensor used for each acquisition. The two (or multiple) multimodal satellite images are taken and co-registered at two different dates, before and after an event. In this study, we propose new semi-supervised and unsupervised change detection models for multimodal satellite imagery. As a first contribution, we present a new constraint scenario expressed on every pair of pixels existing in the images before and after the change. A second contribution of our work consists in proposing a spatio-temporal textural gradient operator expressed with complementary norms, as well as a new denoising strategy for the difference map produced by this operator. Another contribution consists in constructing an observation field from a pairwise pixel model and proposing a solution in the maximum a posteriori sense. A fourth contribution consists in building a common feature space for the two heterogeneous images. Our fifth contribution lies in modelling the change zones as anomalies and in analyzing the reconstruction errors, for which we propose to learn an unsupervised model from a training set consisting only of no-change zones, so that the model reconstructs the no-change patterns with a low error. In the last contribution, we propose a pairwise pixel learning architecture based on a pseudo-siamese CNN, which takes a pair of data as input instead of a single datum and consists of two parallel, partially uncoupled CNN streams (descriptors) followed by a decision network comprising a fusion layer and a classification layer trained with the cross-entropy criterion. The proposed models prove flexible enough to be used effectively in the case of mono-modal image data.
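A minimal sketch of such a pseudo-siamese pair classifier follows: two uncoupled CNN streams, one per modality, feed a fusion and decision head trained with cross-entropy. Layer sizes and patch sizes are illustrative assumptions, not the architecture from the thesis.

```python
import torch
import torch.nn as nn

def stream():
    # one per-modality descriptor; the two streams do NOT share weights
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten())

class PseudoSiamese(nn.Module):
    """Pseudo-siamese change detector sketch: two uncoupled CNN streams
    (one per imaging modality) feed a fusion + decision head that classifies
    a co-registered patch pair as change / no-change."""
    def __init__(self):
        super().__init__()
        self.f_before, self.f_after = stream(), stream()
        self.decision = nn.Sequential(
            nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))  # trained with cross-entropy

    def forward(self, patch_before, patch_after):
        z = torch.cat([self.f_before(patch_before),
                       self.f_after(patch_after)], dim=1)    # fusion layer
        return self.decision(z)

pair = torch.randn(8, 1, 21, 21), torch.randn(8, 1, 21, 21)  # co-registered patches
print(PseudoSiamese()(*pair).shape)                          # torch.Size([8, 2])
```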