Theses on the topic "Segmentation par apprentissage profond"
Consult the top 50 theses for your research on the topic "Segmentation par apprentissage profond".
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Bertrand, Hadrien. "Optimisation d'hyper-paramètres en apprentissage profond et apprentissage par transfert : applications en imagerie médicale". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT001/document.
In the last few years, deep learning has irrevocably changed the field of computer vision. Faster, giving better results, and requiring a lower degree of expertise to use than traditional computer vision methods, deep learning has become ubiquitous in every imaging application, including medical imaging. At the beginning of this thesis, there was still a strong lack of tools and understanding of how to build efficient neural networks for specific tasks. This thesis therefore first focused on hyper-parameter optimization for deep neural networks, i.e. methods for automatically finding efficient neural networks for specific tasks. The thesis includes a comparison of different methods, a performance improvement of one of these methods, Bayesian optimization, and the proposal of a new hyper-parameter optimization method combining two existing ones: Bayesian optimization and Hyperband. From there, we used these methods for medical imaging applications such as the classification of field-of-view in MRI, and the segmentation of the kidney in 3D ultrasound images across two populations of patients. This last task required the development of a new transfer learning method based on the modification of the source network by adding new geometric and intensity transformation layers. Finally, this thesis loops back to older computer vision methods, and we propose a new segmentation algorithm combining template deformation and deep learning. We show how to use a neural network to predict global and local transformations without requiring the ground truth of these transformations. The method is validated on the task of kidney segmentation in 3D US images.
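The combination of Bayesian optimization with Hyperband lends itself to a compact illustration. The sketch below is only a toy rendition of the general idea, not the thesis' actual algorithm: successive halving evaluates random configurations at increasing budgets, and a Gaussian-process surrogate fitted on the observations proposes the next configuration. The objective function and all parameter ranges are invented for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def toy_objective(lr, budget):
    # Stand-in for "train a network with learning rate `lr` for `budget`
    # epochs and return the validation loss" (noisy, budget-dependent).
    return (np.log10(lr) + 2.5) ** 2 + 1.0 / budget + rng.normal(0.0, 0.01)

# Hyperband-style successive halving: evaluate cheaply, keep the best third,
# then re-evaluate the survivors with a larger budget.
configs = list(10 ** rng.uniform(-5, -1, size=9))
observations = []
for budget in (1, 3, 9):
    scores = [toy_objective(lr, budget) for lr in configs]
    observations += [(np.log10(lr), s) for lr, s in zip(configs, scores)]
    ranked = [lr for _, lr in sorted(zip(scores, configs))]
    configs = ranked[: max(1, len(configs) // 3)]

# Bayesian part: fit a surrogate on everything observed so far and let it
# propose the next configuration instead of sampling purely at random.
X = np.array([[x] for x, _ in observations])
y = np.array([s for _, s in observations])
gp = GaussianProcessRegressor().fit(X, y)
candidates = 10 ** rng.uniform(-5, -1, size=64)
next_lr = candidates[np.argmin(gp.predict(np.log10(candidates).reshape(-1, 1)))]
print(f"best so far: {configs[0]:.2e}, surrogate suggests: {next_lr:.2e}")
```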
Ganaye, Pierre-Antoine. "A priori et apprentissage profond pour la segmentation en imagerie cérébrale". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI100.
Medical imaging is a vast field guided by advances in instrumentation, acquisition techniques and image processing. Advances in these major disciplines all contribute to the improvement of the understanding of both physiological and pathological phenomena. In parallel, access to broader imaging databases, combined with the development of computing power, has fostered the development of machine learning methodologies for automatic image processing, including approaches based on deep neural networks. Among the applications where deep neural networks provide solutions, we find image segmentation, which consists in locating and delimiting in an image regions with specific properties that will be associated with the same structure. Despite many recent studies in deep learning based segmentation, learning the parameters of a neural network is still guided by quantitative performance measures that do not include high-level knowledge of anatomy. The objective of this thesis is to develop methods to integrate a priori knowledge into deep neural networks, targeting the segmentation of brain structures in MRI imaging. Our first contribution proposes a strategy for integrating the spatial position of the patch to be classified, to improve the discriminating power of the segmentation model. This first work considerably corrects segmentation errors that are far away from the anatomical reality, also improving the overall quality of the results. Our second contribution focuses on a methodology to constrain adjacency relationships between anatomical structures, directly while learning network parameters, in order to reinforce the realism of the produced segmentations. Our experiments conclude that the proposed constraint corrects non-admitted adjacencies, thus improving the anatomical consistency of the segmentations produced by the neural network.
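One plausible way to express such an adjacency constraint as a differentiable penalty is sketched below. This is a reading of the general idea, not the thesis' exact formulation: soft class probabilities of forbidden class pairs should not co-occur across neighbouring pixels, and the resulting penalty is added to the usual segmentation loss. The class count and the forbidden-pair matrix are invented for the example.

```python
import torch

def adjacency_penalty(probs, forbidden):
    """probs: (B, C, H, W) softmax output; forbidden: (C, C) binary matrix
    with 1 where two classes must never be adjacent in a segmentation."""
    # Soft co-occurrence of class pairs across horizontal / vertical pixel
    # edges (symmetric pairs are counted twice, which is harmless here).
    h = torch.einsum("bchw,bdhw->cd", probs[..., :, :-1], probs[..., :, 1:])
    v = torch.einsum("bchw,bdhw->cd", probs[..., :-1, :], probs[..., 1:, :])
    co = h + h.t() + v + v.t()
    return (co * forbidden).sum() / probs[:, 0].numel()

probs = torch.softmax(torch.randn(2, 4, 64, 64), dim=1)
forbidden = torch.zeros(4, 4)
forbidden[1, 3] = forbidden[3, 1] = 1.0     # classes 1 and 3 must never touch
loss = adjacency_penalty(probs, forbidden)  # add to the usual training loss
print(loss)
```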
Zheng, Qiao. "Apprentissage profond pour la segmentation robuste et l’analyse explicable des images cardiaques volumiques et dynamiques". Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4013.
Cardiac MRI is widely used by cardiologists as it allows extracting rich information from images. However, if done manually, the information extraction process is tedious and time-consuming. Given the advances in artificial intelligence, I develop deep learning methods to address the automation of several essential tasks in cardiac MRI analysis. First, I propose a method based on convolutional neural networks to perform cardiac segmentation on short-axis MRI image stacks. In this method, since the prediction of the segmentation of a slice is dependent upon the already existing segmentation of an adjacent slice, 3D consistency and robustness are explicitly enforced. Second, I develop a method to classify cardiac pathologies, with a novel deep learning approach to extract image-derived features that characterize the shape and motion of the heart. In particular, the classification model is explainable, simple and flexible. Last but not least, the same feature extraction method is applied to an exceptionally large dataset (UK Biobank). Unsupervised cluster analysis is then performed on the extracted features in search of their further relation with cardiac pathology characterization. To conclude, I discuss several possible extensions of my research.
Mlynarski, Pawel. "Apprentissage profond pour la segmentation des tumeurs cérébrales et des organes à risque en radiothérapie". Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4084.
Medical images play an important role in cancer diagnosis and treatment. Oncologists analyze images to determine the different characteristics of the cancer, to plan the therapy and to observe the evolution of the disease. The objective of this thesis is to propose efficient methods for automatic segmentation of brain tumors and organs at risk in the context of radiotherapy planning, using Magnetic Resonance (MR) images. First, we focus on segmentation of brain tumors using Convolutional Neural Networks (CNN) trained on MRIs manually segmented by experts. We propose a segmentation model having a large 3D receptive field while being efficient in terms of computational complexity, based on a combination of 2D and 3D CNNs. We also address problems related to the joint use of several MRI sequences (T1, T2, FLAIR). Second, we introduce a segmentation model which is trained using weakly-annotated images in addition to fully-annotated images (with voxelwise labels), which are usually available in very limited quantities due to their cost. We show that this mixed level of supervision considerably improves the segmentation accuracy when the number of fully-annotated images is limited. Finally, we propose a methodology for an anatomy-consistent segmentation of organs at risk in the context of radiotherapy of brain tumors. The segmentations produced by our system on a set of MRIs acquired in the Centre Antoine Lacassagne (Nice, France) are evaluated by an experienced radiotherapist.
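Mixed supervision of this kind is often implemented as a sum of two losses. The sketch below is one plausible formulation (not the thesis' exact one): fully-annotated images contribute a voxelwise cross-entropy, while weakly-annotated images only constrain image-level class presence through the spatial maximum of each class map. The loss weighting and tensor shapes are arbitrary choices for the example.

```python
import torch
import torch.nn.functional as F

def mixed_supervision_loss(logits_full, target_full, logits_weak, present_weak,
                           weak_weight=0.5):
    # Fully-annotated images: standard pixel/voxel-wise cross-entropy.
    full = F.cross_entropy(logits_full, target_full)
    # Weakly-annotated images: only image-level class presence is known, so we
    # tie the spatial maximum of each class map to that presence label.
    class_max = torch.sigmoid(logits_weak).amax(dim=(2, 3))   # (B, C)
    weak = F.binary_cross_entropy(class_max, present_weak)
    return full + weak_weight * weak

logits_full = torch.randn(2, 3, 32, 32)
target_full = torch.randint(0, 3, (2, 32, 32))
logits_weak = torch.randn(2, 3, 32, 32)
present_weak = torch.tensor([[1., 0., 1.], [0., 1., 1.]])  # which classes appear
print(mixed_supervision_loss(logits_full, target_full, logits_weak, present_weak))
```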
Zotti, Clément. "Réseaux de neurones à convolutions pour la segmentation multi structures d'images par résonance magnétique cardiaque". Mémoire, Université de Sherbrooke, 2018. http://hdl.handle.net/11143/11817.
Luc, Pauline. "Apprentissage autosupervisé de modèles prédictifs de segmentation à partir de vidéos". Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM024/document.
Predictive models of the environment hold promise for allowing the transfer of recent reinforcement learning successes to many real-world contexts, by decreasing the number of interactions needed with the real world. Video prediction has been studied in recent years as a particular case of such predictive models, with broad applications in robotics and navigation systems. While RGB frames are easy to acquire and hold a lot of information, they are extremely challenging to predict, and cannot be directly interpreted by downstream applications. Here we introduce the novel tasks of predicting semantic and instance segmentation of future frames. The abstract feature spaces we consider are better suited for recursive prediction and allow us to develop models which convincingly predict segmentations up to half a second into the future. Predictions are more easily interpretable by downstream algorithms and remain rich, spatially detailed and easy to obtain, relying on state-of-the-art segmentation methods. We first focus on the task of semantic segmentation, for which we propose a discriminative approach based on adversarial training. Then, we introduce the novel task of predicting future semantic segmentation, and develop an autoregressive convolutional neural network to address it. Finally, we extend our method to the more challenging problem of predicting future instance segmentation, which additionally segments out individual objects. To deal with a varying number of output labels per image, we develop a predictive model in the space of high-level convolutional image features of the Mask R-CNN instance segmentation model. We are able to produce visually pleasing segmentations at a high resolution for complex scenes involving a large number of instances, and with convincing accuracy up to half a second ahead.
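The autoregressive pattern described above, feeding each prediction back as input to predict the next step, can be shown in a few lines. The single-step predictor below is a toy convolution standing in for the thesis' much richer multi-scale CNN; only the recursion is illustrated, and all shapes are arbitrary.

```python
import torch
import torch.nn as nn

# Toy single-step predictor: maps the two most recent segmentation feature
# maps (8 channels each) to the next one.
step = nn.Conv2d(2 * 8, 8, kernel_size=3, padding=1)

def predict_ahead(frames, n_steps):
    past = list(frames)
    for _ in range(n_steps):
        past.append(step(torch.cat(past[-2:], dim=1)))  # feed predictions back
    return past[len(frames):]

frames = [torch.randn(1, 8, 16, 16) for _ in range(2)]
future = predict_ahead(frames, n_steps=3)
print(len(future), future[0].shape)   # 3 predicted steps, each (1, 8, 16, 16)
```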
Guerry, Joris. "Reconnaissance visuelle robuste par réseaux de neurones dans des scénarios d'exploration robotique. Détecte-moi si tu peux !" Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLX080/document.
The main objective of this thesis is visual recognition for a mobile robot in difficult conditions. We are particularly interested in neural networks, which today show the best performance in computer vision. We studied the concept of method selection for the classification of 2D images by using a neural network selector to choose the best available classifier given the observed situation. This strategy works when data can be easily partitioned with respect to available classifiers, which is the case when complementary modalities are used. We have therefore used RGB-D data (2.5D), in particular applied to people detection. We propose a combination of independent neural network detectors specific to each modality (color & depth map) based on the same architecture (Faster RCNN). We share intermediate results of the detectors to allow them to complement each other and improve overall performance in difficult situations (luminosity loss or acquisition noise of the depth map). We establish new state-of-the-art scores in the field and propose a more complex and richer dataset to the community (ONERA.ROOM). Finally, we made use of the 3D information contained in the RGB-D images through a multi-view method. We have defined a strategy for generating 2D virtual views that are consistent with the 3D structure. For a semantic segmentation task, this approach artificially increases the training data for each RGB-D image and accumulates different predictions at test time. We obtain new reference results on the SUNRGBD and NYUDv2 datasets. All these works allowed us to handle in an original way 2D, 2.5D and 3D robotic data with neural networks. Whether for classification, detection or semantic segmentation, we not only validated our approaches on difficult datasets, but also brought the state of the art to a new level of performance.
Fourure, Damien. "Réseaux de neurones convolutifs pour la segmentation sémantique et l'apprentissage d'invariants de couleur". Thesis, Lyon, 2017. http://www.theses.fr/2017LYSES056/document.
Computer vision is an interdisciplinary field that investigates how computers can gain a high level of understanding from digital images or videos. In artificial intelligence, and more precisely in machine learning, the field in which this thesis is positioned, computer vision involves extracting characteristics from images and then generalizing concepts related to these characteristics. This field of research has become very popular in recent years, particularly thanks to the results of the convolutional neural networks that form the basis of so-called deep learning methods. Today, neural networks make it possible, among other things, to recognize the different objects present in an image, to generate very realistic images, or even to beat the champions at the game of Go. Their performance is not limited to the image domain, since they are also used in other fields such as natural language processing (e.g. machine translation) or sound recognition. In this thesis, we study convolutional neural networks in order to develop specialized architectures and loss functions for low-level tasks (color constancy) as well as high-level tasks (semantic segmentation). Color constancy is the ability of the human visual system to perceive constant colors for a surface despite changes in the spectrum of illumination (lighting change). In computer vision, the main approach consists in estimating the color of the illuminant and then suppressing its impact on the perceived color of objects. We approach the task of color constancy with the use of neural networks by developing a new architecture composed of a subsampling operator inspired by traditional methods. Our experiments show that our method achieves performance competitive with the state of the art. Nevertheless, our architecture requires a large amount of training data. In order to partially correct this problem and improve the training of neural networks, we present several techniques for artificial data augmentation. We also make two contributions on a high-level task: semantic segmentation. This task, which consists of assigning a semantic class to each pixel of an image, is a challenge in computer vision because of its complexity. On the one hand, it requires many training examples that are costly to obtain. On the other hand, it requires the adaptation of traditional convolutional neural networks in order to obtain a so-called dense prediction, i.e., a prediction for each pixel present in the input image. To address the difficulty of acquiring training data, we propose an approach that uses several databases annotated with different labels at the same time. To do this, we define a selective loss function that allows the training of a convolutional neural network from data coming from multiple databases. We also developed a self-context approach that captures the correlations between labels in different databases. Finally, we present our third contribution: a new convolutional neural network architecture called GridNet, specialized for semantic segmentation. Unlike traditional networks, implemented with a single path from the input (image) to the output (prediction), our architecture is implemented as a 2D grid allowing several interconnected streams to operate at different resolutions. In order to exploit all the paths of the grid, we propose a technique inspired by dropout. In addition, we empirically demonstrate that our architecture generalizes many well-known state-of-the-art networks. We conclude with an analysis of the empirical results obtained with our architecture which, although trained from scratch, reveals very good performance, exceeding popular approaches that are often pre-trained.
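The selective loss function described above can be pictured as a cross-entropy restricted to the label set of the dataset each sample comes from. The sketch below is an interpretation of that idea, not the thesis' exact implementation; the label counts are arbitrary.

```python
import torch
import torch.nn.functional as F

def selective_ce(logits, target, label_subset):
    """logits: (C_total, H, W) over the union of all datasets' label sets;
    target: (H, W) with indices *within* label_subset;
    label_subset: global ids of the labels of this sample's dataset."""
    sub = logits[label_subset]                   # softmax restricted to subset
    return F.cross_entropy(sub.unsqueeze(0), target.unsqueeze(0))

logits = torch.randn(19 + 11, 32, 32)   # e.g. union of two datasets' labels
subset_a = torch.arange(0, 19)          # this sample comes from dataset A
target_a = torch.randint(0, 19, (32, 32))
print(selective_ce(logits, target_a, subset_a))
```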
Borne, Léonie. "Conception d’un algorithme de vision par ordinateur « top-down » dédié à la reconnaissance des sillons corticaux". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS322/document.
We are seven billion humans with unique cortical folding patterns. The cortical folding process occurs during the last trimester of pregnancy, during the emergence of cortical architecture. The folding patterns are impacted by architectural features specific to each individual. Hence, they could reveal signatures of abnormal developments that can lead to psychiatric syndromes. For the last 25 years, the image analysis lab of Neurospin has been designing dedicated computer vision tools to tackle the search for such signatures. The resulting tools are distributed to the community (http://brainvisa.info). This thesis has resulted in the emergence of a new generation of tools based on machine learning techniques. The first proposed tool automatically classifies local patterns of cortical folds, a problem that had never been addressed before. The second tool aims at the automatic labeling of cortical sulci by modeling the top-down recognition mechanisms necessary to overcome the weaknesses of current bottom-up systems. Thus, in addition to having higher recognition rates and shorter execution times, the proposed new model is robust to sub-segmentation errors, which is one of the greatest weaknesses of the old system. To realize these two tools, several machine learning algorithms were implemented and compared. These algorithms are inspired on the one hand by multi-atlas methods, in particular the patch approach, which are widely used for the anatomical segmentation of medical images, and on the other hand by the deep learning methods that are revolutionizing the world of computer vision. The work of this thesis confirms the remarkable effectiveness of deep learning techniques in adapting to complex problems. However, the performance obtained with these techniques is generally equivalent to that of patch approaches, or even worse if the training database is limited. What makes deep learning a particularly interesting tool in practice is its fast execution, especially for the analysis of the huge databases now available.
Leclerc, Sarah Marie-Solveig. "Automatisation de la segmentation sémantique de structures cardiaques en imagerie ultrasonore par apprentissage supervisé". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI121.
The analysis of medical images plays a critical role in cardiology. Ultrasound imaging, as a real-time, low-cost and bedside-applicable modality, is nowadays the most commonly used imaging modality to monitor patient status and perform clinical cardiac diagnosis. However, the semantic segmentation (i.e. the accurate delineation and identification) of heart structures is a difficult task due to the low quality of ultrasound images, characterized in particular by the lack of clear boundaries. To compensate for missing information, the best-performing methods before this thesis relied on the integration of prior information on cardiac shape or motion, which in turn reduced the adaptability of the corresponding methods. Furthermore, such approaches require manual identification of key points to be adapted to a given image, which makes the full process difficult to reproduce. In this thesis, we propose several original fully-automatic algorithms for the semantic segmentation of echocardiographic images based on supervised learning approaches, where the resolution of the problem is automatically set up using data previously analyzed by trained cardiologists. From the design of a dedicated dataset and evaluation platform, we prove in this project the clinical applicability of fully-automatic supervised learning methods, in particular deep learning methods, as well as the possibility to improve the robustness by incorporating in the full process the prior automatic detection of regions of interest.
Yan, Yongzhe. "Deep Face Analysis for Aesthetic Augmented Reality Applications". Thesis, Université Clermont Auvergne (2017-2020), 2020. http://www.theses.fr/2020CLFAC011.
Precise and robust facial component detection is of great importance for a good user experience in aesthetic augmented reality applications such as virtual make-up and virtual hair dyeing. In this context, this thesis addresses the problem of facial component detection via facial landmark detection and face parsing. The scope of this thesis is limited to deep learning-based models. The first part of this thesis addresses the problem of facial landmark detection. In this direction, we propose three contributions. For the first contribution, we aim at improving the precision of the detection. To improve the precision to pixel level, we propose a coarse-to-fine framework which leverages the detail information in the low-level feature maps. We train different stages with different loss functions, among which we propose a boundary-aware loss that forces the predicted landmarks to stay on the boundary. For the second contribution in facial landmark detection, we improve the robustness of facial landmark detection. We propose a 2D Wasserstein loss to integrate additional geometric information during training. Moreover, we propose several modifications to the conventional evaluation metrics for model robustness. To provide a new perspective for facial landmark detection, we present a third contribution exploring a novel tool to illustrate the relationship between facial landmarks. We study the Canonical Correlation Analysis (CCA) of the landmark coordinates. Two applications are introduced based on this tool: (1) the interpretation of different facial landmark detection models and (2) a novel weakly-supervised learning method that allows to considerably reduce the manual effort for dense landmark annotation. The second part of this thesis tackles the problem of face parsing. We present two contributions in this part. For the first contribution, we present a framework for hair segmentation with a shape prior to enhance the robustness against cluttered backgrounds. Additionally, we propose a spatial attention module attached to this framework to improve the output at the hair boundary. For the second contribution in this part, we present a fast face parsing framework for mobile phones, which leverages temporal consistency to yield a more robust output mask. The implementation of this framework runs in real time on an iPhone X.
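A Wasserstein-type loss compares predicted and target landmark heatmaps as distributions, so the loss grows smoothly with geometric displacement instead of saturating like a pixel-wise loss. A cheap stand-in for this behavior (not the thesis' actual 2D Wasserstein loss) is the closed-form 1D Wasserstein-1 distance applied to the two axis marginals of each heatmap:

```python
import torch

def marginal_wasserstein(pred, target):
    """pred, target: (H, W) non-negative heatmaps, each summing to 1."""
    loss = pred.new_zeros(())
    for dim in (0, 1):
        p, q = pred.sum(dim=dim), target.sum(dim=dim)   # axis marginals
        # 1D Wasserstein-1 distance = L1 distance between the two CDFs.
        loss = loss + (torch.cumsum(p, 0) - torch.cumsum(q, 0)).abs().sum()
    return loss

def gaussian_map(h, w, cy, cx, sigma=2.0):
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    g = torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return g / g.sum()

a, b = gaussian_map(64, 64, 20, 20), gaussian_map(64, 64, 24, 30)
print(marginal_wasserstein(a, b))   # grows smoothly with landmark distance
```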
Roynard, Xavier. "Sémantisation à la volée de nuages de points 3D acquis par systèmes embarqués". Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEM078.
This thesis is at the confluence of two worlds in rapid growth: autonomous cars and artificial intelligence (especially deep learning). As the first takes advantage of the second, autonomous vehicles increasingly use deep learning methods to analyze the data produced by their various sensors (including LiDARs) and to make decisions. While deep learning methods have revolutionized image analysis (in classification and segmentation, for example), they do not produce such spectacular results on 3D point clouds. This is particularly true because datasets of annotated 3D point clouds are rare and of moderate quality. This thesis therefore presents a new dataset, built by mobile acquisition to produce enough data, and annotated by hand to ensure good segmentation quality. In addition, such datasets are inherently unbalanced in the number of samples per class and contain many redundant samples, so a sampling method adapted to these datasets is proposed. Another problem encountered when trying to classify a point from its neighbourhood represented as a voxel grid is the compromise between a fine discretization step (to accurately describe the surface adjacent to the point) and a large grid (to look for context a little further away). We therefore also propose network methods that take advantage of multi-scale neighbourhoods. These methods achieve state-of-the-art point classification performance on public benchmarks. Finally, to respect the constraints imposed by embedded systems (real-time processing and low computing power), we present a method that allows convolutional layers to be applied only where there is information to be processed.
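The class-imbalance problem mentioned above is commonly addressed with per-class balanced sampling. The sketch below illustrates that generic idea with invented class counts; it is not the thesis' specific sampling scheme.

```python
import numpy as np

def balanced_indices(labels, n_per_class, rng=None):
    rng = rng or np.random.default_rng(0)
    picks = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Oversample rare classes (with replacement) so every class
        # contributes the same number of training points.
        picks.append(rng.choice(idx, size=n_per_class,
                                replace=len(idx) < n_per_class))
    return np.concatenate(picks)

labels = np.repeat([0, 1, 2], [100000, 5000, 50])   # toy imbalanced classes
batch = balanced_indices(labels, n_per_class=64)
print(np.bincount(labels[batch]))                    # -> [64 64 64]
```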
Salehi, Achkan. "Localisation précise d'un véhicule par couplage vision/capteurs embarqués/systèmes d'informations géographiques". Thesis, Université Clermont Auvergne (2017-2020), 2018. http://www.theses.fr/2018CLFAC064/document.
The fusion between sensors and databases whose errors are independent is the most reliable and therefore most widespread solution to the localization problem. Current autonomous and semi-autonomous vehicles, as well as augmented reality applications targeting industrial contexts, exploit large sensor and database graphs that are difficult and expensive to synchronize and calibrate. Thus, the democratization of these technologies requires exploring the possibility of exploiting low-cost and easily accessible sensors and databases. These information sources are naturally tainted by higher uncertainty levels, and many obstacles to their effective and efficient practical usage persist. Moreover, the recent but dazzling successes of deep neural networks in various tasks seem to indicate that they could be a viable and low-cost alternative to some components of current SLAM systems. In this thesis, we focused on large-scale localization of a vehicle in a georeferenced coordinate frame from a low-cost system, which is based on the fusion between a monocular video stream, 3D non-textured but georeferenced building models, terrain elevation models and data either from a low-cost GPS or from vehicle odometry. Our work targets the resolution of two problems. The first one is related to the fusion, via barrier term optimization, of VSLAM and positioning measurements provided by a low-cost GPS. This method is, to the best of our knowledge, the most robust against GPS uncertainties, but it is more demanding in terms of computational resources. We propose an algorithmic optimization of that approach based on the definition of a novel barrier term. The second problem is the data association problem between the primitives that represent the geometry of the scene (e.g. 3D points) and the 3D building models. Previous works in that area use simple geometric criteria and are therefore very sensitive to occlusions in urban environments. We exploit deep convolutional neural networks in order to identify and associate elements from the map that correspond to 3D building model façades. Although our contributions are for the most part independent from the underlying SLAM system, we based our experiments on constrained key-frame based bundle adjustment. The solutions that we propose are evaluated on synthetic sequences as well as on real urban datasets. These experiments show important performance gains for VSLAM/GPS fusion, and considerable improvements in the robustness of building constraints to occlusions.
Kobold, Jonathan. "Deep Learning for lesion and thrombus segmentation from cerebral MRI". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLE044.
Deep learning is the world's best set of methods for identifying objects in images. Stroke is a deadly disease whose treatment requires identifying objects on medical imaging. This sounds like an obvious combination, yet it is not trivial to marry the two. Segmenting the lesion from stroke MRI has had some attention in the literature, but thrombus segmentation is still uncharted territory. This work shows that contemporary convolutional neural network architectures cannot reliably identify the thrombus on stroke MRI. It also demonstrates why these models do not work on this problem. With this knowledge, a recurrent neural network architecture, the logic LSTM, is developed that takes into account the way medical doctors identify the thrombus. Not only does this architecture provide the first reliable thrombus identification, it also provides new insights for neural network design. In particular, the methods for increasing the receptive field are enriched with a new parameter-free option. Last but not least, the logic LSTM also improves the results of lesion segmentation, providing a lesion segmentation with human-level performance.
Grard, Matthieu. "Generic instance segmentation for object-oriented bin-picking". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEC015.
Referred to as robotic random bin-picking, a fast-expanding industrial task consists in robotizing the unloading of many object instances piled up in bulk, one at a time, for further processing such as kitting or part assembling. However, explicit object models are not always available in many bin-picking applications, especially in the food and automotive industries. Furthermore, object instances are often subject to intra-class variations, for example due to elastic deformations. Object pose estimation techniques, which require an explicit model and assume rigid transformations, are therefore not suitable in such contexts. The alternative approach, which consists in detecting grasps without an explicit notion of object, proves hardly efficient when the object geometry makes bulk instances prone to occlusion and entanglement. These approaches also typically rely on a multi-view scene reconstruction that may be unfeasible due to transparent and shiny textures, or that critically reduces the time frame for image processing in high-throughput robotic applications. In collaboration with Siléane, a French company in industrial robotics, we thus aim at developing a learning-based solution for localizing the most affordable instance of a pile from a single image, in open loop, without explicit object models. In the context of industrial bin-picking, our contribution is two-fold. First, we propose a novel fully convolutional network (FCN) for jointly delineating instances and inferring the spatial layout at their boundaries. Indeed, the state-of-the-art methods for such a task rely on two independent streams for boundaries and occlusions respectively, whereas occlusions often cause boundaries. Specifically, the mainstream approach, which consists in isolating instances in boxes before detecting boundaries and occlusions, fails in bin-picking scenarios as a rectangular region often includes several instances. By contrast, our box proposal-free architecture recovers fine instance boundaries, augmented with their occluding side, from a unified scene representation. As a result, the proposed network outperforms the two-stream baselines on synthetic data and public real-world datasets. Second, as FCNs require large training datasets that are not available in bin-picking applications, we propose a simulation-based pipeline for generating training images using physics and rendering engines. Specifically, piles of instances are simulated and rendered with their ground-truth annotations from sets of texture images and meshes to which multiple random deformations are applied. We show that the proposed synthetic data is plausible for real-world applications in the sense that it enables the learning of deep representations transferable to real data. Through extensive experiments on a real-world robotic setup, our synthetically trained network outperforms the industrial baseline while achieving real-time performance. The proposed approach thus establishes a new baseline for model-free object-oriented bin-picking.
Daudé, Pierre. "Quantification du tissu adipeux épicardique à haut champ par IRM-Dixon, pour le phénotypage de la cardiomyopathie diabétique". Electronic Thesis or Diss., Aix-Marseille, 2022. http://www.theses.fr/2022AIXM0333.
Improving the management of cardiac complications in metabolic diseases, obesity and diabetes, is a major challenge for our society. The measurement of epicardial adipose tissue (EAT), a fat depot attached to the heart, is an emerging and promising diagnostic marker to identify patients at risk. We automated this measurement on routine MRI images using deep learning. Then, an innovative MRI technique was proposed to measure and characterize the EAT in 3D, combining: a free-breathing acquisition, an image reconstruction robust to cardio-respiratory motion and MRI imperfections, an optimized and validated fat characterization algorithm, and knowledge of the composition of ex-vivo EAT samples. Together, this allows for in vivo, non-invasive characterization of EAT, a novel diagnostic for cardiometabolic risk.
Duran, Audrey. "Intelligence artificielle pour la caractérisation du cancer de la prostate par agressivité en IRM multiparamétrique". Thesis, Lyon, 2022. http://theses.insa-lyon.fr/publication/2022LYSEI008/these.pdf.
Prostate cancer (PCa) is the most frequently diagnosed cancer in men in more than half the countries in the world and the fifth leading cause of cancer death among men in 2020. Diagnosis of PCa includes the acquisition of multiparametric magnetic resonance imaging (mp-MRI) - which combines T2-weighted (T2-w), diffusion-weighted imaging (DWI) and dynamic contrast-enhanced (DCE) sequences - prior to any biopsy. The joint analysis of these multimodal images is time-consuming and challenging, especially when individual MR sequences yield conflicting findings. In addition, the sensitivity of MRI is low for less aggressive cancers and inter-reader reproducibility remains moderate at best. Moreover, visual analysis does not currently allow determining the cancer aggressiveness, characterized by the Gleason score (GS). This is why computer-aided diagnosis (CAD) systems based on statistical learning models have been proposed in recent years to assist radiologists in their diagnostic task, but the vast majority of these models focus on the binary detection of clinically significant (CS) lesions. The objective of this thesis is to develop a CAD system to detect and segment PCa on mp-MRI images but also to characterize its aggressiveness by predicting the associated GS. In the first part, we present a supervised CAD system to segment PCa by aggressiveness from T2-w and ADC maps. This end-to-end multi-class neural network jointly segments the prostate gland and cancer lesions with GS group grading. The model was trained and validated with a 5-fold cross-validation on a heterogeneous series of 219 MRI exams acquired on three different scanners prior to prostatectomy. Regarding the automatic GS group grading, Cohen's quadratic weighted kappa coefficient (κ) is 0.418 ± 0.138, which is the best reported lesion-wise kappa for GS segmentation to our knowledge. The model also shows encouraging generalization capacities on the PROSTATEx-2 public dataset. In the second part, we focus on a weakly supervised model that allows the inclusion of partly annotated data, where the lesions are identified by points only, for considerable time savings and the inclusion of biopsy-based databases. Regarding the automatic GS group grading on our private dataset, we show that we can approach the performance achieved with the baseline fully supervised model while considering only 6% of annotated voxels for training. In the last part, we study the contribution of DCE MRI, a sequence often omitted as input to deep models, for the detection and characterization of PCa. We evaluate several ways to encode the perfusion information from DCE MRI in a U-Net-like architecture. Parametric maps derived from DCE MR exams are shown to positively impact segmentation and grading performance of PCa lesions.
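The κ figure above is Cohen's quadratic weighted kappa, which penalizes disagreements between predicted and reference GS groups quadratically with their distance. For readers unfamiliar with the metric, scikit-learn computes it directly; the labels below are invented for illustration.

```python
from sklearn.metrics import cohen_kappa_score

expert = [0, 1, 2, 2, 3, 1, 0, 2]   # hypothetical lesion-wise GS group labels
model  = [0, 1, 2, 3, 3, 1, 1, 2]   # hypothetical model predictions
print(cohen_kappa_score(expert, model, weights="quadratic"))
```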
Corbat, Lisa. "Fusion de segmentations complémentaires d'images médicales par Intelligence Artificielle et autres méthodes de gestion de conflits". Thesis, Bourgogne Franche-Comté, 2020. http://www.theses.fr/2020UBFCD029.
Nephroblastoma is the most common kidney tumour in children and its diagnosis is based exclusively on imaging. This work, which is the subject of our research, is part of a larger project: the European project SAIAD (Automated Segmentation of Medical Images Using Distributed Artificial Intelligence). The aim of the project is to design a platform capable of performing different automatic segmentations from source images using Artificial Intelligence (AI) methods, and thus obtain a faithful three-dimensional reconstruction. In this sense, work carried out in a previous thesis of the research team led to the creation of a segmentation platform. It allows the segmentation of several structures individually, by methods such as Deep Learning, and more particularly Convolutional Neural Networks (CNNs), as well as Case-Based Reasoning (CBR). However, it is then necessary to automatically fuse the segmentations of these different structures in order to obtain a complete, relevant segmentation. When aggregating these structures, contradictory pixels may appear. These conflicts can be resolved by various methods, based or not on AI, and are the subject of our research. First, we propose a fusion approach not based on AI, using a combination of six different methods based on different imaging and segmentation criteria. In parallel, two other fusion methods are proposed: one uses a CNN coupled with CBR, the other a CNN trained with a specific existing segmentation learning method. These different approaches were tested on a set of 14 nephroblastoma patients and demonstrated their effectiveness in resolving conflicting pixels and their ability to improve the resulting segmentations.
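To make the conflict-resolution problem concrete, here is a minimal sketch of one simple non-AI fusion baseline: resolving pixels on which the per-structure segmentations disagree by majority vote across methods. The thesis combines six different criteria; only the voting idea is shown, on invented label maps.

```python
import numpy as np

def majority_fusion(segmentations):
    """segmentations: list of (H, W) integer label maps from M methods."""
    stack = np.stack(segmentations)
    flat = stack.reshape(stack.shape[0], -1)
    # For each pixel, pick the label predicted by the most methods.
    fused = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, flat)
    return fused.reshape(stack.shape[1:])

a = np.zeros((4, 4), int); b = np.ones((4, 4), int); c = np.zeros((4, 4), int)
print(majority_fusion([a, b, c]))   # conflicts resolved in favour of label 0
```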
Ben, Naceur Mostefa. "Deep Neural Networks for the segmentation and classification in Medical Imaging". Thesis, Paris Est, 2020. http://www.theses.fr/2020PESC2014.
Nowadays, getting an efficient segmentation of Glioblastoma Multiforme (GBM) brain tumors in multi-sequence MRI images as soon as possible enables early clinical diagnosis, treatment, and follow-up. The MRI technique is designed specifically to provide radiologists with powerful visualization tools to analyze medical images, but the challenge lies more in the interpretation of radiological images together with clinical and pathology data and their causes in GBM tumors. This is why quantitative research in neuroimaging often requires anatomical segmentation of the human brain from MRI images for the detection and segmentation of brain tumors. The objective of the thesis is to propose automatic Deep Learning methods for brain tumor segmentation using MRI images. First, we are mainly interested in the segmentation of MRI images of patients with GBM brain tumors using Deep Learning methods, in particular Deep Convolutional Neural Networks (DCNN). We propose two end-to-end DCNN-based approaches for fully automatic brain tumor segmentation. The first approach is based on the pixel-wise technique while the second one is based on the patch-wise technique. We then show that the latter is more efficient in terms of segmentation performance and computational benefits. We also propose a new guided optimization algorithm to optimize the suitable hyperparameters for the first approach. Second, to enhance the segmentation performance of the proposed approaches, we propose new segmentation pipelines for patients' MRI images, based on deep learned features and two stages of training. We also address problems related to unbalanced data, in addition to false positives and false negatives, to increase the model's segmentation sensitivity towards the tumor regions and specificity towards the healthy regions. Finally, the segmentation performance and the inference time of the proposed approaches and pipelines are reported, along with state-of-the-art methods, on a public dataset annotated by radiologists and approved by neuroradiologists.
Fang, Hao. "Modélisation géométrique à différent niveau de détails d'objets fabriqués par l'homme". Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4002/document.
Geometric modeling of man-made objects from 3D data is one of the biggest challenges in Computer Vision and Computer Graphics. The long-term goal is to generate a CAD-style model in an as-automatic-as-possible way. To achieve this goal, difficult issues have to be addressed, including (i) the scalability of the modeling process with respect to massive input data, (ii) the robustness of the methodology to various defect-laden input measurements, and (iii) the geometric quality of the output models. Existing methods work well to recover the surface of free-form objects. However, in the case of man-made objects, it is difficult to produce results that approach the quality of highly structured representations such as CAD models. In this thesis, we present a series of contributions to the field. First, we propose a classification method based on deep learning to distinguish objects in raw 3D point clouds. Second, we propose an algorithm to detect planar primitives in 3D data at different levels of abstraction. Finally, we propose a mechanism to assemble planar primitives into compact polygonal meshes. These contributions are complementary and can be used sequentially to reconstruct city models at various levels of detail from airborne 3D data. We illustrate the robustness, scalability and efficiency of our methods on both laser and multi-view stereo data composed of man-made objects.
Blanc-Beyne, Thibault. "Estimation de posture 3D à partir de données imprécises et incomplètes : application à l'analyse d'activité d'opérateurs humains dans un centre de tri". Thesis, Toulouse, INPT, 2020. http://www.theses.fr/2020INPT0106.
In the context of a study of stress and ergonomics at work for the prevention of musculoskeletal disorders, the company Ebhys wants to develop a tool for analyzing the activity of human operators in a waste sorting center by measuring ergonomic indicators. To cope with the uncontrolled environment of the sorting center, these indicators are measured from depth images. An ergonomic study allows us to define the indicators to be measured. These indicators are zones of movement of the operator's hands and angulations of certain joints of the upper body. They are therefore indicators that can be obtained from an analysis of the operator's 3D pose. The software for computing the indicators is thus composed of three steps: a first part segments the operator from the rest of the scene to ease the 3D pose estimation, a second part estimates the operator's 3D pose, and a third part uses the operator's 3D pose to compute the ergonomic indicators. First of all, we propose an algorithm that extracts the operator from the rest of the depth image. To do this, we use a first automatic segmentation based on static background removal and selection of a moving element given its position and size. This first segmentation allows us to train a neural network that improves the results. This neural network is trained using the segmentations obtained from the first automatic segmentation, from which the best-quality samples are automatically selected during training. Next, we build a neural network model to estimate the operator's 3D pose. We propose a study that allows us to find a light and optimal model for 3D pose estimation on synthetic depth images, which we generate numerically. However, while this network gives outstanding performance on synthetic depth images, it is not directly applicable to the real depth images that we acquired in an industrial context. To overcome this issue, we finally build a module that allows us to transform the synthetic depth images into more realistic depth images. This image-to-image translation model modifies the style of the depth image without changing its content, keeping the 3D pose of the operator from the synthetic source image unchanged on the translated realistic depth frames. These more realistic depth images are then used to re-train the 3D pose estimation neural network, to finally obtain a convincing 3D pose estimation on the depth images acquired in real conditions, and to compute the ergonomic indicators.
Deschaintre, Valentin. "Acquisition légère de matériaux par apprentissage profond". Thesis, Université Côte d'Azur (ComUE), 2019. http://theses.univ-cotedazur.fr/2019AZUR4078.
Whether it is used for entertainment or industrial design, computer graphics is ever more present in our everyday life. Yet, reproducing a real scene appearance in a virtual environment remains a challenging task, requiring long hours from trained artists. A good solution is the acquisition of geometries and materials directly from real-world examples, but this often comes at the cost of complex hardware and calibration processes. In this thesis, we focus on lightweight material appearance capture to simplify and accelerate the acquisition process and solve industrial challenges such as result image resolution or calibration. Texture, highlights, and shading are some of the many visual cues that allow humans to perceive material appearance in pictures. Designing algorithms able to leverage these cues to recover spatially-varying bi-directional reflectance distribution functions (SVBRDFs) from a few images has challenged computer graphics researchers for decades. We explore the use of deep learning to tackle lightweight appearance capture and make sense of these visual cues. Once trained, our networks are capable of recovering per-pixel normals, diffuse albedo, specular albedo and specular roughness from as little as one picture of a flat surface lit by the environment or a hand-held flash. We show how our method improves its prediction with the number of input pictures to reach high-quality reconstructions with up to 10 images --- a sweet spot between existing single-image and complex multi-image approaches --- and allows the capture of large-scale, HD materials. We achieve this goal by introducing several innovations in training data acquisition and network design, bringing clear improvement over the state of the art for lightweight material capture.
Paumard, Marie-Morgane. "Résolution automatique de puzzles par apprentissage profond". Thesis, CY Cergy Paris Université, 2020. http://www.theses.fr/2020CYUN1067.
The objective of this thesis is to develop semantic methods of reassembly in the complicated framework of heritage collections, where some blocks are eroded or missing. The reassembly of archaeological remains is an important task for heritage sciences: it improves the understanding and conservation of ancient vestiges and artifacts. However, some sets of fragments cannot be reassembled with techniques using contour information or visual continuities. It is then necessary to extract semantic information from the fragments and to interpret them. These tasks can be performed automatically thanks to deep learning techniques coupled with a solver, i.e., a constrained decision-making algorithm. This thesis proposes two semantic reassembly methods for 2D fragments with erosion, as well as a new dataset and evaluation metrics. The first method, Deepzzle, combines a neural network with a solver. The neural network is composed of two Siamese convolutional networks trained to predict the relative position of two fragments: it is a 9-class classification. The solver uses Dijkstra's algorithm to maximize the joint probability. Deepzzle can address the case of missing and supernumerary fragments, is capable of processing about 15 fragments per puzzle, and performs 25% better than the state of the art. The second method, Alphazzle, is based on AlphaZero and single-player Monte Carlo Tree Search (MCTS). It is an iterative method that uses deep reinforcement learning: at each step, a fragment is placed on the current reassembly. Two neural networks guide MCTS: an action predictor, which uses the fragment and the current reassembly to propose a strategy, and an evaluator, which is trained to predict the quality of the future result from the current reassembly. Alphazzle takes into account the relationships between all fragments and adapts to puzzles larger than those solved by Deepzzle. Moreover, Alphazzle is compatible with constraints imposed by a heritage framework: at the end of reassembly, MCTS does not access the reward, unlike AlphaZero. Indeed, the reward, which indicates whether a puzzle is well solved or not, can only be estimated by the algorithm, because only a conservator can be sure of the quality of a reassembly.
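To make the solver side of such a pipeline concrete: if a network outputs, for each fragment, a probability over the relative positions around a central fragment, choosing a globally consistent placement is a combinatorial problem. Deepzzle solves it with Dijkstra's algorithm over joint probabilities; the sketch below uses Hungarian matching as a simpler stand-in, with random scores in place of a real network.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)
# Hypothetical network output: log P(fragment f belongs at relative position p)
# for 8 outer fragments and the 8 cells around the central fragment.
log_probs = np.log(rng.dirichlet(np.ones(8), size=8))
frags, cells = linear_sum_assignment(-log_probs)   # maximize total log-prob
for f, p in zip(frags, cells):
    print(f"fragment {f} -> position {p}")
```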
Haykal, Vanessa. "Modélisation des séries temporelles par apprentissage profond". Thesis, Tours, 2019. http://www.theses.fr/2019TOUR4019.
Time series prediction is a problem that has been addressed for many years. In this thesis, we have been interested in methods based on deep learning. It is well known that if the relationships between the data are temporal, they are difficult to analyze and predict accurately, due to non-linear trends and the presence of noise, specifically in financial and electrical series. From this context, we propose a new hybrid noise-reduction architecture that models the recursive error series to improve predictions. The learning process fuses simultaneously a convolutional neural network (CNN) and a recurrent long short-term memory (LSTM) network. This model is distinguished by its ability to capture globally a variety of hybrid properties: it is able to extract local signal features, to learn long-term and non-linear dependencies, and to offer high noise resistance. The second contribution concerns the limitations of global approaches due to dynamic regime switching in the signal. We therefore present a local unsupervised modification of our previous architecture in order to adjust the results, by adapting a Hidden Markov Model (HMM). Finally, we were also interested in multi-resolution techniques to improve the performance of the convolutional layers, notably by using the variational mode decomposition method (VMD).
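The CNN+LSTM fusion can be illustrated with a minimal model: a 1D convolution extracts local patterns, an LSTM models longer dependencies, and a linear head produces a one-step-ahead forecast. Layer sizes below are arbitrary, and this is only the generic hybrid pattern, not the thesis' specific architecture.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.conv = nn.Conv1d(1, 16, kernel_size=5, padding=2)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):            # x: (batch, length)
        h = torch.relu(self.conv(x.unsqueeze(1)))   # (batch, 16, length)
        out, _ = self.lstm(h.transpose(1, 2))       # (batch, length, hidden)
        return self.head(out[:, -1])                # one-step-ahead forecast

model = CNNLSTM()
print(model(torch.randn(4, 100)).shape)             # torch.Size([4, 1])
```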
Ostertag, Cécilia. "Analyse des pathologies neuro-dégénératives par apprentissage profond". Thesis, La Rochelle, 2022. http://www.theses.fr/2022LAROS003.
Monitoring and predicting the cognitive state of a subject affected by a neuro-degenerative disorder is crucial to provide appropriate treatment as soon as possible. Thus, these patients are followed for several years, as part of longitudinal medical studies. During each visit, a large quantity of data is acquired: risk factors linked to the pathology, medical imaging (MRI or PET scans for example), cognitive test results, sampling of molecules that have been identified as bio-markers, etc. These various modalities give information about the disease's progression; some of them are complementary and others can be redundant. Several deep learning models have been applied to bio-medical data, notably for organ segmentation or pathology diagnosis. This PhD focuses on the design of a deep neural network model for cognitive decline prediction, using multimodal data, here both structural brain MRI images and clinical data. In this thesis we propose an architecture made of sub-modules tailored to each modality: a 3D convolutional network for the brain MRI, and fully connected layers for the quantitative and qualitative clinical data. To predict the patient's evolution, this model takes as input data from two medical visits for each patient. These visits are compared using a siamese architecture. After training and validating this model with Alzheimer's disease as our use case, we look into knowledge transfer to other neuro-degenerative pathologies, and we use transfer learning to adapt our model to Parkinson's disease. Finally, we discuss the choices we made to take into account the temporal aspect of our problem, both during the ground-truth creation using the long-term evolution of a cognitive score, and in the choice of using pairs of visits as input instead of longer sequences.
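The siamese comparison of two visits reduces to a shared encoder applied to each visit and a head classifying the evolution from the pair of embeddings. The sketch below keeps only the clinical-vector branch for brevity (the thesis also encodes the MRI with a 3D CNN); all sizes and the three-class output are assumptions for the example.

```python
import torch
import torch.nn as nn

class VisitSiamese(nn.Module):
    def __init__(self, n_features=20, emb=16, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, emb))
        self.head = nn.Linear(2 * emb, n_classes)  # e.g. decline/stable/improve

    def forward(self, visit_a, visit_b):
        ea, eb = self.encoder(visit_a), self.encoder(visit_b)  # shared weights
        return self.head(torch.cat([ea, eb], dim=1))

model = VisitSiamese()
print(model(torch.randn(2, 20), torch.randn(2, 20)).shape)   # (2, 3)
```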
Cohen-Hadria, Alice. "Estimation de descriptions musicales et sonores par apprentissage profond". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS607.
In Music Information Retrieval (MIR) and voice processing, the use of machine learning tools has become more and more standard in the last few years. In particular, many state-of-the-art systems now rely on neural networks. In this thesis, we propose a wide overview of four different MIR and voice processing tasks, using systems built with neural networks. More precisely, we use convolutional neural networks, a class of neural networks originally designed for images. The first task presented is music structure estimation. For this task, we show how the choice of input representation can be critical when using convolutional neural networks. The second task is singing voice detection. We present how to use a voice detection system to automatically align lyrics and audio tracks. With this alignment mechanism, we have created the largest synchronized audio and lyrics dataset, called DALI. Singing voice separation is the third task. For this task, we present a data augmentation strategy, a way to significantly increase the size of a training set. Finally, we tackle voice anonymization. We present an anonymization method that both obfuscates content and masks the speaker identity, while preserving the acoustic scene.
Moukari, Michel. "Estimation de profondeur à partir d'images monoculaires par apprentissage profond". Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC211/document.
Computer vision is a branch of artificial intelligence whose purpose is to enable a machine to analyze, process and understand the content of digital images. Scene understanding in particular is a major issue in computer vision. It relies on a semantic and structural characterization of the image, on the one hand to describe its content and, on the other hand, to understand its geometry. However, while real space is three-dimensional, the image representing it is two-dimensional. Part of the 3D information is thus lost during the process of image formation, and it is therefore non-trivial to describe the geometry of a scene from 2D images of it. There are several ways to retrieve the depth information lost in the image. In this thesis we are interested in estimating a depth map given a single image of the scene. In this case, the depth information corresponds, for each pixel, to the distance between the camera and the object represented in this pixel. The automatic estimation of a distance map of the scene from an image is indeed a critical algorithmic building block in a very large number of domains, in particular that of autonomous vehicles (obstacle detection, navigation aids). Although the problem of estimating depth from a single image is difficult and inherently ill-posed, we know that humans can judge distances with one eye. This capacity is not innate but acquired, and is made possible mostly thanks to the identification of cues reflecting prior knowledge of the surrounding objects. Moreover, we know that learning algorithms can extract these cues directly from images. We are particularly interested in statistical learning methods based on deep neural networks, which have recently led to major breakthroughs in many fields, and we study the case of monocular depth estimation.
Pham, Chi-Hieu. "Apprentissage profond pour la super-résolution et la segmentation d'images médicales". Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2018. http://www.theses.fr/2018IMTA0124/document.
In this thesis, we are dedicated to studying the behavior of different image representations and to developing methods for the super-resolution, cross-modal synthesis and segmentation of medical images. Super-resolution aims to enhance the image resolution using single or multiple data acquisitions. In this work, we focus on single-image super-resolution (SR), which estimates the high-resolution (HR) image from one corresponding low-resolution (LR) image. Increasing image resolution through SR is a key to a more accurate understanding of the anatomy, and it has been shown that applying super-resolution techniques leads to more accurate segmentation maps. Sometimes, certain tissue contrasts may not be acquired during the imaging session because they are time-consuming or expensive to acquire, or because of a lack of devices. One possible solution is to use medical image cross-modal synthesis methods to generate the missing subject-specific scans in the desired target domain from the given source image domain. The objective of synthetic images is to improve other automatic medical image processing steps such as segmentation, super-resolution or registration. In this thesis, convolutional neural networks are applied to super-resolution and cross-modal synthesis in the context of supervised learning. In addition, an attempt to apply generative adversarial networks to unpaired cross-modal synthesis of brain MRI is described. Results demonstrate the potential of deep learning methods with respect to practical medical applications.
Routhier, Etienne. "Conception de séquences génomiques artificielles chez la levure par apprentissage profond". Thesis, Sorbonne université, 2021. http://www.theses.fr/2021SORUS465.
Texto completoRecent technological advances in biotechnology, such as CRISPR and the de novo synthesis of DNA oligonucleotides, now make it possible to modify genomes precisely and extensively. Projects aiming to design partially or completely synthetic genomes, in particular yeast genomes, have been developed by taking advantage of these technologies. However, to achieve this goal it is necessary to control the activity of artificial sequences, which remains a challenge today. Fortunately, the recent emergence of deep learning methodologies able to recognize the genomic function associated with a DNA sequence seems to provide a powerful tool for anticipating the activity of synthetic genomes and facilitating their design. In this perspective, we propose to use deep learning methodologies to design synthetic yeast sequences controlling the local structure of the genome. In particular, I will present the methodology we have developed to design synthetic sequences that precisely position nucleosomes - the DNA-protein complexes that determine the structure of DNA at the lowest scale - in yeast. I will also show that this methodology opens up the prospect of designing sequences controlling the immediately higher level of structure: loops. The design of sequences controlling the local structure makes it possible to precisely identify the determinants of this structure.
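For illustration only, a nucleosome-occupancy scorer of the kind this abstract describes could be sketched as a small 1D CNN over one-hot encoded DNA; the window length and layers below are assumptions, not the thesis model:

import torch
import torch.nn as nn

# Tiny 1D CNN scoring a one-hot DNA window for nucleosome occupancy.
model = nn.Sequential(
    nn.Conv1d(4, 32, kernel_size=11, padding=5), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

window = torch.zeros(1, 4, 167)                         # channels = A, C, G, T
window[0, torch.randint(0, 4, (167,)), torch.arange(167)] = 1.0
print(model(window))                                    # predicted occupancy score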
Zimmer, Matthieu. "Apprentissage par renforcement développemental". Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0008/document.
Texto completoReinforcement learning allows an agent to learn a behavior that has never been previously defined by humans. The agent discovers the environment and the different consequences of its actions through interaction: it learns from its own experience, without having pre-established knowledge of the goals or effects of its actions. This thesis tackles how deep learning can help reinforcement learning handle continuous spaces and environments with many degrees of freedom, in order to solve problems closer to reality. Indeed, neural networks scale well and have good representational power: they make it possible to approximate functions on continuous spaces and allow a developmental approach, because they require little a priori knowledge of the domain. We seek to reduce the amount of interaction the agent needs to achieve acceptable behavior. To do so, we propose the Neural Fitted Actor-Critic framework, which defines several data-efficient actor-critic algorithms. We examine how the agent can fully exploit the transitions generated by previous behaviors by integrating off-policy data into the proposed framework. Finally, we study how the agent can learn faster by taking advantage of the development of its body, in particular by proceeding with a gradual increase in the dimensionality of its sensorimotor space.
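A generic actor update on a continuous action space, as a hedged sketch of the family of algorithms this abstract discusses (this is not the Neural Fitted Actor-Critic algorithm itself, whose critic is fitted by batch regression on replayed transitions):

import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

# One policy-improvement step: push actions toward higher critic estimates.
# The critic is assumed already fitted; only the actor's weights are updated.
states = torch.randn(32, obs_dim)              # batch of (possibly off-policy) states
actions = actor(states)
actor_loss = -critic(torch.cat([states, actions], dim=1)).mean()
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()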
Dahmani, Sara. "Synthèse audiovisuelle de la parole expressive : modélisation des émotions par apprentissage profond". Electronic Thesis or Diss., Université de Lorraine, 2020. http://www.theses.fr/2020LORR0137.
Texto completoThe work of this thesis concerns the modeling of emotions for expressive audiovisual text-to-speech synthesis. Today, the results of text-to-speech synthesis systems are of good quality; however, audiovisual synthesis remains an open issue and expressive synthesis is even less studied. As part of this thesis, we present an emotion modeling method which is malleable and flexible, and allows us to mix emotions as we mix shades on a palette of colors. In the first part, we present and study two expressive corpora that we have built. The recording strategy and the expressive content of these corpora are analyzed to validate their use for the purpose of audiovisual speech synthesis. In the second part, we present two neural architectures for speech synthesis. We used these two architectures to model three aspects of speech: 1) the duration of sounds, 2) the acoustic modality and 3) the visual modality. First, we use a fully connected architecture. This architecture allowed us to study the behavior of neural networks when dealing with different contextual and linguistic descriptors. We were also able to analyze, with objective measures, the network's ability to model emotions. The second neural architecture proposed is a variational auto-encoder. This architecture is able to learn a latent representation of emotions without using emotion labels. After analyzing the latent space of emotions, we present a procedure for structuring it in order to move from a discrete representation of emotions to a continuous one. We were able to validate, through perceptual experiments, the ability of our system to generate emotions, nuances of emotions and mixtures of emotions, for expressive audiovisual text-to-speech synthesis.
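The emotion-mixing idea lends itself to a short sketch: sample latent codes with the reparameterization trick and decode a barycentre of two emotions. Dimensions and layers below are placeholders, not the thesis architecture:

import torch
import torch.nn as nn

latent_dim = 16
enc = nn.Sequential(nn.Linear(80, 64), nn.ReLU(), nn.Linear(64, 2 * latent_dim))
dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 80))

def encode(x):
    mu, logvar = enc(x).chunk(2, dim=-1)
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization

z_joy = encode(torch.randn(1, 80))       # stands for a frame of "joy" speech
z_sad = encode(torch.randn(1, 80))       # stands for a frame of "sadness" speech
blended = dec(0.7 * z_joy + 0.3 * z_sad) # decode a 70/30 emotion mixture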
Zhang, Yifei. "Real-time multimodal semantic scene understanding for autonomous UGV navigation". Thesis, Bourgogne Franche-Comté, 2021. http://www.theses.fr/2021UBFCK002.
Texto completoRobust semantic scene understanding is challenging due to complex object types, as well as environmental changes caused by varying illumination and weather conditions. This thesis studies the problem of deep semantic segmentation with multimodal image inputs. Multimodal images captured from various sensory modalities provide complementary information for complete scene understanding. We provide effective solutions for fully-supervised multimodal image segmentation and for few-shot semantic segmentation of outdoor road scenes. Regarding the former case, we propose a multi-level fusion network to integrate RGB and polarimetric images. A central fusion framework is also introduced to adaptively learn the joint representations of modality-specific features and reduce model uncertainty via statistical post-processing. In the case of semi-supervised semantic scene understanding, we first propose a novel few-shot segmentation method based on the prototypical network, which employs multiscale feature enhancement and an attention mechanism. We then extend the RGB-centric algorithms to take advantage of supplementary depth cues. Comprehensive empirical evaluations on different benchmark datasets demonstrate that all the proposed algorithms achieve superior accuracy and show the effectiveness of complementary modalities for outdoor scene understanding for autonomous navigation.
Bhattarai, Binod. "Développement de méthodes de rapprochement physionomique par apprentissage machine". Caen, 2016. https://hal.archives-ouvertes.fr/tel-01467985.
Texto completoThe work presented in this PhD thesis takes place in the general context of face matching. More precisely, our goal is to design and develop novel algorithms to learn compact, discriminative, domain-invariant or de-identifying representations of faces. Searching and indexing faces opens the door to many interesting applications. However, this is made more challenging day after day by the rapid growth of the volume of faces to analyse. Representing faces by compact and discriminative features is consequently essential to deal with such very large datasets. Moreover, this volume is increasing without any apparent limit; this is why it is also relevant to propose solutions to organise faces in meaningful ways, in order to reduce the search space and improve the efficiency of retrieval. Although the volume of faces available on the internet is increasing, it is still difficult to find annotated examples to train models for each possible use case, e.g. for different races, sexes, etc., for every specific task. Learning a model with training examples from one group of people can fail to predict well in another group, due to the uneven rate of change of biometric dimensions, e.g. ageing, among them. Similarly, a model learned from one type of feature can fail to make good predictions when tested with another type of feature. It would be ideal to have models producing face representations that would be invariant to these discrepancies. Learning common representations ultimately helps to reduce the domain-specific parameters and, more importantly, allows training examples from well-represented domains to be used in other domains. Hence, there is a need for designing algorithms to map the features from different domains to a common subspace - bringing faces bearing the same properties closer. On the other hand, as automatic face matching tools are getting smarter and smarter, there is an increasing threat to privacy, exacerbated by the popularity of photo sharing on social networks. In such a context, altering the representations of faces so that they cannot be identified by automatic face matchers - while still looking similar to the original - has become an interesting perspective toward privacy protection. It allows users to limit the risk of sharing their photos on social networks. In all these scenarios, we explored how Metric Learning methods as well as Deep Learning can help us learn compact and discriminative representations of faces. Building on these tools, we propose compact, discriminative, domain-invariant and de-identifying representations of faces. We applied the proposed methods to a wide range of facial analysis applications, including large-scale face retrieval, age estimation, attribute prediction and identity de-identification. We evaluated our algorithms on standard and challenging public datasets such as LFW, CelebA and MORPH II. Moreover, we appended 1M faces crawled from Flickr.com to LFW and generated a novel and more challenging dataset to evaluate our algorithms at large scale. Our experiments show that the proposed methods are more accurate and more efficient than competitive baselines and existing state-of-the-art methods, and attain new state-of-the-art performance.
Martinez, Coralie. "Classification précoce de séquences temporelles par de l'apprentissage par renforcement profond". Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAT123.
Texto completoEarly classification (EC) of time series is a recent research topic in the field of sequential data analysis. It consists in assigning a label to data that are collected sequentially, with new data points arriving over time, and the prediction of a label has to be made using as few data points as possible in the sequence. The EC problem is of paramount importance for supporting decision makers in many real-world applications, ranging from process control to fraud detection. It is particularly interesting for applications concerned with the costs induced by the acquisition of data points, or for applications which seek rapid label prediction in order to take early actions. This is for example the case in the field of health, where it is necessary to provide a medical diagnosis as soon as possible from the sequence of medical observations collected over time. Another example is predictive maintenance, with the objective of anticipating the breakdown of a machine from its sensor signals. In this doctoral work, we developed a new approach to this problem, based on the formulation of a sequential decision-making problem: the EC model has to decide between classifying an incomplete sequence or delaying the prediction to collect additional data points. Specifically, we described this problem as a Partially Observable Markov Decision Process, denoted EC-POMDP. The approach consists in training an EC agent with Deep Reinforcement Learning (DRL) in an environment characterized by the EC-POMDP. The main motivation for this approach was to offer an end-to-end model for EC which is able to simultaneously learn optimal patterns in the sequences for classification and optimal strategic decisions for the time of prediction. The method also allows the importance of time against classification accuracy to be set in the definition of rewards, according to the application and its willingness to make this compromise. In order to solve the EC-POMDP and model the policy of the EC agent, we applied an existing DRL algorithm, the Double Deep-Q-Network algorithm, whose general principle is to update the policy of the agent during training episodes using a replay memory of past experiences. We showed that the application of the original algorithm to the EC problem led to imbalanced memory issues which can weaken the training of the agent. Consequently, to cope with those issues and offer a more robust training of the agent, we adapted the algorithm to the EC-POMDP specificities and introduced strategies for memory and episode management. In experiments, we showed that these contributions improved the performance of the agent over the original algorithm, and that we were able to train an EC agent which compromised between speed and accuracy on each sequence individually. We were also able to train EC agents on public datasets for which we have no expertise, showing that the method is applicable to various domains. Finally, we proposed some strategies to interpret the decisions of the agent and to validate or reject them. In experiments, we showed how these solutions can help gain insight into the choices of action made by the agent.
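The decision loop described above can be sketched as follows, with a stand-in Q-function in place of the trained Double Deep-Q-Network; the action indexing is an assumption:

import numpy as np

# At each time step the agent either emits one of n_classes labels (ending
# the episode) or waits for the next data point.
n_classes = 3
WAIT = n_classes                    # index of the extra "delay" action

def classify_early(sequence, q_values):
    for t in range(1, len(sequence) + 1):
        action = int(np.argmax(q_values(sequence[:t])))
        if action != WAIT:
            return action, t        # predicted label and decision time
    # end of sequence reached: forced to choose among the class labels
    return int(np.argmax(q_values(sequence)[:WAIT])), len(sequence)

rng = np.random.default_rng(0)
dummy_q = lambda prefix: rng.random(n_classes + 1)  # placeholder Q-function
print(classify_early(rng.random(20), dummy_q))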
Bilodeau, Anthony. "Apprentissage faiblement supervisé appliqué à la segmentation d'images de protéines neuronales". Master's thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/39752.
Texto completoThèse ou mémoire avec insertion d'articles
Honour Roll of the Faculty of Graduate and Postdoctoral Studies, 2020-2021
In cell biology, optical microscopy is commonly used to visualize and characterize the presence and morphology of biological structures. Following the acquisition, an expert has to annotate the structures for quantification. This is a difficult task, requiring many hours of sometimes repetitive work, which can result in annotation errors caused by labelling fatigue. Machine learning promises to automate complex tasks from a large set of annotated sample data. My master's project consists of using weakly supervised techniques, where the annotations required for training are reduced and/or less precise, for the segmentation of neural structures. I first tested the use of polygons delimiting the structure of interest for the complex task of segmenting the neuronal protein F-actin in super-resolution microscopy images. The complexity of the task stems from the heterogeneous morphology of neurons, the high number of instances to segment in an image and the presence of many distractors. Despite these difficulties, the use of weak annotations made it possible to quantify an innovative change in the conformation of the F-actin protein as a function of neuronal activity. I further simplified the annotation task by requiring only binary labels that indicate the presence of structures in the image, reducing annotation time by a factor of 30. In this way, the algorithm is trained to predict the content of an image and then extract the semantic characteristics important for recognizing the structure of interest using attention mechanisms. The segmentation accuracy obtained on F-actin images is higher than that of polygonal annotations and equivalent to that of an expert's precise annotations. This new approach should facilitate the quantification of dynamic changes that occur under the microscope in living cells and reduce errors caused by inattention or bias in the selection of regions of interest in microscopy images.
Philip, Julien. "Édition et rendu à base d’images multi-vues par apprentissage profond et optimisation". Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4048.
Texto completoComputer-generated imagery (CGI) takes a growing place in our everyday environment. Whether in video games or movies, CGI techniques are constantly improving in quality, but they also require ever more high-quality artistic content, which takes a growing time to create. With the emergence of virtual and augmented reality often comes the need to render or re-render assets that exist in our world. To allow widespread use of CGI in applications such as telepresence or virtual visits, the need for manual artistic replication of assets must be removed from the process. This can be done with the help of Image-Based Rendering (IBR) techniques, which allow scenes or objects to be rendered in a free-viewpoint manner from a set of sparse input photographs. While this process requires little to no artistic work, it also does not allow for artistic control or editing of scene content. In this dissertation, we explore multi-view image editing and rendering. To allow casually captured scenes to be rendered with content alterations such as object removal, lighting edition, or scene compositing, we leverage optimization techniques and modern deep learning. We design our methods to take advantage of all the information present in multi-view content while handling specific constraints such as multi-view coherency. For object removal, we introduce a new plane-based multi-view inpainting algorithm. Planes are a simple yet effective way to fill geometry, and they naturally enforce multi-view coherency, as inpainting is computed in a shared rectified texture space, allowing us to correctly respect perspective. We demonstrate instance-based object removal at the scale of a street in scenes composed of several hundreds of images. We next address outdoor relighting with a learning-based algorithm that efficiently allows the illumination in a scene to be changed, while removing and synthesizing cast shadows for any given sun position and accounting for global illumination. An approximate geometric proxy built using multi-view stereo is used to generate illumination- and shadow-related image buffers that guide a neural network. We train this network on a set of synthetic scenes, allowing full supervision of the learning pipeline. Careful data augmentation allows our network to transfer to real scenes and provides state-of-the-art relighting results. We also demonstrate the capacity of this network to compose real scenes captured under different lighting conditions and orientations. We then present contributions to image-based rendering quality. We discuss how our carefully designed depth-map meshing and simplification algorithm improves the rendering performance and quality of a new learning-based IBR method. Finally, we present a method that combines relighting, IBR, and material analysis. To enable relightable IBR with accurate glossy effects, we extract both material appearance variations and qualitative texture information from multi-view content in the form of several IBR heuristics. We further combine them with path-traced irradiance images that specify the input and target lighting. This combination allows a neural network to be trained to implicitly extract material properties and produce realistic-looking relit viewpoints. Separating diffuse and specular supervision is crucial to obtaining high-quality output.
Trullo, Ramirez Roger. "Approche basées sur l'apprentissage en profondeur pour la segmentation des organes à risques dans les tomodensitométries thoraciques". Thesis, Normandie, 2018. http://www.theses.fr/2018NORMR063.
Texto completoRadiotherapy is one of the treatment options currently available for patients affected by cancer, one of the leading causes of death worldwide. Before radiotherapy, organs at risk (OAR) located near the target tumor, such as the heart, the lungs or the esophagus in thoracic cancer, must be outlined, in order to minimize the quantity of irradiation that they receive during treatment. Today, segmentation of the OAR is performed mainly manually by clinicians on Computed Tomography (CT) images, despite some partial software support. It is a tedious task, prone to intra- and inter-observer variability. In this work, we present several frameworks using deep learning techniques to automatically segment the heart, trachea, aorta and esophagus. In particular, the esophagus is notably challenging to segment, due to the lack of surrounding contrast and its shape variability across different patients. As deep networks, and in particular fully convolutional networks, now offer state-of-the-art performance for semantic segmentation, we first show how a specific type of architecture based on skip connections can improve the accuracy of the results. As a second contribution, we demonstrate that context information can be of vital importance in the segmentation task, and we propose the use of two collaborative networks. Third, we propose a different, distance-aware representation of the data, which is then used in conjunction with adversarial networks, as another way to constrain the anatomical context. All the proposed methods have been tested on 60 patients with 3D CT scans, showing good performance compared with other methods.
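The distance-aware representation mentioned above can be illustrated with a signed distance transform of a binary organ mask (a minimal sketch; the thesis formulation may differ):

import numpy as np
from scipy.ndimage import distance_transform_edt

# Replace a binary organ mask by a signed distance map, so that errors are
# weighted by how far they fall from the organ boundary.
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 25:45] = True                    # toy "organ" region
signed_dist = distance_transform_edt(~mask) - distance_transform_edt(mask)
print(signed_dist.min(), signed_dist.max())  # negative inside, positive outside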
Zhang, Jian. "Modèles de Mobilité de Véhicules par Apprentissage Profond dans les Systèmes de Transport Intelligents". Thesis, Ecole centrale de Lille, 2018. http://www.theses.fr/2018ECLI0015/document.
Texto completoIntelligent transportation systems have gained great research interest in recent years. Although realistic traffic simulation plays an important role, it has not received enough attention. This thesis is devoted to studying traffic simulation at the microscopic level, and proposes corresponding vehicular mobility models. Using deep learning methods, these mobility models have been shown to represent real-world vehicles with promising credibility. Firstly, a data-driven mobility model based on a neural network is proposed. This model is derived from real-world trajectory data and allows local vehicle behaviors to be mimicked. By analyzing the performance of this basic learning-based mobility model, we show that an improvement is possible and we propose its specification. A Hidden Markov Model (HMM) is then introduced. Preparing this integration requires an examination of traditional dynamics-based mobility models and a method for adapting "classical" models to our situation. At last, the enhanced model is presented, and a sophisticated scenario simulation is built with it to validate the theoretical results. The performance of our mobility model is promising, and implementation issues are also discussed.
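As a hedged sketch of a learned car-following rule of the kind described here (the input features and integration step are assumptions; the thesis trains on real-world trajectory data):

import torch
import torch.nn as nn

# A small network maps the local driving context to the next acceleration.
net = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))

gap, dv, v = 12.0, -1.5, 13.9           # headway (m), speed difference (m/s), speed (m/s)
accel = net(torch.tensor([[gap, dv, v]])).item()
v_next = v + 0.1 * accel                # Euler integration with dt = 0.1 s
print(v_next)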
Godet, Pierre. "Approches par apprentissage pour l’estimation de mouvement multiframe en vidéo". Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG005.
Texto completoThis work concerns the use of temporal information from sequences of more than two images for optical flow estimation. Optical flow is defined as the dense field (at every pixel) of the apparent motion in the image plane. On the one hand, we study the use of a basis of temporal models, learned by principal component analysis from the studied data, to model the temporal dependence of the motion. This first study focuses on the context of particle image velocimetry in fluid mechanics. On the other hand, the new state of the art in optical flow estimation having recently been established by methods based on deep learning, we train convolutional neural networks to estimate optical flow by taking advantage of temporal continuity, in the case of natural image sequences. We then propose STaRFlow, a convolutional neural network exploiting a memory of information from the past through a temporal recurrence. By repeated application of the same recurrent cell, the same learned parameters are used for the different time steps and for the different levels of a multiscale process. This architecture is lighter than competing networks while giving STaRFlow state-of-the-art performance. In the course of our work, we highlight several cases where the use of temporal information improves the quality of the estimation, in particular in the presence of occlusions, when the image quality is degraded (blur, noise), or in the case of thin objects.
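The reuse of one recurrent cell across time steps and pyramid levels can be sketched as follows; channel sizes and the cell's internals are placeholders, not the actual STaRFlow cell:

import torch
import torch.nn as nn

class RecurrentFlowCell(nn.Module):
    """One cell, shared weights: refines a flow estimate and a temporal memory."""
    def __init__(self, feat_ch=16, mem_ch=32):
        super().__init__()
        self.conv = nn.Conv2d(2 * feat_ch + mem_ch + 2, 64, 3, padding=1)
        self.flow = nn.Conv2d(64, 2, 3, padding=1)      # flow residual (dx, dy)
        self.mem = nn.Conv2d(64, mem_ch, 3, padding=1)  # temporal memory

    def forward(self, f1, f2, flow_up, memory):
        h = torch.relu(self.conv(torch.cat([f1, f2, flow_up, memory], dim=1)))
        return flow_up + self.flow(h), torch.tanh(self.mem(h))

cell = RecurrentFlowCell()
f1 = f2 = torch.randn(1, 16, 32, 32)
flow, memory = torch.zeros(1, 2, 32, 32), torch.zeros(1, 32, 32, 32)
for _ in range(3):                      # same weights reused at each time step
    flow, memory = cell(f1, f2, flow, memory)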
Léon, Aurélia. "Apprentissage séquentiel budgétisé pour la classification extrême et la découverte de hiérarchie en apprentissage par renforcement". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS226.
Texto completoThis thesis deals with the notion of budget to study problems of complexity, whether computational complexity, a complex task for an agent, or complexity due to a small amount of data. Indeed, the main goal of current techniques in machine learning is usually to obtain the best accuracy, without worrying about the cost of the task. The concept of budget makes it possible to take this cost into account while maintaining good performance. We first focus on classification problems with a large number of classes: the complexity of those algorithms can be reduced thanks to the use of decision trees (here learned through budgeted reinforcement learning techniques) or the association of each class with a (binary) code. We then deal with reinforcement learning problems and the discovery of a hierarchy that breaks down a (complex) task into simpler tasks to facilitate learning and generalization. Here, this discovery is achieved by reducing the cognitive effort of the agent (considered in this work as equivalent to the use of an additional observation). Finally, we address problems of understanding and generating instructions in natural language, where data are available only in small quantities: for this purpose we test the simultaneous use of an agent that understands instructions and an agent that generates them.
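The class-to-binary-code idea can be sketched with random codes and Hamming decoding; the thesis's actual coding scheme may differ:

import numpy as np

# Each of C classes gets a random B-bit code (B << C), so B binary
# predictors replace a C-way softmax.
rng = np.random.default_rng(42)
C, B = 1000, 32
codes = rng.integers(0, 2, size=(C, B))          # one binary code per class

def decode(bit_probs):
    """Map predicted bit probabilities to the nearest class code (Hamming)."""
    bits = (bit_probs > 0.5).astype(int)
    return int(np.argmin(np.abs(codes - bits).sum(axis=1)))

noisy_bits = codes[123] + rng.normal(0.0, 0.1, B)  # simulate predictor outputs
print(decode(noisy_bits))                          # recovers class 123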
Cámara, Chávez Guillermo. "Analyse du contenu vidéo par apprentissage actif". Cergy-Pontoise, 2007. http://www.theses.fr/2007CERG0380.
Texto completoThis thesis presents work towards a unified framework for semi-automated video indexing and interactive retrieval. To create an efficient index, a set of representative key frames is selected from the entire video content. We developed an automatic shot boundary detection algorithm that avoids hand-tuned parameters and thresholds. We adopted an SVM classifier due to its ability to use very high dimensional feature spaces while keeping strong generalization guarantees from few training examples. We thoroughly evaluated combinations of features and kernels and present the interesting results obtained for the TRECVID 2006 shot extraction task. We then propose an interactive video retrieval system, RETINVID, to significantly reduce the number of key frames annotated by the user. The key frames are selected based on their ability to increase the knowledge of the data. We perform an experiment against the 2005 TRECVID benchmark for the high-level feature task.
Brenon, Alexis. "Modèle profond pour le contrôle vocal adaptatif d'un habitat intelligent". Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAM057/document.
Texto completoSmart homes, resulting from the convergence of home automation, ubiquitous computing and artificial intelligence, support inhabitants in their activities of daily living to improve their quality of life. By allowing dependent and aged people to live at home longer, these homes provide a first answer to societal problems such as the dependency tied to the aging population. In a voice-controlled home, the system has to answer users' requests covering a range of automated actions (lights, blinds, multimedia control, etc.). To achieve this, the control system of the home needs to be aware of the context in which a request is made, and must also know the user's habits and preferences. Thus, the system must be able to aggregate information from a heterogeneous home-automation sensor network and take the (variable) user behavior into account. The development of smart home control systems is hard due to the huge variability in home topology and user habits. Furthermore, the whole set of contextual information needs to be represented in a common space in order to reason about it and make decisions. To address these problems, we propose to develop a system which continuously updates its model to adapt to the user and which uses raw sensor data through a graphical representation. This new method is particularly interesting because it does not require any prior inference step to extract the context. Our system thus uses deep reinforcement learning: a convolutional neural network to extract contextual information, and reinforcement learning for decision-making. This dissertation then presents two systems: a first one based only on reinforcement learning, showing the limits of this approach in a realistic environment with thousands of possible states. The introduction of deep learning allowed the development of the second one, ARCADES, whose good performance proves that this approach is relevant and opens many ways to improve it.
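A hedged sketch of this design: the home state rasterized as a multi-channel grid, fed to a small CNN that outputs Q-values over voice-command actions (all sizes are illustrative assumptions, not the ARCADES implementation):

import torch
import torch.nn as nn

n_channels, n_actions = 6, 10
qnet = nn.Sequential(
    nn.Conv2d(n_channels, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(n_actions),                # one Q-value per home command
)
state = torch.zeros(1, n_channels, 16, 16)   # graphical home-state representation
command = qnet(state).argmax(dim=1)          # greedy action selection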
Thomas, Hugues. "Apprentissage de nouvelles représentations pour la sémantisation de nuages de points 3D". Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEM048/document.
Texto completoIn recent years, new technologies have allowed the acquisition of large and precise 3D scenes as point clouds. They have opened up new applications, like self-driving vehicles or infrastructure monitoring, that rely on efficient large-scale point cloud processing. Convolutional deep learning methods cannot be used directly with point clouds. In the case of images, convolutional filters brought the ability to learn new representations, which were previously hand-crafted in older computer vision methods. Following the same line of thought, we present in this thesis a study of hand-crafted representations previously used for point cloud processing. We propose several contributions to serve as the basis for the design of a new convolutional representation for point cloud processing. They include a new definition of multiscale radius neighborhoods, a comparison with multiscale k-nearest neighbors, a new active learning strategy, the semantic segmentation of large-scale point clouds, and a study of the influence of density in multiscale representations. Following these contributions, we introduce the Kernel Point Convolution (KPConv), which uses radius neighborhoods and a set of kernel points to play the role of the kernel pixels in image convolution. Our convolutional networks outperform state-of-the-art semantic segmentation approaches in almost any situation. In addition to these strong results, we designed KPConv with great flexibility, including a deformable version. To conclude, we offer several insights into the representations that our method is able to learn.
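The core KPConv operation can be sketched in a few lines of numpy: each kernel point carries a weight matrix, and each neighbor contributes in proportion to a linear correlation with the kernel points (simplified from the paper's formulation; the values below are toy data):

import numpy as np

def kpconv(center, neighbors, feats, kernel_pts, weights, sigma=0.3):
    # neighbors: (N, 3), feats: (N, Cin), kernel_pts: (K, 3), weights: (K, Cin, Cout)
    rel = neighbors - center
    dist = np.linalg.norm(rel[:, None, :] - kernel_pts[None, :, :], axis=-1)  # (N, K)
    corr = np.maximum(0.0, 1.0 - dist / sigma)  # linear influence of each kernel point
    return np.einsum('nk,nc,kcd->d', corr, feats, weights)

rng = np.random.default_rng(0)
out = kpconv(np.zeros(3), 0.2 * rng.normal(size=(16, 3)),
             rng.normal(size=(16, 4)), 0.3 * rng.normal(size=(8, 3)),
             rng.normal(size=(8, 4, 6)))
print(out.shape)     # (6,) output features at the center point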
Chandra, Siddhartha. "Apprentissage Profond pour des Prédictions Structurées Efficaces appliqué à la Classification Dense en Vision par Ordinateur". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLC033/document.
Texto completoIn this thesis we propose a structured prediction technique that combines the virtues of Gaussian Conditional Random Fields (G-CRFs) with Convolutional Neural Networks (CNNs). The starting point of this thesis is the observation that, while being of a limited form, G-CRFs allow us to perform exact Maximum-A-Posteriori (MAP) inference efficiently. We prefer exactness and simplicity over generality and advocate G-CRF based structured prediction in deep learning pipelines. Our proposed structured prediction methods accommodate (i) exact inference, (ii) both short- and long-term pairwise interactions, (iii) rich CNN-based expressions for the pairwise terms, and (iv) end-to-end training alongside CNNs. We devise novel implementation strategies which allow us to overcome memory and computational challenges.
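The appeal of the G-CRF is that, for an energy E(x) = ½ xᵀAx − bᵀx with A symmetric positive definite, exact MAP inference reduces to solving the linear system Ax = b, for instance by conjugate gradients. A toy sketch, with random stand-ins for the CNN-predicted unary and pairwise terms:

import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n = 50
P = rng.normal(size=(n, n))
A = P @ P.T + n * np.eye(n)          # SPD "pairwise" matrix (toy values)
b = rng.normal(size=n)               # "unary" scores from the CNN (toy values)

x_map, info = cg(A, b)               # exact MAP in the limit of convergence
print(info, np.linalg.norm(A @ x_map - b))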
Tasar, Onur. "Des images satellites aux cartes vectorielles". Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4063.
Texto completoWith the help of significant technological developments over the years, it has become possible to collect massive amounts of remote sensing data. For example, constellations of various satellites are able to capture large amounts of remote sensing images with high spatial resolution as well as rich spectral information over the globe. The availability of such a huge volume of data has opened the door to numerous applications and raised many challenges. Among these challenges, automatically generating accurate maps has become one of the most interesting and long-standing problems, since it is a crucial process for a wide range of applications in domains such as urban monitoring and management, precision agriculture, autonomous driving, and navigation. This thesis seeks to develop novel approaches to generate vector maps from remote sensing images. To this end, we split the task into two sub-stages. The former consists in generating raster maps from remote sensing images by performing pixel-wise classification using advanced deep learning techniques. The latter aims at converting raster maps to vector ones by leveraging computational geometry approaches. This thesis addresses the challenges commonly encountered within both stages. Although previous research has shown that convolutional neural networks (CNNs) are able to generate excellent maps when training data are representative of test data, their performance drops significantly when there is a large distribution difference between training and test images. In the first stage of our pipeline, we mainly aim at overcoming the limited generalization abilities of CNNs to perform large-scale classification. We also explore a way of leveraging multiple datasets collected at different times, with annotations for separate classes, to train CNNs that can generate maps for all the classes. In the second part, we propose a method that vectorizes raster maps to integrate them into geographic information system applications, which completes our processing pipeline. Throughout this thesis, we experiment on a large number of very high resolution satellite and aerial images. Our experiments demonstrate the robustness and scalability of the proposed methods.
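The raster-to-vector stage can be illustrated with off-the-shelf contour tracing and polygon simplification (a stand-in sketch; the thesis uses its own computational-geometry pipeline):

import numpy as np
from skimage import measure

raster = np.zeros((64, 64))
raster[10:30, 12:40] = 1.0                       # toy "building" prediction
contours = measure.find_contours(raster, 0.5)    # sub-pixel boundary tracing
polygons = [measure.approximate_polygon(c, tolerance=1.5) for c in contours]
print(len(polygons), polygons[0].shape)          # one simplified polygon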
Carrara, Nicolas. "Reinforcement learning for dialogue systems optimization with user adaptation". Thesis, Lille 1, 2019. http://www.theses.fr/2019LIL1I071/document.
Texto completoThe most powerful artificial intelligence systems are now based on learned statistical models. In order to build efficient models, these systems must collect a huge amount of data on their environment. Personal assistants, smart homes, voice servers and other dialogue applications are no exception to this statement. A specificity of those systems is that they are designed to interact with humans, and as a consequence, their training data has to be collected from interactions with these humans. As the number of interactions with a single person is often too small to train a proper model, the usual approach to maximise the amount of data consists in mixing data collected from different users into a single corpus. However, one limitation of this approach is that, by construction, the trained models are only efficient with an "average" human and do not include any sort of adaptation; this lack of adaptation makes the service unusable for some specific groups of people and leads to a restricted customer base and inclusiveness problems. This thesis proposes solutions to construct Dialogue Systems that are robust to this problem by combining Transfer Learning and Reinforcement Learning. It explores two main ideas. The first idea consists in incorporating adaptation in the very first dialogues with a new user. To that end, we use the knowledge gathered with previous users. But how can such systems scale with a growing database of user interactions? The first proposed approach involves clustering of Dialogue Systems (each tailored to its respective user) based on their behaviours. We demonstrated, through experiments with handcrafted and real user models, how this method improves dialogue quality for new and unknown users. The second approach extends the Deep Q-learning algorithm with a continuous transfer process. The second idea states that before using a dedicated Dialogue System, the first interactions with a user should be handled carefully by a safe Dialogue System common to all users. The underlying approach is divided into two steps. The first step consists in learning a safe strategy through Reinforcement Learning. To that end, we introduce a budgeted Reinforcement Learning framework for continuous state spaces, together with the corresponding extensions of classic Reinforcement Learning algorithms. In particular, the safe version of the Fitted-Q algorithm has been validated, in terms of safety and efficiency, on a dialogue system task and an autonomous driving problem. The second step consists in using those safe strategies when facing new users; this method is an extension of the classic ε-greedy algorithm.
Ujjwal, Ujjwal. "Gestion du compromis vitesse-précision dans les systèmes de détection de piétons basés sur apprentissage profond". Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4087.
Texto completoThe main objective of this thesis is to improve the detection performance of deep learning based pedestrian detection systems without sacrificing detection speed. Detection speed and accuracy are traditionally known to trade off against one another. Thus, this thesis aims to handle this trade-off in a way that amounts to faster and better pedestrian detection. To achieve this, we first conduct a systematic quantitative analysis of various deep learning techniques with respect to pedestrian detection. This analysis allows us to identify the optimal configuration of the various deep learning components of a pedestrian detection pipeline. We then consider the important question of convolutional layer selection for pedestrian detection and propose a pedestrian detection system called Multiple-RPN, which utilizes multiple convolutional layers simultaneously. We propose Multiple-RPN in two configurations, early-fused and late-fused, and go on to demonstrate that early fusion is a better approach than late fusion for detection across pedestrian scales and occlusion levels. This work furthermore provides a quantitative demonstration of the selectivity of various convolutional layers to pedestrian scale and occlusion levels. We next integrate the early fusion approach with pseudo-semantic segmentation to reduce the number of processing operations. In this approach, pseudo-semantic segmentation is shown to reduce false positives and false negatives. This, coupled with the reduced number of processing operations, results in simultaneously improved detection performance and speed (~20 fps), performing at state-of-the-art level on the caltech-reasonable (3.79% miss rate) and citypersons (7.19% miss rate) datasets. The final contribution of this thesis is an anchor classification layer, which further reduces the number of processing operations for detection. The result is a doubling of detection speed (~40 fps) with a minimal loss in detection performance (3.99% and 8.12% miss rate on the caltech-reasonable and citypersons datasets, respectively), which is still at the state-of-the-art level.
Matte, Olivier. "Cartographie des forêts à haute valeur de stockage de carbone par apprentissage profond sur l’île de Bornéo". Master's thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/66791.
Texto completoForests in Southeast Asia are under heavy pressure from extensive land-use activities, including oil palm plantations. The desire to protect and manage habitats with high carbon storage potential has increased the need for preserving the unique ecosystems of local forests. To preserve tropical forest ecosystems from agricultural expansion, a methodology for classifying forests with high carbon storage potential, known as the High Carbon Stock Approach (HCSA) was developed. Our research goal is to assess the effectiveness of the combined use of airborne LiDAR and deep learning for HCSA classification across the island of Borneo. To do this, we will examine the above-ground biomass using the equation developed by Asner (2018) and Jucker (2017), established in the Sabah territory, as well as LiDAR metrics such as canopy height, canopy cover, and the forest basal area. LiDAR metrics of forest structure will also be used to try to differentiate HCS classes. LiDAR data and field surveys were collected from the Jet Propulsion Laboratory (JPL -NASA). The area of interest for this study covers part of the Kalimantan territory (Indonesian part of Borneo). The data collected has been part of the ongoing Carbon Monitoring System (CMS) project. Then, the training of a deep learning algorithm will allow, by the use of satellite images (Landsat 7 and Landsat 8), to make a spatial and temporal jump, in order to establish a cartography of the forests to be monitored in 2019 and on the entirety of Borneo Island.
Gontier, Félix. "Analyse et synthèse de scènes sonores urbaines par approches d'apprentissage profond". Thesis, Ecole centrale de Nantes, 2020. http://www.theses.fr/2020ECDN0042.
Texto completoThe advent of the Internet of Things (IoT) has enabled the development of large-scale acoustic sensor networks to continuously monitor sound environments in urban areas. In the soundscape approach, perceptual quality attributes are associated with the activity of sound sources, quantities of importance to better account for the human perception of an acoustic environment. With their recent success in acoustic scene analysis, deep learning approaches are uniquely suited to predict these quantities. However, the annotations necessary for the training of supervised deep learning models are not easily obtainable, partly because the information content of sensor measurements is limited by privacy constraints. To address this issue, a method is proposed for the automatic annotation of perceived source activity in large datasets of simulated acoustic scenes. On simulated data, trained deep learning models achieve state-of-the-art performance in the estimation of source-specific perceptual attributes and sound pleasantness. Semi-supervised transfer learning techniques are further studied to improve the adaptability of trained models by exploiting knowledge from large amounts of unlabelled sensor data. Evaluations on annotated in situ recordings show that learning latent audio representations of sensor measurements compensates for the limited ecological validity of simulated sound scenes. In a second part, the use of deep learning methods for the synthesis of time-domain signals from privacy-aware sensor measurements is investigated. Two spectral convolutional approaches are developed and evaluated against state-of-the-art methods designed for speech synthesis.
Montoya-Obeso, Abraham. "Reconnaissance du patrimoine Mexicaine sous forme numérique par des réseaux d'apprentissage profond". Thesis, Bordeaux, 2020. http://www.theses.fr/2020BORD0064.
Texto completoIn Mexico, one of the priority technological problems is the preservation of cultural heritage in its digital form. In this research, the main interest is the ordering, management and identification of intangible cultural heritage in images. In computer vision, the integration of the Human Visual System (HVS) into automatic learning methods and classifiers has become an intensive research field for object recognition and content mining. So-called saliency maps are defined as a topographic representation of visual attention on a scene, modeling attention instantaneously and assigning a degree of interest to each pixel of the image. Saliency maps have proven very efficient at pointing out regions of interest in several visual content understanding tasks. In this context, we focus on the integration of visual attention models into the training pipeline of Deep Neural Networks (DNNs) for the recognition of Mexican architectural structures. The main contributions of this research lie in the following areas:
• Specific-purpose dataset: gathering data related to the topic is a key task in solving the architectural classification problem.
• Data selection: we use saliency prediction methods to select and crop context-relevant regions of images.
• Visual attention modeling: we annotate images through a real image-observation task, recording eye fixations with an eye-tracker system to build subjective saliency maps.
• Visual attention integration: we integrate visual attention into deep neural networks in two ways: i) to filter features in a saliency-based pooling layer (a minimal sketch is given below), and ii) in attention mechanisms.
In this research, different components essential to the training of a neural network are tackled with the aim of recognizing Mexican cultural content and extrapolating these findings to large-scale databases and similar classification tasks, such as ImageNet. Finally, we show that integrating visual attention models generated through a psycho-visual experiment reduces training time and improves accuracy.
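A minimal sketch of the saliency-based pooling layer mentioned in the contributions, with assumed shapes and normalization:

import torch

def saliency_weighted_pool(features, saliency):
    # features: (B, C, H, W); saliency: (B, 1, H, W), non-negative
    # Average feature vectors with weights from the normalized saliency map
    # instead of uniform pooling.
    w = saliency / saliency.sum(dim=(2, 3), keepdim=True).clamp_min(1e-8)
    return (features * w).sum(dim=(2, 3))        # (B, C) pooled descriptor

feats = torch.randn(2, 128, 14, 14)
sal = torch.rand(2, 1, 14, 14)                   # e.g. a predicted saliency map
print(saliency_weighted_pool(feats, sal).shape)  # torch.Size([2, 128])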