Doctoral dissertations on the topic “Réseaux neuronaux convolutifs (CNN)”
Browse the top 47 doctoral dissertations on the topic “Réseaux neuronaux convolutifs (CNN)”.
Fernandez, Brillet Lucas. "Réseaux de neurones CNN pour la vision embarquée". Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM043.
Recently, Convolutional Neural Networks (CNNs) have become the state-of-the-art (SOA) solution to most computer vision problems. In order to achieve high accuracy rates, CNNs require a high parameter count, as well as a high number of operations. This greatly complicates the deployment of such solutions in embedded systems, which strive to reduce memory size. Indeed, while most embedded systems typically have only a few KBytes of memory, SOA CNN models usually account for multiple MBytes, or even GBytes, in model size. Throughout this thesis, multiple novel ideas to ease this issue are proposed. This requires jointly designing the solution across three main axes: Application, Algorithm and Hardware. In this manuscript, the main levers for tailoring the computational complexity of a generic CNN-based object detector are identified and studied. Since object detection requires scanning every possible location and scale across an image through a fixed-input CNN classifier, the number of operations quickly grows for high-resolution images. In order to perform object detection efficiently, the detection process is divided into two stages. The first stage involves a region proposal network, which makes it possible to trade off recall against the number of operations required to perform the search, as well as the number of regions passed on to the next stage. Techniques such as bounding box regression also greatly help reduce the dimension of the search space. This in turn simplifies the second stage, since it reduces the task's complexity to the set of possible proposals. Therefore, parameter counts can be greatly reduced. Furthermore, CNNs also exhibit properties that confirm their over-dimensioning. This over-dimensioning is one of the key success factors of CNNs in practice, since it eases the optimization process by allowing a large set of equivalent solutions.
However, this also greatly increases computational complexity, and therefore complicates deploying the inference stage of these algorithms on embedded systems. In order to ease this problem, we propose a CNN compression method based on Principal Component Analysis (PCA). PCA finds, for each layer of the network independently, a new representation of the set of learned filters by expressing them in a more appropriate PCA basis. This PCA basis is hierarchical, meaning that basis terms are ordered by importance; by removing the least important basis terms, it is possible to optimally trade off approximation error against parameter count. Through this method, it is possible to compress, for example, a ResNet-32 network by a factor of ×2 in both the number of parameters and the number of operations with an accuracy loss of less than 2%. It is also shown that the proposed method is compatible with other SOA methods that exploit other CNN properties in order to reduce computational complexity, mainly pruning, Winograd and quantization. Through this method, we have been able to reduce the size of a ResNet-110 from 6.88 MBytes to 370 kBytes, i.e. a ×19 memory gain with a 3.9% accuracy loss. All this knowledge is applied in order to achieve an efficient CNN-based solution for a consumer face detection scenario. The proposed solution has a model size of just 29.3 kBytes. This is ×65 smaller than other SOA CNN face detectors, while providing equal detection performance and a lower number of operations. Our face detector is also compared to a more traditional Viola-Jones face detector, exhibiting approximately an order of magnitude faster computation, as well as the ability to scale to higher detection rates by slightly increasing computational complexity. Both networks are finally implemented in a custom embedded multiprocessor, verifying that theoretical and measured gains from PCA are consistent.
Furthermore, parallelizing the PCA-compressed network over 8 PEs achieves a ×11.68 speed-up with respect to the original network running on a single PE.
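The trade-off between retained basis terms and parameter count described in the abstract above can be illustrated with a small counting sketch. It assumes, which the abstract does not state, that a layer's filters are stored as `rank` basis filters plus a coefficient matrix; the function name `pca_layer_params` and the layer sizes are hypothetical. The last lines only re-check the memory ratio quoted in the abstract.

```python
def pca_layer_params(n_filters, in_channels, kernel_size, rank):
    """Parameter counts before/after keeping `rank` PCA basis filters.

    Assumed storage scheme (illustrative, not taken from the thesis):
    `rank` basis filters of size D plus an n_filters x rank coefficient matrix.
    """
    d = in_channels * kernel_size * kernel_size   # weights per flattened filter
    original = n_filters * d
    compressed = rank * d + n_filters * rank
    return original, compressed


# Hypothetical 3x3 layer with 64 input and 64 output channels, 28 basis terms kept.
original, compressed = pca_layer_params(64, 64, 3, 28)
layer_ratio = original / compressed               # a little above 2

# Memory figures quoted in the abstract: 6.88 MBytes -> 370 kBytes (decimal units assumed).
memory_gain = 6.88e6 / 370e3
print(f"layer compression: x{layer_ratio:.2f}, quoted memory gain: x{memory_gain:.1f}")
```

Under this assumed scheme the saving depends on how small `rank` is relative to both the filter count and the flattened filter size, so the achievable factor would differ from layer to layer.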
Deramgozin, Mohammadmahdi. "Développement de modèles de reconnaissance des expressions faciales à base d’apprentissage profond pour les applications embarquées". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0286.
The field of Facial Emotion Recognition (FER) is pivotal in advancing human-machine interactions and finds essential applications in healthcare for conditions like depression and anxiety. Leveraging Convolutional Neural Networks (CNNs), this thesis presents a progression of models aimed at optimizing emotion detection and interpretation. The initial model is resource-frugal but competes favorably with state-of-the-art solutions, making it a strong candidate for embedded systems constrained in computational and memory resources. To capture the complexity and ambiguity of human emotions, the research work presented in this thesis enhances this CNN-based foundational model by incorporating facial Action Units (AUs). This approach not only refines emotion detection but also provides interpretability by identifying specific AUs tied to each emotion. Further sophistication is achieved by introducing neural attention mechanisms, both spatial and channel-based, which improve the model's focus on salient facial features. This makes the CNN-based model well adapted to real-world scenarios, such as partially obscured or subtle facial expressions. Based on the previous results, we finally propose an optimized yet computationally efficient CNN model that is well suited to resource-limited environments like embedded systems. While it provides a robust solution for FER, this research also identifies perspectives for future work, such as real-time applications and advanced techniques for model interpretability.
Abidi, Azza. "Investigating Deep Learning and Image-Encoded Time Series Approaches for Multi-Scale Remote Sensing Analysis in the context of Land Use/Land Cover Mapping". Electronic Thesis or Diss., Université de Montpellier (2022-....), 2024. http://www.theses.fr/2024UMONS007.
In this thesis, the potential of machine learning (ML) in enhancing the mapping of complex Land Use and Land Cover (LULC) patterns using Earth Observation data is explored. Traditionally, mapping methods relied on manual and time-consuming classification and interpretation of satellite images, which are susceptible to human error. However, the application of ML, particularly through neural networks, has automated and improved the classification process, resulting in more objective and accurate results. Additionally, the integration of Satellite Image Time Series (SITS) data adds a temporal dimension to spatial information, offering a dynamic view of the Earth's surface over time. This temporal information is crucial for accurate classification and informed decision-making in various applications. The precise and current LULC information derived from SITS data is essential for guiding sustainable development initiatives, resource management, and mitigating environmental risks. The LULC mapping process using ML involves data collection, preprocessing, feature extraction, and classification using various ML algorithms. Two main classification strategies for SITS data have been proposed: pixel-level and object-based approaches. While both approaches have shown effectiveness, they also pose challenges, such as the inability to capture contextual information in pixel-based approaches and the complexity of segmentation in object-based approaches. To address these challenges, this thesis aims to implement a method based on multi-scale information to perform LULC classification, coupling spectral and temporal information through a combined pixel-object methodology, and to apply a methodological approach that efficiently represents multivariate SITS data so that the large body of research advances in computer vision can be reused.
Antipov, Grigory. "Apprentissage profond pour la description sémantique des traits visuels humains". Electronic Thesis or Diss., Paris, ENST, 2017. http://www.theses.fr/2017ENST0071.
The recent progress in artificial neural networks (rebranded as deep learning) has significantly boosted the state of the art in numerous domains of computer vision. In this PhD study, we explore how deep learning techniques can help in the analysis of gender and age from a human face. In particular, two complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes. Firstly, we conduct a comprehensive study which results in an empirical formulation of a set of principles for optimal design and training of gender recognition and age estimation Convolutional Neural Networks (CNNs). As a result, we obtain state-of-the-art CNNs for gender/age prediction according to the three most popular benchmarks, and win an international competition on apparent age estimation. On a very challenging internal dataset, our best models reach a gender classification accuracy of 98.7% and an average age estimation error of 4.26 years. In order to address the problem of synthesis and editing of human faces, we design and train GA-cGAN, the first Generative Adversarial Network (GAN) which can generate synthetic faces of high visual fidelity within required gender and age categories. Moreover, we propose a novel method which allows employing GA-cGAN for gender swapping and aging/rejuvenation without losing the original identity in synthetic faces. Finally, in order to show the practical interest of the designed face editing method, we apply it to improve the accuracy of off-the-shelf face verification software in a cross-age evaluation scenario.
Garbay, Thomas. "Zip-CNN". Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS210.pdf.
Digital systems used for the Internet of Things (IoT) and embedded systems have seen increasing use in recent decades. Embedded systems based on Microcontroller Units (MCUs) solve various problems by collecting a lot of data. Today, about 250 billion MCUs are in use, and projections for the coming years point to very strong growth. Artificial intelligence saw a resurgence of interest in 2012. The use of Convolutional Neural Networks (CNNs) has helped to solve many problems in computer vision and natural language processing. Implementing CNNs within embedded systems would greatly improve the exploitation of the collected data. However, the inference cost of a CNN makes its implementation within embedded systems challenging. This thesis focuses on exploring the solution space in order to assist the implementation of CNNs within embedded systems based on microcontrollers. For this purpose, the ZIP-CNN methodology is defined. It takes into account the embedded system and the CNN to be implemented, and provides an embedded designer with information regarding the impact of the CNN inference on the system. A designer can explore the impact of design choices, with the objective of respecting the constraints of the targeted application. A model is defined to quantitatively estimate the latency, the energy consumption and the memory space required to infer a CNN on an embedded target, whatever the topology of the CNN. This model takes into account algorithmic reductions such as knowledge distillation, pruning or quantization. Implementing state-of-the-art CNNs on MCUs verified the accuracy of the different estimations through an experimental process. This thesis democratizes the implementation of CNNs on MCUs, assisting the designers of embedded systems. Moreover, the results open a way of exploration to apply the developed models to other target hardware, such as multi-core architectures or FPGAs.
The estimation results can also be exploited in Neural Architecture Search (NAS).
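To make the idea of estimating latency, energy and memory from a CNN topology more concrete, here is a first-order sketch. It is not the ZIP-CNN model: the formulas (latency proportional to multiply-accumulate count, energy as power times latency, memory as weight count times weight width), the class `ConvLayer`, the function `estimate` and all numeric values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    in_channels: int
    out_channels: int
    kernel_size: int
    out_height: int
    out_width: int

    def macs(self):
        # One multiply-accumulate per kernel weight per output element.
        return (self.out_height * self.out_width * self.out_channels
                * self.in_channels * self.kernel_size ** 2)

    def params(self):
        # Weights plus one bias per output channel.
        return self.out_channels * (self.in_channels * self.kernel_size ** 2 + 1)


def estimate(layers, clock_hz, cycles_per_mac, active_power_w, bytes_per_weight):
    """First-order latency (s), energy (J) and weight memory (bytes)."""
    total_macs = sum(layer.macs() for layer in layers)
    latency_s = total_macs * cycles_per_mac / clock_hz
    energy_j = active_power_w * latency_s
    memory_bytes = sum(layer.params() for layer in layers) * bytes_per_weight
    return latency_s, energy_j, memory_bytes


# Hypothetical two-layer network on a hypothetical 80 MHz MCU with 8-bit weights.
net = [ConvLayer(3, 8, 3, 32, 32), ConvLayer(8, 16, 3, 16, 16)]
latency, energy, memory = estimate(net, clock_hz=80e6, cycles_per_mac=2,
                                   active_power_w=0.05, bytes_per_weight=1)
print(f"{latency * 1e3:.2f} ms, {energy * 1e3:.3f} mJ, {memory} bytes")
```

In a sketch like this, quantization would show up as a smaller `bytes_per_weight` and pruning as fewer channels per layer; a real estimator would also need to account for activation buffers and memory access costs.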
Fourure, Damien. "Réseaux de neurones convolutifs pour la segmentation sémantique et l'apprentissage d'invariants de couleur". Thesis, Lyon, 2017. http://www.theses.fr/2017LYSES056/document.
Computer vision is an interdisciplinary field that investigates how computers can gain a high level of understanding from digital images or videos. In artificial intelligence, and more precisely in machine learning, the field in which this thesis is positioned, computer vision involves extracting characteristics from images and then generalizing concepts related to these characteristics. This field of research has become very popular in recent years, particularly thanks to the results of the convolutional neural networks that form the basis of so-called deep learning methods. Today, neural networks make it possible, among other things, to recognize different objects present in an image, to generate very realistic images or even to beat the champions at the game of Go. Their performance is not limited to the image domain, since they are also used in other fields such as natural language processing (e.g. machine translation) or sound recognition. In this thesis, we study convolutional neural networks in order to develop specialized architectures and loss functions for low-level tasks (color constancy) as well as high-level tasks (semantic segmentation). Color constancy is the ability of the human visual system to perceive constant colors for a surface despite changes in the spectrum of illumination (lighting change). In computer vision, the main approach consists of estimating the color of the illuminant and then suppressing its impact on the perceived color of objects. We approach the task of color constancy with neural networks by developing a new architecture composed of a subsampling operator inspired by traditional methods. Our experiments show that our method achieves performance competitive with the state of the art. Nevertheless, our architecture requires a large amount of training data.
In order to partially correct this problem and improve the training of neural networks, we present several techniques for artificial data augmentation. We also make two contributions on a high-level issue: semantic segmentation. This task, which consists of assigning a semantic class to each pixel of an image, is a challenge in computer vision because of its complexity. On the one hand, it requires many training examples that are costly to obtain. On the other hand, it requires the adaptation of traditional convolutional neural networks in order to obtain a so-called dense prediction, i.e., a prediction for each pixel present in the input image. To solve the difficulty of acquiring training data, we propose an approach that uses several databases annotated with different labels at the same time. To do this, we define a selective loss function that has the advantage of allowing the training of a convolutional neural network on data from multiple databases. We also developed a self-context approach that captures the correlations between labels in different databases. Finally, we present our third contribution: a new convolutional neural network architecture called GridNet, specialized for semantic segmentation. Unlike traditional networks, implemented with a single path from the input (image) to the output (prediction), our architecture is implemented as a 2D grid allowing several interconnected streams to operate at different resolutions. In order to exploit all the paths of the grid, we propose a technique inspired by dropout. In addition, we empirically demonstrate that our architecture generalizes many well-known state-of-the-art networks. We conclude with an analysis of the empirical results obtained with our architecture which, although trained from scratch, shows very good performance, exceeding popular approaches that are often pre-trained.
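One plausible reading of a selective loss for databases annotated with different labels is a cross-entropy whose softmax is restricted to the labels that exist in the sample's source database. The sketch below follows that reading as an assumption; the exact formulation in the thesis may differ, and the function name `selective_cross_entropy` and the label sets are hypothetical.

```python
import math


def selective_cross_entropy(logits, target, active_labels):
    """Cross-entropy with the softmax restricted to `active_labels`.

    logits: list of scores, one per label in the union of all label sets.
    target: index of the ground-truth label (must be in active_labels).
    active_labels: indices of labels annotated in the sample's source dataset.
    """
    if target not in active_labels:
        raise ValueError("target label is not annotated in this dataset")
    # Log-sum-exp over active labels only, shifted for numerical stability.
    peak = max(logits[i] for i in active_labels)
    log_norm = peak + math.log(sum(math.exp(logits[i] - peak) for i in active_labels))
    return log_norm - logits[target]


# Hypothetical union of 4 labels; dataset A annotates {0, 1, 2}, dataset B {2, 3}.
logits = [2.0, 0.5, 1.0, 3.0]
loss_a = selective_cross_entropy(logits, target=0, active_labels=[0, 1, 2])
loss_b = selective_cross_entropy(logits, target=3, active_labels=[2, 3])
```

With this restriction, a high score for a label that the source database never annotates (label 3 for a sample from dataset A here) does not penalize the network.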
Suzano, Massa Francisco Vitor. "Mise en relation d'images et de modèles 3D avec des réseaux de neurones convolutifs". Thesis, Paris Est, 2017. http://www.theses.fr/2017PESC1198/document.
The recent availability of large catalogs of 3D models enables new possibilities for 3D reasoning on photographs. This thesis investigates the use of convolutional neural networks (CNNs) for relating 3D objects to 2D images. We first introduce two contributions that are used throughout this thesis: an automatic memory reduction library for deep CNNs, and a study of CNN features for cross-domain matching. In the first one, we develop a library built on top of Torch7 which automatically reduces the memory requirements for deploying a deep CNN by up to 91%. As a second point, we study the effectiveness of various CNN features extracted from a pre-trained network in the case of images from different modalities (real or synthetic images). We show that despite the large cross-domain difference between rendered views and photographs, it is possible to use some of these features for instance retrieval, with possible applications to image-based rendering. CNNs have recently been used for the task of object viewpoint estimation, sometimes with very different design choices. We present these approaches in a unified framework and analyse the key factors that affect performance. We propose a joint training method that combines both detection and viewpoint estimation, which performs better than considering the viewpoint estimation separately. We also study the impact of formulating viewpoint estimation either as a discrete or a continuous task, we quantify the benefits of deeper architectures, and we demonstrate that using synthetic data is beneficial. With all these elements combined, we improve over previous state-of-the-art results on the Pascal3D+ dataset by approximately 5% of mean average viewpoint precision. In the instance retrieval study, the image of the object is given and the goal is to identify which object it is among a number of 3D models.
We extend this work to object detection, where instead we are given a 3D model (or a set of 3D models) and are asked to locate and align the model in the image. We show that simply using CNN features is not enough for this task, and we propose to learn a transformation that brings the features from the real images close to the features from the rendered views. We evaluate our approach both qualitatively and quantitatively on two standard datasets, the IKEA object dataset and a subset of the Pascal VOC 2012 dataset for the chair category, and we show state-of-the-art results on both of them.
Groueix, Thibault. "Learning 3D Generation and Matching". Thesis, Paris Est, 2020. http://www.theses.fr/2020PESC1024.
The goal of this thesis is to develop deep learning approaches to model and analyse 3D shapes. Progress in this field could democratize artistic creation of 3D assets, which currently requires time and expert skills with technical software. We focus on the design of deep learning solutions for two particular tasks that are key to many 3D modeling applications: single-view reconstruction and shape matching. A single-view reconstruction (SVR) method takes as input a single image and predicts the physical world which produced that image. SVR dates back to the early days of computer vision. In particular, in the 1960s, Lawrence G. Roberts proposed to align simple 3D primitives to the input image under the assumption that the physical world is made of cuboids. Another approach, proposed by Berthold Horn in the 1970s, is to decompose the input image into intrinsic images and use those to predict the depth of every input pixel. Since several configurations of shapes, texture and illumination can explain the same image, both approaches need to form assumptions on the distribution of images and 3D shapes to resolve the ambiguity. In this thesis, we learn these assumptions from large-scale datasets instead of manually designing them. Learning allows us to perform complete object reconstruction, including parts which are not visible in the input image. Shape matching aims at finding correspondences between 3D objects. Solving this task requires both a local and a global understanding of 3D shapes, which is hard to achieve explicitly.
Instead, we train neural networks on large-scale datasets to solve this task and capture this knowledge implicitly through their internal parameters. Shape matching supports many 3D modeling applications such as attribute transfer, automatic rigging for animation, or mesh editing. The first technical contribution of this thesis is a new parametric representation of 3D surfaces modeled by neural networks. The choice of data representation is a critical aspect of any 3D reconstruction algorithm. Until recently, most approaches in deep 3D model generation predicted volumetric voxel grids or point clouds, which are discrete representations. Instead, we present an alternative approach that predicts a parametric surface deformation, i.e., a mapping from a template to a target geometry. To demonstrate the benefits of such a representation, we train a deep encoder-decoder for single-view reconstruction using our new representation. Our approach, dubbed AtlasNet, is the first deep single-view reconstruction approach able to reconstruct meshes from images without relying on independent post-processing, and can do so at arbitrary resolution without memory issues. A more detailed analysis of AtlasNet reveals that it also generalizes better to categories it has not been trained on than other deep 3D generation approaches. Our second main contribution is a novel shape matching approach purely based on reconstruction via deformations. We show that the quality of the shape reconstructions is critical to obtaining good correspondences, and therefore introduce a test-time optimization scheme to refine the learned deformations. For humans and other deformable shape categories deviating by a near-isometry, our approach can leverage a shape template and isometric regularization of the surface deformations.
As categories exhibiting non-isometric variations, such as chairs, do not have a clear template, we learn how to deform any shape into any other and leverage cycle-consistency constraints to learn meaningful correspondences. Our reconstruction-for-matching strategy operates directly on point clouds, is robust to many types of perturbations, and outperforms the state of the art by 15% on dense matching of real human scans.
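The idea of a parametric surface, a continuous mapping from a template to a target geometry that can be sampled at any resolution, can be sketched without any learning. In the sketch below the learned network is replaced by a hand-written map from the unit square to the unit sphere; the names `deform` and `sample_surface` are hypothetical, and nothing here reproduces the AtlasNet architecture.

```python
import math


def deform(u, v):
    """Hand-written stand-in for a learned map from the unit square to 3D.

    Here the square is wrapped onto the unit sphere; in a learned model this
    function would be a neural network conditioned on a shape code.
    """
    theta = math.pi * u          # polar angle in [0, pi]
    phi = 2.0 * math.pi * v      # azimuth in [0, 2*pi]
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


def sample_surface(resolution):
    """Evaluate the map on a resolution x resolution grid of the template."""
    steps = [i / (resolution - 1) for i in range(resolution)]
    return [deform(u, v) for u in steps for v in steps]


coarse = sample_surface(4)     # 16 points
fine = sample_surface(64)      # 4096 points from the same continuous map
```

Because the surface is defined by a function rather than a fixed grid, the number of sampled points is chosen at evaluation time, which is the property that distinguishes it from voxel or fixed-size point-cloud outputs.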
Beltzung, Benjamin. "Utilisation de réseaux de neurones convolutifs pour mieux comprendre l’évolution et le développement du comportement de dessin chez les Hominidés". Electronic Thesis or Diss., Strasbourg, 2023. http://www.theses.fr/2023STRAJ114.
The study of drawing behavior can be highly informative, both cognitively and psychologically, in humans and other primates. However, this wealth of information can also be a challenge for analysis and interpretation, particularly in the absence of explanation or verbalization by the author of the drawing. Indeed, an adult's interpretation of a drawing may not be in line with the artist's original intention. During my thesis, I showed that, although generally regarded as black boxes, convolutional neural networks (CNNs) can provide a better understanding of drawing behavior. First, I used a CNN to classify the drawings of a female orangutan according to their season of production, highlighting variation in style and content. In addition, an ontogenetic approach was considered to quantify the similarity between productions from different age groups. In the future, more interpretable models and new interpretability methods could be applied to better decipher drawing behavior.
Chabot, Florian. "Analyse fine 2D/3D de véhicules par réseaux de neurones profonds". Thesis, Université Clermont Auvergne (2017-2020), 2017. http://www.theses.fr/2017CLFAC018/document.
In this thesis, we are interested in the fine-grained analysis of vehicles from an image. We define fine-grained analysis as the following tasks: vehicle detection in the image, vehicle viewpoint (or orientation) estimation, vehicle visibility characterization, vehicle 3D localization, and make and model recognition. The design of reliable solutions for fine-grained vehicle analysis opens the door to multiple applications, in particular for intelligent transport systems as well as video surveillance systems. In this work, we propose several contributions that address this issue partially or wholly. The proposed approaches jointly rely on deep learning technologies and 3D models. In the first section, we deal with make and model classification, keeping in mind the difficulty of creating training data. In the second section, we investigate a novel method for both vehicle detection and fine-grained viewpoint estimation based on local appearance features and geometric spatial coherence. It uses models learned only on synthetic data. Finally, in the third section, a complete system for fine-grained analysis is proposed, based on the multi-task concept. Throughout this report, we provide quantitative and qualitative results. On several aspects of fine-grained vehicle analysis, this work outperforms state-of-the-art methods.
Mabon, Jules. "Apprentissage de modèles de géométrie stochastique et réseaux de neurones convolutifs. Application à la détection d'objets multiples dans des jeux de données aérospatiales". Electronic Thesis or Diss., Université Côte d'Azur, 2023. http://www.theses.fr/2023COAZ4116.
Unmanned aerial vehicles and low-orbit satellites, including CubeSats, are increasingly used for wide-area surveillance, generating substantial data for processing. Satellite imagery acquisition is susceptible to atmospheric disruptions, occlusions, and limited resolution, resulting in limited visual data for small object detection. However, the objects of interest (e.g., small vehicles) are unevenly distributed in the image: there are some priors on the structure of the configurations. In recent years, convolutional neural network (CNN) models have excelled at extracting information from images, especially texture details. Yet, modeling object interactions requires a significant increase in model complexity and parameters. CNN models generally treat interaction as a post-processing step. In contrast, point processes aim to simultaneously model each point's likelihood in relation to the image (data term) and their interactions (prior term). Most point process models rely on contrast measures (foreground vs. background) for their data terms, which work well with clearly contrasted objects and minimal background clutter. However, small vehicles in satellite images exhibit varying contrast levels and a diverse range of background and false alarm objects. In this PhD thesis, we propose harnessing the information extraction abilities of CNN models in combination with point process interaction models, using CNN outputs as data terms. Additionally, we introduce a unified method for estimating point process model parameters. Our model demonstrates excellent performance on multiple remote sensing datasets, providing geometric regularization and enhanced noise robustness, all with a minimal parameter footprint.
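The combination of a data term and a prior term described in the abstract above can be sketched as an energy over a configuration of points. The specific form used here (negative log of a score standing in for a CNN output, plus a fixed penalty for pairs that are too close) is an assumption for illustration, not the model of the thesis; `configuration_energy` and the score map are hypothetical.

```python
import math


def configuration_energy(points, score_map, min_distance, overlap_penalty):
    """Energy of a set of (row, col) points: lower is better.

    Data term: negative log of a per-pixel detection score in (0, 1], standing
    in for a CNN output. Prior term: a fixed penalty for every pair of points
    closer than `min_distance`.
    """
    data_term = sum(-math.log(score_map[r][c]) for r, c in points)
    prior_term = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < min_distance:
                prior_term += overlap_penalty
    return data_term + prior_term


# Hypothetical 3x4 score map with a few strong responses.
scores = [[0.1, 0.1, 0.1, 0.1],
          [0.1, 0.9, 0.8, 0.1],
          [0.1, 0.1, 0.1, 0.9]]
separated = configuration_energy([(1, 1), (2, 3)], scores, 1.5, 5.0)
crowded = configuration_energy([(1, 1), (1, 2)], scores, 1.5, 5.0)
```

Here the crowded configuration has good image evidence for both points but is still assigned a higher energy, which is how a prior term can regularize detections that a per-pixel score alone would accept.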
Plesse, François. "Intégration de Connaissances aux Modèles Neuronaux pour la Détection de Relations Visuelles Rares". Thesis, Paris Est, 2020. http://www.theses.fr/2020PESC1003.
Data shared throughout the world has a major impact on the lives of billions of people. It is critical to be able to analyse this data automatically in order to measure and alter its impact. This analysis is tackled by training deep neural networks, which have reached competitive results in many domains. In this work, we focus on the understanding of daily life images, in particular on the interactions between objects and people that are visible in images, which we call visual relations. To complete this task, neural networks are trained in a supervised manner. This involves minimizing an objective function that quantifies how detected relations differ from annotated ones. The performance of these models thus depends on how widely and accurately annotations cover the space of visual relations. However, existing annotations are not sufficient to train neural networks to detect uncommon relations. Thus we integrate knowledge into neural networks during the training phase. To do this, we model semantic relationships between visual relations. This provides a fuzzy set of relations that more accurately represents visible relations. Using the semantic similarities between relations, the model is able to learn to detect uncommon relations from similar and more common ones. However, the improved training does not always translate to improved detections, because the objective function does not capture the whole relation detection process. Thus, during the inference phase, we combine knowledge with model predictions in order to predict more relevant relations, aiming to imitate the behaviour of human observers.
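As a rough illustration of turning a single annotated relation into a fuzzy set using semantic similarities, the sketch below spreads the annotation over similar relations. The similarity values, the relation names and the function `soften_labels` are hypothetical, and normalizing the membership degrees so they sum to 1 (to serve as a training target) is an assumption rather than something stated in the abstract.

```python
def soften_labels(annotated, similarity):
    """Spread an annotated relation over semantically similar relations.

    annotated: name of the annotated relation.
    similarity: dict mapping relation -> dict of relation -> similarity in [0, 1],
                with similarity[r][r] == 1.
    Returns membership degrees normalized to sum to 1.
    """
    row = similarity[annotated]
    total = sum(row.values())
    return {relation: value / total for relation, value in row.items()}


# Hypothetical similarities between three relations.
similarity = {
    "riding": {"riding": 1.0, "sitting on": 0.6, "next to": 0.0},
    "sitting on": {"riding": 0.6, "sitting on": 1.0, "next to": 0.2},
    "next to": {"riding": 0.0, "sitting on": 0.2, "next to": 1.0},
}
targets = soften_labels("riding", similarity)
```

With targets of this kind, an example annotated with a common relation also provides some training signal for a rarer but semantically close one.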
Lorrain, Vincent. "Etude et conception de circuits innovants exploitant les caractéristiques des nouvelles technologies mémoires résistives". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS182/document.
In this thesis, we study dedicated computational approaches for deep neural networks, and more particularly for convolutional neural networks (CNNs). We highlight that the efficiency of convolutional neural networks makes them an interesting choice for many applications. We study the different implementation possibilities of this type of network in order to deduce their computational complexity. We show that the computational complexity of this type of structure can quickly become incompatible with embedded resources. To address this issue, we explored different neuron models and architectures that could minimize the resources required for the application. In a first step, our approach consisted in exploring the possible gains from changing the neuron model. We show that the so-called spiking models theoretically reduce the computational complexity while offering interesting dynamic properties, but require a complete rethinking of the hardware architecture. We then proposed our spiking approach to the computation of convolutional neural networks, with an associated architecture. We have set up a software and hardware simulation chain in order to explore the different paradigms of computation and hardware implementation and evaluate their suitability for embedded environments. This chain allows us to validate the computational aspects but also to evaluate the relevance of our architectural choices. Our theoretical approach has been validated by our chain and our architecture has been simulated in 28 nm FDSOI. Thus we have shown that this approach is relatively efficient, with interesting properties of scaling, dynamic precision and computational performance. In the end, the implementation of convolutional neural networks using spiking models seems promising for improving the efficiency of these networks. 
Moreover, it allows further improvements through the addition of STDP-type unsupervised learning, improved spike coding, or the efficient integration of RRAM memory.
Tang, Daogui. "A simulation-based modeling framework for the analysis and protection of smart grids against false pricing attacks". Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPAST017.
The integration of information and communication technology (ICT) systems with power systems enables two-way communication between customers and utilities, which helps engage customers in various demand-response (DR) programs of smart grids (SGs), such as time-of-use (TOU) pricing and real-time pricing (RTP). However, this exposes the SG cyber-physical system to additional threats coming from the ICT layer. For this reason, the threat of cyber attacks of various types has become a major concern. In this context, the focus of the thesis is on the modeling and detection of, and defense against, a specific type of cyber attack on DR schemes, namely false pricing attacks (FPAs). The study first approaches the problem by modeling FPAs initiated in social networks (SNs). The spreading process of false electricity prices is described by a multi-level influence propagation model that considers customers' personality characteristics and information value. Monte Carlo simulation is used to account for the stochastic nature of the influence propagation process. Then, considering the integration of distributed renewable energy resources (DRERs) in the RTP context, we study FPAs where attackers manipulate real-time electricity prices by injecting false consumption and renewable generation information. A convolutional neural network (CNN)-based online detector is developed to detect the considered FPAs. Finally, to mitigate the impact of FPAs, an optimal defense strategy is defined under limited resources. The dynamic interaction between attackers and defenders is modeled as a zero-sum Markov game where neither player has full information about the game model. A model-free multi-agent reinforcement learning method is proposed to solve the game and find the Nash equilibrium policies for both players. The thesis provides a simulation-based framework for modeling FPAs on smart grids. 
The findings of the thesis give insights into how FPAs can impact cyber-physical power systems by misleading a portion of customers in the electricity market, and indicate how to mitigate such impact by detecting and defending against the attacks.
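The Monte Carlo treatment of influence propagation mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a plain independent-cascade model with a single spreading probability `p`, which is a simplification and not the multi-level model of the thesis; the graph, the function names and all parameters are hypothetical.

```python
import random

def simulate_cascade(graph, seeds, p, rng):
    """One random run of an independent-cascade process.

    Returns the set of nodes reached by the (false) price message.
    """
    reached = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                # Each newly reached node gets one chance per neighbor.
                if neighbor not in reached and rng.random() < p:
                    reached.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return reached

def expected_reach(graph, seeds, p, runs=2000, seed=0):
    """Monte Carlo estimate of the mean number of reached nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, p, rng)) for _ in range(runs))
    return total / runs

# Hypothetical four-customer network: "a" is the attacker's entry point.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
estimate = expected_reach(graph, ["a"], p=0.5)
```

Averaging over many runs is what makes the estimate stable despite the randomness of each individual cascade.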
Abdelouahab, Kamel. "Reconfigurable hardware acceleration of CNNs on FPGA-based smart cameras". Thesis, Université Clermont Auvergne (2017-2020), 2018. http://www.theses.fr/2018CLFAC042/document.
Deep Convolutional Neural Networks (CNNs) have become a de facto standard in computer vision. This success came at the price of a high computational cost, making the implementation of CNNs under real-time constraints a challenging task. To address this challenge, the literature exploits the large amount of parallelism exhibited by these algorithms, motivating the use of dedicated hardware platforms. In power-constrained environments, such as smart camera nodes, FPGA-based processing cores are known to be adequate solutions for accelerating computer vision applications. This is especially true for CNN workloads, whose streaming nature is well suited to reconfigurable hardware architectures. In this context, this thesis addresses the problems of CNN mapping on FPGAs. In particular, it aims at improving the efficiency of CNN implementations through two main optimization strategies: the first one focuses on the CNN model and parameters, while the second one considers the hardware architecture and the fine-grain building blocks.
Caracalla, Hugo. "Sound texture synthesis from summary statistics". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS676.
Sound textures are a wide class of sounds that includes the sound of the rain falling, the hubbub of a crowd and the chirping of flocks of birds. All these sounds present an element of unpredictability which is not commonly sought after in sound synthesis, requiring the use of dedicated algorithms. However, the diverse audio properties of sound textures make designing an algorithm able to convincingly recreate varied textures a complex task. This thesis focuses on parametric sound texture synthesis. In this paradigm, a set of summary statistics is extracted from a target texture and iteratively imposed onto a white noise. If the set of statistics is appropriate, the white noise is modified until it resembles the target, sounding as if it had been recorded moments later. In the first part, we propose improvements to a perceptually based parametric method. These improvements aim at improving its synthesis of sharp and salient events, mainly by altering and simplifying its imposition process. In the second part, we adapt a parametric visual texture synthesis method, based on statistics extracted by a Convolutional Neural Network (CNN), to work on sound textures. We modify the computation of its statistics to fit the properties of sound signals, alter the architecture of the CNN to best fit the audio elements present in sound textures, and use a time-frequency representation taking both magnitude and phase into account.
Ducoffe, Mélanie. "Active learning et visualisation des données d'apprentissage pour les réseaux de neurones profonds". Thesis, Université Côte d'Azur (ComUE), 2018. http://www.theses.fr/2018AZUR4115/document.
Our work is presented in three separate parts which can be read independently. Firstly, we propose three active learning heuristics that scale to deep neural networks. We scale query by committee, an ensemble active learning method: we speed up the computation by sampling a committee of deep networks, applying dropout to the trained model. Another direction is margin-based active learning. We propose to use an adversarial perturbation to measure the distance to the margin. We also establish theoretical bounds on the convergence of our Adversarial Active Learning strategy for linear classifiers. Some inherent properties of adversarial examples open up a promising opportunity to transfer active learning data from one network to another. We also derive an active learning heuristic that scales to both CNNs and RNNs by selecting the unlabeled data that minimize the variational free energy. Secondly, we focus on how to speed up the computation of Wasserstein distances. We propose to approximate Wasserstein distances using a Siamese architecture. From another point of view, we demonstrate the submodular properties of Wasserstein medoids and how to apply them in active learning. Finally, we provide new visualization tools for explaining the predictions of a CNN on text. First, we repurpose an active learning strategy to compare the relevance of the sentences selected with active learning against state-of-the-art phraseology techniques. These works help to understand the hierarchy of the linguistic knowledge acquired during the training of CNNs on NLP tasks. Secondly, we take advantage of deconvolution networks for image analysis to present a new perspective on text analysis to the linguistic community, which we call Text Deconvolution Saliency.
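The idea of building a query-by-committee ensemble by applying dropout to a single trained model can be illustrated with a toy sketch. It assumes a linear binary classifier instead of a deep network and uses vote entropy as the disagreement measure; both choices, and every name below, are illustrative assumptions rather than the method of the thesis.

```python
import math
import random

def predict(weights, x, keep_mask):
    """Binary prediction of a linear model with some weights dropped."""
    score = sum(w * xi for w, xi, keep in zip(weights, x, keep_mask) if keep)
    return 1 if score > 0 else 0

def vote_entropy(votes):
    """Entropy of the committee's votes (0.0 means full agreement)."""
    entropy = 0.0
    for label in set(votes):
        frac = votes.count(label) / len(votes)
        entropy -= frac * math.log(frac)
    return entropy

def sample_masks(n_weights, committee_size, drop_rate, rng):
    """One dropout mask per committee member (True = weight is kept)."""
    return [[rng.random() >= drop_rate for _ in range(n_weights)]
            for _ in range(committee_size)]

def select_query(weights, pool, masks):
    """Index of the unlabeled sample the committee disagrees on most."""
    scores = [vote_entropy([predict(weights, x, m) for m in masks]) for x in pool]
    best = max(range(len(pool)), key=scores.__getitem__)
    return best, scores

weights = [1.0, -1.0]                 # hypothetical trained model
pool = [[2.0, -1.0], [1.0, 1.0]]      # hypothetical unlabeled samples
masks = sample_masks(len(weights), committee_size=25, drop_rate=0.5,
                     rng=random.Random(0))
query_index, disagreement = select_query(weights, pool, masks)
```

The sample with the highest disagreement is the one whose label would be requested next; only one model is trained, and the committee costs a few extra forward passes.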
Haykal, Vanessa. "Modélisation des séries temporelles par apprentissage profond". Thesis, Tours, 2019. http://www.theses.fr/2019TOUR4019.
Time series prediction is a problem that has been addressed for many years. In this thesis, we are interested in methods derived from deep learning. It is well known that when the relationships between the data are temporal, it is difficult to analyze and predict them accurately because of non-linear trends and the presence of noise, particularly in financial and electrical series. In this context, we propose a new hybrid noise reduction architecture that models the recursive error series to improve predictions. The learning process simultaneously fuses a convolutional neural network (CNN) and a recurrent long short-term memory network (LSTM). This model is distinguished by its ability to capture globally a variety of hybrid properties: it is able to extract local signal features, to learn long-term and non-linear dependencies, and to be highly resistant to noise. The second contribution concerns the limitations of global approaches due to dynamic switching regimes in the signal. We present a local unsupervised modification of our previous architecture in order to adjust the results by adapting the Hidden Markov Model (HMM). Finally, we were also interested in multi-resolution techniques to improve the performance of the convolutional layers, notably by using the variational mode decomposition (VMD) method.
Firmo, Drumond Thalita. "Apports croisées de l'apprentissage hiérarchique et la modélisation du système visuel : catégorisation d'images sur des petits corpus de données". Thesis, Bordeaux, 2020. https://tel.archives-ouvertes.fr/tel-03129189.
Deep convolutional neural networks (DCNNs) have recently led a revolution in large-scale object recognition. They have changed the usual computer vision practice of hand-engineered features, thanks to their ability to hierarchically learn representative features from data together with a pertinent classifier. Together with hardware advances, they have made it possible to effectively exploit the ever-growing amounts of image data gathered online. However, in specific domains like healthcare and industrial applications, data is much less abundant, and expert labeling costs are higher than those of general-purpose image datasets. This scarcity scenario leads to the core question of this thesis: can these limited-data domains profit from the advantages of DCNNs for image classification? This question has been addressed throughout this work, based on an extensive study of the literature, divided into two main parts, followed by a proposal of original models and mechanisms. The first part reviews object recognition from an interdisciplinary double viewpoint. First, it seeks to understand the function of vision from a biological stance, comparing and contrasting it with DCNN models in terms of structure, function and capabilities. Second, a state-of-the-art review is established, aiming to identify the main architectural categories and innovations in modern-day DCNNs. This interdisciplinary basis fosters the identification of potential mechanisms, inspired by both biological and artificial structures, that could improve image recognition under difficult situations. Recurrent processing is a clear example: while not completely absent from the "deep vision" literature, it has mostly been applied to videos, due to their inherently sequential nature. From biology, however, it is clear that such processing plays a role in refining our perception of a still scene. 
This theme is further explored through a dedicated literature review focused on recurrent convolutional architectures used in image classification. The second part carries on in the spirit of improving DCNNs, this time focusing more specifically on our central question: deep learning over small datasets. First, the work proposes a more detailed and precise discussion of the small sample problem and its relation to learning hierarchical features with deep models. This discussion is followed by a structured view of the field, organizing and discussing the different possible paths towards adapting deep models to limited-data settings. Rather than a raw listing, this review aims to make sense of the myriad of approaches in the field, grouping methods with similar intent or mechanism of action, in order to guide the development of custom solutions for small-data applications. Second, this study is complemented by an experimental analysis, exploring small-data learning with the proposition of original models and mechanisms (previously published as a journal paper). In conclusion, it is possible to apply deep learning to small datasets and obtain good results, if done in a thoughtful fashion. On the data path, one should try to gather more information from additional related data sources, if available. On the complexity path, architecture and training methods can be calibrated in order to profit the most from any available domain-specific side information. Proposals concerning both of these paths are discussed in detail throughout this document. Overall, while there are multiple ways of reducing the complexity of deep learning with small data samples, there is no universal solution. Each method has its own drawbacks and practical difficulties and needs to be tailored specifically to the target perceptual task at hand.
Khlif, Wafa. "Multi-lingual scene text detection based on convolutional neural networks". Thesis, La Rochelle, 2022. http://www.theses.fr/2022LAROS022.
This dissertation explores text detection approaches based on deep learning techniques, with the goal of mining and retrieving weakly structured content in scene images. First, this dissertation presents a method for detecting text in scene images based on multi-level connected component (CC) analysis and on learning text component features via convolutional neural networks (CNNs), followed by a graph-based grouping of overlapping text boxes. The features of the resulting raw text/non-text components of different granularity levels are learned via a CNN. The second contribution is inspired by the YOLO real-time object detection system. Both methods perform text detection and script identification simultaneously. The system presents a joint text detection and script identification approach based on casting the multi-script text detection task as an object detection problem, where the object is the script of the text. The joint text detection and script identification strategy is realized in a holistic approach using a single convolutional neural network, where the input data is the full image and the outputs are the text bounding boxes and their script. Textual feature extraction and script classification are performed jointly via a CNN. The experimental evaluation of these methods is performed on the Multi-Lingual Text (MLT) dataset, which we contributed to building. It consists of natural scene images with embedded text, such as street signs and advertisement boards, passing vehicles, and user photos in microblogs. Such images with embedded text are among the most frequently encountered image types on the internet, particularly in social media.
Jacques, Céline. "Méthodes d'apprentissage automatique pour la transcription automatique de la batterie". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS150.
This thesis focuses on learning methods for automatic drum transcription. They are based on a transcription algorithm using a non-negative decomposition method, NMD. This thesis raises two main issues: the adaptation of methods to the analyzed signal and the use of deep learning. Information from the analyzed signal can be taken into account in the model by introducing it during the decomposition steps. A first approach is to reformulate the decomposition step in a probabilistic context to facilitate the introduction of a posteriori information, with methods such as SI-PLCA and statistical NMD. A second approach is to implement an adaptation strategy directly in the NMD: the application of modelable filters to the patterns to model the recording conditions, or the adaptation of the learned patterns directly to the signal by applying strong constraints to preserve their physical meaning. The second issue concerns the selection of the signal segments to be analyzed. It is best to analyze segments where at least one percussive event occurs. An onset detector based on a convolutional neural network (CNN) is adapted to detect only percussive onsets. Since the results obtained are very encouraging, the detector is then trained to detect a single instrument, allowing the three main drum instruments to be transcribed with three CNNs. Finally, the use of a multi-output CNN is studied to transcribe the drum part with a single network.
Heuillet, Alexandre. "Exploring deep neural network differentiable architecture design". Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG069.
Artificial Intelligence (AI) has gained significant popularity in recent years, primarily due to its successful applications in various domains, including textual data analysis, computer vision, and audio processing. The resurgence of deep learning techniques has played a central role in this success. The groundbreaking AlexNet paper by Krizhevsky et al. narrowed the gap between human and machine performance in image classification tasks. Subsequent papers such as Xception and ResNet have further solidified deep learning as a leading technique, opening new horizons for the AI community. The success of deep learning lies in its architecture, which is manually designed with expert knowledge and empirical validation. However, these architectures lack the certainty of an optimal solution. To address this issue, recent papers introduced the concept of Neural Architecture Search (NAS), enabling the learning of deep architectures. Yet most initial approaches focused on large architectures with specific targets (e.g., supervised learning) and relied on computationally expensive optimization techniques such as reinforcement learning and evolutionary algorithms. In this thesis, we further investigate this idea by exploring automatic deep architecture design, with a particular emphasis on differentiable NAS (DNAS), which represents the current trend in NAS due to its computational efficiency. While our primary focus is on Convolutional Neural Networks (CNNs), we also explore Vision Transformers (ViTs) with the goal of designing cost-effective architectures suitable for real-time applications.
Chen, Dexiong. "Modélisation de données structurées avec des machines profondes à noyaux et des applications en biologie computationnelle". Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM070.
Developing efficient algorithms to learn appropriate representations of structured data, including sequences or graphs, is a major and central challenge in machine learning. To this end, deep learning has become popular in structured data modeling. Deep neural networks have drawn particular attention in various scientific fields such as computer vision, natural language understanding or biology. For instance, they provide computational tools for biologists to possibly understand and uncover biological properties or relationships among macromolecules within living organisms. However, most of the success of deep learning methods in these fields essentially relies on the guidance of empirical insights as well as huge amounts of annotated data. Exploiting more data-efficient models is necessary, as labeled data is often scarce. Another line of research is kernel methods, which provide a systematic and principled approach for learning non-linear models from data of arbitrary structure. In addition to their simplicity, they offer a natural way to control regularization and thus to avoid overfitting. However, the data representations provided by traditional kernel methods are defined only by simple hand-crafted features, which makes them perform worse than neural networks when enough labeled data are available. More complex kernels inspired by prior knowledge used in neural networks have thus been developed to build richer representations and bridge this gap. Yet, they are less scalable. 
By contrast, neural networks are able to learn a compact representation for a specific learning task, which allows them to retain the expressivity of the representation while scaling to large sample sizes. Incorporating the complementary views of kernel methods and deep neural networks to build new frameworks is therefore useful to benefit from both worlds. In this thesis, we build a general kernel-based framework for modeling structured data by leveraging prior knowledge from classical kernel methods and deep networks. Our framework provides efficient algorithmic tools for learning representations without annotations as well as for learning more compact representations in a task-driven way. Our framework can be used to efficiently model sequences and graphs with simple interpretation of predictions. It also offers new insights about designing more expressive kernels and neural networks for sequences and graphs.
Caye, Daudt Rodrigo. "Convolutional neural networks for change analysis in earth observation images with noisy labels and domain shifts". Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT033.
The analysis of satellite and aerial Earth observation images allows us to obtain precise information over large areas. A multitemporal analysis of such images is necessary to understand the evolution of such areas. In this thesis, convolutional neural networks are used to detect and understand changes using remote sensing images from various sources in supervised and weakly supervised settings. Siamese architectures are used to compare coregistered image pairs and to identify changed pixels. The proposed method is then extended into a multitask network architecture that is used to detect changes and perform land cover mapping simultaneously, which permits a semantic understanding of the detected changes. Then, classification filtering and a novel guided anisotropic diffusion algorithm are used to reduce the effect of biased label noise, which is a concern for automatically generated large-scale datasets. Weakly supervised learning is also achieved to perform pixel-level change detection using only image-level supervision, through the use of class activation maps and a novel spatial attention layer. Finally, a domain adaptation method based on adversarial training is proposed, which succeeds in projecting images from different domains into a common latent space where a given task can be performed. This method is tested not only for domain adaptation for change detection, but also for image classification and semantic segmentation, which demonstrates its versatility.
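The Siamese principle described in the abstract above (the same feature extractor applied to both coregistered images, followed by a comparison) can be sketched in a few lines. This is only an illustration under strong assumptions: a single fixed kernel stands in for the learned convolutional branches, and a thresholded absolute difference stands in for the learned decision layers; all names are hypothetical.

```python
def extract_features(image, kernel):
    """Valid 2D cross-correlation of an image (list of rows) with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    features = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        features.append(row)
    return features

def change_map(before, after, kernel, threshold):
    """Flag positions whose features differ by more than the threshold."""
    feats_before = extract_features(before, kernel)
    feats_after = extract_features(after, kernel)  # same kernel: shared weights
    return [[abs(a - b) > threshold for a, b in zip(row_b, row_a)]
            for row_b, row_a in zip(feats_before, feats_after)]

# Hypothetical 4x4 image pair: one bright pixel appears in the corner.
before = [[0.0] * 4 for _ in range(4)]
after = [row[:] for row in before]
after[3][3] = 4.0
kernel = [[0.25, 0.25], [0.25, 0.25]]
changes = change_map(before, after, kernel, threshold=0.5)
```

Because both images go through identical weights, an unchanged region produces identical features and therefore no detection, whatever the kernel is.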
Barhoumi, Amira. "Une approche neuronale pour l’analyse d’opinions en arabe". Thesis, Le Mans, 2020. http://www.theses.fr/2020LEMA1022.
This thesis addresses Arabic sentiment analysis. Its aim is to determine the global polarity of a given textual statement written in MSA or dialectal Arabic. This research area has been the subject of numerous studies dealing with Indo-European languages, in particular English. One of the difficulties confronting this thesis is the processing of Arabic. Arabic is a morphologically rich language, which implies greater sparsity: we want to overcome this problem by producing, in a completely automatic way, new Arabic-specific embeddings. Our study focuses on the use of a neural approach to improve polarity detection, using embeddings. These embeddings have proved fundamental in various natural language processing (NLP) tasks. Our contribution in this thesis follows several axes. First, we begin with a preliminary study of the various existing pre-trained word embedding resources in Arabic. These embeddings consider words as space-separated units in order to capture semantic and syntactic similarities in the embedding space. Second, we focus on the specificity of the Arabic language. We propose Arabic-specific embeddings that take into account the agglutination and morphological richness of Arabic. These specific embeddings have been used, alone and in combination, as input to neural networks, providing an improvement in classification performance. Finally, we evaluate embeddings with intrinsic and extrinsic methods specific to the sentiment analysis task. For intrinsic embedding evaluation, we propose a new protocol introducing the notion of sentiment stability in the embedding space. We also propose a qualitative extrinsic analysis of our embeddings using visualisation methods.
Minvielle, Ludovic. "Classification d'événements à partir de capteurs sols - Application au suivi de personnes fragiles". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASN023.
This thesis addresses the subject of event detection in temporal signals for elderly monitoring by the use of a floor pressure sensor. We first show that most proposed systems do not address the main practical issues and that floor systems constitute promising candidates for monitoring tasks. Since complex signals require sophisticated models, we propose a random-forest-based approach that detects falls with state-of-the-art accuracy and meets hardware constraints thanks to a feature selection procedure. The model performance is improved with data augmentation and time aggregation of the random forest outputs. Then, we address the issue of confronting our model with the real world, using transfer learning methods that act on the core model of random forests, i.e. decision trees. These methods are adaptations of seminal work and are designed to tackle the class imbalance problem, as falls are rare events. The methods are tested on several data sets, showing promising potential for further work, and a Python implementation is made available. Finally, motivated by the issue of elderly monitoring while dealing with one-dimensional signals for large areas, we propose to distinguish elderly persons from younger individuals with a model based on a convolutional neural network and convolutional dictionary learning. Since signals are mainly made of walks, the first part of the model is trained to recognize steps, and the last part of the model is trained with all previous layers frozen. This novel approach to gait classification makes it possible to isolate elderly-generated signals with very high accuracy.
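The time aggregation of classifier outputs mentioned in the abstract above can be pictured with a small sketch. It assumes that per-window fall probabilities are already available (from a random forest or any other classifier) and smooths them with a trailing moving average before thresholding; the window length, the threshold and the function names are illustrative assumptions, not the settings of the thesis.

```python
def aggregate_scores(scores, window):
    """Trailing moving average of per-window classifier scores."""
    smoothed = []
    for t in range(len(scores)):
        recent = scores[max(0, t - window + 1): t + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed

def detect_events(scores, window=3, threshold=0.5):
    """Raise a detection only when the smoothed score is high enough."""
    return [s >= threshold for s in aggregate_scores(scores, window)]

# Hypothetical scores: one isolated spike, then a sustained event.
scores = [0.0, 0.9, 0.0, 0.0, 0.8, 0.9, 0.7, 0.0]
alarms = detect_events(scores)
```

With these values the isolated spike at the second position is suppressed, while the sustained run of high scores raises a detection, which is the kind of false-alarm reduction time aggregation is meant to provide.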
Pham, Huy-Hieu. "Architectures d'apprentissage profond pour la reconnaissance d'actions humaines dans des séquences vidéo RGB-D monoculaires : application à la surveillance dans les transports publics". Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30145.
This thesis deals with the automatic recognition of human actions from monocular RGB-D video sequences. Our main goal is to recognize which human actions occur in unknown videos. This is a challenging task due to a number of obstacles caused by the variability of the acquisition conditions, including the lighting, the position, the orientation and the field of view of the camera, as well as the variability of actions, which can be performed differently, notably in terms of speed. To tackle these problems, we first review and evaluate the most prominent state-of-the-art techniques to identify the current state of human action recognition in videos. We then propose a new approach for skeleton-based action recognition using Deep Neural Networks (DNNs). Two key questions have been addressed. First, how to efficiently represent the spatio-temporal patterns of skeletal data in order to fully exploit the capacity of Deep Convolutional Neural Networks (D-CNNs) to learn high-level representations. Second, how to design a powerful D-CNN architecture that is able to learn discriminative features from the proposed representation for the classification task. As a result, we introduce two new 3D motion representations called SPMF (Skeleton Posture-Motion Feature) and Enhanced-SPMF that encode skeleton poses and their motions into color images. For learning and classification tasks, we design and train different D-CNN architectures based on the Residual Network (ResNet), Inception-ResNet-v2, Densely Connected Convolutional Network (DenseNet) and Efficient Neural Architecture Search (ENAS) to extract robust features from color-coded images and classify them. Experimental results on various public and challenging human action recognition datasets (MSR Action3D, Kinect Activity Recognition Dataset, SBU Kinect Interaction, and NTU-RGB+D) show that the proposed approach outperforms the current state of the art. 
We also conducted research on the problem of 3D human pose estimation from monocular RGB video sequences and exploited the estimated 3D poses for the recognition task. Specifically, a deep learning-based model called OpenPose is deployed to detect 2D human poses. A DNN is then proposed and trained to learn a 2D-to-3D mapping in order to map the detected 2D keypoints into 3D poses. Our experiments on the Human3.6M dataset verified the effectiveness of the proposed method. These results open a new research direction for human action recognition from 3D skeletal data when depth cameras fail. In addition, we collect and introduce in this thesis the CEMEST database, a new RGB-D dataset depicting passengers' behaviors in public transport. It consists of 203 untrimmed real-world surveillance videos of realistic "normal" and "abnormal" events. We achieve promising results on CEMEST with the support of data augmentation and transfer learning techniques. This enables the construction of real-world applications based on deep learning for enhancing public transportation management services.
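The general idea of encoding a skeleton sequence as a color image, so that an image-classification CNN can process it, can be sketched as follows. This is a deliberately simplified assumption: each joint's (x, y, z) coordinates are rescaled to 0-255 and stored as one (R, G, B) pixel, with joints as rows and frames as columns. The actual SPMF and Enhanced-SPMF representations also encode motion and are not reproduced here; the function name is hypothetical.

```python
def skeleton_to_image(sequence):
    """Map sequence[t][j] = (x, y, z) to image[j][t] = (r, g, b) in 0..255."""
    coords = [c for frame in sequence for joint in frame for c in joint]
    lo, hi = min(coords), max(coords)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant sequence
    n_joints = len(sequence[0])
    image = []
    for j in range(n_joints):
        row = []
        for frame in sequence:
            # One pixel per (joint, frame): x -> R, y -> G, z -> B.
            row.append(tuple(round(255 * (c - lo) / span) for c in frame[j]))
        image.append(row)
    return image

# Hypothetical sequence: 2 frames, 2 joints.
sequence = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 1.0)],
]
image = skeleton_to_image(sequence)
```

The resulting grid has one row per joint and one column per frame, so temporal patterns of a joint appear as color variations along a row.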
Botella, Christophe. "Méthodes statistiques pour la modélisation de la distribution spatiale des espèces végétales à partir de grandes masses d’observations incertaines issues de programmes de sciences citoyennes". Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS135.
Human botanical expertise is becoming too scarce to provide the field data needed to monitor plant biodiversity. The use of geolocated botanical observations from major citizen science projects, such as Pl@ntNet, opens interesting paths for temporal monitoring of plant species distribution. Pl@ntNet provides automatically identified flora observations together with a confidence score, and can thus be used for species distribution models (SDMs). These models make it possible to monitor the distribution of invasive or rare plants, as well as the effects of global changes on species, provided we can (i) take into account identification uncertainty, (ii) correct for spatial sampling bias, and (iii) predict species abundances accurately at a fine spatial grain. First, we ask whether we can estimate realistic distributions of invasive plant species from automatically identified Pl@ntNet occurrences, and what the effect is of filtering with a confidence score threshold. Filtering improves predictions as the confidence level increases, until the sample size becomes limiting. The predicted distributions are generally consistent with expert data, but also indicate urban areas of abundance due to ornamental cultivation and new areas of presence. Next, we studied the correction of spatial sampling bias in SDMs based on presences only. We first mathematically analyzed the bias when the occurrences of a target group of species (Target Group Background, TGB) are used as background points, and compared this bias with that of a spatially uniform selection of background points. We then show that the bias of TGB is due to the variation in the cumulative abundance of target group species in the environmental space, which is difficult to control. Alternatively, we can jointly model the global observation effort with the abundances of several species. We model the observation effort as a step spatial function defined on a mesh of geographical cells.
The addition of massively observed species to the model then reduces the variance in the estimation of the observation effort and thus in the models of the other species. Finally, we propose a new type of SDM based on convolutional neural networks using environmental images as input variables. These models can capture complex spatial patterns of several environmental variables. We propose to share the architecture of the neural network between several species in order to extract common high-level predictors and regularize the model. Our results show that this model outperforms existing SDMs and that performance is improved by simultaneously predicting many species; this is confirmed by two cooperative SDM evaluation campaigns conducted on independent data sets. This supports the hypothesis that there are common environmental models describing the distribution of many species. Our results support the use of Pl@ntNet occurrences for monitoring plant invasions. Joint modelling of multiple species and observation effort is a promising strategy that transforms the bias problem into a more controllable estimation variance problem. However, with occurrence data, the effect of certain factors, such as the level of anthropization, on species abundance is difficult to separate from their effect on observation effort. This can be solved by additional data collected under a standardized protocol. The deep learning methods developed show good performance and could be used to deploy spatial species prediction services
Boukhtache, Seyfeddine. "Système de traitement d’images temps réel dédié à la mesure de champs denses de déplacements et de déformations". Thesis, Université Clermont Auvergne (2017-2020), 2020. http://www.theses.fr/2020CLFAC054.
This PhD thesis has been carried out in a multidisciplinary context. It deals with the challenge of real-time and metrological performance in digital image processing. This is particularly relevant to photomechanics, a recent field of activity which consists in developing and using systems for measuring whole fields of small displacements and small deformations of solids subjected to thermomechanical loading. The technique targeted in this PhD thesis is Digital Image Correlation (DIC), which is the most popular measuring technique in this community. However, it has some limitations, the main ones being the computing resources it requires and its metrological performance, which should be improved to reach that of classic pointwise measuring sensors such as strain gauges. In order to address this challenge, this work relies on two main studies. The first one consists in optimizing the interpolation process, because this is the most expensive operation in DIC. Acceleration is proposed by using a parallel hardware implementation on FPGA, taking into consideration the consumption of hardware resources as well as accuracy. The main conclusion of this study is that a single FPGA (current technology) is not sufficient to implement the entire DIC algorithm. Thus, a second study has been proposed. It is based on the use of convolutional neural networks (CNNs) in an attempt to achieve both better metrological performance than DIC and real-time processing. This second study shows the relevance of using CNNs for measuring displacement and deformation fields. It opens new perspectives in terms of metrological performance and speed of full-field measuring systems
Singh, Praveer. "Processing high-resolution images through deep learning techniques". Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1172.
In this thesis, we discuss four different application scenarios that can be broadly grouped under the larger umbrella of analyzing and processing high-resolution images using deep learning techniques. The first three chapters cover the processing of remote-sensing (RS) images, which are captured either from airplanes or from satellites hundreds of kilometers away from the Earth. We start by addressing a challenging problem related to improving the classification of complex aerial scenes through a deep weakly supervised learning paradigm. We show how, using only image-level labels, we can effectively localize the most distinctive regions in complex scenes and thus remove ambiguities, leading to enhanced classification performance in highly complex aerial scenes. In the second chapter, we deal with refining segmentation labels of building footprints in aerial images. We do this by first detecting errors in the initial segmentation masks and then correcting only those segmentation pixels where we find a high probability of errors. The next two chapters of the thesis are related to the application of Generative Adversarial Networks. In the first one, we build an effective Cloud-GAN model to remove thin films of clouds in Sentinel-2 imagery by adopting a cyclic consistency loss. This uses an adversarial loss function to map cloudy images to non-cloudy images in a fully unsupervised fashion, where the cyclic loss helps constrain the network to output a cloud-free image corresponding to the input cloudy image and not any random image in the target domain. Finally, the last chapter addresses a different set of high-resolution images, coming not from the RS domain but from High Dynamic Range Imaging (HDRI) applications. These are 32-bit images which capture the full extent of luminance present in the scene.
Our goal is to quantize them to 8-bit Low Dynamic Range (LDR) images so that they can be projected effectively on our normal display screens while keeping the overall contrast and perception quality similar to that found in HDR images. We adopt a Multi-scale GAN model that focuses on both coarser as well as finer-level information necessary for high-resolution images. The final tone-mapped outputs have a high subjective quality without any perceived artifacts
Diallo, Boubacar. "Mesure de l'intégrité d'une image : des modèles physiques aux modèles d'apprentissage profond". Thesis, Poitiers, 2020. http://www.theses.fr/2020POIT2293.
Digital images have become a powerful and effective visual communication tool for delivering messages, diffusing ideas, and proving facts. The emergence of smartphones, with a wide variety of brands and models, facilitates the creation of new visual content and its dissemination on social networks and image-sharing platforms. Related to this phenomenon, and helped by the availability and ease of use of image manipulation software, many issues have arisen, ranging from the distribution of illegal content to copyright infringement. The reliability of digital images is questioned by both common and expert users, such as court or police investigators. A well-known and widespread example is "fake news", which often involves malicious use of digital images. Many researchers in the field of image forensics have taken up the scientific challenges associated with image manipulation. Many methods with interesting performance have been developed, based on automatic image processing and, more recently, on deep learning. Despite the variety of techniques offered, performance is bound to specific conditions and remains vulnerable to relatively simple malicious attacks. Indeed, the images collected on the Internet impose many constraints on algorithms, calling many existing integrity verification techniques into question. There are two main peculiarities to be taken into account for the detection of a falsification: one is the lack of information on the acquisition of the pristine image, the other is the high probability of automatic transformations linked to image-sharing platforms, such as lossy compression or resizing. In this thesis, we focus on several of these image forensics challenges, including camera model identification and image tampering detection. After reviewing the state of the art in the field, we propose a first data-driven method for identifying camera models.
We use deep learning techniques based on convolutional neural networks (CNNs) and develop a learning strategy considering the quality of the input data versus the applied transformation. A family of CNN networks has been designed to learn the characteristics of the camera model directly from a collection of images undergoing the same transformations as those commonly used on the Internet. Our interest focused on lossy compression for our experiments, because it is the most used type of post-processing on the Internet. The proposed approach, therefore, provides a robust solution to compression for camera model identification. The performance achieved by our camera model detection approach is also used and adapted for image tampering detection and localization. The performances obtained underline the robustness of our proposals for camera model identification and image forgery detection
Seznec, Mickaël. "From the algorithm to the targets, optimization flow for high performance computing on embedded GPUs". Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG074.
Current digital processing algorithms require more computing power to achieve more accurate results and process larger data. In the meantime, hardware architectures are becoming more specialized, with highly efficient accelerators designed for specific tasks. In this context, the path of deployment from the algorithm to the implementation becomes increasingly complex. It is, therefore, crucial to determine how algorithms can be modified to take advantage of new hardware capabilities. Our study focused on graphics processing units (GPUs), which are massively parallel processors. Our algorithmic work was done in the context of radio-astronomy or optical flow estimation and consisted of finding the best adaptation of the software to the hardware. At the level of a mathematical operator, we modified the traditional image convolution algorithm to use the matrix units and showed that its performance doubles for large convolution kernels. At a broader method level, we evaluated linear solvers for the combined local-global optical flow to find the most suitable one on GPU. With additional optimizations, such as iteration fusion or memory buffer re-utilization, the method is twice as fast as the initial implementation, running at 60 frames per second on an embedded platform (30 W). Finally, we also pointed out the interest of this hardware-aware algorithm design method in the context of deep neural networks. For that, we showed the hybridization of a convolutional neural network for optical flow estimation with a pre-trained image classification network, MobileNet, that was initially designed for efficient image classification on low-power platforms
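The abstract does not say how the convolution was mapped onto the matrix units. One widely used way to express a convolution as a matrix product is the "im2col" lowering, sketched below in plain Python under that assumption. The function names are hypothetical, and the code computes the cross-correlation form used in CNNs (no kernel flip) in "valid" mode.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch of a 2D image (list of rows) into one flat row."""
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def conv2d_as_matmul(image, kernel):
    """Valid-mode 2D cross-correlation written as (patch matrix) x (flat kernel)."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    out_w = len(image[0]) - kw + 1
    # Matrix-vector product: one dot product per output pixel.
    flat_out = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in patches]
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


def conv2d_direct(image, kernel):
    """Reference sliding-window implementation, used to check the lowering."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]
```

On hardware with matrix units, the dot products in `conv2d_as_matmul` would be batched into a single dense matrix multiplication. The trade-off is extra memory for the unrolled patch matrix.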
Chelali, Mohamed Tayeb. "Prise en compte de l'information spatiale et temporelle pour l'analyse de séquences d'images". Electronic Thesis or Diss., Université Paris Cité, 2021. http://www.theses.fr/2021UNIP5205.
The evolution of digital technology has multiplied the number of image sensors, leading every day to the production of masses of visual data. In some contexts, these data take the form of time series of 2D images, leading to 3D data that we denote 2D+t. This type of data is frequent in several domains such as remote surveillance or remote sensing. Because of their dimensions, the analysis and interpretation of this mass of data is a major challenge in computer vision. This thesis concerns the exploitation of these data in order to classify them, making the fullest possible use of the wealth of spatial and temporal information they carry. The research presented in this manuscript includes two methods that proceed differently but share a common point: a change in the representation of the initial data. The first method is based on the extraction of hand-crafted features, while the second one is based on the use of machine learning methods, in particular deep convolutional neural networks. Through these two methods, we propose to study the temporal stability of image time series with hand-crafted features and to study their spatial and temporal variability with deep convolutional neural networks. The two methods are then evaluated on two different applications. One is related to satellite image time series and the other is related to surveillance camera videos. The experimental results illustrate the interest of the proposed methods
Martineau, Maxime. "Deep learning onto graph space : application to image-based insect recognition". Thesis, Tours, 2019. http://www.theses.fr/2019TOUR4024.
The goal of this thesis is to investigate insect recognition as an image-based pattern recognition problem. Although this problem has been extensively studied over the previous three decades, one element had, to the best of our knowledge, still not been tried as of 2017: deep approaches. Therefore, one contribution is to determine to what extent deep convolutional neural networks (CNNs) can be applied to image-based insect recognition. Graph-based representations and methods have also been tested. Two attempts are presented: the former consists in designing a graph-perceptron classifier, and the latter in defining convolution on graphs to build graph convolutional neural networks. The last chapter of the thesis deals with applying most of the aforementioned methods to insect image recognition problems. Two datasets are proposed. The first one consists of lab-based images with a constant background. The second one is generated by taking an ImageNet subset and is composed of field-based images. CNNs with transfer learning are the most successful method applied to these datasets
Njima, Wafa. "Méthodes de localisation de capteurs dans le contexte de l'Internet des Objets". Electronic Thesis or Diss., Paris, CNAM, 2019. http://www.theses.fr/2019CNAM1264.
With the growing emergence of the Internet of Things and the importance of position information in this context, localization is attracting more and more attention in the research community. Outdoor location is provided by GPS, which is not suitable for indoor environments. Several indoor localization techniques exist, but there is not yet a standard. Existing methods are mainly based on trilateration or fingerprinting. Trilateration is a geometric method that exploits the distances between an object and reference points to locate it. This method only works when at least 3 access points are detected, and it is strongly affected by multipath propagation. In order to overcome these disadvantages, the fingerprinting method compares the fingerprint associated with the object to be located to a fingerprint database constructed offline. The estimated position is a combination of the selected training positions. This method is of great interest. However, it requires significant computing and storage capabilities. The aim of this thesis is to improve the existing localization techniques while maintaining a satisfying localization accuracy with low computational complexity. In order to overcome the disadvantages of these two classes of localization techniques, we propose alternative approaches. Trilateration has been combined with an optimization process that aims at completing the inter-node distance matrix from partially known data. Advanced optimization algorithms have been used in developing the mathematical equation corresponding to each one. Using this method, we came up with a localization solution for a distributed IoT architecture. As for fingerprinting, we have exploited it to develop localization systems for a centralized IoT architecture. A comparative study between different metrics of similarity evaluation is conducted.
This study was followed by the development of a linear model generating a mathematical relation that links the powers of the signal received by an object to its coordinates. This helps to reduce the online complexity and adapts our system to real-time operation. This is also ensured by the development of a CNN model which treats the localization problem as a radio image classification problem. The performances of all proposed approaches are evaluated and discussed. These results show that the proposed approaches improve on the basic techniques in terms of localization accuracy and complexity
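As a small illustration of why trilateration needs at least three reference points, here is a minimal 2D sketch. It is the generic textbook linearization (subtracting the first range equation from the others and solving the resulting linear least-squares problem), not the optimization-based method of the thesis. The function name and the anchor layout in the usage note are hypothetical.

```python
import math


def trilaterate(anchors, distances):
    """Estimate (x, y) from >= 3 anchor positions and measured distances.

    Each anchor i gives (x - xi)^2 + (y - yi)^2 = di^2. Subtracting the
    equation of anchor 0 removes the quadratic terms and leaves
        2(xi - x0) x + 2(yi - y0) y = d0^2 - di^2 + xi^2 - x0^2 + yi^2 - y0^2,
    which is solved in the least-squares sense via the 2x2 normal equations.
    """
    if len(anchors) < 3 or len(anchors) != len(distances):
        raise ValueError("need at least 3 anchors and one distance per anchor")
    (x0, y0), d0 = anchors[0], distances[0]
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for (xi, yi), di in zip(anchors[1:], distances[1:]):
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = d0 ** 2 - di ** 2 + xi ** 2 - x0 ** 2 + yi ** 2 - y0 ** 2
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    # Cramer's rule on the normal equations.
    x = (s_ac * s_bb - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return x, y
```

For example, with anchors at (0, 0), (4, 0) and (0, 3) and exact distances to the point (1, 1), the function recovers that point. With noisy distances, such as those produced by multipath propagation, the estimate degrades, which is the weakness the abstract mentions.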
Aderghal, Karim. "Classification of multimodal MRI images using Deep Learning : Application to the diagnosis of Alzheimer’s disease". Thesis, Bordeaux, 2021. http://www.theses.fr/2021BORD0045.
In this thesis, we are interested in the automatic classification of brain MRI images to diagnose Alzheimer’s disease (AD). We aim to build intelligent models that provide decisions about a patient’s disease state to the clinician based on visual features extracted from MRI images. The goal is to classify patients (subjects) into three main categories: healthy subjects (NC), subjects with mild cognitive impairment (MCI), and subjects with Alzheimer’s disease (AD). We use deep learning methods, specifically convolutional neural networks (CNN) based on visual biomarkers from multimodal MRI images (structural MRI and DTI), to detect structural changes in the brain hippocampal region of the limbic cortex. We propose an approach called "2-D+e" applied to our ROI (Region-of-Interest): the hippocampus. This approach allows extracting 2D slices from three planes (sagittal, coronal, and axial) of our region by preserving the spatial dependencies between adjacent slices according to each dimension. We present a complete study of different artificial data augmentation methods and different data balancing approaches to analyze the impact of these conditions on our models during the training phase. We propose our methods for combining information from different sources (projections/modalities), including two fusion strategies (early fusion and late fusion). Finally, we present transfer learning schemes by introducing three frameworks: (i) a cross-modal scheme (using sMRI and DTI), (ii) a cross-domain scheme that involves external data (MNIST), and (iii) a hybrid scheme with these two methods (i) and (ii). Our proposed methods are suitable for using shallow CNNs for multimodal MRI images. They give encouraging results even if the model is trained on small datasets, which is often the case in medical image analysis
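The idea of taking 2D slices from three orthogonal planes while keeping neighbouring slices together can be sketched as follows. This is an illustrative reading of the abstract, not the thesis's pipeline. The volume is assumed to be a nested list indexed as `volume[z][y][x]`, the mapping of axes to the names axial, coronal and sagittal is an assumption that depends on scan orientation, and both function names are hypothetical.

```python
def slice_along(volume, axis, index):
    """Return the 2D slice of volume[z][y][x] at `index` along axis 0 (z), 1 (y) or 2 (x)."""
    if axis == 0:
        return [row[:] for row in volume[index]]
    if axis == 1:
        return [plane[index][:] for plane in volume]
    if axis == 2:
        return [[row[index] for row in plane] for plane in volume]
    raise ValueError("axis must be 0, 1 or 2")


def three_plane_stacks(volume, center, half_width=1):
    """For each plane, stack the central slice with its neighbours on both sides.

    `center` is a (z, y, x) voxel, e.g. the middle of the region of interest.
    Each stack holds 2 * half_width + 1 adjacent slices, which could be fed
    to a 2D CNN as input channels so that local 3D context is preserved.
    """
    sizes = (len(volume), len(volume[0]), len(volume[0][0]))
    names = ("axial", "coronal", "sagittal")  # assumed axis naming
    stacks = {}
    for axis, name in enumerate(names):
        lo, hi = center[axis] - half_width, center[axis] + half_width
        if lo < 0 or hi >= sizes[axis]:
            raise ValueError("slice window falls outside the volume")
        stacks[name] = [slice_along(volume, axis, i) for i in range(lo, hi + 1)]
    return stacks
```

Stacking neighbours as channels is one simple way to keep the dependency between adjacent slices without resorting to a full 3D network.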
Ben, Naceur Mostefa. "Deep Neural Networks for the segmentation and classification in Medical Imaging". Thesis, Paris Est, 2020. http://www.theses.fr/2020PESC2014.
Nowadays, obtaining an efficient segmentation of Glioblastoma Multiforme (GBM) brain tumors in multi-sequence MRI images as soon as possible enables early clinical diagnosis, treatment, and follow-up. The MRI technique is designed specifically to provide radiologists with powerful visualization tools to analyze medical images, but the challenge lies more in interpreting the information in radiological images together with clinical and pathology data and their causes in GBM tumors. This is why quantitative research in neuroimaging often requires anatomical segmentation of the human brain from MRI images for the detection and segmentation of brain tumors. The objective of the thesis is to propose automatic Deep Learning methods for brain tumor segmentation using MRI images. First, we are mainly interested in the segmentation of MRI images of patients with GBM brain tumors using Deep Learning methods, in particular Deep Convolutional Neural Networks (DCNNs). We propose two end-to-end DCNN-based approaches for fully automatic brain tumor segmentation. The first approach is based on the pixel-wise technique while the second one is based on the patch-wise technique. Then, we show that the latter is more efficient in terms of segmentation performance and computational benefits. We also propose a new guided optimization algorithm to find suitable hyperparameters for the first approach. Second, to enhance the segmentation performance of the proposed approaches, we propose new segmentation pipelines for patients’ MRI images, based on deep learned features and two stages of training. We also address problems related to unbalanced data, in addition to false positives and false negatives, to increase the model's segmentation sensitivity towards the tumor regions and specificity towards the healthy regions.
Finally, the segmentation performance and the inference time of the proposed approaches and pipelines are reported along with state-of-the-art methods on a public dataset annotated by radiologists and approved by neuroradiologists
Wei, Wen. "Apprentissage automatique des altérations cérébrales causées par la sclérose en plaques en neuro-imagerie multimodale". Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4021.
Multiple Sclerosis (MS) is the most common progressive neurological disease of young adults worldwide and thus represents a major public health issue, with about 90,000 patients in France and more than 500,000 people affected by MS in Europe. In order to optimize treatments, it is essential to be able to measure and track brain alterations in MS patients. In fact, MS is a multi-faceted disease which involves different types of alterations, such as myelin damage and repair. Given this observation, multimodal neuroimaging is needed to fully characterize the disease. Magnetic resonance imaging (MRI) has emerged as a fundamental imaging biomarker for multiple sclerosis because of its high sensitivity in revealing macroscopic tissue abnormalities in patients with MS. Conventional MR scanning provides a direct way to detect MS lesions and their changes, and plays a dominant role in the diagnostic criteria of MS. Moreover, positron emission tomography (PET) imaging, an alternative imaging modality, can provide functional information and detect target tissue changes at the cellular and molecular level by using various radiotracers. For example, by using the radiotracer [11C]PIB, PET allows a direct pathological measure of myelin alteration. However, in clinical settings, not all the modalities are available, for various reasons. In this thesis, we therefore focus on learning and predicting missing-modality-derived brain alterations in MS from multimodal neuroimaging data
Mandache, Diana. "Cancer Detection in Full Field Optical Coherence Tomography Images". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS370.
Cancer is a leading cause of death worldwide, making it a major public health concern. Different biomedical imaging techniques accompany both research and clinical efforts towards improving patient outcome. In this work we explore the use of a new family of imaging techniques, static and dynamic full field optical coherence tomography, which allow for a faster tissue analysis than gold standard histology. In order to facilitate the interpretation of this new imaging, we develop several exploratory methods based on data curated from clinical studies. We propose an analytical method for a better characterization of the raw dynamic interferometric signal, as well as multiple diagnostic support methods for the images. Accordingly, convolutional neural networks were exploited under various paradigms: (i) fully supervised learning, whose prediction capability surpasses the pathologist performance; (ii) multiple instance learning, which accommodates the lack of expert annotations; (iii) contrastive learning, which exploits the multi-modality of the data. Moreover, we focus strongly on method validation and on decoding the trained "black box" models to ensure their good generalization and to ultimately find specific biomarkers
Eickenberg, Michael. "Évaluation de modèles computationnels de la vision humaine en imagerie par résonance magnétique fonctionnelle". Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112206/document.
Blood-oxygen-level dependent (BOLD) functional magnetic resonance imaging (fMRI) makes it possible to measure brain activity through blood flow to areas with metabolically active neurons. In this thesis we use these measurements to evaluate the capacity of biologically inspired models of vision coming from computer vision to represent image content in a similar way as the human brain. The main vision models used are convolutional networks. Deep neural networks have made unprecedented progress in many fields in recent years. Even strongholds of biological systems such as scene analysis and object detection have been addressed with enormous success. A body of prior work has been able to establish firm links between the first and last layers of deep convolutional nets and brain regions: the first layer and V1 essentially perform edge detection, and the last layer as well as inferotemporal cortex permit a linear read-out of object category. In this work we have generalized this correspondence to all intermediate layers of a convolutional net. We found that each layer of a convnet maps to a stage of processing along the ventral stream, following the hierarchy of biological processing: along the ventral stream we observe a stage-by-stage increase in complexity. Between edge detection and object detection, for the first time we are given a toolbox to study the intermediate processing steps. A preliminary result to this was obtained by studying the response of the visual areas to presentation of visual textures and analysing it using convolutional scattering networks. The other global aspect of this thesis is “decoding” models: in the preceding part, we predicted brain activity from the stimulus presented (this is called “encoding”). Predicting a stimulus from brain activity is the inverse inference mechanism and can be used as an omnibus test for the presence of this information in the brain signal.
Most often, generalized linear models such as linear or logistic regression or SVMs are used for this task, giving access to a coefficient vector the same size as a brain sample, which can thus be visualized as a brain map. However, interpretation of these maps is difficult, because the underlying linear system is either ill-defined and ill-conditioned or inadequately regularized, resulting in non-informative maps. Supposing a sparse and spatially contiguous organization of coefficient maps, we build on the convex penalty consisting of the sum of total variation (TV) seminorm and L1 norm (“TV+L1”) to develop a penalty grouping an activation term with a spatial derivative. This penalty sets most coefficients to zero but permits free smooth variations in active zones, as opposed to TV+L1, which creates flat active zones. This method improves interpretability of brain maps obtained through cross-validation to determine the best hyperparameter. In the context of encoding and decoding models, we also work on improving data preprocessing in order to obtain the best performance. We study the impulse response of the BOLD signal: the hemodynamic response function. To generate activation maps, instead of using a classical linear model with fixed canonical response function, we use a bilinear model with spatially variable hemodynamic response (but fixed across events). We propose an efficient optimization algorithm and show a gain in predictive capacity for encoding and decoding models on different datasets
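To make the difference between the two penalties concrete, the sketch below evaluates both on a small 2D coefficient map. The abstract gives no formulas, so the exact forms are assumptions: TV+L1 is taken as rho * ||w||_1 + (1 - rho) * TV(w) with isotropic TV, and the grouped penalty as the sum over pixels of sqrt((rho * w)^2 + (1 - rho)^2 * |grad w|^2). The parameter `rho`, the forward-difference gradient and all function names are illustrative choices.

```python
import math


def gradients(w):
    """Forward differences of a 2D map; the difference is 0 on the last row/column."""
    h, wd = len(w), len(w[0])
    gx = [[(w[i][j + 1] - w[i][j]) if j + 1 < wd else 0.0 for j in range(wd)]
          for i in range(h)]
    gy = [[(w[i + 1][j] - w[i][j]) if i + 1 < h else 0.0 for j in range(wd)]
          for i in range(h)]
    return gx, gy


def tv_l1(w, rho):
    """Assumed TV+L1 form: rho * sum|w| + (1 - rho) * sum of gradient magnitudes."""
    gx, gy = gradients(w)
    l1 = sum(abs(v) for row in w for v in row)
    tv = sum(math.hypot(gx[i][j], gy[i][j])
             for i in range(len(w)) for j in range(len(w[0])))
    return rho * l1 + (1.0 - rho) * tv


def grouped_penalty(w, rho):
    """Assumed grouped form: per-pixel Euclidean norm of (rho*w, (1-rho)*grad w)."""
    gx, gy = gradients(w)
    total = 0.0
    for i in range(len(w)):
        for j in range(len(w[0])):
            total += math.sqrt((rho * w[i][j]) ** 2
                               + (1.0 - rho) ** 2 * (gx[i][j] ** 2 + gy[i][j] ** 2))
    return total
```

Under these assumed forms, a pixel contributes nothing to the grouped penalty only when both its value and its local gradient are zero. This is consistent with the abstract's description of sparse maps whose active zones may still vary smoothly.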
Cárdenas, Chapellín Julio José. "Inversion of geophysical data by deep learning". Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS185.
This thesis presents the characterization of magnetic anomalies using convolutional neural networks, and the application of visualization tools to understand and validate their predictions. The developed approach allows the localization of magnetic dipoles, including counting the number of dipoles, their geographical position, and the prediction of their parameters (magnetic moment, depth, and declination). Our results suggest that the combination of two deep learning models, "YOLO" and "DenseNet", performs best in achieving our classification and regression goals. Additionally, we applied visualization tools to understand our model’s predictions and its working principle. We found that the Grad-CAM tool improved prediction performance by identifying several layers that had no influence on the prediction and the t-SNE tool confirmed the good ability of our model to differentiate among different parameter combinations. Then, we tested our model with real data to establish its limitations and application domain. Results demonstrate that our model detects dipolar anomalies in a real magnetic map even after learning from a synthetic database with a lower complexity, which indicates a significant generalization capability. We also noticed that it is not able to identify dipole anomalies of shapes and sizes different from those considered for the creation of the synthetic database. Our current work consists in creating new databases by combining synthetic and real data to compare their potential influence in improving predictions. Finally, the perspectives of this work consist in validating the operational relevance and adaptability of our model under realistic conditions and in testing other applications with alternative geophysical methods
Khalil, Toni. "Processus d’évaluation de la qualité de l’imagerie médicale et outils d’aide à la décision basée sur la connaissance". Electronic Thesis or Diss., Université de Lorraine, 2022. http://www.theses.fr/2022LORR0351.
The great progress that medical imaging has brought to diagnosis (conventional radiology, computed tomography, nuclear magnetic resonance and interventional radiology) has made it the first choice in medicine. With an ever-increasing number of diagnostic images produced each year, and with international organizations recommending low-dose irradiation, which results in heavy noise that can distort the diagnosis, Artificial Intelligence (AI) de-noising methods offer an opportunity to meet growing demand. In this thesis, we quantify the effect of AI-based de-noising on X-ray textural parameters with respect to a convolutional neural network. The study was based on characterizing the radiographic noise resulting from an X-ray of a water phantom and generating this noise in a standard-dose radiograph to produce artificially noisy images, so that a neural network could be fed thousands of images for its learning phase. After the learning, testing and inference phases, human chest X-rays were extracted from the archive to validate the de-noising on human X-rays in RGB and in greyscale. The study used a water phantom for ethical reasons, to avoid irradiating people, to avoid voluntary and involuntary patient movements, and to ensure a study based on a homogeneous material (water), which constitutes the majority of the human body. The study was carried out, on the one hand, on 17 X-rays of a water phantom with different exposure doses to study the noise distribution at different grey-level values and, on the other hand, on 25 X-rays divided into 5 groups of 5 images each taken with the same exposure dose, without and with adjacent obstacles, to study the gain effect of the flat panel detector chosen as the pre-processing means. The noise distribution was measured at two grey levels, 160 and 180, and showed a higher level of noise at the 160 level, where the absorption of the X-ray beam is greater and, consequently, the quantum effect is most important. Noise scatter diagrams for these two levels are shown. In addition, the presence of obstacles in the same image showed an absorption directly proportional to the number of obstacles next to the water phantom, which triggered a gain factor of the detector that, in turn, produces nonlinear trace noise. Texture characteristics of AI-de-noised images and of artificially noisy radiographs were compared using a peak signal-to-noise ratio (PSNR) coefficient. Features with increased PSNR values on RGB images and on greyscale images were considered to be consistent. A test to compare absolute values between AI-de-noised and artificially noisy images was performed. For the concordant features, the reported improvement was (38.05/30.06) × 100 - 100 = 26.58% in RGB versus (35.93/22.21) × 100 - 100 = 61.77% in greyscale. In conclusion, applying AI-based de-noising to X-ray images retains most of the texture information of the image. AI-based de-noising in low-dose radiography is a very promising approach because it adapts de-noising, preserving information where it should.
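The percentages quoted in this abstract can be reproduced by reading each pair as a ratio of two quality scores, after and before de-noising. The sketch below is only an illustration under that assumption: it uses the standard PSNR definition for 8-bit images stored as flat pixel sequences, and the function names `psnr` and `relative_improvement` are hypothetical, not taken from the thesis.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_improvement(after, before):
    """Percentage gain of `after` over `before`: (after / before) * 100 - 100."""
    return (after / before - 1.0) * 100.0


# Values quoted in the abstract (assumed to be after/before scores).
rgb_gain = relative_improvement(38.05, 30.06)
grey_gain = relative_improvement(35.93, 22.21)
print(f"RGB: {rgb_gain:.2f}%  greyscale: {grey_gain:.2f}%")
```

Rounded to two decimals, the two gains match the 26.58% and 61.77% figures given in the abstract.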
Dahmane, Khouloud. "Analyse d'images par méthode de Deep Learning appliquée au contexte routier en conditions météorologiques dégradées". Thesis, Université Clermont Auvergne (2017-2020), 2020. http://www.theses.fr/2020CLFAC020.
Nowadays, vision systems are increasingly used in the road context. They ensure safety and facilitate mobility. These vision systems are generally affected by degraded weather conditions, such as heavy fog or strong rain, phenomena that limit visibility and thus reduce the quality of the images. In order to optimize the performance of the vision systems, it is necessary to have a reliable detection system for these adverse weather conditions. There are meteorological sensors dedicated to physical measurement, but they are expensive. Since cameras are already installed on the road, they can simultaneously perform two functions: image acquisition for surveillance applications and physical measurement of weather conditions instead of dedicated sensors. Following the great success of convolutional neural networks (CNN) in classification and image recognition, we used a deep learning method to study the problem of meteorological classification. The objective of our study is first to develop a weather classifier that discriminates between "normal" conditions, fog and rain. In a second step, once the class is known, we seek to develop a model for measuring meteorological visibility. The use of CNNs requires training and test databases. For this, two databases were used: the "Cerema-AWP database" (https://ceremadlcfmds.wixsite.com/cerema-databases) and the "Cerema-AWH database", which has been acquired since 2017 on the Fageole site on the A75 highway. Each image of the two databases is labeled automatically using meteorological data collected on site to characterize various levels of precipitation for rain and fog. The Cerema-AWH database, which was set up as part of our work, contains 5 sub-bases: normal day conditions, heavy fog, light fog, heavy rain and light rain. Rainfall intensities range from 0 mm/h to 70 mm/h and fog visibilities range from 50 m to 1800 m.
Among the known neural networks that have demonstrated their performance in the field of recognition and classification, we can cite LeNet, ResNet-152, Inception-v4 and DenseNet-121. We have applied these networks in our adverse weather classification system. We start by studying the use of convolutional neural networks: the nature of the input data and the optimal hyper-parameters that must be used to achieve the best results. An analysis of the different components of a neural network is done by constructing an instrumental neural network architecture. The conclusions drawn from this analysis show that we must use deep neural networks. This type of network is able to classify the five meteorological classes of the Cerema-AWH database with a classification score of 83% and three meteorological classes with a score of 99%. Then, an analysis of the input and output data was made to study the impact of scene changes, the input data and the number of meteorological classes on the classification result. Finally, a database transfer method is developed. We study the portability of our adverse weather conditions classification system from one site to another. A classification score of 63% is obtained by making a transfer between a public database and the Cerema-AWH database. After the classification, the second step of our study is to measure the meteorological visibility of the fog. For this, we use a neural network that generates continuous values. Two fog variants were tested: light and heavy fog combined, and heavy fog (road fog) only. The result is evaluated using a correlation coefficient R² between the real values and the predicted values. We compare this coefficient with the correlation coefficient between the two sensors used to measure the weather visibility on site. Among the results obtained, and more specifically for road fog, the correlation coefficient reaches a value of 0.74, which is close to the physical sensors' value (0.76).
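The abstract evaluates visibility regression with a correlation coefficient R² between measured and predicted values, without giving its exact definition. The sketch below assumes the squared Pearson correlation; the function name `r_squared` and the sample values are hypothetical and do not come from the thesis.

```python
def r_squared(measured, predicted):
    """Squared Pearson correlation between two equally long numeric sequences."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    # Unnormalized covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov * cov / (var_m * var_p)


# Made-up visibility values in metres, for illustration only.
sensor_visibility = [50.0, 120.0, 300.0, 800.0]
network_visibility = [70.0, 100.0, 350.0, 760.0]
print(round(r_squared(sensor_visibility, network_visibility), 3))
```

The same function can be applied to the readings of two physical sensors, which is the comparison the abstract uses as its reference.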
Papadopoulos, Georgios. "Towards a 3D building reconstruction using spatial multisource data and computational intelligence techniques". Thesis, Limoges, 2019. http://www.theses.fr/2019LIMO0084/document.
Building reconstruction from aerial photographs and other multi-source urban spatial data is a task addressed with a plethora of automated and semi-automated methods, including point processes, classic image processing and laser scanning. In this thesis, an iterative relaxation system is developed based on the examination of the local context of each edge according to multiple spatial input sources (optical, elevation, shadow & foliage masks as well as other pre-processed data as elaborated in Chapter 6). All these multisource and multiresolution data are fused so that probable line segments or edges are extracted that correspond to prominent building boundaries. Two novel sub-systems have also been developed in this thesis. They were designed to provide additional, more reliable information regarding building contours in a future version of the proposed relaxation system. The first is a deep convolutional neural network (CNN) method for the detection of building borders. In particular, the network is based on the state-of-the-art super-resolution model SRCNN (Dong C. L., 2015). It accepts aerial photographs of densely populated urban areas as well as their corresponding digital elevation maps (DEM). Training is performed using three variations of this urban data set and aims at detecting building contours through a novel super-resolved heteroassociative mapping. Another innovation of this approach is the design of a modified custom loss layer named Top-N. In this variation, the mean square error (MSE) between the reconstructed output image and the provided ground truth (GT) image of building contours is computed on the 2N image pixels with the highest values. Assuming that most of the N contour pixels of the GT image are also in the top 2N pixels of the reconstruction, this modification balances the two pixel categories and improves the generalization behavior of the CNN model.
The experiments show that the Top-N cost function offers performance gains in comparison to standard MSE. Further improvement in the generalization ability of the network is achieved by using dropout. The second sub-system is a super-resolution deep convolutional network, which performs an enhanced-input associative mapping between input low-resolution and high-resolution images. This network has been trained with low-resolution elevation data and the corresponding high-resolution optical urban photographs. Such a resolution discrepancy between optical aerial/satellite images and elevation data is often the case in real-world applications. More specifically, low-resolution elevation data augmented by high-resolution optical aerial photographs are used with the aim of increasing the resolution of the elevation data. This is a unique super-resolution problem, for which many of the proposed general-image SR methods were found not to perform as well. The network, aptly named building super resolution CNN (BSRCNN), is trained using patches extracted from the aforementioned data. Results show that, in comparison with a classic bicubic upscale of the elevation data, the proposed implementation offers important improvement, as attested by a modified PSNR and SSIM metric. In comparison, other proposed general-image SR methods performed worse than a standard bicubic up-scaler. Finally, the relaxation system fuses together all these multisource data, comprising pre-processed optical data, elevation data, foliage masks, shadow masks and other pre-processed data, in an attempt to assign confidence values to each pixel belonging to a building contour. Confidence is incremented or decremented iteratively until the MSE falls below a specified threshold or a maximum number of iterations has been executed. The confidence matrix can then be used to extract the true building contours via thresholding.
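The Top-N loss described in this abstract restricts the MSE to the 2N highest-valued pixels of the reconstructed output, where N is the number of contour pixels in the ground truth. The sketch below illustrates that idea on flat pixel lists. It assumes that contour pixels are those with a ground-truth value above zero; the function name `top_n_mse` and the toy values are hypothetical, and the thesis implements this as a network loss layer rather than as plain Python.

```python
def top_n_mse(output, ground_truth, n_contour=None):
    """MSE restricted to the 2N highest-valued pixels of `output`.

    `output` and `ground_truth` are flat sequences of pixel values.
    N defaults to the number of non-zero ground-truth pixels (an assumption).
    """
    if len(output) != len(ground_truth):
        raise ValueError("images must have the same number of pixels")
    if n_contour is None:
        n_contour = sum(1 for g in ground_truth if g > 0)
    k = min(2 * n_contour, len(output))
    if k == 0:
        return 0.0
    # Indices of the k largest output values.
    top = sorted(range(len(output)), key=lambda i: output[i], reverse=True)[:k]
    return sum((output[i] - ground_truth[i]) ** 2 for i in top) / k


# Toy 4-pixel image with one contour pixel (N = 1, so 2N = 2 pixels are scored).
reconstruction = [0.9, 0.8, 0.1, 0.0]
contours = [1.0, 0.0, 0.0, 0.0]
loss = top_n_mse(reconstruction, contours)
```

Only the two brightest output pixels contribute here: one true contour pixel and one false positive. This shows how the selection keeps the contour and background categories roughly balanced.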
Dekhtiar, Jonathan. "Deep Learning and unsupervised learning to automate visual inspection in the manufacturing industry". Thesis, Compiègne, 2019. http://www.theses.fr/2019COMP2513.
Although studied since 1970, automatic visual inspection on production lines still struggles to be applied on a large scale and at low cost. The methods used depend greatly on the availability of domain experts. This inevitably leads to increased costs and reduced flexibility in the methods used. Since 2012, advances in the field of Deep Learning have enabled much progress in this direction, particularly thanks to convolutional neural networks that have achieved near-human performance in many areas associated with visual perception (e.g. object recognition and detection). This thesis proposes an unsupervised approach to meet the needs of automatic visual inspection. This method, called AnoAEGAN, combines adversarial learning and the estimation of a probability density function. These two complementary approaches make it possible to jointly estimate the pixel-by-pixel probability of a visual defect on an image. The model is trained from a very limited number of images (i.e. fewer than 1000 images) without using expert knowledge to "label" the data beforehand. This method allows increased flexibility with a limited training time and therefore great versatility, demonstrated on ten different tasks without any modification of the model. This method should reduce development costs and the time required to deploy in production. This method can also be deployed in a complementary way to a supervised approach in order to benefit from the advantages of each approach.
Belharbi, Soufiane. "Neural networks regularization through representation learning". Thesis, Normandie, 2018. http://www.theses.fr/2018NORMIR10/document.
Neural network models and deep models are among the leading, state-of-the-art models in machine learning. They have been applied in many different domains. The most successful deep neural models are those with many layers, which greatly increases their number of parameters. Training such models requires a large number of training samples, which is not always available. One of the fundamental issues in neural networks is overfitting, which is the issue tackled in this thesis. This problem often occurs when large models are trained using few training samples. Many approaches have been proposed to prevent the network from overfitting and improve its generalization performance, such as data augmentation, early stopping, parameter sharing, unsupervised learning, dropout, batch normalization, etc. In this thesis, we tackle the neural network overfitting issue from a representation learning perspective by considering the situation where few training samples are available, which is the case in many real-world applications. We propose three contributions. The first one, presented in chapter 2, is dedicated to structured output problems, performing multivariate regression when the output variable y contains structural dependencies between its components. Our proposal aims mainly at exploiting these dependencies by learning them in an unsupervised way. Validated on a facial landmark detection problem, learning the structure of the output data has been shown to improve the network generalization and speed up its training. The second contribution, described in chapter 3, deals with the classification task, where we propose to exploit prior knowledge about the internal representation of the hidden layers in neural networks. This prior is based on the idea that samples within the same class should have the same internal representation. We formulate this prior as a penalty that we add to the training cost to be minimized.
Empirical experiments on MNIST and its variants showed an improvement in network generalization when using only a few training samples. Our last contribution, presented in chapter 4, shows the value of transfer learning in applications where only a few samples are available. The idea consists in re-using the filters of pre-trained convolutional networks that have been trained on large datasets such as ImageNet. Such pre-trained filters are plugged into a new convolutional network with new dense layers. Then, the whole network is trained on a new task. In this contribution, we provide an automatic system based on this learning scheme with an application to the medical domain. In this application, the task consists in localizing the third lumbar vertebra in a 3D CT scan. A pre-processing of the 3D CT scan to obtain a 2D representation and a post-processing to refine the decision are included in the proposed system. This work was done in collaboration with the clinic "Rouen Henri Becquerel Center", which provided us with data.
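The second contribution in this abstract adds a penalty to the training cost so that samples of the same class share similar hidden representations. The abstract does not give the exact form of that penalty. The sketch below assumes one plausible choice, the mean squared Euclidean distance over same-class pairs in a batch. The names `same_class_penalty`, `regularized_cost` and `weight` are hypothetical.

```python
def same_class_penalty(hidden, labels):
    """Mean squared distance between hidden vectors of same-class sample pairs."""
    total = 0.0
    pairs = 0
    for i in range(len(hidden)):
        for j in range(i + 1, len(hidden)):
            if labels[i] == labels[j]:
                total += sum((a - b) ** 2 for a, b in zip(hidden[i], hidden[j]))
                pairs += 1
    # No same-class pair in the batch means no penalty.
    return total / pairs if pairs else 0.0


def regularized_cost(task_loss, hidden, labels, weight=0.1):
    """Supervised loss plus the weighted representation penalty."""
    return task_loss + weight * same_class_penalty(hidden, labels)


# Toy batch: two samples of class 0 and one of class 1.
batch_hidden = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
batch_labels = [0, 0, 1]
cost = regularized_cost(1.0, batch_hidden, batch_labels, weight=0.5)
```

In a real training loop the penalty would be computed on a differentiable tensor of hidden activations; the plain-Python version only shows which pairs contribute and how the term enters the cost.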