Dissertations / Theses on the topic 'Réseau de neurone convolutif'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Réseau de neurone convolutif.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Messaoud, Kaouther. "Deep learning based trajectory prediction for autonomous vehicles." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS048.
Full textThe trajectory prediction of neighboring agents of an autonomous vehicle is essential for autonomous driving in order to perform trajectory planning in an efficient manner. In this thesis, we tackle the problem of predicting the trajectory of a target vehicle in two different environments; a highway and an urban area (intersection, roundabout, etc.). To this end, we develop solutions based on deep machine learning by phasing the interactions between the target vehicle and the static and dynamic elements of the scene. In addition, in order to take into account the uncertainty of the future, we generate multiple plausible trajectories and the probability of occurrence of each. We also make sure that the predicted trajectories are realistic and conform to the structure of the scene. The solutions developed are evaluated using real driving datasets
Fernandez, Brillet Lucas. "Réseaux de neurones CNN pour la vision embarquée." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM043.
Full textRecently, Convolutional Neural Networks have become the state-of-the-art soluion(SOA) to most computer vision problems. In order to achieve high accuracy rates, CNNs require a high parameter count, as well as a high number of operations. This greatly complicates the deployment of such solutions in embedded systems, which strive to reduce memory size. Indeed, while most embedded systems are typically in the range of a few KBytes of memory, CNN models from the SOA usually account for multiple MBytes, or even GBytes in model size. Throughout this thesis, multiple novel ideas allowing to ease this issue are proposed. This requires to jointly design the solution across three main axes: Application, Algorithm and Hardware.In this manuscript, the main levers allowing to tailor computational complexity of a generic CNN-based object detector are identified and studied. Since object detection requires scanning every possible location and scale across an image through a fixed-input CNN classifier, the number of operations quickly grows for high-resolution images. In order to perform object detection in an efficient way, the detection process is divided into two stages. The first stage involves a region proposal network which allows to trade-off recall for the number of operations required to perform the search, as well as the number of regions passed on to the next stage. Techniques such as bounding box regression also greatly help reduce the dimension of the search space. This in turn simplifies the second stage, since it allows to reduce the task’s complexity to the set of possible proposals. Therefore, parameter counts can greatly be reduced.Furthermore, CNNs also exhibit properties that confirm their over-dimensionment. This over-dimensionement is one of the key success factors of CNNs in practice, since it eases the optimization process by allowing a large set of equivalent solutions. However, this also greatly increases computational complexity, and therefore complicates deploying the inference stage of these algorithms on embedded systems. In order to ease this problem, we propose a CNN compression method which is based on Principal Component Analysis (PCA). PCA allows to find, for each layer of the network independently, a new representation of the set of learned filters by expressing them in a more appropriate PCA basis. This PCA basis is hierarchical, meaning that basis terms are ordered by importance, and by removing the least important basis terms, it is possible to optimally trade-off approximation error for parameter count. Through this method, it is possible to compress, for example, a ResNet-32 network by a factor of ×2 both in the number of parameters and operations with a loss of accuracy <2%. It is also shown that the proposed method is compatible with other SOA methods which exploit other CNN properties in order to reduce computational complexity, mainly pruning, winograd and quantization. Through this method, we have been able to reduce the size of a ResNet-110 from 6.88Mbytes to 370kbytes, i.e. a x19 memory gain with a 3.9 % accuracy loss.All this knowledge, is applied in order to achieve an efficient CNN-based solution for a consumer face detection scenario. The proposed solution consists of just 29.3kBytes model size. This is x65 smaller than other SOA CNN face detectors, while providing equal detection performance and lower number of operations. Our face detector is also compared to a more traditional Viola-Jones face detector, exhibiting approximately an order of magnitude faster computation, as well as the ability to scale to higher detection rates by slightly increasing computational complexity.Both networks are finally implemented in a custom embedded multiprocessor, verifying that theorical and measured gains from PCA are consistent. Furthermore, parallelizing the PCA compressed network over 8 PEs achieves a x11.68 speed-up with respect to the original network running on a single PE
Pothier, Dominique. "Réseaux convolutifs à politiques." Master's thesis, Université Laval, 2021. http://hdl.handle.net/20.500.11794/69184.
Full textDespite their excellent performances, artificial neural networks high demand of both data and computational power limit their adoption in many domains. Developing less demanding architecture thus remain an important endeavor. This thesis seeks to produce a more flexible and less resource-intensive architecture by using reinforcement learning theory. When considering a network as an agent instead of a function approximator, one realize that the implicit policy followed by popular feed forward networks is extremely simple. We hypothesize that an architecture able to learn a more flexible policy could reach similar performances while reducing its resource footprint. The architecture we propose is inspired by research done in weight prediction, particularly by the hypernetwork architecture, which we use as a baseline model.Our results show that learning a dynamic policy achieving similar results to the static policies of conventional networks is not a trivial task. Our proposed architecture succeeds in limiting its parameter space by 20%, but does so at the cost of a 24% computation increase and loss of5% accuracy. Despite those results, we believe that this architecture provides a baseline that can be improved in multiple ways that we describe in the conclusion.
Morère, Olivier André Luc. "Deep learning compact and invariant image representations for instance retrieval." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066406.
Full textImage instance retrieval is the problem of finding an object instance present in a query image from a database of images. Also referred to as particular object retrieval, this problem typically entails determining with high precision whether the retrieved image contains the same object as the query image. Scale, rotation and orientation changes between query and database objects and background clutter pose significant challenges for this problem. State-of-the-art image instance retrieval pipelines consist of two major steps: first, a subset of images similar to the query are retrieved from the database, and second, Geometric Consistency Checks (GCC) are applied to select the relevant images from the subset with high precision. The first step is based on comparison of global image descriptors: high-dimensional vectors with up to tens of thousands of dimensions rep- resenting the image data. The second step is computationally highly complex and can only be applied to hundreds or thousands of images in practical applications. More discriminative global descriptors result in relevant images being more highly ranked, resulting in fewer images that need to be compared pairwise with GCC. As a result, better global descriptors are key to improving retrieval performance and have been the object of much recent interest. Furthermore, fast searches in large databases of millions or even billions of images requires the global descriptors to be compressed into compact representations. This thesis will focus on how to achieve extremely compact global descriptor representations for large-scale image instance retrieval. After introducing background concepts about supervised neural networks, Restricted Boltzmann Machine (RBM) and deep learning in Chapter 2, Chapter 3 will present the design principles and recent work for the Convolutional Neural Networks (CNN), which recently became the method of choice for large-scale image classification tasks. Next, an original multistage approach for the fusion of the output of multiple CNN is proposed. Submitted as part of the ILSVRC 2014 challenge, results show that this approach can significantly improve classification results. The promising perfor- mance of CNN is largely due to their capability to learn appropriate high-level visual representations from the data. Inspired by a stream of recent works showing that the representations learnt on one particular classification task can transfer well to other classification tasks, subsequent chapters will focus on the transferability of representa- tions learnt by CNN to image instance retrieval…
Carpentier, Mathieu. "Classification fine par réseau de neurones à convolution." Master's thesis, Université Laval, 2019. http://hdl.handle.net/20.500.11794/35835.
Full textArtificial intelligence is a relatively recent research domain. With it, many breakthroughs were made on a number of problems that were considered very hard. Fine-grained classification is one of those problems. However, a relatively small amount of research has been done on this task even though itcould represent progress on a scientific, commercial and industrial level. In this work, we talk about applying fine-grained classification on concrete problems such as tree bark classification and mould classification in culture. We start by presenting fundamental deep learning concepts at the root of our solution. Then, we present multiple experiments made in order to try to solve the tree bark classification problem and we detail the novel dataset BarkNet 1.0 that we made for this project. With it, we were able to develop a method that obtains an accuracy of 93.88% on singlecrop in a single image, and an accuracy of 97.81% using a majority voting approach on all the images of a tree. We conclude by demonstrating the feasibility of applying our method on new problems by showing two concrete applications on which we tried our approach, industrial tree classification and mould classification.
Morère, Olivier André Luc. "Deep learning compact and invariant image representations for instance retrieval." Electronic Thesis or Diss., Paris 6, 2016. http://www.theses.fr/2016PA066406.
Full textImage instance retrieval is the problem of finding an object instance present in a query image from a database of images. Also referred to as particular object retrieval, this problem typically entails determining with high precision whether the retrieved image contains the same object as the query image. Scale, rotation and orientation changes between query and database objects and background clutter pose significant challenges for this problem. State-of-the-art image instance retrieval pipelines consist of two major steps: first, a subset of images similar to the query are retrieved from the database, and second, Geometric Consistency Checks (GCC) are applied to select the relevant images from the subset with high precision. The first step is based on comparison of global image descriptors: high-dimensional vectors with up to tens of thousands of dimensions rep- resenting the image data. The second step is computationally highly complex and can only be applied to hundreds or thousands of images in practical applications. More discriminative global descriptors result in relevant images being more highly ranked, resulting in fewer images that need to be compared pairwise with GCC. As a result, better global descriptors are key to improving retrieval performance and have been the object of much recent interest. Furthermore, fast searches in large databases of millions or even billions of images requires the global descriptors to be compressed into compact representations. This thesis will focus on how to achieve extremely compact global descriptor representations for large-scale image instance retrieval. After introducing background concepts about supervised neural networks, Restricted Boltzmann Machine (RBM) and deep learning in Chapter 2, Chapter 3 will present the design principles and recent work for the Convolutional Neural Networks (CNN), which recently became the method of choice for large-scale image classification tasks. Next, an original multistage approach for the fusion of the output of multiple CNN is proposed. Submitted as part of the ILSVRC 2014 challenge, results show that this approach can significantly improve classification results. The promising perfor- mance of CNN is largely due to their capability to learn appropriate high-level visual representations from the data. Inspired by a stream of recent works showing that the representations learnt on one particular classification task can transfer well to other classification tasks, subsequent chapters will focus on the transferability of representa- tions learnt by CNN to image instance retrieval…
Elloumi, Zied. "Prédiction de performances des systèmes de Reconnaissance Automatique de la Parole." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM005/document.
Full textIn this thesis, we focus on performance prediction of automatic speech recognition (ASR) systems.This is a very useful task to measure the reliability of transcription hypotheses for a new data collection, when the reference transcription is unavailable and the ASR system used is unknown (black box).Our contribution focuses on several areas: first, we propose a heterogeneous French corpus to learn and evaluate ASR prediction systems.We then compare two prediction approaches: a state-of-the-art (SOTA) performance prediction based on engineered features and a new strategy based on learnt features using convolutional neural networks (CNNs).While the joint use of textual and signal features did not work for the SOTA system, the combination of inputs for CNNs leads to the best WER prediction performance. We also show that our CNN prediction remarkably predicts the shape of the WER distribution on a collection of speech recordings.Then, we analyze factors impacting both prediction approaches. We also assess the impact of the training size of prediction systems as well as the robustness of systems learned with the outputs of a particular ASR system and used to predict performance on a new data collection.Our experimental results show that both prediction approaches are robust and that the prediction task is more difficult on short speech turns as well as spontaneous speech style.Finally, we try to understand which information is captured by our neural model and its relation with different factors.Our experiences show that intermediate representations in the network automatically encode information on the speech style, the speaker's accent as well as the broadcast program type.To take advantage of this analysis, we propose a multi-task system that is slightly more effective on the performance prediction task
Foroughmand, Aarabi Hadrien. "Towards global tempo estimation and rhythm-oriented genre classification based on harmonic characteristics of rhythm." Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS018.
Full textAutomatic detection of the rhythmic structure within music is one of the challenges of the "Music Information Retrieval" research area. The advent of technology dedicated to the arts has allowed the emergence of new musical trends generally described by the term "Electronic/Dance Music" (EDM) which encompasses a plethora of sub-genres. This type of music often dedicated to dance is characterized by its rhythmic structure. We propose a rhythmic analysis of what defines certain musical genres including those of EDM. To do so, we want to perform an automatic global tempo estimation task and a genre classification task based on rhythm. Tempo and genre are two intertwined aspects since genres are often associated with rhythmic patterns that are played in specific tempo ranges. Some so-called "handcrafted" tempo estimation systems have been shown to be effective based on the extraction of rhythm-related characteristics. Recently, with the appearance of annotated databases, so-called "data-driven" systems and deep learning approaches have shown progress in the automatic estimation of these tasks. In this thesis, we propose methods at the crossroads between " handcrafted " and " data-driven " systems. The development of a new representation of rhythm combined with deep learning by convolutional neural network is at the basis of all our work. We present in detail our Deep Rhythm method in this thesis and we also present several extensions based on musical intuitions that allow us to improve our results
Pourchot, Aloïs. "Improving Radiographic Diagnosis with Deep Learning in Clinical Settings." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS421.
Full textThe impressive successes of deep learning over the course of the past decade have reinforced its establishment as the standard modus operandi to solve difficult machine learning problems, as well as enabled its swift spread to manifold domains of application. One such domain, which is at the heart of this PhD, is medical imaging. Deep learning has made the thrilling perspective of relieving medical experts from a fraction of their burden through automated diagnosis a reality. Over the course of this thesis, we were led to consider two medical problems: the task of fracture detection, and the task of bone age assessment. For both of them, we strove to explore possibilities to improve deep learning tools aimed at facilitating their diagnosis. With this objective in mind, we have explored two different strategies. The first one, ambitious yet arrogant, has led us to investigate the paradigm of neural architecture search, a logical succession to deep learning which aims at learning the very structure of the neural network model used to solve a task. In a second, bleaker but wiser strategy, we have tried to improve a model through the meticulous analysis of the data sources at hands. In both scenarios, a particular care was given to the clinical relevance of our different results and contributions, as we believed that the practical anchoring of our different contrivances was just as important as their theoretical design
Abbasi, Mahdieh. "Toward robust deep neural networks." Doctoral thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67766.
Full textIn this thesis, our goal is to develop robust and reliable yet accurate learning models, particularly Convolutional Neural Networks (CNNs), in the presence of adversarial examples and Out-of-Distribution (OOD) samples. As the first contribution, we propose to predict adversarial instances with high uncertainty through encouraging diversity in an ensemble of CNNs. To this end, we devise an ensemble of diverse specialists along with a simple and computationally efficient voting mechanism to predict the adversarial examples with low confidence while keeping the predictive confidence of the clean samples high. In the presence of high entropy in our ensemble, we prove that the predictive confidence can be upper-bounded, leading to have a globally fixed threshold over the predictive confidence for identifying adversaries. We analytically justify the role of diversity in our ensemble on mitigating the risk of both black-box and white-box adversarial examples. Finally, we empirically assess the robustness of our ensemble to the black-box and the white-box attacks on several benchmark datasets.The second contribution aims to address the detection of OOD samples through an end-to-end model trained on an appropriate OOD set. To this end, we address the following central question: how to differentiate many available OOD sets w.r.t. a given in distribution task to select the most appropriate one, which in turn induces a model with a high detection rate of unseen OOD sets? To answer this question, we hypothesize that the “protection” level of in-distribution sub-manifolds by each OOD set can be a good possible property to differentiate OOD sets. To measure the protection level, we then design three novel, simple, and cost-effective metrics using a pre-trained vanilla CNN. In an extensive series of experiments on image and audio classification tasks, we empirically demonstrate the abilityof an Augmented-CNN (A-CNN) and an explicitly-calibrated CNN for detecting a significantly larger portion of unseen OOD samples, if they are trained on the most protective OOD set. Interestingly, we also observe that the A-CNN trained on the most protective OOD set (calledA-CNN) can also detect the black-box Fast Gradient Sign (FGS) adversarial examples. As the third contribution, we investigate more closely the capacity of the A-CNN on the detection of wider types of black-box adversaries. To increase the capability of A-CNN to detect a larger number of adversaries, we augment its OOD training set with some inter-class interpolated samples. Then, we demonstrate that the A-CNN trained on the most protective OOD set along with the interpolated samples has a consistent detection rate on all types of unseen adversarial examples. Where as training an A-CNN on Projected Gradient Descent (PGD) adversaries does not lead to a stable detection rate on all types of adversaries, particularly the unseen types. We also visually assess the feature space and the decision boundaries in the input space of a vanilla CNN and its augmented counterpart in the presence of adversaries and the clean ones. By a properly trained A-CNN, we aim to take a step toward a unified and reliable end-to-end learning model with small risk rates on both clean samples and the unusual ones, e.g. adversarial and OOD samples.The last contribution is to show a use-case of A-CNN for training a robust object detector on a partially-labeled dataset, particularly a merged dataset. Merging various datasets from similar contexts but with different sets of Object of Interest (OoI) is an inexpensive way to craft a large-scale dataset which covers a larger spectrum of OoIs. Moreover, merging datasets allows achieving a unified object detector, instead of having several separate ones, resultingin the reduction of computational and time costs. However, merging datasets, especially from a similar context, causes many missing-label instances. With the goal of training an integrated robust object detector on a partially-labeled but large-scale dataset, we propose a self-supervised training framework to overcome the issue of missing-label instances in the merged datasets. Our framework is evaluated on a merged dataset with a high missing-label rate. The empirical results confirm the viability of our generated pseudo-labels to enhance the performance of YOLO, as the current (to date) state-of-the-art object detector.
Antipov, Grigory. "Apprentissage profond pour la description sémantique des traits visuels humains." Thesis, Paris, ENST, 2017. http://www.theses.fr/2017ENST0071/document.
Full textThe recent progress in artificial neural networks (rebranded as deep learning) has significantly boosted the state-of-the-art in numerous domains of computer vision. In this PhD study, we explore how deep learning techniques can help in the analysis of gender and age from a human face. In particular, two complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes.Firstly, we conduct a comprehensive study which results in an empirical formulation of a set of principles for optimal design and training of gender recognition and age estimation Convolutional Neural Networks (CNNs). As a result, we obtain the state-of-the-art CNNs for gender/age prediction according to the three most popular benchmarks, and win an international competition on apparent age estimation. On a very challenging internal dataset, our best models reach 98.7% of gender classification accuracy and an average age estimation error of 4.26 years.In order to address the problem of synthesis and editing of human faces, we design and train GA-cGAN, the first Generative Adversarial Network (GAN) which can generate synthetic faces of high visual fidelity within required gender and age categories. Moreover, we propose a novel method which allows employing GA-cGAN for gender swapping and aging/rejuvenation without losing the original identity in synthetic faces. Finally, in order to show the practical interest of the designed face editing method, we apply it to improve the accuracy of an off-the-shelf face verification software in a cross-age evaluation scenario
Fourure, Damien. "Réseaux de neurones convolutifs pour la segmentation sémantique et l'apprentissage d'invariants de couleur." Thesis, Lyon, 2017. http://www.theses.fr/2017LYSES056/document.
Full textComputer vision is an interdisciplinary field that investigates how computers can gain a high level of understanding from digital images or videos. In artificial intelligence, and more precisely in machine learning, the field in which this thesis is positioned,computer vision involves extracting characteristics from images and then generalizing concepts related to these characteristics. This field of research has become very popular in recent years, particularly thanks to the results of the convolutional neural networks that form the basis of so-called deep learning methods. Today, neural networks make it possible, among other things, to recognize different objects present in an image, to generate very realistic images or even to beat the champions at the Go game. Their performance is not limited to the image domain, since they are also used in other fields such as natural language processing (e. g. machine translation) or sound recognition. In this thesis, we study convolutional neural networks in order to develop specialized architectures and loss functions for low-level tasks (color constancy) as well as high-level tasks (semantic segmentation). Color constancy, is the ability of the human visual system to perceive constant colours for a surface despite changes in the spectrum of illumination (lighting change). In computer vision, the main approach consists in estimating the color of the illuminant and then suppressing its impact on the perceived color of objects. We approach the task of color constancy with the use of neural networks by developing a new architecture composed of a subsampling operator inspired by traditional methods. Our experience shows that our method makes it possible to obtain competitive performances with the state of the art. Nevertheless, our architecture requires a large amount of training data. In order to partially correct this problem and improve the training of neural networks, we present several techniques for artificial data augmentation. We are also making two contributions on a high-level issue : semantic segmentation. This task, which consists of assigning a semantic class to each pixel of an image, is a challenge in computer vision because of its complexity. On the one hand, it requires many examples of training that are costly to obtain. On the other hand, it requires the adaptation of traditional convolutional neural networks in order to obtain a so-called dense prediction, i. e., a prediction for each pixel present in the input image. To solve the difficulty of acquiring training data, we propose an approach that uses several databases annotated with different labels at the same time. To do this, we define a selective loss function that has the advantage of allowing the training of a convolutional neural network from data from multiple databases. We also developed self-context approach that captures the correlations between labels in different databases. Finally, we present our third contribution : a new convolutional neural network architecture called GridNet specialized for semantic segmentation. Unlike traditional networks, implemented with a single path from the input (image) to the output (prediction), our architecture is implemented as a 2D grid allowing several interconnected streams to operate at different resolutions. In order to exploit all the paths of the grid, we propose a technique inspired by dropout. In addition, we empirically demonstrate that our architecture generalize many of well-known stateof- the-art networks. We conclude with an analysis of the empirical results obtained with our architecture which, although trained from scratch, reveals very good performances, exceeding popular approaches often pre-trained
Suzano, Massa Francisco Vitor. "Mise en relation d'images et de modèles 3D avec des réseaux de neurones convolutifs." Thesis, Paris Est, 2017. http://www.theses.fr/2017PESC1198/document.
Full textThe recent availability of large catalogs of 3D models enables new possibilities for a 3D reasoning on photographs. This thesis investigates the use of convolutional neural networks (CNNs) for relating 3D objects to 2D images.We first introduce two contributions that are used throughout this thesis: an automatic memory reduction library for deep CNNs, and a study of CNN features for cross-domain matching. In the first one, we develop a library built on top of Torch7 which automatically reduces up to 91% of the memory requirements for deploying a deep CNN. As a second point, we study the effectiveness of various CNN features extracted from a pre-trained network in the case of images from different modalities (real or synthetic images). We show that despite the large cross-domain difference between rendered views and photographs, it is possible to use some of these features for instance retrieval, with possible applications to image-based rendering.There has been a recent use of CNNs for the task of object viewpoint estimation, sometimes with very different design choices. We present these approaches in an unified framework and we analyse the key factors that affect performance. We propose a joint training method that combines both detection and viewpoint estimation, which performs better than considering the viewpoint estimation separately. We also study the impact of the formulation of viewpoint estimation either as a discrete or a continuous task, we quantify the benefits of deeper architectures and we demonstrate that using synthetic data is beneficial. With all these elements combined, we improve over previous state-of-the-art results on the Pascal3D+ dataset by a approximately 5% of mean average viewpoint precision.In the instance retrieval study, the image of the object is given and the goal is to identify among a number of 3D models which object it is. We extend this work to object detection, where instead we are given a 3D model (or a set of 3D models) and we are asked to locate and align the model in the image. We show that simply using CNN features are not enough for this task, and we propose to learn a transformation that brings the features from the real images close to the features from the rendered views. We evaluate our approach both qualitatively and quantitatively on two standard datasets: the IKEAobject dataset, and a subset of the Pascal VOC 2012 dataset of the chair category, and we show state-of-the-art results on both of them
Antipov, Grigory. "Apprentissage profond pour la description sémantique des traits visuels humains." Electronic Thesis or Diss., Paris, ENST, 2017. http://www.theses.fr/2017ENST0071.
Full textThe recent progress in artificial neural networks (rebranded as deep learning) has significantly boosted the state-of-the-art in numerous domains of computer vision. In this PhD study, we explore how deep learning techniques can help in the analysis of gender and age from a human face. In particular, two complementary problem settings are considered: (1) gender/age prediction from given face images, and (2) synthesis and editing of human faces with the required gender/age attributes.Firstly, we conduct a comprehensive study which results in an empirical formulation of a set of principles for optimal design and training of gender recognition and age estimation Convolutional Neural Networks (CNNs). As a result, we obtain the state-of-the-art CNNs for gender/age prediction according to the three most popular benchmarks, and win an international competition on apparent age estimation. On a very challenging internal dataset, our best models reach 98.7% of gender classification accuracy and an average age estimation error of 4.26 years.In order to address the problem of synthesis and editing of human faces, we design and train GA-cGAN, the first Generative Adversarial Network (GAN) which can generate synthetic faces of high visual fidelity within required gender and age categories. Moreover, we propose a novel method which allows employing GA-cGAN for gender swapping and aging/rejuvenation without losing the original identity in synthetic faces. Finally, in order to show the practical interest of the designed face editing method, we apply it to improve the accuracy of an off-the-shelf face verification software in a cross-age evaluation scenario
Yang, Lixuan. "Structuring of image databases for the suggestion of products for online advertising." Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1102/document.
Full textThe topic of the thesis is the extraction and segmentation of clothing items from still images using techniques from computer vision, machine learning and image description, in view of suggesting non intrusively to the users similar items from a database of retail products. We firstly propose a dedicated object extractor for dress segmentation by combining local information with a prior learning. A person detector is applied to localize sites in the image that are likely to contain the object. Then, an intra-image two-stage learning process is developed to roughly separate foreground pixels from the background. Finally, the object is finely segmented by employing an active contour algorithm that takes into account the previous segmentation and injects specific knowledge about local curvature in the energy function.We then propose a new framework for extracting general deformable clothing items by using a three stage global-local fitting procedure. A set of template initiates an object extraction process by a global alignment of the model, followed by a local search minimizing a measure of the misfit with respect to the potential boundaries in the neighborhood. The results provided by each template are aggregated, with a global fitting criterion, to obtain the final segmentation.In our latest work, we extend the output of a Fully Convolution Neural Network to infer context from local units(superpixels). To achieve this we optimize an energy function,that combines the large scale structure of the image with the locallow-level visual descriptions of superpixels, over the space of all possiblepixel labellings. In addition, we introduce a novel dataset called RichPicture, consisting of 1000 images for clothing extraction from fashion images.The methods are validated on the public database and compares favorably to the other methods according to all the performance measures considered
Farabet, Clément. "Analyse sémantique des images en temps-réel avec des réseaux convolutifs." Phd thesis, Université Paris-Est, 2013. http://tel.archives-ouvertes.fr/tel-00965622.
Full textChabot, Florian. "Analyse fine 2D/3D de véhicules par réseaux de neurones profonds." Thesis, Université Clermont Auvergne (2017-2020), 2017. http://www.theses.fr/2017CLFAC018/document.
Full textIn this thesis, we are interested in fine-grained analysis of vehicle from an image. We define fine-grained analysis as the following concepts : vehicle detection in the image, vehicle viewpoint (or orientation) estimation, vehicle visibility characterization, vehicle 3D localization and make and model recognition. The design of reliable solutions for fine-grained analysis of vehicle open the door to multiple applications in particular for intelligent transport systems as well as video surveillance systems. In this work, we propose several contributions allowing to address partially or wholly this issue. Proposed approaches are based on joint deep learning technologies and 3D models. In a first section, we deal with make and model classification keeping in mind the difficulty to create training data. In a second section, we investigate a novel method for both vehicle detection and fine-grained viewpoint estimation based on local apparence features and geometric spatial coherence. It uses models learned only on synthetic data. Finally, in a third section, a complete system for fine-grained analysis is proposed. It is based on the multi-task concept. Throughout this report, we provide quantitative and qualitative results. On several aspects related to vehicle fine-grained analysis, this work allowed to outperform state of the art methods
Mamalet, Franck. "Adéquation algorithme-architecture pour les réseaux de neurones à convolution : application à l'analyse de visages embarquée." Thesis, Lyon, INSA, 2011. http://www.theses.fr/2011ISAL0068.
Full textProliferation of image sensors in many electronic devices, and increasing processing capabilities of such sensors, open a field of exploration for the implementation and optimization of complex image processing algorithms in order to provide embedded vision systems. This work is a contribution in the research domain of algorithm-architecture matching. It focuses on a class of algorithms called convolution neural network (ConvNet) and its applications in embedded facial analysis. The facial analysis framework, introduced by Garcia et al., was chosen for its state of the art performances in detection/recognition, and also for its homogeneity based on ConvNets. The first contribution of this work deals with an adequacy study of this facial analysis framework with embedded processors. We propose several algorithmic adaptations of ConvNets, and show that they can lead to significant speedup factors (up to 700) on an embedded processor for mobile phone, without performance degradation. We then present a study of ConvNets parallelization capabilities, through N. Farrugia's PhD work. A coarse-grain parallelism exploration of ConvNets, followed by study of internal scheduling of elementary processors, lead to a parameterized parallel architecture on FPGA, able to detect faces at more than 10 VGA frames per second. Finally, we propose an extension of these studies to the learning phase of neural networks. We analyze several hypothesis space restrictions for ConvNets, and show, on a case study, that classification rate performances are almost the same with a training time divided by up to five
Mlynarski, Pawel. "Apprentissage profond pour la segmentation des tumeurs cérébrales et des organes à risque en radiothérapie." Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4084.
Full textMedical images play an important role in cancer diagnosis and treatment. Oncologists analyze images to determine the different characteristics of the cancer, to plan the therapy and to observe the evolution of the disease. The objective of this thesis is to propose efficient methods for automatic segmentation of brain tumors and organs at risk in the context of radiotherapy planning, using Magnetic Resonance (MR) images. First, we focus on segmentation of brain tumors using Convolutional Neural Networks (CNN) trained on MRIs manually segmented by experts. We propose a segmentation model having a large 3D receptive field while being efficient in terms of computational complexity, based on combination of 2D and 3D CNNs. We also address problems related to the joint use of several MRI sequences (T1, T2, FLAIR). Second, we introduce a segmentation model which is trained using weakly-annotated images in addition to fully-annotated images (with voxelwise labels), which are usually available in very limited quantities due to their cost. We show that this mixed level of supervision considerably improves the segmentation accuracy when the number of fully-annotated images is limited.\\ Finally, we propose a methodology for an anatomy-consistent segmentation of organs at risk in the context of radiotherapy of brain tumors. The segmentations produced by our system on a set of MRIs acquired in the Centre Antoine Lacassagne (Nice, France) are evaluated by an experienced radiotherapist
Beltzung, Benjamin. "Utilisation de réseaux de neurones convolutifs pour mieux comprendre l’évolution et le développement du comportement de dessin chez les Hominidés." Electronic Thesis or Diss., Strasbourg, 2023. http://www.theses.fr/2023STRAJ114.
Full textThe study of drawing behavior can be highly informative, both cognitively and psychologically, in humans and other primates. However, this wealth of information can also be a challenge to analysis and interpretation, particularly in the absence of explanation or verbalization by the author of the drawing. Indeed, an adult's interpretation of a drawing may not be in line with the artist's original intention. During my thesis, I showed that, although generally regarded as black boxes, convolutional neural networks (CNNs) can provide a better understanding of the drawing behavior. Firstly, by using a CNN to classify drawings of a female orangutan according to their season of production, and highlighting variation in style and content. In addition, an ontogenetic approach was considered to quantify the similarity between productions from different age groups. In the future, more interpretable models and the application of new interpretability methods could be applied to better decipher drawing behavior
Matteo, Lionel. "De l’image optique "multi-stéréo" à la topographie très haute résolution et la cartographie automatique des failles par apprentissage profond." Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4099.
Full textSeismogenic faults are the source of earthquakes. The study of their properties thus provides information on some of the properties of the large earthquakes they might produce. Faults are 3D features, forming complex networks generally including one master fault and myriads of secondary faults and fractures that intensely dissect the master fault embedding rocks. I aim in my thesis to develop approaches to help studying this intense secondary faulting/fracturing. To identify, map and measure the faults and fractures within dense fault networks, I have handled two challenges:1) Faults generally form steep topographic escarpments at the ground surface that enclose narrow, deep corridors or canyons, where topography, and hence fault traces, are difficult to measure using the available standard methods (such as stereo and tri-stereo of optical satellite images). To address this challenge, I have thus used multi-stéréo acquisitions with different configuration such as different roll and pitch angles, different date of acquisitions and different mode of acquisitions (mono and tri-stéréo). Our dataset amounting 37 Pléiades images in three different tectonic sites within Western USA (Valley of Fire, Nevada; Granite Dells, Arizona; Bishop Tuff, California) allow us to test different configuration of acquisitions to calculate the topography with three different approaches. Using the free open-source software Micmac (IGN ; Rupnik et al., 2017), I have calculated the topography in the form of Digital Surface Models (DSM): (i) with the combination of 2 to 17 Pleiades images, (ii) stacking and merging DSM built from individual stéréo or tri-stéréo acquisitions avoiding the use of multi-dates combinations, (iii) stacking and merging point clouds built from tri-stereo acquisitions following the multiview pipeline developped by Rupnik et al., 2018. We used the recent multiview stereo pipeling CARS (CNES/CMLA) developped by Michel et al., 2020 as a last approach (iv), combnining tri-stereo acquisitions. From the four different approaches, I have thus calculated more than 200 DSM and my results suggest that combining two tri-stéréo acquisitions or one stéréo and one tri-stéréo acquisitions with opposite roll angles leads to the most accurate DSM (with the most complete and precise topography surface).2) Commonly, faults are mapped manually in the field or from optical images and topographic data through the recognition of the specific curvilinear traces they form at the ground surface. However, manual mapping is time-consuming, which limits our capacity to produce complete representations and measurements of the fault networks. To overcome this problem, we have adopted a machine learning approach, namely a U-Net Convolutional Neural Network, to automate the identification and mapping of fractures and faults in optical images and topographic data. Intentionally, we trained the CNN with a moderate amount of manually created fracture and fault maps of low resolution and basic quality, extracted from one type of optical images (standard camera photographs of the ground surface). Based on the results of a number of performance tests, we select the best performing model, MRef, and demonstrate its capacity to predict fractures and faults accurately in image data of various types and resolutions (ground photographs, drone and satellite images and topographic data). The MRef predictions thus enable the statistical analysis of the fault networks. MRef exhibits good generalization capacities, making it a viable tool for fast and accurate extraction of fracture and fault networks from image and topographic data
Yang, Lixuan. "Structuring of image databases for the suggestion of products for online advertising." Electronic Thesis or Diss., Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1102.
Full textThe topic of the thesis is the extraction and segmentation of clothing items from still images using techniques from computer vision, machine learning and image description, in view of suggesting non intrusively to the users similar items from a database of retail products. We firstly propose a dedicated object extractor for dress segmentation by combining local information with a prior learning. A person detector is applied to localize sites in the image that are likely to contain the object. Then, an intra-image two-stage learning process is developed to roughly separate foreground pixels from the background. Finally, the object is finely segmented by employing an active contour algorithm that takes into account the previous segmentation and injects specific knowledge about local curvature in the energy function.We then propose a new framework for extracting general deformable clothing items by using a three stage global-local fitting procedure. A set of template initiates an object extraction process by a global alignment of the model, followed by a local search minimizing a measure of the misfit with respect to the potential boundaries in the neighborhood. The results provided by each template are aggregated, with a global fitting criterion, to obtain the final segmentation.In our latest work, we extend the output of a Fully Convolution Neural Network to infer context from local units(superpixels). To achieve this we optimize an energy function,that combines the large scale structure of the image with the locallow-level visual descriptions of superpixels, over the space of all possiblepixel labellings. In addition, we introduce a novel dataset called RichPicture, consisting of 1000 images for clothing extraction from fashion images.The methods are validated on the public database and compares favorably to the other methods according to all the performance measures considered
Nkeumaleu, Guy-Merlin. "Propagation d'informations le long d'une ligne de transmission non linéaire structurée en super réseau et simulant un neurone myélinisé." Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCK006/document.
Full textNon-linear systems are almostly described by partial differential equations that characterize them. We have some systems such as the chain of coupled pebdelums, the protein chain comprising molecules with hydrogen bonds, atomic lattice, and so on .These systems are most often characterized by anharmonic inter particulate interactions and and then immersed in deformable potential substrates. In addition to nonlinearity and dispersion, these other phenomena namely anharmonicity and deformability are responsible for certain properties of propagation of solitary waves such as (compactons, kinks and anti-kinks, peackons, ...etc) and also the ability of the systems to transmit a signal . We used the bifurcation method to plot the different phase portraits obtained . For various parameters of such systems , we have highlighted the influence of anharmonicity on transmissivity and bistability of the system: It appears that the amplitude of the input signal which produces bistability increases with anharmonicity and the bistability is delayed.To considering these important properties generated by such systems, it seemed interesting to buildin an electrical line characterized by the same equations of the system. By alternately doubling the capacitance of the capacitors of a section of this line, we have realised a super-lattice that simulates a myelinised neuron. The types of solitons we get from this line are better adapted to describe the electrical signal which characterizes the neuron impulse located in space with a compact support
Heuillet, Alexandre. "Exploring deep neural network differentiable architecture design." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG069.
Full textArtificial Intelligence (AI) has gained significant popularity in recent years, primarily due to its successful applications in various domains, including textual data analysis, computer vision, and audio processing. The resurgence of deep learning techniques has played a central role in this success. The groundbreaking paper by Krizhevsky et al., AlexNet, narrowed the gap between human and machine performance in image classification tasks. Subsequent papers such as Xception and ResNet have further solidified deep learning as a leading technique, opening new horizons for the AI community. The success of deep learning lies in its architecture, which is manually designed with expert knowledge and empirical validation. However, these architectures lack the certainty of an optimal solution. To address this issue, recent papers introduced the concept of Neural Architecture Search (NAS), enabling the learning of deep architectures. However, most initial approaches focused on large architectures with specific targets (e.g., supervised learning) and relied on computationally expensive optimization techniques such as reinforcement learning and evolutionary algorithms. In this thesis, we further investigate this idea by exploring automatic deep architecture design, with a particular emphasis on differentiable NAS (DNAS), which represents the current trend in NAS due to its computational efficiency. While our primary focus is on Convolutional Neural Networks (CNNs), we also explore Vision Transformers (ViTs) with the goal of designing cost-effective architectures suitable for real-time applications
Barhoumi, Amira. "Une approche neuronale pour l’analyse d’opinions en arabe." Thesis, Le Mans, 2020. http://www.theses.fr/2020LEMA1022.
Full textMy thesis is part of Arabic sentiment analysis. Its aim is to determine the global polarity of a given textual statement written in MSA or dialectal arabic. This research area has been subject of numerous studies dealing with Indo-European languages, in particular English. One of difficulties confronting this thesis is the processing of Arabic. In fact, Arabic is a morphologically rich language which implies a greater sparsity : we want to overcome this problem by producing, in a completely automatic way, new arabic specific embeddings. Our study focuses on the use of a neural approach to improve polarity detection, using embeddings. These embeddings have revealed fundamental in various natural languages processing tasks (NLP). Our contribution in this thesis concerns several axis. First, we begin with a preliminary study of the various existing pre-trained word embeddings resources in arabic. These embeddings consider words as space separated units in order to capture semantic and syntactic similarities in the embedding space. Second, we focus on the specifity of Arabic language. We propose arabic specific embeddings that take into account agglutination and morphological richness of Arabic. These specific embeddings have been used, alone and in combined way, as input to neural networks providing an improvement in terms of classification performance. Finally, we evaluate embeddings with intrinsic and extrinsic methods specific to sentiment analysis task. For intrinsic embeddings evaluation, we propose a new protocol introducing the notion of sentiment stability in the embeddings space. We propose also a qualitaive extrinsic analysis of our embeddings by using visualisation methods
Breux, Yohan. "Du capteur à la sémantique : contribution à la modélisation d'environnement pour la robotique autonome en interaction avec l'humain." Thesis, Montpellier, 2018. http://www.theses.fr/2018MONTS059/document.
Full textAutonomous robotics is successfully used in controled industrial environments where instructions follow predetermined implementation plans.Domestic robotics is the challenge of years to come and involve several new problematics : we have to move from a closed bounded world to an open one. A robot can no longer only rely on its raw sensor data as they merely show the absence or presence of things. It should also understand why objects are in its environment as well as the meaning of its tasks. Besides, it has to interact with human beings and therefore has to share their conceptualization through natural language. Indeed, each language is in its own an abstract and compact representation of the world which links up variety of concrete and abstract concepts. However, real observations are more complex than our simplified semantical representation. Thus they can come into conflict : this is the price for a finite representation of an "infinite" world.To address those challenges, we propose in this thesis a global architecture bringing together different modalities of environment representation. It allows to relate a physical representation to abstract concepts expressed in natural language. The inputs of our system are two-fold : sensor data feed the perception modality whereas textual information and human interaction are linked to the semantic modality. The novelty of our approach is in the introduction of an intermediate modality based on instances (physical realization of semantic concepts). Among other things, it allows to connect indirectly and without contradiction perceptual data to knowledge in natural langage.We propose in this context an original method to automatically generate an ontology for the description of physical objects. On the perception side, we investigate some properties of image descriptor extracted from intermediate layers of convolutional neural networks. In particular, we show their relevance for instance representation as well as their use for estimation of similarity transformation. We also propose a method to relate instances to our object-oriented ontology which, in the assumption of an open world, can be seen as an alternative to classical classification methods. Finally, the global flow of our system is illustrated through the description of user request management processes
Yedroudj, Mehdi. "Steganalysis and steganography by deep learning." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS095.
Full textImage steganography is the art of secret communication in order to exchange a secret message. In the other hand, image steganalysis attempts to detect the presence of a hidden message by searching artefacts within an image. For about ten years, the classic approach for steganalysis was to use an Ensemble Classifier fed by hand-crafted features. In recent years, studies have shown that well-designed convolutional neural networks (CNNs) can achieve superior performance compared to conventional machine-learning approaches.The subject of this thesis deals with the use of deep learning techniques for image steganography and steganalysis in the spatialdomain.The first contribution is a fast and very effective convolutional neural network for steganalysis, named Yedroudj-Net. Compared tomodern deep learning based steganalysis methods, Yedroudj-Net can achieve state-of-the-art detection results, but also takes less time to converge, allowing the use of a large training set. Moreover,Yedroudj-Net can easily be improved by using well known add-ons. Among these add-ons, we have evaluated the data augmentation, and the the use of an ensemble of CNN; Both increase our CNN performances.The second contribution is the application of deep learning techniques for steganography i.e the embedding. Among the existing techniques, we focus on the 3-player game approach.We propose an embedding algorithm that automatically learns how to hide a message secretly. Our proposed steganography system is based on the use of generative adversarial networks. The training of this steganographic system is conducted using three neural networks that compete against each other: the embedder, the extractor, and the steganalyzer. For the steganalyzer we use Yedroudj-Net, this for its affordable size, and for the fact that its training does not require the use of any tricks that could increase the computational time.This second contribution defines a research direction, by giving first reflection elements while giving promising first results
Pham, Huy-Hieu. "Architectures d'apprentissage profond pour la reconnaissance d'actions humaines dans des séquences vidéo RGB-D monoculaires : application à la surveillance dans les transports publics." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30145.
Full textThis thesis is dealing with automatic recognition of human actions from monocular RGB-D video sequences. Our main goal is to recognize which human actions occur in unknown videos. This problem is a challenging task due to a number of obstacles caused by the variability of the acquisition conditions, including the lighting, the position, the orientation and the field of view of the camera, as well as the variability of actions which can be performed differently, notably in terms of speed. To tackle these problems, we first review and evaluate the most prominent state-of-the-art techniques to identify the current state of human action recognition in videos. We then propose a new approach for skeleton-based action recognition using Deep Neural Networks (DNNs). Two key questions have been addressed. First, how to efficiently represent the spatio-temporal patterns of skeletal data for fully exploiting the capacity in learning high-level representations of Deep Convolutional Neural Networks (D-CNNs). Second, how to design a powerful D-CNN architecture that is able to learn discriminative features from the proposed representation for classification task. As a result, we introduce two new 3D motion representations called SPMF (Skeleton Posture-Motion Feature) and Enhanced-SPMF that encode skeleton poses and their motions into color images. For learning and classification tasks, we design and train different D-CNN architectures based on the Residual Network (ResNet), Inception-ResNet-v2, Densely Connected Convolutional Network (DenseNet) and Efficient Neural Architecture Search (ENAS) to extract robust features from color-coded images and classify them. Experimental results on various public and challenging human action recognition datasets (MSR Action3D, Kinect Activity Recognition Dataset, SBU Kinect Interaction, and NTU-RGB+D) show that the proposed approach outperforms current state-of-the-art. We also conducted research on the problem of 3D human pose estimation from monocular RGB video sequences and exploited the estimated 3D poses for recognition task. Specifically, a deep learning-based model called OpenPose is deployed to detect 2D human poses. A DNN is then proposed and trained for learning a 2D-to-3D mapping in order to map the detected 2D keypoints into 3D poses. Our experiments on the Human3.6M dataset verified the effectiveness of the proposed method. These obtained results allow opening a new research direction for human action recognition from 3D skeletal data, when the depth cameras are failing. In addition, we collect and introduce in this thesis, CEMEST database, a new RGB-D dataset depicting passengers' behaviors in public transport. It consists of 203 untrimmed real-world surveillance videos of realistic "normal" and "abnormal" events. We achieve promising results on CEMEST with the support of data augmentation and transfer learning techniques. This enables the construction of real-world applications based on deep learning for enhancing public transportation management services
Duhr, Fanny. "Voies de signalisation associées au récepteur 5-HT6 et développement neuronal." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTT042/document.
Full textBrain circuitry patterning is a complex, highly regulated process. Alteration of this process is affected gives rise to various neurodevelopmental disorders such as schizophrenia or Autism Spectrum Disorders (ASD), which are both characterized by a wide spectrum of deficits. Serotonin 6 receptor (5-HT6 receptor), which is known for its implication in neuronal migration process, has been identified as a key therapeutic target for the treatment of cognitive deficits observed in schizophrenia, but also in neurodegenerative pathologies such as Alzheimer's disease. However, the signalling mechanisms knowned to be activated by the 5-HT6 receptor do not explain its involvement in neurodevelopmental processes. My thesis project therefore aimed at characterizing the signalling pathways engaged by 5-HT6 receptor during neural development. A proteomic approach allowed me to show that the 5-HT6 receptor was interacting with several proteins playing crucial roles in neurodevelopmental processes such as Cdk5 or WAVE-1. I then demonstrated that, besides its role in neuronal migration, the 5-HT6 receptor was also involved in neurite growth through constitutive phosphorylation of 5-HT6 receptor at Ser350 by associated Cdk5, a process leading to an increase in Cdc42 activity. The second part of my work aimed at understanding the role of 5-HT6 receptor in dendritic spines morphogenesis, and the involvement of WAVE-1 and Cdk5 in this process. These results provide new insights into the control of neurodevelopemental processes by 5-HT6 receptor. Thus, 5-HT6 receptor appears to be a key therapeutic target for neurodevelopmental disorders by contributing to the development of cognitive circuitry related to the pathophysiology of ASD or schizophrenia
Martin, Pierre-Etienne. "Détection et classification fines d'actions à partir de vidéos par réseaux de neurones à convolutions spatio-temporelles : Application au tennis de table." Thesis, Bordeaux, 2020. http://www.theses.fr/2020BORD0313.
Full textAction recognition in videos is one of the key problems in visual data interpretation. Despite intensive research, differencing and recognizing similar actions remains a challenge. This thesis deals with fine-grained classification of sport gestures from videos, with an application to table tennis.In this manuscript, we propose a method based on deep learning for automatically segmenting and classifying table tennis strokes in videos. Our aim is to design a smart system for students and teachers for analyzing their performances. By profiling the players, a teacher can therefore tailor the training sessions more efficiently in order to improve their skills. Players can also have an instant feedback on their performances.For developing such a system with fine-grained classification, a very specific dataset is needed to supervise the learning process. To that aim, we built the “TTStroke-21” dataset, which is composed of 20 stroke classes plus a rejection class. The TTStroke-21 dataset comprises video clips of recorded table tennis exercises performed by students at the sport faculty of the University of Bordeaux - STAPS. These recorded sessions were annotated by professional players or teachers using a crowdsourced annotation platform. The annotations consist in a description of the handedness of the player and information for each stroke performed (starting and ending frames, class of the stroke).Fine-grained action recognition has some notable differences with coarse-grained action recognition. In general, datasets used for coarse-grained action recognition, the background context often provides discriminative information that methods can use to classify the action, rather than focusing on the action itself. In fine-grained classification, where the inter-class similarity is high, discriminative visual features are harder to extract and the motion plays a key role for characterizing an action.In this thesis, we introduce a Twin Spatio-Temporal Convolutional Neural Network. This deep learning network takes as inputs an RGB image sequence and its computed Optical Flow. The RGB image sequence allows our model to capture appearance features while the optical flow captures motion features. Those two streams are processed in parallel using 3D convolutions, and fused at the last stage of the network. Spatio-temporal features extracted in the network allow efficient classification of video clips from TTStroke-21. Our method gets an average classification performance of 87.3% with a best run of 93.2% accuracy on the test set. When applied on joint detection and classification task, the proposed method reaches an accuracy of 82.6%.A systematic study of the influence of each stream and fusion types on classification accuracy has been performed, giving clues on how to obtain the best performances. A comparison of different optical flow methods and the role of their normalization on the classification score is also done. The extracted features are also analyzed by back-tracing strong features from the last convolutional layer to understand the decision path of the trained model. Finally, we introduce an attention mechanism to help the model focusing on particular characteristic features and also to speed up the training process. For comparison purposes, we provide performances of other methods on TTStroke-21 and test our model on other datasets. We notice that models performing well on coarse-grained action datasets do not always perform well on our fine-grained action dataset.The research presented in this manuscript was validated with publications in one international journal, five international conference papers, two international workshop papers and a reconductible task in MediaEval workshop in which participants can apply their action recognition methods to TTStroke-21. Two additional international workshop papers are in process along with one book chapter
Combes, Denis. "Processus d'intégration dans un système sensori-moteur simple : mécanismes cellulaires impliqués dans le contrôle d'un réseau moteur par un neurone mécanorécepteur primaire chez le homard." Bordeaux 1, 1993. http://www.theses.fr/1993BOR10634.
Full textKosmidis, Efstratios. "Effets du bruit dans le système nerveux central : du neurone au réseau de neurones : fiabilité des neurones, rythmogenèse respiratoire, information visuelle : étude par neurobiologie numérique." Paris 6, 2002. http://www.theses.fr/2002PA066199.
Full textChauvet, Pierre. "Sur la stabilité d'un réseau de neurones hiérarchique à propos de la coordination du mouvement." Angers, 1993. http://www.theses.fr/1993ANGE0011.
Full textBoukhtache, Seyfeddine. "Système de traitement d’images temps réel dédié à la mesure de champs denses de déplacements et de déformations." Thesis, Université Clermont Auvergne (2017-2020), 2020. http://www.theses.fr/2020CLFAC054.
Full textThis PhD thesis has been carried out in a multidisciplinary context. It deals with the challenge of real-time and metrological performance in digital image processing. This is particularly interesting in photomechanics. This is a recent field of activity, which consists in developing and using systems for measuring whole fields of small displacements and small deformations of solids subjected to thermomechanical loading. The technique targeted in this PhD thesis is Digital Images Correlation (DIC), which is the most popular measuring technique in this community. However, it has some limitations, the main one being the computing resources and the metrological performance, which should be improved to reach that of classic pointwise measuring sensors such as strain gauges.In order to address this challenge, this work relies on two main studies. The first one consists in optimizing the interpolation process because this is the most expensive treatment in DIC. Acceleration is proposed by using a parallel hardware implementation on FPGA, and by taking into consideration the consumption of hardware resources as well as accuracy. The main conclusion of this study is that a single FPGA (current technology) is not sufficient to implement the entire DIC algorithm. Thus, a second study has been proposed. It is based on the use of convolutional neural networks (CNNs) in an attempt to achieve both better metrological performance than CIN and real-time processing. This second study shows the relevance of using CNNs for measuring displacement and deformation fields. It opens new perspectives in terms of metrological performance and speed of full-field measuring systems
Wauquier, Pauline. "Task driven representation learning." Thesis, Lille 3, 2017. http://www.theses.fr/2017LIL30005/document.
Full textMachine learning proposes numerous algorithms to solve the different tasks that can be extracted from real world prediction problems. To solve the different concerned tasks, most Machine learning algorithms somehow rely on relationships between instances. Pairwise instances relationships can be obtained by computing a distance between the vectorial representations of the instances. Considering the available vectorial representation of the data, none of the commonly used distances is ensured to be representative of the task that aims at being solved. In this work, we investigate the gain of tuning the vectorial representation of the data to the distance to more optimally solve the task. We more particularly focus on an existing graph-based algorithm for classification task. An algorithm to learn a mapping of the data in a representation space which allows an optimal graph-based classification is first introduced. By projecting the data in a representation space in which the predefined distance is representative of the task, we aim at outperforming the initial vectorial representation of the data when solving the task. A theoretical analysis of the introduced algorithm is performed to define the conditions ensuring an optimal classification. A set of empirical experiments allows us to evaluate the gain of the introduced approach and to temper the theoretical analysis
Abdelouahab, Kamel. "Reconfigurable hardware acceleration of CNNs on FPGA-based smart cameras." Thesis, Université Clermont Auvergne (2017-2020), 2018. http://www.theses.fr/2018CLFAC042/document.
Full textDeep Convolutional Neural Networks (CNNs) have become a de-facto standard in computer vision. This success came at the price of a high computational cost, making the implementation of CNNs, under real-time constraints, a challenging task.To address this challenge, the literature exploits the large amount of parallelism exhibited by these algorithms, motivating the use of dedicated hardware platforms. In power-constrained environments, such as smart camera nodes, FPGA-based processing cores are known to be adequate solutions in accelerating computer vision applications. This is especially true for CNN workloads, which have a streaming nature that suits well to reconfigurable hardware architectures.In this context, the following thesis addresses the problems of CNN mapping on FPGAs. In Particular, it aims at improving the efficiency of CNN implementations through two main optimization strategies; The first one focuses on the CNN model and parameters while the second one considers the hardware architecture and the fine-grain building blocks
Haj, Hassan Hawraa. "Détection et classification temps réel de biocellules anormales par technique de segmentation d’images." Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0043.
Full textDevelopment of methods for help diagnosis of the real time detection of abnormal cells (which can be considered as cancer cells) through bio-image processing and detection are most important research directions in information science and technology. Our work has been concerned by developing automatic reading procedures of the normal and abnormal bio-images tissues. Therefore, the first step of our work is to detect a certain type of abnormal bio-images associated to many types evolution of cancer within a Microscopic multispectral image, which is an image, repeated in many wavelengths. And using a new segmentation method that reforms itself in an iterative adaptive way to localize and cover the real cell contour, using some segmentation techniques. It is based on color intensity and can be applied on sequences of objects in the image. This work presents a classification of the abnormal tissues using the Convolution neural network (CNN), where it was applied on the microscopic images segmented using the snake method, which gives a high performance result with respect to the other segmentation methods. This classification method reaches high performance values, where it reaches 100% for training and 99.168% for testing. This method was compared to different papers that uses different feature extraction, and proved its high performance with respect to other methods. As a future work, we will aim to validate our approach on a larger datasets, and to explore different CNN architectures and the optimization of the hyper-parameters, in order to increase its performance, and it will be applied to relevant medical imaging tasks including computer-aided diagnosis
Guerre, Alexandre. "Champ visuel augmenté pour l'exploration vidéo de la rétine." Thesis, Brest, 2019. http://www.theses.fr/2019BRES0110.
Full textThe main objective of this thesis is toincrease the visual comfort of theophthalmologists during examinations orsurgeries. To do so, we decided toartificially increase in real time the field ofview in videos of retinal exploration. Thetools used for the acquisition of thesevideos are the slit lamp and theendoscope. The increase of the field ofview passes by the establishment ofdynamic 3D maps of the retina.To our knowledge, there is still no suchmethod in the state of the art.In order to implement our solution, westudied the different methods of motionestimations between two images. Wegrouped them into "classical" methods, onthe one hand, including methods based onSIFT or SURF algorithms. On the otherhand, we grouped deep learning methods(or "CNN" methods for ConvolutionalNeural Network).Some of these methods, such as thoseusing FlowNet networks, required groundtruth annotation of movement betweenimages.Since such bases are very difficult to set upin the medical field and do not exist inophthalmology, general databases havebeen used. In addition, we built twodatabases of artificial displacements whichbackgrounds are composed of images ofretinas. Finally, to get around this problemof annotations, a self-supervised deeplearning approach was studied.After comparing the results, it appears thatmethods using convolutional neuralnetworks outperform conventional methodsfor estimating movements in retinal videos.Moreover, only a strong supervision allowsacceptable results. In the future, we hopethat this work will enable surgeons to bemore confident and effective inenvironments where it is sometimesdifficult to find their bearings
Mayorquim, Jorge Luiz. "Étude en vue de la réalisation d'un réseau de neurones binaires logiques : détection de contours en temps réel." Compiègne, 1996. http://www.theses.fr/1996COMPD893.
Full textPalluat, Nicolas. "Méthodologie de surveillance dynamique à l'aide des réseaux neuro-flous temporels." Phd thesis, Université de Franche-Comté, 2006. http://tel.archives-ouvertes.fr/tel-00217474.
Full textA partir de l'observation de données capteurs, l'outil de détection détermine l'état du système en associant un degré de possibilité à chacun des modes de fonctionnement. A partir de ces informations, l'outil de diagnostic recherche les causes les plus probables (diagnostic abductif) pondérées par un degré de confiance. En complément et dans une optique à la décision, nous avons veillé à ce que l'opérateur puisse ajouter des informations supplémentaires. Notons que la configuration et l'initialisation des outils implique de connaître l'historique et les données de maintenance du système. Nous exploitons pour cela les AMDEC et Arbres de Défaillance des équipements surveillés. La partie applicative de cette thèse se décompose en deux points : l'intégration logicielle de l'ensemble du travail sur un ordinateur industriel (démarche UML + implémentation) ainsi que l'application sur un système de transfert flexible de production.
Caye, Daudt Rodrigo. "Convolutional neural networks for change analysis in earth observation images with noisy labels and domain shifts." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT033.
Full textThe analysis of satellite and aerial Earth observation images allows us to obtain precise information over large areas. A multitemporal analysis of such images is necessary to understand the evolution of such areas. In this thesis, convolutional neural networks are used to detect and understand changes using remote sensing images from various sources in supervised and weakly supervised settings. Siamese architectures are used to compare coregistered image pairs and to identify changed pixels. The proposed method is then extended into a multitask network architecture that is used to detect changes and perform land cover mapping simultaneously, which permits a semantic understanding of the detected changes. Then, classification filtering and a novel guided anisotropic diffusion algorithm are used to reduce the effect of biased label noise, which is a concern for automatically generated large-scale datasets. Weakly supervised learning is also achieved to perform pixel-level change detection using only image-level supervision through the usage of class activation maps and a novel spatial attention layer. Finally, a domain adaptation method based on adversarial training is proposed, which succeeds in projecting images from different domains into a common latent space where a given task can be performed. This method is tested not only for domain adaptation for change detection, but also for image classification and semantic segmentation, which proves its versatility
Chen, Dexiong. "Modélisation de données structurées avec des machines profondes à noyaux et des applications en biologie computationnelle." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM070.
Full textDeveloping efficient algorithms to learn appropriate representations of structured data, including sequences or graphs, is a major and central challenge in machine learning. To this end, deep learning has become popular in structured data modeling. Deep neural networks have drawn particular attention in various scientific fields such as computer vision, natural language understanding or biology. For instance, they provide computational tools for biologists to possibly understand and uncover biological properties or relationships among macromolecules within living organisms. However, most of the success of deep learning methods in these fields essentially relies on the guidance of empirical insights as well as huge amounts of annotated data. Exploiting more data-efficient models is necessary as labeled data is often scarce.Another line of research is kernel methods, which provide a systematic and principled approach for learning non-linear models from data of arbitrary structure. In addition to their simplicity, they exhibit a natural way to control regularization and thus to avoid overfitting.However, the data representations provided by traditional kernel methods are only defined by simply designed hand-crafted features, which makes them perform worse than neural networks when enough labeled data are available. More complex kernels inspired by prior knowledge used in neural networks have thus been developed to build richer representations and thus bridge this gap. Yet, they are less scalable. By contrast, neural networks are able to learn a compact representation for a specific learning task, which allows them to retain the expressivity of the representation while scaling to large sample size.Incorporating complementary views of kernel methods and deep neural networks to build new frameworks is therefore useful to benefit from both worlds.In this thesis, we build a general kernel-based framework for modeling structured data by leveraging prior knowledge from classical kernel methods and deep networks. Our framework provides efficient algorithmic tools for learning representations without annotations as well as for learning more compact representations in a task-driven way. Our framework can be used to efficiently model sequences and graphs with simple interpretation of predictions. It also offers new insights about designing more expressive kernels and neural networks for sequences and graphs
Tran, Ngoc Tiem. "Recherche des oscillations de neutrinos par apparition du τ avec désintégration muonique du vτ dans l'expérience OPERA." Phd thesis, Université Claude Bernard - Lyon I, 2010. http://tel.archives-ouvertes.fr/tel-00534753.
Full textChen, Yifu. "Deep learning for visual semantic segmentation." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS200.
Full textIn this thesis, we are interested in Visual Semantic Segmentation, one of the high-level task that paves the way towards complete scene understanding. Specifically, it requires a semantic understanding at the pixel level. With the success of deep learning in recent years, semantic segmentation problems are being tackled using deep architectures. In the first part, we focus on the construction of a more appropriate loss function for semantic segmentation. More precisely, we define a novel loss function by employing a semantic edge detection network. This loss imposes pixel-level predictions to be consistent with the ground truth semantic edge information, and thus leads to better shaped segmentation results. In the second part, we address another important issue, namely, alleviating the need for training segmentation models with large amounts of fully annotated data. We propose a novel attribution method that identifies the most significant regions in an image considered by classification networks. We then integrate our attribution method into a weakly supervised segmentation framework. The semantic segmentation models can thus be trained with only image-level labeled data, which can be easily collected in large quantities. All models proposed in this thesis are thoroughly experimentally evaluated on multiple datasets and the results are competitive with the literature
Estienne, Théo. "Deep learning-based methods for 3D medical image registration." Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG055.
Full textThis thesis focuses on new deep learning approaches to find the best displacement between two different medical images. This research area, called image registration, have many applications in the clinical pipeline, including the fusion of different imaging types or the temporal follow-up of a patient. This field is studied for many years with various methods, such as diffeomorphic, graph-based or physical-based methods. Recently, deep learning-based methods were proposed using convolutional neural networks.These methods obtained similar results to non-deep learning methods while greatly reducing the computation time and enabling real-time prediction. This improvement comes from the use of graphics processing units (GPU) and a prediction phase where no optimisation is required. However, deep learning-based registration has several limitations, such as the need for large databases to train the network or tuning regularisation hyperparameters to prevent too noisy transformations.In this manuscript, we investigate diverse modifications to deep learning algorithms, working on various imaging types and body parts. We study first the combination of segmentation and registration tasks proposing a new joint architecture. We apply to brain MRI datasets, exploring different cases : brain without and with tumours. Our architecture comprises one encoder and two decoders and the coupling is reinforced by the introduction of a supplementary loss. In the presence of tumour, the similarity loss is modified such as the registration focus only on healthy part ignoring the tumour. Then, we shift to abdominal CT, a more challenging localisation, as there are natural organ's movement and deformation. We improve registration performances thanks to the use of pre-training and pseudo segmentations, the addition of new losses to provide a better regularisation and a multi-steps strategy. Finally, we analyse the explainability of registration networks using a linear decomposition and applying to lung and hippocampus MR. Thanks to our late fusion strategy, we project images to the latent space and calculate a new basis. This basis correspond to elementary transformation witch we study qualitatively
Al, Chami Zahi. "Estimation de la qualité des données multimedia en temps réel." Thesis, Pau, 2021. http://www.theses.fr/2021PAUU3066.
Full textOver the past decade, data providers have been generating and streaming a large amount of data, including images, videos, audio, etc. In this thesis, we will be focusing on processing images since they are the most commonly shared between the users on the global inter-network. In particular, treating images containing faces has received great attention due to its numerous applications, such as entertainment and social media apps. However, several challenges could arise during the processing and transmission phase: firstly, the enormous number of images shared and produced at a rapid pace requires a significant amount of time to be processed and delivered; secondly, images are subject to a wide range of distortions during the processing, transmission, or combination of many factors that could damage the images’content. Two main contributions are developed. First, we introduce a Full-Reference Image Quality Assessment Framework in Real-Time, capable of:1) preserving the images’content by ensuring that some useful visual information can still be extracted from the output, and 2) providing a way to process the images in real-time in order to cope with the huge amount of images that are being received at a rapid pace. The framework described here is limited to processing those images that have access to their reference version (a.k.a Full-Reference). Secondly, we present a No-Reference Image Quality Assessment Framework in Real-Time. It has the following abilities: a) assessing the distorted image without having its distortion-free image, b) preserving the most useful visual information in the images before publishing, and c) processing the images in real-time, even though the No-Reference image quality assessment models are considered very complex. Our framework offers several advantages over the existing approaches, in particular: i. it locates the distortion in an image in order to directly assess the distorted parts instead of processing the whole image, ii. it has an acceptable trade-off between quality prediction accuracy and execution latency, andiii. it could be used in several applications, especially these that work in real-time. The architecture of each framework is presented in the chapters while detailing the modules and components of the framework. Then, a number of simulations are made to show the effectiveness of our approaches to solve our challenges in relation to the existing approaches
Etienne, Caroline. "Apprentissage profond appliqué à la reconnaissance des émotions dans la voix." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS517.
Full textThis thesis deals with the application of artificial intelligence to the automatic classification of audio sequences according to the emotional state of the customer during a commercial phone call. The goal is to improve on existing data preprocessing and machine learning models, and to suggest a model that is as efficient as possible on the reference IEMOCAP audio dataset. We draw from previous work on deep neural networks for automatic speech recognition, and extend it to the speech emotion recognition task. We are therefore interested in End-to-End neural architectures to perform the classification task including an autonomous extraction of acoustic features from the audio signal. Traditionally, the audio signal is preprocessed using paralinguistic features, as part of an expert approach. We choose a naive approach for data preprocessing that does not rely on specialized paralinguistic knowledge, and compare it with the expert approach. In this approach, the raw audio signal is transformed into a time-frequency spectrogram by using a short-term Fourier transform. In order to apply a neural network to a prediction task, a number of aspects need to be considered. On the one hand, the best possible hyperparameters must be identified. On the other hand, biases present in the database should be minimized (non-discrimination), for example by adding data and taking into account the characteristics of the chosen dataset. We study these aspects in order to develop an End-to-End neural architecture that combines convolutional layers specialized in the modeling of visual information with recurrent layers specialized in the modeling of temporal information. We propose a deep supervised learning model, competitive with the current state-of-the-art when trained on the IEMOCAP dataset, justifying its use for the rest of the experiments. This classification model consists of a four-layer convolutional neural networks and a bidirectional long short-term memory recurrent neural network (BLSTM). Our model is evaluated on two English audio databases proposed by the scientific community: IEMOCAP and MSP-IMPROV. A first contribution is to show that, with a deep neural network, we obtain high performances on IEMOCAP, and that the results are promising on MSP-IMPROV. Another contribution of this thesis is a comparative study of the output values of the layers of the convolutional module and the recurrent module according to the data preprocessing method used: spectrograms (naive approach) or paralinguistic indices (expert approach). We analyze the data according to their emotion class using the Euclidean distance, a deterministic proximity measure. We try to understand the characteristics of the emotional information extracted autonomously by the network. The idea is to contribute to research focused on the understanding of deep neural networks used in speech emotion recognition and to bring more transparency and explainability to these systems, whose decision-making mechanism is still largely misunderstood
Boutin, Victor. "Etude d’un algorithme hiérarchique de codage épars et prédictif : vers un modèle bio-inspiré de la perception visuelle." Thesis, Aix-Marseille, 2020. http://www.theses.fr/2020AIXM0028.
Full textBuilding models to efficiently represent images is a central and difficult problem in the machine learning community. The neuroscientific study of the early visual cortical areas is a great source of inspiration to find economical and robust solutions. For instance, Sparse Coding (SC) is one of the most successful frameworks to model neural computation at the local scale in the visual cortex. At the structural scale of the ventral visual pathways, the Predictive Coding (PC) theory has been proposed to model top-down and bottom-up interaction between cortical regions. The presented thesis introduces a model called the Sparse Deep Predictive Coding (SDPC) that combines Sparse Coding and Predictive Coding in a hierarchical and convolutional architecture. We analyze the SPDC from a computational and a biological perspective. In terms of computation, the recurrent connectivity introduced by the PC framework allows the SDPC to converge to lower prediction errors with a higher convergence rate. In addition, we combine neuroscientific evidence with machine learning methods to analyze the impact of recurrent processing at both the neural organization and representational level. At the neural organization level, the feedback signal of the model accounted for a reorganization of the V1 association fields that promotes contour integration. At the representational level, the SDPC exhibited significant denoising ability which is highly correlated with the strength of the feedback from V2 to V1. These results from the SDPC model demonstrate that neuro-inspiration might be the right methodology to design more powerful and more robust computer vision algorithms
Li, Xuhong. "Regularization schemes for transfer learning with convolutional networks." Thesis, Compiègne, 2019. http://www.theses.fr/2019COMP2497/document.
Full textTransfer learning with deep convolutional neural networks significantly reduces the computation and data overhead of the training process and boosts the performance on the target task, compared to training from scratch. However, transfer learning with a deep network may cause the model to forget the knowledge acquired when learning the source task, leading to the so-called catastrophic forgetting. Since the efficiency of transfer learning derives from the knowledge acquired on the source task, this knowledge should be preserved during transfer. This thesis solves this problem of forgetting by proposing two regularization schemes that preserve the knowledge during transfer. First we investigate several forms of parameter regularization, all of which explicitly promote the similarity of the final solution with the initial model, based on the L1, L2, and Group-Lasso penalties. We also propose the variants that use Fisher information as a metric for measuring the importance of parameters. We validate these parameter regularization approaches on various tasks. The second regularization scheme is based on the theory of optimal transport, which enables to estimate the dissimilarity between two distributions. We benefit from optimal transport to penalize the deviations of high-level representations between the source and target task, with the same objective of preserving knowledge during transfer learning. With a mild increase in computation time during training, this novel regularization approach improves the performance of the target tasks, and yields higher accuracy on image classification tasks compared to parameter regularization approaches
Papadopoulos, Georgios. "Towards a 3D building reconstruction using spatial multisource data and computational intelligence techniques." Thesis, Limoges, 2019. http://www.theses.fr/2019LIMO0084/document.
Full textBuilding reconstruction from aerial photographs and other multi-source urban spatial data is a task endeavored using a plethora of automated and semi-automated methods ranging from point processes, classic image processing and laser scanning. In this thesis, an iterative relaxation system is developed based on the examination of the local context of each edge according to multiple spatial input sources (optical, elevation, shadow & foliage masks as well as other pre-processed data as elaborated in Chapter 6). All these multisource and multiresolution data are fused so that probable line segments or edges are extracted that correspond to prominent building boundaries.Two novel sub-systems have also been developed in this thesis. They were designed with the purpose to provide additional, more reliable, information regarding building contours in a future version of the proposed relaxation system. The first is a deep convolutional neural network (CNN) method for the detection of building borders. In particular, the network is based on the state of the art super-resolution model SRCNN (Dong C. L., 2015). It accepts aerial photographs depicting densely populated urban area data as well as their corresponding digital elevation maps (DEM). Training is performed using three variations of this urban data set and aims at detecting building contours through a novel super-resolved heteroassociative mapping. Another innovation of this approach is the design of a modified custom loss layer named Top-N. In this variation, the mean square error (MSE) between the reconstructed output image and the provided ground truth (GT) image of building contours is computed on the 2N image pixels with highest values . Assuming that most of the N contour pixels of the GT image are also in the top 2N pixels of the re-construction, this modification balances the two pixel categories and improves the generalization behavior of the CNN model. It is shown in the experiments, that the Top-N cost function offers performance gains in comparison to standard MSE. Further improvement in generalization ability of the network is achieved by using dropout.The second sub-system is a super-resolution deep convolutional network, which performs an enhanced-input associative mapping between input low-resolution and high-resolution images. This network has been trained with low-resolution elevation data and the corresponding high-resolution optical urban photographs. Such a resolution discrepancy between optical aerial/satellite images and elevation data is often the case in real world applications. More specifically, low-resolution elevation data augmented by high-resolution optical aerial photographs are used with the aim of augmenting the resolution of the elevation data. This is a unique super-resolution problem where it was found that many of -the proposed general-image SR propositions do not perform as well. The network aptly named building super resolution CNN (BSRCNN) is trained using patches extracted from the aforementioned data. Results show that in comparison with a classic bicubic upscale of the elevation data the proposed implementation offers important improvement as attested by a modified PSNR and SSIM metric. In comparison, other proposed general-image SR methods performed poorer than a standard bicubic up-scaler.Finally, the relaxation system fuses together all these multisource data sources comprising of pre-processed optical data, elevation data, foliage masks, shadow masks and other pre-processed data in an attempt to assign confidence values to each pixel belonging to a building contour. Confidence is augmented or decremented iteratively until the MSE error fails below a specified threshold or a maximum number of iterations have been executed. The confidence matrix can then be used to extract the true building contours via thresholding