Dissertations / Theses on the topic 'Unsupervised deep neural networks'

Author: Grafiati

Published: 25 May 2024

Consult the top 50 dissertations / theses for your research on the topic 'Unsupervised deep neural networks.'

1

Donati, Lorenzo. "Domain Adaptation through Deep Neural Networks for Health Informatics." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017. http://amslaurea.unibo.it/14888/.

Abstract:

The PreventIT project is an EU Horizon 2020 project aimed at preventing early functional decline at younger old age. The analysis of causal links between risk factors and functional decline has been made possible by the cooperation of several research institutes' studies. However, since each research institute collects and delivers different kinds of data in different formats, so far the analysis has been assisted by expert geriatricians whose role is to detect the best candidates among hundreds of fields and offer a semantic interpretation of the values. This manual data harmonization approach is very common in both scientific and industrial environments. In this thesis project an alternative method for parsing heterogeneous data is proposed. Since all the datasets represent semantically related data, being all made from longitudinal studies on aging-related metrics, it is possible to train an artificial neural network to perform an automatic domain adaptation. To achieve this goal, a Stacked Denoising Autoencoder has been implemented and trained to extract a domain-invariant representation of the data. Then, from this high-level representation, multiple classifiers have been trained to validate the model and ultimately to predict the probability of functional decline of the patient. This innovative approach to the domain adaptation process can provide an easy and fast solution to many research fields that now rely on human interaction to analyze the semantic data model and perform cross-dataset analysis. Functional decline classifiers show a great improvement in their performance when trained on the domain-invariant features extracted by the Stacked Denoising Autoencoder. Furthermore, this project applies multiple deep neural network classifiers on top of the Stacked Denoising Autoencoder representation, achieving excellent results for the prediction of functional decline in a real case study that involves two different datasets.

2

Ahn, Euijoon. "Unsupervised Deep Feature Learning for Medical Image Analysis." Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/23002.

Abstract:

The availability of annotated image datasets and recent advances in supervised deep learning methods are enabling the end-to-end derivation of representative image features that can impact a variety of image analysis problems. These supervised methods use prior knowledge derived from labelled training data, and approaches such as convolutional neural networks (CNNs) have produced impressive results in natural (photographic) image classification. CNNs learn image features in a hierarchical fashion. Each deeper layer of the network learns a representation of the image data that is higher level and semantically more meaningful. However, the accuracy and robustness of image features with supervised CNNs are dependent on the availability of large-scale labelled training data. In medical imaging, these large labelled datasets are scarce mainly due to the complexity of manual annotation and inter- and intra-observer variability in label assignment. The concept of ‘transfer learning’ – the adoption of image features from different domains, e.g., image features learned from natural photographic images – was introduced to address the lack of large amounts of labelled medical image data. These image features, however, are often generic and do not perform well in specific medical image analysis problems. An alternative approach was to optimise these features by retraining the generic features using a relatively small set of labelled medical images. This ‘fine-tuning’ approach, however, is not able to match the overall accuracy of learning image features directly from large collections of data that are specifically related to the problem at hand. An alternative approach is to use unsupervised feature learning algorithms to build features from unlabelled data, which then allows unannotated image archives to be used.
Many unsupervised feature learning algorithms such as sparse coding (SC), auto-encoder (AE) and Restricted Boltzmann Machines (RBMs), however, have often been limited to learning low-level features such as lines and edges. In an attempt to address these limitations, in this thesis, we present several new unsupervised deep learning methods to learn semantic high-level features from unlabelled medical images to address the challenge of learning representative visual features in medical image analysis. We present two methods to derive non-linear and non-parametric models, which are crucial to unsupervised feature learning algorithms; one method embeds a kernel learning within CNNs while the other couples clustering with CNNs. We then further improved the quality of image features using domain adaptation methods (DAs) that learn representations that are invariant to domains with different data distributions. We present a deep unsupervised feature extractor to transform the feature maps from the pre-trained CNN on natural images to a set of non-redundant and relevant medical image features. Our feature extractor preserves meaningful generic features from the pre-trained domain and learns specific local features that are more representative of the medical image data. We conducted extensive experiments on 4 public datasets which have diverse visual characteristics of medical images including X-ray, dermoscopic and CT images. Our results show that our methods had better accuracy when compared to other conventional unsupervised methods and competitive accuracy to methods that used state-of-the-art supervised CNNs. Our findings suggest that our methods could scale to many different transfer learning or domain adaptation approaches where they have none or small sets of labelled data.

3

Cherti, Mehdi. "Deep generative neural networks for novelty generation : a foundational framework, metrics and experiments." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS029/document.

Abstract:

In recent years, significant advances in deep neural networks have enabled the creation of groundbreaking technologies such as self-driving cars and voice-enabled personal assistants. Almost all successes of deep neural networks are about prediction, whereas the initial breakthroughs came from generative models. Today, although we have very powerful deep generative modeling techniques, these techniques are essentially being used for prediction or for generating known objects (i.e., good quality images of known classes): any generated object that is a priori unknown is considered as a failure mode (Salimans et al., 2016) or as spurious (Bengio et al., 2013b). In other words, when prediction seems to be the only possible objective, novelty is seen as an error that researchers have been trying hard to eliminate. This thesis defends the point of view that, instead of trying to eliminate these novelties, we should study them and the generative potential of deep nets to create useful novelty, especially given the economic and societal importance of creating new objects in contemporary societies. The thesis sets out to study novelty generation in relationship with data-driven knowledge models produced by deep generative neural networks. Our first key contribution is the clarification of the importance of representations and their impact on the kind of novelties that can be generated: a key consequence is that a creative agent might need to re-represent known objects to access various kinds of novelty. We then demonstrate that traditional objective functions of statistical learning theory, such as maximum likelihood, are not necessarily the best theoretical framework for studying novelty generation. We propose several other alternatives at the conceptual level. A second key result is the confirmation that current models, with traditional objective functions, can indeed generate unknown objects. This also shows that even though objectives like maximum likelihood are designed to eliminate novelty, practical implementations do generate novelty. Through a series of experiments, we study the behavior of these models and the novelty they generate. In particular, we propose a new task setup and metrics for selecting good generative models. Finally, the thesis concludes with a series of experiments clarifying the characteristics of models that can exhibit novelty. Experiments show that sparsity, noise level, and restricting the capacity of the net eliminate novelty and that models that are better at recognizing novelty are also good at generating novelty.

4

Kilinc, Ismail Ozsel. "Graph-based Latent Embedding, Annotation and Representation Learning in Neural Networks for Semi-supervised and Unsupervised Settings." Scholar Commons, 2017. https://scholarcommons.usf.edu/etd/7415.

Abstract:

Machine learning has been immensely successful in supervised learning with outstanding examples in major industrial applications such as voice and image recognition. Following these developments, the most recent research has now begun to focus primarily on algorithms which can exploit very large sets of unlabeled examples to reduce the amount of manually labeled data required for existing models to perform well. In this dissertation, we propose graph-based latent embedding/annotation/representation learning techniques in neural networks tailored for semi-supervised and unsupervised learning problems. Specifically, we propose a novel regularization technique called Graph-based Activity Regularization (GAR) and a novel output layer modification called Auto-clustering Output Layer (ACOL) which can be used separately or collaboratively to develop scalable and efficient learning frameworks for semi-supervised and unsupervised settings. First, singularly using the GAR technique, we develop a framework providing an effective and scalable graph-based solution for semi-supervised settings in which there exists a large number of observations but a small subset with ground-truth labels. The proposed approach is natural for the classification framework on neural networks as it requires no additional task calculating the reconstruction error (as in autoencoder based methods) or implementing zero-sum game mechanism (as in adversarial training based methods). We demonstrate that GAR effectively and accurately propagates the available labels to unlabeled examples. Our results show comparable performance with state-of-the-art generative approaches for this setting using an easier-to-train framework. Second, we explore a different type of semi-supervised setting where a coarse level of labeling is available for all the observations but the model has to learn a fine, deeper level of latent annotations for each one. 
Problems in this setting are likely to be encountered in many domains such as text categorization, protein function prediction, image classification as well as in exploratory scientific studies such as medical and genomics research. We consider this setting as simultaneously performed supervised classification (per the available coarse labels) and unsupervised clustering (within each one of the coarse labels) and propose a novel framework combining GAR with ACOL, which enables the network to perform concurrent classification and clustering. We demonstrate how the coarse label supervision impacts performance and the classification task actually helps propagate useful clustering information between sub-classes. Comparative tests on the most popular image datasets rigorously demonstrate the effectiveness and competitiveness of the proposed approach. The third and final setup builds on the prior framework to unlock fully unsupervised learning where we propose to substitute real, yet unavailable, parent-class information with pseudo class labels. In this novel unsupervised clustering approach the network can exploit hidden information indirectly introduced through a pseudo classification objective. We train an ACOL network through this pseudo supervision together with an unsupervised objective based on GAR and ultimately obtain a k-means friendly latent representation. Furthermore, we demonstrate how the chosen transformation type impacts performance and helps propagate the latent information that is useful in revealing unknown clusters. Our results show state-of-the-art performance for unsupervised clustering tasks on MNIST, SVHN and USPS datasets with the highest accuracies reported to date in the literature.

5

McClintick, Kyle W. "Training Data Generation Framework For Machine-Learning Based Classifiers." Digital WPI, 2018. https://digitalcommons.wpi.edu/etd-theses/1276.

Abstract:

In this thesis, we propose a new framework for the generation of training data for machine learning techniques used for classification in communications applications. Machine learning-based signal classifiers do not generalize well when training data does not describe the underlying probability distribution of real signals. The simplest way to accomplish statistical similarity between training and testing data is to synthesize training data passed through a permutation of plausible forms of noise. To accomplish this, a framework is proposed that implements arbitrary channel conditions and baseband signals. A dataset generated using the framework is considered, and is shown to be appropriately sized by having 11% lower entropy than state-of-the-art datasets. Furthermore, unsupervised domain adaptation can allow for powerful generalized training via deep feature transforms on unlabeled evaluation-time signals. A novel Deep Reconstruction-Classification Network (DRCN) application is introduced, which attempts to maintain near-peak signal classification accuracy despite dataset bias, or perturbations on testing data unforeseen in training. Together, feature transforms and diverse training data generated from the proposed framework, teaching a range of plausible noise, can train a deep neural net to classify signals well in many real-world scenarios despite unforeseen perturbations.
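
As a rough illustration of the idea of synthesizing training signals under varied noise conditions, the following minimal Python sketch generates BPSK baseband symbols and passes each example through additive white Gaussian noise at a randomly chosen signal-to-noise ratio. It is not the framework from the thesis: the function names `add_awgn` and `make_dataset`, the choice of BPSK, and the restriction to a single noise type are all assumptions made for this example.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise to a real-valued signal at a target SNR in dB."""
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR (linear) = signal_power / noise_power
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def make_dataset(n_examples, n_symbols, snr_choices_db, seed=0):
    """Build (noisy_signal, bits, snr_db) tuples; all names are hypothetical."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_examples):
        bits = [rng.randint(0, 1) for _ in range(n_symbols)]
        clean = [1.0 if b else -1.0 for b in bits]  # BPSK mapping: 1 -> +1, 0 -> -1
        snr_db = rng.choice(snr_choices_db)  # vary the channel condition per example
        dataset.append((add_awgn(clean, snr_db, rng), bits, snr_db))
    return dataset
```

Calling `make_dataset(1000, 128, [0, 5, 10, 20])` would yield a thousand labelled examples spread over four noise levels; a fuller generator would presumably add further impairments such as fading or frequency offset.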

6

Boschini, Matteo. "Unsupervised Learning of Scene Flow." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16226/.

Abstract:

As Computer Vision-powered autonomous systems are increasingly deployed to solve problems in the wild, the case is made for developing visual understanding methods that are robust and flexible. One of the most challenging tasks for this purpose is given by the extraction of scene flow, that is the dense three-dimensional vector field that associates each world point with its corresponding position in the next observed frame, hence describing its three-dimensional motion entirely. The recent addition of a limited amount of ground truth scene flow information to the popular KITTI dataset prompted a renewed interest in the study of techniques for scene flow inference, although the proposed solutions in literature mostly rely on computation-intensive techniques and are characterised by execution times that are not suited for real-time application. In the wake of the recent widespread adoption of Deep Learning techniques to Computer Vision tasks and in light of the convenience of Unsupervised Learning for scenarios in which ground truth collection is difficult and time-consuming, this thesis work proposes the first neural network architecture to be trained in end-to-end fashion for unsupervised scene flow regression from monocular visual data, called Pantaflow. The proposed solution is much faster than currently available state-of-the-art methods and therefore represents a step towards the achievement of real-time scene flow inference.

7

Kalinicheva, Ekaterina. "Unsupervised satellite image time series analysis using deep learning techniques." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS335.

Abstract:

This thesis presents a set of unsupervised algorithms for satellite image time series (SITS) analysis. Our methods exploit machine learning algorithms and, in particular, neural networks to detect different spatio-temporal entities and their possible changes over time. We aim to identify three different types of temporal behavior: no-change areas, seasonal changes (vegetation and other phenomena that have seasonal recurrence) and non-trivial changes (permanent changes such as construction or demolition, crop rotation, etc.). Therefore, we propose two frameworks: one for detection and clustering of non-trivial changes and another for clustering of “stable” areas (seasonal changes and no-change areas). The first framework is composed of two steps: bi-temporal change detection and the interpretation of detected changes in a multi-temporal context with graph-based approaches. The bi-temporal change detection is performed for each pair of consecutive images of the SITS and is based on feature translation with autoencoders (AEs). At the next step, the changes from different timestamps that belong to the same geographic area form evolution change graphs. The graphs are then clustered using a recurrent neural network AE model to identify different types of change behavior. For the second framework, we propose an approach for object-based SITS clustering. First, we encode the SITS into a single image with a multi-view 3D convolutional AE. Second, we perform a two-step SITS segmentation using the encoded SITS and the original images. Finally, the obtained segments are clustered using their encoded descriptors.

8

Yuan, Xiao. "Graph neural networks for spatial gene expression analysis of the developing human heart." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-427330.

Abstract:

Single-cell RNA sequencing and in situ sequencing were combined in a recent study of the developing human heart to explore the transcriptional landscape at three developmental stages. However, the method used in the study to create the spatial cellular maps has some limitations. It relies on image segmentation of the nuclei and cell types defined in advance by single-cell sequencing. In this study, we applied a new unsupervised approach based on graph neural networks on the in situ sequencing data of the human heart to find spatial gene expression patterns and detect novel cell and sub-cell types. In this thesis, we first introduce some relevant background knowledge about the sequencing techniques that generate our data, machine learning in single-cell analysis, and deep learning on graphs. We have explored several graph neural network models and algorithms to learn embeddings for spatial gene expression. Dimensionality reduction and cluster analysis were performed on the embeddings for visualization and identification of biologically functional domains. Based on the cluster gene expression profiles, locations of the clusters in the heart sections, and comparison with cell types defined in the previous study, the results of our experiments demonstrate that graph neural networks can learn meaningful representations of spatial gene expression in the human heart. We hope further validations of our clustering results could give new insights into cell development and differentiation processes of the human heart.

9

Ventura, Francesco. "Explaining black-box deep neural models' predictions, behaviors, and performances through the unsupervised mining of their inner knowledge." Doctoral thesis, Politecnico di Torino, 2021. http://hdl.handle.net/11583/2912972.

10

Li, Yingzhen. "Approximate inference : new visions." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/277549.

Abstract:

Nowadays machine learning (especially deep learning) techniques are being incorporated into many intelligent systems affecting the quality of human life. The ultimate purpose of these systems is to perform automated decision making, and in order to achieve this, predictive systems need to return estimates of their confidence. Powered by the rules of probability, Bayesian inference is the gold standard method to perform coherent reasoning under uncertainty. It is generally believed that intelligent systems following the Bayesian approach can better incorporate uncertainty information for reliable decision making, and be less vulnerable to attacks such as data poisoning. Critically, the success of Bayesian methods in practice, including the recent resurgence of Bayesian deep learning, relies on fast and accurate approximate Bayesian inference applied to probabilistic models. These approximate inference methods perform (approximate) Bayesian reasoning at a relatively low cost in terms of time and memory, thus allowing the principles of Bayesian modelling to be applied to many practical settings. However, more work needs to be done to scale approximate Bayesian inference methods to big systems such as deep neural networks and large-scale datasets such as ImageNet. In this thesis we develop new algorithms towards addressing the open challenges in approximate inference. In the first part of the thesis we develop two new approximate inference algorithms, by drawing inspiration from the well-known expectation propagation and message passing algorithms. Both approaches provide a unifying view of existing variational methods from different algorithmic perspectives. We also demonstrate that they lead to better calibrated inference results for complex models such as neural network classifiers and deep generative models, and scale to large datasets containing hundreds of thousands of data-points.
In the second theme of the thesis we propose a new research direction for approximate inference: developing algorithms for fitting posterior approximations of arbitrary form, by rethinking the fundamental principles of Bayesian computation and the necessity of algorithmic constraints in traditional inference schemes. We specify four algorithmic options for the development of such new generation approximate inference methods, with one of them further investigated and applied to Bayesian deep learning tasks.

11

Oquab, Maxime. "Convolutional neural networks : towards less supervision for visual recognition." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE061.

Abstract:

Convolutional Neural Networks are flexible learning algorithms for computer vision that scale particularly well with the amount of data that is provided for training them. Although these methods already had successful applications in the 1990s, they were not used in visual recognition pipelines because of their weaker performance on realistic natural images. It is only after the amount of data and the computational power both reached a critical point that these algorithms revealed their potential during the ImageNet challenge of 2012, leading to a paradigm shift in visual recognition. The first contribution of this thesis is a transfer learning setup with a Convolutional Neural Network for image classification. Using a pre-training procedure, we show that image representations learned in a network generalize to other recognition tasks, and their performance scales up with the amount of data used in pre-training. The second contribution of this thesis is a weakly supervised setup for image classification that can predict the location of objects in complex cluttered scenes, based on a dataset indicating only the presence or absence of objects in training images. The third contribution of this thesis aims at finding possible paths for progress in unsupervised learning with neural networks. We study the recent trend of Generative Adversarial Networks and propose two-sample tests for evaluating models. We investigate possible links with concepts related to causality, and propose a two-sample test method for the task of causal discovery. Finally, building on a recent connection with optimal transport, we investigate what these generative algorithms are learning from unlabeled data.
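
To make the notion of a two-sample test concrete, here is a minimal Python sketch of a generic permutation test that asks whether two sets of scalar values (for instance, a one-dimensional summary of real and generated samples) could plausibly come from the same distribution. The abstract does not specify which test the thesis uses, so this is only an assumed stand-in: the function name `permutation_two_sample_test`, the difference-of-means statistic, and the parameter defaults are all choices made for this example.

```python
import random
from statistics import mean


def permutation_two_sample_test(xs, ys, n_permutations=2000, seed=0):
    """Estimate a p-value for H0: xs and ys are drawn from the same distribution.

    The test statistic is the absolute difference of sample means; under H0 the
    group labels are exchangeable, so the pooled data are reshuffled repeatedly.
    """
    rng = random.Random(seed)
    observed = abs(mean(xs) - mean(ys))
    pooled = list(xs) + list(ys)
    n_x = len(xs)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)
```

A small p-value suggests the two samples differ, which for a generative model would mean its outputs are distinguishable from real data under this statistic. A difference of means is a weak statistic for high-dimensional images, so a practical evaluation would likely use a richer one.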

APA, Harvard, Vancouver, ISO, and other styles

12

Ackerman, Wesley. "Semantic-Driven Unsupervised Image-to-Image Translation for Distinct Image Domains." BYU ScholarsArchive, 2020. https://scholarsarchive.byu.edu/etd/8684.

Full text

Abstract:

We expand the scope of image-to-image translation to include more distinct image domains, where the image sets have analogous structures, but may not share object types between them. Semantic-Driven Unsupervised Image-to-Image Translation for Distinct Image Domains (SUNIT) is built to more successfully translate images in this setting, where content from one domain is not found in the other. Our method trains an image translation model by learning encodings for semantic segmentations of images. These segmentations are translated between image domains to learn meaningful mappings between the structures in the two domains. The translated segmentations are then used as the basis for image generation. Beginning image generation with encoded segmentation information helps maintain the original structure of the image. We qualitatively and quantitatively show that SUNIT improves image translation outcomes, especially for image translation tasks where the image domains are very distinct.

APA, Harvard, Vancouver, ISO, and other styles

13

Belharbi, Soufiane. "Neural networks regularization through representation learning." Thesis, Normandie, 2018. http://www.theses.fr/2018NORMIR10/document.

Full text

Abstract:

Les modèles de réseaux de neurones et en particulier les modèles profonds sont aujourd'hui l'un des modèles à l'état de l'art en apprentissage automatique et ses applications. Les réseaux de neurones profonds récents possèdent de nombreuses couches cachées ce qui augmente significativement le nombre total de paramètres. L'apprentissage de ce genre de modèles nécessite donc un grand nombre d'exemples étiquetés, qui ne sont pas toujours disponibles en pratique. Le sur-apprentissage est un des problèmes fondamentaux des réseaux de neurones, qui se produit lorsque le modèle apprend par coeur les données d'apprentissage, menant à des difficultés à généraliser sur de nouvelles données. Le problème du sur-apprentissage des réseaux de neurones est le thème principal abordé dans cette thèse. Dans la littérature, plusieurs solutions ont été proposées pour remédier à ce problème, tels que l'augmentation de données, l'arrêt prématuré de l'apprentissage ("early stopping"), ou encore des techniques plus spécifiques aux réseaux de neurones comme le "dropout" ou la "batch normalization". Dans cette thèse, nous abordons le sur-apprentissage des réseaux de neurones profonds sous l'angle de l'apprentissage de représentations, en considérant l'apprentissage avec peu de données. Pour aboutir à cet objectif, nous avons proposé trois différentes contributions. La première contribution, présentée dans le chapitre 2, concerne les problèmes à sorties structurées dans lesquels les variables de sortie sont à grande dimension et sont généralement liées par des relations structurelles. Notre proposition vise à exploiter ces relations structurelles en les apprenant de manière non-supervisée avec des autoencodeurs. Nous avons validé notre approche sur un problème de régression multiple appliquée à la détection de points d'intérêt dans des images de visages. Notre approche a montré une accélération de l'apprentissage des réseaux et une amélioration de leur généralisation. 
La deuxième contribution, présentée dans le chapitre 3, exploite la connaissance a priori sur les représentations à l'intérieur des couches cachées dans le cadre d'une tâche de classification. Cet à priori est basé sur la simple idée que les exemples d'une même classe doivent avoir la même représentation interne. Nous avons formalisé cet à priori sous la forme d'une pénalité que nous avons rajoutée à la fonction de perte. Des expérimentations empiriques sur la base MNIST et ses variantes ont montré des améliorations dans la généralisation des réseaux de neurones, particulièrement dans le cas où peu de données d'apprentissage sont utilisées. Notre troisième et dernière contribution, présentée dans le chapitre 4, montre l'intérêt du transfert d'apprentissage ("transfer learning") dans des applications dans lesquelles peu de données d'apprentissage sont disponibles. L'idée principale consiste à pré-apprendre les filtres d'un réseau à convolution sur une tâche source avec une grande base de données (ImageNet par exemple), pour les insérer par la suite dans un nouveau réseau sur la tâche cible. Dans le cadre d'une collaboration avec le centre de lutte contre le cancer "Henri Becquerel de Rouen", nous avons construit un système automatique basé sur ce type de transfert d'apprentissage pour une application médicale où l'on dispose d’un faible jeu de données étiquetées. Dans cette application, la tâche consiste à localiser la troisième vertèbre lombaire dans un examen de type scanner. L’utilisation du transfert d’apprentissage ainsi que de prétraitements et de post traitements adaptés a permis d’obtenir des bons résultats, autorisant la mise en oeuvre du modèle en routine clinique
Neural network models, and deep models in particular, are among the leading state-of-the-art models in machine learning and have been applied in many different domains. The most successful deep neural models are those with many layers, which greatly increases their number of parameters. Training such models requires a large number of training samples, which is not always available. One of the fundamental issues in neural networks is overfitting, which is the issue tackled in this thesis. This problem often occurs when large models are trained using few training samples. Many approaches have been proposed to prevent the network from overfitting and improve its generalization performance, such as data augmentation, early stopping, parameter sharing, unsupervised learning, dropout, batch normalization, etc. In this thesis, we tackle the neural network overfitting issue from a representation learning perspective by considering the situation where few training samples are available, which is the case in many real-world applications. We propose three contributions. The first one, presented in chapter 2, deals with structured output problems, performing multivariate regression when the output variable y contains structural dependencies between its components. Our proposal aims mainly at exploiting these dependencies by learning them in an unsupervised way. Validated on a facial landmark detection problem, learning the structure of the output data has been shown to improve the network's generalization and speed up its training. The second contribution, described in chapter 3, deals with the classification task, where we propose to exploit prior knowledge about the internal representation of the hidden layers in neural networks. This prior is based on the idea that samples within the same class should have the same internal representation. We formulate this prior as a penalty that we add to the training cost to be minimized. 
Empirical experiments on MNIST and its variants showed an improvement in network generalization when using only a few training samples. Our last contribution, presented in chapter 4, shows the value of transfer learning in applications where only a few samples are available. The idea consists in re-using the filters of pre-trained convolutional networks that have been trained on large datasets such as ImageNet. Such pre-trained filters are plugged into a new convolutional network with new dense layers. Then, the whole network is trained on a new task. In this contribution, we provide an automatic system based on this learning scheme with an application to the medical domain. In this application, the task consists in localizing the third lumbar vertebra in a 3D CT scan. The proposed system includes a pre-processing of the 3D CT scan to obtain a 2D representation and a post-processing to refine the decision. This work was done in collaboration with the clinic "Rouen Henri Becquerel Center", which provided us with data.
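
The same-class prior described in the abstract above can be illustrated with a small sketch. This is not the thesis's actual formulation, which the abstract does not give: the choice of mean squared Euclidean distance over same-class pairs, the function names `same_class_penalty` and `regularized_loss`, and the `weight` parameter are all assumptions made for illustration.

```python
from itertools import combinations


def same_class_penalty(hidden, labels):
    """Mean squared Euclidean distance between hidden vectors of same-class pairs.

    Assumed form of the penalty; the abstract only states that same-class
    samples should share the same internal representation.
    """
    total, pairs = 0.0, 0
    for i, j in combinations(range(len(hidden)), 2):
        if labels[i] == labels[j]:
            total += sum((a - b) ** 2 for a, b in zip(hidden[i], hidden[j]))
            pairs += 1
    # With no same-class pair in the batch there is nothing to penalize.
    return total / pairs if pairs else 0.0


def regularized_loss(task_loss, hidden, labels, weight=0.1):
    """Add the weighted penalty to the supervised training cost (hypothetical weighting)."""
    return task_loss + weight * same_class_penalty(hidden, labels)


if __name__ == "__main__":
    hidden = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    labels = [0, 0, 1]
    print(same_class_penalty(hidden, labels))
    print(regularized_loss(1.0, hidden, labels, weight=0.5))
```

In a real training loop the penalty would be computed on a hidden layer's activations for each mini-batch and differentiated together with the task loss; the pure-Python version here only shows the quantity being minimized.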

APA, Harvard, Vancouver, ISO, and other styles

14

Dekhtiar, Jonathan. "Deep Learning and unsupervised learning to automate visual inspection in the manufacturing industry." Thesis, Compiègne, 2019. http://www.theses.fr/2019COMP2513.

Full text

Abstract:

La croissance exponentielle des besoins et moyens informatiques implique un besoin croissant d’automatisation des procédés industriels. Ce constat est en particulier visible pour l’inspection visuelle automatique sur ligne de production. Bien qu’étudiée depuis 1970, peine toujours à être appliquée à de larges échelles et à faible coûts. Les méthodes employées dépendent grandement de la disponibilité des experts métiers. Ce qui provoque inévitablement une augmentation des coûts et une réduction de la flexibilité des méthodes employées. Depuis 2012, les avancées dans le domaine associé à l’étude des réseaux neuronaux profonds (i.e. Deep Learning) a permis de nombreux progrès en ce sens, notamment grâce au réseaux neuronaux convolutif qui ont atteint des performances proches de l’humain dans de nombreux domaines associées à la perception visuelle (e.g. reconnaissance et détection d’objets, etc.). Cette thèse propose une approche non supervisée pour répondre aux besoins de l’inspection visuelle automatique. Cette méthode, baptisé AnoAEGAN, combine l’apprentissage adversaire et l’estimation d’une fonction de densité de probabilité. Ces deux approches complémentaires permettent d’estimer jointement la probabilité pixel par pixel d’un défaut visuel sur une image. Le modèle est entrainé à partir d’un nombre très limités d’images (i.e. inférieur à 1000 images) sans utilisation de connaissance expert pour « étiqueter » préalablement les données. Cette méthode permet une flexibilité accrue par la rapidité d’entrainement du modèle et une grande versatilité, démontrée sur dix tâches différentes sans la moindre modification du modèle. Cette méthode devrait permettre de réduire les coûts de développement et le temps nécessaire de déploiement en production. Cette méthode peut être également déployée de manière complémentaire à une approche supervisée afin de bénéficier des avantages de chaque approche
Although studied since 1970, automatic visual inspection on production lines still struggles to be applied on a large scale and at low cost. The methods used depend greatly on the availability of domain experts. This inevitably leads to increased costs and reduced flexibility in the methods used. Since 2012, advances in the field of Deep Learning have enabled much progress in this direction, particularly thanks to convolutional neural networks that have achieved near-human performance in many areas associated with visual perception (e.g. object recognition and detection, etc.). This thesis proposes an unsupervised approach to meet the needs of automatic visual inspection. This method, called AnoAEGAN, combines adversarial learning and the estimation of a probability density function. These two complementary approaches make it possible to jointly estimate the pixel-by-pixel probability of a visual defect on an image. The model is trained from a very limited number of images (i.e. fewer than 1000 images) without using expert knowledge to "label" the data beforehand. This method allows increased flexibility with a limited training time and therefore great versatility, demonstrated on ten different tasks without any modification of the model. This method should reduce development costs and the time required to deploy in production. This method can also be deployed in a complementary way to a supervised approach in order to benefit from the advantages of each approach.

APA, Harvard, Vancouver, ISO, and other styles

15

Choi, Jin-Woo. "Action Recognition with Knowledge Transfer." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/101780.

Full text

Abstract:

Recent progress on deep neural networks has shown remarkable action recognition performance from videos. The remarkable performance is often achieved by transfer learning: training a model on a large-scale labeled dataset (source) and then fine-tuning the model on the small-scale labeled datasets (targets). However, existing action recognition models do not always generalize well on new tasks or datasets for the following two reasons. i) Current action recognition datasets have a spurious correlation between action types and background scene types. The models trained on these datasets are biased towards the scene instead of focusing on the actual action. This scene bias leads to poor generalization performance. ii) Directly testing the model trained on the source data on the target data leads to poor performance because the source and target distributions are different. Fine-tuning the model on the target data can mitigate this issue. However, manually labeling small-scale target videos is labor-intensive. In this dissertation, I propose solutions to these two problems. For the first problem, I propose to learn scene-invariant action representations to mitigate the scene bias in action recognition models. Specifically, I augment the standard cross-entropy loss for action classification with 1) an adversarial loss for the scene types and 2) a human mask confusion loss for videos where the human actors are invisible. These two losses encourage learning representations unsuitable for predicting 1) the correct scene types and 2) the correct action types when there is no evidence. I validate the efficacy of the proposed method by transfer learning experiments. I transfer the pre-trained model to three different tasks, including action classification, temporal action localization, and spatio-temporal action detection. The results show consistent improvement over the baselines for every task and dataset. 
I formulate human action recognition as an unsupervised domain adaptation (UDA) problem to handle the second problem. In the UDA setting, we have many labeled videos as source data and unlabeled videos as target data. We can use already existing labeled video datasets as source data in this setting. The task is to align the source and target feature distributions so that the learned model can generalize well on the target data. I propose 1) aligning the more important temporal part of each video and 2) encouraging the model to focus on action, not the background scene, to learn domain-invariant action representations. The proposed method is simple and intuitive while achieving state-of-the-art performance without training on a lot of labeled target videos. I relax the unsupervised target data setting to a sparsely labeled target data setting. Then I explore semi-supervised video action recognition, where we have a lot of labeled videos as source data and sparsely labeled videos as target data. The semi-supervised setting is practical as sometimes we can afford a small cost for labeling target data. I propose multiple video data augmentation methods to inject photometric, geometric, temporal, and scene invariances into the action recognition model in this setting. The resulting method shows favorable performance on the public benchmarks.
Doctor of Philosophy
Recent progress on deep learning has shown remarkable action recognition performance. The remarkable performance is often achieved by transferring the knowledge learned from existing large-scale data to the small-scale data specific to applications. However, existing action recognition models do not always work well on new tasks and datasets because of the following two problems. i) Current action recognition datasets have a spurious correlation between action types and background scene types. The models trained on these datasets are biased towards the scene instead of focusing on the actual action. This scene bias leads to poor performance on the new datasets and tasks. ii) Directly testing the model trained on the source data on the target data leads to poor performance because the source and target distributions are different. Fine-tuning the model on the target data can mitigate this issue. However, manually labeling small-scale target videos is labor-intensive. In this dissertation, I propose solutions to these two problems. To tackle the first problem, I propose to learn scene-invariant action representations to mitigate the background scene bias of human action recognition models. Specifically, the proposed method learns representations that cannot predict the scene types and the correct actions when there is no evidence. I validate the proposed method's effectiveness by transferring the pre-trained model to multiple action understanding tasks. The results show consistent improvement over the baselines for every task and dataset. To handle the second problem, I formulate human action recognition as an unsupervised learning problem on the target data. In this setting, we have many labeled videos as source data and unlabeled videos as target data. We can use already existing labeled video datasets as source data in this setting. 
The task is to align the source and target feature distributions so that the learned model can generalize well on the target data. I propose 1) aligning the more important temporal part of each video and 2) encouraging the model to focus on action, not the background scene. The proposed method is simple and intuitive while achieving state-of-the-art performance without training on a lot of labeled target videos. I relax the unsupervised target data setting to a sparsely labeled target data setting. Here, we have many labeled videos as source data and sparsely labeled videos as target data. The setting is practical as sometimes we can afford a little bit of cost for labeling target data. I propose multiple video data augmentation methods to inject color, spatial, temporal, and scene invariances to the action recognition model in this setting. The resulting method shows favorable performance on the public benchmarks.

APA, Harvard, Vancouver, ISO, and other styles

16

Yogeswaran, Arjun. "Self-Organizing Neural Visual Models to Learn Feature Detectors and Motion Tracking Behaviour by Exposure to Real-World Data." Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/37096.

Full text

Abstract:

Advances in unsupervised learning and deep neural networks have led to increased performance in a number of domains, and to the ability to draw strong comparisons between the biological method of self-organization conducted by the brain and computational mechanisms. This thesis aims to use real-world data to tackle two areas in the domain of computer vision which have biological equivalents: feature detection and motion tracking. The aforementioned advances have allowed efficient learning of feature representations directly from large sets of unlabeled data instead of using traditional handcrafted features. The first part of this thesis evaluates such representations by comparing regularization and preprocessing methods which incorporate local neighbouring information during training on a single-layer neural network. The networks are trained and tested on the Hollywood2 video dataset, as well as the static CIFAR-10, STL-10, COIL-100, and MNIST image datasets. The induction of topography or simple image blurring via Gaussian filters during training produces better discriminative features as evidenced by the consistent and notable increase in classification results that they produce. In the visual domain, invariant features are desirable such that objects can be classified despite transformations. It is found that most of the compared methods produce more invariant features, however, classification accuracy does not correlate to invariance. The second, and paramount, contribution of this thesis is a biologically-inspired model to explain the emergence of motion tracking behaviour in early development using unsupervised learning. The model’s self-organization is biased by an original concept called retinal constancy, which measures how similar visual contents are between successive frames. 
In the proposed two-layer deep network, when exposed to real-world video, the first layer learns to encode visual motion, and the second layer learns to relate that motion to gaze movements, which it perceives and creates through bi-directional nodes. This is unique because it uses general machine learning algorithms, and their inherent generative properties, to learn from real-world data. It also implements a biological theory and learns in a fully unsupervised manner. An analysis of its parameters and limitations is conducted, and its tracking performance is evaluated. Results show that this model is able to successfully follow targets in real-world video, despite being trained without supervision on real-world video.

APA, Harvard, Vancouver, ISO, and other styles

17

Nyamapfene, Abel. "Unsupervised multimodal neural networks." Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/844064/.

Full text

Abstract:

We extend the in-situ Hebbian-linked SOMs network by Miikkulainen to come up with two unsupervised neural networks that learn the mapping between the individual modes of a multimodal dataset. The first network, the single-pass Hebbian-linked SOMs network, extends the in-situ Hebbian-linked SOMs network by enabling the Hebbian link weights to be computed through one-shot learning. The second network, a modified counterpropagation network, extends the unsupervised learning of crossmodal mappings by making it possible for only one self-organising map to implement the crossmodal mapping. The two proposed networks each have a smaller computation time and achieve lower crossmodal mean squared errors than the in-situ Hebbian-linked SOMs network when assessed on two bimodal datasets, an audio-acoustic speech utterance dataset and a phonological-semantics child utterance dataset. Of the three network architectures, the modified counterpropagation network achieves the highest percentage of correct classifications, comparable to that of the LVQ-2 algorithm by Kohonen and the neural network for category learning by de Sa and Ballard in classification tasks using the audio-acoustic speech utterance dataset. To facilitate multimodal processing of temporal data, we propose a Temporal Hypermap neural network architecture that learns and recalls multiple temporal patterns in an unsupervised manner. The Temporal Hypermap introduces flexibility in the recall of temporal patterns - a stored temporal pattern can be retrieved by prompting the network with the temporal pattern's identity vector, whilst the incorporation of short term memory allows the recall of a temporal pattern, starting from the pattern item specified by contextual information up to the last item in the pattern sequence. Finally, we extend the connectionist modelling of child language acquisition in two important respects. 
First, we introduce the concept of multimodal representation of speech utterances at the one-word and two-word stage. This allows us to model child language at the one-word utterance stage with a single modified counterpropagation network, which is an improvement on previous models in which multiple networks are required to simulate the different aspects of speech at the one-word utterance stage. Secondly, we present, for the first time, a connectionist model of the transition of child language from the one-word utterance stage to the two-word utterance stage. We achieve this using a gated multi-net comprising a modified counterpropagation network and a Temporal Hypermap.

APA, Harvard, Vancouver, ISO, and other styles

18

Stella, Federico. "Learning a Local Reference Frame for Point Clouds using Spherical CNNs." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20197/.

Full text

Abstract:

One of the most important problems in 3D Computer Vision is so-called surface matching, which consists in finding correspondences between three-dimensional objects. The problem is currently addressed by computing compact local features, called descriptors, which must be recognized and matched as the pose of the object in space changes, and which must therefore be invariant to orientation. The most widely used way to obtain this property is to use Local Reference Frames (LRFs): local coordinate systems that provide a canonical orientation to the portions of 3D objects used to compute the descriptors. The literature offers several ways to compute LRFs, but all of them rely on hand-designed algorithms. There is also a recent proposal that uses neural networks; however, these are trained on features specifically designed for the purpose, which prevents the benefits of modern end-to-end learning strategies from being fully exploited. The aim of this work is to use a data-driven approach to have a neural network learn to compute a Local Reference Frame from raw point clouds, thereby producing the first example of end-to-end learning applied to LRF estimation. To do so, we exploit a recent innovation called Spherical Convolutional Neural Networks, which generate and process signals in the SO(3) space and are therefore naturally suited to representing and estimating orientations and LRFs. We compare the performance obtained with that of existing methods on standard benchmarks, obtaining promising results.

APA, Harvard, Vancouver, ISO, and other styles

19

Nair, Karthik. "Optimisation of autoencoders for prediction of SNPs determining phenotypes in wheat." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-437451.

Full text

Abstract:

The increase in demand for food has resulted in increased demand for tools that help streamline the plant breeding process in order to create new varieties of crops. Identifying the underlying genetic mechanism of favourable characteristics is essential in order to make the best breeding decisions. In this project we have developed a modified autoencoder model which allows for lateral phenotype injection into the latent layer, in order to identify causal SNPs for phenotypes of interest in wheat. SNP and phenotype data for 500 samples of Lantmännen SW Seed, provided by Lantmännen, were used to train the network. An artificial phenotype created using a single SNP was used during training instead of a real phenotype, since the relationship between the phenotype and the SNP is then already known. The modified training model with lateral phenotype injection showed a significant increase in genotype concordance for the artificial phenotype when compared to the control model without phenotype injection. The causal SNP was successfully identified using a concordance terrain graph, where the difference in concordance of individual SNPs between the modified model and the control model was plotted against the genomic position of each SNP. The model requires further testing to elucidate its behaviour for phenotypes linked to multiple SNPs.
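
The concordance terrain described in the abstract above can be sketched as follows. This is a minimal illustration, not the thesis's code. It assumes that concordance means the per-SNP fraction of samples whose reconstructed genotype equals the true genotype, and that genotypes are integer codes stored as one list per sample. The function names are hypothetical.

```python
def snp_concordance(true_genotypes, predicted_genotypes):
    """Per-SNP fraction of samples whose predicted genotype matches the true one.

    Both arguments are lists of samples; each sample is a list of genotype
    codes ordered by genomic position (assumed encoding).
    """
    n_samples = len(true_genotypes)
    n_snps = len(true_genotypes[0])
    return [
        sum(true_genotypes[s][k] == predicted_genotypes[s][k] for s in range(n_samples))
        / n_samples
        for k in range(n_snps)
    ]


def concordance_terrain(true_genotypes, modified_pred, control_pred):
    """Concordance of the phenotype-injected model minus that of the control, per SNP."""
    modified = snp_concordance(true_genotypes, modified_pred)
    control = snp_concordance(true_genotypes, control_pred)
    return [m - c for m, c in zip(modified, control)]


def peak_snp(terrain):
    """Index of the SNP with the largest concordance gain (candidate causal SNP)."""
    return max(range(len(terrain)), key=terrain.__getitem__)


if __name__ == "__main__":
    truth = [[0, 1, 2], [1, 1, 0]]
    modified = [[0, 1, 2], [1, 1, 0]]   # hypothetical reconstruction with injection
    control = [[0, 0, 2], [1, 2, 0]]    # hypothetical reconstruction without injection
    terrain = concordance_terrain(truth, modified, control)
    print(terrain, peak_snp(terrain))
```

Plotting `terrain` against each SNP's genomic position would give the graph the abstract refers to; the peak picks out the position where phenotype injection helps reconstruction the most.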

APA, Harvard, Vancouver, ISO, and other styles

20

Sala, Cardoso Enric. "Advanced energy management strategies for HVAC systems in smart buildings." Doctoral thesis, Universitat Politècnica de Catalunya, 2019. http://hdl.handle.net/10803/668528.

Full text

Abstract:

The efficacy of the energy management systems at dealing with energy consumption in buildings has been a topic with a growing interest in recent years due to the ever-increasing global energy demand and the large percentage of energy being currently used by buildings. The scale of this sector has attracted research effort with the objective of uncovering potential improvement avenues and materializing them with the help of recent technological advances that could be exploited to lower the energetic footprint of buildings. Specifically, in the area of heating, ventilating and air conditioning installations, the availability of large amounts of historical data in building management software suites makes possible the study of how resource-efficient these systems really are when entrusted with ensuring occupant comfort. Actually, recent reports have shown that there is a gap between the ideal operating performance and the performance achieved in practice. Accordingly, this thesis considers the research of novel energy management strategies for heating, ventilating and air conditioning installations in buildings, aimed at narrowing the performance gap by employing data-driven methods to increase their context awareness, allowing management systems to steer the operation towards higher efficiency. This includes the advancement of modeling methodologies capable of extracting actionable knowledge from historical building behavior databases, through load forecasting and equipment operational performance estimation supporting the identification of a building’s context and energetic needs, and the development of a generalizable multi-objective optimization strategy aimed at meeting these needs while minimizing the consumption of energy. 
The experimental results obtained from the implementation of the developed methodologies show a significant potential for increasing energy efficiency of heating, ventilating and air conditioning systems while being sufficiently generic to support their usage in different installations having diverse equipment. In conclusion, a complete analysis and actuation framework was developed, implemented and validated by means of an experimental database acquired from a pilot plant during the research period of this thesis. The obtained results demonstrate the efficacy of the proposed standalone contributions, and as a whole represent a suitable solution for helping to increase the performance of heating, ventilating and air conditioning installations without affecting the comfort of their occupants.
The effectiveness of energy management systems in addressing energy consumption in buildings is a topic that has received growing interest in recent years, owing to rising global energy demand and the large share of energy currently used by buildings. The scale of this sector has attracted considerable research aimed at uncovering possible avenues for improvement and realising them with the help of recent technological advances that could be exploited to reduce the energy needs of buildings. Specifically, in the area of heating, ventilation and air conditioning installations, the availability of large historical databases in building management systems makes it possible to study how efficient these systems really are when ensuring the comfort of their occupants. In fact, recent reports indicate that there is a gap between the ideal operating performance and the performance generally achieved in practice. Consequently, this thesis investigates new energy management strategies for heating, ventilation and air conditioning installations in buildings, aimed at narrowing this performance gap by using data-driven methods to increase their contextual awareness, allowing management systems to steer operation toward working regions with higher performance. This includes both the advancement of modelling methodologies capable of extracting knowledge from databases of historical building behaviour, through the forecasting of consumption loads and the estimation of the operating performance of equipment, to support the identification of a building's operating context and energy needs, and the development of a generalisable multi-objective optimisation strategy to minimise energy consumption while meeting those energy needs.
The experimental results obtained from implementing the developed methodologies show significant potential for increasing the energy efficiency of air conditioning systems, while being generic enough to be used in different installations with diverse equipment. In conclusion, a complete analysis and actuation framework was developed, implemented and validated during this thesis using an experimental database acquired from a pilot plant during the research period. The results demonstrate the effectiveness of the individual contributions and, taken together, represent a suitable solution for helping to increase the performance of air conditioning installations without affecting the comfort of their occupants.


21

Macdonald, Donald. "Unsupervised neural networks for visualisation of data." Thesis, University of the West of Scotland, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.395687.


22

Berry, Ian Michael. "Data classification using unsupervised artificial neural networks." Thesis, University of Sussex, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.390079.


23

Harpur, George Francis. "Low entropy coding with unsupervised neural networks." Thesis, University of Cambridge, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.627227.


24

Liu, Qian. "Deep spiking neural networks." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/deep-spiking-neural-networks(336e6a37-2a0b-41ff-9ffb-cca897220d6c).html.


Abstract:

Neuromorphic Engineering (NE) has led to the development of biologically-inspired computer architectures whose long-term goal is to approach the performance of the human brain in terms of energy efficiency and cognitive capabilities. Although there are a number of neuromorphic platforms available for large-scale Spiking Neural Network (SNN) simulations, the problem of programming these brain-like machines to be competent in cognitive applications still remains unsolved. On the other hand, Deep Learning has emerged in Artificial Neural Network (ANN) research to dominate state-of-the-art solutions for cognitive tasks. Thus the main research problem emerges of understanding how to operate and train biologically-plausible SNNs to close the gap in cognitive capabilities between SNNs and ANNs. SNNs can be trained by first training an equivalent ANN and then transferring the tuned weights to the SNN. This method is called "off-line" training, since it does not take place on an SNN directly, but rather on an ANN instead. However, previous work on such off-line training methods has struggled in terms of poor modelling accuracy of the spiking neurons and high computational complexity. In this thesis we propose a simple and novel activation function, Noisy Softplus (NSP), to closely model the response firing activity of biologically-plausible spiking neurons, and introduce a generalised off-line training method using the Parametric Activation Function (PAF) to map the abstract numerical values of the ANN to concrete physical units, such as current and firing rate in the SNN. Based on this generalised training method and its fine tuning, we achieve the state-of-the-art accuracy on the MNIST classification task using spiking neurons, 99.07%, on a deep spiking convolutional neural network (ConvNet). We then take a step forward to "on-line" training methods, where Deep Learning modules are trained purely on SNNs in an event-driven manner.
Existing work has failed to provide SNNs with recognition accuracy equivalent to ANNs due to the lack of mathematical analysis. Thus we propose a formalised Spike-based Rate Multiplication (SRM) method which transforms the product of firing rates into the number of coincident spikes of a pair of rate-coded spike trains. Moreover, these coincident spikes can be captured by the Spike-Time-Dependent Plasticity (STDP) rule to update the weights between the neurons in an on-line, event-based, and biologically-plausible manner. Furthermore, we put forward solutions to reduce correlations between spike trains, thereby addressing the resulting performance drop in on-line SNN training. The promising results of spiking Autoencoders (AEs) and spiking Restricted Boltzmann Machines (SRBMs) exhibit equivalent, sometimes even superior, classification and reconstruction capabilities compared to their non-spiking counterparts. To provide meaningful comparisons between these proposed SNN models and other existing methods within this rapidly advancing field of NE, we propose a large dataset of spike-based visual stimuli and a corresponding evaluation methodology to estimate the overall performance of SNN models and their hardware implementations.
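
The statistical idea behind relating a product of firing rates to coincident spikes can be illustrated with a minimal sketch. It is not the thesis's SRM formulation: it assumes two independent spike trains, approximated as Bernoulli draws per time bin, and the names `rate_coded_train` and `coincident_spikes`, the bin width and the rates are all hypothetical choices for illustration.

```python
import random

def rate_coded_train(rate_hz, duration_s, dt, rng):
    """Bernoulli approximation of a Poisson spike train: one 0/1 entry per time bin."""
    p_spike = rate_hz * dt                      # spike probability per bin (assumes rate_hz * dt << 1)
    n_bins = int(round(duration_s / dt))
    return [1 if rng.random() < p_spike else 0 for _ in range(n_bins)]

def coincident_spikes(train_a, train_b):
    """Count bins in which both trains fire."""
    return sum(a & b for a, b in zip(train_a, train_b))

rng = random.Random(0)                          # fixed seed so the run is repeatable
dt, duration = 0.001, 200.0                     # 1 ms bins, 200 s of simulated activity
rate_a, rate_b = 50.0, 80.0                     # illustrative firing rates in Hz

a = rate_coded_train(rate_a, duration, dt, rng)
b = rate_coded_train(rate_b, duration, dt, rng)

observed = coincident_spikes(a, b)
# For independent trains the expected coincidence count is proportional
# to the product of the two rates: rate_a * rate_b * dt * duration.
expected = rate_a * rate_b * dt * duration
print(observed, expected)
```

Under these assumptions the coincidence count fluctuates around a value proportional to the product of the rates. Correlated spike trains would break the independence assumption, which is consistent with the abstract's remark about reducing correlations between spike trains.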


25

Walcott, Terry Hugh. "Market prediction for SMEs using unsupervised neural networks." Thesis, University of East London, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.532991.


Abstract:

The objective of this study was to create a market prediction model for small and medium enterprises (SMEs). To achieve this, an extensive literature review was carried out, focusing on SMEs, marketing and prediction; neural networks as a competitive tool for SME marketing; and a review of clustering. A Delphi study was used to collate expert opinions in order to determine likely factors hindering SMEs wanting to remain business proficient. An analysis of Delphi responses led to the creation of a market prediction questionnaire. This questionnaire was used to create variables for analysis using four unsupervised algorithms. The algorithms used in this study were joining tree, k-means, learning vector quantisation and the snap-drift algorithm. Questionnaire data were collected from 102 SMEs. This led to the determination of 23 variables that could best represent the data under examination. Further analysis of each of the 23 variables led to the choice of respondents for case study analysis. A higher education college (HEC) and a private hire company (PHC) were chosen for this stage of the research. In case study one (1), the analysis found that HECs can compete with universities if they tailor their products and services to selected academic markets as opposed to entering all academic sectors. The findings suggest that if a HEC monitors the growth of its students and establishes the likely point at which to create new courses, it will retain students and not lose them to universities. Comparisons between the case HEC and rival HECs demonstrated that a knowledge gap currently exists between these institutions and that a competitive advantage can be realised by using post-modern marketing coupled with neural networks. In case study two (2), a private hire company was investigated, allowing current markets for this firm to be interpreted by making existing operating areas more transparent.
Knowledge barriers were discovered between telephonists and drivers, and between the owner/manager and drivers. Historical data was therefore used to distinguish the performance of drivers within this firm. By differentiating job times and driver performance, the case organisation was better equipped to determine the times at which it is busiest, and hence the number of telephonists needed per shift and the likely busy periods in which the firm will operate. Analysis of all participating SMEs revealed that: (1) these firms are more likely to fail in the first two years of operation generally, (2) successful SMEs are owned or managed by persons having prior management and/or general business expertise, (3) success is normally attributed to experience gained as a result of working in or managing a threatened firm in the past, (4) successful SMEs understand the importance of valuing the ethnicity held in their respective firms and (5) these firms are less likely to understand how technology can aid and sustain market growth generally. It seems market prediction in SMEs can be affected by employee performance and managerial ability to undertake predefined tasks. The findings suggest that there are SMEs that can benefit from market prediction. More importantly, the findings indicate the need to understand the SME in order to determine the types of intelligent systems that can be used to initiate marketing and provide marketing prediction generally. Several theoretical and practical implications are discussed. To this effect, SME owner/managers, researchers in academia, government and public SME organisations can learn from the results. Suggestions for future research are also presented.


26

Vetcha, Sarat Babu. "Fault diagnosis in pumps by unsupervised neural networks." Thesis, University of Sussex, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.300604.


27

Squadrani, Lorenzo. "Deep neural networks and thermodynamics." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020.


Abstract:

Deep learning is the most effective and widely used approach to artificial intelligence, and yet it is far from being properly understood. Understanding it is the way to further improve its effectiveness and, in the best case, to gain some understanding of "natural" intelligence. We attempt a step in this direction with the aid of physics. We describe a convolutional neural network for image classification (trained on CIFAR-10) within the descriptive framework of thermodynamics. In particular, we define and study the temperature of each component of the network. Our results provide a new point of view on deep learning models, which may be a starting point towards a better understanding of artificial intelligence.


28

Mancevo, del Castillo Ayala Diego. "Compressing Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217316.


Abstract:

Deep Convolutional Neural Networks and "deep learning" in general stand at the cutting edge on a range of applications, from image based recognition and classification to natural language processing, speech and speaker recognition and reinforcement learning. Very deep models however are often large, complex and computationally expensive to train and evaluate. Deep learning models are thus seldom deployed natively in environments where computational resources are scarce or expensive. To address this problem we turn our attention towards a range of techniques that we collectively refer to as "model compression" where a lighter student model is trained to approximate the output produced by the model we wish to compress. To this end, the output from the original model is used to craft the training labels of the smaller student model. This work contains some experiments on CIFAR-10 and demonstrates how to use the aforementioned techniques to compress a people counting model whose precision, recall and F1-score are improved by as much as 14% against our baseline.
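
The idea of crafting a student's training labels from the original model's output can be sketched as follows. This is an illustrative toy rather than the thesis's pipeline: the logits are made-up numbers, and the softmax temperature is an assumption borrowed from common knowledge distillation practice that the abstract does not mention.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy of the student's prediction against the teacher's soft labels."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))

# Hypothetical logits produced by the large model for one input.
teacher_logits = [4.0, 1.0, 0.5]
soft_labels = softmax(teacher_logits, temperature=2.0)   # training targets for the student

# Two hypothetical students: one that mimics the teacher, one that does not.
good_student = softmax([3.5, 1.2, 0.4], temperature=2.0)
poor_student = softmax([0.2, 3.0, 1.0], temperature=2.0)

loss_good = cross_entropy(soft_labels, good_student)
loss_poor = cross_entropy(soft_labels, poor_student)
print(loss_good, loss_poor)
```

Minimising this loss over many inputs pushes the smaller model toward the larger model's output distribution, so the student with the lower loss is the closer approximation.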


29

Abbasi, Mahdieh. "Toward robust deep neural networks." Doctoral thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67766.


Abstract:

In this thesis, our goal is to develop learning models that are robust and reliable yet accurate, in particular Convolutional Neural Networks (CNNs), in the presence of anomalous examples such as adversarial examples and Out-of-Distribution (OOD) samples. As the first contribution, we propose to estimate calibrated confidence for adversarial examples by encouraging diversity in an ensemble of CNNs. To this end, we devise an ensemble of diverse specialists with a simple and computationally efficient voting mechanism that predicts adversarial examples with low confidence while keeping the predictive confidence of clean samples high. In the presence of disagreement in our ensemble, we prove that an upper bound of 0.5 + _0 can be established for the confidence, leading to a globally fixed detection threshold of tau = 0.5. We analytically justify the role of diversity in our ensemble in mitigating the risk of both black-box and white-box adversarial examples. Finally, we empirically evaluate the robustness of our ensemble to black-box and white-box attacks on several standard datasets. The second contribution addresses the detection of OOD samples through an end-to-end model trained on an appropriate OOD set. To this end, we address the following central question: how can the many available OOD datasets be differentiated with respect to a given in-distribution task in order to select the most appropriate one, which in turn yields a calibrated model with a high detection rate on unseen OOD sets? To answer this question, we propose to differentiate OOD sets by the level of "protection" they provide to the sub-manifolds.
To measure the protection level, we then design three new, computationally efficient metrics using a pre-trained vanilla CNN. In an extensive series of experiments on image and audio classification tasks, we empirically demonstrate the ability of an Augmented CNN (A-CNN) and an explicitly calibrated CNN to detect a significantly larger portion of OOD examples. Interestingly, we also observe that such an A-CNN can also detect black-box FGS adversarial examples with significant perturbations. As the third contribution, we study more closely the capacity of the A-CNN to detect a wider range of black-box adversaries (not only FGS ones). To increase the A-CNN's ability to detect a larger number of adversaries, we augment the OOD training set with inter-class interpolated samples. We then demonstrate that the A-CNN trained on all these data has a consistent detection rate on all types of unseen adversarial examples, whereas training an A-CNN on PGD adversaries does not lead to a stable detection rate across all types of adversaries, particularly unseen ones. We also visually assess the feature space and the decision boundaries in the input space of a vanilla CNN and its augmented counterpart in the presence of adversarial and clean samples. With a properly trained A-CNN, we aim to take a step toward a unified and reliable end-to-end learning model with low risk rates on both clean samples and unusual ones, e.g. adversarial and OOD samples. The last contribution presents an application of the A-CNN to training a robust object detector on a partially labeled dataset, in particular a merged dataset.
Merging various datasets from similar contexts but with different sets of Objects of Interest (OoI) is an inexpensive way to create a large-scale dataset that covers a wider spectrum of OoIs. Moreover, merging datasets makes it possible to build a single unified object detector instead of several separate ones, reducing computational and time costs. However, merging datasets, especially from a similar context, results in many missing-label instances. With the aim of training an integrated, robust object detector on a partially labeled but large-scale dataset, we propose a self-supervised training framework to overcome the problem of missing-label instances in merged datasets. Our framework is evaluated on a merged dataset with a high missing-label rate. The empirical results confirm the viability of our generated pseudo-labels for improving the performance of YOLO, a state-of-the-art object detector.
In this thesis, our goal is to develop robust and reliable yet accurate learning models, particularly Convolutional Neural Networks (CNNs), in the presence of adversarial examples and Out-of-Distribution (OOD) samples. As the first contribution, we propose to predict adversarial instances with high uncertainty through encouraging diversity in an ensemble of CNNs. To this end, we devise an ensemble of diverse specialists along with a simple and computationally efficient voting mechanism to predict the adversarial examples with low confidence while keeping the predictive confidence of the clean samples high. In the presence of high entropy in our ensemble, we prove that the predictive confidence can be upper-bounded, leading to a globally fixed threshold over the predictive confidence for identifying adversaries. We analytically justify the role of diversity in our ensemble in mitigating the risk of both black-box and white-box adversarial examples. Finally, we empirically assess the robustness of our ensemble to the black-box and the white-box attacks on several benchmark datasets. The second contribution aims to address the detection of OOD samples through an end-to-end model trained on an appropriate OOD set. To this end, we address the following central question: how to differentiate many available OOD sets w.r.t. a given in-distribution task to select the most appropriate one, which in turn induces a model with a high detection rate of unseen OOD sets? To answer this question, we hypothesize that the "protection" level of in-distribution sub-manifolds by each OOD set can be a good possible property to differentiate OOD sets. To measure the protection level, we then design three novel, simple, and cost-effective metrics using a pre-trained vanilla CNN.
In an extensive series of experiments on image and audio classification tasks, we empirically demonstrate the ability of an Augmented-CNN (A-CNN) and an explicitly-calibrated CNN to detect a significantly larger portion of unseen OOD samples, if they are trained on the most protective OOD set. Interestingly, we also observe that the A-CNN trained on the most protective OOD set (called A-CNN) can also detect the black-box Fast Gradient Sign (FGS) adversarial examples. As the third contribution, we investigate more closely the capacity of the A-CNN on the detection of wider types of black-box adversaries. To increase the capability of A-CNN to detect a larger number of adversaries, we augment its OOD training set with some inter-class interpolated samples. Then, we demonstrate that the A-CNN trained on the most protective OOD set along with the interpolated samples has a consistent detection rate on all types of unseen adversarial examples, whereas training an A-CNN on Projected Gradient Descent (PGD) adversaries does not lead to a stable detection rate on all types of adversaries, particularly the unseen types. We also visually assess the feature space and the decision boundaries in the input space of a vanilla CNN and its augmented counterpart in the presence of adversaries and the clean ones. By a properly trained A-CNN, we aim to take a step toward a unified and reliable end-to-end learning model with small risk rates on both clean samples and the unusual ones, e.g. adversarial and OOD samples. The last contribution is to show a use-case of A-CNN for training a robust object detector on a partially-labeled dataset, particularly a merged dataset. Merging various datasets from similar contexts but with different sets of Object of Interest (OoI) is an inexpensive way to craft a large-scale dataset which covers a larger spectrum of OoIs.
Moreover, merging datasets allows achieving a unified object detector, instead of having several separate ones, resulting in the reduction of computational and time costs. However, merging datasets, especially from a similar context, causes many missing-label instances. With the goal of training an integrated robust object detector on a partially-labeled but large-scale dataset, we propose a self-supervised training framework to overcome the issue of missing-label instances in the merged datasets. Our framework is evaluated on a merged dataset with a high missing-label rate. The empirical results confirm the viability of our generated pseudo-labels to enhance the performance of YOLO, as the current (to date) state-of-the-art object detector.
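
A toy sketch of why disagreement inside an ensemble lowers predictive confidence, and how a fixed threshold could then flag suspicious inputs. It uses plain averaging of class probabilities, which is a simplification and not the thesis's specialist voting mechanism; the probability vectors are invented, the helper name `ensemble_confidence` is hypothetical, and the threshold value follows the 0.5 quoted in the first-language version of the abstract.

```python
def ensemble_confidence(member_probs):
    """Average the members' class probabilities; return (predicted label, confidence)."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    label = max(range(n_classes), key=lambda c: mean[c])
    return label, mean[label]

THRESHOLD = 0.5  # fixed detection threshold (assumed here for illustration)

# Hypothetical outputs of three ensemble members on a clean input: they agree.
clean_input = [[0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.95, 0.03, 0.02]]
# Hypothetical outputs on a suspicious input: each member votes for a different class.
suspicious_input = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

clean_label, clean_conf = ensemble_confidence(clean_input)
_, suspicious_conf = ensemble_confidence(suspicious_input)

print(clean_label, clean_conf, clean_conf >= THRESHOLD)    # accepted
print(suspicious_conf, suspicious_conf >= THRESHOLD)       # rejected: confidence too low
```

In this simplified setting the averaged confidence stays high only when members agree, so a single global threshold separates the two cases.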


30

Caron, Mathilde. "Unsupervised Representation Learning with Clustering in Deep Convolutional Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-227926.


Abstract:

This master's thesis tackles the problem of unsupervised learning of visual representations with deep Convolutional Neural Networks (CNNs). Closing the gap between unsupervised and supervised representation learning is one of the main current challenges in image recognition. We propose a novel and simple way of training CNNs on fully unlabeled datasets. Our method jointly optimizes a grouping of the representations and trains a CNN using the groups as supervision. We evaluate the models trained with our method on standard transfer learning experiments from the literature. We find that our method outperforms all self-supervised and unsupervised state-of-the-art approaches. More importantly, our method outperforms those methods even when the unsupervised training set is not ImageNet but an arbitrary subset of images from Flickr.
This degree project addresses the problem of unsupervised learning of visual representations with deep convolutional neural networks (CNNs). This is one of the most important current challenges in computer vision for bridging the gap between unsupervised and supervised representation learning. We propose a new and simple way of training CNNs on fully unlabeled datasets. Our method consists of jointly optimizing a grouping of the representations and training a CNN using the groups as supervision. We evaluate the models trained with our method on standard transfer learning experiments from the literature. We find that our method outperforms all self-supervised and unsupervised state-of-the-art approaches, however sophisticated they may be. More importantly, our method outperforms those methods even when the unsupervised training set is not ImageNet but an arbitrary subset of images from Flickr.
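
The step of grouping representations and using the groups as supervision can be sketched with a small k-means routine. This covers only the pseudo-labelling step on toy 2-D points standing in for CNN features; the farthest-point initialisation, the synthetic data and all names are assumptions for illustration, and the actual method would retrain the network on these labels and repeat.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

# Toy "features": two well-separated groups standing in for network representations.
rng = random.Random(0)
blob_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(50)]
blob_b = [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(50)]
features = blob_a + blob_b

# Cluster ids serve as pseudo-labels; a classifier would then be trained on them.
pseudo_labels = kmeans(features, k=2)
print(pseudo_labels[:3], pseudo_labels[-3:])
```

No ground-truth labels are used anywhere: the only supervision signal is the cluster assignment, which is what makes the scheme applicable to fully unlabeled datasets.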


31

Manenti, Céline. "Découverte d'unités linguistiques à l'aide de méthodes d'apprentissage non supervisé." Thesis, Toulouse 3, 2019. http://www.theses.fr/2019TOU30074.


Abstract:

The discovery of elementary linguistic units (phonemes, words) from audio recordings alone is an unsolved problem that attracts strong interest from the automatic speech processing community, as shown by the many recent contributions to the state of the art. In this thesis, we focused on using neural networks to address the problem. We approached it using neural networks in supervised, weakly supervised and multilingual settings. We thus developed tools for automatic phoneme segmentation and phonetic classification based on convolutional neural networks. The automatic segmentation tool achieved an F-measure of 79% on the BUCKEYE corpus of conversational English speech. This result is comparable to a human annotator, according to the inter-annotator agreement reported by the corpus creators. Moreover, it does not need much data to perform well (about ten minutes per speaker and 5 different speakers), and it is portable to other languages (notably low-resource languages such as Xitsonga). The phonetic classification system makes it possible to set the various parameters and hyperparameters useful for an unsupervised scenario. In the unsupervised setting, neural networks (autoencoders) allowed us to generate new parametric representations that concentrate the information of the input frame and its neighbouring frames. We studied their usefulness for audio compression from the raw signal, for which they proved effective (low RMS, even at 99% compression). We also carried out a novel preliminary study on a different use of neural networks: generating parameter vectors not from the layer outputs but from the values of the layer weights.
These parameters are intended to mimic Linear Predictive Coefficients (LPC). In the context of the unsupervised discovery of phoneme-like units (called pseudo-phones in this thesis) and the generation of new, phonetically discriminative parametric representations, we coupled a neural network with a clustering tool (k-means). Iteratively alternating between these two tools made it possible to generate phonetically discriminative parameters for a given speaker: low within-speaker ABX error rates of 7.3% for English, 8.5% for French and 8.4% for Mandarin were obtained. These results represent an absolute gain of about 4% over the baseline (conventional MFCC parameters) and are close to the best current approaches (1% more than the winner of the Zero Resource Speech Challenge 2017). Across-speaker results range from 12% to 15% depending on the language, compared with 21% to 25% for MFCCs.
The discovery of elementary linguistic units (phonemes, words) from sound recordings alone is an unresolved problem that arouses strong interest in the automatic speech processing community, as evidenced by the many recent contributions to the state of the art. During this thesis, we focused on using neural networks to address the problem. We approached the problem using neural networks in a supervised, weakly supervised and multilingual manner. We have developed automatic phoneme segmentation and phonetic classification tools based on convolutional neural networks. The automatic segmentation tool obtained a 79% F-measure on the BUCKEYE conversational speech corpus. This result is similar to that of a human annotator according to the inter-annotator agreement provided by the creators of the corpus. In addition, it does not need a lot of data (about ten minutes per speaker and 5 different speakers) to be effective, and it is portable to other languages (especially low-resource languages such as Xitsonga). The phonetic classification system makes it possible to set the various parameters and hyperparameters that are useful for an unsupervised scenario. In the unsupervised context, the neural networks (autoencoders) allowed us to generate new parametric representations, concentrating the information of the input frame and its neighboring frames. We studied their utility for audio compression from the raw signal, for which they were effective (low RMS, even at 99% compression). We also carried out an innovative pre-study on a different use of neural networks, to generate vectors of parameters not from the outputs of the layers but from the values of the weights of the layers. These parameters are designed to mimic Linear Predictive Coefficients (LPC).
In the context of the unsupervised discovery of phoneme-like units (called pseudo-phones in this thesis) and the generation of new phonetically discriminative parametric representations, we have coupled a neural network with a clustering tool (k-means). The iterative alternation of these two tools allowed the generation of phonetically discriminative parameters for the same speaker: low intra-speaker ABX error rates of 7.3% for English, 8.5% for French and 8.4% for Mandarin were obtained. These results give an absolute gain of about 4% compared to the baseline (conventional MFCC parameters) and are close to the best current approaches (1% more than the winner of the Zero Resource Speech Challenge 2017). The inter-speaker results vary between 12% and 15% depending on the language, compared to 21% to 25% for MFCCs.


32

Bishop, Griffin R. "Unsupervised Semantic Segmentation through Cross-Instance Representation Similarity." Digital WPI, 2020. https://digitalcommons.wpi.edu/etd-theses/1371.


Abstract:

Semantic segmentation methods using deep neural networks typically require huge volumes of annotated data to train properly. Due to the expense of collecting these pixel-level dataset annotations, the problem of semantic segmentation without ground-truth labels has been recently proposed. Many current approaches to unsupervised semantic segmentation frame the problem as a pixel clustering task, and in particular focus heavily on color differences between image regions. In this paper, we explore a weakness to this approach: By focusing on color, these approaches do not adequately capture relationships between similar objects across images. We present a new approach to the problem, and propose a novel architecture that captures the characteristic similarities of objects between images directly. We design a synthetic dataset to illustrate this flaw in an existing model. Experiments on this synthetic dataset show that our method can succeed where the pixel color clustering approach fails. Further, we show that plain autoencoder models can implicitly capture these cross-instance object relationships. This suggests that some generative model architectures may be viable candidates for unsupervised semantic segmentation even with no additional loss terms.


33

Längkvist, Martin. "Modeling time-series with deep networks." Doctoral thesis, Örebro universitet, Institutionen för naturvetenskap och teknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-39415.


34

Lu, Yifei. "Deep neural networks and fraud detection." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-331833.


35

Kalogiras, Vasileios. "Sentiment Classification with Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217858.


Abstract:

Sentiment analysis is a subfield of natural language processing (NLP) that attempts to analyze the sentiment of written text. This is a complex problem that entails many challenges. For this reason, it has been studied extensively. In recent years, traditional machine learning algorithms or handcrafted methodologies have been used and have given excellent results. However, the recent deep learning renaissance has shifted interest toward end-to-end deep learning models. On the one hand, this results in more powerful models, but on the other hand, clear mathematical reasoning or intuition for these models is lacking. For this reason, this thesis attempts to shed light on recently proposed deep learning architectures for sentiment classification. A study of their differences is carried out and provides empirical results on how changes in the structure or capacity of the model can affect its accuracy and the way it represents and "understands" sentences.
Sentiment analysis is a subfield of natural language processing (NLP) that attempts to analyze the sentiment of written text.It is is a complex problem that entails different challenges. For this reason, it has been studied extensively. In the past years traditional machine learning algorithms or handcrafted methodologies used to provide state of the art results. However, the recent deep learning renaissance shifted interest towards end to end deep learning models. On the one hand this resulted into more powerful models but on the other hand clear mathematical reasoning or intuition behind distinct models is still lacking. As a result, in this thesis, an attempt to shed some light on recently proposed deep learning architectures for sentiment classification is made.A study of their differences is performed as well as provide empirical results on how changes in the structure or capacity of a model can affect its accuracy and the way it represents and ''comprehends'' sentences.


36

Choi, Keunwoo. "Deep neural networks for music tagging." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/46029.


Abstract:

In this thesis, I present my hypothesis, experiment results, and discussion that are related to various aspects of deep neural networks for music tagging. Music tagging is the task of automatically predicting suitable semantic labels when music is provided. Generally speaking, the input of music tagging systems can be any entity that constitutes music, e.g., audio content, lyrics, or metadata, but only the audio content is considered in this thesis. My hypothesis is that we can find effective deep learning practices for the task of music tagging that improve the classification performance. As a computational model to realise a music tagging system, I use deep neural networks. Combined with the research problem, the scope of this thesis is the understanding, interpretation, optimisation, and application of deep neural networks in the context of music tagging systems. The ultimate goal of this thesis is to provide insight that can help to improve deep learning-based music tagging systems. There are many smaller goals in this regard. Since using deep neural networks is a data-driven approach, it is crucial to understand the dataset. Selecting and designing a better architecture is the next topic to discuss. Since the tagging is done with audio input, preprocessing the audio signal becomes one of the important research topics. After building (or training) a music tagging system, finding a suitable way to re-use it for other music information retrieval tasks is a compelling topic, in addition to interpreting the trained system. The evidence presented in the thesis supports that deep neural networks are powerful and credible methods for building a music tagging system.


37

Yin, Yonghua. "Random neural networks for deep learning." Thesis, Imperial College London, 2018. http://hdl.handle.net/10044/1/64917.


Abstract:

The random neural network (RNN) is a mathematical model for an 'integrate and fire' spiking network that closely resembles the stochastic behaviour of neurons in mammalian brains. Since its proposal in 1989, there have been numerous investigations into the RNN's applications and learning algorithms. Deep learning (DL) has achieved great success in machine learning, but there has been no research into the properties of the RNN for DL to combine their power. This thesis intends to bridge the gap between RNNs and DL, in order to provide powerful DL tools that are faster, and that can potentially be used with less energy expenditure than existing methods. Based on the RNN function approximator proposed by Gelenbe in 1999, the approximation capability of the RNN is investigated and an efficient classifier is developed. By combining the RNN, DL and non-negative matrix factorisation, new shallow and multi-layer non-negative autoencoders are developed. The autoencoders are tested on typical image datasets and real-world datasets from different domains, and the test results yield the desired high learning accuracy. The concept of dense nuclei/clusters is examined, using RNN theory as a basis. In dense nuclei, neurons may interconnect via soma-to-soma interactions and conventional synaptic connections. A mathematical model of the dense nuclei is proposed and the transfer function can be deduced. A multi-layer architecture of the dense nuclei is constructed for DL, whose value is demonstrated by experiments on multi-channel datasets and server-state classification in cloud servers. A theoretical study into the multi-layer architecture of the standard RNN (MLRNN) for DL is presented. Based on the layer-output analyses, the MLRNN is shown to be a universal function approximator. The effects of the layer number on the learning capability and high-level representation extraction are analysed. 
A hypothesis for transforming the DL problem into a moment-learning problem is also presented. The power of the standard RNN for DL is investigated. The ability of the RNN with only positive parameters to conduct image convolution operations is demonstrated. The MLRNN equipped with the developed training algorithm achieves comparable or better classification at a lower computation cost than conventional DL methods.


38

Zagoruyko, Sergey. "Weight parameterizations in deep neural networks." Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1129/document.


Abstract:

Multilayer neural networks were first proposed more than three decades ago, and various architectures and parameterizations have been explored since. Recently, graphics processing units enabled very efficient neural network training, and allowed training much larger networks on larger datasets, dramatically improving performance on various supervised learning tasks. However, generalization is still far from human level, and it is difficult to understand what the decisions made are based on. To improve generalization and understanding, we revisit the problems of weight parameterization in deep neural networks. We identify what are, to our mind, the most important problems in modern architectures: network depth, parameter efficiency, and learning multiple tasks at the same time, and try to address them in this thesis. We start with one of the core problems of computer vision, patch matching, and propose to use convolutional neural networks of various architectures to solve it, instead of hand-crafted descriptors. Then, we address the task of object detection, where a network should simultaneously learn to predict both the class of the object and its location. In both tasks we find that the number of parameters in the network is the major factor determining its performance, and explore this phenomenon in residual networks. Our findings show that their original motivation, training deeper networks for better representations, does not fully hold, and wider networks with fewer layers can be as effective as deeper ones with the same number of parameters. Overall, we present an extensive study of architectures and weight parameterizations, and of ways of transferring knowledge between them.


39

Ioannou, Yani Andrew. "Structural priors in deep neural networks." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/278976.


Abstract:

Deep learning has in recent years come to dominate the previously separate fields of research in machine learning, computer vision, natural language understanding and speech recognition. Despite breakthroughs in training deep networks, there remains a lack of understanding of both the optimization and structure of deep networks. The approach advocated by many researchers in the field has been to train monolithic networks with excess complexity and strong regularization, an approach that leaves much to be desired in efficiency. Instead we propose that carefully designing networks in consideration of our prior knowledge of the task and learned representation, which we propose to denote as structural priors, can improve the memory and compute efficiency of state-of-the-art networks, and even improve generalization. We present two such novel structural priors for convolutional neural networks, and evaluate them in state-of-the-art image classification CNN architectures. The first of these methods proposes to exploit our knowledge of the low-rank nature of most filters learned for natural images by structuring a deep network to learn a collection of mostly small, low-rank filters. The second addresses the filter/channel extents of convolutional filters, by learning filters with limited channel extents. The size of these channel-wise basis filters increases with the depth of the model, giving a novel sparse connection structure that resembles a tree root. Both methods are found to improve the generalization of these architectures while also decreasing the size and increasing the efficiency of their training and test-time computation. Finally, we present work towards conditional computation in deep neural networks, moving towards a method of automatically learning structural priors in deep networks. 
We propose a new discriminative learning model, conditional networks, that jointly exploits the accurate representation learning capabilities of deep neural networks and the efficient conditional computation of decision trees. Conditional networks yield smaller models, and offer test-time flexibility in the trade-off of computation vs. accuracy.


40

Billman, Linnar, and Johan Hullberg. "Speech Reading with Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-360022.


Abstract:

Recent growth in computational power and available data has increased the popularity and progress of machine learning techniques. Methods of machine learning are used for automatic speech recognition in order to allow humans to transfer information to computers simply by speech. In the present work, we are interested in doing this for general contexts, e.g. speakers talking on TV or newsreaders recorded in a studio. Automatic speech recognition systems are often solely based on acoustic data. By introducing visual data such as lip movements, the robustness of such systems can be increased. This thesis instead investigates how well machine learning techniques can learn the art of lip reading as a sole source for automatic speech recognition. The key idea is to use a sequence of 24 lip coordinates to feed to the system, rather than learning directly from the raw video frames. This thesis designs a solution around this principle empowered by state-of-the-art machine learning techniques such as recurrent neural networks, making use of GPUs. We find that this design reduces computational requirements by more than a factor of 25 compared to a state-of-the-art machine learning solution called LipNet. This however also scales down performance to an accuracy of 80% of what LipNet achieves, while still outperforming human recognition by a factor of 150%. The accuracies are based on processing of yet unseen speakers. This text presents this architecture. It details its design, reports its results, and compares its performance to an existing solution. Based on this, it is indicated how the result can be further refined.


41

Wang, Shenhao. "Deep neural networks for choice analysis." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/129894.


Abstract:

Thesis: Ph. D. in Computer and Urban Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning, September, 2020
Cataloged from student-submitted PDF of thesis.
Includes bibliographical references (pages 117-128).
As deep neural networks (DNNs) outperform classical discrete choice models (DCMs) in many empirical studies, one pressing question is how to reconcile them in the context of choice analysis. So far researchers mainly compare their prediction accuracy, treating them as completely different modeling methods. However, DNNs and classical choice models are closely related and even complementary. This dissertation seeks to lay out a new foundation of using DNNs for choice analysis. It consists of three essays, which respectively tackle the issues of economic interpretation, architectural design, and robustness of DNNs by using classical utility theories. Essay 1 demonstrates that DNNs can provide economic information as complete as the classical DCMs.
The economic information includes choice predictions, choice probabilities, market shares, substitution patterns of alternatives, social welfare, probability derivatives, elasticities, marginal rates of substitution (MRS), and heterogeneous values of time (VOT). Unlike DCMs, DNNs can automatically learn the utility function and reveal behavioral patterns that are not prespecified by modelers. However, the economic information from DNNs can be unreliable because the automatic learning capacity is associated with three challenges: high sensitivity to hyperparameters, model non-identification, and local irregularity. To demonstrate the strength of DNNs as well as the three issues, I conduct an empirical experiment by applying the DNNs to a stated preference survey and discuss successively the full list of economic information extracted from the DNNs. Essay 2 designs a particular DNN architecture with alternative-specific utility functions (ASU-DNN) by using prior behavioral knowledge.
Theoretically, ASU-DNN reduces the estimation error of fully connected DNN (F-DNN) because of its lighter architecture and sparser connectivity, although the constraint of alternative-specific utility could cause ASU-DNN to exhibit a larger approximation error. Both ASU-DNN and F-DNN can be treated as special cases of DNN architecture design guided by utility connectivity graph (UCG). Empirically, ASU-DNN has 2-3% higher prediction accuracy than F-DNN. The alternative-specific connectivity constraint, as a domain-knowledge-based regularization method, is more effective than other regularization methods. This essay demonstrates that prior behavioral knowledge can be used to guide the architecture design of DNN, to function as an effective domain-knowledge-based regularization method, and to improve both the interpretability and predictive power of DNNs in choice analysis.
Essay 3 designs a theory-based residual neural network (TB-ResNet) with a two-stage training procedure, which synthesizes decision-making theories and DNNs in a linear manner. Three instances of TB-ResNets based on choice modeling (CM-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting (HD-ResNets) are designed. Empirically, compared to the decision-making theories, the three instances of TB-ResNets predict significantly better in the out-of-sample test and become more interpretable owing to the rich utility function augmented by DNNs. Compared to the DNNs, the TB-ResNets predict better because the decision-making theories aid in localizing and regularizing the DNN models. TB-ResNets also become more robust than DNNs because the decision-making theories stabilize the local utility function and the input gradients.
This essay demonstrates that it is both feasible and desirable to combine the handcrafted utility theory and automatic utility specification, with joint improvement in prediction, interpretation, and robustness.
by Shenhao Wang.
Ph.D. in Computer and Urban Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning


42

Sunnegårdh, Christina. "Scar detection using deep neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299576.


Abstract:

Object detection is a computer vision method that deals with the tasks of localizing and classifying objects within an image. The number of uses for the method is constantly growing, and this thesis investigates the unexplored area of using deep neural networks for scar detection. Furthermore, the thesis investigates using the scar detector as a basis for the binary classification task of deciding whether in-the-wild images contain a scar or not. Two pre-trained object detection models, Faster R-CNN and RetinaNet, were trained on 1830 manually labeled images using different hyperparameters. Faster R-CNN Inception ResNet V2 achieved the highest results in terms of Average Precision (AP), particularly at higher IoU thresholds, closely followed by Faster R-CNN ResNet50, and finally RetinaNet. The results indicate both the superiority of Faster R-CNN over RetinaNet and the benefit of using Inception ResNet V2 as the feature extractor for a large variety of object sizes. This is most likely due to multiple convolutional filters of different sizes operating at the same levels in the Inception ResNet network. As for inference time, RetinaNet was the fastest, followed by Faster R-CNN ResNet50 and finally Faster R-CNN Inception ResNet V2. For the binary classification task, the models were tested on a set of 200 images, where half of the images contained clearly visible scars. Faster R-CNN ResNet50 achieved the highest accuracy, followed by Faster R-CNN Inception ResNet V2 and finally RetinaNet. While the accuracy of RetinaNet suffered mainly from a low recall, Faster R-CNN Inception ResNet V2 detected some actual scars in images that had not been labeled due to low image quality, which could be a matter of subjective labeling, with the model penalized for something that at other times might be considered correct. In conclusion, this thesis shows promising results of using object detection to detect scars in images. 
While two-stage Faster R-CNN holds the advantage in AP for scar detection, one-stage RetinaNet holds the advantage in speed. Suggestions for future work include eliminating biases by putting more effort into labeling data as well as including training data that contain objects for which the models produced false positives. Examples of this are wounds, knuckles, and possible background objects that are visually similar to scars.
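
The abstract above reports Average Precision at different IoU thresholds without defining IoU. As a rough illustration only, intersection over union for two axis-aligned boxes can be computed as in the sketch below. The function name `iou` and the `(x1, y1, x2, y2)` box convention are assumptions made for this example and are not taken from the thesis.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (a convention assumed for this sketch).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a labeled box meets the chosen threshold, so a higher threshold demands tighter localization.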


43

Landeen, Trevor J. "Association Learning Via Deep Neural Networks." DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7028.


Abstract:

Deep learning has been making headlines in recent years and is often portrayed as an emerging technology on a meteoric rise towards fully sentient artificial intelligence. In reality, deep learning is the most recent renaissance of a 70-year-old technology and is far from possessing true intelligence. The renewed interest is motivated by recent successes in challenging problems, the accessibility made possible by hardware developments, and dataset availability. The predecessor to deep learning, commonly known as the artificial neural network, is a computational network set up to mimic the biological neural structure found in brains. However, unlike human brains, artificial neural networks in most cases cannot make inferences from one problem to another. As a result, developing an artificial neural network requires a large number of examples of desired behavior for a specific problem. Furthermore, developing an artificial neural network capable of solving the problem can take days, or even weeks, of computations. Two specific problems addressed in this dissertation are both input association problems. One problem challenges a neural network to identify overlapping regions in images and is used to evaluate the ability of a neural network to learn associations between inputs of similar types. The other problem asks a neural network to identify which observed wireless signals originated from observed potential sources and is used to assess the ability of a neural network to learn associations between inputs of different types. The neural network solutions to both problems introduced, discussed, and evaluated in this dissertation demonstrate deep learning's applicability to problems which have previously attracted little attention.


44

Srivastava, Sanjana. "On foveation of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123134.


Abstract:

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 61-63).
The human ability to recognize objects is impaired when the object is not shown in full. "Minimal images" are the smallest regions of an image that remain recognizable for humans. [26] show that a slight modification of the location and size of the visible region of the minimal image produces a sharp drop in human recognition accuracy. In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a common phenomenon between humans and existing state-of-the-art convolutional neural networks (CNNs), and are much more prominent in CNNs. We found many cases where CNNs classified one region correctly and the other incorrectly, though they only differed by one row or column of pixels, and were often bigger than the average human minimal image size. We show that this phenomenon is independent from previous works that have reported lack of invariance to minor modifications in object location in CNNs. Our results thus reveal a new failure mode of CNNs that also affects humans to a lesser degree. They expose how fragile CNN recognition ability is for natural images even without synthetic adversarial patterns being introduced. This opens potential for CNN robustness in natural images to be brought to the human level by taking inspiration from human robustness methods. One of these is eccentricity dependence, a model of human focus in which attention to the visual input degrades proportional to distance from the focal point [7]. We demonstrate that applying the "inverted pyramid" eccentricity method, a multi-scale input transformation, makes CNNs more robust to useless background features than a standard raw-image input. Our results also find that using the inverted pyramid method generally reduces useless background pixels, therefore reducing required training data.
by Sanjana Srivastava.
M.Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science


45

Grechka, Asya. "Image editing with deep neural networks." Electronic Thesis or Diss., Sorbonne université, 2023. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2023SORUS683.pdf.


Abstract:

L'édition d'images a une histoire riche remontant à plus de deux siècles. Cependant, l'édition "classique" des images requiert une grande maîtrise artistique et nécessitent un temps considérable, souvent plusieurs heures, pour modifier chaque image. Ces dernières années, d'importants progrès dans la modélisation générative ont permis la synthèse d'images réalistes et de haute qualité. Toutefois, l'édition d'une image réelle est un vrai défi nécessitant de synthétiser de nouvelles caractéristiques tout en préservant fidèlement une partie de l'image d'origine. Dans cette thèse, nous explorons différentes approches pour l'édition d'images en exploitant trois familles de modèles génératifs : les GANs, les auto-encodeurs variationnels et les modèles de diffusion. Tout d'abord, nous étudions l'utilisation d'un GAN pré-entraîné pour éditer une image réelle. Bien que des méthodes d'édition d'images générées par des GANs soient bien connues, elles ne se généralisent pas facilement aux images réelles. Nous analysons les raisons de cette limitation et proposons une solution pour mieux projeter une image réelle dans un GAN afin de la rendre éditable. Ensuite, nous utilisons des autoencodeurs variationnels avec quantification vectorielle pour obtenir directement une représentation compacte de l'image (ce qui faisait défaut avec les GANs) et optimiser le vecteur latent de manière à se rapprocher d'un texte souhaité. Nous cherchons à contraindre ce problème, qui pourrait être vulnérable à des exemples adversariaux. Nous proposons une méthode pour choisir les hyperparamètres en fonction de la fidélité et de l'édition des images modifiées. Nous présentons un protocole d'évaluation robuste et démontrons l'intérêt de notre approche. Enfin, nous abordons l'édition d'images sous l'angle particulier de l'inpainting. Notre objectif est de synthétiser une partie de l'image tout en préservant le reste intact. 
Pour cela, nous exploitons des modèles de diffusion pré-entraînés et nous appuyons sur la méthode classique d'inpainting en remplaçant, à chaque étape du processus de débruitage, la partie que nous ne souhaitons pas modifier par l'image réelle bruitée. Cependant, cette méthode peut entraîner une désynchronisation entre la partie générée et la partie réelle. Nous proposons une approche basée sur le calcul du gradient d'une fonction qui évalue l'harmonisation entre les deux parties. Nous guidons ainsi le processus de débruitage en utilisant ce gradient
Image editing has a rich history that dates back two centuries. That said, "classic" image editing requires strong artistic skills as well as considerable time, often on the scale of hours, to modify an image. In recent years, considerable progress has been made in generative modeling, which has enabled realistic, high-quality image synthesis. However, editing real images remains a challenge that requires balancing novel generation with faithful preservation of parts of the original image. In this thesis, we explore different approaches to image editing, leveraging three families of generative networks: GANs, VAEs and diffusion models. First, we study how to use a GAN to edit a real image. While methods exist to modify generated images, they do not generalize easily to real images. We analyze the reasons for this and propose a solution to better project a real image into the GAN's latent space so as to make it editable. Then, we use variational autoencoders with vector quantization to directly obtain a compact image representation (which we could not obtain with GANs) and optimize the latent vector so as to match a desired text input. We aim to constrain this problem, which on its face could be vulnerable to adversarial attacks. We propose a method to choose the hyperparameters while simultaneously optimizing the image quality and the fidelity to the original image. We present a robust evaluation protocol and show the value of our method. Finally, we approach the problem of image editing from the perspective of inpainting. Our goal is to synthesize a part of an image while leaving the rest unmodified. For this, we leverage pre-trained diffusion models and build on the classic inpainting method, which replaces, at each denoising step, the part that we do not wish to modify with the noisy real image. However, this method can lead to a disharmony between the real and generated parts. We propose an approach based on computing the gradient of a loss that evaluates the harmonization of the two parts, and we guide the denoising process with this gradient.
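
The replacement step mentioned in the abstract (overwriting the region to be preserved with a noised copy of the real image at each denoising step) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `noise_real_image` and `composite_step`, the flat-list image representation, and the use of the standard DDPM forward-noising formula with a single `alpha_bar` value are all assumptions made for illustration, and no actual diffusion model or gradient guidance is involved.

```python
import math
import random


def noise_real_image(real, alpha_bar, rng):
    """Noise the real pixels to the current diffusion noise level.

    Assumes the standard DDPM forward process:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1).
    """
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * x + b * rng.gauss(0.0, 1.0) for x in real]


def composite_step(generated, real, keep_mask, alpha_bar, rng):
    """Hypothetical helper: one replacement step of mask-based inpainting.

    Pixels where keep_mask is True are overwritten with the noised real
    image; the other pixels keep the value produced by the denoiser.
    """
    noisy_real = noise_real_image(real, alpha_bar, rng)
    return [n if keep else g
            for g, n, keep in zip(generated, noisy_real, keep_mask)]


if __name__ == "__main__":
    rng = random.Random(0)
    generated = [9.0, 9.0, 9.0, 9.0]       # stand-in for the denoiser output
    real = [1.0, 2.0, 3.0, 4.0]            # stand-in for the real image
    keep = [True, True, False, False]      # preserve the first two pixels
    print(composite_step(generated, real, keep, alpha_bar=0.5, rng=rng))
```

Because the preserved region is re-noised independently of the generated region at every step, nothing in this step alone forces the two regions to agree, which is the mismatch the abstract proposes to correct with gradient guidance.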

46

Plumbley, Mark David. "An information-theoretic approach to unsupervised connectionist models." Thesis, University of Cambridge, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.387051.

47

Al, Chami Zahi. "Estimation de la qualité des données multimedia en temps réel." Thesis, Pau, 2021. http://www.theses.fr/2021PAUU3066.

Abstract:

Over the past decade, data providers have generated and streamed large amounts of data, including images, videos and audio. In this thesis, we focus on image processing, since images are the type of data most commonly shared between users on the Internet. In particular, the processing of images containing faces has received great attention due to its numerous applications, such as entertainment and social media apps. However, several challenges can arise during the processing and transmission phases: first, the enormous number of images shared and produced at a rapid pace requires a significant amount of time to process and deliver; second, images are subject to a wide range of distortions during processing, transmission, or a combination of many factors that can damage their content. Two main contributions are developed. First, we introduce a real-time full-reference image quality assessment framework capable of: 1) preserving image content by ensuring that some useful visual information can still be extracted from the output, and 2) processing images in real time in order to cope with the huge number of images received at a rapid pace. This framework is limited to images whose reference version is available (known as full-reference). Second, we present a real-time no-reference image quality assessment framework. It has the following abilities: a) assessing the distorted image without access to its distortion-free original, b) preserving the most useful visual information in the images before publishing, and c) processing the images in real time, even though no-reference image quality assessment models are considered very complex. Our framework offers several advantages over existing approaches, in particular: i. it locates the distortion in an image in order to directly assess the distorted parts instead of processing the whole image, ii. it offers an acceptable trade-off between quality prediction accuracy and execution latency, and iii. it could be used in several applications, especially those that work in real time. The architecture of each framework is presented in the chapters, detailing its modules and components. A number of simulations then show the effectiveness of our approaches in addressing these challenges compared with existing approaches.

48

Haddad, Josef, and Carl Piehl. "Unsupervised anomaly detection in time series with recurrent neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259655.

Abstract:

Artificial neural networks (ANNs) have been successfully applied to a wide range of problems. Most ANN-based models do not attempt to model the brain in detail, but some do. An example of a biologically constrained ANN is Hierarchical Temporal Memory (HTM). This study applies HTM and Long Short-Term Memory (LSTM) to anomaly detection problems in time series in order to compare their performance on this task. The anomalies are restricted to point anomalies and the time series are univariate. Pre-existing implementations that use these networks for unsupervised anomaly detection in time series are used in this study. We primarily use our own synthetic data sets in order to assess the networks' robustness to noise and how they compare to each other with respect to different characteristics of the time series. Our results show that both networks can handle noisy time series, and the difference in performance regarding noise robustness is not significant for the time series used in the study. LSTM outperforms HTM in detecting point anomalies on our synthetic time series with a sine curve trend, but which of the two networks performs best overall remains inconclusive.

49

Galtier, Mathieu. "A mathematical approach to unsupervised learning in recurrent neural networks." Paris, ENMP, 2011. https://pastel.hal.science/pastel-00667368.

Abstract:

In this thesis, we propose to give a mathematical sense to the claim that the neocortex builds itself a model of its environment. We study the neocortex as a network of spiking neurons undergoing slow STDP learning. By considering that the number of neurons is close to infinity, we propose a new mean-field method to find the "smoother" equation describing the firing rate of populations of these neurons. Then, we study the dynamics of this averaged system with learning. By assuming that the modification of the synapses' strength is very slow compared to the activity of the network, it is possible to use tools from temporal averaging theory. These tools show that the connectivity of the network always converges towards a single equilibrium point, which can be computed explicitly. This connectivity gathers the knowledge of the network about the world. Finally, we analyze the equilibrium connectivity and compare it to the inputs. By seeing the inputs as the solution of a dynamical system, we are able to show that the connectivity embeds all the information about this dynamical system. Indeed, we show that the symmetric part of the connectivity leads to finding the manifold over which the input dynamical system is defined, and that the anti-symmetric part of the connectivity corresponds to the vector field of the input dynamical system. In this context, the network acts as a predictor of future events in its environment.
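
The split of the connectivity into a symmetric and an anti-symmetric part, which the abstract relies on, is the standard matrix decomposition W = (W + Wᵀ)/2 + (W − Wᵀ)/2. The sketch below only illustrates that decomposition on a small example; the function name `split_connectivity` and the nested-list matrix representation are assumptions for illustration and do not come from the thesis, and the code does not reproduce the thesis's learning dynamics.

```python
def transpose(matrix):
    """Return the transpose of a square matrix given as nested lists."""
    return [list(row) for row in zip(*matrix)]


def split_connectivity(w):
    """Hypothetical helper: split W into symmetric and anti-symmetric parts.

    sym  = (W + W^T) / 2   satisfies sym[i][j] ==  sym[j][i]
    anti = (W - W^T) / 2   satisfies anti[i][j] == -anti[j][i]
    and sym + anti reconstructs W.
    """
    wt = transpose(w)
    n = len(w)
    sym = [[0.5 * (w[i][j] + wt[i][j]) for j in range(n)] for i in range(n)]
    anti = [[0.5 * (w[i][j] - wt[i][j]) for j in range(n)] for i in range(n)]
    return sym, anti


if __name__ == "__main__":
    w = [[1.0, 4.0],
         [2.0, 3.0]]
    sym, anti = split_connectivity(w)
    print(sym)   # symmetric part
    print(anti)  # anti-symmetric part
```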

50

Chen, Zhe. "Augmented Context Modelling Neural Networks." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20654.

Abstract:

Contexts provide beneficial information for machine-based image understanding tasks. However, existing context modelling methods still cannot fully exploit contexts, especially for object recognition and detection. In this thesis, we develop augmented context modelling neural networks to better utilize contexts for different object recognition and detection tasks. Our contributions are two-fold: 1) we introduce neural networks to better model instance-level visual relationships; 2) we introduce neural network-based algorithms to better utilize contexts from 3D information and synthesized data. In particular, to augment the modelling of instance-level visual relationships, we propose a context refinement network and an encapsulated context modelling network for object detection. In the context refinement study, we propose to improve the modelling of visual relationships by introducing overlap scores and confidence scores of different regions. In the encapsulated context modelling study, we boost the context modelling performance by exploiting more powerful capsule-based neural networks. To augment the modelling of contexts from different sources, we propose novel neural networks to better utilize 3D information and synthesis-based contexts. For the modelling of 3D information, we mainly investigate LiDAR data for road detection and depth data for instance segmentation. In road detection, we develop a progressive LiDAR adaptation algorithm to improve the fusion of 3D LiDAR data and 2D image data. Regarding instance segmentation, we model depth data as context to help tackle the low-resolution annotation-based training problem. Moreover, to improve the modelling of synthesis-based contexts, we devise a shape translation-based pedestrian generation framework to help improve pedestrian detection performance.
