To see the other types of publications on this topic, follow the link: Deep Photonic Neural Networks.

Dissertations / Theses on the topic 'Deep Photonic Neural Networks'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the top 50 dissertations / theses for your research on the topic 'Deep Photonic Neural Networks.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Liu, Qian. "Deep spiking neural networks." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/deep-spiking-neural-networks(336e6a37-2a0b-41ff-9ffb-cca897220d6c).html.

Full text
Abstract:
Neuromorphic Engineering (NE) has led to the development of biologically-inspired computer architectures whose long-term goal is to approach the performance of the human brain in terms of energy efficiency and cognitive capabilities. Although there are a number of neuromorphic platforms available for large-scale Spiking Neural Network (SNN) simulations, the problem of programming these brain-like machines to be competent in cognitive applications still remains unsolved. On the other hand, Deep Learning has emerged in Artificial Neural Network (ANN) research to dominate state-of-the-art solutions for cognitive tasks. Thus the main research problem emerges of understanding how to operate and train biologically-plausible SNNs to close the gap in cognitive capabilities between SNNs and ANNs. SNNs can be trained by first training an equivalent ANN and then transferring the tuned weights to the SNN. This method is called ‘off-line’ training, since it does not take place on an SNN directly, but rather on an ANN instead. However, previous work on such off-line training methods has struggled in terms of poor modelling accuracy of the spiking neurons and high computational complexity. In this thesis we propose a simple and novel activation function, Noisy Softplus (NSP), to closely model the response firing activity of biologically-plausible spiking neurons, and introduce a generalised off-line training method using the Parametric Activation Function (PAF) to map the abstract numerical values of the ANN to concrete physical units, such as current and firing rate in the SNN. Based on this generalised training method and its fine tuning, we achieve the state-of-the-art accuracy on the MNIST classification task using spiking neurons, 99.07%, on a deep spiking convolutional neural network (ConvNet). We then take a step forward to ‘on-line’ training methods, where Deep Learning modules are trained purely on SNNs in an event-driven manner. Existing work has failed to provide SNNs with recognition accuracy equivalent to ANNs due to the lack of mathematical analysis. Thus we propose a formalised Spike-based Rate Multiplication (SRM) method which transforms the product of firing rates to the number of coincident spikes of a pair of rate-coded spike trains. Moreover, these coincident spikes can be captured by the Spike-Time-Dependent Plasticity (STDP) rule to update the weights between the neurons in an on-line, event-based, and biologically-plausible manner. Furthermore, we put forward solutions to reduce correlations between spike trains; thereby addressing the result of performance drop in on-line SNN training. The promising results of spiking Autoencoders (AEs) and Restricted Boltzmann Machines (SRBMs) exhibit equivalent, sometimes even superior, classification and reconstruction capabilities compared to their non-spiking counterparts. To provide meaningful comparisons between these proposed SNN models and other existing methods within this rapidly advancing field of NE, we propose a large dataset of spike-based visual stimuli and a corresponding evaluation methodology to estimate the overall performance of SNN models and their hardware implementations.
APA, Harvard, Vancouver, ISO, and other styles
2

Squadrani, Lorenzo. "Deep neural networks and thermodynamics." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Find full text
Abstract:
Deep learning is the most effective and used approach to artificial intelligence, and yet it is far from being properly understood. The understanding of it is the way to go to further improve its effectiveness and in the best case to gain some understanding of the "natural" intelligence. We attempt a step in this direction with the aim of physics. We describe a convolutional neural network for image classification (trained on CIFAR-10) within the descriptive framework of Thermodynamics. In particular we define and study the temperature of each component of the network. Our results provides a new point of view on deep learning models, which may be a starting point towards a better understanding of artificial intelligence.
APA, Harvard, Vancouver, ISO, and other styles
3

Mancevo, del Castillo Ayala Diego. "Compressing Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217316.

Full text
Abstract:
Deep Convolutional Neural Networks and "deep learning" in general stand at the cutting edge on a range of applications, from image based recognition and classification to natural language processing, speech and speaker recognition and reinforcement learning. Very deep models however are often large, complex and computationally expensive to train and evaluate. Deep learning models are thus seldom deployed natively in environments where computational resources are scarce or expensive. To address this problem we turn our attention towards a range of techniques that we collectively refer to as "model compression" where a lighter student model is trained to approximate the output produced by the model we wish to compress. To this end, the output from the original model is used to craft the training labels of the smaller student model. This work contains some experiments on CIFAR-10 and demonstrates how to use the aforementioned techniques to compress a people counting model whose precision, recall and F1-score are improved by as much as 14% against our baseline.
APA, Harvard, Vancouver, ISO, and other styles
4

Abbasi, Mahdieh. "Toward robust deep neural networks." Doctoral thesis, Université Laval, 2020. http://hdl.handle.net/20.500.11794/67766.

Full text
Abstract:
Dans cette thèse, notre objectif est de développer des modèles d’apprentissage robustes et fiables mais précis, en particulier les Convolutional Neural Network (CNN), en présence des exemples anomalies, comme des exemples adversaires et d’échantillons hors distribution –Out-of-Distribution (OOD). Comme la première contribution, nous proposons d’estimer la confiance calibrée pour les exemples adversaires en encourageant la diversité dans un ensemble des CNNs. À cette fin, nous concevons un ensemble de spécialistes diversifiés avec un mécanisme de vote simple et efficace en termes de calcul pour prédire les exemples adversaires avec une faible confiance tout en maintenant la confiance prédicative des échantillons propres élevée. En présence de désaccord dans notre ensemble, nous prouvons qu’une borne supérieure de 0:5 + _0 peut être établie pour la confiance, conduisant à un seuil de détection global fixe de tau = 0; 5. Nous justifions analytiquement le rôle de la diversité dans notre ensemble sur l’atténuation du risque des exemples adversaires à la fois en boîte noire et en boîte blanche. Enfin, nous évaluons empiriquement la robustesse de notre ensemble aux attaques de la boîte noire et de la boîte blanche sur plusieurs données standards. La deuxième contribution vise à aborder la détection d’échantillons OOD à travers un modèle de bout en bout entraîné sur un ensemble OOD approprié. À cette fin, nous abordons la question centrale suivante : comment différencier des différents ensembles de données OOD disponibles par rapport à une tâche de distribution donnée pour sélectionner la plus appropriée, ce qui induit à son tour un modèle calibré avec un taux de détection des ensembles inaperçus de données OOD? Pour répondre à cette question, nous proposons de différencier les ensembles OOD par leur niveau de "protection" des sub-manifolds. Pour mesurer le niveau de protection, nous concevons ensuite trois nouvelles mesures efficaces en termes de calcul à l’aide d’un CNN vanille préformé. Dans une vaste série d’expériences sur les tâches de classification d’image et d’audio, nous démontrons empiriquement la capacité d’un CNN augmenté (A-CNN) et d’un CNN explicitement calibré pour détecter une portion significativement plus grande des exemples OOD. Fait intéressant, nous observons également qu’un tel A-CNN (nommé A-CNN) peut également détecter les adversaires exemples FGS en boîte noire avec des perturbations significatives. En tant que troisième contribution, nous étudions de plus près de la capacité de l’A-CNN sur la détection de types plus larges d’adversaires boîte noire (pas seulement ceux de type FGS). Pour augmenter la capacité d’A-CNN à détecter un plus grand nombre d’adversaires,nous augmentons l’ensemble d’entraînement OOD avec des échantillons interpolés inter-classes. Ensuite, nous démontrons que l’A-CNN, entraîné sur tous ces données, a un taux de détection cohérent sur tous les types des adversaires exemples invisibles. Alors que la entraînement d’un A-CNN sur des adversaires PGD ne conduit pas à un taux de détection stable sur tous les types d’adversaires, en particulier les types inaperçus. Nous évaluons également visuellement l’espace des fonctionnalités et les limites de décision dans l’espace d’entrée d’un CNN vanille et de son homologue augmenté en présence d’adversaires et de ceux qui sont propres. Par un A-CNN correctement formé, nous visons à faire un pas vers un modèle d’apprentissage debout en bout unifié et fiable avec de faibles taux de risque sur les échantillons propres et les échantillons inhabituels, par exemple, les échantillons adversaires et OOD. La dernière contribution est de présenter une application de A-CNN pour l’entraînement d’un détecteur d’objet robuste sur un ensemble de données partiellement étiquetées, en particulier un ensemble de données fusionné. La fusion de divers ensembles de données provenant de contextes similaires mais avec différents ensembles d’objets d’intérêt (OoI) est un moyen peu coûteux de créer un ensemble de données à grande échelle qui couvre un plus large spectre d’OoI. De plus, la fusion d’ensembles de données permet de réaliser un détecteur d’objet unifié, au lieu d’en avoir plusieurs séparés, ce qui entraîne une réduction des coûts de calcul et de temps. Cependant, la fusion d’ensembles de données, en particulier à partir d’un contexte similaire, entraîne de nombreuses instances d’étiquetées manquantes. Dans le but d’entraîner un détecteur d’objet robuste intégré sur un ensemble de données partiellement étiquetées mais à grande échelle, nous proposons un cadre d’entraînement auto-supervisé pour surmonter le problème des instances d’étiquettes manquantes dans les ensembles des données fusionnés. Notre cadre est évalué sur un ensemble de données fusionné avec un taux élevé d’étiquettes manquantes. Les résultats empiriques confirment la viabilité de nos pseudo-étiquettes générées pour améliorer les performances de YOLO, en tant que détecteur d’objet à la pointe de la technologie.
In this thesis, our goal is to develop robust and reliable yet accurate learning models, particularly Convolutional Neural Networks (CNNs), in the presence of adversarial examples and Out-of-Distribution (OOD) samples. As the first contribution, we propose to predict adversarial instances with high uncertainty through encouraging diversity in an ensemble of CNNs. To this end, we devise an ensemble of diverse specialists along with a simple and computationally efficient voting mechanism to predict the adversarial examples with low confidence while keeping the predictive confidence of the clean samples high. In the presence of high entropy in our ensemble, we prove that the predictive confidence can be upper-bounded, leading to have a globally fixed threshold over the predictive confidence for identifying adversaries. We analytically justify the role of diversity in our ensemble on mitigating the risk of both black-box and white-box adversarial examples. Finally, we empirically assess the robustness of our ensemble to the black-box and the white-box attacks on several benchmark datasets.The second contribution aims to address the detection of OOD samples through an end-to-end model trained on an appropriate OOD set. To this end, we address the following central question: how to differentiate many available OOD sets w.r.t. a given in distribution task to select the most appropriate one, which in turn induces a model with a high detection rate of unseen OOD sets? To answer this question, we hypothesize that the “protection” level of in-distribution sub-manifolds by each OOD set can be a good possible property to differentiate OOD sets. To measure the protection level, we then design three novel, simple, and cost-effective metrics using a pre-trained vanilla CNN. In an extensive series of experiments on image and audio classification tasks, we empirically demonstrate the abilityof an Augmented-CNN (A-CNN) and an explicitly-calibrated CNN for detecting a significantly larger portion of unseen OOD samples, if they are trained on the most protective OOD set. Interestingly, we also observe that the A-CNN trained on the most protective OOD set (calledA-CNN) can also detect the black-box Fast Gradient Sign (FGS) adversarial examples. As the third contribution, we investigate more closely the capacity of the A-CNN on the detection of wider types of black-box adversaries. To increase the capability of A-CNN to detect a larger number of adversaries, we augment its OOD training set with some inter-class interpolated samples. Then, we demonstrate that the A-CNN trained on the most protective OOD set along with the interpolated samples has a consistent detection rate on all types of unseen adversarial examples. Where as training an A-CNN on Projected Gradient Descent (PGD) adversaries does not lead to a stable detection rate on all types of adversaries, particularly the unseen types. We also visually assess the feature space and the decision boundaries in the input space of a vanilla CNN and its augmented counterpart in the presence of adversaries and the clean ones. By a properly trained A-CNN, we aim to take a step toward a unified and reliable end-to-end learning model with small risk rates on both clean samples and the unusual ones, e.g. adversarial and OOD samples.The last contribution is to show a use-case of A-CNN for training a robust object detector on a partially-labeled dataset, particularly a merged dataset. Merging various datasets from similar contexts but with different sets of Object of Interest (OoI) is an inexpensive way to craft a large-scale dataset which covers a larger spectrum of OoIs. Moreover, merging datasets allows achieving a unified object detector, instead of having several separate ones, resultingin the reduction of computational and time costs. However, merging datasets, especially from a similar context, causes many missing-label instances. With the goal of training an integrated robust object detector on a partially-labeled but large-scale dataset, we propose a self-supervised training framework to overcome the issue of missing-label instances in the merged datasets. Our framework is evaluated on a merged dataset with a high missing-label rate. The empirical results confirm the viability of our generated pseudo-labels to enhance the performance of YOLO, as the current (to date) state-of-the-art object detector.
APA, Harvard, Vancouver, ISO, and other styles
5

Lu, Yifei. "Deep neural networks and fraud detection." Thesis, Uppsala universitet, Tillämpad matematik och statistik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-331833.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kalogiras, Vasileios. "Sentiment Classification with Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217858.

Full text
Abstract:
Attitydanalys är ett delfält av språkteknologi (NLP) som försöker analysera känslan av skriven text. Detta är ett komplext problem som medför många utmaningar. Av denna anledning har det studerats i stor utsträckning. Under de senaste åren har traditionella maskininlärningsalgoritmer eller handgjord metodik använts och givit utmärkta resultat. Men den senaste renässansen för djupinlärning har växlat om intresse till end to end deep learning-modeller.Å ena sidan resulterar detta i mer kraftfulla modeller men å andra sidansaknas klart matematiskt resonemang eller intuition för dessa modeller. På grund av detta görs ett försök i denna avhandling med att kasta ljus på nyligen föreslagna deep learning-arkitekturer för attitydklassificering. En studie av deras olika skillnader utförs och ger empiriska resultat för hur ändringar i strukturen eller kapacitet hos modellen kan påverka exaktheten och sättet den representerar och ''förstår'' meningarna.
Sentiment analysis is a subfield of natural language processing (NLP) that attempts to analyze the sentiment of written text.It is is a complex problem that entails different challenges. For this reason, it has been studied extensively. In the past years traditional machine learning algorithms or handcrafted methodologies used to provide state of the art results. However, the recent deep learning renaissance shifted interest towards end to end deep learning models. On the one hand this resulted into more powerful models but on the other hand clear mathematical reasoning or intuition behind distinct models is still lacking. As a result, in this thesis, an attempt to shed some light on recently proposed deep learning architectures for sentiment classification is made.A study of their differences is performed as well as provide empirical results on how changes in the structure or capacity of a model can affect its accuracy and the way it represents and ''comprehends'' sentences.
APA, Harvard, Vancouver, ISO, and other styles
7

Choi, Keunwoo. "Deep neural networks for music tagging." Thesis, Queen Mary, University of London, 2018. http://qmro.qmul.ac.uk/xmlui/handle/123456789/46029.

Full text
Abstract:
In this thesis, I present my hypothesis, experiment results, and discussion that are related to various aspects of deep neural networks for music tagging. Music tagging is a task to automatically predict the suitable semantic label when music is provided. Generally speaking, the input of music tagging systems can be any entity that constitutes music, e.g., audio content, lyrics, or metadata, but only the audio content is considered in this thesis. My hypothesis is that we can fi nd effective deep learning practices for the task of music tagging task that improves the classi fication performance. As a computational model to realise a music tagging system, I use deep neural networks. Combined with the research problem, the scope of this thesis is the understanding, interpretation, optimisation, and application of deep neural networks in the context of music tagging systems. The ultimate goal of this thesis is to provide insight that can help to improve deep learning-based music tagging systems. There are many smaller goals in this regard. Since using deep neural networks is a data-driven approach, it is crucial to understand the dataset. Selecting and designing a better architecture is the next topic to discuss. Since the tagging is done with audio input, preprocessing the audio signal becomes one of the important research topics. After building (or training) a music tagging system, fi nding a suitable way to re-use it for other music information retrieval tasks is a compelling topic, in addition to interpreting the trained system. The evidence presented in the thesis supports that deep neural networks are powerful and credible methods for building a music tagging system.
APA, Harvard, Vancouver, ISO, and other styles
8

Yin, Yonghua. "Random neural networks for deep learning." Thesis, Imperial College London, 2018. http://hdl.handle.net/10044/1/64917.

Full text
Abstract:
The random neural network (RNN) is a mathematical model for an 'integrate and fire' spiking network that closely resembles the stochastic behaviour of neurons in mammalian brains. Since its proposal in 1989, there have been numerous investigations into the RNN's applications and learning algorithms. Deep learning (DL) has achieved great success in machine learning, but there has been no research into the properties of the RNN for DL to combine their power. This thesis intends to bridge the gap between RNNs and DL, in order to provide powerful DL tools that are faster, and that can potentially be used with less energy expenditure than existing methods. Based on the RNN function approximator proposed by Gelenbe in 1999, the approximation capability of the RNN is investigated and an efficient classifier is developed. By combining the RNN, DL and non-negative matrix factorisation, new shallow and multi-layer non-negative autoencoders are developed. The autoencoders are tested on typical image datasets and real-world datasets from different domains, and the test results yield the desired high learning accuracy. The concept of dense nuclei/clusters is examined, using RNN theory as a basis. In dense nuclei, neurons may interconnect via soma-to-soma interactions and conventional synaptic connections. A mathematical model of the dense nuclei is proposed and the transfer function can be deduced. A multi-layer architecture of the dense nuclei is constructed for DL, whose value is demonstrated by experiments on multi-channel datasets and server-state classification in cloud servers. A theoretical study into the multi-layer architecture of the standard RNN (MLRNN) for DL is presented. Based on the layer-output analyses, the MLRNN is shown to be a universal function approximator. The effects of the layer number on the learning capability and high-level representation extraction are analysed. A hypothesis for transforming the DL problem into a moment-learning problem is also presented. The power of the standard RNN for DL is investigated. The ability of the RNN with only positive parameters to conduct image convolution operations is demonstrated. The MLRNN equipped with the developed training algorithm achieves comparable or better classification at a lower computation cost than conventional DL methods.
APA, Harvard, Vancouver, ISO, and other styles
9

Zagoruyko, Sergey. "Weight parameterizations in deep neural networks." Thesis, Paris Est, 2018. http://www.theses.fr/2018PESC1129/document.

Full text
Abstract:
Les réseaux de neurones multicouches ont été proposés pour la première fois il y a plus de trois décennies, et diverses architectures et paramétrages ont été explorés depuis. Récemment, les unités de traitement graphique ont permis une formation très efficace sur les réseaux neuronaux et ont permis de former des réseaux beaucoup plus grands sur des ensembles de données plus importants, ce qui a considérablement amélioré le rendement dans diverses tâches d'apprentissage supervisé. Cependant, la généralisation est encore loin du niveau humain, et il est difficile de comprendre sur quoi sont basées les décisions prises. Pour améliorer la généralisation et la compréhension, nous réexaminons les problèmes de paramétrage du poids dans les réseaux neuronaux profonds. Nous identifions les problèmes les plus importants, à notre avis, dans les architectures modernes : la profondeur du réseau, l'efficacité des paramètres et l'apprentissage de tâches multiples en même temps, et nous essayons de les aborder dans cette thèse. Nous commençons par l'un des problèmes fondamentaux de la vision par ordinateur, le patch matching, et proposons d'utiliser des réseaux neuronaux convolutifs de différentes architectures pour le résoudre, au lieu de descripteurs manuels. Ensuite, nous abordons la tâche de détection d'objets, où un réseau devrait apprendre simultanément à prédire à la fois la classe de l'objet et l'emplacement. Dans les deux tâches, nous constatons que le nombre de paramètres dans le réseau est le principal facteur déterminant sa performance, et nous explorons ce phénomène dans les réseaux résiduels. Nos résultats montrent que leur motivation initiale, la formation de réseaux plus profonds pour de meilleures représentations, ne tient pas entièrement, et des réseaux plus larges avec moins de couches peuvent être aussi efficaces que des réseaux plus profonds avec le même nombre de paramètres. Dans l'ensemble, nous présentons une étude approfondie sur les architectures et les paramétrages de poids, ainsi que sur les moyens de transférer les connaissances entre elles
Multilayer neural networks were first proposed more than three decades ago, and various architectures and parameterizations were explored since. Recently, graphics processing units enabled very efficient neural network training, and allowed training much larger networks on larger datasets, dramatically improving performance on various supervised learning tasks. However, the generalization is still far from human level, and it is difficult to understand on what the decisions made are based. To improve on generalization and understanding we revisit the problems of weight parameterizations in deep neural networks. We identify the most important, to our mind, problems in modern architectures: network depth, parameter efficiency, and learning multiple tasks at the same time, and try to address them in this thesis. We start with one of the core problems of computer vision, patch matching, and propose to use convolutional neural networks of various architectures to solve it, instead of manual hand-crafting descriptors. Then, we address the task of object detection, where a network should simultaneously learn to both predict class of the object and the location. In both tasks we find that the number of parameters in the network is the major factor determining it's performance, and explore this phenomena in residual networks. Our findings show that their original motivation, training deeper networks for better representations, does not fully hold, and wider networks with less layers can be as effective as deeper with the same number of parameters. Overall, we present an extensive study on architectures and weight parameterizations, and ways of transferring knowledge between them
APA, Harvard, Vancouver, ISO, and other styles
10

Ioannou, Yani Andrew. "Structural priors in deep neural networks." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/278976.

Full text
Abstract:
Deep learning has in recent years come to dominate the previously separate fields of research in machine learning, computer vision, natural language understanding and speech recognition. Despite breakthroughs in training deep networks, there remains a lack of understanding of both the optimization and structure of deep networks. The approach advocated by many researchers in the field has been to train monolithic networks with excess complexity, and strong regularization --- an approach that leaves much to desire in efficiency. Instead we propose that carefully designing networks in consideration of our prior knowledge of the task and learned representation can improve the memory and compute efficiency of state-of-the art networks, and even improve generalization --- what we propose to denote as structural priors. We present two such novel structural priors for convolutional neural networks, and evaluate them in state-of-the-art image classification CNN architectures. The first of these methods proposes to exploit our knowledge of the low-rank nature of most filters learned for natural images by structuring a deep network to learn a collection of mostly small, low-rank, filters. The second addresses the filter/channel extents of convolutional filters, by learning filters with limited channel extents. The size of these channel-wise basis filters increases with the depth of the model, giving a novel sparse connection structure that resembles a tree root. Both methods are found to improve the generalization of these architectures while also decreasing the size and increasing the efficiency of their training and test-time computation. Finally, we present work towards conditional computation in deep neural networks, moving towards a method of automatically learning structural priors in deep networks. We propose a new discriminative learning model, conditional networks, that jointly exploit the accurate representation learning capabilities of deep neural networks with the efficient conditional computation of decision trees. Conditional networks yield smaller models, and offer test-time flexibility in the trade-off of computation vs. accuracy.
APA, Harvard, Vancouver, ISO, and other styles
11

Billman, Linnar, and Johan Hullberg. "Speech Reading with Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-360022.

Full text
Abstract:
Recent growth in computational power and available data has increased popularityand progress of machine learning techniques. Methods of machine learning areused for automatic speech recognition in order to allow humans to transferinformation to computers simply by speech. In the present work, we are interestedin doing this for general contexts as e.g. speakers talking on TV or newsreadersrecorded in a studio. Automatic speech recognition systems are often solely basedon acoustic data. By introducing visual data such as lip movements, robustness ofsuch system can be increased.This thesis instead investigates how well machine learning techniques can learnthe art of lip reading as a sole source for automatic speech recognition. The keyidea is to use a sequence of 24 lip coordinates to feed to the system, rather thanlearning directly from the raw video frames.This thesis designs a solution around this principle empowered by state-of-the-artmachine learning techniques such as recurrent neural networks, making use ofGPUs. We find that this design reduces computational requirements by more thana factor of 25 compared to a state-of-art machine learning solution called LipNet.This however also scales down performance to an accuracy of 80% of what LipNetachieves, while still outperforming human recognition by a factor of 150%. Theaccuracies are based on processing of yet unseen speakers.This text presents this architecture. It details its design, reports its results, andcompares its performance to an existing solution. Basedon this, it is indicated how the result can be further refined.
APA, Harvard, Vancouver, ISO, and other styles
12

Wang, Shenhao. "Deep neural networks for choice analysis." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/129894.

Full text
Abstract:
Thesis: Ph. D. in Computer and Urban Science, Massachusetts Institute of Technology, Department of Urban Studies and Planning, September, 2020
Cataloged from student-submitted PDF of thesis.
Includes bibliographical references (pages 117-128).
As deep neural networks (DNNs) outperform classical discrete choice models (DCMs) in many empirical studies, one pressing question is how to reconcile them in the context of choice analysis. So far researchers mainly compare their prediction accuracy, treating them as completely different modeling methods. However, DNNs and classical choice models are closely related and even complementary. This dissertation seeks to lay out a new foundation of using DNNs for choice analysis. It consists of three essays, which respectively tackle the issues of economic interpretation, architectural design, and robustness of DNNs by using classical utility theories. Essay 1 demonstrates that DNNs can provide economic information as complete as the classical DCMs.
The economic information includes choice predictions, choice probabilities, market shares, substitution patterns of alternatives, social welfare, probability derivatives, elasticities, marginal rates of substitution (MRS), and heterogeneous values of time (VOT). Unlike DCMs, DNNs can automatically learn the utility function and reveal behavioral patterns that are not prespecified by modelers. However, the economic information from DNNs can be unreliable because the automatic learning capacity is associated with three challenges: high sensitivity to hyperparameters, model non-identification, and local irregularity. To demonstrate the strength of DNNs as well as the three issues, I conduct an empirical experiment by applying the DNNs to a stated preference survey and discuss successively the full list of economic information extracted from the DNNs. Essay 2 designs a particular DNN architecture with alternative-specific utility functions (ASU-DNN) by using prior behavioral knowledge.
Theoretically, ASU-DNN reduces the estimation error of fully connected DNN (F-DNN) because of its lighter architecture and sparser connectivity, although the constraint of alternative-specific utility could cause ASU-DNN to exhibit a larger approximation error. Both ASU-DNN and F-DNN can be treated as special cases of DNN architecture design guided by utility connectivity graph (UCG). Empirically, ASU-DNN has 2-3% higher prediction accuracy than F-DNN. The alternative-specific connectivity constraint, as a domain-knowledge- based regularization method, is more effective than other regularization methods. This essay demonstrates that prior behavioral knowledge can be used to guide the architecture design of DNN, to function as an effective domain-knowledge-based regularization method, and to improve both the interpretability and predictive power of DNNs in choice analysis.
Essay 3 designs a theory-based residual neural network (TB-ResNet) with a two-stage training procedure, which synthesizes decision-making theories and DNNs in a linear manner. Three instances of TB-ResNets based on choice modeling (CM-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting (HD-ResNets) are designed. Empirically, compared to the decision-making theories, the three instances of TB-ResNets predict significantly better in the out-of-sample test and become more interpretable owing to the rich utility function augmented by DNNs. Compared to the DNNs, the TB-ResNets predict better because the decision-making theories aid in localizing and regularizing the DNN models. TB-ResNets also become more robust than DNNs because the decision-making theories stablize the local utility function and the input gradients.
This essay demonstrates that it is both feasible and desirable to combine the handcrafted utility theory and automatic utility specification, with joint improvement in prediction, interpretation, and robustness.
by Shenhao Wang.
Ph. D. in Computer and Urban Science
Ph.D.inComputerandUrbanScience Massachusetts Institute of Technology, Department of Urban Studies and Planning
APA, Harvard, Vancouver, ISO, and other styles
13

Sunnegårdh, Christina. "Scar detection using deep neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299576.

Full text
Abstract:
Object detection is a computer vision method that deals with the tasks of localizing and classifying objects within an image. The number of usages for the method is constantly growing, and this thesis investigates the unexplored area of using deep neural networks for scar detection. Furthermore, the thesis investigates using the scar detector as a basis for the binary classification task of deciding whether in-the-wild images contains a scar or not. Two pre-trained object detection models, Faster R-CNN and RetinaNet, were trained on 1830 manually labeled images using different hyperparameters. Faster R-CNN Inception ResNet V2 achieved the highest results in terms of Average Precision (AP), particularly at higher IoU thresholds, closely followed by Faster R-CNN ResNet50, and finally RetinaNet. The results both indicate the superiority of Faster R-CNN compared to RetinaNet, as well as using Inception ResNet V2 as feature extractor for a large variety of object sizes. The reason is most likely due to multiple convolutional filters of different sizes operating at the same levels in the Inception ResNet network. As for inference time, RetinaNet was the fastest, followed by Faster R-CNN ResNet50 and finally Faster R-CNN Inception ResNet V2. For the binary classification task, the models were tested on a set of 200 images, where half of the images contained clearly visible scars. Faster R-CNN ResNet50 achieved the highest accuracy, followed by Faster R-CNN Inception ResNet V2 and finally RetinaNet. While the accuracy of RetinaNet suffered mainly from a low recall, Faster R-CNN Inception ResNet V2 detected some actual scars in images that had not been labeled due to low image quality, which could be a matter of subjective labeling and that the model is punished for something that at other times might be considered correct. In conclusion, this thesis shows promising results of using object detection to detect scars in images. While two-stage Faster R-CNN holds the advantage in AP for scar detection, one-stage RetinaNet holds the advantage in speed. Suggestions for future work include eliminating biases by putting more effort into labeling data as well as including training data that contain objects for which the models produced false positives. Examples of this are wounds, knuckles, and possible background objects that are visually similar to scars.
Objektdetektion är en metod inom datorseende som inkluderar både lokalisering och klassificering av objekt i bilder. Antalet användningsområden för metoden växer ständigt och denna studie undersöker det outforskade området av att använda djupa neurala nätverk för detektering av ärr. Studien utforskar även att använda detektering av ärr som grund för den binära klassificeringsuppgiften att bestämma om bilder innehåller ett synligt ärr eller inte. Två förtränade objektdetekteringsmodeller, Faster R-CNN och RetinaNet, tränades med olika hyperparametrar på 1830 manuellt märkta bilder. Faster RCNN Inception ResNet V2 uppnådde bäst resultat med avseende på average precision (AP), tätt följd av Faster R-CNN ResNet50 och slutligen RetinaNet. Resultatet indikerar både överlägsenhet av Faster R-CNN gentemot RetinaNet, såväl som att använda Inception ResNet V2 för särdragsextrahering. Detta beror med stor sannolikhet på dess användning av faltningsfilter i flera storlekar på samma nivåer i nätverket. Gällande detekteringstid per bild var RetinaNet snabbast, följd av Faster R-CNN ResNet50 och slutligen Faster R-CNN Inception ResNet V2. För den binära klassificeringsuppgiften testades modellerna på 200 bilder, där hälften av bilderna innehöll tydligt synliga ärr. Faster RCNN ResNet50 uppnådde högst träffsäkerhet, följt av Faster R-CNN Inception ResNet V2 och till sist RetinaNet. Medan träffsäkerheten för RetinaNet huvudsakligen bestraffades på grund av att ha förbisett ärr i bilder, så detekterade Faster R-CNN Inception ResNet V2 ett flertal faktiska ärr som inte datamärkts på grund av bristande bildkvalitet. Detta kan dock vara en fråga om subjektiv datamärkning och att modellen bestraffas för något som andra gånger skulle kunna anses korrekt. Sammanfattningsvis visar denna studie lovande resultat av att använda objektdetektion för att detektera ärr i bilder. Medan tvåstegsmodellen Faster R-CNN har övertaget sett till AP, har enstegsmodellen RetinaNet övertaget sett till detekteringstid. Förslag för framtida arbete inkluderar att lägga större vikt vid märkning av data för att eliminera potentiell subjektivitet, samt inkludera träningsdata innehållande objekt som modellerna misstog för ärr. Exempel på detta är öppna sår, knogar och bakgrundsobjekt som visuellt liknar ärr.
APA, Harvard, Vancouver, ISO, and other styles
14

Landeen, Trevor J. "Association Learning Via Deep Neural Networks." DigitalCommons@USU, 2018. https://digitalcommons.usu.edu/etd/7028.

Full text
Abstract:
Deep learning has been making headlines in recent years and is often portrayed as an emerging technology on a meteoric rise towards fully sentient artificial intelligence. In reality, deep learning is the most recent renaissance of a 70 year old technology and is far from possessing true intelligence. The renewed interest is motivated by recent successes in challenging problems, the accessibility made possible by hardware developments, and dataset availability. The predecessor to deep learning, commonly known as the artificial neural network, is a computational network setup to mimic the biological neural structure found in brains. However, unlike human brains, artificial neural networks, in most cases cannot make inferences from one problem to another. As a result, developing an artificial neural network requires a large number of examples of desired behavior for a specific problem. Furthermore, developing an artificial neural network capable of solving the problem can take days, or even weeks, of computations. Two specific problems addressed in this dissertation are both input association problems. One problem challenges a neural network to identify overlapping regions in images and is used to evaluate the ability of a neural network to learn associations between inputs of similar types. The other problem asks a neural network to identify which observed wireless signals originated from observed potential sources and is used to assess the ability of a neural network to learn associations between inputs of different types. The neural network solutions to both problems introduced, discussed, and evaluated in this dissertation demonstrate deep learning’s applicability to problems which have previously attracted little attention.
APA, Harvard, Vancouver, ISO, and other styles
15

Srivastava, Sanjana. "On foveation of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123134.

Full text
Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 61-63).
The human ability to recognize objects is impaired when the object is not shown in full. "Minimal images" are the smallest regions of an image that remain recognizable for humans. [26] show that a slight modification of the location and size of the visible region of the minimal image produces a sharp drop in human recognition accuracy. In this paper, we demonstrate that such drops in accuracy due to changes of the visible region are a common phenomenon between humans and existing state-of- the-art convolutional neural networks (CNNs), and are much more prominent in CNNs. We found many cases where CNNs classified one region correctly and the other incorrectly, though they only differed by one row or column of pixels, and were often bigger than the average human minimal image size. We show that this phenomenon is independent from previous works that have reported lack of invariance to minor modifications in object location in CNNs. Our results thus reveal a new failure mode of CNNs that also affects humans to a lesser degree. They expose how fragile CNN recognition ability is for natural images even without synthetic adversarial patterns being introduced. This opens potential for CNN robustness in natural images to be brought to the human level by taking inspiration from human robustness methods. One of these is eccentricity dependence, a model of human focus in which attention to the visual input degrades proportional to distance from the focal point [7]. We demonstrate that applying the "inverted pyramid" eccentricity method, a multi-scale input transformation, makes CNNs more robust to useless background features than a standard raw-image input. Our results also find that using the inverted pyramid method generally reduces useless background pixels, therefore reducing required training data.
by Sanjana Srivastava.
M. Eng.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
APA, Harvard, Vancouver, ISO, and other styles
16

Chen, Zhe. "Augmented Context Modelling Neural Networks." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20654.

Full text
Abstract:
Contexts provide beneficial information for machine-based image understanding tasks. However, existing context modelling methods still cannot fully exploit contexts, especially for object recognition and detection. In this thesis, we develop augmented context modelling neural networks to better utilize contexts for different object recognition and detection tasks. Our contributions are two-fold: 1) we introduce neural networks to better model instance-level visual relationships; 2) we introduce neural network-based algorithms to better utilize contexts from 3D information and synthesized data. In particular, to augment the modelling of instance-level visual relationships, we propose a context refinement network and an encapsulated context modelling network for object detection. In the context refinement study, we propose to improve the modeling of visual relationships by introducing overlap scores and confidence scores of different regions. In addition, in the encapsulated context modelling study, we boost the context modelling performance by exploiting the more powerful capsule-based neural networks. To augment the modeling of contexts from different sources, we propose novel neural networks to better utilize 3D information and synthesis-based contexts. For the modelling of 3D information, we mainly investigate the modelling of LiDAR data for road detection and the depth data for instance segmentation, respectively. In road detection, we develop a progressive LiDAR adaptation algorithm to improve the fusion of 3D LiDAR data and 2D image data. Regarding instance segmentation, we model depth data as context to help tackle the low-resolution annotation-based training problem. Moreover, to improve the modelling of synthesis-based contexts, we devise a shape translation-based pedestrian generation framework to help improve the pedestrian detection performance.
APA, Harvard, Vancouver, ISO, and other styles
17

Habibi, Aghdam Hamed. "Understanding Road Scenes using Deep Neural Networks." Doctoral thesis, Universitat Rovira i Virgili, 2018. http://hdl.handle.net/10803/461607.

Full text
Abstract:
La comprensió de les escenes de la carretera és fonamental per als automòbils autònoms. Això requereix segmentar escenes de carreteres en regions semànticament significatives i reconèixer objectes en una escena. Tot i que objectes com ara cotxes i vianants han de segmentar-se amb precisió, és possible que no sigui necessari detectar i localitzar aquests objectes en una escena. Tanmateix, detectar i classificar objectes com ara els senyals de trànsit és fonamental per ajustar-se a les regles del camí. En aquesta tesi, primer proposem un mètode per classificar senyals de trànsit amb atributs visuals i xarxes bayesianes. A continuació, proposem dues xarxes neuronals per a aquest propòsit i desenvolupem un nou mètode per crear un conjunt de models. A continuació, estudiem la sensibilitat de les xarxes neuronals contra mostres adversàries i proposem dues xarxes de denoising que s'adjunten a les xarxes de classificació per augmentar la seva estabilitat contra el soroll. A la segona part de la tesi, primer proposem una xarxa per detectar senyals de trànsit en imatges d'alta resolució en temps real i mostrar com implementar la tècnica de la finestra d'escaneig dins de la nostra xarxa utilitzant convolucions dilatades. A continuació, formulem el problema de detecció com a problema de segmentació i proposem una xarxa totalment convolucional per detectar senyals de trànsit. ? Finalment, proposem una nova xarxa totalment convolucional composta de mòduls de foc, connexions de derivació i convolucions consecutives dilatades? En l'última part de la tesi per a escenes de camins segmentinc en regions semànticament significatives i demostrar que és més accentuat i computacionalment més eficient en comparació amb xarxes similars
Comprender las escenas de la carretera es crucial para los automóviles autónomos. Esto requiere segmentar escenas de carretera en regiones semánticamente significativas y reconocer objetos en una escena. Mientras que los objetos tales como coches y peatones tienen que segmentarse con precisión, puede que no sea necesario detectar y localizar estos objetos en una escena. Sin embargo, la detección y clasificación de objetos tales como señales de tráfico es esencial para ajustarse a las reglas de la carretera. En esta tesis, proponemos un método para la clasificación de señales de tráfico utilizando atributos visuales y redes bayesianas. A continuación, proponemos dos redes neuronales para este fin y desarrollar un nuevo método para crear un conjunto de modelos. A continuación, se estudia la sensibilidad de las redes neuronales frente a las muestras adversarias y se proponen dos redes destructoras que se unen a las redes de clasificación para aumentar su estabilidad frente al ruido. En la segunda parte de la tesis, proponemos una red para detectar señales de tráfico en imágenes de alta resolución en tiempo real y mostrar cómo implementar la técnica de ventana de escaneo dentro de nuestra red usando circunvoluciones dilatadas. A continuación, formulamos el problema de detección como un problema de segmentación y proponemos una red completamente convolucional para detectar señales de tráfico. Finalmente, proponemos una nueva red totalmente convolucional compuesta de módulos de fuego, conexiones de bypass y circunvoluciones consecutivas dilatadas en la última parte de la tesis para escenarios de carretera segmentinc en regiones semánticamente significativas y muestran que es más accuarate y computacionalmente más eficiente en comparación con redes similares
Understanding road scenes is crucial for autonomous cars. This requires segmenting road scenes into semantically meaningful regions and recognizing objects in a scene. While objects such as cars and pedestrians has to be segmented accurately, it might not be necessary to detect and locate these objects in a scene. However, detecting and classifying objects such as traffic signs is essential for conforming to road rules. In this thesis, we first propose a method for classifying traffic signs using visual attributes and Bayesian networks. Then, we propose two neural network for this purpose and develop a new method for creating an ensemble of models. Next, we study sensitivity of neural networks against adversarial samples and propose two denoising networks that are attached to the classification networks to increase their stability against noise. In the second part of the thesis, we first propose a network to detect traffic signs in high-resolution images in real-time and show how to implement the scanning window technique within our network using dilated convolutions. Then, we formulate the detection problem as a segmentation problem and propose a fully convolutional network for detecting traffic signs. Finally, we propose a new fully convolutional network composed of fire modules, bypass connections and consecutive dilated convolutions in the last part of the thesis for segmenting road scenes into semantically meaningful regions and show that it is more accurate and computationally more efficient compared to similar networks.
APA, Harvard, Vancouver, ISO, and other styles
18

Antoniades, Andreas. "Interpreting biomedical data via deep neural networks." Thesis, University of Surrey, 2018. http://epubs.surrey.ac.uk/845765/.

Full text
Abstract:
Machine learning technology has taken quantum leaps in the past few years. From the rise of voice recognition as an interface to interact with our computers, to self-organising photo albums and self-driving cars. Neural networks and deep learning contributed significantly to drive this revolution. Yet, biomedicine is one of the research areas that has yet to fully embrace the possibilities of deep learning. Engaged in a cross-disciplinary subject, researchers, and clinical experts are focused on machine learning and statistical signal processing techniques. The ability to learn hierarchical features makes deep learning models highly applicable to biomedicine and researchers have started to notice this. The first works of deep learning in biomedicine are emerging with applications in diagnostics and genomics analysis. These models offer excellent accuracy, even comparable to that of human doctors. Despite the exceptional classification performance of these models, they are still used to provide \textit{quantitative} results. Diagnosing cancer proficiently and faster than a human doctor is beneficial, but automatically finding which biomarkers indicate the existence of cancerous cells would be invaluable. This type of \textit{qualitative} insight can be enabled by the hierarchical features and learning coefficients that manifest in deep models. It is this \textit{qualitative} approach that enables the interpretability of data and explainability of neural networks for biomedicine, which is the overarching aim of this thesis. As such, the aim of this thesis is to investigate the use of neural networks and deep learning models for the qualitative assessment of biomedical datasets. The first contribution is the proposition of a non-iterative, data agnostic feature selection algorithm to retain original features and provide qualitative analysis on their importance. This algorithm is employed in numerous areas including Pima Indian diabetes and children tumour detection. Next, the thesis focuses on the topic of epilepsy studied through scalp and intracranial electroencephalogram recordings of human brain activity. The second contribution promotes the use of deep learning models for the automatic generation of clinically meaningful features, as opposed to traditional handcrafted features. Convolutional neural networks are adapted to accommodate the intricacies of electroencephalogram data and trained to detect epileptiform discharges. The learning coefficients of these models are examined and found to contain clinically significant features. When combined, in a hierarchical way, these features reveal useful insights for the evaluation of treatment effectivity. The final contribution addresses the difficulty in acquiring intracranial data due to the invasive nature of the recording procedure. A non-linear brain mapping algorithm is proposed to link the electrical activities recorded on the scalp to those inside the cranium. This process improves the generalisation of models and alleviates the need for surgical procedures. %This is accomplished via an asymmetric autoencoder that accounts for differences in the dimensionality of the electroencephalogram data and improves the quality of the data.
APA, Harvard, Vancouver, ISO, and other styles
19

Tavanaei, Amirhossein. "Spiking Neural Networks and Sparse Deep Learning." Thesis, University of Louisiana at Lafayette, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=10807940.

Full text
Abstract:

This document proposes new methods for training multi-layer and deep spiking neural networks (SNNs), specifically, spiking convolutional neural networks (CNNs). Training a multi-layer spiking network poses difficulties because the output spikes do not have derivatives and the commonly used backpropagation method for non-spiking networks is not easily applied. Our methods use novel versions of the brain-like, local learning rule named spike-timing-dependent plasticity (STDP) that incorporates supervised and unsupervised components. Our method starts with conventional learning methods and converts them to spatio-temporally local rules suited for SNNs.

The training uses two components for unsupervised feature extraction and supervised classification. The first component refers to new STDP rules for spike-based representation learning that trains convolutional filters and initial representations. The second introduces new STDP-based supervised learning rules for spike pattern classification via an approximation to gradient descent by combining the STDP and anti-STDP rules. Specifically, the STDP-based supervised learning model approximates gradient descent by using temporally local STDP rules. Stacking these components implements a novel sparse, spiking deep learning model. Our spiking deep learning model is categorized as a variation of spiking CNNs of integrate-and-fire (IF) neurons with performance comparable with the state-of-the-art deep SNNs. The experimental results show the success of the proposed model for image classification. Our network architecture is the only spiking CNN which provides bio-inspired STDP rules in a hierarchy of feature extraction and classification in an entirely spike-based framework.

APA, Harvard, Vancouver, ISO, and other styles
20

Avramova, Vanya. "Curriculum Learning with Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-178453.

Full text
Abstract:
Curriculum learning is a machine learning technique inspired by the way humans acquire knowledge and skills: by mastering simple concepts first, and progressing through information with increasing difficulty to grasp more complex topics. Curriculum Learning, and its derivatives Self Paced Learning (SPL) and Self Paced Learning with Diversity (SPLD), have been previously applied within various machine learning contexts: Support Vector Machines (SVMs), perceptrons, and multi-layer neural networks, where they have been shown to improve both training speed and model accuracy. This project ventured to apply the techniques within the previously unexplored context of deep learning, by investigating how they affect the performance of a deep convolutional neural network (ConvNet) trained on a large labeled image dataset. The curriculum was formed by presenting the training samples to the network in order of increasing difficulty, measured by the sample's loss value based on the network's objective function. The project evaluated SPL and SPLD, and proposed two new curriculum learning sub-variants, p-SPL and p-SPLD, which allow for a smooth progresson of sample inclusion during training. The project also explored the "inversed" versions of the SPL, SPLD, p-SPL and p-SPLD techniques, where the samples were selected for the curriculum in order of decreasing difficulty. The experiments demonstrated that all learning variants perform fairly similarly, within ≈1% average test accuracy margin, based on five trained models per variant. Surprisingly, models trained with the inversed version of the algorithms performed slightly better than the standard curriculum training variants. The SPLD-Inversed, SPL-Inversed and SPLD networks also registered marginally higher accuracy results than the network trained with the usual random sample presentation. The results suggest that while sample ordering does affect the training process, the optimal order in which samples are presented may vary based on the data set and algorithm used. The project also investigated whether some samples were more beneficial for the training process than others. Based on sample difficulty, subsets of samples were removed from the training data set. The models trained on the remaining samples were compared to a default model trained on all samples. On the data set used, removing the “easiest” 10% of samples had no effect on the achieved test accuracy compared to the default model, and removing the “easiest” 40% of samples reduced model accuracy by only ≈1% (compared to ≈6% loss when 40% of the "most difficult" samples were removed, and ≈3% loss when 40% of samples were randomly removed). Taking away the "easiest" samples first (up to a certain percentage of the data set) affected the learning process less negatively than removing random samples, while removing the "most difficult" samples first had the most detrimental effect. The results suggest that the networks derived most learning value from the "difficult" samples, and that a large subset of the "easiest" samples can be excluded from training with minimal impact on the attained model accuracy. Moreover, it is possible to identify these samples early during training, which can greatly reduce the training time for these models.
APA, Harvard, Vancouver, ISO, and other styles
21

Karlsson, Daniel. "Classifying sport videos with deep neural networks." Thesis, Umeå universitet, Institutionen för datavetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-130654.

Full text
Abstract:
This project aims to apply deep neural networks to classify video clips in applications used to streamline advertisements on the web. The system focuses on sport clips but can be expanded into other advertisement fields with lower accuracy and longer training times as a consequence. The main task was to find the neural network model best suited for classifying videos. To achieve this the field was researched and three network models were introduced to see how they could handle the videos. It was proposed that applying a recurrent LSTM structure at the end of an image classification network could make it well adapted to work with videos. The most popular image classification architectures are mostly convolutional neural networks and these structures are also the foundation of all three models. The results from the evaluation of the models as well as the research suggests that using a convolutional LSTM can bean efficient and powerful way of classifying videos. Further this project shows that by reducing the size of the input data with 25%, the training and evaluation time can be cut with around 50%. This comes at the cost of lower accuracy. However it is demonstrated that the performance loss can be compensated by considering more frames from the same videos during evaluation.
APA, Harvard, Vancouver, ISO, and other styles
22

Peng, Zeng. "Pedestrian Tracking by using Deep Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302107.

Full text
Abstract:
This project aims at using deep learning to solve the pedestrian tracking problem for Autonomous driving usage. The research area is in the domain of computer vision and deep learning. Multi-Object Tracking (MOT) aims at tracking multiple targets simultaneously in a video data. The main application scenarios of MOT are security monitoring and autonomous driving. In these scenarios, we often need to track many targets at the same time which is not possible with only object detection or single object tracking algorithms for their lack of stability and usability. Therefore we need to explore the area of multiple object tracking. The proposed method breaks the MOT into different stages and utilizes the motion and appearance information of targets to track them in the video data. We used three different object detectors to detect the pedestrians in frames, a person re-identification model as appearance feature extractor and Kalman filter as motion predictor. Our proposed model achieves 47.6% MOT accuracy and 53.2% in IDF1 score while the results obtained by the model without person re-identification module is only 44.8% and 45.8% respectively. Our experiment results indicate the fact that a robust multiple object tracking algorithm can be achieved by splitted tasks and improved by the representative DNN based appearance features.
Detta projekt syftar till att använda djupinlärning för att lösa problemet med att följa fotgängare för autonom körning. For ligger inom datorseende och djupinlärning. Multi-Objekt-följning (MOT) syftar till att följa flera mål samtidigt i videodata. de viktigaste applikationsscenarierna för MOT är säkerhetsövervakning och autonom körning. I dessa scenarier behöver vi ofta följa många mål samtidigt, vilket inte är möjligt med endast objektdetektering eller algoritmer för enkel följning av objekt för deras bristande stabilitet och användbarhet, därför måste utforska området för multipel objektspårning. Vår metod bryter MOT i olika steg och använder rörelse- och utseendinformation för mål för att spåra dem i videodata, vi använde tre olika objektdetektorer för att upptäcka fotgängare i ramar en personidentifieringsmodell som utseendefunktionsavskiljare och Kalmanfilter som rörelsesprediktor. Vår föreslagna modell uppnår 47,6 % MOT-noggrannhet och 53,2 % i IDF1 medan resultaten som erhållits av modellen utan personåteridentifieringsmodul är endast 44,8%respektive 45,8 %. Våra experimentresultat visade att den robusta algoritmen för multipel objektspårning kan uppnås genom delade uppgifter och förbättras av de representativa DNN-baserade utseendefunktionerna.
APA, Harvard, Vancouver, ISO, and other styles
23

Milner, Rosanna Margaret. "Using deep neural networks for speaker diarisation." Thesis, University of Sheffield, 2016. http://etheses.whiterose.ac.uk/16567/.

Full text
Abstract:
Speaker diarisation answers the question “who spoke when?” in an audio recording. The input may vary, but a system is required to output speaker labelled segments in time. Typical stages are Speech Activity Detection (SAD), speaker segmentation and speaker clustering. Early research focussed on Conversational Telephone Speech (CTS) and Broadcast News (BN) domains before the direction shifted to meetings and, more recently, broadcast media. The British Broadcasting Corporation (BBC) supplied data through the Multi-Genre Broadcast (MGB) Challenge in 2015 which showed the difficulties speaker diarisation systems have on broadcast media data. Diarisation is typically an unsupervised task which does not use auxiliary data or information to enhance a system. However, methods which do involve supplementary data have shown promise. Five semi-supervised methods are investigated which use a combination of inputs: different channel types and transcripts. The methods involve Deep Neural Networks (DNNs) for SAD, DNNs trained for channel detection, transcript alignment, and combinations of these approaches. However, the methods are only applicable when datasets contain the required inputs. Therefore, a method involving a pretrained Speaker Separation Deep Neural Network (ssDNN) is investigated which is applicable to every dataset. This technique performs speaker clustering and speaker segmentation using DNNs successfully for meeting data and with mixed results for broadcast media. The task of diarisation focuses on two aspects: accurate segments and speaker labels. The Diarisation Error Rate (DER) does not evaluate the segmentation quality as it does not measure the number of correctly detected segments. Other metrics exist, such as boundary and purity measures, but these also mask the segmentation quality. An alternative metric is presented based on the F-measure which considers the number of hypothesis segments correctly matched to reference segments. A deeper insight into the segment quality is shown through this metric.
APA, Harvard, Vancouver, ISO, and other styles
24

Karlsson, Jonas. "Auditory Classification of Carsby Deep Neural Networks." Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-355673.

Full text
Abstract:
This thesis explores the challenge of using deep neural networks to classify traits incars through sound recognition. These traits could include type of engine, model, or manufacturer of the car. The problem was approached by creating three different neural networks and evaluating their performance in classifying sounds of three different cars. The top scoring neural network achieved an accuracy of 61 percent, which is far from reaching the standard accuracy of modern speech recognition systems. The results do, however, show that there are some tendencies to the data that neural networks can learn. If the methods and networks presented in this report are further built upon, a greater classification performance may be achieved.
APA, Harvard, Vancouver, ISO, and other styles
25

Wang, Yuxuan. "Supervised Speech Separation Using Deep Neural Networks." The Ohio State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=osu1426366690.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Wu, Chunyang. "Structured deep neural networks for speech recognition." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/276084.

Full text
Abstract:
Deep neural networks (DNNs) and deep learning approaches yield state-of-the-art performance in a range of machine learning tasks, including automatic speech recognition. The multi-layer transformations and activation functions in DNNs, or related network variations, allow complex and difficult data to be well modelled. However, the highly distributed representations associated with these models make it hard to interpret the parameters. The whole neural network is commonly treated a ``black box''. The behaviours of activation functions and the meanings of network parameters are rarely controlled in the standard DNN training. Though a sensible performance can be achieved, the lack of interpretations to network structures and parameters causes better regularisation and adaptation on DNN models challenging. In regularisation, parameters have to be regularised universally and indiscriminately. For instance, the widely used L2 regularisation encourages all parameters to be zeros. In adaptation, it requires to re-estimate a large number of independent parameters. Adaptation schemes in this framework cannot be effectively performed when there are limited adaptation data. This thesis investigates structured deep neural networks. Special structures are explicitly designed, and they are imposed with desired interpretation to improve DNN regularisation and adaptation. For regularisation, parameters can be separately regularised based on their functions. For adaptation, parameters can be adapted in groups or partially adapted according to their roles in the network topology. Three forms of structured DNNs are proposed in this thesis. The contributions of these models are presented as follows. The first contribution of this thesis is the multi-basis adaptive neural network. This form of structured DNN introduces a set of parallel sub-networks with restricted connections. The design of restricted connectivity allows different aspects of data to be explicitly learned. Sub-network outputs are then combined, and this combination module is used as the speaker-dependent structure that can be robustly estimated for adaptation. The second contribution of this thesis is the stimulated deep neural network. This form of structured DNN relates and smooths activation functions in regions of the network. It aids the visualisation and interpretation of DNN models but also has the potential to reduce over-fitting. Novel adaptation schemes can be performed on it, taking advantages of the smooth property that the stimulated DNN offer. The third contribution of this thesis is the deep activation mixture model. Also, this form of structured DNN encourages the outputs of activation functions to achieve a smooth surface. The output of one hidden layer is explicitly modelled as the sum of a mixture model and a residual model. The mixture model forms an activation contour, and the residual model depicts fluctuations around this contour. The smoothness yielded by a mixture model helps to regularise the overall model and allows novel adaptation schemes.
APA, Harvard, Vancouver, ISO, and other styles
27

Zhang, Jeffrey M. Eng Massachusetts Institute of Technology. "Enhancing adversarial robustness of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122994.

Full text
Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 57-58).
Logit-based regularization and pretrain-then-tune are two approaches that have recently been shown to enhance adversarial robustness of machine learning models. In the realm of regularization, Zhang et al. (2019) proposed TRADES, a logit-based regularization optimization function that has been shown to improve upon the robust optimization framework developed by Madry et al. (2018) [14, 9]. They were able to achieve state-of-the-art adversarial accuracy on CIFAR10. In the realm of pretrain- then-tune models, Hendrycks el al. (2019) demonstrated that adversarially pretraining a model on ImageNet then adversarially tuning on CIFAR10 greatly improves the adversarial robustness of machine learning models. In this work, we propose Adversarial Regularization, another logit-based regularization optimization framework that surpasses TRADES in adversarial generalization. Furthermore, we explore the impact of trying different types of adversarial training on the pretrain-then-tune paradigm.
by Jeffry Zhang.
M. Eng.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
APA, Harvard, Vancouver, ISO, and other styles
28

Miglani, Vivek N. "Comparing learned representations of deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123048.

Full text
Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 63-64).
In recent years, a variety of deep neural network architectures have obtained substantial accuracy improvements in tasks such as image classification, speech recognition, and machine translation, yet little is known about how different neural networks learn. To further understand this, we interpret the function of a deep neural network used for classification as converting inputs to a hidden representation in a high dimensional space and applying a linear classifier in this space. This work focuses on comparing these representations as well as the learned input features for different state-of-the-art convolutional neural network architectures. By focusing on the geometry of this representation, we find that different network architectures trained on the same task have hidden representations which are related by linear transformations. We find that retraining the same network architecture with a different initialization does not necessarily lead to more similar representation geometry for most architectures, but the ResNeXt architecture consistently learns similar features and hidden representation geometry. We also study connections to adversarial examples and observe that networks with more similar hidden representation geometries also exhibit higher rates of adversarial example transferability.
by Vivek N. Miglani.
M. Eng.
M.Eng. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
APA, Harvard, Vancouver, ISO, and other styles
29

Bayer, Ali Orkan. "Semantic Language models with deep neural Networks." Doctoral thesis, Università degli studi di Trento, 2015. https://hdl.handle.net/11572/367784.

Full text
Abstract:
Spoken language systems (SLS) communicate with users in natural language through speech. There are two main problems related to processing the spoken input in SLS. The first one is automatic speech recognition (ASR) which recognizes what the user says. The second one is spoken language understanding (SLU) which understands what the user means. We focus on the language model (LM) component of SLS. LMs constrain the search space that is used in the search for the best hypothesis. Therefore, they play a crucial role in the performance of SLS. It has long been discussed that an improvement in the recognition performance does not necessarily yield a better understanding performance. Therefore, optimization of LMs for the understanding performance is crucial. In addition, long-range dependencies in languages are hard to handle with statistical language models. These two problems are addressed in this thesis. We investigate two different LM structures. The first LM that we investigate enable SLS to understand better what they recognize by searching the ASR hypotheses for the best understanding performance. We refer to these models as joint LMs. They use lexical and semantic units jointly in the LM. The second LM structure uses the semantic context of an utterance, which can also be described as “what the system understands†, to search for a better hypothesis that improves the recognition and the understanding performance. We refer to these models as semantic LMs (SELMs). SELMs use features that are based on a well established theory of lexical semantics, namely the theory of frame semantics. They incorporate the semantic features which are extracted from the ASR hypothesis into the LM and handle long-range dependencies by using the semantic relationships between words and semantic context. ASR noise is propagated to the semantic features, to suppress this noise we introduce the use of deep semantic encodings for semantic feature extraction. In this way, SELMs optimize both the recognition and the understanding performance.
APA, Harvard, Vancouver, ISO, and other styles
30

Bayer, Ali Orkan. "Semantic Language models with deep neural Networks." Doctoral thesis, University of Trento, 2015. http://eprints-phd.biblio.unitn.it/1578/1/bayer_thesis.pdf.

Full text
Abstract:
Spoken language systems (SLS) communicate with users in natural language through speech. There are two main problems related to processing the spoken input in SLS. The first one is automatic speech recognition (ASR) which recognizes what the user says. The second one is spoken language understanding (SLU) which understands what the user means. We focus on the language model (LM) component of SLS. LMs constrain the search space that is used in the search for the best hypothesis. Therefore, they play a crucial role in the performance of SLS. It has long been discussed that an improvement in the recognition performance does not necessarily yield a better understanding performance. Therefore, optimization of LMs for the understanding performance is crucial. In addition, long-range dependencies in languages are hard to handle with statistical language models. These two problems are addressed in this thesis. We investigate two different LM structures. The first LM that we investigate enable SLS to understand better what they recognize by searching the ASR hypotheses for the best understanding performance. We refer to these models as joint LMs. They use lexical and semantic units jointly in the LM. The second LM structure uses the semantic context of an utterance, which can also be described as “what the system understands”, to search for a better hypothesis that improves the recognition and the understanding performance. We refer to these models as semantic LMs (SELMs). SELMs use features that are based on a well established theory of lexical semantics, namely the theory of frame semantics. They incorporate the semantic features which are extracted from the ASR hypothesis into the LM and handle long-range dependencies by using the semantic relationships between words and semantic context. ASR noise is propagated to the semantic features, to suppress this noise we introduce the use of deep semantic encodings for semantic feature extraction. In this way, SELMs optimize both the recognition and the understanding performance.
APA, Harvard, Vancouver, ISO, and other styles
31

Elezi, Ismail <1991&gt. "Exploiting contextual information with deep neural networks." Doctoral thesis, Università Ca' Foscari Venezia, 2020. http://hdl.handle.net/10579/18453.

Full text
Abstract:
Context matters! Nevertheless, there has not been much research in exploiting contextual information in deep neural networks. For the most part, the entire usage of contextual information has been limited to recurrent neural networks. Attention models and capsule networks are two recent ways of introducing contextual information in non-recurrent models, however both of these algorithms have been developed after this work has started. In this thesis, we show that contextual information can be exploited in $2$ fundamentally different ways: implicitly and explicitly. In DeepScores project, where the usage of context is very important for the recognition of many tiny objects, we show that by carefully crafting convolutional architectures, we can achieve state-of-the-art results, while also being able to correctly distinguish between objects which are virtually identical, but have different meanings based on their surrounding. On parallel, we show that by implicitly designing algorithms (motivated from graph and game theory) which take into considerations the entire structure of the dataset, we can achieve state-of-the-art results in different topics like semi-supervised learning and similarity learning. To the best of our knowledge, we are the first to integrate graph-theoretical modules carefully crafted for the problem of similarity learning and whom are designed to consider contextual information, not only outperforming the other models, but also gaining a speed improvement while using a smaller number of parameters.
APA, Harvard, Vancouver, ISO, and other styles
32

RAGONESI, RUGGERO. "Addressing Dataset Bias in Deep Neural Networks." Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1069001.

Full text
Abstract:
Deep Learning has achieved tremendous success in recent years in several areas such as image classification, text translation, autonomous agents, to name a few. Deep Neural Networks are able to learn non-linear features in a data-driven fashion from complex, large scale datasets to solve tasks. However, some fundamental issues remain to be fixed: the kind of data that is provided to the neural network directly influences its capability to generalize. This is especially true when training and test data come from different distributions (the so called domain gap or domain shift problem): in this case, the neural network may learn a data representation that is representative for the training data but not for the test, thus performing poorly when deployed in actual scenarios. The domain gap problem is addressed by the so-called Domain Adaptation, for which a large literature was recently developed. In this thesis, we first present a novel method to perform Unsupervised Domain Adaptation. Starting from the typical scenario in which we dispose of labeled source distributions and an unlabeled target distribution, we pursue a pseudo-labeling approach to assign a label to the target data, and then, in an iterative way, we refine them using Generative Adversarial Networks. Subsequently, we faced the debiasing problem. Simply speaking, bias occurs when there are factors in the data which are spuriously correlated with the task label, e.g., the background, which might be a strong clue to guess what class is depicted in an image. When this happens, neural networks may erroneously learn such spurious correlations as predictive factors, and may therefore fail when deployed on different scenarios. Learning a debiased model can be done using supervision regarding the type of bias affecting the data, or can be done without any annotation about what are the spurious correlations. We tackled the problem of supervised debiasing -- where a ground truth annotation for the bias is given -- under the lens of information theory. We designed a neural network architecture that learns to solve the task while achieving at the same time, statistical independence of the data embedding with respect to the bias label. We finally addressed the unsupervised debiasing problem, in which there is no availability of bias annotation. we address this challenging problem by a two-stage approach: we first split coarsely the training dataset into two subsets, samples that exhibit spurious correlations and those that do not. Second, we learn a feature representation that can accommodate both subsets and an augmented version of them.
APA, Harvard, Vancouver, ISO, and other styles
33

Zheng, Xuebin. "Wavelet-based Graph Neural Networks." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/27989.

Full text
Abstract:
This thesis focuses on spectral-based graph neural networks (GNNs). In Chapter 2, we use multiresolution Haar-like wavelets to design a framework of GNNs which equips with graph convolution and pooling strategies. The resulting model is called MathNet whose wavelet transform matrix is constructed with a coarse-grained chain. So our proposed MathNet not only enjoys the multiresolution analysis from the Haar-like wavelets but also leverages the clustering information of the graph data. Furthermore, we develop a novel multiscale representation system for graph data, called decimated framelets, which form a localized tight frame on the graph in Chapter 3. Based on this, we establish decimated G-framelet transforms for the decomposition and reconstruction of the graph data at multi resolutions via a constructive data-driven filter bank. The graph framelets are built on a chain-based orthonormal basis that supports fast graph Fourier transforms. From this, we give a fast algorithm for the decimated G-framelet transforms, or FGT, that has linear computational complexity O (N) for a graph of size N. Finally, in Chapter 4, we present a new approach for assembling graph neural networks based on the undecimated framelet transforms which provide a multiscale representation for graph-structured data. With the framelet system, we can decompose the graph feature into low-pass and high-pass frequencies as extracted features for network training, which then defines an undecimated-framelet-based graph convolution UFGConv. The framelet decomposition naturally induces a graph pooling strategy UFGPool by aggregating the graph feature into low-pass and high-pass spectra, which considers both the feature values and geometry of the graph data and conserves the total information. Moreover, we propose shrinkage as a new activation for UFGConv, which thresholds the high-frequency information at different scales.
APA, Harvard, Vancouver, ISO, and other styles
34

Pons, Puig Jordi. "Deep neural networks for music and audio tagging." Doctoral thesis, Universitat Pompeu Fabra, 2019. http://hdl.handle.net/10803/668036.

Full text
Abstract:
Automatic music and audio tagging can help increase the retrieval and re-use possibilities of many audio databases that remain poorly labeled. In this dissertation, we tackle the task of music and audio tagging from the deep learning perspective and, within that context, we address the following research questions: (i) Which deep learning architectures are most appropriate for (music) audio signals? (ii) In which scenarios is waveform-based end-to-end learning feasible? (iii) How much data is required for carrying out competitive deep learning research? In pursuit of answering research question (i), we propose to use musically motivated convolutional neural networks as an alternative to designing deep learning models that is based on domain knowledge, and we evaluate several deep learning architectures for audio at a low computational cost with a novel methodology based on non-trained (randomly weighted) convolutional neural networks. Throughout our work, we find that employing music and audio domain knowledge during the model’s design can help improve the efficiency, interpretability, and performance of spectrogram-based deep learning models. For research questions (ii) and (iii), we perform a study with the SampleCNN, a recently proposed end-to-end learning model, to assess its viability for music audio tagging when variable amounts of training data —ranging from 25k to 1.2M songs— are available. We compare the SampleCNN against a spectrogram-based architecture that is musically motivated and conclude that, given enough data, end-to-end learning models can achieve better results. Finally, throughout our quest for answering research question (iii), we also investigate whether a naive regularization of the solution space, prototypical networks, transfer learning, or their combination, can foster deep learning models to better leverage a small number of training examples. Results indicate that transfer learning and prototypical networks are powerful strategies in such low-data regimes.
L’etiquetatge automàtic d’àudio i de música pot augmentar les possibilitats de reutilització de moltes de les bases de dades d’àudio que romanen pràcticament sense etiquetar. En aquesta tesi, abordem la tasca de l’etiquetatge automàtic d’àudio i de música des de la perspectiva de l’aprenentatge profund i, en aquest context, abordem les següents qüestions cientı́fiques: (i) Quines arquitectures d’aprenentatge profund són les més adients per a senyals d’àudio (musicals)? (ii) En quins escenaris és viable que els models d’aprenentatge profund processin directament formes d’ona? (iii) Quantes dades es necessiten per dur a terme estudis d’investigació en aprenentatge profund? Per tal de respondre a la primera pregunta (i), proposem utilitzar xarxes neuronals convolucionals motivades musicalment i avaluem diverses arquitectures d’aprenentatge profund per a àudio a un baix cost computacional. Al llarg de les nostres investigacions, trobem que els coneixements previs que tenim sobre la música i l’àudio ens poden ajudar a millorar l’eficiència, la interpretabilitat i el rendiment dels models d’aprenentatge basats en espectrogrames. Per a les preguntes (ii – iii) estudiem com el SampleCNN, un model d’aprenentatge profund que processa formes d’ona, funciona quan disposem de quantitats variables de dades d’entrenament — des de 25k cançons fins a 1’2M cançons. En aquest estudi, comparem el SampleCNN amb una arquitectura basada en espectrogrames que està motivada musicalment. Els resultats experimentals que obtenim indiquen que, en escenaris on disposem de suficients dades, els models d’aprenentatge profund que processen formes d’ona (com el SampleCNN) poden aconseguir millors resultats que els que processen espectrogrames. Finalment, per tal d’intentar respondre a la pregunta (iii), també investiguem si una regularització severa de l’espai de solucions, les xarxes prototipades, l’aprenentatge per transferència de coneixement, o la seva combinació, poden permetre als models d’aprenentatge profund obtenir més bons resultats en escenaris on no hi ha gaires dades d’entrenament. Els resultats dels nostres experiments indiquen que l’aprenentatge per transferència de coneixement i les xarxes prototipades són estratègies útils quan les dades d’entrenament no són abundants.
APA, Harvard, Vancouver, ISO, and other styles
35

Purmonen, Sami. "Predicting Game Level Difficulty Using Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217140.

Full text
Abstract:
We explored the usage of Monte Carlo tree search (MCTS) and deep learning in order to predict game level difficulty in Candy Crush Saga (Candy) measured as number of attempts per success. A deep neural network (DNN) was trained to predict moves from game states from large amounts of game play data. The DNN played a diverse set of levels in Candy and a regression model was fitted to predict human difficulty from bot difficulty. We compared our results to an MCTS bot. Our results show that the DNN can make estimations of game level difficulty comparable to MCTS in substantially shorter time.
Vi utforskade användning av Monte Carlo tree search (MCTS) och deep learning för attuppskatta banors svårighetsgrad i Candy Crush Saga (Candy). Ett deep neural network(DNN) tränades för att förutse speldrag från spelbanor från stora mängder speldata. DNN:en spelade en varierad mängd banor i Candy och en modell byggdes för att förutsemänsklig svårighetsgrad från DNN:ens svårighetsgrad. Resultatet jämfördes medMCTS. Våra resultat indikerar att DNN:ens kan göra uppskattningar jämförbara medMCTS men på substantiellt kortare tid.
APA, Harvard, Vancouver, ISO, and other styles
36

Winsnes, Casper. "Automatic Subcellular Protein Localization Using Deep Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-189991.

Full text
Abstract:
Protein localization is an important part in understanding the functionality of a protein. The current method of localizing proteins is to manually annotate microscopy images. This thesis investigates the feasibility of using deep artificial neural networks to automatically classify subcellular protein locations based on immunoflourescent images. We investigate the applicability in both single-label and multi-label classification, as well as cross cell line classification. We show that deep single-label neural networks can be used for protein localization with up to 73% accuracy. We also show the potential of deep multi-label neural networks for protein localization and cross cell line classification but conclude that more research is needed before we can say for certain that the method is applicable.
APA, Harvard, Vancouver, ISO, and other styles
37

Pitkänen, P. (Perttu). "Automatic image quality enhancement using deep neural networks." Master's thesis, University of Oulu, 2019. http://jultika.oulu.fi/Record/nbnfioulu-201904101454.

Full text
Abstract:
Abstract. Photo retouching can significantly improve image quality and it is considered an essential part of photography. Traditionally this task has been completed manually with special image enhancement software. However, recent research utilizing neural networks has been proven to perform better in the automated image enhancement task compared to traditional methods. During the literature review of this thesis, multiple automatic neural-network-based image enhancement methods were studied, and one of these methods was chosen for closer examination and evaluation. The chosen network design has several appealing qualities such as the ability to learn both local and global enhancements, and its simple architecture constructed for efficient computational speed. This research proposes a novel dataset generation method for automated image enhancement research, and tests its usefulness with the chosen network design. This dataset generation method simulates commonly occurring photographic errors, and the original high-quality images can be used as the target data. This dataset design allows studying fixes for individual and combined aberrations. The underlying idea of this design choice is that the network would learn to fix these aberrations while producing aesthetically pleasing and consistent results. The quantitative evaluation proved that the network can learn to counter these errors, and with greater effort, it could also learn to enhance all of these aspects simultaneously. Additionally, the network’s capability of learning local and portrait specific enhancement tasks were evaluated. The models can apply the effect successfully, but the results did not gain the same level of accuracy as with global enhancement tasks. According to the completed qualitative survey, the images enhanced by the proposed general enhancement model can successfully enhance the image quality, and it can perform better than some of the state-of-the-art image enhancement methods.Automaattinen kuvanlaadun parantaminen käyttämällä syviä neuroverkkoja. Tiivistelmä. Manuaalinen valokuvien käsittely voi parantaa kuvanlaatua huomattavasti ja sitä pidetään oleellisena osana valokuvausprosessia. Perinteisesti tätä tehtävää varten on käytetty erityisiä manuaalisesti operoitavia kuvankäsittelyohjelmia. Nykytutkimus on kuitenkin todistanut neuroverkkojen paremmuuden automaattisessa kuvanparannussovelluksissa perinteisiin menetelmiin verrattuna. Tämän diplomityön kirjallisuuskatsauksessa tutkittiin useita neuroverkkopohjaisia kuvanparannusmenetelmiä, ja yksi näistä valittiin tarkempaa tutkimusta ja arviointia varten. Valitulla verkkomallilla on useita vetoavia ominaisuuksia, kuten paikallisten sekä globaalien kuvanparannusten oppiminen ja sen yksinkertaistettu arkkitehtuuri, joka on rakennettu tehokasta suoritusnopeutta varten. Tämä tutkimus esittää uuden opetusdatan generointimenetelmän automaattisia kuvanparannusmetodeja varten, ja testaa sen soveltuvuutta käyttämällä valittua neuroverkkorakennetta. Tämä opetusdatan generointimenetelmä simuloi usein esiintyviä valokuvauksellisia virheitä, ja alkuperäisiä korkealaatuisia kuvia voi käyttää opetuksen tavoitedatana. Tämän generointitavan avulla voitiin tutkia erillisten valokuvausvirheiden, sekä näiden yhdistelmän korjausta. Tämän menetelmän tarkoitus oli opettaa verkkoa korjaamaan erilaisia virheitä sekä tuottamaan esteettisesti miellyttäviä ja yhtenäisiä tuloksia. Kvalitatiivinen arviointi todisti, että käytetty neuroverkko kykenee oppimaan erillisiä korjauksia näille virheille. Neuroverkko pystyy oppimaan myös mallin, joka korjaa kaikkia ennalta määrättyjä virheitä samanaikaisesti, mutta alhaisemmalla tarkkuudella. Lisäksi neuroverkon kyvykkyyttä oppia paikallisia muotokuvakohtaisia kuvanparannuksia arvioitiin. Koulutetut mallit pystyvät myös toteuttamaan paikallisen kuvanparannuksen onnistuneesti, mutta nämä mallit eivät yltäneet globaalien parannusten tasolle. Toteutetun kyselytutkimuksen mukaan esitetty yleisen kuvanparannuksen malli pystyy parantamaan kuvanlaatua onnistuneesti, sekä tuottaa parempia tuloksia kuin osa vertailluista kuvanparannustekniikoista.
APA, Harvard, Vancouver, ISO, and other styles
38

Wu, Jimmy M. Eng Massachusetts Institute of Technology. "Robotic object pose estimation with deep neural networks." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119699.

Full text
Abstract:
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 39-45).
In this work, we introduce pose interpreter networks for 6-DoF object pose estimation. In contrast to other CNN-based approaches to pose estimation that require expensively-annotated object pose data, our pose interpreter network is trained entirely on synthetic data. We use object masks as an intermediate representation to bridge real and synthetic. We show that when combined with a segmentation model trained on RGB images, our synthetically-trained pose interpreter network is able to generalize to real data. Our end-to-end system for object pose estimation runs in real-time (20 Hz) on live RGB data, without using depth information or ICP refinement.
by Jimmy Wu.
M. Eng.
APA, Harvard, Vancouver, ISO, and other styles
39

Paula, Thomas da Silva. "Contributions in face detection with deep neural networks." Pontif?cia Universidade Cat?lica do Rio Grande do Sul, 2017. http://tede2.pucrs.br/tede2/handle/tede/7563.

Full text
Abstract:
Submitted by Caroline Xavier (caroline.xavier@pucrs.br) on 2017-07-04T12:23:43Z No. of bitstreams: 1 DIS_THOMAS_DA_SILVA_PAULA_COMPLETO.pdf: 10601063 bytes, checksum: f63f9b6e33e22c4a2553f784a3a029e1 (MD5)
Made available in DSpace on 2017-07-04T12:23:44Z (GMT). No. of bitstreams: 1 DIS_THOMAS_DA_SILVA_PAULA_COMPLETO.pdf: 10601063 bytes, checksum: f63f9b6e33e22c4a2553f784a3a029e1 (MD5) Previous issue date: 2017-03-28
Reconhecimento facial ? um dos assuntos mais estudos no campo de Vis?o Computacional. Dada uma imagem arbitr?ria ou um frame arbitr?rio, o objetivo do reconhecimento facial ? determinar se existem faces na imagem e, se existirem, obter a localiza??o e a extens?o de cada face encontrada. Tal detec??o ? facilmente feita por seres humanos, por?m continua sendo um desafio em Vis?o Computacional. O alto grau de variabilidade e a dinamicidade da face humana tornam-a dif?cil de detectar, principalmente em ambientes complexos. Recentementemente, abordagens de Aprendizado Profundo come?aram a ser utilizadas em tarefas de Vis?o Computacional com bons resultados. Tais resultados abriram novas possibilidades de pesquisa em diferentes aplica??es, incluindo Reconhecimento Facial. Embora abordagens de Aprendizado Profundo tenham sido aplicadas com sucesso para tal tarefa, a maior parte das implementa??es estado da arte utilizam detectores faciais off-the-shelf e n?o avaliam as diferen?as entre eles. Em outros casos, os detectores faciais s?o treinados para m?ltiplas tarefas, como detec??o de pontos fiduciais, detec??o de idade, entre outros. Portanto, n?s temos tr?s principais objetivos. Primeiramente, n?s resumimos e explicamos alguns avan?os do Aprendizado Profundo, detalhando como cada arquitetura e implementa??o funcionam. Depois, focamos no problema de detec??o facial em si, realizando uma rigorosa an?lise de alguns dos detectores existentes assim como algumas implementa??es nossas. N?s experimentamos e avaliamos varia??es de alguns hiper-par?metros para cada um dos detectores e seu impacto em diferentes bases de dados. N?s exploramos tanto implementa??es tradicionais quanto mais recentes, al?m de implementarmos nosso pr?prio detector facial. Por fim, n?s implementamos, testamos e comparamos uma abordagem de meta-aprendizado para detec??o facial, que visa aprender qual o melhor detector facial para uma determinada imagem. Nossos experimentos contribuem para o entendimento do papel do Aprendizado Profundo em detec??o facial, assim como os detalhes relacionados a mudan?a de hiper-par?metros dos detectores faciais e seu impacto no resultado da detec??o facial. N?s tamb?m mostramos o qu?o bem features obtidas com redes neurais profundas ? treinadas em bases de dados de prop?sito geral ? combinadas com uma abordagem de meta-aprendizado, se aplicam a detec??o facial. Nossos experimentos e conclus?es mostram que o aprendizado profundo possui de fato um papel not?vel em detec??o facial.
Face Detection is one of the most studied subjects in the Computer Vision field. Given an arbitrary image or video frame, the goal of face detection is to determine whether there are any faces in the image and, if present, return the image location and the extent of each face. Such a detection is easily done by humans, but it is still a challenge within Computer Vision. The high degree of variability and the dynamicity of the human face makes it an object very difficult to detect, mainly in complex environments. Recently, Deep Learning approaches started to be applied for Computer Vision tasks with great results. They opened new research possibilities in different applications, including Face Detection. Even though Deep Learning has been successfully applied for such a task, most of the state-of-the-art implementations make use of off-the-shelf face detectors and do not evaluate differences among them. In other cases, the face detectors are trained in a multitask manner that includes face landmark detection, age detection, and so on. Hence, our goal is threefold. First, we summarize and explain many advances of deep learning, detailing how each different architecture and implementation work. Second, we focus on the face detection problem itself, performing a rigorous analysis of some of the existing face detectors as well as implementations of our own. We experiment and evaluate variations of hyper-parameters for each of the detectors and their impact in different datasets. We explore both traditional and more recent approaches, as well as implementing our own face detectors. Finally, we implement, test, and compare a meta learning approach for face detection, which aims to learn the best face detector for a given image. Our experiments contribute in understanding the role of deep learning in face detection as well as the subtleties of changing hyper-parameters of the face detectors and their impact in face detection. We also show how well features obtained with deep neural networks trained on a general-purpose dataset perform on a meta learning approach for face detection. Our experiments and conclusions show that deep learning has indeed a notable role in face detection.
APA, Harvard, Vancouver, ISO, and other styles
40

D'Amicantonio, Giacomo. "Improvements to knowledge distillation of deep neural networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24178/.

Full text
Abstract:
One of the main problems in the field of Artificial Intelligence is the efficiency of neural networks models. In the past few years, it seemed that most tasks involving such models could simply be solved by designing larger, deeper models and training them on larger datasets for longer time. This approach requires better performing and therefore expensive and energy consuming hardware and will have an increasingly significant environmental impact when those models are deployed at scale. In 2015 G. Hinton, J. Dean and O. Vinyals presented Knowledge Distillation (KD), a technique that leveraged the logits produced by a big, cumbersome model to guide the training of a smaller model. The two networks were called “Teacher” and “Student” given the analogy between the big model with large knowledge and the small model which has yet to learn everything. They proved that it is possible to extract useful knowledge from the teacher logits and use it to obtain a better performing student when compared with the same model that learned all by itself. This thesis provides an overview of the current state-of-the-art in the field of Knowledge Distillation, analyses some of the most interesting approaches, and builds on them to exploit very confident logits in a more effective way. Furthermore, it provides experimental evidence on the importance of using also smaller logit entries and correcting mistaken predictions from the teacher in the distillation process.
APA, Harvard, Vancouver, ISO, and other styles
41

Conway, Alexander. "Deep neural networks for video classification in ecology." Master's thesis, University of Cape Town, 2020. http://hdl.handle.net/11427/32520.

Full text
Abstract:
Analyzing large volumes of video data is a challenging and time-consuming task. Automating this process would very valuable, especially in ecological research where massive amounts of video can be used to unlock new avenues of ecological research into the behaviour of animals in their environments. Deep Neural Networks, particularly Deep Convolutional Neural Networks, are a powerful class of models for computer vision. When combined with Recurrent Neural Networks, Deep Convolutional models can be applied to video for frame level video classification. This research studies two datasets: penguins and seals. The purpose of the research is to compare the performance of image-only CNNs, which treat each frame of a video independently, against a combined CNN-RNN approach; and to assess whether incorporating the motion information in the temporal aspect of video improves the accuracy of classifications in these two datasets. Video and image-only models offer similar out-of-sample performance on the simpler seals dataset but the video model led to moderate performance improvements on the more complex penguin action recognition dataset.
APA, Harvard, Vancouver, ISO, and other styles
42

Hocquet, Guillaume. "Class Incremental Continual Learning in Deep Neural Networks." Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPAST070.

Full text
Abstract:
Nous nous intéressons au problème de l'apprentissage continu de réseaux de neurones artificiels dans le cas où les données ne sont accessibles que pour une seule catégorie à la fois. Pour remédier au problème de l'oubli catastrophique qui limite les performances d'apprentissage dans ces conditions, nous proposons une approche basée sur la représentation des données d'une catégorie par une loi normale. Les transformations associées à ces représentations sont effectuées à l'aide de réseaux inversibles, qui peuvent alors être entraînés avec les données d'une seule catégorie. Chaque catégorie se voit attribuer un réseau pour représenter ses caractéristiques. Prédire la catégorie revient alors à identifier le réseau le plus représentatif. L'avantage d'une telle approche est qu'une fois qu'un réseau est entraîné, il n'est plus nécessaire de le mettre à jour par la suite, chaque réseau étant indépendant des autres. C'est cette propriété particulièrement avantageuse qui démarque notre méthode des précédents travaux dans ce domaine. Nous appuyons notre démonstration sur des expériences réalisées sur divers jeux de données et montrons que notre approche fonctionne favorablement comparé à l'état de l'art. Dans un second temps, nous proposons d'optimiser notre approche en réduisant son impact en mémoire en factorisant les paramètres des réseaux. Il est alors possible de réduire significativement le coût de stockage de ces réseaux avec une perte de performances limitée. Enfin, nous étudions également des stratégies pour produire des réseaux capables d'être réutilisés sur le long terme et nous montrons leur pertinence par rapport aux réseaux traditionnellement utilisés pour l'apprentissage continu
We are interested in the problem of continual learning of artificial neural networks in the case where the data are available for only one class at a time. To address the problem of catastrophic forgetting that restrain the learning performances in these conditions, we propose an approach based on the representation of the data of a class by a normal distribution. The transformations associated with these representations are performed using invertible neural networks, which can be trained with the data of a single class. Each class is assigned a network that will model its features. In this setting, predicting the class of a sample corresponds to identifying the network that best fit the sample. The advantage of such an approach is that once a network is trained, it is no longer necessary to update it later, as each network is independent of the others. It is this particularly advantageous property that sets our method apart from previous work in this area. We support our demonstration with experiments performed on various datasets and show that our approach performs favorably compared to the state of the art. Subsequently, we propose to optimize our approach by reducing its impact on memory by factoring the network parameters. It is then possible to significantly reduce the storage cost of these networks with a limited performance loss. Finally, we also study strategies to produce efficient feature extractor models for continual learning and we show their relevance compared to the networks traditionally used for continual learning
APA, Harvard, Vancouver, ISO, and other styles
43

Faraone, Julian. "Simplification Of Deep Neural Networks For Efficient Inference." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/25846.

Full text
Abstract:
In recent years, Deep Neural Networks (DNNs) have become an area of high interest due to it's ground-breaking results in many fields and applications. In many of these applications however, the model's runtime and memory cost of computing inference is more important than the cost of training the model. Inference is computationally expensive, making them difficult to deploy in constrained hardware environments. This has lead to an increasing interest in recent years for model compression techniques for these models. In this thesis, model compression techniques are presented for achieving efficient representations of DNNs for hardware acceleration. Firstly, a weight pruning technique to achieve unstructured sparse representations of bitwise DNNs is explored on the MNIST and CIFAR10 datasets. Accompanying this, is a hardware exploration of the resulting representations. Secondly, a hardware-aware filter pruning technique to achieve structured sparse representations of bitwise DNNs is investigated on the ImageNet dataset and hardware performance improvements are evaluated via a Field Programmable Gate Array (FPGA) implementation. Thirdly, a quantization method is introduced for training highly accurate bitwise networks with high computational efficiency on the ImageNet dataset. A hardware architecture is designed for this representation and its performance evaluated via FPGA simulations. Lastly, a custom arithmetic is designed which utilizes FPGA-optimized multipliers. Additionally, a training methodology is presented which is customized for DNN models to be compatible with the multiplier. Together, this work illustrates the effectiveness of designing DNNs with hardware in mind. Adjunctly, designing customized hardware helps in optimizing accuracy and hardware efficiency. This is very useful for many real-world DNN applications where hardware performance is paramount.
APA, Harvard, Vancouver, ISO, and other styles
44

Kan, Jichao. "Visual-Text Translation with Deep Graph Neural Networks." Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/23759.

Full text
Abstract:
Visual-text translation is to produce textual descriptions in natural languages from images and videos. In this thesis, we investigate two topics in the field: image captioning and continuous sign language recognition, by exploring structural representations of visual content. Image captioning is to generate text descriptions for a given image. Deep learning based methods have achieved impressive performance on this topic. However, the relations among objects in an image have not been fully explored. Thus, a topic-guided local-global graph neural network is proposed to extract graph properties at both local and global levels. The local features are built with visual objects, while the global features are characterized with topics, both modelled with two individual graphs. Experimental results on the MS-COCO dataset showed that our proposed method outperforms several state-of-the-art image captioning methods. Continuous Sign language recognition (SLR) takes video clips of a sign language sentence as input while producing a sentence as output in a natural language, which can be regarded as a machine translation problem. However, SLR is different from general machine translation problem because of the unique features of the input, e.g., facial expression and relationship among body parts. The facial and hand features can be extracted with neural networks while the interaction between body parts has not yet fully exploited. Therefore, a hierarchical spatio-temporal graph neural network is proposed, which takes both appearance and motion features into account and models the relationship between body parts with a hierarchical graph convolution network. Experimental results on two widely used datasets, PHOENIX-2014-T and Chinese Sign Language, show the effectiveness of our proposed method. In summary, our studies demonstrate structural representations with graph neural networks are helpful for improving the translation performance from visual content to text descriptions.
APA, Harvard, Vancouver, ISO, and other styles
45

Peterson, Joshua C. "Leveraging Deep Neural Networks to Study Human Cognition." Thesis, University of California, Berkeley, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10930700.

Full text
Abstract:

The majority of computational theories of inductive processes in psychology derive from small-scale experiments with simple stimuli that are easy to represent. However, real-world stimuli are complex, hard to represent efficiently, and likely require very different cognitive strategies to cope with. Indeed, the difficulty of such tasks are part of what make humans so impressive, yet methodological resources for modeling their solutions are limited. This presents a fundamental challenge to the precision of psychology as a science, especially if traditional laboratory methods fail to generalize. Recently, a number of computationally tractable, data-driven methods such as deep neural networks have emerged in machine learning for deriving useful representations of complex perceptual stimuli, but they are explicitly optimized in service to engineering objectives rather than modeling human cognition. It has remained unclear to what extent engineering models, while often state-of-the-art in terms of human-level task performance, can be leveraged to model, predict, and understand humans.

In the following, I outline a methodology by which psychological research can confidently leverage representations learned by deep neural networks to model and predict complex human behavior, potentially extending the scope of the field. In Chapter 1, I discuss the challenges to ecological validity in the laboratory that may be partially circumvented by technological advances and trends in machine learning, and weigh the advantages and disadvantages of bootstrapping from largely uninterpretable models. In Chapter 2, I contrast methods from psychology and machine learning for representing complex stimuli like images. Chapter 3 provides a first case study of applying deep neural networks to predict whether objects in a large database of images will be remembered by humans. Chapter 4 provides the central argument for using representations from deep neural networks as proxies for human psychological representations in general. To do this, I establish and demonstrate methods for quantifying their correspondence, improving their correspondence with minimal cost, and applying the result to the modeling of downstream cognitive processes. Building on this, Chapter 5 develops a method for modeling human subjective probability over deep representations in order to capture multimodal mental visual concepts such as "landscape". Finally, in Chapter 6, I discuss the implications of the overall paradigm espoused in the current work, along with the most crucial challenges ahead and potential ways forward. The overall endeavor is almost certainly a stepping stone to methods that may look very different in the near future, as the gains in leveraging machine learning methods are consolidated and made more interpretable/useful. The hope is that a synergy can be formed between the two fields, each bootstrapping and learning from the other.

APA, Harvard, Vancouver, ISO, and other styles
46

Lenc, Karel. "Representation of spatial transformations in deep neural networks." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:87a16dc2-9d77-49c3-8096-cf3416fa6893.

Full text
Abstract:
This thesis addresses the problem of investigating the properties and abilities of a variety of computer vision representations with respect to spatial geometric transformations. Our approach is to employ machine learning methods for finding the behaviour of existing image representations empirically and to apply deep learning to new computer vision tasks where the underlying spatial information is of importance. The results help to further the understanding of modern computer vision representations, such as convolutional neural networks (CNNs) in image classification and object detection and to enable their application to new domains such as local feature detection. Because our theoretical understanding of CNNs remains limited, we investigate two key mathematical properties of representations: equivariance (how transformations of the input image are encoded) and equivalence (how two representations, for example two different parameterizations, layers or architectures share the same visual information). A number of methods to establish these properties empirically are proposed. These methods reveal interesting aspects of their structure, including clarifying at which layers in a CNN geometric invariances are achieved and how various CNN architectures differ. We identify several predictors of geometric and architectural compatibility. Direct applications to structured-output regression are demonstrated as well. Local covariant feature detection has been difficult to approach with machine learning techniques. We propose the first fully general formulation for learning local covariant feature detectors which casts detection as a regression problem, enabling the use of powerful regressors such as deep neural networks. The derived covariance constraint can be used to automatically learn which visual structures provide stable anchors for local feature detection. We support these ideas theoretically, and show that existing detectors can be derived in this framework. Additionally, in cooperation with Imperial College London, we introduce a novel large-scale dataset for evaluation of local detectors and descriptors. It is suitable for training and testing modern local features, together with strictly defined evaluation protocols for descriptors in several tasks such as matching, retrieval and verification. The importance of pixel-wise image geometry for object detection is unknown as the best results used to be obtained with combination of CNNs with cues from image segmentation. We propose a detector which uses constant region proposals and, while it approximates objects poorly, we show that a bounding box regressor using intermediate convolutional features can recover sufficiently accurate bounding boxes, demonstrating that the required geometric information is contained in the CNN itself. Combined with other improvements, we obtain an excellent and fast detector that processes an image only with the CNN.
APA, Harvard, Vancouver, ISO, and other styles
47

Larsson, Susanna. "Monocular Depth Estimation Using Deep Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159981.

Full text
Abstract:
For a long time stereo-cameras have been deployed in visual Simultaneous Localization And Mapping (SLAM) systems to gain 3D information. Even though stereo-cameras show good performance, the main disadvantage is the complex and expensive hardware setup it requires, which limits the use of the system. A simpler and cheaper alternative are monocular cameras, however monocular images lack the important depth information. Recent works have shown that having access to depth maps in monocular SLAM system is beneficial since they can be used to improve the 3D reconstruction. This work proposes a deep neural network that predicts dense high-resolution depth maps from monocular RGB images by casting the problem as a supervised regression task. The network architecture follows an encoder-decoder structure in which multi-scale information is captured and skip-connections are used to recover details. The network is trained and evaluated on the KITTI dataset achieving results comparable to state-of-the-art methods. With further development, this network shows good potential to be incorporated in a monocular SLAM system to improve the 3D reconstruction.
APA, Harvard, Vancouver, ISO, and other styles
48

Venigalla, Abhinav S. "Strongly-transferring memorized examples in deep neural networks." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/123124.

Full text
Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: M. Eng. in Computer Science and Engineering, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 59-60).
Training deep neural networks requires large quantities of labeled training data, on the order of thousands of examples per class. These requirements make model training both time-consuming and expensive, which provides an incentive for adversaries to steal, or copy, other users' models. In this work, we examine a recent defense method called neural network watermarking via memorized examples, where an owner intentionally trains his model to mislabel particular inputs. We try to isolate the mechanism by which memorized examples are learned by a model in order to better evaluate their robustness. We find that memorized examples are indeed strongly embedded in trained models and actually transfer to stolen models under one form of model stealing. When access to local input-logit gradient information is used by an attacker, the stolen model also learns to mislabel the memorized examples. We show that this transfer is robust to architecture mismatch and perturbations of the query set used for stealing. We present different possible mechanisms for memorized example transfer and find that local input geometry is insufficient to explain the phenomenon. Finally, we describe a simple method for a model owner to boost the transfer rate of memorized examples, increasing their effectiveness as a defense against model stealing.
by Abhinav S. Venigalla.
M. Eng. in Computer Science and Engineering
M.Eng.inComputerScienceandEngineering Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
APA, Harvard, Vancouver, ISO, and other styles
49

Boukli, Hacene Ghouthi. "Processing and learning deep neural networks on chip." Thesis, Ecole nationale supérieure Mines-Télécom Atlantique Bretagne Pays de la Loire, 2019. http://www.theses.fr/2019IMTA0153/document.

Full text
Abstract:
Dans le domaine de l'apprentissage machine, les réseaux de neurones profonds sont devenus la référence incontournable pour un très grand nombre de problèmes. Ces systèmes sont constitués par un assemblage de couches, lesquelles réalisent des traitements élémentaires, paramétrés par un grand nombre de variables. À l'aide de données disponibles pendant une phase d'apprentissage, ces variables sont ajustées de façon à ce que le réseau de neurones réponde à la tâche donnée. Il est ensuite possible de traiter de nouvelles données. Si ces méthodes atteignent les performances à l'état de l'art dans bien des cas, ils reposent pour cela sur un très grand nombre de paramètres, et donc des complexités en mémoire et en calculs importantes. De fait, ils sont souvent peu adaptés à l'implémentation matérielle sur des systèmes contraints en ressources. Par ailleurs, l'apprentissage requiert de repasser sur les données d'entraînement plusieurs fois, et s'adapte donc difficilement à des scénarios où de nouvelles informations apparaissent au fil de l'eau. Dans cette thèse, nous nous intéressons dans un premier temps aux méthodes permettant de réduire l'impact en calculs et en mémoire des réseaux de neurones profonds. Nous proposons dans un second temps des techniques permettant d'effectuer l'apprentissage au fil de l'eau, dans un contexte embarqué
In the field of machine learning, deep neural networks have become the inescapablereference for a very large number of problems. These systems are made of an assembly of layers,performing elementary operations, and using a large number of tunable variables. Using dataavailable during a learning phase, these variables are adjusted such that the neural networkaddresses the given task. It is then possible to process new data.To achieve state-of-the-art performance, in many cases these methods rely on a very largenumber of parameters, and thus large memory and computational costs. Therefore, they are oftennot very adapted to a hardware implementation on constrained resources systems. Moreover, thelearning process requires to reuse the training data several times, making it difficult to adapt toscenarios where new information appears on the fly.In this thesis, we are first interested in methods allowing to reduce the impact of computations andmemory required by deep neural networks. Secondly, we propose techniques for learning on thefly, in an embedded context
APA, Harvard, Vancouver, ISO, and other styles
50

Kalchbrenner, Nal. "Encoder-decoder neural networks." Thesis, University of Oxford, 2017. http://ora.ox.ac.uk/objects/uuid:d56e48db-008b-4814-bd82-a5d612000de9.

Full text
Abstract:
This thesis introduces the concept of an encoder-decoder neural network and develops architectures for the construction of such networks. Encoder-decoder neural networks are probabilistic conditional generative models of high-dimensional structured items such as natural language utterances and natural images. Encoder-decoder neural networks estimate a probability distribution over structured items belonging to a target set conditioned on structured items belonging to a source set. The distribution over structured items is factorized into a product of tractable conditional distributions over individual elements that compose the items. The networks estimate these conditional factors explicitly. We develop encoder-decoder neural networks for core tasks in natural language processing and natural image and video modelling. In Part I, we tackle the problem of sentence modelling and develop deep convolutional encoders to classify sentences; we extend these encoders to models of discourse. In Part II, we go beyond encoders to study the longstanding problem of translating from one human language to another. We lay the foundations of neural machine translation, a novel approach that views the entire translation process as a single encoder-decoder neural network. We propose a beam search procedure to search over the outputs of the decoder to produce a likely translation in the target language. Besides known recurrent decoders, we also propose a decoder architecture based solely on convolutional layers. Since the publication of these new foundations for machine translation in 2013, encoder-decoder translation models have been richly developed and have displaced traditional translation systems both in academic research and in large-scale industrial deployment. In services such as Google Translate these models process in the order of a billion translation queries a day. In Part III, we shift from the linguistic domain to the visual one to study distributions over natural images and videos. We describe two- and three- dimensional recurrent and convolutional decoder architectures and address the longstanding problem of learning a tractable distribution over high-dimensional natural images and videos, where the likely samples from the distribution are visually coherent. The empirical validation of encoder-decoder neural networks as state-of- the-art models of tasks ranging from machine translation to video prediction has a two-fold significance. On the one hand, it validates the notions of assigning probabilities to sentences or images and of learning a distribution over a natural language or a domain of natural images; it shows that a probabilistic principle of compositionality, whereby a high- dimensional item is composed from individual elements at the encoder side and whereby a corresponding item is decomposed into conditional factors over individual elements at the decoder side, is a general method for modelling cognition involving high-dimensional items; and it suggests that the relations between the elements are best learnt in an end-to-end fashion as non-linear functions in distributed space. On the other hand, the empirical success of the networks on the tasks characterizes the underlying cognitive processes themselves: a cognitive process as complex as translating from one language to another that takes a human a few seconds to perform correctly can be accurately modelled via a learnt non-linear deterministic function of distributed vectors in high-dimensional space.
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography