Dissertations / Theses on the topic 'Deep learning architecture'


Create an accurate reference in APA, MLA, Chicago, Harvard, and other citation styles.

Consult the top 50 dissertations / theses for your research on the topic 'Deep learning architecture.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Glatt, Ruben [UNESP]. "Deep learning architecture for gesture recognition." Universidade Estadual Paulista (UNESP), 2014. http://hdl.handle.net/11449/115718.

Abstract:
Activity recognition from computer vision plays an important role in research towards applications like human-computer interfaces, intelligent environments, surveillance, or medical systems. In this work, a gesture recognition system based on a deep learning architecture is proposed. It is used to analyze the performance when trained with multi-modal input data on an Italian sign language dataset. The underlying research area is the field of human-machine interaction. It combines research on natural user interfaces, gesture and activity recognition, machine learning, and the sensor technologies used to capture environmental input for further processing. These areas are introduced and their basic concepts are described. The development environment for preprocessing data and programming machine learning algorithms in Python is described, and the main libraries are discussed. The gathering of the multi-modal data streams is explained and the dataset used is outlined. The proposed learning architecture consists of two steps: the preprocessing of the input data and the actual learning architecture. The preprocessing is limited to three different strategies, which are combined to offer six different preprocessing profiles. In the second step, a Deep Belief Network is introduced and its components are explained. With this setup, 294 experiments are conducted with varying configuration settings. The variables that are altered are the preprocessing settings, the layer structure of the model, the pretraining learning rate, and the fine-tuning learning rate. The evaluation of these experiments shows that the approach of using a deep learning architecture on an activity or gesture recognition task yields acceptable results, but has not yet reached a level of maturity that would allow the developed models to be used in serious applications.
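As a rough illustration of the training scheme described in this abstract, the sketch below performs greedy layer-wise pretraining of a small Deep Belief Network built from stacked RBMs; the layer sizes, learning rate, and data are hypothetical stand-ins rather than the thesis's configuration, and the supervised fine-tuning stage is omitted.

```python
# Minimal DBN pretraining sketch (hypothetical sizes and data, not the thesis setup).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.01):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.lr = lr  # pretraining learning rate, one of the varied hyperparameters

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def train_step(self, v0):
        # One step of contrastive divergence (CD-1), using probabilities throughout.
        h0 = self.hidden_probs(v0)
        v1 = sigmoid(h0 @ self.W.T + self.b_v)
        h1 = self.hidden_probs(v1)
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# Greedy layer-wise pretraining on stand-in "gesture" feature vectors.
X = rng.random((256, 64))
layers = [RBM(64, 32), RBM(32, 16)]
data = X
for rbm in layers:
    for _ in range(10):                # a few pretraining epochs per layer
        rbm.train_step(data)
    data = rbm.hidden_probs(data)      # activations feed the next layer
```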
2

Glatt, Ruben. "Deep learning architecture for gesture recognition." Guaratinguetá, 2014. http://hdl.handle.net/11449/115718.

Advisor: José Celso Freire Junior
Co-advisor: Daniel Julien Barros da Silva Sampaio
Committee member: Galeno José de Sena
Committee member: Luiz de Siqueira Martins Filho
Abstract: identical to entry 1 above.
Master's degree
3

Salman, Ahmad. "Learning speaker-specific characteristics with deep neural architecture." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/learning-speakerspecific-characteristics-with-deep-neural-architecture(24acb31d-2106-4e52-80ab-6c649838026a).html.

Abstract:
Robust Speaker Recognition (SR) has long been a focus of attention for researchers. The advancement of speech-aided technologies, especially biometrics, highlights the necessity of foolproof SR systems. However, the performance of an SR system critically depends on the quality of the speech features used to represent the speaker-specific information. This research aims at extracting the speaker-specific information from Mel-frequency Cepstral Coefficients (MFCCs) using deep learning. Speech is a mixture of various information components, including linguistic, speaker-specific, and emotional-state information. Extracting features for each information component is essential for robust performance in different speech-related tasks. However, almost all forms of speech representation carry all the information as a whole, which is responsible for the compromised performance of SR systems. Motivated by the ability of deep architectures to solve complex problems by learning high-level task-specific information from the data, we propose a novel Deep Neural Architecture (DNA) to extract speaker-specific information (SI) from MFCCs, a popular frequency-domain speech signal representation. A two-stage learning strategy is adopted, based on unsupervised training for network initialisation followed by regularised contrastive learning. To train our network in the second stage, we devise a contrastive loss function to discriminate the speakers on the basis of their intrinsic statistical patterns, distributed in the representations yielded by our deep network. This is achieved by contrastive pair-wise comparison of these representations for similar or dissimilar speakers. To improve generalisation and reduce the interference of environmental effects with the speaker-specific representation, we regulate the contrastive loss with the data reconstruction loss in a multi-objective optimisation. A detailed study is carried out to analyse the parametric space for training the proposed deep architecture for optimum performance. Finally, we compare the performance of our learned speaker-specific representations with several state-of-the-art techniques in speaker verification and speaker segmentation tasks. It is evident that the representations acquired through the learned DNA are invariant and comparatively less sensitive to text, language, and environmental variability.
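The multi-objective optimisation described above, a contrastive loss over pairs of speaker embeddings regulated by a data reconstruction loss, can be sketched as follows; the network shapes, margin, and weighting factor are illustrative assumptions, not the thesis's actual architecture.

```python
# Hedged sketch: contrastive loss on speaker embeddings plus reconstruction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeakerNet(nn.Module):
    def __init__(self, n_mfcc=39, emb_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_mfcc, 128), nn.ReLU(),
                                     nn.Linear(128, emb_dim))
        self.decoder = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(),
                                     nn.Linear(128, n_mfcc))

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z)

def pairwise_loss(net, x1, x2, same_speaker, alpha=0.5, margin=1.0):
    z1, r1 = net(x1)
    z2, r2 = net(x2)
    d = F.pairwise_distance(z1, z2)
    # Contrastive term: pull same-speaker pairs together, push others apart.
    contrastive = torch.where(same_speaker, d.pow(2),
                              F.relu(margin - d).pow(2)).mean()
    # Reconstruction term regularises the embedding against nuisance variability.
    reconstruction = F.mse_loss(r1, x1) + F.mse_loss(r2, x2)
    return contrastive + alpha * reconstruction

net = SpeakerNet()
x1, x2 = torch.randn(8, 39), torch.randn(8, 39)   # stand-in MFCC frames
same = torch.randint(0, 2, (8,), dtype=torch.bool)
loss = pairwise_loss(net, x1, x2, same)
loss.backward()
```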
4

Goh, Hanlin. "Learning deep visual representations." Paris 6, 2013. http://www.theses.fr/2013PA066356.

Abstract:
Recent advances in the areas of deep learning and visual information processing have presented an opportunity to unite both fields. These complementary fields combine to tackle the problem of classifying images into their semantic categories. Deep learning brings learning and representational capabilities to a visual processing model that is adapted for image classification. This thesis addresses problems that lead to the proposal of learning deep visual representations for image classification. The problem of deep learning is tackled on two fronts. The first aspect is the problem of unsupervised learning of latent representations from input data. The main focus is the integration of prior knowledge into the learning of restricted Boltzmann machines (RBMs) through regularization. Regularizers are proposed to induce sparsity, selectivity and topographic organization in the coding to improve discrimination and invariance. The second direction introduces the notion of gradually transitioning from unsupervised layer-wise learning to supervised deep learning. This is done through the integration of bottom-up information with top-down signals. Two novel implementations supporting this notion are explored. The first method uses top-down regularization to train a deep network of RBMs. The second method combines predictive and reconstructive loss functions to optimize a stack of encoder-decoder networks. The proposed deep learning techniques are applied to tackle the image classification problem. The bag-of-words model is adopted due to its strengths in image modeling through the use of local image descriptors and spatial pooling schemes. Deep learning with spatial aggregation is used to learn a hierarchical visual dictionary for encoding the image descriptors into mid-level representations. This method achieves leading image classification performance for object and scene images. The learned dictionaries are diverse and non-redundant. The speed of inference is also high. From this, a further optimization is performed for the subsequent pooling step, by introducing a differentiable pooling parameterization and applying the error backpropagation algorithm. This thesis represents one of the first attempts to synthesize deep learning and the bag-of-words model. This union results in many challenging research problems, leaving much room for further study in this area.
5

Kola, Ramya Sree. "Generation of synthetic plant images using deep learning architecture." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18450.

Abstract:
Background: Generative Adversarial Networks (GANs) (Goodfellow et al., 2014) are the current state-of-the-art machine learning systems for data generation. The initial architecture proposal comprises two neural networks, a generator and a discriminator, which compete in a zero-sum game to generate data with realistic properties that are inseparable from those of the original datasets. GANs have interesting applications in various domains, such as image synthesis, 3D object generation in the gaming industry, music generation (Dong et al.), text-to-image synthesis, and many more. Despite this wide range of application domains, GANs are most popular for image data synthesis. Various architectures have been developed for image synthesis, evolving from fuzzy images of digits to photorealistic images. Objectives: In this research work, we study the literature on different GAN architectures to understand the significant work done to improve them. The primary objective of this research work is the synthesis of plant images using the StyleGAN (Karras, Laine and Aila, 2018) variant of GAN based on style transfer. The research also focuses on identifying machine learning performance evaluation metrics that can be used to measure the StyleGAN model on the generated image datasets. Methods: A mixed-method approach is used in this research. We review the literature on GANs and elaborate in detail how each GAN network is designed and how it evolved from the base architecture. We then study the StyleGAN design details, as well as related work on GAN model performance evaluation, and measure the quality of the generated image datasets. We conduct an experiment implementing the style-based GAN on a leaf dataset (Kumar et al., 2012) to generate leaf images that are similar to the ground truth, and describe in detail the steps of the experiment: data collection, preprocessing, training, and configuration. We also evaluate the performance of the StyleGAN training model on the leaf dataset. Results: We present the results of the literature review and the conducted experiment to address the research questions. We review various GAN architectures and their key contributions, along with numerous qualitative and quantitative evaluation metrics for measuring the performance of a GAN architecture. We then present the synthetic data samples generated by the style-based GAN learning model at various training GPU-hours, and the latest synthetic data sample after training for around 8 GPU-days on the Leafsnap dataset (Kumar et al., 2012). The results have decent enough quality to expand the dataset for most of the tested samples. We visualize the model performance with TensorBoard graphs and an overall computational graph for the learning model, and calculate the Fréchet Inception Distance (FID) score for our leaf StyleGAN, observed to be 26.4268 (the lower, the better). Conclusion: We conclude the research work with an overall review of the sections of the paper. The generated fake samples are very similar to the input ground truth and appear convincingly realistic to human visual judgement. However, the calculated FID score for the leaf StyleGAN is large compared to that of StyleGAN's original celebrity HD faces dataset. We analyze possible reasons for this large score.
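The Fréchet Inception Distance quoted in this abstract compares Gaussian statistics of real and generated feature sets. A minimal sketch of the computation, with random placeholder matrices standing in for the Inception activations of real and generated leaf images:

```python
# Hedged FID sketch: placeholder features, illustrative numbers only.
import numpy as np
from scipy import linalg

def fid(feats_real, feats_fake):
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    c1 = np.cov(feats_real, rowvar=False)
    c2 = np.cov(feats_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(c1 @ c2, disp=False)
    if np.iscomplexobj(covmean):        # numerical noise can yield tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 64))   # stand-in for Inception features of real leaves
fake = rng.normal(size=(500, 64))   # stand-in for features of generated leaves
print(f"FID: {fid(real, fake):.4f}")  # lower is better, as the abstract notes
```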
6

Xiao, Yao. "Vehicle Detection in Deep Learning." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/91375.

Abstract:
Computer vision techniques are becoming increasingly popular. For example, face recognition is used to help police find criminals, vehicle detection is used to help drivers avoid serious traffic accidents, and written word recognition is used to convert written words into printed words. Despite the rapid development of vehicle detection using deep learning techniques, there are still concerns about the performance of state-of-the-art vehicle detection techniques. For example, state-of-the-art vehicle detectors are restricted by the large variation of scales. People working on vehicle detection are developing techniques to solve this problem. This thesis proposes an advanced vehicle detection model adopting two classical neural networks: the residual neural network and the region proposal network. The model utilizes the residual neural network as a feature extractor and the region proposal network to detect potential objects' information.
Master of Science
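The combination described in this entry's abstract, a residual-network feature extractor feeding a region proposal network, follows the Faster R-CNN pattern. The sketch below uses torchvision's stock implementation (assuming torchvision 0.13 or later) as a stand-in, not the author's model or training setup:

```python
# Hedged sketch: ResNet backbone + region proposal network via torchvision's Faster R-CNN.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights=None, num_classes=2)  # background + vehicle
model.eval()

# One fake 3-channel image; real inputs would be road-scene photos.
images = [torch.rand(3, 480, 640)]
with torch.no_grad():
    detections = model(images)
# Each detection dict holds candidate boxes with class labels and scores.
print(detections[0]["boxes"].shape, detections[0]["scores"].shape)
```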
7

Tsardakas, Renhuldt Nikos. "Protein contact prediction based on the Tiramisu deep learning architecture." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-231494.

Abstract:
Experimentally determining protein structure is a hard problem, with applications in both medicine and industry. Predicting protein structure is also difficult. Predicted contacts between residues within a protein are helpful during protein structure prediction. Recent state-of-the-art models have used deep learning to improve protein contact prediction. This thesis presents a new deep learning model for protein contact prediction, TiramiProt. It is based on the Tiramisu deep learning architecture, and trained and evaluated on the same data as the PconsC4 protein contact prediction model. 228 models using different combinations of hyperparameters were trained until convergence. The final TiramiProt model performs on par with two current state-of-the-art protein contact prediction models, PconsC4 and RaptorX-Contact, across a range of different metrics. A Python package and a Singularity container for running TiramiProt are available at https://gitlab.com/nikos.t.renhuldt/TiramiProt.
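The Tiramisu architecture that TiramiProt builds on is composed of dense blocks, in which each layer's output is concatenated with its input so features are reused throughout the block. A minimal sketch of such a block, with arbitrary channel counts rather than TiramiProt's configuration:

```python
# Hedged sketch of a Tiramisu-style dense block (arbitrary sizes, not TiramiProt's).
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_ch, growth):
        super().__init__()
        self.net = nn.Sequential(nn.BatchNorm2d(in_ch), nn.ReLU(),
                                 nn.Conv2d(in_ch, growth, 3, padding=1))

    def forward(self, x):
        return torch.cat([x, self.net(x)], dim=1)  # concatenate, don't replace

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth=16, n_layers=4):
        super().__init__()
        layers = []
        for i in range(n_layers):
            layers.append(DenseLayer(in_ch + i * growth, growth))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

# Contact maps are 2D, so the block runs on image-like tensors.
x = torch.randn(1, 32, 64, 64)
print(DenseBlock(32)(x).shape)  # torch.Size([1, 96, 64, 64]): 32 + 4*16 channels
```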
8

Fayyazifar, Najmeh. "Deep learning and neural architecture search for cardiac arrhythmias classification." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2022. https://ro.ecu.edu.au/theses/2553.

Abstract:
Cardiovascular disease (CVD) is the primary cause of mortality worldwide. Among people with CVD, cardiac arrhythmias (changes in the natural rhythm of the heart) are a leading cause of death. The clinical routine for arrhythmia diagnosis includes acquiring an electrocardiogram (ECG) and manually reviewing the ECG trace to identify the arrhythmias. However, due to the varying expertise levels of clinicians, accurate diagnosis of arrhythmias with similar visual characteristics (which naturally exist in some different types of arrhythmias) can be challenging for some front-line clinicians. In addition, there is a shortage of trained cardiologists globally, especially in remote areas of Australia, where patients are sometimes required to wait weeks or months for a visiting cardiologist. This impacts the timely care of patients living in remote areas. Therefore, developing an AI-based model that assists clinicians in accurate real-time decision-making is an essential task. This thesis provides supporting evidence that the problem of delayed and/or inaccurate cardiac arrhythmia diagnosis can be addressed by designing accurate deep learning models through Neural Architecture Search (NAS). These models can automatically differentiate different types of arrhythmias in a timely manner. Many deep learning models, and more specifically Convolutional Neural Networks (CNNs), have been developed for automatic and accurate cardiac arrhythmia detection. However, these models are heavily hand-crafted, which means that designing an accurate model for a given task requires significant trial and error. In this thesis, the process of designing an accurate CNN model for 1-dimensional biomedical data classification is automated by applying NAS techniques. NAS is a recent research paradigm in which the process of designing an accurate model (for a given task) is automated by employing a search algorithm over a pre-defined search space of possible operations in a deep learning model. In this thesis, we developed a CNN model for detection of 'Atrial Fibrillation' (AF) among 'normal sinus rhythm', 'noise', and 'other arrhythmias'. This model is designed by employing a well-known NAS method, Efficient Neural Architecture Search (ENAS), which uses Reinforcement Learning (RL) to perform a search over common operations in a CNN structure. This CNN model outperformed state-of-the-art deep learning models for AF detection while minimizing human intervention in CNN structure design. In order to reduce the high computation time required by ENAS (and typically by RL-based NAS), in this thesis a recent NAS method called DARTS was utilized to design a CNN model for accurate diagnosis of a wider range of cardiac arrhythmias. This method employs Stochastic Gradient Descent (SGD) to perform the search procedure over a continuous and therefore differentiable search space. The search space (operations and building blocks) of DARTS was tailored to implement the search procedure over a public dataset of standard 12-lead ECG recordings containing 111 types of arrhythmias (released by the PhysioNet challenge, 2020). The performance of DARTS was further studied by utilizing it to differentiate two major sub-types of Wide QRS Complex Tachycardia (Ventricular Tachycardia, VT, vs. Supraventricular Tachycardia, SVT). These sub-types have similar visual characteristics, which makes differentiating between them challenging, even for experienced clinicians. This dataset is a unique collection of Wide Complex Tachycardia (WCT) recordings, collected by our medical collaborator (University of Ottawa Heart Institute) over the course of 11 years. The DARTS-derived model achieved 91% accuracy, outperforming cardiologists (77% accuracy) and state-of-the-art deep learning models (88% accuracy). Lastly, the efficacy of the original DARTS algorithm for the image classification task is empirically studied. Our experiments showed that the performance of the DARTS search algorithm does not deteriorate over the search course; however, the search procedure can be terminated earlier than designated in the original algorithm. In addition, the accuracy of the derived model could be further improved by modifying the original search operations (excluding the zero operation), making it highly valuable in a clinical setting.
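The continuous relaxation at the core of DARTS replaces each discrete operation choice with a softmax-weighted mixture whose weights are learned by gradient descent alongside the network weights. A minimal sketch with placeholder 1-D operations, not the tailored ECG search space of the thesis:

```python
# Hedged sketch of a DARTS mixed operation on 1-D signals (placeholder ops).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv1d(channels, channels, 3, padding=1),   # candidate op 1
            nn.Conv1d(channels, channels, 5, padding=2),   # candidate op 2
            nn.MaxPool1d(3, stride=1, padding=1),          # candidate op 3
            nn.Identity(),                                 # skip connection
        ])
        # One architecture parameter per candidate operation.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

x = torch.randn(2, 8, 500)          # batch of 8-channel, 500-sample signals
op = MixedOp(8)
y = op(x)
y.mean().backward()                  # gradients flow into alpha as well
print(op.alpha.grad)
```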
9

Qian, Xiaoye. "Wearable Computing Architecture over Distributed Deep Learning Hierarchy: Fall Detection Study." Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case156195574310931.

10

Ähdel, Victor. "On the effect of architecture on deep learning based features for homography estimation." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233194.

Abstract:
Keypoint detection and description is the first step of homography and essential matrix estimation, which in turn is used in Visual Odometry and Visual SLAM. This work explores the effect, in terms of speed and accuracy, of using different deep learning architectures for such keypoints. The fully convolutional networks, with heads for both the detector and the descriptor, are trained through an existing self-supervised method, where correspondences are obtained through known randomly sampled homographies. A new strategy for choosing negative correspondences for the descriptor loss is presented, which enables more flexibility in the architecture design. The new strategy turns out to be essential, as it enables networks that outperform the learnt baseline at no cost in inference time. Varying the model size leads to a trade-off between speed and accuracy, and while all models outperform ORB in homography estimation, only the larger models approach SIFT's performance, performing about 1-7% worse. Training for longer and with additional types of data might give the push needed to outperform SIFT. While the smallest models are 3× faster and use 50× fewer parameters than the learnt baseline, they still require 3× as much time as SIFT while performing about 10-30% worse. However, there is still room for improvement through optimization methods that go beyond architecture modification, e.g. quantization, which might make the method faster than SIFT.
11

Silvestri, Gianluigi. "One-Shot Neural Architecture Search for Deep Multi-Task Learning in Computer Vision." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-282831.

Abstract:
In this work, a neural architecture search algorithm for multi-task learning is proposed. Given any dataset and group of tasks, the method aims to find the optimal way of sharing layers among tasks in convolutional neural networks. A search space suited to multi-task learning is designed, and a novel strategy to rank different Pareto-optimal solutions is developed. The core of the algorithm is an adaptation of a state-of-the-art neural architecture search strategy. Experimental results on the Cityscapes dataset, on the tasks of semantic segmentation and depth estimation, do not provide the expected results. Despite the lack of stable results, this work lays the groundwork for further development of novel multi-task neural architecture search methods.
12

Manero, Font Jaume. "Deep learning architectures applied to wind time series multi-step forecasting." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669283.

Abstract:
Forecasting is a critical task for the integration of wind-generated energy into electricity grids. Numerical weather models applied to wind prediction work with grid sizes too large to reproduce all the local features that influence wind, making the use of time series of past observations a necessary tool for wind forecasting. This research work concerns the application of deep neural networks to multi-step forecasting using multivariate time series as input, to forecast wind speed 12 hours ahead. Wind time series are sequences of meteorological observations like wind speed, temperature, pressure, humidity, and direction. Wind series have two statistically relevant properties, non-linearity and non-stationarity, which make modelling with traditional statistical tools very inaccurate. In this thesis we design, test, and validate novel deep learning models for the wind energy prediction task, applying new deep architectures to the largest open wind data repository available, from the National Renewable Energy Laboratory of the US (NREL), with 126,692 wind sites evenly distributed over the US geography. The heterogeneity of the series, obtained from several data origins, allows us to draw conclusions about how well each model fits time series that range from highly stationary locations to variable sites in complex areas. We propose multi-layer, convolutional, and recurrent networks as basic building blocks, combined into heterogeneous architectures with different variants, trained with optimisation strategies like dropout and skip connections, early stopping, adaptive learning rates, and filters and kernels of different sizes, among others. The architectures are optimised by the use of structured hyper-parameter selection strategies to obtain the best-performing model across the whole dataset. The learning capabilities of the architectures applied to the various sites reveal relationships between site characteristics (terrain complexity, wind variability, geographical location) and model accuracy, establishing novel measures of site predictability that relate the fit of the models to indexes from spectral or stationarity analysis of the time series. The designed methods offer new and superior alternatives to traditional methods.
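As a hedged sketch of one of the basic building blocks named in the abstract, the model below maps a multivariate wind time series to a 12-step-ahead wind speed forecast with 1-D convolutions; the window length, input variables, and channel sizes are illustrative assumptions rather than the tuned architectures of the thesis.

```python
# Hedged sketch: multivariate window in, 12-step wind speed forecast out.
import torch
import torch.nn as nn

class WindCNN(nn.Module):
    def __init__(self, n_vars=5, window=48, horizon=12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_vars, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(32 * window, horizon)  # 12 future wind speeds

    def forward(self, x):            # x: (batch, n_vars, window)
        h = self.features(x)
        return self.head(h.flatten(1))

# 48 past hours of speed, temperature, pressure, humidity, direction.
x = torch.randn(16, 5, 48)
print(WindCNN()(x).shape)            # torch.Size([16, 12])
```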
13

Ferré, Paul. "Adéquation algorithme-architecture de réseaux de neurones à spikes pour les architectures matérielles massivement parallèles." Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30318/document.

Abstract:
The last decade has seen the re-emergence of machine learning methods based on formal neural networks under the name of deep learning. Although these methods have enabled major breakthroughs in machine learning, several obstacles to industrializing them persist, notably the need to collect and label a very large amount of data as well as the computing power necessary to perform learning and inference with this type of neural network. In this thesis, we study the fit between inference and learning algorithms derived from biological neural networks and massively parallel hardware architectures. We show with three contributions that such algorithm-architecture matching drastically accelerates the computation times inherent to neural networks. In our first axis, we study the adaptation of the BCVision software engine developed by Brainchip SAS to GPU platforms. We also propose the introduction of a coarse-to-fine architecture based on complex cells. We show that the GPU port accelerates processing by a factor of seven, while the coarse-to-fine architecture reaches a factor of one thousand. The second contribution presents three spike-propagation algorithms adapted to parallel architectures. We study the computational models of these algorithms exhaustively, allowing the selection or design of hardware systems adapted to the parameters of the desired network. In our third axis we present a method to apply the Spike-Timing-Dependent Plasticity rule to image data in order to learn visual representations in an unsupervised manner. We show that our approach allows the effective learning of a hierarchy of representations relevant to image classification tasks, while requiring ten times less data than other approaches in the literature.
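The Spike-Timing-Dependent Plasticity rule applied in the third axis strengthens a synapse when the presynaptic spike precedes the postsynaptic one and weakens it otherwise, with an exponential dependence on the timing difference. A minimal sketch with generic textbook constants, not the thesis's parameters:

```python
# Hedged STDP sketch: weight change as a function of spike timing difference.
import numpy as np

def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    dt = t_post - t_pre                      # ms
    if dt >= 0:                              # pre fired before post: potentiate
        return a_plus * np.exp(-dt / tau)
    return -a_minus * np.exp(dt / tau)       # post fired first: depress

for dt in (-40, -10, 0, 10, 40):
    print(f"dt = {dt:+d} ms -> dw = {stdp_dw(0, dt):+.5f}")
```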
14

Nemirovsky, Daniel A. "Improving heterogeneous system efficiency : architecture, scheduling, and machine learning." Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/461499.

Abstract:
Computer architects are beginning to embrace heterogeneous systems as an effective method to utilize increases in transistor densities for executing a diverse range of workloads under varying performance and energy constraints. As heterogeneous systems become more ubiquitous, architects will need to develop novel CPU scheduling techniques capable of exploiting the diversity of computational resources. In recognizing hardware diversity, state-of-the-art heterogeneous schedulers are able to produce significant performance improvements over their predecessors and enable more flexible system designs. Nearly all of these, however, are unable to efficiently identify the mapping schemes that will result in the highest system performance. Accurately estimating the performance of applications on different heterogeneous resources can give heterogeneous schedulers a significant advantage in identifying a performance-maximizing mapping scheme. Recent advances in machine learning techniques, including artificial neural networks, have led to the development of powerful and practical prediction models for a variety of fields. As of yet, however, no significant leaps have been taken towards employing machine learning for heterogeneous scheduling in order to maximize system throughput. The core issue we approach is how to understand and utilize the rise of heterogeneous architectures, the benefits of heterogeneous scheduling, and the promise of machine learning techniques with respect to maximizing system performance. We present studies that promote a future computing model capable of supporting massive hardware diversity, discuss the constraints faced by heterogeneous designers, explore the advantages and shortcomings of conventional heterogeneous schedulers, and pioneer the application of machine learning to optimize mapping and system throughput. The goal of this thesis is to highlight the importance of efficiently exploiting heterogeneity and to validate the opportunities that machine learning can offer for various areas in computer architecture.
15

Li, Yanxi. "Efficient Neural Architecture Search with an Active Performance Predictor." Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/24092.

Abstract:
This thesis searches for the optimal neural architecture by minimizing a proxy of the validation loss. Existing neural architecture search (NAS) methods discover the optimal neural architecture that best fits the validation examples given the up-to-date network weights. However, backpropagation over a number of validation examples can be time-consuming, especially when it needs to be repeated many times during the search. Though these intermediate validation results are invaluable, they would be wasted if we could not use them to predict the future from the past. In this thesis, we propose to approximate the validation loss landscape by learning a mapping from neural architectures to their corresponding validation losses. The optimal neural architecture can then be easily identified as the minimum of this proxy validation loss landscape. A novel sampling strategy is further developed for an efficient approximation of the loss landscape. Theoretical analysis indicates that the validation loss estimator learned with our sampling strategy can reach a lower error rate and a lower label complexity than one learned with uniform sampling. Experimental results on benchmarks demonstrate that the architecture searched by the proposed algorithm can achieve satisfactory accuracy at lower time cost.
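The core idea here, learning a proxy that maps architecture encodings to validation losses and then minimising the proxy instead of training every candidate, can be sketched as follows; the flat architecture encoding and the small regressor are illustrative assumptions, and the thesis's sampling strategy is not reproduced.

```python
# Hedged sketch: fit a proxy of the validation loss landscape, then pick its minimum.
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

# Pretend history: architectures already evaluated, with measured validation losses.
archs = torch.rand(100, 16)
val_losses = torch.rand(100, 1)

for _ in range(200):                           # fit the proxy landscape
    opt.zero_grad()
    loss = nn.functional.mse_loss(predictor(archs), val_losses)
    loss.backward()
    opt.step()

candidates = torch.rand(1000, 16)              # cheap-to-score new candidates
best = candidates[predictor(candidates).argmin()]
print(best)
```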
16

Cuan, Bonan. "Deep similarity metric learning for multiple object tracking." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI065.

Abstract:
Multiple object tracking, i.e. simultaneously tracking multiple objects in the scene, is an important but challenging visual task. Objects should be accurately detected and distinguished from each other to avoid erroneous trajectories. Since remarkable progress has been made in the object detection field, "tracking-by-detection" approaches are widely adopted in multiple object tracking research. Objects are detected in advance, and tracking reduces to an association problem: linking detections of the same object through frames into trajectories. Most tracking algorithms employ both motion and appearance models for data association. For multiple object tracking problems with many objects of the same category, a fine-grained discriminant appearance model is paramount and indispensable. Therefore, we propose an appearance-based re-identification model using deep similarity metric learning to deal with multiple object tracking in mono-camera videos. Two main contributions are reported in this dissertation. First, a deep Siamese network is employed to learn an end-to-end mapping from input images to a discriminant embedding space. Different metric learning configurations using various metrics, loss functions, deep network structures, etc., are investigated in order to determine the best re-identification model for tracking. In addition, with an intuitive and simple classification design, the proposed model achieves satisfactory re-identification results, which are comparable to state-of-the-art approaches using triplet losses. Our approach is easy and fast to train, and the learned embedding can be readily transferred onto the domain of tracking tasks. Second, we integrate our proposed re-identification model into multiple object tracking as appearance guidance for detection association. For each object to be tracked in a video, we establish an identity-related appearance model based on the learned embedding for re-identification. Similarities among detected object instances are exploited for identity classification. The collaboration and interference between appearance and motion models are also investigated. An online appearance-motion model coupling is proposed to further improve the tracking performance. Experiments on the Multiple Object Tracking Challenge benchmark prove the effectiveness of our modifications, with state-of-the-art tracking accuracy.
APA, Harvard, Vancouver, ISO, and other styles
17

Amara, Pavan Kumar. "Towards a Unilateral Sensor Architecture for Detecting Person-to-Person Contacts." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1404573/.

Full text
Abstract:
The contact patterns among individuals can significantly affect the progress of an infectious outbreak within a population. Gathering data about these interaction and mixing patterns is essential for computational modeling of infectious diseases. Various self-report approaches have been designed in different studies to collect data about contact rates and patterns. Recent advances in sensing technology provide researchers with bilateral automated data-collection devices that facilitate contact gathering and overcome the disadvantages of previous approaches. In this study, a novel unilateral wearable sensing architecture is proposed that overcomes the limitations of bilateral sensing. Our unilateral wearable sensing system gathers contact data using hybrid sensor arrays embedded in a wearable shirt. A smartphone application transfers the collected sensor data to the cloud, where a deep learning model estimates the number of human contacts and the results are stored in the cloud database. The deep learning model was developed on hand-labelled data over multiple experiments, and its testing and evaluation results are reported in the study. A sensitivity analysis was performed to choose the most suitable image resolution and format for the model to estimate contacts and to analyze the model's consumption of computing resources.
APA, Harvard, Vancouver, ISO, and other styles
18

Pereira, Renato de Pontes. "HIGMN : an IGMN-based hierarchical architecture and its applications for robotic tasks." Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/80752.

Full text
Abstract:
O recente campo de Deep Learning introduziu na área de Aprendizagem de Máquina novos métodos baseados em representações distribuídas e abstratas dos dados de treinamento ao longo de estruturas hierárquicas. A organização hierárquica de camadas permite que esses métodos guardem informações distribuídas sobre os sinais sensoriais e criem conceitos com diferentes níveis de abstração para representar os dados de entrada. Este trabalho investiga o impacto de uma estrutura hierárquica inspirada pelas ideias apresentadas em Deep Learning, e com base na Incremental Gaussian Mixture Network (IGMN), uma rede neural probabilística com aprendizagem online e incremental, especialmente adequada para as tarefas de robótica. Como resultado, foi desenvolvida uma arquitetura hierárquica, denominada Hierarchical Incremental Gaussian Mixture Network (HIGMN), que combina dois níveis de IGMNs. As camadas de primeiro nível da HIGMN são capazes de aprender conceitos a partir de dados de diferentes domínios que são então relacionados na camada de segundo nível. O modelo proposto foi comparado com a IGMN em tarefas de robótica, em especial, na tarefa de aprender e reproduzir um comportamento de seguir paredes, com base em uma abordagem de Aprendizado por Demonstração. Os experimentos mostraram como a HIGMN pode executar três diferentes tarefas em paralelo (aprendizagem de conceitos, segmentação de comportamento, e aprendizagem e reprodução de comportamentos) e sua capacidade de aprender um comportamento de seguir paredes e reproduzi-lo em ambientes desconhecidos com novas informações sensoriais. A HIGMN conseguiu reproduzir o comportamento de seguir paredes depois de uma única, simples e curta demonstração do comportamento. Além disso, ela adquiriu conhecimento de diferentes tipos: informações sobre o ambiente, a cinemática do robô, e o comportamento alvo.
The recent field of Deep Learning has introduced to Machine Learning new methods based on distributed, abstract representations of the training data throughout hierarchical structures. The hierarchical organization of layers allows these methods to store distributed information on sensory signals and to create concepts with different abstraction levels to represent the input data. This work investigates the impact of a hierarchical structure inspired by ideas from Deep Learning and based on the Incremental Gaussian Mixture Network (IGMN), a probabilistic neural network with online, incremental learning, especially suitable for robotic tasks. As a result, a hierarchical architecture, called Hierarchical Incremental Gaussian Mixture Network (HIGMN), was developed, which combines two levels of IGMNs. The HIGMN first-level layers are able to learn concepts from data of different domains, which are then related in the second-level layer. The proposed model was compared with the IGMN on robotic tasks, in particular the task of learning and reproducing a wall-following behavior, based on a Learning from Demonstration (LfD) approach. The experiments showed how the HIGMN can perform three different tasks in parallel (concept learning, behavior segmentation, and learning and reproducing behaviors) and its ability to learn a wall-following behavior and to perform it in unknown environments with new sensory information. HIGMN could reproduce the wall-following behavior after a single, simple, and short demonstration of the behavior. Moreover, it acquired different types of knowledge: information on the environment, the robot kinematics, and the target behavior.
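As a rough illustration of the online, incremental estimation at the heart of an IGMN unit, the sketch below keeps a running mean and covariance for a single Gaussian component, updated one sample at a time. The full IGMN also handles component creation, novelty tests and posterior-weighted updates, all omitted here; this is a simplified sketch, not the network itself.

```python
import numpy as np

class IncrementalGaussian:
    """Running estimate of one mixture component (simplified IGMN-style update)."""
    def __init__(self, x0, sigma_ini=1.0):
        self.mean = np.asarray(x0, dtype=float).copy()
        self.cov = np.eye(len(self.mean)) * sigma_ini ** 2
        self.n = 1.0                      # accumulated evidence

    def update(self, x):
        x = np.asarray(x, dtype=float)
        self.n += 1.0
        w = 1.0 / self.n                  # learning rate decays with evidence
        self.mean += w * (x - self.mean)
        d = x - self.mean
        self.cov = (1.0 - w) * self.cov + w * np.outer(d, d)
```

Because each update touches only sufficient statistics, the model can learn from a data stream in one pass, which is what makes this family of networks attractive for robotics.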
APA, Harvard, Vancouver, ISO, and other styles
19

García, López Javier. "Geometric computer vision meets deep learning for autonomous driving applications." Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672708.

Full text
Abstract:
This dissertation provides theoretical and practical contributions to the development of deep learning algorithms for autonomous driving applications. The research is motivated by the need for deep neural networks (DNNs) to obtain a full understanding of the surrounding area and to be executed in real driving scenarios, with real vehicles equipped with specific hardware such as memory-constrained DSP or GPU platforms or multiple optical sensors. This constrains the algorithms' development, forcing the designed deep networks to be accurate with a minimal number of operations and low memory consumption. The main objective of this thesis is, on the one hand, research into the actual limitations of DL-based algorithms that prevent them from being integrated into today's ADAS (Advanced Driver Assistance System) functionalities, and on the other hand, the design and implementation of deep learning algorithms able to overcome such constraints and be applied in real autonomous driving scenarios, enabling their integration on low-memory hardware platforms and avoiding sensor redundancy. Deep learning (DL) applications have been widely exploited over the last years but have some weak points that need to be faced and overcome in order to fully integrate DL into the development process of big manufacturers and automotive companies, such as the time needed to design, train and validate an optimal network for a specific application, or the vast expert knowledge required to tune the hyperparameters of predefined networks in order to make them executable on the target platform and take the greatest advantage of the hardware resources. In this thesis, we address these topics and focus on implementing advances that would help the industrial integration of DL-based applications in the automobile industry. This work has been done as part of the "Doctorat Industrial" program at the company FICOSA ADAS, and thanks to the possibilities of developing this research at the company's facilities, the direct impact of the resulting algorithms could be tested in real scenarios to prove their validity. Moreover, the author investigates in depth the automatic design of deep neural networks (DNNs) based on state-of-the-art deep learning frameworks such as NAS (neural architecture search). As stated in this work, one of the identified barriers to deep learning technology in today's automobile companies is the difficulty of developing light and accurate networks that can be integrated in small systems-on-chip (SoCs) or DSPs. To overcome this constraint, the author proposes a framework named E-DNAS for the automatic design, training and validation of deep neural networks that perform image classification tasks and run on resource-limited hardware platforms. This approach has been validated on a real system-on-chip by Texas Instruments (TDA2x) provided by the company, and the results are published within this thesis. As an extension of E-DNAS, in the last chapter of this work the author presents a NAS-based framework for detecting objects, whose main contribution is a learnable and fast way of finding object proposals in images that, in a second step, are classified into one of the labeled classes.
Esta disertación tiene como objetivo principal proporcionar contribuciones teóricas y prácticas sobre el desarrollo de algoritmos de aprendizaje profundo para aplicaciones de conducción autónoma. La investigación está motivada por la necesidad de redes neuronales profundas (DNN) para obtener una comprensión completa del entorno y para ejecutarse en escenarios de conducción reales con vehículos reales equipados con hardware específico, los cuales tienen memoria limitada (plataformas DSP o GPU) o utilizan múltiples sensores ópticos. Esto limita el desarrollo del algoritmo, obligando a las redes profundas diseñadas a ser precisas, con un número mínimo de operaciones y bajo consumo de memoria y energía. El objetivo principal de esta tesis es, por un lado, investigar las limitaciones reales de los algoritmos basados en DL que impiden que se integren en las funcionalidades ADAS (Advanced Driver Assistance System) actuales, y por otro, el diseño e implementación de algoritmos de aprendizaje profundo capaces de superar tales limitaciones para ser aplicados en escenarios reales de conducción autónoma, permitiendo su integración en plataformas de hardware de baja memoria y evitando la redundancia de sensores. Las aplicaciones de aprendizaje profundo (DL) se han explotado ampliamente en los últimos años, pero tienen algunos puntos débiles que deben enfrentarse y superarse para integrar completamente la DL en el proceso de desarrollo de los grandes fabricantes o empresas automovilísticas, como el tiempo necesario para diseñar, entrenar y validar una red óptima para una aplicación específica o el vasto conocimiento de los expertos requerido para ajustar hiperparámetros de redes predefinidas con el fin de hacerlas ejecutables en una plataforma concreta y obtener la mayor ventaja de los recursos de hardware. Durante esta tesis, hemos abordado estos temas y nos hemos centrado en las implementaciones de avances que ayudarían en la integración industrial de aplicaciones basadas en DL en la industria del automóvil. Este trabajo se ha realizado en el marco del programa "Doctorat Industrial", en la empresa FICOSA ADAS, y es por las posibilidades que la empresa ha ofrecido que se ha podido demostrar un impacto rápido y directo de los algoritmos conseguidos en escenarios de test reales para probar su validez. Además, en este trabajo, se investiga en profundidad el diseño automático de redes neuronales profundas (DNN) basadas en frameworks de deep learning de última generación como NAS (neural architecture search). Como se afirma en esta tesis, una de las barreras identificadas de la tecnología de aprendizaje profundo en las empresas automotrices de hoy en día es la dificultad de desarrollar redes ligeras y precisas que puedan integrarse en pequeños systems on chip (SoC) o DSP. Para superar esta restricción, se propone un framework llamado E-DNAS para el diseño automático, entrenamiento y validación de redes neuronales profundas para realizar tareas de clasificación de imágenes y ejecutarse en plataformas de hardware con recursos limitados. Este enfoque ha sido validado en un system on chip real de la empresa Texas Instruments (TDA2x) facilitado por FICOSA ADAS, cuyos resultados se publican dentro de esta tesis.
Como extensión del mencionado E-DNAS, en el último capítulo de este trabajo se presenta un framework basado en NAS válido para la detección de objetos, cuya principal contribución es una forma fácil y rápida de encontrar propuestas de objetos en imágenes que, en un segundo paso, se clasifican en una de las clases etiquetadas.
Automàtica, robòtica i visió
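As a toy illustration of the resource-aware search loop that frameworks like the E-DNAS described above automate, the sketch below samples architecture configurations at random and keeps the best one that fits a parameter budget. The `search_space` dictionary and the `train_and_eval` callback are assumed placeholders supplied by the user, and a real NAS framework replaces the random sampling with learned or gradient-based strategies.

```python
import random

def random_nas(search_space, train_and_eval, budget=20, max_params=1_000_000):
    """Toy hardware-aware architecture search reduced to random sampling.

    search_space:  dict mapping choice names to lists of options,
                   e.g. {"depth": [4, 8], "width": [16, 32, 64]}
    train_and_eval: user-supplied callback returning (accuracy, n_params).
    """
    best_cfg, best_acc = None, -1.0
    for _ in range(budget):
        cfg = {name: random.choice(opts) for name, opts in search_space.items()}
        acc, n_params = train_and_eval(cfg)
        if n_params <= max_params and acc > best_acc:   # enforce the budget
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```

The parameter budget stands in for the memory limits of the small SoC and DSP targets the abstract mentions.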
APA, Harvard, Vancouver, ISO, and other styles
20

Amara, Pavan Kumar. "Towards a Unilateral Sensor Architecture for Detecting Person-to-Person Contacts." Thesis, University of North Texas, 2012. https://digital.library.unt.edu/ark:/67531/metadc1703441/.

Full text
Abstract:
The contact patterns among individuals can significantly affect the progress of an infectious outbreak within a population. Gathering data about these interaction and mixing patterns is essential for computational modeling of infectious diseases. Various self-report approaches have been designed in different studies to collect data about contact rates and patterns. Recent advances in sensing technology provide researchers with bilateral automated data-collection devices that facilitate contact gathering and overcome the disadvantages of previous approaches. In this study, a novel unilateral wearable sensing architecture is proposed that overcomes the limitations of bilateral sensing. Our unilateral wearable sensing system gathers contact data using hybrid sensor arrays embedded in a wearable shirt. A smartphone application transfers the collected sensor data to the cloud, where a deep learning model estimates the number of human contacts and the results are stored in the cloud database. The deep learning model was developed on hand-labelled data over multiple experiments, and its testing and evaluation results are reported in the study. A sensitivity analysis was performed to choose the most suitable image resolution and format for the model to estimate contacts and to analyze the model's consumption of computing resources.
APA, Harvard, Vancouver, ISO, and other styles
21

Bono, Guillaume. "Deep multi-agent reinforcement learning for dynamic and stochastic vehicle routing problems." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI096.

Full text
Abstract:
La planification de tournées de véhicules dans des environnements urbains denses est un problème difficile qui nécessite des solutions robustes et flexibles. Les approches existantes pour résoudre ces problèmes de planification de tournées dynamiques et stochastiques (DS-VRPs) sont souvent basées sur les mêmes heuristiques utilisées dans le cas statique et déterministe, en figeant le problème à chaque fois que la situation évolue. Au lieu de cela, nous proposons dans cette thèse d’étudier l’application de méthodes d’apprentissage par renforcement multi-agent (MARL) aux DS-VRPs en s’appuyant sur des réseaux de neurones profonds (DNNs). Plus précisément, nous avons d’abord contribué à étendre les méthodes basées sur le gradient de la politique (PG) aux cadres des processus de décision de Markov (MDPs) partiellement observables et décentralisés (Dec-POMDPs). Nous avons ensuite proposé un nouveau modèle de décision séquentiel en relâchant la contrainte d’observabilité partielle, que nous avons baptisé MDP multi-agent séquentiel (sMMDP). Ce modèle permet de décrire plus naturellement les DS-VRPs, dans lesquels les véhicules prennent la décision de servir leurs prochains clients à l’issue de leurs précédents services, sans avoir à attendre les autres. Pour représenter nos solutions, des politiques stochastiques fournissant aux véhicules des règles de décision, nous avons développé une architecture de DNN basée sur des mécanismes d’attention (MARDAM). Nous avons évalué MARDAM sur un ensemble de bancs de test artificiels qui nous ont permis de valider la qualité des solutions obtenues, la robustesse et la flexibilité de notre approche dans un contexte dynamique et stochastique, ainsi que sa capacité à généraliser à toute une classe de problèmes sans avoir à être ré-entraînée. Nous avons également développé un banc de test plus réaliste à base d’une simulation micro-trafic, et présenté une preuve de concept de l’applicabilité de MARDAM face à une variété de situations différentes.
Routing delivery vehicles in dynamic and uncertain environments like dense city centers is a challenging task, which requires robustness and flexibility. Such logistic problems are usually formalized as Dynamic and Stochastic Vehicle Routing Problems (DS-VRPs) with a variety of additional operational constraints, such as capacitated vehicles or time windows (DS-CVRPTWs). The main heuristic approaches to dynamic and stochastic problems simply consist of restarting the optimization process on a frozen (static and deterministic) version of the problem given the new information. Instead, Reinforcement Learning (RL) offers models such as Markov Decision Processes (MDPs) which naturally describe the evolution of stochastic and dynamic systems. Their application to more complex problems has been facilitated by recent progress in Deep Neural Networks, which can learn to represent a large class of functions in high-dimensional spaces. Finding a compact and sufficiently expressive state representation is the key challenge in applying RL to VRPs. Recent work exploring this novel approach demonstrated the capability of attention mechanisms to represent sets of customers and to learn policies that generalize to different customer configurations. However, all existing work using DNNs reframes the VRP as a single-vehicle problem and cannot provide online decision rules for a fleet of vehicles. In this thesis, we study how to apply Deep RL methods to rich DS-VRPs as multi-agent systems. We first explore the class of policy-based approaches in Multi-Agent RL and Actor-Critic methods for Decentralized, Partially Observable MDPs in the Centralized Training for Decentralized Control (CTDC) paradigm. To address DS-VRPs, we then introduce a new sequential multi-agent model we call sMMDP. This fully observable model is designed to capture the fact that the consequences of decisions can be predicted in isolation. Afterwards, we use it to model a rich DS-VRP and propose a new modular policy network, called MARDAM, to represent the state of the customers and the vehicles in this new model. It provides online decision rules adapted to the information contained in the state and takes advantage of the structural properties of the model. Finally, we develop a set of artificial benchmarks to evaluate the flexibility, the robustness and the generalization capabilities of MARDAM. We report promising results in the dynamic and stochastic case, which demonstrate MARDAM's capacity to address varying scenarios with no re-optimization, adapting to new customers and unexpected delays caused by stochastic travel times. We also implement an additional benchmark based on micro-traffic simulation to better capture the dynamics of a real city and its road infrastructure. We report preliminary results as a proof of concept that MARDAM can learn to represent different scenarios and handle varying traffic conditions and customer configurations.
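A schematic of the attention-based scoring such policies rely on: the vehicle's state queries a set of encoded customer descriptions, and a softmax over the scores yields a decision rule for the next customer to serve. The dimensions and single linear encoders below are illustrative assumptions; the actual MARDAM network is a deeper, modular architecture.

```python
import torch
import torch.nn as nn

class CustomerAttention(nn.Module):
    """Scores remaining customers given a vehicle state (simplified sketch)."""
    def __init__(self, cust_dim=4, veh_dim=6, hidden=64):
        super().__init__()
        self.enc = nn.Linear(cust_dim, hidden)    # customer descriptions -> keys
        self.query = nn.Linear(veh_dim, hidden)   # vehicle state -> query

    def forward(self, customers, vehicle, served):
        # customers: (n, cust_dim); vehicle: (veh_dim,); served: (n,) bool mask
        keys = self.enc(customers)                       # (n, hidden)
        q = self.query(vehicle)                          # (hidden,)
        scores = keys @ q / keys.shape[1] ** 0.5         # scaled dot-product
        scores = scores.masked_fill(served, float("-inf"))
        return torch.softmax(scores, dim=0)              # policy over next customer
```

Masking already-served customers keeps the decision rule valid online, as the set of feasible actions shrinks during the route.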
APA, Harvard, Vancouver, ISO, and other styles
22

Bonazza, Pierre. "Système de sécurité biométrique multimodal par imagerie, dédié au contrôle d’accès." Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCK017/document.

Full text
Abstract:
Les travaux de recherche de cette thèse consistent à mettre en place des solutions performantes et légères permettant de répondre aux problèmes de sécurisation de produits sensibles. Motivé par une collaboration avec différents acteurs au sein du projet Nuc-Track, le développement d'un système de sécurité biométrique, possiblement multimodal, mènera à une étude sur différentes caractéristiques biométriques telles que le visage, les empreintes digitales et le réseau vasculaire. Cette thèse sera axée sur une adéquation algorithme et architecture, dans le but de minimiser la taille de stockage des modèles d'apprentissage tout en garantissant des performances optimales. Cela permettra leur stockage sur un support personnel, respectant ainsi les normes de vie privée.
The research in this thesis consists of designing efficient, lightweight solutions to the problem of securing sensitive products. Motivated by a collaboration with various stakeholders within the Nuc-Track project, the development of a biometric security system, possibly multimodal, leads to a study of various biometric features such as the face, fingerprints and the vascular network. This thesis focuses on matching algorithm and architecture, with the aim of minimizing the storage size of the learned models while guaranteeing optimal performance. This allows the models to be stored on a personal medium, thus respecting privacy standards.
APA, Harvard, Vancouver, ISO, and other styles
23

Anani-Manyo, Nina K. "Computer Vision and Building Envelopes." Kent State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=kent1619539038754026.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Bai, Kang Jun. "Moving Toward Intelligence: A Hybrid Neural Computing Architecture for Machine Intelligence Applications." Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/103711.

Full text
Abstract:
Rapid advances in machine learning have made information analysis more efficient than ever before. However, to extract valuable information from trillions of bytes of data for learning and decision-making, general-purpose computing systems or cloud infrastructures are often deployed to train large-scale neural networks, consuming a colossal amount of resources while exposing significant security issues. Among potential approaches, the neuromorphic architecture, which is not only amenable to low-cost implementation but can also be deployed with an in-memory computing strategy, has been recognized as an important way to accelerate machine intelligence applications. In this dissertation, theoretical and practical properties of a hybrid neural computing architecture are introduced, which utilizes a dynamic reservoir with short-term memory to enable historical learning capability and the potential to classify non-separable functions. The hybrid neural computing architecture integrates both spatial and temporal processing structures, sidestepping the limitations introduced by the vanishing gradient. Specifically, this is made possible through four critical features: (i) a feature extractor built upon the in-memory computing strategy, (ii) a high-dimensional mapping with the Mackey-Glass neural activation, (iii) a delay-dynamic system with historical learning capability, and (iv) a unique learning mechanism that updates only the readout weights. To support the integration of neuromorphic architecture and deep learning strategies, the first generation of the delay-feedback reservoir network was successfully fabricated in 2017, and the spatial-temporal hybrid neural network with an improved delay-feedback reservoir network was successfully fabricated in 2020. To demonstrate the effectiveness and performance across diverse machine intelligence applications, the introduced network structures are evaluated through (i) time series prediction, (ii) image classification, (iii) speech recognition, (iv) modulation symbol detection, (v) radio fingerprint identification, and (vi) clinical disease identification.
Doctor of Philosophy
Deep learning strategies are the cutting edge of artificial intelligence, in which artificial neural networks are trained to extract key features or find similarities in raw sensory information. This is made possible through multiple processing layers with a colossal number of neurons, in a way similar to humans. Deep learning strategies running on von Neumann computers are deployed worldwide. However, in today's data-driven society, the use of general-purpose computing systems and cloud infrastructures can no longer offer a timely response while also exposing significant security issues. With the introduction of neuromorphic architectures, application-specific integrated circuit chips have paved the way for machine intelligence applications in recent years. The major contributions of this dissertation include designing and fabricating a new class of hybrid neural computing architecture and applying various deep learning strategies to diverse machine intelligence applications. The resulting hybrid neural computing architecture offers an alternative solution to accelerate the neural computations required for sophisticated machine intelligence applications with a simple system-level design, thereby opening the door to low-power system-on-chip design for future intelligent computing, and providing prominent design solutions and performance improvements for Internet of Things applications.
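A simplified software model of the delay-feedback reservoir idea described above: one Mackey-Glass-style nonlinearity is time-multiplexed over virtual nodes along a delay line, and only the linear readout is trained, here by ridge regression. The constants and the exact node coupling are illustrative assumptions; the dissertation's reservoirs are fabricated hardware, not this Python loop.

```python
import numpy as np

def mackey_glass(x, p=7, eta=0.9):
    """Mackey-Glass-style saturating nonlinearity used as the node activation."""
    return eta * x / (1.0 + np.abs(x) ** p)

def delay_reservoir(u, n_nodes=50, gamma=0.05):
    """Expand a 1-D input signal into reservoir states over virtual nodes."""
    u = np.asarray(u, dtype=float)
    states = np.zeros((len(u), n_nodes))
    prev = np.zeros(n_nodes)
    for t, ut in enumerate(u):
        for i in range(n_nodes):
            # each virtual node feeds back its value from one delay period ago
            states[t, i] = mackey_glass(prev[i] + gamma * ut)
        prev = states[t]
    return states

def train_readout(states, targets, ridge=1e-4):
    """Ridge regression: the only trained weights are the linear readout."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)
```

Training only the readout is what sidesteps backpropagation through time, and hence the vanishing-gradient issue the abstract mentions.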
APA, Harvard, Vancouver, ISO, and other styles
25

Blot, Michaël. "Étude de l'apprentissage et de la généralisation des réseaux profonds en classification d'images." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS412.

Full text
Abstract:
L'intelligence artificielle connaît une résurgence ces dernières années. En cause, la capacité croissante à rassembler et à stocker un nombre considérable de données digitalisées. Ces immenses bases de données permettent aux algorithmes de machine learning de répondre à certaines tâches par apprentissage supervisé. Parmi les données digitalisées, les images demeurent prépondérantes dans l’environnement moderne. D'immenses datasets ont été constitués. De plus, la classification d'image a permis l’essor de modèles jusqu'alors négligés, les réseaux de neurones profonds ou deep learning. Cette famille d'algorithmes démontre une grande facilité à apprendre parfaitement des datasets, même de très grande taille. Leurs capacités de généralisation demeurent largement incomprises, mais les réseaux de convolutions sont aujourd'hui l'état de l'art incontesté. D'un point de vue recherche et application du deep learning, les demandes vont être de plus en plus exigeantes, nécessitant de fournir un effort pour porter les performances des réseaux de neurones au maximum de leurs capacités. C'est dans cet objectif que se placent nos recherches, dont les contributions sont présentées dans cette thèse. Nous nous sommes d'abord penchés sur la question de l'entraînement et avons envisagé d’accélérer celui-ci grâce à des méthodes distribuées. Nous avons ensuite étudié les architectures dans le but de les améliorer sans toutefois trop augmenter leur complexité. Enfin, nous avons particulièrement étudié la régularisation de l'entraînement des réseaux. Nous avons envisagé un critère de régularisation basé sur la théorie de l'information, que nous avons déployé de deux façons différentes.
Artificial intelligence is experiencing a resurgence in recent years. This is due to the growing ability to collect and store a considerable amount of digitized data. These huge databases allow machine learning algorithms to respond to certain tasks through supervised learning. Among digitized data, images remain predominant in the modern environment. Huge datasets have been created. Moreover, image classification has enabled the rise of previously neglected models: deep neural networks, or deep learning. This family of algorithms demonstrates a great facility to fit datasets perfectly, even very large ones. Their ability to generalize remains largely misunderstood, but convolutional networks are today the undisputed state of the art. From the point of view of deep learning research and applications, the demands will become more and more exacting, requiring an effort to push the performance of neural networks to the full extent of their capabilities. This is the purpose of our research, whose contributions are presented in this thesis. We first looked at the issue of training and considered accelerating it through distributed methods. We then studied architectures with the aim of improving them without increasing their complexity. Finally, we studied in particular the regularization of network training, proposing a regularization criterion based on information theory that we deployed in two different ways.
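As a generic illustration of how an information-theoretic criterion can be folded into training, and not the thesis's specific formulation, the sketch below adds an entropy-minimization term on the network's predictive distribution to the usual cross-entropy objective. The weight `beta` and the choice of penalizing prediction entropy are assumptions for the example.

```python
import torch.nn.functional as F

def entropy_regularized_loss(logits, labels, beta=0.01):
    """Cross-entropy plus an entropy penalty on the predictive distribution.

    A generic instance of information-based regularization: low predictive
    entropy corresponds to more concentrated, less ambiguous outputs.
    """
    ce = F.cross_entropy(logits, labels)
    p = logits.softmax(dim=1)
    entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean()
    return ce + beta * entropy
```

In practice such a term is tuned on validation data, since too large a `beta` pushes the network toward overconfident predictions.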
APA, Harvard, Vancouver, ISO, and other styles
26

Carbonera, Luvizon Diogo. "Apprentissage automatique pour la reconnaissance d'action humaine et l'estimation de pose à partir de l'information 3D." Thesis, Cergy-Pontoise, 2019. http://www.theses.fr/2019CERG1015.

Full text
Abstract:
La reconnaissance d'actions humaines en 3D est une tâche difficile en raison de la complexité des mouvements humains et de la variété des poses et des actions accomplies par différents sujets. Les technologies récentes basées sur des capteurs de profondeur peuvent fournir les représentations squelettiques à faible coût de calcul, ce qui est une information utile pour la reconnaissance d'actions. Cependant, ce type de capteurs se limite à des environnements contrôlés et génère fréquemment des données bruitées. Parallèlement à ces avancées technologiques, les réseaux de neurones convolutifs (CNN) ont montré des améliorations significatives pour la reconnaissance d'actions et pour l'estimation de la pose humaine en 3D à partir des images couleurs. Même si ces problèmes sont étroitement liés, les deux tâches sont souvent traitées séparément dans la littérature. Dans ce travail, nous analysons le problème de la reconnaissance d'actions humaines dans deux scénarios : premièrement, nous explorons les caractéristiques spatiales et temporelles à partir de représentations de squelettes humains, qui sont agrégées par une méthode d'apprentissage de métrique. Dans le deuxième scénario, nous montrons non seulement l'importance de la précision de la pose en 3D pour la reconnaissance d'actions, mais aussi que les deux tâches peuvent être efficacement effectuées par un seul réseau de neurones profond capable d'obtenir des résultats du niveau de l'état de l'art. De plus, nous démontrons que l'optimisation de bout en bout en utilisant la pose comme contrainte intermédiaire conduit à une précision plus élevée sur la tâche de reconnaissance d'action que l'apprentissage séparé de ces tâches. Enfin, nous proposons une nouvelle architecture adaptable pour l'estimation de la pose en 3D et la reconnaissance de l'action simultanément et en temps réel. Cette architecture offre une gamme de compromis performances vs vitesse avec une seule procédure d'entraînement multitâche et multimodale.
3D human action recognition is a challenging task due to the complexity of human movements and to the variety of poses and actions performed by distinct subjects. Recent technologies based on depth sensors can provide 3D human skeletons at low computational cost, which is useful information for action recognition. However, such low-cost sensors are restricted to controlled environments and frequently output noisy data. Meanwhile, convolutional neural networks (CNNs) have shown significant improvements in both action recognition and 3D human pose estimation from RGB images. Despite being closely related problems, the two tasks are frequently handled separately in the literature. In this work, we analyze the problem of 3D human action recognition in two scenarios: first, we explore spatial and temporal features from human skeletons, which are aggregated by a shallow metric learning approach. In the second scenario, we not only show that precise 3D poses are beneficial to action recognition, but also that both tasks can be efficiently performed by a single deep neural network that still achieves state-of-the-art results. Additionally, we demonstrate that end-to-end optimization using poses as an intermediate constraint leads to significantly higher accuracy on the action task than separate learning of these tasks. Finally, we propose a new scalable architecture for real-time 3D pose estimation and action recognition simultaneously, which offers a range of performance vs. speed trade-offs with a single multimodal and multitask training procedure.
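A toy sketch of the single-network, pose-as-intermediate-constraint design described above: one backbone feeds a pose head, and the predicted pose is concatenated back into the action classifier so both losses can be optimized end to end. The tiny backbone and all dimensions are placeholders; the thesis's architecture is a multitask, multimodal network operating on sequences.

```python
import torch
import torch.nn as nn

class PoseActionNet(nn.Module):
    """Shared backbone; pose output also feeds the action head (sketch)."""
    def __init__(self, n_joints=17, n_actions=60, feat=256):
        super().__init__()
        self.n_joints = n_joints
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat), nn.ReLU(),
        )
        self.pose_head = nn.Linear(feat, n_joints * 3)
        self.action_head = nn.Linear(feat + n_joints * 3, n_actions)

    def forward(self, frame):                  # frame: (B, 3, H, W)
        f = self.backbone(frame)
        pose = self.pose_head(f)               # supervised with 3D pose labels
        logits = self.action_head(torch.cat([f, pose], dim=1))
        return pose.view(-1, self.n_joints, 3), logits
```

Training sums a pose regression loss and an action classification loss, which is what lets pose accuracy act as a constraint on the action task.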
APA, Harvard, Vancouver, ISO, and other styles
27

Speranza, Nicholas A. "Adaptive Two-Stage Edge-Centric Architecture for Deeply-Learned Embedded Real-Time Target Classification in Aerospace Sense-and-Avoidance Applications." Wright State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=wright1621886997260122.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Lomonaco, Vincenzo <1991>. "Continual Learning with Deep Architectures." Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amsdottorato.unibo.it/9073/1/vincenzo_lomonaco_thesis.pdf.

Full text
Abstract:
Humans have the extraordinary ability to learn continually from experience. Not only can we apply previously learned knowledge and skills to new situations, we can also use these as the foundation for later learning. One of the grand goals of Artificial Intelligence (AI) is building an artificial "continual learning" agent that constructs a sophisticated understanding of the world from its own experience, through the autonomous incremental development of ever more complex knowledge and skills. However, despite early speculations and a few pioneering works, very little research and effort has been devoted to addressing this vision. Current AI systems greatly suffer from exposure to new data or environments that differ even slightly from those they were trained on. Moreover, the learning process is usually constrained to fixed datasets within narrow and isolated tasks, which can hardly lead to the emergence of more complex and autonomous intelligent behaviors. In essence, continual learning and adaptation capabilities, while more often than not regarded as fundamental pillars of every intelligent agent, have been mostly left out of the main AI research focus. In this dissertation, we study the application of these ideas in light of the more recent advances in machine learning research and in the context of deep architectures for AI. We propose a comprehensive and unifying framework for continual learning, new metrics, benchmarks and algorithms, as well as substantial experimental evaluations in different supervised, unsupervised and reinforcement learning tasks.
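For a concrete flavor of one simple continual-learning mechanism, the sketch below implements rehearsal with a reservoir-sampled replay buffer: mixing a few stored old examples into each new batch mitigates the catastrophic forgetting the abstract alludes to. This is a generic baseline, not one of the dissertation's specific algorithms.

```python
import random

class ReplayBuffer:
    """Tiny rehearsal buffer kept across tasks in a continual-learning stream."""
    def __init__(self, capacity=1000):
        self.capacity, self.data, self.seen = capacity, [], 0

    def add(self, example):
        # Reservoir sampling keeps an unbiased sample of everything seen so far.
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))
```

During training, each gradient step would draw part of its batch from the current task and part from `buffer.sample(k)`, so old decision boundaries keep receiving gradient signal.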
APA, Harvard, Vancouver, ISO, and other styles
29

Bäuml, Berthold [Verfasser], Bernd [Akademischer Betreuer] Krieg-Brückner, Bernd [Gutachter] Krieg-Brückner, and Gerd [Gutachter] Hirzinger. "Bringing a Humanoid Robot Closer to Human Versatility : Hard Realtime Software Architecture and Deep Learning Based Tactile Sensing / Berthold Bäuml ; Gutachter: Bernd Krieg-Brückner, Gerd Hirzinger ; Betreuer: Bernd Krieg-Brückner." Bremen : Staats- und Universitätsbibliothek Bremen, 2019. http://d-nb.info/1177239914/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Sarpangala, Kishan. "Semantic Segmentation Using Deep Learning Neural Architectures." University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin157106185092304.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Sudhakaran, Swathikiran. "Deep Neural Architectures for Video Representation Learning." Doctoral thesis, Università degli studi di Trento, 2019. https://hdl.handle.net/11572/369191.

Full text
Abstract:
Automated analysis of videos for content understanding is one of the most challenging and well-researched areas in computer vision and multimedia. This thesis addresses the problem of video content understanding in the context of action recognition. The major challenges faced by this research problem are the variations of the spatio-temporal patterns that constitute each action category and the difficulty of generating a succinct representation encapsulating these patterns. This thesis considers two important aspects of videos to address this problem: (1) a video is a sequence of images with an inherent temporal dependency that defines the actual pattern to be recognized; (2) not all spatial regions of the video frame are equally important for discriminating one action category from another. The first aspect shows the importance of aggregating frame-level features in a sequential manner, while the second signifies the importance of selectively encoding frame-level features. The first problem is addressed by analyzing popular Convolutional Neural Network (CNN)-Recurrent Neural Network (RNN) architectures for video representation generation, concluding that the Convolutional Long Short-Term Memory (ConvLSTM), a variant of the popular Long Short-Term Memory (LSTM) RNN unit, is suitable for encoding the spatio-temporal patterns occurring in a video sequence. The second problem is tackled by developing a spatial attention mechanism for the selective encoding of spatial features, weighting the spatial regions in the feature tensor that are relevant for identifying the action category. Detailed experimental analysis carried out on two video recognition tasks showed that spatially selective encoding is indeed beneficial. Inspired by the two aforementioned findings, a new recurrent neural unit, called Long Short-Term Attention (LSTA), is developed by augmenting the LSTM with built-in spatial attention and a revised output gating. The former enables LSTA to attend to the relevant spatial regions while maintaining a smooth tracking of the attended regions, and the latter allows the network to propagate a filtered version of the memory localized on the most discriminative components of the video. LSTA surpasses the recognition accuracy of existing state-of-the-art techniques on popular egocentric activity recognition benchmarks, showing its effectiveness in video representation generation.
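A heavily simplified sketch of the spatial-attention idea: a 1x1 convolution scores each location of the feature tensor, a softmax normalizes the scores into an attention map, and the map weights the features before pooling. LSTA itself builds this attention into the recurrent cell's gating, which is omitted here.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Weights spatial locations of a feature tensor before aggregation."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, feat):                     # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        a = self.score(feat).flatten(2)          # (B, 1, H*W)
        a = torch.softmax(a, dim=2).view(b, 1, h, w)
        pooled = (feat * a).sum(dim=(2, 3))      # (B, C) attended descriptor
        return pooled, a                         # descriptor + attention map
```

Feeding `pooled` (rather than a plain global average) to the recurrent unit is what makes the encoding spatially selective.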
APA, Harvard, Vancouver, ISO, and other styles
32

Pageaud, Simon. "SmartGov : architecture générique pour la co-construction de politiques urbaines basée sur l'apprentissage par renforcement multi-agent." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1128.

Full text
Abstract:
Dans cette thèse, nous proposons un outil SmartGov, mixant simulation multi-agents et apprentissage multi-agents par renforcement profond, pour permettre la co-construction de politiques urbaines et inscrire les acteurs de la ville dans la boucle de conception. La Smart City permet à l’outil d’intégrer les données collectées par les capteurs présents dans la ville pour la modéliser de façon réaliste. Notre première contribution est une architecture générique pour construire une simulation multi-agents représentant la ville, et étudier l’émergence de comportements globaux avec des agents réalistes capables de réagir aux décisions politiques. Grâce à une modélisation multi-niveaux et au couplage de différentes dynamiques, le système apprend les spécificités de l’environnement pour proposer des politiques pertinentes. Notre seconde contribution concerne l'autonomie et l'adaptation de la couche décisionnelle avec un apprentissage par renforcement multi-agents et multi-niveaux. Un ensemble d'agents, regroupés en clusters, est distribué dans le périmètre étudié pour apprendre des spécificités locales sans connaissance a priori de son environnement. L’attribution d’un score de confiance et de récompenses individuelles permet d'atténuer l'impact de la non-stationnarité sur la réutilisation d'expériences nécessaire à l'apprentissage profond. Ces contributions conduisent à un système complet de co-construction de politiques urbaines dans le contexte de la Smart City. Nous comparons notre modèle avec d'autres approches de la littérature sur une politique de tarification du stationnement urbain, afin de mettre en évidence les apports et les limites de nos contributions.
In this thesis, we propose the SmartGov model, coupling multi-agent simulation and multi-agent deep reinforcement learning, to help co-construct urban policies and integrate all stakeholders in the decision process. Smart Cities provide sensor data from urban areas to increase the realism of the simulation in SmartGov. Our first contribution is a generic architecture for multi-agent simulation of the city, to study the emergence of global behavior with realistic agents reacting to political decisions. With multi-level modeling and a coupling of different dynamics, our tool learns environment specificities and suggests relevant policies. Our second contribution improves the autonomy and adaptation of the decision function with multi-agent, multi-level reinforcement learning. A set of clustered agents is distributed over the studied area to learn local specificities without any prior knowledge of the environment. Trust score assignment and individual rewards help reduce the impact of non-stationarity on the experience replay needed for deep reinforcement learning. These contributions bring forth a complete system to co-construct urban policies in the Smart City. We compare our model with different approaches from the literature on a parking fee policy to highlight the benefits and limits of our contributions.
APA, Harvard, Vancouver, ISO, and other styles
33

Луцишин, Роман Олегович, and Roman Olehovych Lutsyshyn. "Методи автоматизованого перекладу природної мови на основі нейромережевої моделі “послідовність-послідовність”." Master's thesis, Тернопільський національний технічний університет імені Івана Пулюя, 2020. http://elartu.tntu.edu.ua/handle/lib/33271.

Full text
Abstract:
Кваліфікаційну роботу магістра присвячено дослідженню та реалізації методів автоматизованого перекладу природної мови на основі нейромережевої моделі “послідовність-послідовність”. Розглянуто основні принципи та підходи до підготовки тренувальної вибірки даних, у тому числі з використанням глибоких нейронних мереж у якості енкодерів. Досліджено та проаналізовано наявні методи вирішення задачі перекладу природної мови, зокрема, було розглянуто декілька нейромережевих архітектур глибокого машинного навчання. Наведено приклади створення та обробки корпусів природної мови для вирішення задачі формування тренувальної та тестувальної вибірок даних. Було проведено повну оцінку вартості створення комп’ютерної системи, необхідної для вирішення поставленого завдання, а також описано повний процес розгортання програмного забезпечення на даному середовищі за допомогою сторонніх платформ.
The master's thesis is devoted to the research and implementation of methods for automated natural language translation based on the sequence-to-sequence neural network model. The basic principles of and approaches to preparing training data samples, including the use of deep neural networks as encoders, are considered. Existing methods for solving the natural language translation task are studied and analyzed; in particular, several deep learning neural network architectures are reviewed. Examples of creating and processing natural language corpora to form training and test data samples are given. A full assessment of the cost of building the computer system required to solve the problem is performed, and the complete process of deploying the software in this environment using third-party platforms is described. The results of the research are a complete review of existing solutions to the problem, the choice and improvement of the best technology, and the implementation and training of a sequence-to-sequence deep neural network model for the natural language translation task.
1. INTRODUCTION 2. ANALYSIS OF THE SUBJECT AREA 3. JUSTIFICATION OF THE CHOSEN TOOLS 4. IMPLEMENTATION OF A NATURAL LANGUAGE TRANSLATION SYSTEM BASED ON THE SEQUENCE-TO-SEQUENCE MODEL AND THE TRANSFORMER NEURAL NETWORK ARCHITECTURE 5. LABOUR PROTECTION AND SAFETY IN EMERGENCY SITUATIONS
APA, Harvard, Vancouver, ISO, and other styles
34

Bahl, Gaétan. "Architectures deep learning pour l'analyse d'images satellite embarquée." Thesis, Université Côte d'Azur, 2022. https://tel.archives-ouvertes.fr/tel-03789667.

Full text
Abstract:
Les progrès des satellites d'observation de la Terre à haute résolution et la réduction des temps de revisite introduite par la création de constellations de satellites ont conduit à la création quotidienne de grandes quantités d'images (des centaines de Teraoctets par jour). Simultanément, la popularisation des techniques de Deep Learning a permis le développement d'architectures capables d'extraire le contenu sémantique des images. Bien que ces algorithmes nécessitent généralement l'utilisation de matériel puissant, des accélérateurs d'inférence IA de faible puissance ont récemment été développés et ont le potentiel d'être utilisés dans les prochaines générations de satellites, ouvrant ainsi la possibilité d'une analyse embarquée des images satellite. En extrayant les informations intéressantes des images satellite directement à bord, il est possible de réduire considérablement l'utilisation de la bande passante, du stockage et de la mémoire. Les applications actuelles et futures, telles que la réponse aux catastrophes, l'agriculture de précision et la surveillance du climat, bénéficieraient d'une latence de traitement plus faible, voire d'alertes en temps réel. Dans cette thèse, notre objectif est double : D'une part, nous concevons des architectures de Deep Learning efficaces, capables de fonctionner sur des périphériques de faible puissance, tels que des satellites ou des drones, tout en conservant une précision suffisante. D'autre part, nous concevons nos algorithmes en gardant à l'esprit l'importance d'avoir une sortie compacte qui peut être efficacement calculée, stockée, transmise au sol ou à d'autres satellites dans une constellation. Tout d'abord, en utilisant des convolutions séparables en profondeur et des réseaux neuronaux récurrents convolutionnels, nous concevons des réseaux neuronaux de segmentation sémantique efficaces avec un faible nombre de paramètres et une faible utilisation de la mémoire. Nous appliquons ces architectures à la segmentation des nuages et des forêts dans les images satellites. Nous concevons également une architecture spécifique pour la segmentation des nuages sur le FPGA d'OPS-SAT, un satellite lancé par l'ESA en 2019, et réalisons des expériences à bord à distance. Deuxièmement, nous développons une architecture de segmentation d'instance pour la régression de contours lisses basée sur une représentation à coefficients de Fourier, qui permet de stocker et de transmettre efficacement les formes des objets détectés. Nous évaluons la performance de notre méthode sur une variété de dispositifs informatiques à faible puissance. Enfin, nous proposons une architecture d'extraction de graphes routiers basée sur une combinaison de Fully Convolutional Networks et de Graph Neural Networks. Nous montrons que notre méthode est nettement plus rapide que les méthodes concurrentes, tout en conservant une bonne précision.
The recent advances in high-resolution Earth observation satellites and the reduction in revisit times introduced by the creation of constellations of satellites have led to the daily creation of large amounts of image data (hundreds of terabytes per day). Simultaneously, the popularization of Deep Learning techniques has allowed the development of architectures capable of extracting semantic content from images. While these algorithms usually require powerful hardware, low-power AI inference accelerators have recently been developed and have the potential to be used in the next generations of satellites, thus opening the possibility of onboard analysis of satellite imagery. By extracting the information of interest from satellite images directly onboard, a substantial reduction in bandwidth, storage and memory usage can be achieved. Current and future applications, such as disaster response, precision agriculture and climate monitoring, would benefit from a lower processing latency and even real-time alerts. In this thesis, our goal is two-fold. On the one hand, we design efficient Deep Learning architectures that are able to run on low-power edge devices, such as satellites or drones, while retaining sufficient accuracy. On the other hand, we design our algorithms while keeping in mind the importance of having a compact output that can be efficiently computed, stored, and transmitted to the ground or to other satellites within a constellation. First, by using depth-wise separable convolutions and convolutional recurrent neural networks, we design efficient semantic segmentation neural networks with a low number of parameters and low memory usage. We apply these architectures to cloud and forest segmentation in satellite images. We also specifically design an architecture for cloud segmentation on the FPGA of OPS-SAT, a satellite launched by ESA in 2019, and perform onboard experiments remotely. Second, we develop an instance segmentation architecture for the regression of smooth contours based on the Fourier coefficient representation, which allows detected object shapes to be stored and transmitted efficiently. We evaluate the performance of our method on a variety of low-power computing devices. Finally, we propose a road graph extraction architecture based on a combination of Fully Convolutional Networks and Graph Neural Networks. We show that our method is significantly faster than competing methods, while retaining good accuracy.
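One of the core building blocks behind such lightweight segmentation networks is the depthwise separable convolution, sketched below in PyTorch: a per-channel spatial filter followed by a 1x1 pointwise mix. Compared with a standard convolution it cuts parameters and multiply-accumulates by a factor close to k*k for wide layers; the channel sizes here are illustrative.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise (per-channel) convolution followed by a 1x1 pointwise mix.

    Parameter count: c_in*k*k + c_in*c_out, versus c_in*c_out*k*k for a
    standard convolution with the same receptive field.
    """
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```

Stacking such blocks is what keeps onboard models within the memory and power envelope of FPGA or edge accelerator targets like the one flown on OPS-SAT.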
APA, Harvard, Vancouver, ISO, and other styles
35

Chen, Hua. "FPGA Based Multi-core Architectures for Deep Learning Networks." University of Dayton / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1449417091.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Edström, Jacob, and Pontus Mjöberg. "The Optimal Hardware Architecture for High Precision 3D Localization on the Edge. : A Study of Robot Guidance for Automated Bolt Tightening." Thesis, KTH, Skolan för industriell teknik och management (ITM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-263104.

Full text
Abstract:
The industry is moving towards a higher degree of automation and connectivity, where previously manual operations are being adapted for interconnected industrial robots. This thesis focuses specifically on the automation of tightening applications with pre-tightened bolts and collaborative robots. The use of 3D computer vision is investigated for direct localization of bolts, to allow for flexible assembly solutions. A localization algorithm based on 3D data is developed with the intention of creating lightweight software to be run on edge devices. A restrictive use of deep learning classification is therefore included, to enable product flexibility while minimizing the computational load. The cloud-to-edge and cluster-to-edge trade-offs for the chosen application are investigated to identify smart offloading possibilities to cloud or cluster resources. To reduce operational delay, partitioning images into sub-images for separate processing is also evaluated, so the operation can start sooner with a first coordinate and processing can continue in parallel with robot movement. Four hardware architectures are tested, consisting of two different single-board computers (SBCs), a cluster of SBCs, and a high-end computer emulating a local cloud solution. All systems but the cluster are seen to perform without operational delay for the application. The optimal hardware architecture is therefore found to be a consumer-grade SBC, optimized for energy efficiency, cost and size. If the variance in communication time can be minimized, the cluster shows potential to reduce the total calculation time without causing operational delay. Smart offloading to deep-learning-optimized cloud resources or to a cluster of interconnected robot stations is found to enable increased complexity and robustness of the algorithm. The SBC is also found to be able to switch between an edge and a cluster setup, to optimize either the time to start the operation or the total calculation time. This offers high flexibility in industrial settings, where product changes can be handled without a change in visual processing hardware, further enabling integration in factory devices.
Industrin rör sig mot en högre grad av automatisering och uppkoppling, där tidigare manuella operationer anpassas för sammankopplade industriella robotar. Denna masteruppsats fokuserar specifikt på automatiseringen av åtdragningsapplikationer med förmonterade bultar och kollaborativa robotar. Användningen av 3D-datorseende undersöks för direkt lokalisering av bultar, för att möjliggöra flexibla monteringslösningar. En lokaliseringsalgoritm baserad på 3Ddata utvecklas med intentionen att skapa en lätt mjukvara för att köras på Edge-enheter. En restriktiv användning av djupinlärningsklassificering är därmed inkluderad, för att möjliggöra produktflexibilitet tillsammans med en minimering av den behövda beräkningskraften. Avvägningarna mellan edge- och moln- eller klusterberäkning för den valda applikationen undersöks för att identifiera smarta avlastningsmöjligheter till moln- eller klusterresurser. För att minska operationell fördröjning utvärderas även bildpartitionering, för att snabbare kunna starta operationen med en första koordinat och möjliggöra beräkningar parallellt med robotrörelser. Fyra olika hårdvaruarkitekturer testas, bestående av två olika enkortsdatorer, ett kluster av enkortsdatorer och en marknadsledande dator som en efterliknad lokal molnlösning. Alla system utom klustret visar sig prestera utan operationell fördröjning för applikationen. Den optimala hårdvaruarkitekturen visar sig därmed vara en konsumentklassad enkortsdator, optimerad på energieffektivitet, kostnad och storlek. Om endast variansen i kommunikationstid kan minskas visar klustret potential för att kunna reducera den totala beräkningstiden utan att skapa operationell fördröjning. Smart avlastning till djupinlärningsoptimerade molnresurser eller kluster av sammankopplade robotstationer visar sig möjliggöra ökad komplexitet och tillförlitlighet av algoritmen. Enkortsdatorn visar sig även kunna växla mellan en edge- och en klusterkonfiguration, för att antingen optimera för tiden att starta operationen eller för den totala beräkningstiden. Detta medför en hög flexibilitet i industriella sammanhang, där produktändringar kan hanteras utan behovet av hårdvaruförändringar för visuella beräkningar, vilket ytterligare möjliggör dess integrering i fabriksenheter.
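A minimal sketch of the image-partitioning idea evaluated above: processing overlapping tiles lets the robot start moving as soon as the first bolt coordinate is found in an early tile, while the remaining tiles are processed in parallel with the motion. Tile and overlap sizes are illustrative assumptions.

```python
def iter_tiles(image, tile=256, overlap=32):
    """Yield (origin, sub-image) pairs over an H x W (x C) array.

    Overlap keeps bolts that straddle a tile boundary detectable in at
    least one tile; a detector is then run on each tile as it is yielded.
    """
    h, w = image.shape[:2]
    step = tile - overlap
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            yield (y, x), image[y:y + tile, x:x + tile]
```

Because the generator yields tiles one at a time, the first detection can be forwarded to the robot controller before the full frame has been processed, which is exactly the latency-hiding behavior the thesis measures.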
APA, Harvard, Vancouver, ISO, and other styles
37

Palasek, Petar. "Action recognition using deep learning." Thesis, Queen Mary, University of London, 2017. http://qmro.qmul.ac.uk/xmlui/handle/123456789/30828.

Full text
Abstract:
In this thesis we study deep learning architectures for the problem of human action recognition in image sequences, i.e. the problem of automatically recognizing what people are doing in a given video. As unlabeled video data is easily accessible these days, we first explore models that can learn meaningful representations of sequences without having to know what is happening in the sequences at hand. More specifically, we first explore the convolutional restricted Boltzmann machine (RBM) and show how a stack of convolutional RBMs can be used to learn and extract features from sequences in an unsupervised way. Using the classical Fisher vector pipeline to encode the extracted features, we apply them to the task of action classification. We move on to feature extraction using larger, deep convolutional neural networks and propose a novel architecture which expresses the processing steps of the classical Fisher vector pipeline as network layers. In contrast to other methods, where these steps are performed consecutively and the corresponding parameters are learned in an unsupervised manner, defining them as a single neural network allows us to refine the whole model discriminatively in an end-to-end fashion. We show that our method achieves significant improvements over the classical Fisher vector extraction chain and delivers performance comparable to other convolutional networks, while largely reducing the number of required trainable parameters. Finally, we explore how the proposed architecture can be modified into a hybrid network that combines the benefits of both unsupervised and supervised training methods, resulting in a model that learns a semi-supervised Fisher vector descriptor of the input data. We evaluate the proposed model on image classification and action recognition problems and show how the model's classification performance improves as the amount of unlabeled data increases during training.
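A sketch of the idea of expressing Fisher vector encoding as differentiable network operations: GMM soft assignments and first-order residuals are computed with tensor ops, so the means, variances and priors can be refined by backpropagation. This simplified version keeps only first-order statistics and omits the usual power and L2 normalizations of the full pipeline.

```python
import torch

def fisher_like_encoding(feats, means, log_sigmas, priors):
    """Simplified, differentiable first-order Fisher-vector-style encoding.

    feats: (N, D) local features; means, log_sigmas: (K, D); priors: (K,).
    Returns a flat (K*D,) descriptor aggregating soft-assigned residuals.
    """
    sigmas = log_sigmas.exp()
    diff = (feats[:, None, :] - means[None]) / sigmas[None]        # (N, K, D)
    logp = -0.5 * diff.pow(2).sum(-1) - log_sigmas.sum(-1) + priors.log()
    gamma = torch.softmax(logp, dim=1)                             # (N, K) soft assignment
    fv = (gamma[..., None] * diff).sum(0) / feats.shape[0]         # (K, D) residuals
    return fv.flatten()
```

Because every step is a tensor operation, this encoding can sit between a CNN feature extractor and a classifier and be trained end to end, which is the discriminative refinement the abstract describes.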
APA, Harvard, Vancouver, ISO, and other styles
38

Thangthai, Ausdang. "Visual speech synthesis using dynamic visemes and deep learning architectures." Thesis, University of East Anglia, 2018. https://ueaeprints.uea.ac.uk/69371/.

Full text
Abstract:
The aim of this work is to improve the naturalness of visual speech synthesis produced automatically from a linguistic input over existing methods. Firstly, the most important contribution is the investigation of the most suitable speech units for visual speech synthesis. We propose the use of dynamic visemes instead of phonemes or static visemes and find that dynamic visemes can generate better visual speech than either phone or static viseme units. Moreover, the best performance is obtained by a combined phoneme-dynamic viseme system. Secondly, we compare hidden Markov models (HMMs) with different deep learning models that include feedforward and recurrent structures consisting of one-to-one, many-to-one and many-to-many architectures. Results suggest that frame-by-frame synthesis from the deep learning approach outperforms state-based synthesis from HMM approaches, and that an encoder-decoder many-to-many architecture is better than the one-to-one and many-to-one architectures. Thirdly, we explore the importance of contextual features that include information at varying linguistic levels, from the frame level up to the utterance level. We find that frame-level information is the most valuable feature, as it is able to avoid discontinuities in the visual feature sequence and produces a smooth and realistic animation output. Fourthly, we find that the two most common objective measures, correlation and root mean square error, are not able to indicate the realism and naturalness of human-perceived quality. We introduce an alternative objective measure and show that the global variance is a better indicator of human perception of quality. Finally, we propose a novel method to convert a given text input and phoneme transcription into a dynamic viseme transcription in the case when a reference dynamic viseme sequence is not available. Subjective preference tests confirm that our proposed method is able to produce animations that are statistically indistinguishable from animations produced using reference data.
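A minimal sketch of the many-to-many encoder-decoder layout the thesis found best, written in PyTorch with assumed feature dimensions (the actual linguistic and visual features differ):

# Sketch: a recurrent encoder reads linguistic features, a decoder
# emits one visual-feature frame per output step (frame-by-frame
# synthesis). Dimensions are assumptions of this sketch.
import torch
import torch.nn as nn

class Seq2SeqVisualSpeech(nn.Module):
    def __init__(self, ling_dim=128, vis_dim=30, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(ling_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(vis_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vis_dim)

    def forward(self, ling_seq, n_frames):
        _, h = self.encoder(ling_seq)                 # summarize the input sequence
        frame = torch.zeros(ling_seq.size(0), 1, self.out.out_features)
        frames = []
        for _ in range(n_frames):                     # frame-by-frame decoding
            o, h = self.decoder(frame, h)
            frame = self.out(o)
            frames.append(frame)
        return torch.cat(frames, dim=1)               # (batch, n_frames, vis_dim)

y = Seq2SeqVisualSpeech()(torch.randn(2, 20, 128), n_frames=50)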
APA, Harvard, Vancouver, ISO, and other styles
39

Vukotic, Vedran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data." Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.

Full text
Abstract:
In this dissertation, the thesis that deep neural networks are suited for the analysis of visual, textual, and fused visual and textual content is discussed. This work evaluates the ability of deep neural networks to learn automatic multimodal representations in either unsupervised or supervised manners and brings the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, by utilizing solely one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and conversely, and offers an improved multimodal representation space where the initially disjoint representations can be translated and fused. This enables improved multimodal fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks within the task of video hyperlinking, where it defined the current state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
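The bidirectional encoder idea can be sketched roughly as follows in PyTorch: each branch translates one modality into the other, and the hidden activations provide a joint representation. All dimensions and the loss are illustrative assumptions, not the thesis's architecture:

# Sketch of a bidirectional multimodal encoder: branch a2b translates
# modality A into B, branch b2a the converse; the concatenated hidden
# activations serve as the joint representation.
import torch
import torch.nn as nn

class BiModalEncoder(nn.Module):
    def __init__(self, dim_a=300, dim_b=2048, hidden=1024):
        super().__init__()
        self.a2b = nn.Sequential(nn.Linear(dim_a, hidden), nn.Tanh())
        self.b2a = nn.Sequential(nn.Linear(dim_b, hidden), nn.Tanh())
        self.out_b = nn.Linear(hidden, dim_b)
        self.out_a = nn.Linear(hidden, dim_a)

    def forward(self, a, b):
        ha, hb = self.a2b(a), self.b2a(b)
        return self.out_b(ha), self.out_a(hb), torch.cat([ha, hb], -1)

model = BiModalEncoder()
a, b = torch.randn(4, 300), torch.randn(4, 2048)
b_hat, a_hat, joint = model(a, b)
# Train by reconstructing each modality from the other.
loss = nn.functional.mse_loss(b_hat, b) + nn.functional.mse_loss(a_hat, a)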
APA, Harvard, Vancouver, ISO, and other styles
40

Riera, Villanueva Marc. "Low-power accelerators for cognitive computing." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669828.

Full text
Abstract:
Deep Neural Networks (DNNs) have achieved tremendous success in cognitive applications, and are especially efficient in classification and decision-making problems such as speech recognition or machine translation. Mobile and embedded devices increasingly rely on DNNs to understand the world. Smartphones, smartwatches and cars perform discriminative tasks, such as face or object recognition, on a daily basis. Despite the increasing popularity of DNNs, running them on mobile and embedded systems comes with several challenges: delivering high accuracy and performance with a small memory and energy budget. Modern DNN models consist of billions of parameters requiring huge computational and memory resources, and hence they cannot be directly deployed on low-power systems with limited resources. The objective of this thesis is to address these issues and propose novel solutions in order to design highly efficient custom accelerators for DNN-based cognitive computing systems. First, we focus on optimizing the inference of DNNs for sequence processing applications. We perform an analysis of the input similarity between consecutive DNN executions. Then, based on the high degree of input similarity, we propose DISC, a hardware accelerator implementing a Differential Input Similarity Computation technique to reuse the computations of the previous execution, instead of computing the entire DNN. We observe that, on average, more than 60% of the inputs of any neural network layer tested exhibit negligible changes with respect to the previous execution. Avoiding the memory accesses and computations for these inputs results in 63% energy savings on average. Second, we propose to further optimize the inference of FC-based DNNs. We first analyze the number of unique weights per input neuron of several DNNs. Exploiting common optimizations, such as linear quantization, we observe a very small number of unique weights per input for several FC layers of modern DNNs. Then, to improve the energy efficiency of FC computation, we present CREW, a hardware accelerator that implements a Computation Reuse and an Efficient Weight Storage mechanism to exploit the large number of repeated weights in FC layers. CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage. We evaluate CREW on a diverse set of modern DNNs. On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator. Third, we propose a mechanism to optimize the inference of RNNs. RNN cells perform element-wise multiplications across the activations of different gates, sigmoid and tanh being the common activation functions. We perform an analysis of the activation function values, and show that a significant fraction are saturated towards zero or one in popular RNNs. Then, we propose CGPA to dynamically prune activations from RNNs at a coarse granularity. CGPA avoids the evaluation of entire neurons whenever the outputs of peer neurons are saturated. CGPA significantly reduces the amount of computations and memory accesses while avoiding sparsity to a large extent, and can easily be implemented on top of conventional accelerators such as the TPU with negligible area overhead, resulting in 12% speedup and 12% energy savings on average for a set of widely used RNNs. Finally, in the last contribution of this thesis we focus on static DNN pruning methodologies. DNN pruning reduces memory footprint and computational work by removing connections and/or neurons that are ineffectual. However, we show that prior pruning schemes rely on an extremely time-consuming iterative process that retrains the DNN many times to tune the pruning parameters. We then propose a DNN pruning scheme, based on Principal Component Analysis and the relative importance of each neuron's connections, that automatically finds the optimized DNN in one shot.
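The computation-reuse idea behind CREW can be illustrated with a small NumPy sketch: after linear quantization an input neuron sees only a few unique weight values, so each product is computed once and reused through an index table. This is a software analogy for the hardware mechanism, not the accelerator itself:

# Sketch: one multiply per unique weight value per input neuron, then
# reuse through the index table that np.unique provides.
import numpy as np

x = np.random.randn(512)                                        # FC layer input
W = np.random.choice(np.linspace(-1, 1, 16), size=(512, 1024))  # quantized weights

def fc_with_reuse(x, W):
    y = np.zeros(W.shape[1])
    for i, xi in enumerate(x):
        uniq, inv = np.unique(W[i], return_inverse=True)
        products = xi * uniq                   # one multiply per unique weight
        y += products[inv]                     # reuse through the index table
    return y

assert np.allclose(fc_with_reuse(x, W), x @ W)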
APA, Harvard, Vancouver, ISO, and other styles
41

Dhamija, Tanush. "Deep Learning Architectures for time of arrival detection in Acoustic Emissions Monitoring." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24620/.

Full text
Abstract:
Acoustic Emission (AE) monitoring can be used to detect the presence of damage as well as determine its location in Structural Health Monitoring (SHM) applications. Information on the time difference of the signal generated by the damage event arriving at different sensors is essential in performing localization. This makes the time of arrival (ToA) an important piece of information to retrieve from the AE signal. Generally, it is determined using statistical methods such as the Akaike Information Criterion (AIC), which is particularly prone to errors in the presence of noise. Given that the structures of interest are often surrounded by harsh environments, a way to accurately estimate the arrival time in such noisy scenarios is of particular interest. In this work, two new machine-learning-based methods are presented to estimate the arrival times of AE signals. Inspired by strong results in the field, two deep learning models are presented, based on a Convolutional Neural Network (CNN) and a Capsule Neural Network (CapsNet). The primary advantage of such models is that they do not require the user to pre-define selected features; they only require raw data, and the models establish non-linear relationships between the inputs and outputs. The performance of the models is evaluated using AE signals generated by a custom ray-tracing algorithm that propagates them on an aluminium plate, and is compared to the AIC. The relative estimation error on the test set was below 5% for the models, compared to around 45% for the AIC. Testing was then continued by preparing an experimental setup and acquiring real AE signals. Similar performance was observed: the two models not only outperform the AIC by more than an order of magnitude in average error, but are also far more robust than the AIC, which fails in the presence of noise.
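A minimal PyTorch sketch of a 1D CNN that regresses the time of arrival directly from a raw AE waveform, in the spirit of the thesis's CNN model; the layer sizes and signal length are assumptions:

# Sketch: raw waveform in, scalar ToA estimate out, no hand-crafted
# features in between.
import torch
import torch.nn as nn

toa_net = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=7, stride=2), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=7, stride=2), nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=7, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(64, 1),                          # predicted ToA (e.g., in samples)
)
pred = toa_net(torch.randn(8, 1, 4096))        # batch of 8 raw AE signals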
APA, Harvard, Vancouver, ISO, and other styles
42

Donnot, Benjamin. "Deep learning methods for predicting flows in power grids : novel architectures and algorithms." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS060/document.

Full text
Abstract:
This thesis addresses problems of security in the French grid operated by RTE, the French Transmission System Operator (TSO). Progress in sustainable energy, electricity market efficiency, and novel consumption patterns push TSOs to operate the grid closer to its security limits. To this end, it is essential to make the grid "smarter". To tackle this issue, this work explores the benefits of artificial neural networks. We propose novel deep learning algorithms and architectures, which we call "guided dropout", to assist the decisions of human operators (TSO dispatchers). This allows the prediction of power flows following a willful or accidental modification of the grid. The different inputs are handled separately: continuous data (productions and consumptions) are introduced in a standard way, via a neural network input layer, while discrete data (grid topologies) are encoded directly in the neural network architecture. This architecture is dynamically modified based on the power grid topology by switching the activation of hidden units on or off. The main advantage of this technique lies in its ability to predict the flows even for previously unseen grid topologies. The "guided dropout" achieves high accuracy (up to 99% precision for flow predictions) with a 300x speedup compared to physical grid simulators based on Kirchhoff's laws, even for unseen contingencies, without detailed knowledge of the grid structure. We also show that guided dropout can be used to rank contingencies that might occur in order of severity. In this application, we demonstrate that our algorithm obtains the same risk as currently implemented policies while requiring only 2% of today's computational budget. The ranking remains relevant even for grid cases never seen before, and can be used to obtain an overall estimate of the global security of the power grid.
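The guided dropout mechanism can be sketched as follows in PyTorch: continuous inputs pass through a standard input layer, while the discrete topology vector switches dedicated hidden units on or off rather than being fed as a feature. Sizes are illustrative, not RTE's actual grid dimensions:

# Sketch: the topology mask activates/deactivates hidden units, so the
# architecture itself changes with the grid topology.
import torch
import torch.nn as nn

class GuidedDropoutNet(nn.Module):
    def __init__(self, n_inj=100, n_lines=186, hidden=300):
        super().__init__()
        self.fc1 = nn.Linear(n_inj, hidden)
        self.fc2 = nn.Linear(hidden, n_lines)

    def forward(self, injections, topology_mask):
        h = torch.relu(self.fc1(injections))
        h = h * topology_mask              # topology switches units on/off
        return self.fc2(h)                 # predicted flows on each line

net = GuidedDropoutNet()
mask = (torch.rand(1, 300) > 0.1).float()  # 1 = unit active for this topology
flows = net(torch.randn(1, 100), mask)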
APA, Harvard, Vancouver, ISO, and other styles
43

Sovrano, Francesco. "Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16718/.

Full text
Abstract:
Reinforcement Learning (RL) is based on the Markov Decision Process (MDP) framework, but not all problems of interest can be modeled with MDPs because some of them have non-Markovian temporal dependencies. To handle them, one of the solutions proposed in the literature is Hierarchical Reinforcement Learning (HRL). HRL takes inspiration from hierarchical planning in the artificial intelligence literature and is an emerging sub-discipline of RL, in which RL methods are augmented with some kind of prior knowledge about the high-level structure of behavior in order to decompose the underlying problem into simpler sub-problems. The high-level goal of our thesis is to investigate the advantages that an HRL approach may have over a plain RL approach. Thus, we study problems rarely tackled by means of RL, such as Sentiment Analysis, Rogue and Car Controller, showing how the ability of RL algorithms to solve them in a partially observable environment is affected by using (or not) generic hierarchical architectures based on RL algorithms of the Actor-Critic family. Notably, we claim that our work on Sentiment Analysis is particularly innovative for RL, resulting in state-of-the-art performance; as far as the author knows, reinforcement learning has only rarely been applied to the domain of computational linguistics and sentiment analysis. Furthermore, our work on the famous video game Rogue is probably the first example of a deep RL architecture able to explore Rogue dungeons and fight against its monsters, achieving a success rate of more than 75% on the first game level. Finally, our work on Car Controller allowed us to make some interesting observations on the nature of some components of the policy gradient equation.
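A toy Python sketch of the hierarchical decomposition studied here: a high-level controller picks a sub-policy every k steps, and each sub-policy acts in the environment. The random policies and dummy environment are stand-ins for the Actor-Critic networks of the thesis:

# Sketch: a two-level hierarchy where the meta-controller chooses a
# sub-policy (option) at fixed intervals; everything below is a stub.
import random

class SubPolicy:
    def act(self, obs):
        return random.choice([0, 1])       # stand-in for an Actor-Critic net

def run_episode(env_step, sub_policies, k=10, horizon=100):
    obs, total = 0, 0.0
    for t in range(horizon):
        if t % k == 0:                     # meta-controller decision point
            current = random.choice(sub_policies)
        obs, reward = env_step(obs, current.act(obs))
        total += reward
    return total

env_step = lambda obs, a: (obs + a, 1.0 if a == 1 else 0.0)  # dummy env
print(run_episode(env_step, [SubPolicy(), SubPolicy()]))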
APA, Harvard, Vancouver, ISO, and other styles
44

Baldassarre, Federico. "Morphing architectures for pose-based image generation of people in clothing." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233361.

Full text
Abstract:
This project investigates the task of conditional image generation from misaligned sources, with an example application in the context of content creation for the fashion industry. The problem of spatial misalignment between images is identified, the related literature is discussed, and different approaches are introduced to address it. In particular, several non-linear differentiable morphing modules are designed and integrated into current architectures for image-to-image translation. The proposed method for conditional image generation is applied to a clothes-swapping task, using a real-world dataset of fashion images provided by Zalando. In comparison to previous methods for clothes swapping and virtual try-on, the results achieved with our method are of high visual quality and precisely reconstruct the details of the garments.
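One way to realize a non-linear differentiable morphing module, sketched in PyTorch under the assumption of a flow-field-plus-bilinear-sampling design (the thesis explores several variants):

# Sketch: a toy conv head predicts a dense offset field that warps the
# source image via bilinear sampling, keeping the step differentiable.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MorphModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.flow = nn.Conv2d(3, 2, kernel_size=3, padding=1)  # toy flow head

    def forward(self, img):
        b, _, h, w = img.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack([xs, ys], -1).expand(b, h, w, 2)    # identity grid
        offset = self.flow(img).permute(0, 2, 3, 1)            # (b, h, w, 2)
        return F.grid_sample(img, base + 0.1 * offset, align_corners=True)

warped = MorphModule()(torch.randn(1, 3, 64, 64))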
APA, Harvard, Vancouver, ISO, and other styles
45

Silfa, Franyell. "Energy-efficient architectures for recurrent neural networks." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671448.

Full text
Abstract:
Deep learning algorithms have been remarkably successful in applications such as Automatic Speech Recognition and Machine Translation. Thus, these kinds of applications are ubiquitous in our lives and are found in a plethora of devices. These algorithms are composed of Deep Neural Networks (DNNs), such as Convolutional Neural Networks and Recurrent Neural Networks (RNNs), which have a large number of parameters and require a large amount of computation. Hence, the evaluation of DNNs is challenging due to their large memory and power requirements. RNNs are employed to solve sequence-to-sequence problems such as Machine Translation. They contain data dependencies among the executions of time-steps, hence the amount of parallelism is severely limited. Thus, evaluating them in an energy-efficient manner is more challenging than evaluating other DNN algorithms. This thesis studies applications using RNNs to improve their energy efficiency on specialized architectures. Specifically, we propose novel energy-saving techniques and highly efficient architectures tailored to the evaluation of RNNs, focusing on the most successful RNN topologies: the Long Short-Term Memory and the Gated Recurrent Unit. First, we characterize a set of RNNs running on a modern SoC. We identify that accessing the memory to fetch the model weights is the main source of energy consumption. Thus, we propose E-PUR: an energy-efficient processing unit for RNN inference. E-PUR achieves 6.8x speedup and improves energy consumption by 88x compared to the SoC. These benefits are obtained by improving the temporal locality of the model weights. In E-PUR, fetching the parameters is the main source of energy consumption, so we strive to reduce memory accesses and propose a scheme to reuse previous computations. Our observation is that when evaluating the input sequences of an RNN model, the output of a given neuron tends to change only slightly between consecutive evaluations. Thus, we develop a scheme that caches the neurons' outputs and reuses them whenever it detects that the change between the current and previously computed output value for a given neuron is small, avoiding fetching the weights. To decide when to reuse a previous value, we employ a Binary Neural Network (BNN) as a predictor of reusability. The low-cost BNN can be employed in this context since its output is highly correlated with the output of RNNs. We show that our proposal avoids more than 24.2% of computations. Hence, on average, energy consumption is reduced by 18.5% for a speedup of 1.35x. RNN models' memory footprint is usually reduced by using low precision for evaluation and storage. In this case, the minimum precision used is identified offline and set such that the model maintains its accuracy. This method uses the same precision to compute all time-steps. Yet, we observe that some time-steps can be evaluated with a lower precision while preserving accuracy. Thus, we propose a technique that dynamically selects the precision used to compute each time-step. A challenge of our proposal is choosing a lower bit-width. We address this issue by recognizing that information from a previous evaluation can be employed to determine the precision required in the current time-step. Our scheme evaluates 57% of the computations at a bit-width lower than the fixed precision employed by static methods. We implement it on E-PUR, and it provides 1.46x speedup and 19.2% energy savings on average.
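The output-reuse scheme can be sketched as follows in NumPy: previous neuron outputs are cached, and rows are recomputed only when the output is predicted to change noticeably. The thesis uses a low-cost binary network as the predictor; the cheap input-delta test below is merely a stand-in for it:

# Sketch: cache each neuron's previous output and skip its dot product
# when the predicted change is small. The change_score test is a proxy
# (an assumption of this sketch) for the thesis's BNN predictor.
import numpy as np

W = np.random.randn(256, 256)
prev_x = np.random.randn(256)
prev_y = np.tanh(W @ prev_x)

def reuse_step(x, threshold=0.05):
    change_score = np.abs(W @ (x - prev_x))      # proxy for the BNN predictor
    y = prev_y.copy()
    recompute = change_score > threshold
    y[recompute] = np.tanh(W[recompute] @ x)     # only these rows are evaluated
    return y, recompute.mean()

y, frac = reuse_step(prev_x + 0.01 * np.random.randn(256))
print(f"recomputed {frac:.0%} of neurons")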
APA, Harvard, Vancouver, ISO, and other styles
46

Singh, Jaswinder. "RNA Structure Prediction using Deep Neural Network Architectures and Improved Evolutionary Profiles." Thesis, Griffith University, 2022. http://hdl.handle.net/10072/414924.

Full text
Abstract:
RNAs are important biological macro-molecules that play critical roles in many biological processes. The functionality of an RNA depends on its three-dimensional (3D) structure, which in turn depends on its primary structure, i.e. the order of the sequence of nucleotides in the RNA chain. Direct prediction of the 3D structure of an RNA from its sequence is a challenging task. Therefore, the 3D structure is further divided into two-dimensional (2D) properties such as secondary structure and contact maps, and one-dimensional (1D) properties such as torsion angles and solvent accessibility. An accurate prediction of these 1D and 2D structural properties will increase the accuracy of predicting the 3D structure of the RNA. This thesis explores various deep learning algorithms and input features relevant to predicting the 1D and 2D structural properties of an RNA. Using these predicted 1D and 2D structural properties as restraints, we have demonstrated an improvement in the prediction of the RNA 3D structure. Four primary studies on RNA structural property prediction are performed in this thesis. The first study introduces two methods (SPOT-RNA and SPOT-RNA2) for RNA secondary structure prediction using an ensemble of Residual Convolution and Bi-directional LSTM recurrent neural networks. This study shows that deep-learning-based methods can outperform existing dynamic-programming-based algorithms and achieve state-of-the-art performance using single-sequence and evolutionary information as input. The second study investigates the application of deep neural networks for predicting RNA backbone torsion and pseudotorsion angles. We have pioneered the prediction of backbone torsion and pseudotorsion angles using deep learning (SPOT-RNA-1D). The angles predicted using SPOT-RNA-1D could be used as 3D model quality indicators. The third study introduces a method (SPOT-RNA-2D) to predict RNA distance-based contact maps using an ensemble of deep neural networks and improved evolutionary profiles from RNAcmap. This study shows that the use of predicted distance-based contact maps as restraints can significantly improve the performance of 3D structure prediction. The fourth study develops a fully automated pipeline (RNAcmap2) to generate aligned homologs. Here, we show that using a combination of BLAST-N and iterative INFERNAL searches along with an expanded sequence database leads to multiple sequence alignments (MSAs) comparable to Rfam MSAs, according to secondary structure extracted from mutational coupling analysis and alignment accuracy compared to structural alignment. This fully automatic tool (RNAcmap2) allows homolog search, multiple sequence alignment, and mutational coupling analysis for any non-Rfam RNA sequence with Rfam-like performance. The improved RNA 1D and 2D structural property predictions using deep learning, together with the improved homolog search, are expected to be useful for predicting RNA three-dimensional structure and better understanding its biological function.
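A minimal PyTorch sketch of an SPOT-RNA-style pipeline: sequence features pass through a bidirectional LSTM, pairwise features are formed by outer concatenation, and a convolution scores every (i, j) base pair. All sizes are assumptions, and the real ensemble is far deeper:

# Sketch: one-hot RNA sequence in, L x L base-pair probability map out.
import torch
import torch.nn as nn

class PairMapNet(nn.Module):
    def __init__(self, feat=4, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(feat, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv2d(4 * hidden, 1, kernel_size=3, padding=1)

    def forward(self, seq):                     # seq: (b, L, 4) one-hot RNA
        h, _ = self.rnn(seq)                    # (b, L, 2*hidden)
        L = h.size(1)
        pair = torch.cat([h.unsqueeze(2).expand(-1, -1, L, -1),
                          h.unsqueeze(1).expand(-1, L, -1, -1)], -1)
        return torch.sigmoid(self.conv(pair.permute(0, 3, 1, 2)))  # (b,1,L,L)

probs = PairMapNet()(torch.randn(1, 40, 4))    # base-pair probability map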
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Eng & Built Env
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
47

Xing, Luo Oscar. "Deep Learning for Speech Enhancement : A Study on WaveNet, GANs and General CNN-RNN Architectures." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260351.

Full text
Abstract:
Clarity and intelligibility are important aspects of speech, especially in a time of misinformation and mistrust. The breakthrough in generative models for audio has brought massive improvements for speech enhancement. Google's WaveNet architecture has been modified for noise reduction in a model called WaveNet denoising and has proven to be state-of-the-art. Another competitor on the market is the Speech Enhancement Generative Adversarial Network (SEGAN), which adapts the GAN architecture to applications on speech. While most older models focus on feature extraction and spectrogram analysis, these two models attempt to skip those steps and become fully end-to-end models. While end-to-end is attractive, data preprocessing is still valuable to consider. A network designed by Microsoft Research called EHNet uses spectrogram data as input instead of mere 1D waveforms to capture more relations between data points, as the higher dimensionality can carry more information. This thesis aims to explore the field of speech enhancement from a deep learning perspective, focusing on the three mentioned architectures through theoretical dissection and results on new datasets. There is also an implementation of the Wiener filter as a benchmark. We arrive at the conclusion that all three networks are viable for the task of enhancing speech; however, SEGAN performed better on our dataset and was more robust to new data in comparison. Future work could improve the evaluation methods, change datasets and implement hyperparameter optimization for further comparative analysis.
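A sketch of a WaveNet-denoising-flavored model in PyTorch: a stack of dilated 1D convolutions with a growing receptive field maps a noisy waveform to a clean estimate. The non-causal padding and depth are assumptions of this sketch, not the published architecture:

# Sketch: dilated conv stack, waveform in, waveform out.
import torch
import torch.nn as nn

layers, ch = [], 32
layers.append(nn.Conv1d(1, ch, 3, padding=1))
for d in [1, 2, 4, 8, 16, 32]:                 # growing receptive field
    layers += [nn.ReLU(), nn.Conv1d(ch, ch, 3, padding=d, dilation=d)]
layers += [nn.ReLU(), nn.Conv1d(ch, 1, 1)]     # back to a single waveform
denoiser = nn.Sequential(*layers)

clean_est = denoiser(torch.randn(2, 1, 16384)) # batch of noisy speech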
APA, Harvard, Vancouver, ISO, and other styles
48

Ali, Abdou-Djalilou. "Prediction of tomato seed germination from images with deep learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24666/.

Full text
Abstract:
Assessment of seed germination is an essential task for seed researchers to measure the quality and performance of seeds, not only to achieve high productivity but also for economic growth. In fact, knowing the germination rate of seeds in advance can give farmers a better idea of how much their fields will produce. Seed assessment can be done before or after the experiment. In after-experiment assessment, trained analysts evaluate seed germination by counting the seeds that present radicles or leaves emanating from them. However, the counting process done by analysts is cumbersome, error-prone, and time-consuming. Hence, machine-learning-based methods have been proposed for the situation in which the assessment of seeds is done after the experiment, to determine whether a seed germinated or not. Assessment of seeds via a model-based approach, whether done before or after the experiment, presents many advantages: it is faster, more repeatable, and more accurate. In this thesis, we consider the situation where the assessment of seeds is instead performed before the start of the experiment. That is, the proposed model tries to predict which seeds are going to germinate and which are not, before they are placed in a chamber under proper growing conditions for seven days. Prediction before the experiment holds the potential to further reduce the time required to select the seeds that are going to germinate and to let only viable seeds proceed to the germination equipment. Therefore, in this thesis we study the performance of a model-based approach that uses modern convolutional neural networks to predict the germination of tomato seeds, that is, whether a seed will germinate or not after it has spent some period in the controlled growing environment.
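A before-experiment germination predictor along these lines could be sketched as a pretrained CNN backbone with a binary head over single-seed crops; the choice of ResNet-18 here is an assumption, as the thesis evaluates modern CNNs more generally:

# Sketch: transfer learning for binary germination prediction.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Linear(backbone.fc.in_features, 2)  # germinates / does not

logits = backbone(torch.randn(4, 3, 224, 224))       # 4 seed-image crops
probs = torch.softmax(logits, dim=1)[:, 1]           # P(germination)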
APA, Harvard, Vancouver, ISO, and other styles
49

Policarpi, Andrea. "Transformers architectures for time series forecasting." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25005/.

Full text
Abstract:
Time series forecasting is an important task related to countless applications, ranging from anomaly detection to healthcare problems. The ability to predict future values of a given time series is a non-trivial operation, whose complexity heavily depends on the amount and quality of available data. Historically, the problem has been addressed by statistical models and simple deep learning architectures such as CNNs and RNNs; recently, many Transformer-based models have also been used, with excellent results. This thesis evaluates the performance of two Transformer-based models, namely a TransformerT2V and an Informer, when applied to time series forecasting problems, and compares them with non-Transformer architectures. A second contribution is the exploration of the Informer's ProbSparse mechanism, with suggested improvements to increase model performance.
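The time-encoding component behind the TransformerT2V variant, Time2Vec, is compact enough to sketch directly: one linear term plus k periodic terms per time step (a generic rendering in PyTorch, not the thesis's code):

# Sketch of a Time2Vec embedding: t2v(t) = [w0*t + b0, sin(wi*t + bi)].
import torch
import torch.nn as nn

class Time2Vec(nn.Module):
    def __init__(self, k=8):
        super().__init__()
        self.w = nn.Parameter(torch.randn(k + 1))
        self.b = nn.Parameter(torch.randn(k + 1))

    def forward(self, t):                     # t: (batch, seq_len, 1)
        v = self.w * t + self.b               # broadcast to (batch, seq, k+1)
        return torch.cat([v[..., :1], torch.sin(v[..., 1:])], dim=-1)

emb = Time2Vec()(torch.arange(24.).view(1, 24, 1))   # hourly timestamps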
APA, Harvard, Vancouver, ISO, and other styles
50

Buttar, Sarpreet Singh. "Applying Artificial Neural Networks to Reduce the Adaptation Space in Self-Adaptive Systems : an exploratory work." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-87117.

Full text
Abstract:
Self-adaptive systems have limited time to adjust their configurations whenever their adaptation goals, i.e., quality requirements, are violated due to some runtime uncertainties. Within the available time, they need to analyze their adaptation space, i.e., a set of configurations, to find the best adaptation option, i.e., configuration, that can achieve their adaptation goals. Existing formal analysis approaches find the best adaptation option by analyzing the entire adaptation space. However, exhaustive analysis requires time and resources and is therefore only efficient when the adaptation space is small. The size of the adaptation space is often in the hundreds or thousands, which makes formal analysis approaches inefficient in large-scale self-adaptive systems. In this thesis, we tackle this problem by presenting an online learning approach that enables formal analysis approaches to analyze large adaptation spaces efficiently. The approach integrates with the standard feedback loop and reduces the adaptation space to a subset of adaptation options that are relevant to the current runtime uncertainties. The subset is then analyzed by the formal analysis approaches, which allows them to complete the analysis faster and more efficiently within the available time. We evaluate our approach on two different instances of an Internet of Things application. The evaluation shows that our approach dramatically reduces the adaptation space and analysis time without compromising the adaptation goals.
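The core idea can be sketched with a small classifier that predicts, per adaptation option, whether it is relevant under the current runtime uncertainties, so the formal analyzer only verifies the reduced subset. The synthetic data and features below are illustrative assumptions, not the thesis's setup:

# Sketch: learn to filter the adaptation space before formal analysis.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))          # option features + runtime uncertainty
y = (X[:, 0] + X[:, 3] > 0).astype(int) # 1 = option likely meets the goals

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X, y)
options = rng.normal(size=(200, 6))     # current adaptation space
subset = options[clf.predict(options) == 1]
print(f"analyze {len(subset)} of {len(options)} options formally")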
APA, Harvard, Vancouver, ISO, and other styles
