Theses on the topic "Deep learning architecture"
Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 theses for your research on the topic "Deep learning architecture".
Next to each source in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Glatt, Ruben [UNESP]. "Deep learning architecture for gesture recognition". Universidade Estadual Paulista (UNESP), 2014. http://hdl.handle.net/11449/115718.
Full text
Activity recognition from computer vision plays an important role in research towards applications like human-computer interfaces, intelligent environments, surveillance or medical systems. In this work, a gesture recognition system based on a deep learning architecture is proposed. It is used to analyze the performance when trained with multi-modal input data on an Italian sign language dataset. The underlying research area is the field of human-machine interaction. It combines research on natural user interfaces, gesture and activity recognition, machine learning and sensor technologies, which are used to capture the environmental input for further processing. Those areas are introduced and the basic concepts are described. The development environment for preprocessing data and programming machine learning algorithms with Python is described and the main libraries are discussed. The gathering of the multi-modal data streams is explained and the dataset used is outlined. The proposed learning architecture consists of two steps: the preprocessing of the input data and the actual learning architecture. The preprocessing is limited to three different strategies, which are combined to offer six different preprocessing profiles. In the second step, a Deep Belief Network is introduced and its components are explained. With this setup, 294 experiments are conducted with varying configuration settings. The variables that are altered are the preprocessing settings, the layer structure of the model, the pretraining learning rate and the fine-tuning learning rate. The evaluation of these experiments shows that the approach of using a deep learning architecture on an activity or gesture recognition task yields acceptable results, but has not yet reached a level of maturity that would allow the developed models to be used in serious applications.
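The Deep Belief Network in the abstract above is built from stacked restricted Boltzmann machines that are pre-trained before fine-tuning. As a rough illustration of the core update involved, here is a minimal numpy sketch of one-step contrastive divergence (CD-1) for a Bernoulli RBM; the toy data, dimensions, and learning rate are invented for illustration and are not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.5):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible bias
        self.b_h = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities given the data.
        p_h0 = sigmoid(v0 @ self.W + self.b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one Gibbs reconstruction step.
        p_v1 = sigmoid(h0 @ self.W.T + self.b_v)
        p_h1 = sigmoid(p_v1 @ self.W + self.b_h)
        # CD-1 gradient approximation and parameter update.
        batch = v0.shape[0]
        self.W += self.lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / batch
        self.b_v += self.lr * (v0 - p_v1).mean(axis=0)
        self.b_h += self.lr * (p_h0 - p_h1).mean(axis=0)
        return np.mean((v0 - p_v1) ** 2)  # reconstruction error

# Toy data: repeated copies of two binary patterns.
patterns = np.array([[1, 1, 0, 0, 1, 0], [0, 0, 1, 1, 0, 1]], dtype=float)
data = np.repeat(patterns, 50, axis=0)

rbm = RBM(n_visible=6, n_hidden=4)
errors = [rbm.cd1_step(data) for _ in range(200)]
print(f"first error {errors[0]:.3f}, last error {errors[-1]:.3f}")
```

In a full DBN, several such RBMs would be trained layer-wise (each on the hidden activations of the previous one) before supervised fine-tuning of the whole stack.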
Glatt, Ruben. "Deep learning architecture for gesture recognition". Guaratinguetá, 2014. http://hdl.handle.net/11449/115718.
Full text
Co-advisor: Daniel Julien Barros da Silva Sampaio
Committee: Galeno José de Sena
Committee: Luiz de Siqueira Martins Filho
Master's
Salman, Ahmad. "Learning speaker-specific characteristics with deep neural architecture". Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/learning-speakerspecific-characteristics-with-deep-neural-architecture(24acb31d-2106-4e52-80ab-6c649838026a).html.
Full text
Goh, Hanlin. "Learning deep visual representations". Paris 6, 2013. http://www.theses.fr/2013PA066356.
Full text
Recent advancements in the areas of deep learning and visual information processing have presented an opportunity to unite both fields. These complementary fields combine to tackle the problem of classifying images into their semantic categories. Deep learning brings learning and representational capabilities to a visual processing model that is adapted for image classification. This thesis addresses problems that lead to the proposal of learning deep visual representations for image classification. The problem of deep learning is tackled on two fronts. The first aspect is the problem of unsupervised learning of latent representations from input data. The main focus is the integration of prior knowledge into the learning of restricted Boltzmann machines (RBM) through regularization. Regularizers are proposed to induce sparsity, selectivity and topographic organization in the coding to improve discrimination and invariance. The second direction introduces the notion of gradually transiting from unsupervised layer-wise learning to supervised deep learning. This is done through the integration of bottom-up information with top-down signals. Two novel implementations supporting this notion are explored. The first method uses top-down regularization to train a deep network of RBMs. The second method combines predictive and reconstructive loss functions to optimize a stack of encoder-decoder networks. The proposed deep learning techniques are applied to tackle the image classification problem. The bag-of-words model is adopted due to its strengths in image modeling through the use of local image descriptors and spatial pooling schemes. Deep learning with spatial aggregation is used to learn a hierarchical visual dictionary for encoding the image descriptors into mid-level representations. This method achieves leading image classification performance for object and scene images. The learned dictionaries are diverse and non-redundant. The speed of inference is also high. From this, a further optimization is performed for the subsequent pooling step. This is done by introducing a differentiable pooling parameterization and applying the error backpropagation algorithm. This thesis represents one of the first attempts to synthesize deep learning and the bag-of-words model. This union results in many challenging research problems, leaving much room for further study in this area.
Kola, Ramya Sree. "Generation of synthetic plant images using deep learning architecture". Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18450.
Full text
Xiao, Yao. "Vehicle Detection in Deep Learning". Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/91375.
Full text
Master of Science
Computer vision techniques are becoming increasingly popular. For example, face recognition is used to help police find criminals, vehicle detection is used to prevent drivers from serious traffic accidents, and written word recognition is used to convert written words into printed words. Despite the rapid development of vehicle detection using deep learning techniques, there are still concerns about the performance of state-of-the-art vehicle detection techniques. For example, state-of-the-art vehicle detectors are restricted by the large variation of scales. People working on vehicle detection are developing techniques to solve this problem. This thesis proposes an advanced vehicle detection model, utilizing deep learning techniques to detect potential objects and their information.
Tsardakas, Renhuldt Nikos. "Protein contact prediction based on the Tiramisu deep learning architecture". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-231494.
Full text
Determining protein structure has applications in both medicine and industry. Both experimental determination of protein structure and its prediction are difficult. Predicted contacts between different parts of a protein facilitate protein structure prediction. Recently, deep learning has been used to build better models for contact prediction. This thesis describes a new deep learning model for protein contact prediction, TiramiProt. The model is based on the Tiramisu deep learning architecture. TiramiProt is trained and evaluated on the same data as the contact prediction model PconsC4. In total, models with 228 different hyperparameter combinations were trained to convergence. Measured across a number of different metrics, the final TiramiProt model performs on par with the state-of-the-art models PconsC4 and RaptorX-Contact. TiramiProt is available as a Python package and a Singularity container via https://gitlab.com/nikos.t.renhuldt/TiramiProt.
Fayyazifar, Najmeh. "Deep learning and neural architecture search for cardiac arrhythmias classification". Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2022. https://ro.ecu.edu.au/theses/2553.
Full text
Qian, Xiaoye. "Wearable Computing Architecture over Distributed Deep Learning Hierarchy: Fall Detection Study". Case Western Reserve University School of Graduate Studies / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=case156195574310931.
Texto completoÄhdel, Victor. "On the effect of architecture on deep learning based features for homography estimation". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233194.
Full text
Keypoint detection and descriptor creation are the first step of homography and essential matrix estimation, which in turn is used in Visual Odometry and Visual SLAM. This work explores the effect (in terms of speed and accuracy) of using different deep learning architectures for such keypoints. The fully convolutional networks, with heads for both the detector and the descriptor, are trained through an existing self-supervised method, where correspondences are obtained through known, randomly sampled homographies. A new strategy for choosing negative correspondences for the descriptor's training is presented, which allows more flexibility in the architecture design. The new strategy turns out to be essential, as it enables networks that outperform the learned baseline at no cost in inference time. Varying the model size leads to a trade-off between speed and accuracy, and while all the models outperform ORB in homography estimation, only the larger models approach SIFT's performance, falling 1-7% short. Training longer and with additional types of data might yield enough improvement to outperform SIFT. Even though the smallest models are 3× faster and use 50× fewer parameters than the learned baseline, they still require 3× as much time as SIFT while performing around 10-30% worse. However, there is still room for improvement through optimization methods beyond architecture changes, such as quantization, which could make the method faster than SIFT.
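For context on the homography estimation task that the thesis above benchmarks against SIFT and ORB: once keypoint correspondences are available, the homography itself is classically recovered with the Direct Linear Transform, independent of any learned detector. A minimal numpy sketch with synthetic correspondences (the ground-truth matrix and points below are invented for illustration):

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: estimate H (3x3) such that dst ~ H @ src,
    from >= 4 point correspondences given as (N, 2) arrays."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography vector is the null vector of A (smallest singular vector).
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # to homogeneous coords
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:]

# Ground-truth homography: rotation/scale + translation + mild perspective.
H_true = np.array([[0.9, -0.1, 5.0], [0.1, 1.1, -3.0], [1e-4, 2e-4, 1.0]])
src = np.array([[0, 0], [100, 0], [100, 100], [0, 100], [50, 25]], dtype=float)
dst = apply_homography(H_true, src)

H_est = estimate_homography(src, dst)
err = np.abs(apply_homography(H_est, src) - dst).max()
print(f"max reprojection error: {err:.2e}")
```

Real pipelines wrap this estimate in RANSAC to reject the mismatched correspondences a detector inevitably produces; with the exact synthetic matches above, the reprojection error is essentially zero.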
Silvestri, Gianluigi. "One-Shot Neural Architecture Search for Deep Multi-Task Learning in Computer Vision". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-282831.
Full text
In this work, a neural architecture search algorithm for multi-task learning is proposed. Given a generic dataset and set of tasks, the method aims to find the optimal way of sharing layers between the tasks in a convolutional network. A search space suited to multi-task learning has been designed, and a novel strategy for ranking different Pareto-optimal solutions has been developed. The core of the algorithm is an adapted state-of-the-art neural architecture search strategy. Experimental results on the Cityscapes dataset, on the tasks of semantic segmentation and depth estimation, do not deliver the expected results. Despite the lack of stable results, this work provides a foundation for further development of architecture search methods for multi-task learning.
Manero, Font Jaume. "Deep learning architectures applied to wind time series multi-step forecasting". Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669283.
Full text
Wind prediction is key to integrating wind energy into electrical systems. Meteorological models are used for forecasting, but their geographic grids are too coarse to reproduce all the local features that influence wind formation, making prediction from the time series of past measurements at a specific location necessary. The goal of this research is the application of deep neural networks to multi-step forecasting, using time series of multiple meteorological variables as input, to produce wind predictions 12 hours ahead. Wind time series are sequences of meteorological observations such as wind speed, temperature, humidity, barometric pressure or direction. Wind time series have two relevant statistical properties, non-linearity and non-stationarity, which make modeling with statistical tools imprecise. In this thesis, deep learning models for wind prediction are validated and tested; these learning architectures are applied to the world's largest wind dataset, produced by the United States National Renewable Energy Laboratory (NREL), which contains 126,692 physical wind sites distributed across the geography of North America. The heterogeneity of these data series makes it possible to draw firm conclusions about the accuracy of each method applied to time series generated at geographically very diverse locations. We propose multi-layer, convolutional and recurrent deep neural networks as basic blocks, combined into heterogeneous architectures with variants, trained with optimization strategies such as dropout, skip connections, stopping strategies, and filters and kernels of different sizes, among others.
The architectures are optimized with parameter-selection algorithms that yield the best-performing model across all the data. The learning capabilities of the architectures applied to heterogeneous locations make it possible to establish relationships between a site's characteristics (terrain complexity, wind variability, geographic location) and model accuracy, establishing predictability measures that relate model capability to measures defined from spectral or seasonality analysis of the time series. The developed methods offer new and superior alternatives to statistical algorithms and traditional methods.
Ferré, Paul. "Adéquation algorithme-architecture de réseaux de neurones à spikes pour les architectures matérielles massivement parallèles". Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30318/document.
Full text
The last decade has seen the re-emergence of machine learning methods based on formal neural networks under the name of deep learning. Although these methods have enabled a major breakthrough in machine learning, several obstacles to the possibility of industrializing these methods persist, notably the need to collect and label a very large amount of data as well as the computing power necessary to perform learning and inference with this type of neural network. In this thesis, we propose to study the adequacy between inference and learning algorithms derived from biological neural networks and massively parallel hardware architectures. We show with three contributions that such adequacy drastically accelerates the computation times inherent to neural networks. In our first axis, we study the adequacy of the BCVision software engine developed by Brainchip SAS for GPU platforms. We also propose the introduction of a coarse-to-fine architecture based on complex cells. We show that the GPU port accelerates processing by a factor of seven, while the coarse-to-fine architecture reaches a factor of one thousand. The second contribution presents three algorithms for spike propagation adapted to parallel architectures. We study exhaustively the computational models of these algorithms, allowing the selection or design of the hardware system adapted to the parameters of the desired network. In our third axis we present a method to apply the Spike-Timing-Dependent Plasticity rule to image data in order to learn visual representations in an unsupervised manner. We show that our approach allows the effective learning of a hierarchy of representations relevant to image classification tasks, while requiring ten times less data than other approaches in the literature.
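The Spike-Timing-Dependent Plasticity rule mentioned in the third axis has a standard pair-based exponential form: a synapse is strengthened when the presynaptic spike precedes the postsynaptic one and weakened otherwise. A minimal sketch of that textbook rule (the parameter values are illustrative, not taken from the thesis):

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.05, a_minus=0.055, tau=20.0):
    """Pair-based exponential STDP: potentiate when the presynaptic spike
    precedes the postsynaptic one (dt > 0), depress otherwise."""
    dt = t_post - t_pre
    if dt > 0:
        dw = a_plus * np.exp(-dt / tau)    # long-term potentiation
    else:
        dw = -a_minus * np.exp(dt / tau)   # long-term depression
    return np.clip(w + dw, 0.0, 1.0)       # keep the weight bounded

w = 0.5
w_pot = stdp_update(w, t_pre=10.0, t_post=15.0)   # pre fires before post
w_dep = stdp_update(w, t_pre=15.0, t_post=10.0)   # post fires before pre
print(f"potentiated: {w_pot:.4f}, depressed: {w_dep:.4f}")
```

Applied repeatedly over spike trains driven by image data, updates of this kind let frequently co-occurring input patterns carve out selective feature detectors without any labels.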
Nemirovsky, Daniel A. "Improving heterogeneous system efficiency : architecture, scheduling, and machine learning". Doctoral thesis, Universitat Politècnica de Catalunya, 2017. http://hdl.handle.net/10803/461499.
Full text
Computer architects are beginning to design heterogeneous systems as an efficient way to use the increases in transistor density to execute a wide diversity of programs running under different conditions and energy and performance requirements. As heterogeneous systems gain popularity, architects will need to design new ways of scheduling applications onto the distinct cores of the CPUs. New schedulers that take into account the heterogeneity of the hardware resources achieve significant performance benefits compared with schedulers built for homogeneous systems. However, almost all of these heterogeneous schedulers are unable to identify the mapping scheme that produces the maximum performance given the state of the cores and the applications. Accurately estimating the performance of programs executing on different cores of a CPU is a great advantage for identifying the mapping that achieves the best possible performance for the next scheduling quantum. Recent developments in the area of machine learning, such as neural networks, have produced very powerful and accurate predictors in numerous disciplines. Yet at present, the application of machine learning methods to improving CPU efficiency has barely been explored, and even less so for improving schedulers for heterogeneous systems. The focus of this thesis is how to understand and utilize heterogeneous systems, the benefits of scheduling for these systems, and how to take advantage of the promise of machine learning methods with respect to maximizing system performance.
We present studies that provide a framework for a future computation model capable of supporting heterogeneous resources at large scale, discuss the constraints faced by designers of heterogeneous systems, explore the advantages and disadvantages of the latest heterogeneous schedulers, and open the way to using machine learning methods to optimize the mapping and performance of a heterogeneous system. The objective of this thesis is to highlight the importance of efficiently exploiting the heterogeneity of resources and to validate the opportunities for improving efficiency in different areas of computer architecture that can be realized thanks to machine learning.
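As a rough sketch of the scheduling idea in the abstract above: once a predictor (e.g. a neural network) has estimated each thread's performance on each core type, choosing the mapping for the next quantum reduces to an assignment problem, solvable by enumeration for small core counts. The predicted values below are hypothetical:

```python
import numpy as np
from itertools import permutations

def best_mapping(pred_perf):
    """Given predicted performance pred_perf[i][j] of thread i on core j,
    enumerate all one-to-one thread-to-core mappings and return the one
    maximizing total predicted performance (fine for small core counts)."""
    n = len(pred_perf)
    best, best_score = None, -np.inf
    for perm in permutations(range(n)):
        score = sum(pred_perf[i][perm[i]] for i in range(n))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score

# Hypothetical predictions: cores 0-1 are big, cores 2-3 are little;
# threads 0-1 are compute-bound, threads 2-3 are memory-bound.
pred = np.array([
    [9.0, 9.0, 3.0, 3.0],
    [8.0, 8.0, 2.5, 2.5],
    [4.0, 4.0, 3.5, 3.5],
    [4.2, 4.2, 3.6, 3.6],
])
mapping, score = best_mapping(pred)
print(f"mapping (thread -> core): {mapping}, total predicted perf: {score}")
```

Real schedulers replace the exhaustive enumeration with heuristics or a learned policy, since the number of mappings grows factorially with the core count; the point here is only that accurate per-core predictions make the optimal mapping well defined.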
Li, Yanxi. "Efficient Neural Architecture Search with an Active Performance Predictor". Thesis, University of Sydney, 2020. https://hdl.handle.net/2123/24092.
Texto completoCuan, Bonan. "Deep similarity metric learning for multiple object tracking". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI065.
Full text
Multiple object tracking, i.e. simultaneously tracking multiple objects in the scene, is an important but challenging visual task. Objects should be accurately detected and distinguished from each other to avoid erroneous trajectories. Since remarkable progress has been made in the object detection field, "tracking-by-detection" approaches are widely adopted in multiple object tracking research. Objects are detected in advance and tracking reduces to an association problem: linking detections of the same object through frames into trajectories. Most tracking algorithms employ both motion and appearance models for data association. For multiple object tracking problems in which many objects of the same category exist, a fine-grained discriminant appearance model is paramount and indispensable. Therefore, we propose an appearance-based re-identification model using deep similarity metric learning to deal with multiple object tracking in mono-camera videos. Two main contributions are reported in this dissertation. First, a deep Siamese network is employed to learn an end-to-end mapping from input images to a discriminant embedding space. Different metric learning configurations using various metrics, loss functions, deep network structures, etc., are investigated in order to determine the best re-identification model for tracking. In addition, with an intuitive and simple classification design, the proposed model achieves satisfactory re-identification results, which are comparable to state-of-the-art approaches using triplet losses. Our approach is easy and fast to train and the learned embedding can be readily transferred onto the domain of tracking tasks. Second, we integrate our proposed re-identification model in multiple object tracking as appearance guidance for detection association. For each object to be tracked in a video, we establish an identity-related appearance model based on the learned embedding for re-identification. Similarities among detected object instances are exploited for identity classification. The collaboration and interference between appearance and motion models are also investigated. An online appearance-motion model coupling is proposed to further improve the tracking performance. Experiments on the Multiple Object Tracking Challenge benchmark prove the effectiveness of our modifications, with state-of-the-art tracking accuracy.
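The detection-association step described above can be sketched as similarity matching in the learned embedding space. The minimal greedy version below uses cosine similarity; the embeddings and threshold are invented for illustration, and real trackers often use optimal assignment (the Hungarian algorithm) instead of greedy matching:

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a (N, D) and b (M, D)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def greedy_associate(track_emb, det_emb, threshold=0.6):
    """Greedily link each track to its most similar unmatched detection,
    highest similarity first; pairs below the threshold stay unmatched."""
    sim = cosine_similarity(track_emb, det_emb)
    matches, used = {}, set()
    for t, d in sorted(np.ndindex(sim.shape), key=lambda td: -sim[td]):
        if t not in matches and d not in used and sim[t, d] >= threshold:
            matches[t] = d
            used.add(d)
    return matches

# Toy embeddings: track 0 matches detection 1, track 1 matches detection 0;
# detection 2 is dissimilar to both and would start a new trajectory.
tracks = np.array([[1.0, 0.0, 0.1], [0.0, 1.0, 0.0]])
dets   = np.array([[0.0, 0.9, 0.1], [1.0, 0.1, 0.0], [0.5, 0.5, 0.7]])
matches = greedy_associate(tracks, dets)
print(matches)
```

In the thesis's full pipeline this appearance score is further coupled with a motion model before the association decision is made.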
Amara, Pavan Kumar. "Towards a Unilateral Sensor Architecture for Detecting Person-to-Person Contacts". Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1404573/.
Full text
Pereira, Renato de Pontes. "HIGMN: an IGMN-based hierarchical architecture and its applications for robotic tasks". Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/80752.
Full text
The recent field of Deep Learning has introduced to Machine Learning new methods based on distributed abstract representations of the training data throughout hierarchical structures. The hierarchical organization of layers allows these methods to store distributed information on sensory signals and to create concepts with different abstraction levels to represent the input data. This work investigates the impact of a hierarchical structure inspired by ideas from Deep Learning and based on the Incremental Gaussian Mixture Network (IGMN), a probabilistic neural network with on-line and incremental learning, especially suitable for robotic tasks. As a result, a hierarchical architecture, called Hierarchical Incremental Gaussian Mixture Network (HIGMN), was developed, which combines two levels of IGMNs. The HIGMN first-level layers are able to learn concepts from data of different domains that are then related in the second-level layer. The proposed model was compared with the IGMN on robotic tasks, in particular the task of learning and reproducing a wall-following behavior, based on a Learning from Demonstration (LfD) approach. The experiments showed how the HIGMN can perform three different tasks in parallel (concept learning, behavior segmentation, and learning and reproducing behaviors) and its ability to learn a wall-following behavior and to perform it in unknown environments with new sensory information. The HIGMN could reproduce the wall-following behavior after a single, simple, and short demonstration of the behavior. Moreover, it acquired different types of knowledge: information on the environment, the robot kinematics, and the target behavior.
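The on-line, incremental learning the IGMN is built on can be illustrated with the single-pass update that each Gaussian mixture component performs as samples arrive. The Welford-style sketch below covers only the per-component mean/covariance recursion, a simplification of (not a substitute for) the full IGMN algorithm, which also updates component priors and creates components on demand:

```python
import numpy as np

class OnlineGaussian:
    """Incremental (single-pass) estimate of a Gaussian component's mean and
    covariance, the kind of update an IGMN unit performs per sample."""
    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.M2 = np.zeros((dim, dim))   # accumulated outer-product deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n               # running mean
        self.M2 += np.outer(delta, x - self.mean) # running scatter matrix

    @property
    def cov(self):
        return self.M2 / self.n                   # population covariance

rng = np.random.default_rng(2)
true_mean = np.array([1.0, -2.0])
samples = rng.normal(true_mean, 0.5, size=(5000, 2))

g = OnlineGaussian(dim=2)
for x in samples:              # one pass, no data stored: suits robot streams
    g.update(x)
print("mean:", np.round(g.mean, 2), "cov diag:", np.round(np.diag(g.cov), 2))
```

Because nothing but the sufficient statistics is retained, a robot can keep refining such components from its sensor stream indefinitely, which is what makes the IGMN family attractive for the LfD setting above.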
García, López Javier. "Geometric computer vision meets deep learning for autonomous driving applications". Doctoral thesis, TDX (Tesis Doctorals en Xarxa), 2021. http://hdl.handle.net/10803/672708.
The main objective of this dissertation is to provide theoretical and practical contributions on the development of deep learning algorithms for autonomous driving applications. The research is motivated by the need for deep neural networks (DNNs) to obtain a complete understanding of the environment and to run in real driving scenarios on real vehicles equipped with specific hardware, which has limited memory (DSP or GPU platforms) or uses multiple optical sensors. This constrains algorithm development, forcing the designed deep networks to be accurate while using a minimal number of operations and low memory and power consumption. The main goal of this thesis is, on the one hand, to investigate the practical limitations of DL-based algorithms that prevent them from being integrated into current ADAS (Advanced Driver Assistance Systems) functionalities, and on the other, the design and implementation of deep learning algorithms capable of overcoming such limitations so that they can be applied in real autonomous driving scenarios, enabling their integration on low-memory hardware platforms and avoiding sensor redundancy. Deep learning (DL) applications have been widely exploited in recent years, but they have some weak points that must be faced and overcome in order to fully integrate DL into the development process of large automotive manufacturers and companies, such as the time needed to design, train and validate an optimal network for a specific application, or the vast expert knowledge required to tune the hyperparameters of predefined networks so as to make them executable on a specific platform and take the greatest advantage of the hardware resources.
Throughout this thesis, we have addressed these issues and focused on implementing advances that would aid the industrial integration of DL-based applications in the automotive industry. This work was carried out within the "Doctorat Industrial" program, at the company FICOSA ADAS, and it is thanks to the opportunities the company offered that a fast and direct impact of the resulting algorithms could be demonstrated in real test scenarios to prove their validity. In addition, this work investigates in depth the automatic design of deep neural networks (DNNs) based on state-of-the-art deep learning frameworks such as NAS (neural architecture search). As stated in this thesis, one of the identified barriers to deep learning technology in today's automotive companies is the difficulty of developing lightweight, accurate networks that can be integrated into small systems on chip (SoCs) or DSPs. To overcome this restriction, a framework called E-DNAS is proposed for the automatic design, training and validation of deep neural networks that perform image classification tasks and run on hardware platforms with limited resources. This approach has been validated on a real system on chip from Texas Instruments (tda2x) provided by FICOSA ADAS, and the results are published within this thesis. As an extension of E-DNAS, the last chapter of this work presents a NAS-based framework for object detection whose main contribution is a fast and easy way to find object proposals in images which, in a second step, are classified into one of the labeled classes.
Automàtica, robòtica i visió
Amara, Pavan Kumar. "Towards a Unilateral Sensor Architecture for Detecting Person-to-Person Contacts". Thesis, University of North Texas, 2012. https://digital.library.unt.edu/ark:/67531/metadc1703441/.
Bono, Guillaume. "Deep multi-agent reinforcement learning for dynamic and stochastic vehicle routing problems". Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI096.
Routing delivery vehicles in dynamic and uncertain environments, like dense city centers, is a challenging task that requires robustness and flexibility. Such logistic problems are usually formalized as Dynamic and Stochastic Vehicle Routing Problems (DS-VRPs) with a variety of additional operational constraints, such as capacitated vehicles or time windows (DS-CVRPTWs). The main heuristic approaches to dynamic and stochastic problems simply consist in restarting the optimization process on a frozen (static and deterministic) version of the problem given the new information. Instead, Reinforcement Learning (RL) offers models such as Markov Decision Processes (MDPs) which naturally describe the evolution of stochastic and dynamic systems. Their application to more complex problems has been facilitated by recent progress in Deep Neural Networks, which can learn to represent a large class of functions in high-dimensional spaces and approximate solutions with high performance. Finding a compact and sufficiently expressive state representation is the key challenge in applying RL to VRPs. Recent work exploring this novel approach demonstrated the capability of attention mechanisms to represent sets of customers and to learn policies that generalize to different customer configurations. However, all existing work using DNNs reframes the VRP as a single-vehicle problem and cannot provide online decision rules for a fleet of vehicles. In this thesis, we study how to apply Deep RL methods to rich DS-VRPs as multi-agent systems. We first explore the class of policy-based approaches in multi-agent RL and actor-critic methods for decentralized, partially observable MDPs in the Centralized Training for Decentralized Control (CTDC) paradigm. To address DS-VRPs, we then introduce a new sequential multi-agent model we call sMMDP. This fully observable model is designed to capture the fact that the consequences of decisions can be predicted in isolation.
Afterwards, we use it to model a rich DS-VRP and propose a new modular policy network, called MARDAM, to represent the state of the customers and the vehicles in this new model. It provides online decision rules adapted to the information contained in the state and takes advantage of the structural properties of the model. Finally, we develop a set of artificial benchmarks to evaluate the flexibility, robustness and generalization capabilities of MARDAM. We report promising results in the dynamic and stochastic case, which demonstrate the capacity of MARDAM to address varying scenarios with no re-optimization, adapting to new customers and to unexpected delays caused by stochastic travel times. We also implement an additional benchmark based on micro-traffic simulation to better capture the dynamics of a real city and its road infrastructure. We report preliminary results as a proof of concept that MARDAM can learn to represent different scenarios and handle varying traffic conditions and customer configurations.
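For context on what an online decision rule for vehicle routing replaces, a classical hand-crafted baseline is the greedy nearest-customer policy. The sketch below is only an illustrative stand-in under our own assumptions (2-D Euclidean customers, a single vehicle, function names ours); it is not the thesis's sMMDP or MARDAM model:

```python
import math

def nearest_customer_policy(vehicle, customers):
    """Baseline decision rule: drive to the closest pending customer.
    Learned policies such as MARDAM replace exactly this kind of rule."""
    return min(customers, key=lambda c: math.dist(vehicle, c))

def rollout(depot, customers):
    """Serve every customer with the greedy policy; cost = distance driven."""
    pos, route, cost = depot, [], 0.0
    pending = list(customers)
    while pending:
        nxt = nearest_customer_policy(pos, pending)
        cost += math.dist(pos, nxt)
        pending.remove(nxt)
        route.append(nxt)
        pos = nxt
    return route, cost + math.dist(pos, depot)  # close the tour at the depot

route, cost = rollout((0, 0), [(0, 1), (2, 0), (0, 3)])
print(route, round(cost, 3))
```

Because the rule is evaluated one decision at a time, it naturally handles customers that appear dynamically, which is the setting the learned policies above aim to solve much better.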
Bonazza, Pierre. "Système de sécurité biométrique multimodal par imagerie, dédié au contrôle d’accès". Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCK017/document.
The research in this thesis consists in developing efficient, lightweight solutions to the problem of securing sensitive products. Motivated by a collaboration with various stakeholders within the Nuc-Track project, the development of a biometric security system, possibly multimodal, leads to a study of various biometric features such as the face, fingerprints and the vascular network. This thesis focuses on matching algorithms to architectures, with the aim of minimizing the storage size of the learning models while guaranteeing optimal performance. This allows the models to be stored on a personal device, thus respecting privacy standards.
Anani-Manyo, Nina K. "Computer Vision and Building Envelopes". Kent State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=kent1619539038754026.
Bai, Kang Jun. "Moving Toward Intelligence: A Hybrid Neural Computing Architecture for Machine Intelligence Applications". Diss., Virginia Tech, 2021. http://hdl.handle.net/10919/103711.
Doctor of Philosophy
Deep learning strategies are the cutting edge of artificial intelligence, in which artificial neural networks are trained to extract key features or find similarities in raw sensory information. This is made possible through multiple processing layers with a colossal number of neurons, in a way similar to humans. Deep learning strategies running on von Neumann computers are deployed worldwide. However, in today's data-driven society, the use of general-purpose computing systems and cloud infrastructures can no longer offer a timely response, while also exposing significant security issues. With the introduction of neuromorphic architectures, application-specific integrated circuits have paved the way for machine intelligence applications in recent years. The major contributions of this dissertation include designing and fabricating a new class of hybrid neural computing architecture and applying various deep learning strategies to diverse machine intelligence applications. The resulting hybrid neural computing architecture offers an alternative way to accelerate the neural computations required by sophisticated machine intelligence applications with a simple system-level design, thereby opening the door to low-power system-on-chip designs for future intelligent computing; moreover, it provides prominent design solutions and performance improvements for Internet of Things applications.
Blot, Michaël. "Étude de l'apprentissage et de la généralisation des réseaux profonds en classification d'images". Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS412.
Artificial intelligence has experienced a resurgence in recent years, due to the growing ability to collect and store considerable amounts of digitized data. These huge databases allow machine learning algorithms to address certain tasks through supervised learning. Among digitized data, images remain predominant in the modern environment, and huge image datasets have been created. Moreover, image classification has driven the development of previously neglected models: deep neural networks, or deep learning. This family of algorithms shows a remarkable ability to fit datasets, even very large ones, perfectly. Their ability to generalize remains largely misunderstood, but convolutional networks are today the undisputed state of the art. From both a research and an application point of view, the demands on deep learning will keep growing, requiring an effort to push the performance of neural networks to the maximum of their capacity. This is the purpose of our research, whose contributions are presented in this thesis. We first looked at the issue of training and considered accelerating it through distributed methods. We then studied architectures in order to improve them without increasing their complexity. Finally, we focused on the regularization of network training, studying a regularization criterion based on information theory that we deployed in two different ways.
Carbonera, Luvizon Diogo. "Apprentissage automatique pour la reconnaissance d'action humaine et l'estimation de pose à partir de l'information 3D". Thesis, Cergy-Pontoise, 2019. http://www.theses.fr/2019CERG1015.
3D human action recognition is a challenging task due to the complexity of human movements and to the variety of poses and actions performed by distinct subjects. Recent technologies based on depth sensors can provide 3D human skeletons at low computational cost, which is useful information for action recognition. However, such low-cost sensors are restricted to controlled environments and frequently output noisy data. Meanwhile, convolutional neural networks (CNNs) have shown significant improvements on both action recognition and 3D human pose estimation from RGB images. Despite being closely related problems, the two tasks are frequently handled separately in the literature. In this work, we analyze the problem of 3D human action recognition in two scenarios: first, we explore spatial and temporal features from human skeletons, which are aggregated by a shallow metric learning approach. In the second scenario, we not only show that precise 3D poses are beneficial to action recognition, but also that both tasks can be efficiently performed by a single deep neural network while still achieving state-of-the-art results. Additionally, we demonstrate that end-to-end optimization using poses as an intermediate constraint leads to significantly higher accuracy on the action task than separate learning. Finally, we propose a new scalable architecture for simultaneous real-time 3D pose estimation and action recognition, which offers a range of performance-versus-speed trade-offs with a single multimodal and multitask training procedure.
Speranza, Nicholas A. "Adaptive Two-Stage Edge-Centric Architecture for Deeply-Learned Embedded Real-Time Target Classification in Aerospace Sense-and-Avoidance Applications". Wright State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=wright1621886997260122.
Lomonaco, Vincenzo <1991>. "Continual Learning with Deep Architectures". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amsdottorato.unibo.it/9073/1/vincenzo_lomonaco_thesis.pdf.
Bäuml, Berthold [Verfasser], Bernd [Akademischer Betreuer] Krieg-Brückner, Bernd [Gutachter] Krieg-Brückner and Gerd [Gutachter] Hirzinger. "Bringing a Humanoid Robot Closer to Human Versatility : Hard Realtime Software Architecture and Deep Learning Based Tactile Sensing / Berthold Bäuml ; Gutachter: Bernd Krieg-Brückner, Gerd Hirzinger ; Betreuer: Bernd Krieg-Brückner". Bremen : Staats- und Universitätsbibliothek Bremen, 2019. http://d-nb.info/1177239914/34.
Sarpangala, Kishan. "Semantic Segmentation Using Deep Learning Neural Architectures". University of Cincinnati / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ucin157106185092304.
Sudhakaran, Swathikiran. "Deep Neural Architectures for Video Representation Learning". Doctoral thesis, Università degli studi di Trento, 2019. https://hdl.handle.net/11572/369191.
Pageaud, Simon. "SmartGov : architecture générique pour la co-construction de politiques urbaines basée sur l'apprentissage par renforcement multi-agent". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1128.
In this thesis, we propose the SmartGov model, coupling multi-agent simulation and multi-agent deep reinforcement learning, to help co-construct urban policies and integrate all stakeholders in the decision process. Smart Cities provide sensor data from urban areas to increase the realism of the simulation in SmartGov. Our first contribution is a generic architecture for multi-agent simulation of the city to study the emergence of global behavior, with realistic agents reacting to political decisions. With multi-level modeling and a coupling of different dynamics, our tool learns environment specificities and suggests relevant policies. Our second contribution improves the autonomy and adaptation of the decision function with multi-agent, multi-level reinforcement learning. A set of clustered agents is distributed over the studied area to learn local specificities without any prior knowledge of the environment. Trust score assignment and individual rewards help reduce the impact of non-stationarity on experience replay in deep reinforcement learning. These contributions bring forth a complete system to co-construct urban policies in the Smart City. We compare our model with different approaches from the literature on a parking fee policy to display the benefits and limits of our contributions.
Луцишин, Роман Олегович and Roman Olehovych Lutsyshyn. "Методи автоматизованого перекладу природної мови на основі нейромережевої моделі “послідовність-послідовність”". Master's thesis, Тернопільський національний технічний університет імені Івана Пулюя, 2020. http://elartu.tntu.edu.ua/handle/lib/33271.
The master's thesis is devoted to the research and implementation of methods for automated natural language translation based on the "sequence-to-sequence" neural network model. The basic principles of and approaches to preparing training data samples, including the use of deep neural networks as encoders, are considered. Existing methods for solving the natural language translation problem have been studied and analyzed; in particular, several deep machine learning neural network architectures have been considered. Examples of creating and processing natural language corpora to form training and test data samples are given. A full assessment of the cost of building the computer system required to solve the problem was performed, along with a complete process of deploying the software in this environment using third-party platforms. The results of the research were a complete review of existing solutions to the problem, the choice of the best technology, improvements to the latter, and the implementation and training of a deep "sequence-to-sequence" neural network model for the natural language translation problem.
1. INTRODUCTION 2. ANALYSIS OF THE SUBJECT AREA 3. JUSTIFICATION OF THE CHOSEN TOOLS 4. IMPLEMENTATION OF A NATURAL LANGUAGE TRANSLATION SYSTEM BASED ON THE SEQUENCE-TO-SEQUENCE MODEL AND THE TRANSFORMER NEURAL NETWORK ARCHITECTURE 5. OCCUPATIONAL SAFETY AND EMERGENCY PREPAREDNESS
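The corpus-preparation step described in this abstract (building a vocabulary and splitting a corpus into training and test samples before feeding a sequence-to-sequence model) can be sketched in a few lines. The token ids, reserved symbols and split ratio below are our illustrative assumptions, not details from the thesis:

```python
import random

def build_vocab(sentences):
    """Map each token to an integer id; ids 0/1 reserved for <pad>/<unk>."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for s in sentences:
        for tok in s.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(sentence, vocab):
    """Turn a sentence into the id sequence a seq2seq encoder consumes."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in sentence.lower().split()]

corpus = ["the cat sat", "the dog ran", "a cat ran"]
vocab = build_vocab(corpus)

random.seed(0)
random.shuffle(corpus)
split = int(0.8 * len(corpus))      # hold out ~20% for testing
train, test = corpus[:split], corpus[split:]

print(encode("the cat flew", vocab))  # unseen word maps to <unk> = 1
```

A real pipeline would add subword tokenization and target-side sequences, but the vocabulary/split mechanics are the same.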
Bahl, Gaétan. "Architectures deep learning pour l'analyse d'images satellite embarquée". Thesis, Université Côte d'Azur, 2022. https://tel.archives-ouvertes.fr/tel-03789667.
The recent advances in high-resolution Earth observation satellites and the reduction in revisit times introduced by the creation of satellite constellations have led to the daily creation of large amounts of image data (hundreds of terabytes per day). Simultaneously, the popularization of Deep Learning techniques has allowed the development of architectures capable of extracting semantic content from images. While these algorithms usually require powerful hardware, low-power AI inference accelerators have recently been developed and have the potential to be used in the next generations of satellites, opening the possibility of onboard analysis of satellite imagery. By extracting the information of interest from satellite images directly onboard, a substantial reduction in bandwidth, storage and memory usage can be achieved. Current and future applications, such as disaster response, precision agriculture and climate monitoring, would benefit from lower processing latency and even real-time alerts. In this thesis, our goal is two-fold: on the one hand, we design efficient Deep Learning architectures that are able to run on low-power edge devices, such as satellites or drones, while retaining sufficient accuracy. On the other hand, we design our algorithms keeping in mind the importance of having a compact output that can be efficiently computed, stored, and transmitted to the ground or to other satellites within a constellation. First, by using depth-wise separable convolutions and convolutional recurrent neural networks, we design efficient semantic segmentation neural networks with a low number of parameters and low memory usage. We apply these architectures to cloud and forest segmentation in satellite images. We also specifically design an architecture for cloud segmentation on the FPGA of OPS-SAT, a satellite launched by ESA in 2019, and perform onboard experiments remotely.
Second, we develop an instance segmentation architecture for the regression of smooth contours based on the Fourier coefficient representation, which allows detected object shapes to be stored and transmitted efficiently. We evaluate the performance of our method on a variety of low-power computing devices. Finally, we propose a road graph extraction architecture based on a combination of fully convolutional and graph neural networks. We show that our method is significantly faster than competing methods, while retaining good accuracy.
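The Fourier coefficient representation of contours mentioned above stores a closed 2-D shape as a handful of complex coefficients, from which the contour can be reconstructed. A simplified sketch (treating contour points as complex numbers and keeping only non-negative frequencies, which suffices for this circular example; function names are ours, not the thesis's):

```python
import cmath

def fourier_coeffs(points, n_coeffs):
    """DFT of a closed contour given as complex points; keep n_coeffs terms."""
    N = len(points)
    return [sum(points[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) / N
            for k in range(n_coeffs)]

def reconstruct(coeffs, N):
    """Rebuild N contour points from the retained coefficients."""
    return [sum(c * cmath.exp(2j * cmath.pi * k * n / N)
                for k, c in enumerate(coeffs))
            for n in range(N)]

# A unit circle is captured exactly by its k=1 coefficient.
N = 16
circle = [cmath.exp(2j * cmath.pi * n / N) for n in range(N)]
coeffs = fourier_coeffs(circle, 2)      # DC term + first harmonic only
approx = reconstruct(coeffs, N)
err = max(abs(a - b) for a, b in zip(circle, approx))
print(err)  # ~0: two complex numbers encode the whole contour
```

This compactness is what makes the representation attractive for transmitting detected shapes from a satellite: a smooth contour of hundreds of points collapses to a short coefficient vector.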
Chen, Hua. "FPGA Based Multi-core Architectures for Deep Learning Networks". University of Dayton / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1449417091.
Edström, Jacob and Pontus Mjöberg. "The Optimal Hardware Architecture for High Precision 3D Localization on the Edge. : A Study of Robot Guidance for Automated Bolt Tightening". Thesis, KTH, Skolan för industriell teknik och management (ITM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-263104.
Industry is moving toward a higher degree of automation and connectivity, where previously manual operations are being adapted for interconnected industrial robots. This master's thesis focuses specifically on the automation of tightening applications with pre-mounted bolts and collaborative robots. The use of 3D computer vision is investigated for direct localization of bolts, to enable flexible assembly solutions. A localization algorithm based on 3D data is developed with the intention of creating lightweight software to run on edge devices. A restrictive use of deep learning classification is therefore included, to enable product flexibility while minimizing the required computational power. The trade-offs between edge and cloud or cluster computing for the chosen application are investigated to identify smart offloading opportunities to cloud or cluster resources. To reduce operational latency, image partitioning is also evaluated, in order to start the operation sooner with a first coordinate and to enable computation in parallel with robot motion. Four different hardware architectures are tested, consisting of two different single-board computers, a cluster of single-board computers, and a market-leading computer emulating a local cloud solution. All systems except the cluster prove to perform without operational latency for the application. The optimal hardware architecture therefore turns out to be a consumer-grade single-board computer, optimized for energy efficiency, cost and size. If only the variance in communication time could be reduced, the cluster shows potential to reduce total computation time without introducing operational latency. Smart offloading to deep-learning-optimized cloud resources or clusters of interconnected robot stations proves to enable increased complexity and reliability of the algorithm.
The single-board computer also proves able to switch between an edge and a cluster configuration, optimizing either the time to start the operation or the total computation time. This provides high flexibility in industrial settings, where product changes can be handled without hardware changes for the vision computations, further enabling its integration into factory units.
Palasek, Petar. "Action recognition using deep learning". Thesis, Queen Mary, University of London, 2017. http://qmro.qmul.ac.uk/xmlui/handle/123456789/30828.
Thangthai, Ausdang. "Visual speech synthesis using dynamic visemes and deep learning architectures". Thesis, University of East Anglia, 2018. https://ueaeprints.uea.ac.uk/69371/.
Vukotic, Vedran. "Deep Neural Architectures for Automatic Representation Learning from Multimedia Multimodal Data". Thesis, Rennes, INSA, 2017. http://www.theses.fr/2017ISAR0015/document.
In this dissertation, the thesis that deep neural networks are suited for the analysis of visual, textual, and fused visual and textual content is discussed. This work evaluates the ability of deep neural networks to learn multimodal representations automatically, in either unsupervised or supervised manners, and makes the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, using only one frame as input. 3) Bidirectional multimodal encoders: the main contribution of this thesis is a neural architecture that translates from one modality to the other and conversely, and offers an improved multimodal representation space in which the initially disjoint representations can be translated and fused. This enables improved multimodal fusion of multiple modalities. The architecture was extensively studied and evaluated in international benchmarks on the task of video hyperlinking, where it defined the state of the art. 4) Generative adversarial networks for multimodal fusion: continuing on the topic of multimodal fusion, we evaluate the possibility of using conditional generative adversarial networks to learn multimodal representations; in addition to providing multimodal representations, generative adversarial networks make it possible to visualize the learned model directly in the image domain.
Riera, Villanueva Marc. "Low-power accelerators for cognitive computing". Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/669828.
Deep neural networks (DNNs) have achieved tremendous success in cognitive applications, and are especially efficient in classification and decision-making problems such as speech recognition or machine translation. Mobile devices rely more and more on DNNs to understand the world. Smartphones and smartwatches, and even cars, perform discriminative tasks such as face or object recognition on a daily basis. Despite the growing popularity of DNNs, running them on mobile systems poses several challenges: providing high accuracy and performance within a small memory and energy budget. Modern DNNs consist of millions of parameters that require enormous computational and memory resources and, therefore, cannot be used directly on low-power, resource-constrained systems. The goal of this thesis is to address these problems and propose new solutions for designing efficient accelerators for DNN-based cognitive computing systems. First, we focus on optimizing DNN inference for sequence-processing applications. We analyze the similarity of the inputs between consecutive DNN executions. We then propose DISC, an accelerator that implements a differential computation technique, based on the high degree of similarity of the inputs, to reuse the computations of the previous execution instead of computing the whole network. We observe that, on average, more than 60% of the inputs of any layer of the DNNs studied exhibit only minor changes with respect to the previous execution. Avoiding the memory accesses and computations for these inputs yields energy savings of 63% on average. Second, we propose to optimize the inference of DNNs based on fully-connected (FC) layers. We first analyze the number of unique weights per input neuron in several networks.
Taking advantage of common optimizations such as linear quantization, we observe a very small number of unique weights per input in several FC layers of modern DNNs. Then, to improve the energy efficiency of FC-layer computation, we present CREW, an accelerator that implements an efficient mechanism for computation reuse and weight storage. CREW reduces the number of multiplications and provides significant savings in memory usage. We evaluate CREW on a diverse set of modern DNNs; on average, CREW provides a 2.61x speedup and 2.42x energy savings. Third, we propose a mechanism to optimize the inference of RNNs. Recurrent cells perform element-wise multiplications of the activations of different gates, with sigmoid and tanh being the usual activation functions. We analyze the activation-function values and show that a significant fraction are saturated towards zero or one in a set of popular RNNs. We then propose CGPA to dynamically prune RNN activations at a coarse granularity. CGPA avoids evaluating entire neurons whenever the outputs of paired neurons are saturated. CGPA significantly reduces the amount of computation and memory accesses, achieving on average a 12% improvement in performance and energy savings. Finally, in the last contribution of this thesis we focus on static DNN pruning methodologies. Pruning reduces the memory footprint and computational work by removing redundant connections or neurons. However, we show that prior pruning schemes use a very long iterative process that requires retraining the DNNs many times to tune the pruning parameters.
We then propose a pruning scheme based on principal component analysis and the relative importance of each neuron's connections that automatically optimizes the DNN in a single shot, without the need to manually tune multiple parameters.
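The differential computation idea behind DISC can be sketched for a single dense layer: keep the previous frame's inputs and pre-activations, and update only the contributions of inputs whose change exceeds a threshold. The layout, names and threshold below are our illustrative assumptions, not the accelerator's actual design:

```python
def layer(inputs, weights, prev_inputs=None, prev_acts=None, tau=1e-3):
    """Dense layer that, given the previous execution's inputs and
    pre-activations, recomputes only the contributions of inputs that
    changed by more than tau; unchanged inputs are skipped entirely."""
    n_out = len(weights[0])
    if prev_inputs is None:                      # first execution: full compute
        acts = [sum(x * weights[i][j] for i, x in enumerate(inputs))
                for j in range(n_out)]
        return acts, 0
    acts = list(prev_acts)
    skipped = 0
    for i, (x, px) in enumerate(zip(inputs, prev_inputs)):
        d = x - px
        if abs(d) <= tau:
            skipped += 1                         # reuse previous contribution
            continue
        for j in range(n_out):                   # apply only the delta
            acts[j] += d * weights[i][j]
    return acts, skipped

W = [[1.0, -1.0], [0.5, 2.0], [3.0, 0.0]]        # one row of weights per input
a0, _ = layer([1.0, 2.0, 3.0], W)
a1, skipped = layer([1.0, 2.5, 3.0], W, [1.0, 2.0, 3.0], a0)
print(a1, skipped)  # only the middle input changed: 2 of 3 inputs skipped
```

In hardware, every skipped input is a weight-memory access and a set of multiply-accumulates avoided, which is where the reported energy savings come from.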
Dhamija, Tanush. "Deep Learning Architectures for time of arrival detection in Acoustic Emissions Monitoring". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24620/.
Donnot, Benjamin. "Deep learning methods for predicting flows in power grids : novel architectures and algorithms". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS060/document.
This thesis addresses problems of security in the French grid operated by RTE, the French Transmission System Operator (TSO). Progress in sustainable energy, electricity market efficiency, and novel consumption patterns push TSOs to operate the grid closer to its security limits. To this end, it is essential to make the grid "smarter". To tackle this issue, this work explores the benefits of artificial neural networks. We propose novel deep learning algorithms and architectures, which we call "guided dropout", to assist the decisions of human operators (TSO dispatchers). They allow the prediction of power flows following a willful or accidental modification of the grid. This is achieved by separating the different inputs: continuous data (productions and consumptions) are introduced in a standard way, via a neural network input layer, while discrete data (grid topologies) are encoded directly in the neural network architecture. This architecture is dynamically modified based on the power grid topology by switching the activation of hidden units on or off. The main advantage of this technique lies in its ability to predict the flows even for previously unseen grid topologies. Guided dropout achieves high accuracy (up to 99% precision for flow predictions) with a 300-times speedup compared to physical grid simulators based on Kirchhoff's laws, even for unseen contingencies, without detailed knowledge of the grid structure. We also show that guided dropout can be used to rank contingencies that might occur in order of severity. In this application, we demonstrate that our algorithm attains the same risk as currently implemented policies while requiring only 2% of today's computational budget. The ranking remains relevant even for grid cases never seen before, and can be used to obtain an overall estimation of the global security of the power grid.
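The topology encoding described here can be sketched as a binary mask over hidden units: continuous injections pass through the weights as usual, while the discrete grid topology switches individual units on or off. The weights, mask and sizes below are illustrative assumptions of ours, not values from the thesis:

```python
def guided_dropout_layer(x, W, b, topology_mask):
    """Hidden layer whose units are gated by the discrete grid topology:
    continuous inputs x flow through W and b normally, while the binary
    mask (one bit per hidden unit) encodes which topology is active."""
    h = [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
         for j in range(len(b))]
    return [hj * m for hj, m in zip(h, topology_mask)]

x = [1.0, 2.0]                       # continuous data (e.g. injections)
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.0]

full = guided_dropout_layer(x, W, b, [1, 1, 1])      # reference topology
changed = guided_dropout_layer(x, W, b, [1, 0, 1])   # one unit switched off
print(full, changed)
```

Because the mask, not a separate input feature, carries the topology, masks never seen during training still produce a well-defined forward pass, which is what lets the network generalize to unseen grid configurations.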
Sovrano, Francesco. "Deep Reinforcement Learning and sub-problem decomposition using Hierarchical Architectures in partially observable environments". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16718/.
Baldassarre, Federico. "Morphing architectures for pose-based image generation of people in clothing". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233361.
Texto completoDetta projekt undersöker villkorad bildgenerering från förskjutna bild-källor, med ett tillämpat exempel inom innehållsskapande för modebranschen. Problemet med rumslig förskjutning mellan bilder identifieras varpå relaterad litteratur diskuteras. Därefter introduceras olika tillvägagångssätt för att lösa problemet. Projektet fokuserar i synnerhet på ickelinjära, differentierbara morphing-moduler vilka designas och integreras i befintlig arkitektur för bild-till-bild-översättning. Den föreslagna metoden för villkorlig bildgenerering tillämpas på en uppgift för klädbyte, med hjälp av ett verklighetsbaserat dataset av modebilder från Zalando. I jämförelse med tidigare modeller för klädbyte och virtuell provning har resultaten från vår metod hög visuell kvalité och uppnår exakt återuppbyggnad av klädernas detaljer.
Silfa, Franyell. "Energy-efficient architectures for recurrent neural networks". Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671448.
Deep learning algorithms have achieved remarkable success in applications such as automatic speech recognition and machine translation. As a result, these applications are ubiquitous in our lives and run on a large number of devices. These algorithms are composed of Deep Neural Networks (DNNs), such as Convolutional Neural Networks and Recurrent Neural Networks (RNNs), which have a large number of parameters and computations. Deploying DNNs on mobile devices and servers is therefore challenging due to their memory and energy requirements. RNNs are used to solve sequence-to-sequence problems such as machine translation. They contain data dependencies between the executions of consecutive time-steps, so the available parallelism is limited, which makes energy-efficient evaluation of RNNs a challenge. This thesis studies RNNs in order to improve their energy efficiency on specialized architectures. To this end, we propose energy-saving techniques and highly efficient architectures tailored to RNN evaluation. First, we characterize a set of RNNs running on a SoC and identify that accessing memory to read the weights is the largest source of energy consumption, reaching up to 80%. We therefore design E-PUR, a processing unit for RNNs. E-PUR achieves a 6.8x speedup and improves energy consumption by 88x compared to the SoC. These improvements come from maximizing the temporal locality of the weights. In E-PUR, reading the weights remains the largest energy cost, so we focus on reducing memory accesses and devise a scheme that reuses previously computed results.
The observation is that, when evaluating the input sequences of an RNN, the output of a given neuron tends to change only slightly between consecutive evaluations. We therefore devise a scheme that caches the neurons' outputs and reuses them whenever it detects a small change between the current and previous output values, which avoids reading the weights. To decide when to reuse a previous computation, we use a Binary Neural Network (BNN) as a reuse predictor, since its output is highly correlated with the output of the RNN. This proposal avoids more than 24.2% of the computations and reduces average energy consumption by 18.5%. The memory footprint of RNN models is usually reduced by using low precision for evaluating and storing the weights. In this case, the minimum precision is identified statically and set so that the RNN maintains its accuracy. Typically, this method uses the same precision for all computations. However, we observe that some computations can be evaluated at a lower precision without affecting accuracy. We therefore devise a technique that dynamically selects the precision used to compute each time-step. One challenge of this proposal is how to choose a lower precision. We address it by recognizing that the result of a previous evaluation can be used to determine the precision required at the current time-step. Our scheme evaluates 57% of the computations at a lower precision than the fixed precision employed by static methods. Finally, evaluation on E-PUR shows a 1.46x speedup with an average energy saving of 19.2%.
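The caching-and-reuse idea described above can be sketched in a few lines. In this hedged illustration, a 1-bit (sign-binarized) forward pass stands in for the BNN reuse predictor, and the change threshold is a hypothetical tuning knob; the thesis's hardware would skip the full-precision computation per neuron rather than compute and discard it as this software sketch does:

```python
import numpy as np

class ReuseCache:
    """Caches neuron outputs and reuses them when a cheap binary
    predictor indicates the output barely changed (illustrative sketch;
    binarization scheme and threshold are assumptions)."""

    def __init__(self, W, threshold):
        self.W = W
        self.Wb = np.sign(W)           # 1-bit weights for the predictor
        self.threshold = threshold
        self.prev_pred = None          # predictor outputs at last time-step
        self.prev_out = None           # cached full-precision outputs
        self.reused = 0                # neurons served from the cache

    def step(self, x):
        pred = self.Wb @ np.sign(x)    # cheap binary forward pass
        out = self.W @ x               # full precision (skipped per-neuron in HW)
        if self.prev_out is None:
            reuse = np.zeros(out.shape, dtype=bool)
        else:
            # Reuse neuron i if its predictor output changed little.
            reuse = np.abs(pred - self.prev_pred) <= self.threshold
            out = np.where(reuse, self.prev_out, out)
        self.reused += int(reuse.sum())
        self.prev_pred, self.prev_out = pred, out
        return out
```

Each reused neuron avoids a row of weight reads, which is exactly the memory traffic the abstract identifies as the dominant energy cost.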
Singh, Jaswinder. "RNA Structure Prediction using Deep Neural Network Architectures and Improved Evolutionary Profiles". Thesis, Griffith University, 2022. http://hdl.handle.net/10072/414924.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Eng & Built Env
Science, Environment, Engineering and Technology
Full Text
Xing, Luo Oscar. "Deep Learning for Speech Enhancement : A Study on WaveNet, GANs and General CNN-RNN Architectures". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260351.
Clarity and intelligibility are important aspects of speech, especially in an era when false information and distrust are common. The breakthrough of generative models for audio has brought major improvements in speech enhancement. Google's WaveNet architecture has been modified for noise reduction in a model called WaveNet denoising, which has shown good results. Another contender is the speech enhancement generative adversarial network (SEGAN), which adapts the GAN architecture to speech applications. While most older models focus on feature extraction and spectrogram analysis, these two newer models attempt to ignore those concepts and operate end-to-end instead. Although end-to-end is appealing, data processing remains an important aspect worth considering. A network designed by Microsoft Research, called EHNet, uses spectrogram data as input instead of raw 1D waveforms in order to capture more relationships between data points, since higher dimensions allow more information. This thesis aims to explore the field of speech enhancement and to examine the three architectures mentioned above through theoretical study and results on new datasets. A Wiener filter is also implemented as a benchmark for the results. We conclude that all three networks are viable options for speech enhancement, but that SEGAN is the best model in terms of results on our specific dataset and with respect to robustness. Future work could improve the evaluation methods, change the dataset, and implement hyperparameter optimization for further comparative analyses.
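The Wiener-filter benchmark mentioned above can be sketched with the textbook spectral formulation (this is a generic formulation under stated assumptions — the spectral floor and noise-estimation details are hypothetical, not the thesis's exact implementation):

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, floor=1e-3):
    """Per-frequency Wiener gain H = S/(S+N), with the clean-speech
    power S estimated by spectral subtraction (noisy minus noise).
    `floor` is an assumed spectral floor to avoid muting bins entirely."""
    speech_power = np.maximum(noisy_power - noise_power, 0.0)
    gain = speech_power / (speech_power + noise_power + 1e-12)
    return np.maximum(gain, floor)

def enhance(noisy_stft, noise_power):
    """Scale a complex STFT (freq x frames) by the Wiener gain,
    given a per-frequency noise power estimate."""
    gain = wiener_gain(np.abs(noisy_stft) ** 2, noise_power[:, None])
    return gain * noisy_stft
```

High-SNR bins pass through nearly unchanged while noise-dominated bins are attenuated toward the floor, which is why this classical filter makes a reasonable baseline against the learned end-to-end models.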
Ali, Abdou-Djalilou. "Prediction of tomato seed germination from images with deep learning". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24666/.
Policarpi, Andrea. "Transformers architectures for time series forecasting". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25005/.
Buttar, Sarpreet Singh. "Applying Artificial Neural Networks to Reduce the Adaptation Space in Self-Adaptive Systems : an exploratory work". Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-87117.
Full Text