Dissertations / Theses: 'Deep Recurrent Neural Network (DRNN)'

1

Tekin, Mim Kemal. "Vehicle Path Prediction Using Recurrent Neural Network." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166134.

Abstract:

Vehicle path prediction can support Advanced Driver Assistance Systems (ADAS), which cover technologies such as the Autonomous Braking System and Adaptive Cruise Control. In this thesis, the vehicle's future path, parameterized as 5 coordinates along the path, is predicted using only visual data collected by a front vision sensor. This approach enables cheaper applications because no additional sensors are needed. The predictions are made by deep convolutional neural networks (CNNs), and the goal of the project is to use recurrent neural networks (RNNs) and to investigate the benefits of recurrence for this task. Two approaches are used for the models. The first is a single-frame approach, which serves as the baseline: it takes a single image frame as input and predicts the future location points of the car. The second is a sequential approach that lets the network use historical information from previous image frames to predict the vehicle's future path for the current frame; this approach is used to investigate the effect of recurrence. Moreover, uncertainty is important for model reliability: a successful model should have low uncertainty in most of its predictions and high uncertainty in situations that are unfamiliar to it. In this project, uncertainty is captured with a method that is applicable to deep learning models, using the same models defined by the first two approaches. Finally, the approaches are evaluated by the mean absolute error and by two tolerance levels for the distance between the predicted path and the ground-truth path: a strict tolerance level and a more relaxed one.
With the strict tolerance level on the test data, 36% of the predictions are accepted for the single-frame model and 48% for the sequential model; for the uncertainty models, 27% and 13% are accepted for the single-frame and sequential variants, respectively. With the relaxed tolerance level on the test data, 60% of the predictions are accepted for the single-frame model and 67% for the sequential model; for the uncertainty models, 65% and 53% are accepted for the single-frame and sequential variants, respectively. Furthermore, using the information stored for each sequence, the methods are evaluated under different conditions such as day/night, road type and road cover. Overall, the sequential model performs best in the majority of the evaluation results.
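The tolerance-based evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: each path is assumed to be a list of 5 (x, y) points, a prediction is assumed to be accepted when every point lies within a Euclidean distance tolerance of its ground-truth counterpart, and the tolerance values and helper names are hypothetical.

```python
import math
from typing import List, Tuple

# Assumed path format: 5 (x, y) points along the future path.
Path = List[Tuple[float, float]]


def mean_absolute_error(pred: Path, truth: Path) -> float:
    """Mean absolute error over all coordinates of two equally long paths."""
    errors = [abs(p - t) for pp, tp in zip(pred, truth) for p, t in zip(pp, tp)]
    return sum(errors) / len(errors)


def is_accepted(pred: Path, truth: Path, tolerance: float) -> bool:
    """Assumed acceptance rule: every point within `tolerance` of the ground truth."""
    return all(math.dist(p, t) <= tolerance for p, t in zip(pred, truth))


def acceptance_rate(preds: List[Path], truths: List[Path], tolerance: float) -> float:
    """Fraction of predicted paths accepted at the given tolerance level."""
    accepted = sum(is_accepted(p, t, tolerance) for p, t in zip(preds, truths))
    return accepted / len(preds)


if __name__ == "__main__":
    truth = [(0.0, float(i)) for i in range(5)]
    close = [(0.5, float(i)) for i in range(5)]  # 0.5 lateral offset
    far = [(1.5, float(i)) for i in range(5)]    # 1.5 lateral offset
    print(mean_absolute_error(close, truth))                 # 0.25
    print(acceptance_rate([close, far], [truth, truth], 1.0))  # hypothetical strict level: 0.5
    print(acceptance_rate([close, far], [truth, truth], 2.0))  # hypothetical relaxed level: 1.0
```

Reporting the acceptance rate at two tolerances, rather than MAE alone, distinguishes "slightly off everywhere" from "badly off at one point", which a single averaged error hides.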

2

Wang, Xutao. "Chinese Text Classification Based On Deep Learning." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-35322.

Abstract:

Text classification has long been a concern in natural language processing, especially now that data volumes are growing massively with the development of the internet. The recurrent neural network (RNN) is one of the most popular methods for natural language processing because its recurrent architecture gives it the ability to process sequential information. Meanwhile, the convolutional neural network (CNN) has shown its ability to extract features from visual imagery. This paper combines the advantages of RNNs and CNNs and proposes a model called BLSTM-C for Chinese text classification. BLSTM-C begins with a bidirectional long short-term memory (BLSTM) layer, a special kind of RNN, which produces a sequence output based on both past and future context. It then feeds this sequence to a CNN layer, which extracts features from it. We evaluate the BLSTM-C model on several tasks, such as sentiment classification and category classification, and the results show the model's remarkable performance on these text tasks.
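The second stage of BLSTM-C, a convolutional layer applied to the BLSTM's per-time-step outputs, can be sketched in plain Python. This is an illustration under assumptions: the BLSTM outputs are given as a list of vectors, there is a single hypothetical filter of width `k` with ReLU, and max-over-time pooling is assumed because the abstract does not name the pooling operation.

```python
from typing import List

Vector = List[float]


def conv1d_over_time(seq: List[Vector], kernel: List[Vector], bias: float = 0.0) -> List[float]:
    """Slide one filter of shape (k, d) over a (T, d) sequence with no padding.

    `seq` stands in for BLSTM outputs; returns a ReLU feature map of length T - k + 1.
    """
    k = len(kernel)
    feature_map = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        activation = bias + sum(
            w * x for w_row, x_row in zip(kernel, window) for w, x in zip(w_row, x_row)
        )
        feature_map.append(max(0.0, activation))  # ReLU
    return feature_map


def max_over_time(feature_map: List[float]) -> float:
    """Assumed pooling: keep the strongest response of the filter across time."""
    return max(feature_map)


if __name__ == "__main__":
    blstm_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # T=4, d=2
    kernel = [[1.0, 0.0], [0.0, 1.0]]                                 # k=2, d=2
    fmap = conv1d_over_time(blstm_outputs, kernel)
    print(fmap, max_over_time(fmap))  # [2.0, 1.0, 1.0] 2.0
```

A real model would use many filters (and a deep learning framework); the point here is only that the CNN sees windows of BLSTM states that already mix past and future context.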

3

Wen, Tsung-Hsien. "Recurrent neural network language generation for dialogue systems." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/275648.

Abstract:

Language is the principal medium for ideas, while dialogue is the most natural and effective way for humans to interact with and access information from machines. Natural language generation (NLG) is a critical component of spoken dialogue and it has a significant impact on usability and perceived quality. Many commonly used NLG systems employ rules and heuristics, which tend to generate inflexible and stylised responses without the natural variation of human language. However, the frequent repetition of identical output forms can quickly make dialogue tedious for most real-world users. Additionally, these rules and heuristics are not scalable and hence not trivially extensible to other domains or languages. A statistical approach to language generation can learn language decisions directly from data without relying on hand-coded rules or heuristics, which brings scalability and flexibility to NLG. Statistical models also provide an opportunity to learn in-domain human colloquialisms and cross-domain model adaptations. A robust and quasi-supervised NLG model is proposed in this thesis. The model leverages a Recurrent Neural Network (RNN)-based surface realiser and a gating mechanism applied to input semantics. The model is motivated by the Long Short-Term Memory (LSTM) network. The RNN-based surface realiser and gating mechanism use a neural network to learn end-to-end language generation decisions from input dialogue act and sentence pairs; the model also integrates sentence planning and surface realisation into a single optimisation problem. The single optimisation not only bypasses the costly intermediate linguistic annotations but also generates more natural and human-like responses. Furthermore, a domain adaptation study shows that the proposed model can be readily adapted and extended to new dialogue domains via a proposed recipe.
Continuing the success of end-to-end learning, the second part of the thesis speculates on building an end-to-end dialogue system by framing it as a conditional generation problem. The proposed model encapsulates a belief tracker with a minimal state representation and a generator that takes the dialogue context to produce responses. These features suggest comprehension and fast learning. The proposed model is capable of understanding requests and accomplishing tasks after training on only a few hundred human-human dialogues. A complementary Wizard-of-Oz data collection method is also introduced to facilitate the collection of human-human conversations from online workers. The results demonstrate that the proposed model can talk to human judges naturally, without any difficulty, for a sample application domain. In addition, the results also suggest that the introduction of a stochastic latent variable can help the system model intrinsic variation in communicative intention much better.

4

Ayoub, Issa. "Multimodal Affective Computing Using Temporal Convolutional Neural Network and Deep Convolutional Neural Networks." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39337.

Abstract:

Affective computing has gained significant attention from researchers in the last decade due to the wide variety of applications that can benefit from this technology. Researchers often describe affect using emotional dimensions such as arousal and valence. Valence refers to the spectrum from negative to positive emotions, while arousal determines the level of excitement. Describing emotions through continuous dimensions (e.g. valence and arousal) allows us to encode subtle and complex affects, as opposed to discrete emotions such as the six basic emotions: happiness, anger, fear, disgust, sadness and neutral. Recognizing spontaneous and subtle emotions remains a challenging problem for computers. In our work, we employ two modalities of information: video and audio. We extract visual and audio features using deep neural network models. Given that emotions are time-dependent, we apply the Temporal Convolutional Neural Network (TCN) to model the variations in emotions. Additionally, we investigate an alternative model that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). Because this latter deep model does not fit into main memory, we divide the RNN into smaller segments and propose a scheme to back-propagate gradients across all segments. We configure the hyperparameters of all models using Gaussian processes to obtain a fair comparison between the proposed models. Our results show that TCN outperforms RNN for the recognition of the arousal and valence emotional dimensions; we therefore propose adopting TCN as a baseline method for emotion detection problems in future work. Our experimental results show that TCN outperforms all RNN-based models, yielding a concordance correlation coefficient of 0.7895 (vs. 0.7544) on valence and 0.8207 (vs. 0.7357) on arousal on the validation set of the SEWA dataset for emotion prediction.
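The results above are reported as a concordance correlation coefficient (CCC). As a sketch of the metric itself (not the thesis's evaluation code), Lin's CCC with population moments is CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). Unlike Pearson correlation, it also penalizes a constant offset between predictions and labels.

```python
from statistics import fmean
from typing import Sequence


def concordance_correlation_coefficient(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient using population (1/n) moments."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(pred)
    mx, my = fmean(pred), fmean(gold)
    var_x = sum((x - mx) ** 2 for x in pred) / n
    var_y = sum((y - my) ** 2 for y in gold) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(pred, gold)) / n
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


if __name__ == "__main__":
    gold = [1.0, 2.0, 3.0]
    print(concordance_correlation_coefficient(gold, gold))             # perfect agreement
    print(concordance_correlation_coefficient([2.0, 3.0, 4.0], gold))  # shifted by +1: 4/7
```

The shifted example has Pearson correlation 1 but CCC of about 0.571, which is why CCC is a stricter choice for continuous valence and arousal prediction.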

5

Javid, Gelareh. "Contribution à l’estimation de charge et à la gestion optimisée d’une batterie Lithium-ion : application au véhicule électrique." Thesis, Mulhouse, 2021. https://www.learning-center.uha.fr/.

Abstract:

State of charge (SOC) estimation is a significant issue for the safe performance and lifespan of the lithium-ion (Li-ion) batteries used to power electric vehicles (EVs). In this thesis, the accuracy of SOC estimation is investigated using deep recurrent neural network (DRNN) algorithms. To this end, three new SOC estimators based on different DRNN algorithms are proposed for a single Li-ion cell: a bidirectional LSTM (BiLSTM) method, a robust long short-term memory (RoLSTM) algorithm, and a gated recurrent unit (GRU) technique. With these, one does not depend on precise battery models and can avoid complicated mathematical methods, especially for a battery pack. In addition, these models can estimate the SOC precisely at varying temperatures. Also, unlike the traditional recurrent neural network, whose content is rewritten at every time step, these networks can decide to preserve the current memory through their gates. This lets them carry information over long paths and maintain long-term dependencies. A comparison of the results indicates that the BiLSTM network performs better than the other two. Moreover, the BiLSTM model can work with longer sequences from two directions, the past and the future, without the vanishing-gradient problem. This makes it possible to choose a sequence length as long as a discharge period in one drive cycle and to obtain more accurate estimates. This model also behaved well when given an incorrect initial SOC value. Finally, a new BiLSTM method was introduced to estimate the SOC of a battery pack in an EV. IPG CarMaker software was used to collect data and test the model in simulation. The results showed that the proposed algorithm can provide a good SOC estimation without using any filter in the battery management system (BMS).
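To illustrate what "bidirectional" means here, the sketch below runs a recurrent cell over an input sequence once forward and once backward, then pairs the two hidden states at each time step. For brevity it uses a plain scalar tanh (Elman) cell in place of an LSTM cell, and the weights are hypothetical placeholders, so it shows only the data flow of a BiLSTM, not the thesis's SOC estimator.

```python
import math
from typing import List, Tuple


def run_rnn(xs: List[float], w_x: float, w_h: float, b: float) -> List[float]:
    """Scalar Elman cell (stand-in for an LSTM cell): h_t = tanh(w_x*x_t + w_h*h_{t-1} + b)."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def bidirectional(xs: List[float], fwd: Tuple[float, float, float],
                  bwd: Tuple[float, float, float]) -> List[Tuple[float, float]]:
    """Pair the forward state (past context) with the backward state (future context)."""
    forward = run_rnn(xs, *fwd)
    backward = run_rnn(xs[::-1], *bwd)[::-1]  # re-align backward states with time order
    return list(zip(forward, backward))


if __name__ == "__main__":
    # Hypothetical weights (w_x, w_h, b); only the last input is non-zero.
    states = bidirectional([0.0, 0.0, 1.0], (1.0, 1.0, 0.0), (1.0, 1.0, 0.0))
    print(states[0])  # forward state is 0.0, backward state already reflects the future input
```

The usage line shows the property the abstract relies on: at the first time step the backward state already carries information from later samples, which a one-directional cell cannot see.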

6

Parakkal, Sreenivasan Akshai. "Deep learning prediction of Quantmap clusters." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445909.

Abstract:

The hypothesis that similar chemicals exert similar biological activities has been widely adopted in the field of drug discovery and development. Quantitative Structure-Activity Relationship (QSAR) models have been used ubiquitously in drug discovery to understand the function of chemicals in biological systems. A common QSAR modeling method calculates similarity scores between chemicals to assess their biological function. However, due to the fact that some chemicals can be similar and yet have different biological activities, or conversely can be structurally different yet have similar biological functions, various methods have instead been developed to quantify chemical similarity at the functional level. Quantmap is one such method, which utilizes biological databases to quantify the biological similarity between chemicals. Quantmap uses quantitative molecular network topology analysis to cluster chemical substances based on their bioactivities. This method by itself, unfortunately, cannot assign new chemicals (those which may not yet have biological data) to the derived clusters. Owing to the fact that there is a lack of biological data for many chemicals, deep learning models were explored in this project with respect to their ability to correctly assign unknown chemicals to Quantmap clusters. The deep learning methods explored included both convolutional and recurrent neural networks. Transfer learning/pretraining based approaches and data augmentation methods were also investigated. The best performing model, among those considered, was the Seq2seq model (a recurrent neural network containing two joint networks, a perceiver and an interpreter network) without pretraining, but including data augmentation.
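A widely used structural similarity score of the kind mentioned above is the Tanimoto (Jaccard) coefficient on binary molecular fingerprints. The abstract does not say which score is meant, so this is an assumption; the sketch also assumes fingerprints are already available as sets of "on" bit indices (computed by a cheminformatics toolkit, not shown).

```python
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here (an assumption) for two empty fingerprints
    return len(fp_a & fp_b) / union


if __name__ == "__main__":
    print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 2 shared bits out of 4 in total: 0.5
```

A score like this captures structural similarity only, which is exactly the limitation the abstract raises: structurally similar chemicals can still belong to different bioactivity-based Quantmap clusters.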

7

Putchala, Manoj Kumar. "Deep Learning Approach for Intrusion Detection System (IDS) in the Internet of Things (IoT) Network using Gated Recurrent Neural Networks (GRU)." Wright State University / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=wright1503680452498351.

8

Mohammadisohrabi, Ali. "Design and implementation of a Recurrent Neural Network for Remaining Useful Life prediction." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020.

Abstract:

A key idea underlying many predictive maintenance solutions is the Remaining Useful Life (RUL) of machine parts: a prediction of the time remaining before a machine part is likely to require repair or replacement. As systems become more complex, innovative machine learning and deep learning algorithms can be deployed to study the more sophisticated correlations within them. The exponential increase in both data accumulation and processing power makes deep learning algorithms more attractive than before. In this work, a Long Short-Term Memory (LSTM) network, a type of recurrent neural network, is designed to predict the RUL of turbofan engines. The dataset is taken from the NASA data repository. Finally, the performance of the RNN is compared with that of the best machine learning algorithm for the dataset.
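A common way to frame turbofan RUL prediction for an LSTM is to label every cycle of an engine's run-to-failure record with the number of cycles remaining, often clipped to a constant early in life, and then to cut the record into fixed-length windows. The abstract does not give these details, so the cap of 125 cycles, the "last cycle = 0" convention, the window length and the function names are all assumptions for illustration.

```python
from typing import List, Optional, Tuple


def rul_labels(num_cycles: int, cap: Optional[int] = 125) -> List[int]:
    """RUL target per cycle of a run-to-failure record (assumed: last cycle -> 0).

    With `cap`, early-life targets are clipped (a piecewise-linear RUL, an assumption).
    """
    labels = [num_cycles - 1 - t for t in range(num_cycles)]
    return [min(r, cap) for r in labels] if cap is not None else labels


def sliding_windows(series: List[List[float]], labels: List[int],
                    window: int) -> List[Tuple[List[List[float]], int]]:
    """Fixed-length LSTM input windows, each labelled with the RUL at its last cycle."""
    return [
        (series[end - window:end], labels[end - 1])
        for end in range(window, len(series) + 1)
    ]


if __name__ == "__main__":
    sensors = [[float(t)] for t in range(5)]  # one hypothetical sensor, 5 cycles
    labels = rul_labels(5, cap=None)
    print(labels)                               # [4, 3, 2, 1, 0]
    print(sliding_windows(sensors, labels, 3))  # 3 windows with targets 2, 1, 0
```

Labelling each window by its last cycle means the model predicts the RUL "now", given only the recent history, which matches how such a predictor would be used online.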

9

Engström, Isak. "Automated Gait Analysis: Using Deep Metric Learning." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-178139.

Abstract:

Sectors of security, safety, and defence require methods for identifying people on the individual level. Automation of these tasks has the potential of outperforming manual labor, as well as relieving workloads. The ever-extending surveillance camera networks, advances in human pose estimation from monocular cameras, together with the progress of deep learning techniques, pave the way for automated walking gait analysis as an identification method. This thesis investigates the use of 2D kinematic pose sequences to represent gait, monocularly extracted from a limited dataset containing walking individuals captured from five camera views. The sequential information of the gait is captured using recurrent neural networks. Techniques in deep metric learning are applied to evaluate two network models, with contrasting output dimensionalities, against deep-metric-based and non-deep-metric-based embedding spaces. The results indicate that the gait representation, network designs, and network learning structure show promise when identifying individuals, scaling particularly well to unseen individuals. However, with the limited dataset, the network models performed best when the dataset included labels for both the individuals and the camera views simultaneously, rather than labels for the individuals alone without camera-view information. For further investigation, more data would be required to evaluate the accuracy and effectiveness of these methods for the re-identification of each individual.
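Triplet loss is one standard technique in deep metric learning; the abstract does not say which loss the thesis used, so the following is an assumption-labelled sketch. Given embeddings of an anchor gait sequence, a positive from the same individual and a negative from a different individual, the loss is zero once the negative is at least `margin` further from the anchor than the positive. The margin value is hypothetical.

```python
import math
from typing import Sequence


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d (margin is an assumption)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    a, p = (0.0, 0.0), (0.0, 1.0)
    print(triplet_loss(a, p, (3.0, 4.0)))  # negative far enough away: 0.0
    print(triplet_loss(a, p, (0.0, 1.0)))  # negative as close as positive: equals the margin
```

Because the loss only compares distances, the learned embedding can place unseen individuals sensibly without a fixed set of output classes, which fits the abstract's observation that the approach scales well to unseen individuals.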

The thesis work was carried out at the Department of Science and Technology (ITN), Faculty of Science and Engineering, Linköping University.

10

Guan, Xing. "Predict Next Location of Users using Deep Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-263620.

Abstract:

Predicting the next location of a user is of interest to both academia and industry. Location-based advertising, traffic planning, intelligent resource allocation and recommendation services are some of the applications that many are interested in. With technological advancement and the widespread use of electronic devices, many location-based records are being created. Today, deep learning has surpassed many conventional methods in many learning tasks, most notably in image and speech recognition. One neural network architecture that has shown promising results on sequential data is the Recurrent Neural Network (RNN). Since the RNN was introduced, many alternative architectures have been proposed; Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU) are among the most popular [5]. This thesis uses the GRU architecture, together with features that incorporate time and location into the network, to forecast people's next location. A spatial-temporal neural network (ST-GRU) is proposed. It consists of two parts, ST and GRU. The first part is a feature extraction algorithm that extracts information from a trajectory into location sequences, transforming the trajectory into a sequence format suitable for the model. The second part, the GRU, predicts the next location given a user's trajectory. The study shows that the proposed ST-GRU model achieves the best results compared with the baseline models.
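The GRU at the core of ST-GRU updates its hidden state through a reset gate and an update gate. The sketch below writes one step of a standard GRU cell in plain Python, in a simplified form of the equations documented for common libraries (one bias per gate). The spatial-temporal features of ST-GRU are not modelled, and the parameter naming (`W_*`, `U_*`, `b_*`) is hypothetical.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def gru_step(x: Vector, h: Vector, params: Dict[str, list]) -> Vector:
    """One GRU step; hypothetical keys W_g (input), U_g (recurrent), b_g for g in r, z, n."""
    def pre(g: str, recurrent: Vector) -> Vector:
        # W_g @ x + recurrent term + b_g
        return [a + c + b for a, c, b in zip(matvec(params["W_" + g], x), recurrent, params["b_" + g])]

    r = sigmoid(pre("r", matvec(params["U_r"], h)))  # reset gate
    z = sigmoid(pre("z", matvec(params["U_z"], h)))  # update gate
    u_h = matvec(params["U_n"], h)
    # Candidate state: the reset gate scales the recurrent contribution.
    n = [math.tanh(a) for a in pre("n", [ri * ui for ri, ui in zip(r, u_h)])]
    # z near 1 keeps the old state, z near 0 takes the candidate.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


if __name__ == "__main__":
    params = {}
    for g in "rzn":
        params["W_" + g] = [[0.0] * 3 for _ in range(2)]  # input size 3, hidden size 2
        params["U_" + g] = [[0.0] * 2 for _ in range(2)]
        params["b_" + g] = [0.0, 0.0]
    # With all-zero weights, z = 0.5 and the candidate is 0, so the state is halved.
    print(gru_step([1.0, 2.0, 3.0], [1.0, -2.0], params))  # [0.5, -1.0]
```

The update gate is what lets a GRU carry a user's earlier locations forward across many steps instead of overwriting the state at every input.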

11

Mishra, Vishal Vijayshankar. "Sequence-to-Sequence Learning using Deep Learning for Optical Character Recognition (OCR)." University of Toledo / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1513273051760905.

12

Santos, Claudio Filipi Gonçalves dos. "Optical character recognition using deep learning." Universidade Estadual Paulista (UNESP), 2018. http://hdl.handle.net/11449/154100.

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Optical Character Recognition (OCR) is the name given to the technology used to translate image data into a text file. The objective of this project is to use Deep Learning techniques to develop software able to segment images, detect candidate characters and generate the text contained in the picture. Since 2006, Deep Learning, or hierarchical learning, has emerged as a new area of machine learning. In recent years, the techniques developed in deep learning research have influenced and expanded the scope of key aspects of artificial intelligence and machine learning. A thorough study was carried out to develop an OCR system using only Deep Learning architectures. The thesis explains the evolution of these techniques, some past work, and how that work influenced the development of this framework. It demonstrates, with results, how a single-character classifier was developed. It then explains how a neural network can be developed as an object detector and how this object detector can be transformed into a text detector. After that, it shows how two Deep Learning techniques can be combined and used to transform a cropped region of an image into a string of characters. Finally, it demonstrates how the text detector and the image-to-text system were combined to build a full end-to-end OCR system that detects the regions of an image containing text and what is written in those regions. The study shows that using only Deep Learning structures can outperform techniques from other areas, such as image processing. For text detection, the system reached over 70% precision when a more complex architecture was used, around 69% correct image-to-text translation, and around 50% on the end-to-end task of detecting text areas and translating them into text.
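The abstract does not name the two techniques combined for image-to-text transcription. A frequently used pairing is a convolutional/recurrent network trained with Connectionist Temporal Classification (CTC); assuming that setup, the sketch below shows best-path (greedy) CTC decoding: take the most likely class per frame, merge consecutive repeats, then drop the blank. The blank index and alphabet are illustrative assumptions.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumed index of the CTC blank class


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    frame_probs[t][k] is the probability of class k at frame t; class 0 is the
    blank and class k >= 1 maps to alphabet[k - 1].
    """
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    previous: Optional[int] = None
    for k in best:
        if k != previous and k != BLANK:  # merge repeats, skip blanks
            chars.append(alphabet[k - 1])
        previous = k
    return "".join(chars)


if __name__ == "__main__":
    def one_hot(k: int) -> List[float]:
        return [1.0 if i == k else 0.0 for i in range(3)]

    # Frames a, a, blank, a, b, b: the blank separates the two genuine "a"s.
    frames = [one_hot(k) for k in [1, 1, 0, 1, 2, 2]]
    print(ctc_greedy_decode(frames, "ab"))  # aab
```

The blank symbol is what allows CTC to output doubled letters: without the blank between them, the two "a" frames would be merged into one.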

13

Talevi, Luca. "Decodifica di intenzioni di movimento dalla corteccia parietale posteriore di macaco attraverso il paradigma Deep Learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17846/.

Abstract:

Invasive Brain-Computer Interfaces (BCIs) make it possible to restore mobility to patients who have lost control of their limbs: this is achieved by decoding bioelectrical signals recorded from cortical areas of interest in order to drive a prosthetic limb. Decoding neural signals is therefore a critical point in BCIs and requires high-performing, reliable and robust algorithms. These requirements are met in many fields by Deep Neural Networks (DNNs), adaptive algorithms whose performance scales with the amount of data provided, in line with the growing number of electrodes in implants. Using signals pre-recorded from the cortex of two macaques during reach-to-grasp movements toward 5 different objects, I tested three basic but notable examples of DNNs (a multilayer dense network, a Convolutional Neural Network (CNN) and a Recurrent NN (RNN)) on the task of discriminating, continuously and in real time, the movement intention toward each object. In particular, I tested the ability of each model to decode a generic intention (single-class), the performance of the best resulting network in discriminating between intentions (multi-class) with or without ensemble learning methods, and its response to degradation of the input signal. To facilitate comparison, each network was built and subjected to a hyperparameter search following common criteria. The CNN architecture obtained particularly interesting results, achieving F-scores above 0.6 and AUCs above 0.9 in the single-class case with half the parameters of the other networks, and yet greater robustness. It also showed a quasi-linear relationship with signal degradation, without unpredictable performance collapses. The DNNs used proved to be high-performing and robust despite their simplicity, which makes purpose-built architectures promising for establishing a new state of the art in neuroprosthetic control.
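The F-score and AUC figures quoted above are standard binary-classification metrics. As a sketch of how they are computed (not the thesis's evaluation code), the following assumes binary labels in {0, 1}, computes F1 from thresholded predictions, and computes AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative, counting ties as one half.

```python
from typing import Sequence


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """F1 = 2PR / (P + R) for binary labels and predictions in {0, 1}."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison (Mann-Whitney U statistic); O(n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(f1_score(labels, [1, 0, 1, 0]))         # precision 0.5, recall 0.5: 0.5
    print(roc_auc(labels, [0.9, 0.4, 0.6, 0.1]))  # 3 of 4 pairs ranked correctly: 0.75
```

The two metrics answer different questions: AUC is threshold-free and measures ranking quality, while F1 depends on the chosen decision threshold. This is why a model can show AUC above 0.9 alongside a more modest F-score.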

14

Wang, Wei. "Event Detection and Extraction from News Articles." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/82238.

Abstract:

Event extraction is a type of information extraction (IE) that extracts specific knowledge about certain incidents from text. Nowadays the amount of available information (such as news, blogs, and social media) is growing exponentially. It is therefore imperative to develop algorithms that automatically extract machine-readable information from large volumes of text data. In this dissertation, we focus on three problems in obtaining event-related information from news articles. (1) The first is a comprehensive analysis of the performance and challenges of current large-scale event encoding systems. (2) The second involves event detection and the extraction of critical information from news articles. (3) The third concentrates on event encoding, which aims to extract event extents and arguments from text. We start by investigating two large-scale event extraction systems (ICEWS and GDELT) in the political science domain. We design a set of experiments to evaluate the quality of the events extracted by the two target systems in terms of reliability and correctness. The results show significant discrepancies between the outputs of the automated systems and a hand-coded system, and the accuracy of both systems is far from satisfactory. These findings provide preliminary background and lay the foundation for using advanced machine learning algorithms for event-related information extraction. Inspired by the successful application of deep learning in Natural Language Processing (NLP), we propose a Multi-Instance Convolutional Neural Network (MI-CNN) model for event detection and critical sentence extraction without sentence-level labels. To evaluate the model, we run a set of experiments on a real-world protest event dataset. The results show that our model outperforms strong baseline models and extracts meaningful key sentences without domain knowledge or manually designed features.
We also extend the MI-CNN model and propose an MIMTRNN model for event extraction with distant supervision, to overcome the lack of fine-grained labels and the small size of the training data. The proposed MIMTRNN model systematically integrates an RNN, Multi-Instance Learning, and Multi-Task Learning into a unified framework. The RNN module encodes into the representation of entity mentions the sequential information as well as the dependencies between event arguments, which are very useful for the event extraction task. The Multi-Instance Learning paradigm means that the system does not require precise labels at the entity-mention level, which makes it well suited to distant supervision for event extraction. The Multi-Task Learning module is designed to alleviate the potential overfitting caused by the relatively small training set. Experiments on two real-world datasets (Cyber-Attack and Civil Unrest) show that our model benefits from each component and significantly outperforms the other baseline methods.
Ph. D.
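
The MI-CNN is trained with article-level labels only, without sentence-level labels. One common multi-instance aggregation, assumed here purely for illustration (the abstract does not state which aggregation MI-CNN uses), scores an article as the maximum of its sentence scores; the highest-scoring sentence then serves as the extracted key sentence. The sentence scores below are hypothetical stand-ins for the CNN outputs.

```python
import math


def bag_probability(sentence_scores):
    """Max-pooling MIL: an article is positive if at least one sentence is."""
    return max(sentence_scores)


def key_sentence(sentences, sentence_scores):
    """Return the sentence with the highest instance score."""
    best = max(range(len(sentences)), key=lambda i: sentence_scores[i])
    return sentences[best]


def bag_log_loss(sentence_scores, bag_label, eps=1e-12):
    """Binary cross-entropy computed only from the article-level label."""
    p = min(max(bag_probability(sentence_scores), eps), 1 - eps)
    return -(bag_label * math.log(p) + (1 - bag_label) * math.log(1 - p))


# Hypothetical sentence scores standing in for MI-CNN outputs
sentences = ["Crowds gathered downtown.", "The weather was mild.", "Police dispersed the protesters."]
scores = [0.6, 0.05, 0.9]
print(bag_probability(scores))          # 0.9
print(key_sentence(sentences, scores))  # Police dispersed the protesters.
```

Because the loss only sees the bag-level maximum, gradients flow mainly through the most confident sentence, which is what lets sentence importance emerge without sentence labels.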

15

Odinsdottir, Gudny Björk, and Jesper Larsson. "Deep Learning Approach for Extracting Heart Rate Variability from a Photoplethysmographic Signal." Thesis, Högskolan Kristianstad, Fakulteten för naturvetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hkr:diva-21368.

Abstract:

Photoplethysmography (PPG) is a method for detecting the blood volume changes that occur with every heartbeat. The peaks in the PPG signal correspond to the electrical impulses sent by the heart. The interval between heartbeats varies, and this variation is known as heart rate variability (HRV). Detecting peaks correctly in PPG signals therefore makes it possible to measure HRV accurately. Other research indicates that deep learning approaches can extract HRV from a PPG signal with significantly greater accuracy than traditional methods. In this study, deep learning classifiers were built to detect peaks in a noise-contaminated PPG signal and to recognize the activity performed during the recording. The dataset used in this study is provided by the PhysioBank database and consists of synchronized PPG, acceleration and gyroscope data. The models investigated were limited to a one-layer LSTM network with six different numbers of neurons and four different window sizes. The most accurate model for peak classification consisted of 256 neurons with a window size of 15 time steps, reaching a Matthews correlation coefficient (MCC) of 0.74. The model with 64 neurons and a window duration of 1.25 seconds gave the most accurate activity classification, with an MCC of 0.63. In conclusion, further optimization of the deep learning approach could lead to promising accuracy in peak detection and thus an accurate measurement of HRV. The probable cause of the low accuracy in the activity classification problem is the limited amount of data used in this study.
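
Both results above are reported as MCC, and the inputs are fixed-length windows (for example 15 time steps). A minimal sketch of the two pieces, assuming binary per-window labels; how the thesis assigns a peak label to each window is not stated, so the toy labels below are hypothetical.

```python
import math


def sliding_windows(signal, size, step=1):
    """Split a 1-D signal into overlapping windows of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels; returns 0.0 when any confusion-matrix margin is empty."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


windows = sliding_windows(list(range(20)), size=15)
print(len(windows))                                   # 6
print(matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0]))  # about 0.577
```

MCC is a sensible choice here because peak samples are rare relative to non-peak samples, and MCC uses all four cells of the confusion matrix rather than rewarding a classifier that predicts the majority class.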

16

Holm, Noah, and Emil Plynning. "Spatio-temporal prediction of residential burglaries using convolutional LSTM neural networks." Thesis, KTH, Geoinformatik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229952.

Abstract:

The low proportion of solved residential burglaries calls for new and innovative methods for preventing and investigating these crimes. There were 22 600 reported residential burglaries in Sweden in 2017, but only four to five percent of them will ever be solved. There are many initiatives, both in Sweden and abroad, to reduce the number of residential burglaries, and one of the areas being tested is the use of prediction methods for more efficient preventive action. This thesis investigates a potential prediction method that uses neural networks to identify areas with a higher risk of burglary on a daily basis. The model uses reported burglaries to learn patterns in both space and time. The rationale for the existence of such patterns is based on near-repeat theories in criminology, which state that after a burglary both the burgled victim and the area around that victim have an increased risk of further burglaries. The work has been conducted in cooperation with the Swedish Police Authority. The machine learning is implemented with convolutional long short-term memory (LSTM) neural networks with three-dimensional max pooling that learn from ten years of residential burglary data (2007-2016) in a study area in Stockholm, Sweden. The model's accuracy is measured by predicting burglaries during 2017 on a daily basis. It classifies each cell of a 36x36 grid of 600 m square cells as an elevated-risk area or not. By classifying 4% of all grid cells during the year as risk areas, 43% of all burglaries are correctly predicted. The performance of the model could potentially be improved by further tuning of the neural network's parameters, along with more data on factors correlated with burglaries, such as weather. Further work in these areas could therefore increase the accuracy.
The conclusion is that neural networks, or machine learning in general, could be a powerful and innovative tool for the Swedish Police Authority to predict, and moreover prevent, certain crimes. This thesis serves as a first prototype of how such a system could be implemented and used.
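
The headline result (4% of cells flagged, 43% of burglaries caught) is a hit rate at a fixed area budget. A minimal sketch of that evaluation, assuming the model outputs one risk score per grid cell per day; it also reports the predictive accuracy index (PAI), a common crime-mapping measure defined as hit rate divided by the share of area flagged, which the abstract itself does not mention. With the reported figures, PAI would be about 0.43 / 0.04 = 10.75. The risk values below are hypothetical.

```python
def flag_top_cells(risk, fraction):
    """Indices of the highest-risk cells, covering roughly `fraction` of the grid."""
    k = max(1, round(fraction * len(risk)))
    order = sorted(range(len(risk)), key=lambda i: risk[i], reverse=True)
    return set(order[:k])


def hit_rate_and_pai(days, fraction):
    """days: iterable of (risk_per_cell, cells_with_burglaries) pairs, one per day.

    Returns (hit rate, PAI), where PAI = hit rate / share of area flagged.
    """
    hits = burglaries = flagged = cells = 0
    for risk, burglary_cells in days:
        top = flag_top_cells(risk, fraction)
        hits += sum(1 for c in burglary_cells if c in top)
        burglaries += len(burglary_cells)
        flagged += len(top)
        cells += len(risk)
    hit_rate = hits / burglaries
    return hit_rate, hit_rate / (flagged / cells)


# One hypothetical day on a 4-cell grid, flagging half of the cells
print(hit_rate_and_pai([([0.9, 0.1, 0.5, 0.2], [0, 1, 2])], fraction=0.5))
```

On the 36x36 grid from the abstract (1296 cells), a 4% budget corresponds to 52 flagged cells per day.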

17

Gustafsson, Anton, and Julian Sjödal. "Energy Predictions of Multiple Buildings using Bi-directional Long short-term Memory." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-43552.

Abstract:

Measuring and monitoring the energy consumption of a building is time-consuming. Therefore, a feasible transfer-learning approach is presented to reduce the time needed to collect the required large dataset. The technique applies a bidirectional long short-term memory (LSTM) recurrent neural network using sequence-to-sequence prediction. In the training phase, the network extracts information and patterns from a building for which a reasonably sized dataset is available. The validation phase uses a dataset that is not sufficient in size on its own. This dataset was obtained from a related paper, so the results can be validated against it. The experiments include four cases that involve different strategies in the training and validation phases and different percentages of fine-tuning. Our proposed model achieved better prediction performance than the related paper.
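
Sequence-to-sequence training needs the consumption series cut into (input window, target window) pairs, and the fine-tuning experiments need part of the small target-building dataset set aside. A minimal data-preparation sketch; the BiLSTM itself is omitted, and the window lengths, the chronological split and the hourly readings are assumptions, not details taken from the thesis.

```python
def make_seq2seq_pairs(series, n_in, n_out, stride=1):
    """Slide over a consumption series and emit (input window, target window) pairs."""
    pairs = []
    last_start = len(series) - n_in - n_out
    for start in range(0, last_start + 1, stride):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        pairs.append((x, y))
    return pairs


def split_for_fine_tuning(pairs, fine_tune_fraction):
    """Chronological split: the first part fine-tunes the pretrained model, the rest validates it."""
    cut = int(len(pairs) * fine_tune_fraction)
    return pairs[:cut], pairs[cut:]


hourly_kwh = [float(h) for h in range(10)]  # hypothetical readings
pairs = make_seq2seq_pairs(hourly_kwh, n_in=3, n_out=2)
fine_tune, validate = split_for_fine_tuning(pairs, 0.5)
print(len(pairs), len(fine_tune), len(validate))  # 6 3 3
```

Splitting chronologically rather than randomly keeps validation windows strictly later than the fine-tuning windows, which avoids leaking future consumption into training.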

18

Sláma, Štěpán. "Pokročilá klasifikace poruch srdečního rytmu v EKG." [Advanced Classification of Heart Rhythm Disorders in ECG.] Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-413024.

Abstract:

This work focuses on a theoretical explanation of heart rhythm disorders and the possibility of detecting them automatically with deep learning networks. A total of 6884 10-second ECG recordings, each with eight measured leads, were used. The recordings were divided into 5 groups by heart rhythm: atrial fibrillation, sinus rhythms, supraventricular rhythms, ventricular rhythms, and a final group of other records. The groups were unevenly represented, with more than 85% of all data belonging to the sinus rhythm group. The classification methods served effectively as detectors of records from the largest group, and the most effective was a procedure in which a 2D convolutional neural network received the data in the form of scalograms (classification procedure number 3). It achieved a precision of 91%, a recall of 96% and an F1-score of 0.93. By contrast, when all groups were classified simultaneously, results of comparable quality were not obtained for every group. The most efficient procedure appears to be a variant in which PCA is applied to the eight input signals to obtain a single output signal, which then becomes the input of a 1D convolutional neural network (classification procedure number 5). This procedure achieved the following F1-scores: 1) atrial fibrillation group 0.54, 2) sinus rhythm group 0.91, 3) supraventricular rhythm group 0.65, 4) ventricular rhythm group 0.68, 5) other records 0.65.
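
As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the figures quoted for procedure 3 can be recomputed. The sketch also averages the five per-group F1-scores of procedure 5 into an unweighted (macro) F1. That summary is not reported in the thesis and is computed here only from the listed values; note that it ignores the strong class imbalance toward sinus rhythm.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_class_f1):
    """Unweighted mean of per-class F1-scores."""
    return sum(per_class_f1) / len(per_class_f1)


# Procedure 3: precision 91 %, recall 96 %
print(round(f1_from_precision_recall(0.91, 0.96), 2))      # 0.93
# Procedure 5: per-group F1-scores from the abstract
print(round(macro_f1([0.54, 0.91, 0.65, 0.68, 0.65]), 3))  # 0.686
```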

19

Suta, Adin. "Multilabel text classification of public procurements using deep learning intent detection." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-252558.

Abstract:

Textual data is one of the most widespread forms of data, and the amount of such data available in the world is increasing rapidly. Text can be understood as a sequence of either characters or words, the latter being the most common approach. With the breakthroughs in applied artificial intelligence in recent years, more and more tasks are aided by automatic text processing in various applications. The models introduced in the following sections rely on deep-learning sequence processing to process text and produce a regression algorithm that classifies what the text input refers to. We investigate and compare the performance of several model architectures along with different hyperparameters. The dataset was provided by e-Avrop, a Swedish company that hosts a web platform for posting and bidding on public procurements. It consists of the titles and descriptions of Swedish public procurements posted on the e-Avrop website, along with the category or categories of each text. When a text is described by several categories (the multi-label case), we suggest a deep-learning sequence-processing regression algorithm in which a set of deep learning classifiers is used. Each model uses one of the several labels, together with the text input, to produce a set of text-label observation pairs. The goal is to investigate whether these classifiers exhibit different levels of intent, an intent that should theoretically be imposed by the different training datasets used by each individual classifier.
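
The abstract describes one deep classifier per label, each trained on text-label pairs, but does not spell out how the pairs are partitioned. As a stand-in, the sketch below shows the standard binary-relevance setup (one binary training set and one classifier per category); it is not necessarily the thesis's exact scheme, and the procurement texts and categories are hypothetical. The classifiers themselves are omitted.

```python
def binary_relevance_datasets(texts, label_sets):
    """One binary (text, 0/1) training set per category, i.e. one classifier per label."""
    categories = sorted(set().union(*label_sets))
    return {
        cat: [(text, int(cat in labels)) for text, labels in zip(texts, label_sets)]
        for cat in categories
    }


def predict_labels(score_per_category, threshold=0.5):
    """Combine the per-category classifier outputs into a multi-label prediction."""
    return sorted(cat for cat, score in score_per_category.items() if score >= threshold)


texts = ["Road resurfacing in the municipality", "IT consulting services"]
label_sets = [{"construction"}, {"IT", "consulting"}]
datasets = binary_relevance_datasets(texts, label_sets)
print(datasets["IT"])  # [('Road resurfacing in the municipality', 0), ('IT consulting services', 1)]
print(predict_labels({"IT": 0.8, "consulting": 0.6, "construction": 0.1}))  # ['IT', 'consulting']
```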

20

Andersson, Aron, and Shabnam Mirkhani. "Portfolio Performance Optimization Using Multivariate Time Series Volatilities Processed With Deep Layering LSTM Neurons and Markowitz." Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273617.

Full text

Abstract:

The stock market is non-linear, but many of the best-known portfolio optimization algorithms are based on linear models. In recent years, the rapid development of machine learning has produced flexible models capable of complex pattern recognition. In this paper, we propose two different methods of portfolio optimization. One is based on a multivariate time-dependent neural network, the long short-term memory (LSTM), capable of finding long- and short-term price trends. The other is the linear Markowitz model, where we apply an exponential moving average to the input price data to capture underlying trends. The inputs to our neural network are daily prices, volumes and market indicators such as the volatility index (VIX). The outputs are the predicted prices of each asset for the following day, which are then further processed to produce metrics such as expected returns, volatilities and prediction error, in order to design a portfolio allocation that optimizes a custom utility function such as the Sharpe ratio. The LSTM model produced a portfolio with a return and risk close to the actual market conditions for the date in question, but with a high error value, indicating that our LSTM model is insufficient as a sole forecasting tool. However, its ability to predict upward and downward trends was somewhat better than expected, and we therefore conclude that multiple neural networks can be used as indicators, each responsible for a specific aspect of what is being analysed, from which a conclusion can be drawn. The findings also suggest that the input data should be considered more thoroughly, as the choice of variables and of the external information used for training affects prediction accuracy.
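
Two building blocks named above are the exponential moving average applied to the Markowitz input prices and the Sharpe ratio used as the utility. A minimal sketch, assuming a smoothing factor of 2 / (span + 1), an EMA seeded with the first price, 252 trading days per year and a per-period risk-free rate of zero by default; none of these settings are given in the abstract.

```python
import math
import statistics


def ema(prices, span):
    """Exponential moving average with smoothing factor alpha = 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out


def simple_returns(prices):
    """Period-over-period returns p_t / p_(t-1) - 1."""
    return [prices[i] / prices[i - 1] - 1 for i in range(1, len(prices))]


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns and a per-period risk-free rate."""
    excess = [r - risk_free for r in returns]
    # statistics.stdev needs at least two observations
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


print(ema([1.0, 2.0, 3.0], span=3))  # [1.0, 1.5, 2.25]
```

In a pipeline like the one described, `simple_returns` would be applied to the predicted next-day prices and `sharpe_ratio` evaluated for each candidate allocation.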

21

He, Fan. "Real-time Process Modelling Based on Big Data Stream Learning." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-35823.

Abstract:

Most control systems are currently assumed to be unchanging, but this is an idealization. In real applications, systems are often subject to many changes, some caused by the environment and some by changing system requirements. The goal of this thesis is therefore to model the process of a dynamic, adaptive real-time control system using a big data stream. In this way, the control system model can adjust itself using measurements acquired during operation and suggest the next input; the accuracy of the controlled states therefore depends heavily on the quality of the process model. In this thesis, we choose a recurrent neural network to model the process because it is a cheap and fast form of artificial intelligence. Many existing artificial intelligence methods require a database, and the bigger the database, the more accurate the result can be. For example, in case-based reasoning, a test case is compared with all cases in the database, and the result of the closest case is taken as the reference. A neural network, however, does not need a large database to store and search; it only needs simple calculations, because the information is stored in its connections. Each small unit, called a neuron, performs only a simple computation, but a network made up of neurons can perform complex, non-linear functions. For training, backpropagation and the Kalman filter are used together. Backpropagation is a widely used and stable optimization algorithm. The Kalman filter is newer to gradient-based optimization, but it has been shown to converge faster than traditional first-order gradient-based algorithms. Several experiments were designed to compare the new and existing algorithms under various circumstances. The first set of experiments uses static systems and only investigates the convergence rate and accuracy of the different algorithms.
The second set of experiments uses time-varying systems, so that one more attribute, adaptivity, is also taken into consideration.
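
To illustrate what a Kalman-filter weight update looks like, the sketch below treats the model weights as the filter state and each (input, output) sample as a scalar measurement. It is deliberately a toy: a linear model y = w0 + w1 * x with hypothetical data, not the thesis's RNN or its exact algorithm. In extended-Kalman-filter training of a neural network, the Jacobian `h` would instead be the gradient of the network output with respect to the weights, typically obtained by backpropagation, which is one way the two methods are combined.

```python
def kalman_fit_line(samples, p0=100.0, r=0.01, q=0.0):
    """Estimate w = [bias, slope] of y = w0 + w1 * x with a Kalman filter.

    The weights are the filter state and each (x, y) pair is one scalar
    measurement. Because this toy model is linear in w, the update is exact
    and coincides with recursive least squares.
    """
    w = [0.0, 0.0]
    P = [[p0, 0.0], [0.0, p0]]                # weight covariance
    for x, y in samples:
        h = [1.0, x]                          # Jacobian dy/dw (from backprop in an RNN)
        Ph = [P[i][0] * h[0] + P[i][1] * h[1] for i in range(2)]
        s = h[0] * Ph[0] + h[1] * Ph[1] + r   # innovation variance
        k = [Ph[0] / s, Ph[1] / s]            # Kalman gain
        err = y - (w[0] * h[0] + w[1] * h[1])
        w = [w[0] + k[0] * err, w[1] + k[1] * err]
        # Covariance update P <- P - k (P h)^T + q I (P is symmetric)
        P = [[P[i][j] - k[i] * Ph[j] + (q if i == j else 0.0) for j in range(2)]
             for i in range(2)]
    return w


data = [(x / 10, 2.0 + 3.0 * x / 10) for x in range(20)]  # noise-free y = 2 + 3x
print(kalman_fit_line(data))  # close to [2.0, 3.0]
```

A positive process-noise term `q` keeps the covariance from collapsing, so the filter continues to adapt; that is the relevant knob for the time-varying experiments.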

22

Prencipe, Michele Pio. "Elaborazione del Linguaggio Naturale con Metodi Probabilistici e Reti Neurali." [Natural Language Processing with Probabilistic Methods and Neural Networks.] Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24312/.

Abstract:

Natural language processing (NLP) is the process by which a machine attempts to learn the information contained in typical human speech or writing. The task is made particularly complex by the many ambiguities typical of language or text: irony, metaphors, spelling errors and so on. Thanks to deep learning, which enabled the development of neural networks, the state of the art in NLP has been reached through the introduction of architectures such as Encoder-Decoder, Transformers and attention mechanisms. Neural networks, particularly those with memory or recurrent connections, are well suited to NLP tasks, both because of their ability to learn from the large amounts of available data and because they are particularly good at focusing on the context of each input word or on the sentiment analysis of a sentence. This thesis analyses the main techniques for teaching natural language to a computer, described with examples and snippets of Python code. To give a complete view of the field, the textbook "Hands-On Machine Learning with Scikit-Learn, Keras and Tensorflow" by Aurélien Géron is used as the reference, together with the related bibliography.

23

Cîrstea, Bogdan-Ionut. "Contribution à la reconnaissance de l'écriture manuscrite en utilisant des réseaux de neurones profonds et le calcul quantique." [Contribution to Handwriting Recognition Using Deep Neural Networks and Quantum Computing.] Electronic Thesis or Diss., Paris, ENST, 2018. http://www.theses.fr/2018ENST0059.

Abstract:

In this thesis, we provide several contributions from the fields of deep learning and quantum computation to handwriting recognition. We begin by integrating some of the more recent deep learning techniques (such as dropout, batch normalization and different activation functions) into convolutional neural networks and show improved performance on the well-known MNIST dataset. We then propose Tied Spatial Transformer Networks (TSTNs), a variant of Spatial Transformer Networks (STNs) with shared weights, as well as different training variants of the TSTN. We show improved performance on a distorted variant of the MNIST dataset. In another work, we compare the performance of Associative Long Short-Term Memory (ALSTM), a recently introduced recurrent neural network (RNN) architecture, against Long Short-Term Memory (LSTM) on the Arabic handwriting recognition dataset IFN-ENIT. Finally, we propose a neural network architecture, which we call a hybrid classical-quantum neural network, that can integrate and take advantage of quantum computing. While our simulations are performed using classical computation (on a GPU), our results on the Fashion-MNIST dataset suggest that exponential improvements in computational requirements might be achievable, especially for recurrent neural networks trained for sequence classification.

24

Nilsson, Mathias, and Corswant Sophie von. "How Certain Are You of Getting a Parking Space? : A deep learning approach to parking availability prediction." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166989.

Abstract:

Traffic congestion is a severe problem in urban areas and leads to greenhouse gas emissions and air pollution. In general, drivers lack knowledge of the location and availability of free parking spaces in cities. This leads to people driving around searching for parking, and about one-third of traffic congestion in cities is due to drivers searching for an available parking space. In recent years, various solutions that provide parking information in advance have been proposed. The vast majority of these solutions have been applied in large cities, such as Beijing and San Francisco. This thesis has been conducted in collaboration with Knowit and Dukaten to predict parking occupancy in car parks one hour ahead in the relatively small city of Linköping. To make the predictions, this study investigated the possibility of using long short-term memory and gradient boosting regression trees, trained on historical parking data. To support decision making, the predictive uncertainty was estimated using the novel approach Monte Carlo dropout for the former, and quantile regression for the latter. This study shows that both models can predict parking occupancy ahead of time and that they excel in different contexts. Including exogenous features can improve prediction quality; more specifically, we found that incorporating the hour of the day improved the models' performance, while weather features did not contribute much. As for uncertainty, Monte Carlo dropout was shown to be sensitive to parameter tuning when it comes to obtaining good uncertainty estimates.
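
Monte Carlo dropout keeps dropout active at prediction time and treats the spread of repeated stochastic forward passes as an uncertainty estimate. The thesis applied it to an LSTM; to stay self-contained, the sketch below uses a tiny one-hidden-layer regressor with made-up weights. The dropout rate and the number of passes are assumptions, and they are exactly the kind of parameters the abstract reports the method to be sensitive to.

```python
import random
import statistics

# Hypothetical fixed weights of a 1-input, 4-hidden-unit regressor (illustration only)
W1 = [0.8, -0.5, 1.2, 0.3]
B1 = [0.1, 0.2, -0.1, 0.0]
W2 = [0.7, -0.4, 0.9, 0.5]
B2 = 0.05


def forward(x, drop_p, rng):
    """One forward pass with inverted dropout applied to the hidden ReLU units."""
    out = B2
    for w1, b1, w2 in zip(W1, B1, W2):
        h = max(0.0, w1 * x + b1)
        if drop_p > 0:
            # Keep the unit with probability 1 - drop_p and rescale so the expectation is unchanged
            h = h / (1 - drop_p) if rng.random() >= drop_p else 0.0
        out += w2 * h
    return out


def mc_dropout_predict(x, drop_p=0.2, n_samples=200, seed=0):
    """Monte Carlo dropout: mean and standard deviation over stochastic passes."""
    rng = random.Random(seed)
    samples = [forward(x, drop_p, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)


mean, std = mc_dropout_predict(1.0)
print(f"prediction {mean:.2f} +/- {std:.2f}")
```

With `drop_p=0` every pass is identical and the standard deviation is zero, which is a quick sanity check that the spread comes only from the dropout masks.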

25

Keisala, Simon. "Using a Character-Based Language Model for Caption Generation." Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-163001.

Abstract:

Using AI to automatically describe images is a challenging task. The aim of this study has been to compare the use of character-based language models with one of the current state-of-the-art token-based language models, im2txt, to generate image captions, with a focus on morphological correctness. Previous work has shown that character-based language models can outperform token-based language models in morphologically rich languages. Other studies show that simple multi-layered LSTM blocks are able to learn to replicate the syntax of their training data. To study the usability of character-based language models, an alternative model based on TensorFlow im2txt has been created. The model changes the token-generation architecture to handle character-sized tokens instead of word-sized tokens. The results suggest that a character-based language model could outperform the current token-based language models, although due to time and computing power constraints this study fails to draw a clear conclusion. A problem with one of the methods, subsampling, is discussed. When the original method is applied to character-sized tokens, it removes characters (including special characters) instead of full words. To solve this issue, a two-phase approach is suggested, where the training data is first separated into word-sized tokens on which subsampling is performed. The remaining tokens are then separated into character-sized tokens. Future work in which the modified subsampling and fine-tuning of the hyperparameters are performed is suggested to reach a clearer conclusion about the performance of character-based language models.
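
The two-phase fix described above can be sketched as follows. The abstract does not name the subsampling formula, so this assumes word2vec-style frequency subsampling (keep a word with probability min(1, sqrt(t / f)), where f is the word's relative frequency); the threshold `t`, the `<sp>` boundary token and the sample caption are hypothetical.

```python
import math
import random
from collections import Counter


def subsample_words(tokens, t=1e-3, seed=0):
    """Phase 1: drop frequent *words* with probability 1 - sqrt(t / f) (word2vec-style)."""
    counts = Counter(tokens)
    total = len(tokens)
    rng = random.Random(seed)
    kept = []
    for w in tokens:
        keep_prob = min(1.0, math.sqrt(t / (counts[w] / total)))
        if rng.random() < keep_prob:
            kept.append(w)
    return kept


def to_characters(words, sep="<sp>"):
    """Phase 2: split the surviving words into character tokens, marking word boundaries."""
    chars = []
    for i, w in enumerate(words):
        if i > 0:
            chars.append(sep)
        chars.extend(w)
    return chars


caption = "a dog runs on the grass near the dog park".split()
print(to_characters(subsample_words(caption, t=1.0)))
```

Because subsampling runs before the character split, it can only remove whole words, so no word is ever left with missing letters.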

26

Чапалюк, Богдан Володимирович. "Системи автоматичної медичної комп’ютерної дiагностики з використанням методiв штучного iнтелекту." [Automatic Medical Computer-Aided Diagnosis Systems Using Artificial Intelligence Methods.] Doctoral thesis, Kyiv, 2020. https://ela.kpi.ua/handle/123456789/39677.

Abstract:

The aim of this dissertation research is to examine in detail, develop, and improve systems for automatic computer-aided diagnosis of lung cancer using artificial intelligence methods, in particular by applying and improving recent advances in deep learning. Modern medical institutions diagnose lung cancer with computed tomography (CT), which produces a three-dimensional image of the patient's lungs obtained with an X-ray beam that passes through the tissues of the human body layer by layer, in different directions, from different angles and positions. In this work, such images are used to analyze whether a tumor is present in the lungs with convolutional neural networks. However, such specific data pose their own difficulties for developing medical computer-aided diagnosis systems, because their three-dimensional nature and the corresponding spatial relationships must be taken into account. The dissertation therefore considers three main approaches to working with such data: 1. Using a two-dimensional convolutional neural network. A convolutional neural network is applied to each slice of the CT scan, the network outputs for all slices are combined, and the final conclusion is made based on rules learned from a set of training samples. 2. Using three-dimensional convolutional neural networks, which take into account the three-dimensional nature of the input data and can find useful patterns along all three spatial axes. Such systems often divide the task into several stages, each of which uses a three-dimensional convolutional neural network tailored to a specific subtask. 3. Using a combined structure of two-dimensional convolutional and recurrent neural networks. In this approach, the two-dimensional convolutional neural network represents the input data in a lower-dimensional space by learning a lower-dimensional manifold.
As a result, only the most important high-level features are extracted from each slice of the CT image. The extracted features are processed by a bidirectional gated recurrent neural network, which learns complex nonlinear functions describing the spatial dependencies and the interactions between them. The output of the recurrent network is the probability that a tumor is present in the scan. Within this dissertation, each approach is analyzed and tested experimentally, and the results are compared with the work of other authors. The experiments show that the most accurate systems are built from several three-dimensional convolutional neural networks (one network segments potential problem regions, another classifies whether a tumor is present in those regions). However, such systems have very high computational requirements because they use the three-dimensional convolution operation, whose computational cost grows cubically with the size of the input image. The proposed recurrent convolutional neural network architecture, by contrast, achieves sufficiently high accuracy while using the two-dimensional convolution operation, which is far less demanding in computation and memory. The scientific novelty of the dissertation lies in the applicant's proposed method for building a combined computer-aided diagnosis system that joins a two-dimensional convolutional neural network with a bidirectional LSTM recurrent neural network. Unlike other solutions, this system takes into account the spatial relationships between different slices of a CT scan by using a bidirectional recurrent neural network whose inputs are high-level features produced by the two-dimensional convolutional neural network. The high-level features are computed for each slice of the patient's scan.
In the experiments, this architecture reached an AUC ROC of 83%, somewhat lower than systems of three-dimensional convolutional neural networks, which achieve an AUC ROC of 90-95%. However, these are the best results reported for recurrent neural networks used to build computer-aided lung cancer diagnosis systems. The proposed architecture is also faster, because it uses two-dimensional convolution instead of three-dimensional convolution; the computational and memory requirements of two-dimensional convolution grow quadratically with the input size rather than cubically. To train the combined convolutional-recurrent network effectively, a soft attention mechanism was proposed that gives the network information about the tumor location during training. According to the experiments, this approach improved the AUC ROC by more than 8%. The practical significance of the results lies in extending and improving existing methods for building computer-aided diagnosis systems. The proposed combination of a convolutional neural network and a bidirectional recurrent network achieves sufficiently high accuracy and is more accurate than ordinary recurrent neural networks. The system also uses fewer resources than a three-dimensional convolutional neural network. The experiments and the analysis of existing computer-aided diagnosis methods made it possible to formulate the requirements and approaches to use depending on whether speed or accuracy has priority. The proposed soft attention mechanism significantly improves the training efficiency of combined convolutional-recurrent neural network architectures.
The results of the dissertation were implemented in the research project "Development and study of methods for processing, recognition, protection, and storage of medical images in distributed computer systems" (state registration number 0117U004267, topic No. 2021п, KVNTD code I.1 01.05.02). The main results are presented in 6 scientific publications: two articles in Ukrainian specialized scientific journals, 2 articles in foreign journals indexed in Google Scholar and other databases, and 1 article in a journal included in the Web of Science Core Collection and Scopus. One further paper was published in the abstracts of an international scientific conference.
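The cost argument above (cubic growth for 3D convolution, quadratic for 2D convolution) can be checked with a rough multiply-accumulate (MAC) count. This is an illustration only, not the author's analysis. It assumes a single layer with stride 1 and "same" padding, and the sizes are hypothetical: a 3-wide kernel with 16 input and 16 output channels.

```python
def conv2d_macs(n: int, k: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulates for one k x k convolution over an n x n slice.

    Assumes stride 1 and 'same' padding, so there are n * n output positions.
    """
    return n * n * k * k * c_in * c_out


def conv3d_macs(n: int, k: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulates for one k x k x k convolution over an n x n x n volume."""
    return n * n * n * k * k * k * c_in * c_out


for n in (64, 128, 256):
    per_slice = conv2d_macs(n, k=3, c_in=16, c_out=16)
    all_slices = n * per_slice  # the 2D layer applied independently to each of n slices
    volume = conv3d_macs(n, k=3, c_in=16, c_out=16)
    print(f"n={n}: 2D/slice={per_slice:,}  2D/all slices={all_slices:,}  3D={volume:,}")
```

Per slice, the 2D count grows with n² and the 3D count with n³. Running the 2D layer over all n slices is still cubic in n overall, but each k × k kernel is k times cheaper than its k × k × k counterpart. The recurrent layer that aggregates the per-slice features adds its own cost, which this count ignores.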

27

Lu, Tsai-Wei, and 盧采威. "Tikhonov regularization for deep recurrent neural network acoustic modeling." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/70636533678066549649.

Abstract:

Master's
National Chiao Tung University
Institute of Communications Engineering
102
Deep learning has been widely demonstrated to achieve high performance in many classification tasks, and deep neural networks are now a new trend in automatic speech recognition. In this dissertation, we deal with the issue of model regularization in deep recurrent neural networks and develop deep acoustic models for speech recognition in noisy environments. Our idea is to compensate for variations in the input speech data within the restricted Boltzmann machine (RBM), which is applied as a pre-training stage for feature learning and acoustic modeling. We implement Tikhonov regularization in the pre-training procedure and build invariance properties into the acoustic neural network model. Regularization based on weight decay is further combined with Tikhonov regularization to increase the mixing rate of the alternating Gibbs Markov chain, so that contrastive divergence training tends to approximate maximum likelihood learning. In addition, the backpropagation through time (BPTT) algorithm is developed with modified truncated minibatch training for the recurrent neural network. This algorithm is applied not only to the recurrent weights but also to the weights between the previous layer and the recurrent layer. In the experiments, we implement the proposed methods with the open-source Kaldi toolkit. Experimental results on the Resource Management (RM) and Aurora4 speech corpora show that hybrid regularization and BPTT training do improve the performance of the deep neural network acoustic model for robust speech recognition.
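The weight-decay part of the RBM pre-training can be sketched as a contrastive divergence (CD-1) update with an L2 penalty. The Tikhonov term itself is not reproduced here, and the sketch makes two assumptions. First, it uses mean-field probabilities instead of sampled binary states, a common practical variant that keeps the result deterministic. Second, the name `cd1_update` and its parameters are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cd1_update(v0: List[float], W: Matrix, b_vis: List[float], b_hid: List[float],
               lr: float = 0.1, weight_decay: float = 0.0) -> Matrix:
    """Return the weight increment dW for one mean-field CD-1 step of a binary RBM.

    W[i][j] connects visible unit i to hidden unit j.
    """
    n_vis, n_hid = len(W), len(W[0])
    # Positive phase: hidden probabilities driven by the data.
    h0 = [sigmoid(b_hid[j] + sum(v0[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
    # Negative phase: reconstruct the visible layer, then recompute the hidden layer.
    v1 = [sigmoid(b_vis[i] + sum(h0[j] * W[i][j] for j in range(n_hid))) for i in range(n_vis)]
    h1 = [sigmoid(b_hid[j] + sum(v1[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
    # <v h>_data - <v h>_reconstruction, minus an L2 (weight-decay) penalty on W.
    return [[lr * (v0[i] * h0[j] - v1[i] * h1[j] - weight_decay * W[i][j])
             for j in range(n_hid)] for i in range(n_vis)]
```

The weight-decay term pulls every weight toward zero in proportion to its size, independently of the data-driven statistics.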

28

Yu, Kuo, and 俞果. "Complex-Valued Deep Recurrent Neural Network for Singing Voice Separation." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/4waab5.

Abstract:

Master's
National Central University
Department of Computer Science and Information Engineering
105
Deep neural networks (DNN) have performed impressively in the processing of multimedia signals. Most DNN-based approaches were developed to handle real-valued data; very few have been designed for complex-valued data, despite their being essential for processing various types of multimedia signal. Accordingly, this work presents a complex-valued deep recurrent neural network (C-DRNN) for singing voice separation. The C-DRNN operates on the complex-valued short-time discrete Fourier transform (STFT) domain. A key aspect of the C-DRNN is that the activations and weights are complex-valued. The goal herein is to reconstruct the singing voice and the background music from a mixed signal. For error back-propagation, CR-calculus is utilized to calculate the complex-valued gradients of the objective function. To reinforce model regularity, two constraints are incorporated into the cost function of the C-DRNN. The first is an additional masking layer that ensures the sum of separated sources equals the input mixture. The second is a discriminative term that preserves the mutual difference between two separated sources. Finally, the proposed method is evaluated using the MIR-1K dataset and a singing voice separation task. Experimental results demonstrate that the proposed method outperforms the state-of-the-art DNN-based methods.
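The two constraints in this abstract can be illustrated for a single STFT bin. The sketch rests on assumptions, since the abstract does not give the exact formulas:

- The masking layer is taken to be a magnitude-ratio soft mask applied to the complex mixture.
- The discriminative term is taken to have the form "squared error to the correct source minus γ times squared error to the other source."
- The function names and the value of γ are hypothetical.

```python
from typing import List, Tuple


def masked_sources(y1: complex, y2: complex, mixture: complex,
                   eps: float = 1e-12) -> Tuple[complex, complex]:
    """Split one complex STFT bin of the mixture with magnitude-ratio soft masks.

    y1 and y2 are the raw network outputs for voice and music. Because the two
    masks sum to one, the two estimates add up to the mixture (up to rounding).
    """
    m1 = abs(y1) / (abs(y1) + abs(y2) + eps)
    m2 = 1.0 - m1
    return m1 * mixture, m2 * mixture


def squared_error(a: List[complex], b: List[complex]) -> float:
    """Sum of squared magnitudes of the complex differences."""
    return sum(abs(x - y) ** 2 for x, y in zip(a, b))


def discriminative_loss(s1_hat: List[complex], s2_hat: List[complex],
                        s1: List[complex], s2: List[complex], gamma: float = 0.05) -> float:
    """Reward closeness to the correct source and penalize closeness to the other one."""
    return (squared_error(s1_hat, s1) - gamma * squared_error(s1_hat, s2)
            + squared_error(s2_hat, s2) - gamma * squared_error(s2_hat, s1))
```

Because the mask is applied to the mixture rather than predicting each source freely, the "sum equals mixture" constraint holds by construction instead of having to be learned.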

29

CHUANG, YI-TING, and 莊宜庭. "A PM2.5 Prediction Model Based on Deep Learning with Recurrent Neural Network." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/sfu623.

Abstract:

Master's
Tunghai University
Department of Information Management
107
In recent years, many studies have verified that air pollution seriously affects human health, and extensive media coverage of air-pollution issues has led people to pay attention to it. This study analyzes the Environmental Protection Administration's 2018 real-time air quality pollution indicator data. Five methods are used to handle missing values. The main variables correlated with PM2.5 concentration are identified by principal component analysis and correlation coefficients (single factor: PM10, SO2, NOX, NO2, CO; two-factor: NOX+NO2+CO, SO2+PM10), and a Long Short-Term Memory (LSTM) recurrent neural network (RNN) is used to model PM2.5 concentration over the next 8 hours. According to the results, most of the errors between the predicted and true values at Fengyuan Station fall within the reasonable MAPE range (0.2~0.5). In addition, linear interpolation is the best way to handle missing values.
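The two evaluation ingredients named in this abstract can be sketched with the standard library:

- Linear interpolation fills gaps in an hourly series.
- MAPE measures the forecasts.

Several details are assumptions. Gaps at the start or end of the series are filled with the nearest observed value. MAPE is returned as a fraction, so 0.2 means 20%. The function names are hypothetical.

```python
from typing import List, Optional


def interpolate_missing(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps by linear interpolation between the nearest observed values."""
    known = [i for i, x in enumerate(series) if x is not None]
    if not known:
        raise ValueError("series has no observed values")
    out: List[float] = []
    for i, x in enumerate(series):
        if x is not None:
            out.append(float(x))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:      # leading gap: copy the first observed value
            out.append(float(series[right]))
        elif right is None:   # trailing gap: copy the last observed value
            out.append(float(series[left]))
        else:                 # interior gap: straight line between neighbours
            frac = (i - left) / (right - left)
            out.append(series[left] + frac * (series[right] - series[left]))
    return out


def mape(actual: List[float], predicted: List[float]) -> float:
    """Mean absolute percentage error as a fraction; actual values must be non-zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

For example, `interpolate_missing([10.0, None, None, 16.0, None])` fills the interior gap with 12.0 and 14.0 and repeats 16.0 at the end.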

30

Lopes, Ana Patrícia Ribeiro. "Study of Deep Neural Network architectures for medical image segmentation." Master's thesis, 2020. http://hdl.handle.net/1822/69850.

Abstract:

Integrated master's dissertation in Biomedical Engineering (specialization in Medical Electronics)
Medical image segmentation plays a crucial role in the medical field, since it allows quantitative analyses used for screening, monitoring, and planning the treatment of numerous pathologies. Manual segmentation is time-consuming and prone to inter-rater variability. Thus, several automatic approaches have been proposed for medical image segmentation, most of them based on Deep Learning. These approaches became especially relevant after the development of the Fully Convolutional Network (FCN). In this method, the fully connected layers are eliminated and upsampling layers are incorporated, allowing a whole image to be segmented at once. Current architectures are based on the FCN, with U-Net being one of the most popular. The aim of this dissertation is to study Deep Learning architectures for medical image segmentation. Two challenging and very distinct tasks were selected, namely, retinal vessel segmentation from retinal fundus images and brain tumor segmentation from MRI images. The architectures studied in this work are based on the U-Net, due to the high performance it has obtained in multiple medical segmentation tasks. The models developed for retinal vessel and brain tumor segmentation were tested on the publicly available DRIVE and BRATS 2017 databases, respectively. Several studies were performed for the first segmentation task, namely, comparison of downsampling operations, replacement of a downsampling step with dilated convolutions, incorporation of an RNN-based layer, and application of test time data augmentation techniques. In the second segmentation task, three modifications were evaluated, specifically, the incorporation of long skip connections, the substitution of standard convolutions with dilated convolutions, and the replacement of a downsampling step with dilated convolutions. Regarding retinal vessel segmentation, the final approach achieved accuracy, sensitivity and AUC of 0.9575, 0.7938 and 0.9804, respectively.
This approach consists of a U-Net containing one strided convolution as the downsampling step and dilated convolutions with a dilation rate of 3, followed by a test time data augmentation technique performed by a ConvLSTM. Regarding brain tumor segmentation, the proposed approach achieved Dice of 0.8944, 0.8051 and 0.7353 and HD95 of 6.79, 8.34 and 4.76 for the complete, core and enhanced regions, respectively. The final method consists of a DLA architecture with a long skip connection and dilated convolutions with a dilation rate of 2. For both tasks, the proposed approach is competitive with state-of-the-art methods.
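Two quantities from this abstract can be made concrete in a small sketch:

- The Dice coefficient used to score the brain tumor regions. Each region (complete, core, enhanced) is scored separately on its own binary mask.
- The extent covered by a dilated kernel.

The masks here are assumed to be flattened to 1D lists. The convention that two empty masks score 1.0 is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for flattened binary masks (1 = inside the region)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 1.0 if total == 0 else 2.0 * inter / total  # both empty: perfect agreement


def dilated_extent(kernel_size: int, dilation: int) -> int:
    """Width spanned along one axis by a dilated kernel: dilation * (kernel_size - 1) + 1."""
    return dilation * (kernel_size - 1) + 1
```

With the dilation rates reported above, a 3-wide kernel spans 7 positions per axis at rate 3 and 5 at rate 2. It keeps the same number of weights, which is why dilation can replace a downsampling step without shrinking the receptive field.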

31

Hong, Zih-Siang, and 洪梓翔. "Using Ensemble Learning and Deep Recurrent Neural Network to Construct an Internet Forum Conversation Prediction Model." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/s67dep.

Abstract:

Master's
Chung Yuan Christian University
Graduate Institute of Information Management
106
The study of natural language dialogue involves language understanding, reasoning, and basic common sense, so it is one of the most challenging problems in artificial intelligence, and designing a general-purpose conversation model is even more complicated and difficult. In the past, research on natural language processing and dialogue mainly focused on rule-based and machine learning-based methods. Although these methods can solve some dialogue problems in specific domains, they have their own learning bottlenecks. Research in this field achieved a further breakthrough only when recurrent neural networks (RNNs) and the sequence-to-sequence model were proposed. However, although deep learning can automatically extract features from large amounts of dialogue data, it places high demands on the quantity and quality of datasets and is prone to overfitting. Therefore, extracting useful features from a limited training dataset and achieving generalization across different situations is the challenge for deep learning in natural language dialogue. This project is titled "Conversation Model using Deep Recurrent Neural Networks with Ensemble Learning". The advantage of ensemble learning is that it improves the model's generalization ability, strengthening its predictions and making it suitable for a variety of contexts and scenarios. In this study, ensemble learning is applied to natural language dialogue and conversation models in varied and complex contexts and scenarios. The method is a deep neural network conversation model that uses ensemble learning to train multiple sub-prediction models with different types, different parameters, and different training datasets. The prediction result is then obtained through a specifically designed ensemble strategy.
By having multiple sub-models jointly predict and judge, a generalized conversation prediction model is obtained.
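The abstract does not describe its "specifically designed ensemble strategy". As a stand-in, here is a minimal sketch of weighted soft voting. It assumes each sub-model returns a probability for each candidate reply. The candidate labels, the weights and the name `soft_vote` are hypothetical.

```python
from typing import Dict, List, Sequence


def soft_vote(predictions: List[Dict[str, float]], weights: Sequence[float]) -> str:
    """Pick the candidate reply with the largest weighted sum of sub-model probabilities."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per sub-model")
    scores: Dict[str, float] = {}
    for probs, w in zip(predictions, weights):
        for candidate, p in probs.items():
            scores[candidate] = scores.get(candidate, 0.0) + w * p
    return max(scores, key=scores.get)
```

Up-weighting a sub-model can change the outcome. With equal weights, `[{"a": 0.6, "b": 0.4}, {"a": 0.2, "b": 0.8}, {"a": 0.5, "b": 0.5}]` selects `"b"`. Giving the first model a weight of 4 selects `"a"`.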

32

Ting, Tzu-hsuan, and 丁子軒. "Combining Deep De-noising Auto-encoder and Recurrent Neural Network in End-to-end Speech Recognition for Noise Robustness." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/nrcpz2.

Abstract:

Master's
National Sun Yat-sen University
Department of Computer Science and Engineering (graduate institute)
106
In this paper, we implement an end-to-end noise-robust speech recognition system on the Aurora 2.0 dataset by combining deep denoising auto-encoders and recurrent neural networks. At the front end, we use a fully connected denoising auto-encoder (FCDAE) to deal with noisy data. We propose two efficient methods to improve denoising performance when training the FCDAE. The first method applies different weights to the loss values of data at different signal-to-noise ratios (SNRs). The second method changes how the training data are used. Combining the two methods gives the best experimental results. For back-end speech recognition, we use an end-to-end system based on a bidirectional recurrent neural network trained with the connectionist temporal classification criterion, and compare it with a baseline back end based on hidden Markov models and Gaussian mixture models (HMM-GMM). By integrating the FCDAE with the recognition models, we obtain 94.20% word accuracy in the clean condition and 94.24% word accuracy in the multi condition. These correspond to relative improvements of 65% and 20%, respectively, over the baseline experiments; the 94.20% result is obtained with the FCDAE and HMM-GMM, and the 94.24% result by combining the FCDAE with the bidirectional recurrent neural network.
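The first FCDAE training method, weighting the loss by SNR, can be sketched as a weighted mean-squared reconstruction error. The per-SNR weight values, the normalization by the total weight, and the function name are all assumptions; the abstract states only that data at different SNRs receive different loss weights.

```python
from typing import Dict, List, Tuple

# Each example: (SNR in dB, denoised frame produced by the auto-encoder, clean target frame).
Example = Tuple[int, List[float], List[float]]


def snr_weighted_mse(batch: List[Example], snr_weights: Dict[int, float]) -> float:
    """Weighted average of per-example MSE, with each weight looked up by the example's SNR."""
    total, norm = 0.0, 0.0
    for snr_db, denoised, clean in batch:
        w = snr_weights[snr_db]
        err = sum((d - c) ** 2 for d, c in zip(denoised, clean)) / len(clean)
        total += w * err
        norm += w
    return total / norm
```

Normalizing by the sum of weights keeps the loss on the same scale as plain MSE, so changing the weights shifts emphasis between SNR levels without implicitly changing the learning rate.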

33

Sha, Hao. "Solving Prediction Problems from Temporal Event Data on Networks." Thesis, 2021. http://dx.doi.org/10.7912/C2/46.

Abstract:

Indiana University-Purdue University Indianapolis (IUPUI)
Many complex processes can be viewed as sequential events on a network. In this thesis, we study the interplay between a network and the event sequences on it. We first focus on predicting events on a known network. Examples include modeling retweet cascades, forecasting earthquakes, and tracing the source of a pandemic. Specifically, given the network structure, we solve two types of problems: (1) forecasting future events based on historical events, and (2) identifying the initial event(s) based on later observations of the dynamics. The inverse problem of inferring the unknown network topology or links from the events is also of great importance. Examples along this line include constructing influence networks among Twitter users from their tweets, soliciting new members to join an event based on their participation history, and recommending positions for job seekers according to their work experience. Following this direction, we study two types of problems: (1) recovering influence networks, and (2) predicting links between a node and a group of nodes, from event sequences.

34

Gao, Jun. "Omni SCADA intrusion detection." Thesis, 2020. http://hdl.handle.net/1828/11745.

Abstract:

We investigate a deep learning-based omni intrusion detection system (IDS) for supervisory control and data acquisition (SCADA) networks that is capable of detecting both temporally uncorrelated and correlated attacks. Among the IDSs developed in this work, a feedforward neural network (FNN) detects temporally uncorrelated attacks with an F1 of 99.967±0.005%, but its F1 on correlated attacks drops to 58±2%. In contrast, a long short-term memory (LSTM) network detects correlated attacks with an F1 of 99.56±0.01% and uncorrelated attacks with 99.3±0.1%. Combining the LSTM and the FNN through an ensemble approach further improves IDS performance, reaching an F1 of 99.68±0.04% regardless of the temporal correlations among the data packets.
Graduate
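The F1 score used above, and one simple way to combine the two detectors, can be sketched as follows. Averaging the FNN and LSTM attack probabilities is an assumption, since the abstract does not say how its ensemble combines them. The 0.5 threshold and the function names are also hypothetical.

```python
from typing import List


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 = 2PR / (P + R) for binary labels (1 = attack)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def ensemble_predict(p_fnn: List[float], p_lstm: List[float], threshold: float = 0.5) -> List[int]:
    """Flag a packet when the mean of the two models' attack probabilities reaches the threshold."""
    return [1 if (a + b) / 2 >= threshold else 0 for a, b in zip(p_fnn, p_lstm)]
```

Averaging lets a confident LSTM outvote an uncertain FNN on temporally correlated attacks, and vice versa on uncorrelated ones. That is the intuition behind combining the two.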

35

Goyette, Kyle. "On two sequential problems : the load planning and sequencing problem and the non-normal recurrent neural network." Thesis, 2020. http://hdl.handle.net/1866/24314.

Abstract:

The work in this thesis is separated into two parts. The first part deals with the load planning and sequencing problem for double-stack intermodal railcars, an operational problem found at many rail container terminals. In this problem, containers must be assigned to a platform on which the container will be loaded, and the loading order must be determined. These decisions are made with the objective of minimizing the costs associated with handling the containers, as well as minimizing the cost of containers left behind. The deterministic version of the problem can be cast as a shortest path problem on an ordered graph. This problem is challenging to solve because of the large size of the graph. We propose a two-stage heuristic based on the Iterative Deepening A* algorithm to compute solutions to the load planning and sequencing problem within a five-minute time budget. We also illustrate how a Deep Q-learning algorithm can be used to solve the same problem heuristically. The second part of this thesis considers sequential models in deep learning. A recent strategy to circumvent the exploding and vanishing gradient problem in recurrent neural networks (RNNs) is to constrain recurrent weight matrices to be orthogonal or unitary. While this ensures stable dynamics during training, it comes at the cost of reduced expressivity due to the limited variety of orthogonal transformations. We propose a parameterization of RNNs, based on the Schur decomposition, that mitigates the exploding and vanishing gradient problem while allowing non-orthogonal recurrent weight matrices in the model.
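The Schur-based idea can be illustrated in two dimensions. As a simplifying assumption, not the thesis's exact parameterization, the recurrent matrix is written as W = Q T Qᵀ, with Q orthogonal (a rotation) and T upper triangular with real diagonal entries. A full parameterization would also need 2 × 2 blocks to represent complex eigenvalues. In this form:

- The eigenvalues of W sit on T's diagonal, so their magnitudes can be controlled.
- The off-diagonal entry of T makes W non-orthogonal, and indeed non-normal.

The function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def schur_recurrent_matrix(theta: float, diag: Tuple[float, float], upper: float) -> Matrix:
    """Build a 2x2 recurrent matrix W = Q T Q^T.

    Q is an orthogonal rotation by `theta`; T is upper triangular, so W's
    eigenvalues are diag[0] and diag[1], while `upper` makes W non-normal.
    """
    c, s = math.cos(theta), math.sin(theta)
    q = [[c, -s], [s, c]]
    t = [[diag[0], upper], [0.0, diag[1]]]
    return matmul(matmul(q, t), transpose(q))


def is_normal(w: Matrix, tol: float = 1e-9) -> bool:
    """A real matrix is normal iff it commutes with its transpose (W W^T == W^T W)."""
    a, b = matmul(w, transpose(w)), matmul(transpose(w), w)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(len(w)) for j in range(len(w)))
```

With `upper = 0.0` the matrix is symmetric and therefore normal. With a non-zero `upper` it keeps the same eigenvalues (trace 1.4 and determinant 0.45 for a diagonal of 0.9 and 0.5) but is no longer normal. This is the extra expressivity that orthogonal constraints rule out.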

36

"Sequencing Behavior in an Intelligent Pro-active Co-Driver System." Doctoral diss., 2020. http://hdl.handle.net/2286/R.I.57049.

Abstract:

Driving is the coordinated operation of mind and body to move a vehicle, such as a car or a bus. Although driving is an everyday activity for many people, it still poses a safety problem, and driver distraction is becoming a critical part of it. Speeding, drunk driving, and distracted driving are the three leading factors in fatal car crashes. Distraction, defined as excessive workload and limited attention, is the main paradigm that guides this research area. Driver behavior analysis can be used to address the distraction problem and to provide an intelligent adaptive agent that works closely with the driver, far beyond traditional algorithmic computational models. A variety of machine learning approaches have been proposed to estimate or predict drivers' fatigue level using car data, driver status, or a combination of both. Three important features of intelligence and cognition are perception, attention, and sensory memory. In this thesis, I focused on memory and attention as essential parts of highly intelligent systems. Without memory, systems show only limited intelligence, since their responses are based exclusively on spontaneous decisions without considering the effect of previous events. I proposed a memory-based sequence model to predict driver behavior and distraction level using a neural network. The work started with a large-scale experiment to collect data and build an artificial intelligence-friendly dataset. The data were then used to train a deep neural network to estimate driver behavior. With a focus on memory, a Long Short-Term Memory (LSTM) network was used to increase the level of intelligence in two dimensions: forgiveness of minor glitches and accumulation of anomalous behavior. I then reduced the model error and computational expense by adding an attention mechanism on top of the LSTM models. This system can be generalized to build and train highly intelligent agents in other domains.
Dissertation/Thesis
Doctoral Dissertation Computer Engineering 2020
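The abstract does not specify the form of its attention mechanism. The sketch below assumes dot-product attention with a single query vector, which would be learned in practice and is hypothetical here, applied over the per-time-step outputs of an LSTM. It shows how attention turns a sequence of hidden states into one weighted summary.

```python
import math
from typing import List, Tuple


def attention_pool(hidden_states: List[List[float]],
                   query: List[float]) -> Tuple[List[float], List[float]]:
    """Dot-product attention over per-timestep LSTM outputs.

    Returns (context_vector, weights); the weights are a softmax over time steps.
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exp_scores = [math.exp(s - m) for s in scores]
    z = sum(exp_scores)
    weights = [e / z for e in exp_scores]
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights
```

A zero query gives every time step equal weight. A query aligned with particular hidden-state directions concentrates the weight on the matching time steps, which is how the model can focus on a few anomalous moments instead of the whole drive.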

37

(11048391), Hao Sha. "SOLVING PREDICTION PROBLEMS FROM TEMPORAL EVENT DATA ON NETWORKS." Thesis, 2021.

Abstract:

Many complex processes can be viewed as sequential events on a network. In this thesis, we study the interplay between a network and the event sequences on it. We first focus on predicting events on a known network. Examples include modeling retweet cascades, forecasting earthquakes, and tracing the source of a pandemic. Specifically, given the network structure, we solve two types of problems: (1) forecasting future events based on historical events, and (2) identifying the initial event(s) based on later observations of the dynamics. The inverse problem of inferring the unknown network topology or links from the events is also of great importance. Examples along this line include constructing influence networks among Twitter users from their tweets, soliciting new members to join an event based on their participation history, and recommending positions for job seekers according to their work experience. Following this direction, we study two types of problems: (1) recovering influence networks, and (2) predicting links between a node and a group of nodes, from event sequences.
