Dissertations on the topic "CNN AND LSTM NETWORKS"

1

Graffi, Giacomo. "A novel approach for Credit Scoring using Deep Neural Networks with bank transaction data". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Abstract:
With the PSD2 open banking revolution FinTechs obtained a key role in the financial industry. This role implies the inquiry and development of new techniques, products and solutions to compete with other players in this area. The aim of this thesis is to investigate the applicability of the state-of-the-art Deep Learning techniques for Credit Risk Modeling. In order to accomplish it, a PSD2-related synthetic and anonymized dataset has been used to simulate an application process with only one account per user. Firstly, a machine-readable representation of the bank accounts has been created, starting from the raw transactions’ data and scaling the variables using the quantile function. Afterwards, a Deep Neural Network has been created in order to capture the complex relations between the input variables and to extract information from the accounts’ representations. The proposed architecture accomplished the assigned tasks with a Gini index of 0.55, exploiting a Convolutional encoder to extract features from the inputs and a Recurrent decoder to analyze them.
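As background for the Gini index quoted above: in credit scoring it is commonly derived from the area under the ROC curve as Gini = 2·AUC − 1. The abstract does not say how the thesis computed its value, so the sketch below only illustrates that common definition. The function names and the toy labels and scores are assumptions made for illustration.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini index under the common convention Gini = 2 * AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0


# Toy data (hypothetical): 1 = defaulted, 0 = repaid; higher score = riskier.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
print(gini(labels, scores))
```

Under this convention a model that ranks at random has a Gini of 0 and a perfect ranking has a Gini of 1.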
2

Holm, Noah, and Emil Plynning. "Spatio-temporal prediction of residential burglaries using convolutional LSTM neural networks". Thesis, KTH, Geoinformatik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229952.

Abstract:
The low number of solved residential burglaries calls for new and innovative methods for preventing and investigating them. There were 22 600 reported residential burglaries in Sweden in 2017, but only four to five percent of these will ever be solved. There are many initiatives in both Sweden and abroad for reducing the number of residential burglaries, and one of the areas being tested is the use of prediction methods for more efficient preventive action. This thesis investigates a potential prediction method that uses neural networks to identify, on a daily basis, areas with a higher risk of burglary. The model uses reported burglaries to learn patterns in both space and time. The rationale for the existence of such patterns is based on near-repeat theories in criminology, which state that after a burglary both the burgled victim and the area around that victim have an increased risk of additional burglaries. The work was conducted in cooperation with the Swedish Police Authority. The machine learning is implemented with convolutional long short-term memory (LSTM) neural networks with max pooling in three dimensions that learn from ten years of residential burglary data (2007-2016) in a study area in Stockholm, Sweden. The model's accuracy is measured by predicting burglaries on a daily basis during 2017. It classifies cells in a 36x36 grid of 600-meter square cells as areas with elevated risk or not. By classifying 4% of all grid cells during the year as risk areas, 43% of all burglaries are correctly predicted. The performance of the model could potentially be improved by further tuning of the neural network's parameters, along with the use of more data on factors that are correlated with burglaries, for instance weather. Consequently, further work in these areas could increase the accuracy.
The conclusion is that neural networks, or machine learning in general, could be a powerful and innovative tool for the Swedish Police Authority to predict and, moreover, prevent certain crimes. This thesis serves as a first prototype of how such a system could be implemented and used.
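The evaluation described above (flag a small share of grid cells as risk areas, then count how many burglaries fall inside them) can be sketched roughly as follows. This is not the thesis's code: `hit_rate`, the flat-list grid representation and the toy numbers are assumptions, and the sketch covers a single day rather than the full year of daily predictions.

```python
import math


def hit_rate(risk, counts, fraction):
    """Share of events located in the top `fraction` of cells ranked by predicted risk.

    risk   -- predicted risk per grid cell (flat list)
    counts -- observed events per grid cell (flat list, same order as risk)
    """
    if len(risk) != len(counts):
        raise ValueError("risk and counts must have the same length")
    total = sum(counts)
    if total == 0:
        return 0.0
    # Number of cells to flag, at least one.
    k = max(1, math.ceil(fraction * len(risk)))
    ranked = sorted(range(len(risk)), key=lambda i: risk[i], reverse=True)
    flagged = ranked[:k]
    return sum(counts[i] for i in flagged) / total


# Toy example (hypothetical): 10 cells, flag the 20% with the highest predicted risk.
risk = [0.9, 0.1, 0.8, 0.2, 0.3, 0.05, 0.4, 0.6, 0.7, 0.15]
counts = [3, 0, 1, 0, 1, 0, 0, 2, 1, 0]
print(hit_rate(risk, counts, 0.2))
```

For a 36x36 grid, a 4% budget corresponds to about 52 of the 1296 cells per day under this reading.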
3

Lin, Alvin. "Video Based Automatic Speech Recognition Using Neural Networks". DigitalCommons@CalPoly, 2020. https://digitalcommons.calpoly.edu/theses/2343.

Abstract:
Neural network approaches have become popular in the field of automatic speech recognition (ASR). Most ASR methods use audio data to classify words. Lip-reading ASR techniques utilize only video data, which compensates for noisy environments where audio may be compromised. A comprehensive approach to video-based ASR, including the vetting of datasets and the development of a preprocessing chain, is developed. This approach is based on neural networks, namely 3D convolutional neural networks (3D-CNN) and long short-term memory (LSTM). These types of neural networks are designed to take in temporal data such as videos. Various combinations of neural network architectures and preprocessing techniques are explored. The best-performing neural network architecture, a CNN with bidirectional LSTM, compares favorably against recent works on video-based ASR.
4

BHATT, HARSHIT. "SPEAKER IDENTIFICATION FROM VOICE SIGNALS USING HYBRID NEURAL NETWORK". Thesis, DELHI TECHNOLOGICAL UNIVERSITY, 2021. http://dspace.dtu.ac.in:8080/jspui/handle/repository/18865.

Abstract:
Identifying the speaker in an audio-visual environment is a crucial task that is now surfacing in the research domain. Researchers are moving towards utilizing deep neural networks to match people with their respective voices. The advantages of deep learning are manifold and include the ability to process huge volumes of data, robust training of algorithms, feasibility of optimization, and reduced computation time. Previous studies have explored recurrent and convolutional neural networks incorporating GRUs, Bi-GRUs, LSTM, Bi-LSTM and many more [1]. This work proposes a hybrid mechanism which consists of a CNN and an LSTM network fused using an early fusion method. We accumulated a dataset of 1,330 voices by recording 3-second clips in .wav format through a Python script. The dataset consists of 14 categories, and we used 80% for training and 20% for testing. We optimized and fine-tuned the neural networks and modified them to yield optimum results. For the early fusion approach, we used the concatenation operation, which fuses the neural networks prior to the training phase. The proposed method achieves 97.72% accuracy on our dataset and outperforms all existing baseline mechanisms such as MLP, LSTM, CNN, and RNN. This research contributes to the ongoing research in the speaker identification domain and paves the way for future directions using deep learning.
5

Lagerhjelm, Linus. "Extracting Information from Encrypted Data using Deep Neural Networks". Thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-155904.

Abstract:
In this paper we explore various approaches to using deep neural networks to perform cryptanalysis, with the ultimate goal of having a deep neural network decipher encrypted data. We use long short-term memory networks to try to decipher encrypted text, and we use a convolutional neural network to perform classification tasks on encrypted MNIST images. We find that although the network is unable to decipher encrypted data, it is able to perform classification on encrypted data. We also find that the network's performance depends on which key was used to encrypt the data. These findings could be valuable for further research into the topic of cryptanalysis using deep neural networks.
6

Näslund, Per. "Artificial Neural Networks in Swedish Speech Synthesis". Thesis, KTH, Tal-kommunikation, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239350.

Abstract:
Text-to-speech (TTS) systems have entered our daily lives in the form of smart assistants and many other applications. Contemporary research applies machine learning and artificial neural networks (ANNs) to synthesize speech. It has been shown that these systems outperform the older concatenative and parametric methods. In this paper, ANN-based methods for speech synthesis are explored and one of the methods is implemented for the Swedish language. The implemented method is dubbed "Tacotron" and is a first step towards end-to-end ANN-based TTS which puts many different ANN techniques to work. The resulting system is compared to a parametric TTS through a strength-of-preference test that is carried out with 20 Swedish-speaking subjects. A statistically significant preference for the ANN-based TTS is found. Test subjects indicate that the ANN-based TTS performs better than the parametric TTS when it comes to audio quality and naturalness but sometimes lacks in intelligibility.
7

Evholt, David, and Oscar Larsson. "Generative Adversarial Networks and Natural Language Processing for Macroeconomic Forecasting". Thesis, KTH, Matematisk statistik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-273422.

Abstract:
Macroeconomic forecasting is a classic problem, today most often modeled using time series analysis. Few attempts have been made using machine learning methods, and even fewer incorporating unconventional data, such as that from social media. In this thesis, a Generative Adversarial Network (GAN) is used to predict U.S. unemployment, beating the ARIMA benchmark on all horizons. Furthermore, attempts at using Twitter data and the Natural Language Processing (NLP) model DistilBERT are performed. While these attempts do not beat the benchmark, they do show promising results with predictive power. The models are also tested at predicting the U.S. stock index S&P 500. For these models, the Twitter data does improve the accuracy and shows the potential of social media data when predicting a more erratic index with less seasonality that is more responsive to current trends in public discourse. The results also show that Twitter data can be used to predict trends in both unemployment and the S&P 500 index. This sets the stage for further research into NLP-GAN models for macroeconomic predictions using social media data.
8

Volný, Miloš. "Využití umělé inteligence jako podpory pro rozhodování v podniku". Master's thesis, Vysoké učení technické v Brně. Fakulta podnikatelská, 2019. http://www.nusl.cz/ntk/nusl-399447.

Abstract:
This thesis is concerned with future trend prediction on capital markets on the basis of neural networks. Usage of convolutional and recurrent neural networks, Elliott wave theory and scalograms for capital market's future trend prediction is discussed. The aim of this thesis is to propose a novel approach to future trend prediction based on Elliott's wave theory. The proposed approach will be based on the principle of classification of chosen patterns from Elliott's theory by the way of convolutional neural network. To this end scalograms of the chosen Elliott patterns will be created through application of continuous wavelet transform on parts of historical time series of price for chosen stocks.
9

Broomé, Sofia. "Objectively recognizing human activity in body-worn sensor data with (more or less) deep neural networks". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210243.

Abstract:
This thesis concerns the application of different artificial neural network architectures on the classification of multivariate accelerometer time series data into activity classes such as sitting, lying down, running, or walking. There is a strong correlation between increased health risks in children and their amount of daily screen time (as reported in questionnaires). The dependency is not clearly understood, as there are no such dependencies reported when the sedentary (idle) time is measured objectively. Consequently, there is an interest from the medical side to be able to perform such objective measurements. To enable large studies the measurement equipment should ideally be low-cost and non-intrusive. The report investigates how well these movement patterns can be distinguished given a certain measurement setup and a certain network structure, and how well the networks generalise to noisier data. Recurrent neural networks are given extra attention among the different networks, since they are considered well suited for data of sequential nature. Close to state-of-the-art results (95% weighted F1-score) are obtained for the tasks with 4 and 5 classes, which is notable since a considerably smaller number of sensors is used than in the previously published results. Another contribution of this thesis is that a new labeled dataset with 12 activity categories is provided, consisting of around 6 hours of recordings, comparable in number of samples to benchmarking datasets. The data collection was made in collaboration with the Department of Public Health at Karolinska Institutet.
Within the scope of this thesis, we test how well movement patterns can be distinguished from accelerometer data using the branch of machine learning known as deep learning, in which deep artificial neural networks of nodes approximate functions mapping from the domain of sensor data to predefined activity categories such as walking, standing, sitting or lying down. There is interest from the medical side in being able to measure physical activity objectively, among other reasons because a correlation has been shown between increased health risks in children and their amount of daily screen time. Ideally, such measurements should be possible with non-invasive, low-cost equipment so that larger studies can be carried out. Simpler network architectures, as well as reimplementations of the state of the art in human activity recognition (HAR), are tested both on a benchmarking dataset and on data collected in-house in collaboration with the Department of Public Health at Karolinska Institutet, and results are reported for different choices of possible classifications and different numbers of dimensions per measurement point. The results achieved (95% F1-score) on a 4- and 5-class problem are comparable to the best previously published results for activity recognition, which is notable since considerably fewer accelerometers were used here than in those studies. In addition to the classification results reported, this work contributes a newly collected and labeled dataset, KTH-KI-AA. It is comparable in number of data points to widely used benchmarking datasets in the HAR field.
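The weighted F1-score reported above is usually understood as the per-class F1 averaged with weights proportional to each class's number of true samples. The abstract does not spell out the exact definition used, so the sketch below assumes that common one. The function name and the toy activity labels are illustrative only.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * n / total
    return score


# Toy labels (hypothetical), not data from the thesis.
y_true = ["sit", "sit", "walk", "walk", "walk", "run"]
y_pred = ["sit", "walk", "walk", "walk", "run", "run"]
print(weighted_f1(y_true, y_pred))
```

Weighting by support keeps frequent activities from being drowned out by rare ones, which matters when classes such as sitting dominate the recordings.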
10

Chowdhury, Muhammad Iqbal Hasan. "Question-answering on image/video content". Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/205096/1/Muhammad%20Iqbal%20Hasan_Chowdhury_Thesis.pdf.

Abstract:
This thesis explores a computer's ability to understand multimodal data, where the correspondence between image/video content and natural language text is utilised to answer open-ended natural language questions through question-answering tasks. Static image data consisting of both indoor and outdoor scenes, where complex textual questions are arbitrarily posed to a machine to generate correct answers, was examined. Dynamic videos consisting of both single-camera and multi-camera settings for the exploration of more challenging and unconstrained question-answering tasks were also considered. In exploring these challenges, new deep learning processes were developed to improve a computer's ability to understand and consider multimodal data.
11

Ďuriš, Denis. "Detekce ohně a kouře z obrazového signálu". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-412968.

Abstract:
This diploma thesis deals with the detection of fire and smoke from the image signal. The approach of this work uses a combination of convolutional and recurrent neural network. Machine learning models created in this work contain inception modules and blocks of long short-term memory. The research part describes selected models of machine learning used in solving the problem of fire detection in static and dynamic image data. As part of the solution, a data set containing videos and still images used to train the designed neural networks was created. The results of this approach are evaluated in conclusion.
12

Kvita, Jakub. "Popis fotografií pomocí rekurentních neuronových sítí". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255324.

Abstract:
This thesis deals with the automatic generation of image captions using several kinds of neural networks. The work is based on papers from the MS COCO Captioning Challenge 2015 and on character-level language models popularized by A. Karpathy. The proposed model is a combination of a convolutional and a recurrent neural network with an encoder-decoder architecture. The vector representing the encoded image is passed to the language model as the memory values of the LSTM layers in the network. The thesis examines how well a model with such a simple architecture can describe images and how it compares with other current models. One of the conclusions of the thesis is that the proposed architecture is not sufficient for any image captioning.
13

Hedar, Sara. "Applying Machine Learning Methods to Predict the Outcome of Shots in Football". Thesis, Uppsala universitet, Avdelningen för systemteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-414774.

Abstract:
The thesis investigates a publicly available dataset which covers more than three million events in football matches. The aim of the study is to train machine learning models capable of modeling the relationship between a shot event and its outcome, that is, to predict whether a football shot will result in a goal or not. By representing the shot in different ways, the aim is to draw conclusions regarding which elements of a shot allow for a good prediction of its outcome. The shot representation was varied both by including different numbers of events preceding the shot and by varying the set of features describing each event. The study shows that the performance of the machine learning models benefits from including events preceding the shot. The highest predictive performance was achieved by a long short-term memory neural network trained on the shot event and six events preceding the shot. The features which were found to have the largest positive impact on the shot events were the precision of the event, the position on the field and how the player was in contact with the ball. The size of the dataset was also evaluated, and the results suggest that it is sufficiently large for the size of the networks evaluated.
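The shot representation described above, a shot together with a fixed number of preceding events, amounts to cutting windows out of an event stream. The sketch below shows one way to do that; the event dictionaries, the `is_shot` predicate and `shot_sequences` are hypothetical and do not reflect the dataset's actual schema or the thesis's preprocessing.

```python
def shot_sequences(events, n_preceding, is_shot):
    """Return, for every shot, the shot plus up to `n_preceding` events before it.

    events are assumed to be in chronological order within a single match.
    """
    sequences = []
    for i, event in enumerate(events):
        if is_shot(event):
            start = max(0, i - n_preceding)
            sequences.append(events[start:i + 1])
    return sequences


# Toy event stream (hypothetical field names).
events = [
    {"type": "pass"},
    {"type": "dribble"},
    {"type": "shot"},
    {"type": "pass"},
    {"type": "shot"},
]
windows = shot_sequences(events, 2, lambda e: e["type"] == "shot")
for window in windows:
    print([e["type"] for e in window])
```

Shots early in a match yield shorter windows here; feeding them to an LSTM in batches would additionally require padding or masking, which this sketch leaves out.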
14

Forslund, John, and Jesper Fahlén. "Predicting customer purchase behavior within Telecom : How Artificial Intelligence can be collaborated into marketing efforts". Thesis, KTH, Skolan för industriell teknik och management (ITM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279575.

Abstract:
This study aims to investigate the implementation of an AI model that predicts customer purchases in the telecom industry. The thesis also outlines how such an AI model can assist decision-making in marketing strategies. It is concluded that designing the AI model with a Recurrent Neural Network (RNN) architecture with a Long Short-Term Memory (LSTM) layer allows for a successful implementation with satisfactory model performance. Stepwise instructions for constructing such a model are presented in the methodology section of the study. The RNN-LSTM model further serves as an assisting tool for marketers to assess, in a quantitative way, how a consumer's website behavior affects their purchase behavior over time, by observing what the authors refer to as the Customer Purchase Propensity Journey (CPPJ). The firm empirical basis of CPPJ can help organizations improve their allocation of marketing resources, as well as benefit the organization's online presence by allowing for personalization of the customer experience.
15

Hamerník, Pavel. "Využití hlubokého učení pro rozpoznání textu v obrazu grafického uživatelského rozhraní". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403823.

Abstract:
Optical character recognition (OCR) has been a topic of interest for many years. It is defined as the process of digitizing a document image into a sequence of characters. Despite decades of intense research, building OCR systems with capabilities comparable to those of humans remains an open challenge. This work presents the design and implementation of such a system, which is capable of detecting text in graphical user interfaces.
16

Kramář, Denis. "Analýza zvukových nahrávek pomocí hlubokého učení". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2021. http://www.nusl.cz/ntk/nusl-442571.

Abstract:
This master's thesis deals with the audio classification of chainsaw logging sounds in a natural environment, mainly using convolutional neural networks. First, the theory of graphical representation of audio signals is discussed. The following part is devoted to machine learning. The third chapter reviews existing work on this problem. In the practical part, the dataset used and the neural networks tested are presented. The final results are compared by achieved accuracy and by ROC curves. The robustness of the presented solutions was tested with a proposed detection program and evaluated using objective criteria.
17

Gessle, Gabriel, and Simon Åkesson. "A comparative analysis of CNN and LSTM for music genre classification". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260138.

Abstract:
The music industry has seen a great influx of new channels to browse and distribute music. This does not come without drawbacks. As the data rapidly increases, manual curation becomes a much more difficult task. Audio files have a plethora of features that could be used to make parts of this process a lot easier. It is possible to extract these features, but the best way to handle these for different tasks is not always known. This thesis compares the two deep learning models, convolutional neural network (CNN) and long short-term memory (LSTM), for music genre classification when trained using mel-frequency cepstral coefficients (MFCCs) in hopes of making audio data as useful as possible for future usage. These models were tested on two different datasets, GTZAN and FMA, and the results show that the CNN had a 56.0% and 50.5% prediction accuracy, respectively. This outperformed the LSTM model that instead achieved a 42.0% and 33.5% prediction accuracy.
18

Albert, Florea George, and Filip Weilid. "Deep Learning Models for Human Activity Recognition". Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20201.

Abstract:
The Augmented Multi-party Interaction (AMI) Meeting Corpus database is used to investigate group activity recognition in an office environment. The AMI Meeting Corpus database provides researchers with remote-controlled meetings and natural meetings in an office environment; the meeting scenario takes place in a four-person office room. To achieve group activity recognition, video frames and 2-dimensional audio spectrograms were extracted from the AMI database. The video frames were RGB color images and the audio spectrograms had one color channel. The video frames were produced in batches so that temporal features could be evaluated together with the audio spectrograms. It has been shown that including temporal features both during model training and when predicting the behavior of an activity increases the validation accuracy compared to models that only use spatial features [1]. Deep learning architectures have been implemented to recognize different human activities in the AMI office environment using the extracted data from the AMI database. The neural network models were built using the Keras API together with the TensorFlow library. There are different types of neural network architectures. The architecture types investigated in this project were Residual Neural Network, Visual Geometry Group 16, Inception V3 and RCNN (Recurrent Neural Network). ImageNet weights have been used to initialize the weights for the neural network base models. The ImageNet weights were provided by the Keras API and were optimized for each base model [2]. The base models use ImageNet weights when extracting features from the input data. The feature extraction using ImageNet weights or random weights together with the base models showed promising results. Both the deep learning using dense layers and the LSTM spatio-temporal sequence prediction were implemented successfully.
19

ALIBERTI, ALESSANDRO. "Machine learning techniques to forecast non-linear trends in smart environments". Doctoral thesis, Politecnico di Torino, 2020. http://hdl.handle.net/11583/2846613.

20

Olin, Per. "Evaluation of text classification techniques for log file classification". Thesis, Linköpings universitet, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166641.

Abstract:
System log files are filled with logged events, status codes, and other messages. By analyzing the log files, the system's current state can be determined and it can be established whether something went wrong during execution. Log file analysis has been studied for some time, and recent studies have shown state-of-the-art performance using machine learning techniques. In this thesis, document classification solutions were tested on log files in order to classify regular system runs versus abnormal system runs. To solve this task, supervised and unsupervised learning methods were combined. Doc2Vec was used to extract document features, and Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) based architectures were used for the classification task. With the use of the machine learning models and preprocessing techniques, the tested models yielded an F1-score and accuracy above 95% when classifying log files.
21

Suresh, Sreerag. "An Analysis of Short-Term Load Forecasting on Residential Buildings Using Deep Learning Models". Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/99287.

Abstract:
Building energy load forecasting is becoming an increasingly important task with the rapid deployment of smart homes, the integration of renewables into the grid and the advent of decentralized energy systems. Residential load forecasting has been a challenging task since the residential load is highly stochastic. Deep learning models have shown tremendous promise in the fields of time-series and sequential data and have been successfully used for short-term load forecasting at the building level. Although other studies have looked at using deep learning models for building energy forecasting, most of them have considered a limited number of homes or the aggregate load of a collection of homes. This study aims to address this gap and serve as an investigation into selecting the better deep learning model architecture for short-term load forecasting on 3 communities of residential buildings. The deep learning models CNN and LSTM have been used in the study. For 15-minute-ahead forecasting for a collection of homes, it was found that homes with a higher variance were better predicted by CNN models, while LSTM showed better performance for homes with lower variance. The effect of adding weather variables on 24-hour-ahead forecasting was studied, and it was observed that adding weather parameters did not improve forecasting performance. In all the homes, the deep learning models are shown to outperform the simple ANN model.
Master of Science
Building energy load forecasting is becoming an increasingly important task with the rapid deployment of smart homes, the integration of renewables into the grid and the advent of decentralized energy systems. Residential load forecasting has been a challenging task since residential load is highly stochastic. Deep learning models have shown tremendous promise in the fields of time-series and sequential data and have been successfully used for short-term load forecasting. Although other studies have looked at using deep learning models for building energy forecasting, most of them have considered only a single home or the aggregate load of a collection of homes. This study aims to address this gap and serve as an analysis of short-term load forecasting on 3 communities of residential buildings. Model performance is analysed in detail across all homes. Deep learning models have been used in this study and their efficacy is measured against a simple ANN model.
22

Cavallie, Mester Jon William. "Using LSTM Neural Networks To Predict Daily Stock Returns". Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-106124.

Abstract:
Long short-term memory (LSTM) neural networks have been proven to be effective for time series prediction, even in some instances where the data is non-stationary. This led us to examine their ability to predict stock market returns, as the development of stock prices and returns tends to be a non-stationary time series. We used daily stock trading data to train LSTM models to predict daily returns for 60 stocks from the OMX30 and Nasdaq-100 indices. Subsequently, we measured their accuracy, precision, and recall. The mean accuracy was 49.75 percent, meaning that the observed accuracy was close to the accuracy one would observe by randomly selecting a prediction for each day and lower than the accuracy achieved by blindly predicting all days to be positive. Finally, we concluded that further improvements need to be made for models trained by LSTMs to have any notable predictive ability in the area of stock returns.
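The comparison against an "always positive" baseline in the abstract above can be made concrete with a small sketch. The labels below are hypothetical (1 = positive daily return, 0 = otherwise) and are not data from the thesis; the function name `classification_metrics` is likewise an illustration only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical up/down labels for eight trading days (assumption, not thesis data).
actual = [1, 0, 1, 1, 0, 1, 0, 1]
predicted = [1, 1, 0, 1, 0, 0, 1, 1]

model_scores = classification_metrics(actual, predicted)
# Baseline that blindly predicts every day to be positive.
baseline_scores = classification_metrics(actual, [1] * len(actual))

print("model   :", model_scores)
print("baseline:", baseline_scores)
```

In this toy example the blind baseline has higher accuracy than the model simply because positive days are the majority class, which is the kind of effect the abstract describes.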
23

Pokhrel, Abhishek <1996>. "Stock Returns Prediction using Recurrent Neural Networks with LSTM". Master's Degree Thesis, Università Ca' Foscari Venezia, 2022. http://hdl.handle.net/10579/22038.

Abstract:
Research in asset pricing has, until recently, side-stepped the high dimensionality problem by focusing on low-dimensional models. Work on cross-sectional stock return prediction, for example, has focused on regressions with a small number of characteristics. Given the enormously large number of variables that could potentially be relevant for predicting returns, focusing on such a small number of factors effectively means that researchers are imposing a very high degree of sparsity on these models. This research studies the use of the recurrent neural network (RNN) method to deal with the “curse of dimensionality” challenge in the cross-section of stock returns. The purpose is to predict daily stock returns. In contrast to the traditional approach to modelling returns, namely the CAPM, the focus is on using the LSTM model for prediction. LSTMs are very powerful in sequence prediction problems because they are able to store past information. Thus, we compare the forecast of returns from the LSTM model with that of the traditional CAPM. The comparison is made using the out-of-sample R² along with the Sharpe ratio and the Sortino ratio. Finally, we conclude with the further improvements that need to be made for models trained by LSTMs to have any notable predictive ability in the area of stock returns.
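The three evaluation measures named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the out-of-sample R² here uses a constant benchmark forecast (zero by default), the Sharpe ratio uses the sample standard deviation of excess returns, the Sortino ratio uses the root mean square of below-target returns, and no annualisation is applied. Definitions vary across the literature, and the return series is hypothetical.

```python
import math


def out_of_sample_r2(actual, predicted, benchmark=0.0):
    """1 - SSE(model) / SSE(benchmark); the constant benchmark is an assumption."""
    sse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sse_bench = sum((a - benchmark) ** 2 for a in actual)
    return 1.0 - sse_model / sse_bench


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(var)


def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by the downside deviation below the target."""
    excess = [r - target for r in returns]
    mean = sum(excess) / len(excess)
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    return mean / downside


# Hypothetical daily returns (assumption, for illustration only).
daily_returns = [0.01, -0.02, 0.03, -0.01]

r2_perfect = out_of_sample_r2(daily_returns, daily_returns)
r2_zero_forecast = out_of_sample_r2(daily_returns, [0.0] * len(daily_returns))
sharpe = sharpe_ratio(daily_returns)
sortino = sortino_ratio(daily_returns)
print(r2_perfect, r2_zero_forecast, round(sharpe, 4), round(sortino, 4))
```

A forecast that merely reproduces the benchmark scores an out-of-sample R² of zero, so any positive value indicates an improvement over that benchmark.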
24

Lara, Teodoro. "Controllability and applications of CNN". Diss., Georgia Institute of Technology, 1997. http://hdl.handle.net/1853/28921.

25

Terefe, Adisu Wagaw. "Handwritten Recognition for Ethiopic (Ge’ez) Ancient Manuscript Documents". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-288145.

Abstract:
A handwriting recognition system learns a pattern from a given image of text. The recognition process usually combines a computer vision task with sequence learning techniques. Transcribing text from a scanned image remains a challenging problem, especially when the documents are highly degraded or contain excessive dust noise. Nowadays, there are several handwriting recognition systems, both commercial and free, especially for Latin-based languages. However, no prior study has addressed Ge’ez handwritten ancient manuscript documents, even though the language holds many mysteries of the past in the human history of science, architecture, medicine and astronomy. In this thesis, we present two separate recognition systems. (1) A character-level recognition system which combines computer vision for character segmentation from ancient books with a vanilla Convolutional Neural Network (CNN) to recognize characters. (2) An end-to-end, segmentation-free handwriting recognition system using a CNN and a Multi-Dimensional Recurrent Neural Network (MDRNN) with Connectionist Temporal Classification (CTC) for the Ethiopic (Ge’ez) manuscript documents. The proposed character-level recognition model reaches 97.78% accuracy. The second model, in turn, provides an encouraging result, which suggests that the properties of the language should be studied further for better recognition of all the ancient books.
26

Carpani, Valerio. "CNN-based video analytics". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Abstract:
The content of this thesis illustrates the six months of work done during my internship at TKH Security Solutions - Siqura B.V. in Gouda, Netherlands. The aim of this thesis is to investigate possible uses of convolutional neural networks from two different points of view: first, we propose a novel algorithm for person re-identification; second, we propose a deployment chain for bringing research concepts to product-ready solutions. In existing works, the person re-identification task is assumed to be independent of the person detection task. In this thesis, we instead consider the two tasks as linked. In fact, features produced by an object detection convolutional neural network (CNN) contain useful information which is not being used by current re-identification methods. We propose several solutions for learning a metric on CNN features to distinguish between different identities. The best of these solutions is then compared with state-of-the-art alternatives on the popular Market-1501 dataset. Results show that our method outperforms them in computational efficiency, with only a reasonable loss in accuracy. For this reason, we believe that the proposed method can be more appropriate than current state-of-the-art methods in situations where computational efficiency is critical, such as embedded applications. The deployment chain we propose in this thesis has two main goals: it must be flexible enough to accommodate new advancements in network architecture, and it must be able to deploy neural networks on both server and embedded platforms. We tested several frameworks on several platforms and ended up with a deployment chain that relies on the open-source format ONNX.
27

Ärlemalm, Filip. "Harbour Porpoise Click Train Classification with LSTM Recurrent Neural Networks". Thesis, KTH, Teknisk informationsvetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-215088.

Abstract:
The harbour porpoise is a toothed whale whose presence is threatened in Scandinavia. One step towards preserving the species in critical areas is to study and observe the harbour porpoise population growth or decline in these areas. Today this is done by using underwater audio recorders, so-called hydrophones, and manual analysis tools. This report describes a method that modernizes the process of harbour porpoise detection with machine learning. The detection method is based on data collected by the hydrophone AQUAclick 100. The data is processed and classified automatically with a stacked long short-term memory recurrent neural network designed specifically for this purpose.
28

El-Shafei, Ahmed. "Time multiplexing of cellular neural networks". Thesis, University of Kent, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365221.

29

Hossain, Md Tahmid. "Towards robust convolutional neural networks in challenging environments". Thesis, Federation University Australia, 2021. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/181882.

Abstract:
Image classification is one of the fundamental tasks in the field of computer vision. Although Artificial Neural Network (ANN) showed a lot of promise in this field, the lack of efficient computer hardware subdued its potential to a great extent. In the early 2000s, advances in hardware coupled with better network design saw the dramatic rise of Convolutional Neural Network (CNN). Deep CNNs pushed the State-of-The-Art (SOTA) in a number of vision tasks, including image classification, object detection, and segmentation. Presently, CNNs dominate these tasks. Although CNNs exhibit impressive classification performance on clean images, they are vulnerable to distortions, such as noise and blur. Fine-tuning a pre-trained CNN on mutually exclusive or a union set of distortions is a brute-force solution. This iterative fine-tuning process with all known types of distortion is, however, exhaustive and the network struggles to handle unseen distortions. CNNs are also vulnerable to image translation or shift, partly due to common Down-Sampling (DS) layers, e.g., max-pooling and strided convolution. These operations violate the Nyquist sampling rate and cause aliasing. The textbook solution is low-pass filtering (blurring) before down-sampling, which can benefit deep networks as well. Even so, non-linearity units, such as ReLU, often re-introduce the problem, suggesting that blurring alone may not suffice. Another important but under-explored issue for CNNs is unknown or Open Set Recognition (OSR). CNNs are commonly designed for closed set arrangements, where test instances only belong to some ‘Known Known’ (KK) classes used in training. As such, they predict a class label for a test sample based on the distribution of the KK classes. However, when used under the OSR setup (where an input may belong to an ‘Unknown Unknown’ or UU class), such a network will always classify a test instance as one of the KK classes even if it is from a UU class. 
Historically, CNNs have struggled with detecting objects in images with large difference in scale, especially small objects. This is because the DS layers inside a CNN often progressively wipe out the signal from small objects. As a result, the final layers are left with no signature from these objects leading to degraded performance. In this work, we propose solutions to the above four problems. First, we improve CNN robustness against distortion by proposing DCT based augmentation, adaptive regularisation, and noise suppressing Activation Functions (AF). Second, to ensure further performance gain and robustness to image transformations, we introduce anti-aliasing properties inside the AF and propose a novel DS method called blurpool. Third, to address the OSR problem, we propose a novel training paradigm that ensures detection of UU classes and accurate classification of the KK classes. Finally, we introduce a novel CNN that enables a deep detector to identify small objects with high precision and recall. We evaluate our methods on a number of benchmark datasets and demonstrate that they outperform contemporary methods in the respective problem set-ups.
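The "textbook solution" mentioned in the abstract above, low-pass filtering before down-sampling, can be illustrated on a one-dimensional signal. This is a generic sketch of blur-then-subsample and is not the thesis's own down-sampling method; the [1, 2, 1]/4 kernel, the edge-replicating padding and the toy signal are all assumptions chosen for illustration.

```python
def downsample(signal, stride=2):
    """Keep every `stride`-th sample (as strided convolution or pooling would)."""
    return signal[::stride]


def blur(signal, kernel=(0.25, 0.5, 0.25)):
    """Low-pass filter with a 3-tap kernel, replicating the edge samples."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def max_abs_diff(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# A signal at the highest representable frequency, and the same signal shifted by one sample.
signal = [0, 1, 0, 1, 0, 1, 0, 1]
shifted = signal[1:] + signal[:1]

# Naive down-sampling: a one-sample shift flips the output completely.
naive_gap = max_abs_diff(downsample(signal), downsample(shifted))

# Blur first, then down-sample: the two outputs agree away from the border.
blurred_gap = max_abs_diff(downsample(blur(signal)), downsample(blur(shifted)))

print("naive:", naive_gap, "blurred:", blurred_gap)
```

With naive striding the shifted input produces an entirely different output, while the blurred version differs only at the border sample, which is the shift sensitivity that anti-aliasing is meant to reduce.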
Doctor of Philosophy
30

Ferreira, de Melo Filho Alberto. "Predicting the unpredictable - Can Artificial Neural Network replace ARIMA for prediction of the Swedish Stock Market (OMXS30)?" Thesis, Mittuniversitetet, Institutionen för ekonomi, geografi, juridik och turism, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-36908.

Abstract:
For several decades the stock market has been an area of interest for researchers due to its complexity, noise, uncertainty and the nonlinearity of the data. Most studies in this area use a classical stochastic method; one example is ARIMA, which is a standard approach for time series prediction. There is, however, another method for predicting the stock market that has been gaining traction in recent years: the Artificial Neural Network (ANN). This method has so far mostly been used in research on the American and Asian stock markets. Therefore, the purpose of this essay was to explore whether an Artificial Neural Network could be used instead of ARIMA to predict the Swedish stock market (OMXS30). The study used data from the Swedish stock market between 1991-07-09 and 2018-12-28 for the training of the ARIMA model, and forecast data that ranged from 2019-01-02 to 2019-04-26. The forecast data of the ANN was composed of 80% of the data between 1991-07-09 and 2019-04-26, and the evaluation data was composed of the remaining 20%. The ANN architecture had one input layer with chunks of 20 consecutive days as input, followed by three Long Short-Term Memory (LSTM) hidden layers with 128 neurons in each layer, followed by another hidden layer with Rectified Linear Unit (ReLU) activation containing 32 neurons, followed by the output layer containing 2 neurons with softmax activation. The results showed that the ANN, with an accuracy of 0.9892, could be a successful method to forecast the Swedish stock market instead of ARIMA.
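To give a sense of scale for the architecture described in the abstract above, the sketch below builds 20-day input windows and counts trainable parameters layer by layer. It assumes a single input feature per day and standard LSTM layers without peepholes (four gates, one bias vector per gate); the abstract states neither, so both are assumptions, as are the helper names.

```python
def make_windows(series, window=20):
    """Slice a series into overlapping chunks of `window` consecutive days."""
    return [series[i:i + window] for i in range(len(series) - window + 1)]


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weight matrix plus bias vector of a fully connected layer."""
    return input_dim * units + units


n_features = 1  # assumption: one value (e.g. closing price) per day

layer_params = [
    lstm_params(n_features, 128),  # first LSTM layer
    lstm_params(128, 128),         # second LSTM layer
    lstm_params(128, 128),         # third LSTM layer
    dense_params(128, 32),         # ReLU hidden layer
    dense_params(32, 2),           # softmax output layer
]
total_params = sum(layer_params)

# Hypothetical price series of 25 days yields 6 windows of 20 days each.
windows = make_windows(list(range(25)), window=20)
print(layer_params, total_params, len(windows))
```

Under these assumptions the three LSTM layers account for almost all of the parameters, with the two dense layers contributing only a few thousand.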
31

Rintala, Jonathan. "Speech Emotion Recognition from Raw Audio using Deep Learning". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278858.

Abstract:
Traditionally, in Speech Emotion Recognition, models require a large number of manually engineered features and intermediate representations such as spectrograms for training. However, hand-engineering such features often requires both expert domain knowledge and resources. Recently, with the emerging paradigm of deep learning, end-to-end models that extract features themselves and learn from the raw speech signal directly have been explored. A previous approach has been to combine multiple parallel CNNs with different filter lengths to extract multiple temporal features from the audio signal, and then feed the resulting sequence to a recurrent block. Other recent work also reports high accuracies when utilizing local feature learning blocks (LFLBs) to reduce the dimensionality of a raw audio signal, extracting the most important information. Thus, this study combines the idea of LFLBs for feature extraction with a block of parallel CNNs with different filter lengths for capturing multitemporal features; the result is finally fed into an LSTM layer for global contextual feature learning. To the best of our knowledge, such a combined architecture has not yet been properly investigated. Further, this study investigates different configurations of such an architecture. The proposed model is then trained and evaluated on the well-known speech databases EmoDB and RAVDESS, in both a speaker-dependent and a speaker-independent manner. The results indicate that the proposed architecture can produce results comparable with the state of the art, despite excluding data augmentation and advanced pre-processing. Three parallel CNN pipes were reported to yield the highest accuracy, together with a series of modified LFLBs that utilize average pooling and ReLU activation. 
This shows the power of leaving the feature learning up to the network and opens up for interesting future research on time-complexity and trade-off between introducing complexity in pre-processing or in the model architecture itself.
32

Paschou, Michail. "ASIC implementation of LSTM neural network algorithm". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254290.

Abstract:
LSTM neural networks have been used for speech recognition, image recognition and other artificial intelligence applications for many years. Most applications perform the LSTM algorithm and the required calculations on cloud computers. Off-line solutions include the use of FPGAs and GPUs, but the most promising solutions include ASIC accelerators designed for this purpose only. This report presents an ASIC design capable of performing multiple iterations of the LSTM algorithm on a unidirectional neural network architecture without peepholes. The proposed design provides arithmetic-level parallelism options, as blocks are instantiated based on parameters. The internal structure of the design implements pipelined, parallel or serial solutions depending on which is optimal in each case. The implications of these decisions are discussed in detail in the report. The design process is described in detail, and the evaluation of the design is also presented to measure the accuracy and error of the design output. This thesis work resulted in a complete synthesizable ASIC design implementing an LSTM layer, a Fully Connected layer and a Softmax layer, which can perform classification of data based on trained weight matrices and bias vectors. The design primarily uses a 16-bit fixed-point format with 5 integer and 11 fractional bits, but increased-precision representations are used in some blocks to reduce the output error. Additionally, a verification environment has been designed that is capable of performing simulations, evaluating the design output by comparing it with results produced by performing the same operations with 64-bit floating-point precision on a SystemVerilog testbench, and measuring the encountered error. The results concerning the accuracy and the design output error margin are presented in this thesis report. The design went through logic and physical synthesis and successfully resulted in a functional netlist for every tested configuration. 
Timing, area and power measurements on the generated netlists of various configurations of the design show consistency and are included in this report.
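The 16-bit fixed-point format with 5 integer and 11 fractional bits mentioned in the abstract above can be sketched in software. The sketch assumes a two's-complement encoding in which the sign bit is counted among the 5 integer bits (giving a range of [-16, 16)), round-to-nearest conversion and saturation on overflow; the abstract does not specify these details, so they are assumptions, and the helper names are hypothetical.

```python
TOTAL_BITS = 16
FRAC_BITS = 11
SCALE = 1 << FRAC_BITS                  # 2**11 = 2048 steps per unit
Q_MIN = -(1 << (TOTAL_BITS - 1))        # -32768, i.e. -16.0
Q_MAX = (1 << (TOTAL_BITS - 1)) - 1     # 32767, i.e. just under +16.0


def to_fixed(x):
    """Quantize a float to the 16-bit integer code, saturating at the limits."""
    q = round(x * SCALE)
    return max(Q_MIN, min(Q_MAX, q))


def to_float(q):
    """Convert a 16-bit integer code back to its real value."""
    return q / SCALE


# Compare a value against its fixed-point approximation, in the spirit of
# checking a fixed-point design against a floating-point reference.
value = 0.1
code = to_fixed(value)
error = abs(to_float(code) - value)
print(code, to_float(code), error)
```

For in-range values the rounding error is bounded by half of one step, that is 2^-12, which gives a rough idea of the per-operation error such a format introduces relative to a floating-point reference.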
33

Kapoor, Prince. "Shoulder Keypoint-Detection from Object Detection". Thesis, Université d'Ottawa / University of Ottawa, 2018. http://hdl.handle.net/10393/38015.

Abstract:
This thesis presents a detailed examination of different Convolutional Neural Network (CNN) architectures that have helped computer vision researchers achieve state-of-the-art performance on classification, detection, segmentation and many other image analysis challenges. Since the advent of deep learning, CNNs have been used in almost all computer vision applications, which is why there is a pressing need to understand the fine details of these feature extractors and to work out the pros and cons of each one meticulously. To perform our experiments, we decided to explore an object detection task using a particular model architecture that strikes a balance between computational cost and accuracy. The model architecture we used is the LSTM-Decoder. The model was tested with different CNN feature extractors, and their pros and cons were identified in a variety of scenarios. The results we obtained on different datasets show that the CNN plays a major role in obtaining higher accuracy, and we also achieved accuracy comparable to the state of the art on the Pedestrian Detection Dataset. As an extension to object detection, we also implemented two different model architectures that find shoulder keypoints. The first idea is as follows: using the detected annotation from object detection, a small cropped image is generated and fed into a small cascade network trained to detect shoulder keypoints. The second strategy is to use the same object detection model and fine-tune its weights to predict shoulder keypoints. So far, we have generated results for shoulder keypoint detection only. However, this idea could be extended to full-body pose estimation by modifying the cascaded network for pose estimation, which has become an important topic for the future work of this thesis.
34

Zambezi, Samantha. "Predicting social unrest events in South Africa using LSTM neural networks". Master's thesis, Faculty of Science, 2021. http://hdl.handle.net/11427/33986.

Abstract:
This thesis demonstrates an approach to predicting the count of social unrest events in South Africa. A comparison is made between traditional forecasting approaches and neural networks; the traditional forecasting method selected is the Autoregressive Integrated Moving Average (ARIMA) model. The type of neural network implemented was the Long Short-Term Memory (LSTM) neural network. The basic theoretical concepts of ARIMA and LSTM neural networks are explained, and subsequently the patterns of the social unrest time series are analysed using exploratory time series techniques. The social unrest time series contained a significant number of irregular fluctuations with a non-linear trend. The structure of the social unrest time series suggested that traditional linear approaches would fail to model the non-linear behaviour of the time series. This thesis confirms this finding. Twelve experiments were conducted, in which features, scaling procedures and model configurations (i.e. univariate and multivariate models) were varied. The multivariate LSTM achieved the lowest forecast errors, and performance improved as more explanatory features were introduced. The ARIMA model's performance deteriorated with added complexity, and the univariate ARIMA produced lower forecast errors than the multivariate ARIMA. In conclusion, it can be claimed that multivariate LSTM neural networks are useful for predicting social unrest events.
35

Engström, Olof. "Deep Learning for Anomaly Detection in Microwave Links : Challenges and Impact on Weather Classification". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-276676.

Abstract:
Artificial intelligence is receiving a great deal of attention in various fields of science and engineering due to its promising applications. In today’s society, weather classification models with high accuracy are of utmost importance. An alternative to using conventional weather radars is to use measured attenuation data in microwave links as the input to deep learning-based weather classification models. Detecting anomalies in the measured attenuation data is of great importance as the output of a classification model cannot be trusted if the input to the classification model contains anomalies. Designing an accurate classification model poses some challenges due to the absence of predefined features to discriminate among the various weather conditions, and due to specific domain requirements in terms of execution time and detection sensitivity. In this thesis we investigate the relationship between anomalies in signal attenuation data, which is the input to a weather classification model, and the model’s misclassifications. To this end, we propose and evaluate two deep learning models based on long short-term memory networks (LSTM) and convolutional neural networks (CNN) for anomaly detection in a weather classification problem. We evaluate the feasibility and possible generalizations of the proposed methodology in an industrial case study at Ericsson AB, Sweden. The results show that both proposed methods can detect anomalies that correlate with misclassifications made by the weather classifier. Although the LSTM performed better than the CNN with regards to top performance on one link and average performance across all 5 tested links, the CNN performance is shown to be more consistent.
36

Verner, Alexander. "LSTM Networks for Detection and Classification of Anomalies in Raw Sensor Data". Diss., NSUWorks, 2019. https://nsuworks.nova.edu/gscis_etd/1074.

Abstract:
In order to ensure the validity of sensor data, it must be thoroughly analyzed for various types of anomalies. Traditional machine learning methods of anomaly detection in sensor data are based on domain-specific feature engineering. A typical approach is to use domain knowledge to analyze sensor data and manually create statistics-based features, which are then used to train the machine learning models to detect and classify the anomalies. Although this methodology is used in practice, it has a significant drawback: feature extraction is usually labor intensive and requires considerable effort from domain experts. An alternative approach is to use deep learning algorithms. Research has shown that modern deep neural networks are very effective in automated extraction of abstract features from raw data in classification tasks. Long short-term memory networks, or LSTMs for short, are a special kind of recurrent neural network capable of learning long-term dependencies. These networks have proved to be especially effective in the classification of raw time-series data in various domains. This dissertation systematically investigates the effectiveness of the LSTM model for anomaly detection and classification in raw time-series sensor data. As a proof of concept, this work used time-series data from sensors that measure blood glucose levels. A large number of time-series sequences was created based on a genuine medical diabetes dataset. Anomalous series were constructed by six methods that interspersed patterns of common anomaly types in the data. An LSTM network model was trained with k-fold cross-validation on both anomalous and valid series to classify raw time-series sequences into one of seven classes: non-anomalous, and classes corresponding to each of the six anomaly types.
As a control, the accuracy of detection and classification of the LSTM was compared to that of four traditional machine learning classifiers: support vector machines, Random Forests, naive Bayes, and shallow neural networks. The performance of all the classifiers was evaluated based on nine metrics: precision, recall, and the F1-score, each measured in micro, macro and weighted perspective. While the traditional models were trained on vectors of features, derived from the raw data, that were based on knowledge of common sources of anomaly, the LSTM was trained on raw time-series data. Experimental results indicate that the performance of the LSTM was comparable to the best traditional classifiers by achieving 99% accuracy in all 9 metrics. The model requires no labor-intensive feature engineering, and the fine-tuning of its architecture and hyper-parameters can be made in a fully automated way. This study, therefore, finds LSTM networks an effective solution to anomaly detection and classification in sensor data.
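The nine metrics named in this abstract are precision, recall and F1, each averaged in three ways. As a minimal sketch of how the three averaging schemes differ for F1, the pure-Python snippet below uses made-up labels; the function name `f1_scores` and the toy data are illustrative assumptions, not taken from the dissertation.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return micro-, macro- and weighted-averaged F1 for single-label multi-class data."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in classes}
    support = Counter(y_true)  # number of true samples per class
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(per_class.values()) / len(classes)
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return {"micro": micro, "macro": macro, "weighted": weighted}

# Illustrative labels only: 0 = non-anomalous, 1 and 2 = two anomaly types.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
scores = f1_scores(y_true, y_pred)
```

Micro averaging pools all decisions before computing F1, macro averaging treats every class equally, and weighted averaging scales each class by its support, so the three can diverge when classes are imbalanced.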
37

Chen, Yani. "Deep Learning based 3D Image Segmentation Methods and Applications". Ohio University / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1547066297047003.

38

Li, Xile. "Real-time Multi-face Tracking with Labels based on Convolutional Neural Networks". Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/36707.

Abstract:
This thesis presents a real-time multi-face tracking system that is able to track multiple faces in live video, broadcasts, real-time conference recordings, etc. The real-time output is one of its most significant advantages. Our proposed tracking system comprises three parts: face detection, feature extraction and tracking. We deploy a three-layer Convolutional Neural Network (CNN) to detect a face, a one-layer CNN to extract the features of a detected face, and a shallow network for face tracking based on the extracted feature maps of the face. The performance of our multi-face tracking system enables the tracker to run in real time without any on-line training. The algorithm does not need to change any parameters for different input video conditions, and the runtime cost is not affected significantly by an increase in the number of faces being tracked. In addition, our proposed tracker can overcome most of the generally difficult tracking conditions, which include video containing a camera cut, face occlusion, false positive face detection and false negative face detection, e.g. due to faces at the image boundary or faces shown in profile. We use two commonly used metrics to evaluate the performance of our multi-face tracking system, demonstrating that it achieves accurate results. Our multi-face tracker achieves an average runtime cost of around 0.035 s with GPU acceleration, and this runtime cost remains nearly stable even as the number of tracked faces increases. All evaluations and comparisons are carried out on four commonly used video data sets.
39

Pervej, Md Ferdous. "Edge Caching for Small Cell Networks". DigitalCommons@USU, 2019. https://digitalcommons.usu.edu/etd/7580.

Abstract:
The idea of storing content such as media files, music files and movie clips is simple, yet making it pay off is challenging. Benefits of pre-storing content include reduced delay when accessing or downloading content, reduced load on centralized servers and, of course, a higher data rate. However, several challenges need to be addressed to achieve these benefits. The most fundamental are limited storage capacity, storing the right content and minimizing costs. This thesis aims to address these challenges. First, a framework is presented for predicting which content should be placed in the limited storage. Then, the cost is minimized under several real-world scenarios, exploiting all possible collaborations among the local nodes to ensure high performance. The goal of this thesis is thus to provide a solution to the content storage problem that minimizes the network cost.
40

Xiang, Wenliang. "Anomaly detection by prediction for health monitoring of satellites using LSTM neural networks". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24695/.

Abstract:
Anomaly detection in satellites has not been well documented because satellite data is rarely available, yet it is becoming more and more important with the increasing popularity of satellite applications. Our work focuses on anomaly detection by prediction on a satellite dataset, comparing the performance of a recurrent neural network (RNN), Long Short-Term Memory (LSTM) and a conventional neural network (NN). We conclude that the LSTM with input length p=16, dimensionality n=32, output length q=2, 128 neurons and without maximum overlap is the best in terms of balanced accuracy, while the LSTM with p=128, n=32, q=16, 128 neurons and without maximum overlap outperforms most others with respect to the AUC metric. We also introduce an award function as a new performance metric that tries to capture not only the correctness of the decisions the NN makes but also its confidence in making them, and we propose two candidate award functions. Regrettably, they only partially meet our expectations, as they have a fatal defect that has been demonstrated from both practical and theoretical viewpoints.
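Detection by prediction generally means training a forecaster on sliding windows and flagging windows where the forecast error is large. The sketch below illustrates that general idea for a univariate series; `p` and `q` follow the abstract's input and output lengths, while the `stride` parameter, the mean-absolute-error rule, the threshold and both function names are assumptions made for illustration, since the abstract does not specify the thesis's windowing or decision rule.

```python
def make_windows(series, p, q, stride=1):
    """Split a sequence into (input, target) pairs: p past values -> q future values."""
    pairs = []
    last_start = len(series) - p - q
    for start in range(0, last_start + 1, stride):
        x = series[start:start + p]
        y = series[start + p:start + p + q]
        pairs.append((x, y))
    return pairs

def flag_anomalies(targets, predictions, threshold):
    """Flag a window when its mean absolute prediction error exceeds the threshold."""
    flags = []
    for y, y_hat in zip(targets, predictions):
        error = sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)
        flags.append(error > threshold)
    return flags

series = list(range(10))  # toy data: 0, 1, ..., 9
windows = make_windows(series, p=4, q=2, stride=2)

# Hypothetical forecasts from any predictor (an LSTM in the thesis).
flags = flag_anomalies([[4, 5], [6, 7]], [[4, 5], [9, 9]], threshold=1.0)
```

With `stride` equal to 1 consecutive windows overlap as much as possible; larger strides reduce the overlap at the cost of fewer training pairs.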
41

Díaz, González Fernando. "Federated Learning for Time Series Forecasting Using LSTM Networks: Exploiting Similarities Through Clustering". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-254665.

Abstract:
Federated learning poses a statistical challenge when training on highly heterogeneous sequence data. For example, time-series telecom data collected over long intervals regularly shows mixed fluctuations and patterns. These distinct distributions are an inconvenience when a node not only plans to contribute to the creation of the global model but also plans to apply it to its local dataset. In this scenario, adopting a one-size-fits-all approach might be inadequate, even when using state-of-the-art machine learning techniques for time series forecasting, such as Long Short-Term Memory (LSTM) networks, which have proven able to capture many idiosyncrasies and generalise to new patterns. In this work, we show that clustering the clients using these patterns and selectively aggregating their updates into different global models can improve local performance with minimal overhead, as we demonstrate through experiments using real-world time series datasets and a basic LSTM model.
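As a rough sketch of the selective aggregation idea, the snippet below averages client updates separately within each cluster, which yields one global model per cluster. It assumes the cluster assignments are already known and represents each update as a flat list of floats; the names `aggregate_by_cluster`, `updates` and `cluster_of` are hypothetical, and the thesis's actual clustering and weighting scheme may differ.

```python
from collections import defaultdict

def aggregate_by_cluster(updates, cluster_of):
    """Average client weight vectors per cluster (unweighted federated averaging)."""
    grouped = defaultdict(list)
    for client, weights in updates.items():
        grouped[cluster_of[client]].append(weights)
    models = {}
    for cluster, vectors in grouped.items():
        n = len(vectors)
        # zip(*vectors) walks the vectors coordinate by coordinate
        models[cluster] = [sum(coord) / n for coord in zip(*vectors)]
    return models

# Toy updates: clients "a" and "b" share a pattern, "c" behaves differently.
updates = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [10.0, 20.0]}
cluster_of = {"a": 0, "b": 0, "c": 1}
models = aggregate_by_cluster(updates, cluster_of)
```

Averaging all three clients into a single model would pull the result toward client "c"; keeping a separate model per cluster avoids that.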
42

Castelli, Filippo Maria. "3D CNN methods in biomedical image segmentation". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/18796/.

Abstract:
A definite trend in Biomedical Imaging is the integration of increasingly complex interpretative layers into the pure data acquisition process. One of the most interesting and anticipated goals in the field is the automatic segmentation of objects of interest in extensive acquisition data, a target that would allow Biomedical Imaging to move beyond its use as a purely assistive tool and become a cornerstone of ambitious large-scale challenges such as the extensive quantitative study of the Human Brain. In 2019, Convolutional Neural Networks represent the state of the art in Biomedical Image segmentation, and scientific interest from a variety of fields, ranging from automotive to natural resource exploration, converges on their development. While most applications of CNNs focus on single-image segmentation, biomedical image data (be it MRI, CT scans, microscopy, etc.) often benefits from a three-dimensional volumetric representation. This work explores a reformulation of the CNN segmentation problem that is native to the 3D nature of the data, with particular interest in applications to Fluorescence Microscopy volumetric data produced at the European Laboratories for Nonlinear Spectroscopy in the context of two large international human brain study projects: the Human Brain Project and the White House BRAIN Initiative.
43

Li, Edwin. "LSTM Neural Network Models for Market Movement Prediction". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-231627.

Abstract:
Interpreting time-varying phenomena is a key challenge in the capital markets. Time series analysis using autoregressive methods has been carried out over the last couple of decades, often with reassuring results. However, such methods sometimes fail to explain trends and cyclical fluctuations, which may be characterized by long-range dependencies or even dependencies between the input features. The purpose of this thesis is to investigate whether recurrent neural networks with LSTM cells can be used to capture these dependencies and ultimately serve as a complement for index trading decisions. Experiments are made on different setups of the S&P 500 stock index, and two distinct models are built, each one being an improvement of the previous model. The first model is a multivariate regression model, and the second is a multivariate binary classifier. The output of each model is used to reason about the future behavior of the index. The experiments show that, for the configuration provided, LSTM RNNs are unsuitable for predicting exact values of daily returns but give satisfactory results when used to predict the direction of the movement.
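The difference between the two prediction targets can be made concrete with a small sketch: the regression model would predict the daily return itself, while the binary classifier only predicts its sign. The function names and the toy prices below are illustrative assumptions, and treating a zero return as "not up" is a choice made here, not something stated in the abstract.

```python
def daily_returns(prices):
    """Simple returns between consecutive closing prices."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def direction_labels(returns):
    """Binary target: 1 if the index moved up, 0 otherwise (zero counts as not up)."""
    return [1 if r > 0 else 0 for r in returns]

prices = [100.0, 110.0, 99.0, 99.0]   # toy closing prices
returns = daily_returns(prices)        # regression target
labels = direction_labels(returns)     # classification target
```

Predicting the label is the easier task because it discards the magnitude of the move, which fits the abstract's finding that direction was predicted more successfully than exact returns.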
44

Mazhar, Osama. "Vision-based human gestures recognition for human-robot interaction". Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTS044.

Abstract:
In the light of factories of the future, to ensure productive, safe and effective interaction between robots and human coworkers, it is imperative that the robot extracts the essential information from the coworker. To address this, deep learning solutions are explored and a reliable human gesture detection framework is developed in this work. Our framework is able to robustly detect static hand gestures as well as upper-body dynamic gestures. For static hand gesture detection, openpose is integrated with Kinect V2 to obtain a pseudo-3D human skeleton. With the help of 10 volunteers, we recorded an image dataset, opensign, that contains Kinect V2 RGB and depth images of 10 alphanumeric static hand gestures taken from American Sign Language. The "Inception V3" neural network is adapted and trained to detect static hand gestures in real time. Subsequently, we extend our gesture detection framework to recognize upper-body dynamic gestures. A spatial-attention-based dynamic gesture detection strategy is proposed that employs a multi-modal "Convolutional Neural Network - Long Short-Term Memory" deep network to extract spatio-temporal dependencies in pure RGB video sequences. The convolutional neural network blocks are pre-trained on our static hand gesture dataset opensign, which allows efficient extraction of hand features. Our spatial attention module focuses on large-scale movements of the upper limbs and on hand images for subtle hand and finger movements, to efficiently distinguish gesture classes. This module additionally exploits the 2D upper-body pose to estimate the distance of the user from the sensor for scale normalization and to determine the parameters of the hand bounding boxes without the need for a depth sensor. The information typically extracted from a depth camera in similar strategies is learned from the opensign dataset. Thus the proposed gesture recognition strategy can be implemented on any system with a monocular camera. Afterwards, we briefly explore 3D human pose estimation strategies for monocular cameras. To estimate 3D human pose, a hybrid strategy is proposed that combines the merits of discriminative 2D pose estimators with those of model-based generative approaches. Our method optimizes an objective function that minimizes the discrepancy between the position- and scale-normalized 2D pose obtained from openpose and a virtual 2D projection of a kinematic human model. For real-time human-robot interaction, an asynchronous distributed system is developed to integrate our static hand gesture detector module with OpenPHRI, an open-source physical human-robot interaction library. We validate the performance of the proposed framework through a teach-by-demonstration experiment with a robotic manipulator.
45

Shaif, Ayad. "Predictive Maintenance in Smart Agriculture Using Machine Learning : A Novel Algorithm for Drift Fault Detection in Hydroponic Sensors". Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-42270.

Abstract:
The success of Internet of Things solutions has enabled new applications such as smart hydroponic agriculture. One typical problem in such an application is the rapid degradation of the deployed sensors. Traditionally, this problem is resolved by frequent manual maintenance, which is considered ineffective and may harm the crops in the long run. The main purpose of this thesis was to propose a machine learning approach for automating the detection of sensor drift faults. In addition, the solution's operability was investigated in a cloud computing environment in terms of response time. This thesis proposes a detection algorithm that uses RNNs to predict sensor drifts from time-series data streams. The detection algorithm, named Predictive Sliding Detection Window (PSDW), consists of both forecasting and classification models. Three different RNN algorithms, i.e., LSTM, CNN-LSTM and GRU, were designed to predict sensor drifts using forecasting and classification techniques. The algorithms were compared against each other in terms of relevant accuracy metrics for forecasting and classification. The operability of the solution was investigated by developing a web server that hosted the PSDW algorithm on an AWS computing instance. The resulting forecasting and classification algorithms were able to make reasonably accurate predictions for this particular scenario. More specifically, the forecasting algorithms achieved relatively low RMSE values of ~0.6, while the classification algorithms obtained an average F1-score and accuracy of ~80%, but with a high standard deviation. However, the response time was ~5700% slower during the simulation of the HTTP requests. The obtained results suggest the need for future investigations to improve the accuracy of the models and to experiment with other computing paradigms for more reliable deployments.
46

Martell, Patrick Keith. "Hierarchical Auto-Associative Polynomial Convolutional Neural Networks". University of Dayton / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1513164029518038.

47

Andréasson, David, and Blomquist Jesper Mortensen. "Forecasting the OMXS30 - a comparison between ARIMA and LSTM". Thesis, Uppsala universitet, Statistiska institutionen, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-413793.

Abstract:
Machine learning is a rapidly growing field with more and more applications being proposed every year, including but not limited to the financial sector. In this thesis, historical adjusted closing prices from the OMXS30 index are used to forecast the corresponding future values using two different approaches; one using an ARIMA model and the other using an LSTM neural network. The forecasts are made on three different time intervals: 90, 30 and 7 days ahead. The results showed that the LSTM model performs slightly better when forecasting 90 and 30 days ahead, whereas the ARIMA model has comparable accuracy on the seven day forecast.
48

Roxbo, Daniel. "A Detailed Analysis of Semantic Dependency Parsing with Deep Neural Networks". Thesis, Linköpings universitet, Interaktiva och kognitiva system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-156831.

Abstract:
The use of Long Short-Term Memory (LSTM) networks continues to yield better results in natural language processing tasks. One area that has recently seen significant improvements is semantic dependency parsing, where the current state-of-the-art model uses a multilayer LSTM combined with an attention-based scoring function to predict the dependencies. In this thesis the state-of-the-art model is first replicated and then extended to include features based on syntactic trees, which were found to be useful in a similar model. In addition, the effect of part-of-speech tags is studied. The replicated model achieves a labeled F1 score of 93.6 on the in-domain data and 89.2 on the out-of-domain data on the DM dataset, which shows that the model is indeed replicable. Using multiple features extracted from syntactic gold-standard trees of the DELPH-IN Derivation Tree (DT) type increased the labeled scores to 97.1 and 94.1 respectively, while the use of predicted trees of the Stanford Basic (SB) type did not improve the results at all. The usefulness of part-of-speech tags was found to be diminished in the presence of other features.
49

El, Ahmar Wassim. "Head and Shoulder Detection using CNN and RGBD Data". Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39448.

Abstract:
Alex Krizhevsky and his colleagues changed the world of machine vision and image processing in 2012 when their deep learning model, named AlexNet, won the ImageNet Large Scale Visual Recognition Challenge with an error rate more than 10.8% lower than that of their closest competitor. Ever since, deep learning approaches have been an area of extensive research for the tasks of object detection, classification, pose estimation, etc. This thesis presents a comprehensive analysis of different deep learning models and architectures that have delivered state-of-the-art performance in various machine vision tasks. These models are compared to each other and their strengths and weaknesses are highlighted. We introduce a new approach for human head and shoulder detection from RGB-D data based on a combination of image processing and deep learning approaches. Candidate head-top locations (CHL) are generated by a fast and accurate image processing algorithm that operates on depth data. We propose enhancements to the CHL algorithm that make it three times faster. Different deep learning models are then evaluated for the tasks of classification and detection on the candidate head-top locations, to regress the head bounding boxes and detect shoulder keypoints. We propose three different small models based on convolutional neural networks for this problem. Experimental results for different architectures of our model are highlighted. We also compare the performance of our model to MobileNet. Finally, we show the differences between three types of input to CNN models: RGB images, a 3-channel representation generated from depth data (Depth map, Multi-order depth template, and Height difference map, or DMH), and a 4-channel input composed of RGB+D data.
50

Ahlin, Björn, and Marcus Gärdin. "Automated Classification of Steel Samples : An investigation using Convolutional Neural Networks". Thesis, KTH, Materialvetenskap, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-209669.

Abstract:
Automated image recognition software has earlier been used for various analyses in the steel making industry. In this study, the possibility to apply such software to classify Scanning Electron Microscope (SEM) images of two steel samples was investigated. The two steel samples were of the same steel grade but with the difference that they had been treated with calcium for a different length of time.  To enable automated image recognition, a Convolutional Neural Network (CNN) was built. The construction of the software was performed with open source code provided by Keras Documentation, thus ensuring an easily reproducible program. The network was trained, validated and tested, first for non-binarized images and then with binarized images. Binarized images were used to ensure that the network's prediction only considers the inclusion information and not the substrate. The non-binarized images gave a classification accuracy of 99.99 %. For the binarized images, the classification accuracy obtained was 67.9%.  The results show that it is possible to classify steel samples using CNNs. One interesting aspect of the success in classifying steel samples is that further studies on CNNs could enable automated classification of inclusions.
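Binarization of the kind mentioned in this abstract can be sketched as a simple intensity threshold that keeps only a two-level mask and discards substrate texture. This is an illustration under stated assumptions: the threshold value and the function name `binarize` are hypothetical, the study's actual preprocessing is not described in the abstract, and which side of the threshold corresponds to inclusions depends on the imaging setup.

```python
def binarize(image, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1: 1 where pixel >= threshold."""
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in image]

# Toy 2x3 grayscale image.
image = [
    [12, 200, 130],
    [127, 128, 255],
]
mask = binarize(image)
```

Because every pixel collapses to one of two values, a classifier trained on such masks can rely only on the shape and distribution of the thresholded regions, which is consistent with the lower accuracy the abstract reports for binarized images.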