Dissertations / Theses: 'Convolutional recurrent neural networks'

1

Ayoub, Issa. "Multimodal Affective Computing Using Temporal Convolutional Neural Network and Deep Convolutional Neural Networks." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39337.

Abstract:

Affective computing has gained significant attention from researchers in the last decade because of the wide variety of applications that can benefit from this technology. Researchers often describe affect using emotional dimensions such as arousal and valence: valence spans the spectrum from negative to positive emotions, while arousal measures the level of excitement. Describing emotions along continuous dimensions (e.g., valence and arousal) allows us to encode subtle and complex affects, as opposed to discrete categories such as the basic six emotions: happiness, anger, fear, disgust, sadness, and neutral. Recognizing spontaneous and subtle emotions remains a challenging problem for computers. In our work, we use two modalities of information, video and audio, and extract visual and audio features with deep neural network models. Because emotions are time-dependent, we apply the Temporal Convolutional Neural Network (TCN) to model variations in emotion. We also investigate an alternative model that combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN). Because this deep model does not fit into main memory, we divide the RNN into smaller segments and propose a scheme to back-propagate gradients across all segments. We tune the hyperparameters of all models using Gaussian processes to obtain a fair comparison between them. Our experimental results show that TCN outperforms all RNN-based models in recognizing the arousal and valence dimensions, yielding a concordance correlation coefficient of 0.7895 (vs. 0.7544) on valence and 0.8207 (vs. 0.7357) on arousal on the validation set of the SEWA dataset. We therefore propose adopting TCN as a baseline method for emotion detection problems in future work.
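The concordance correlation coefficient (CCC) reported above rewards predictions that are both correlated with the labels and agree with them in mean and scale: CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The following is a minimal plain-Python sketch of that standard formula, assuming population (divide-by-n) moments; the function name `ccc` is hypothetical and this is not the evaluation code used in the thesis.

```python
from statistics import fmean

def ccc(x, y):
    """Concordance correlation coefficient between predictions x and labels y."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    mx, my = fmean(x), fmean(y)
    # Population (divide-by-n) variances and covariance
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    # Denominator adds the squared mean difference, so a constant bias lowers the score
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

For example, `ccc([1, 2, 3], [2, 3, 4])` returns 4/7 (about 0.571): the two series are perfectly correlated, but the constant offset of 1 is penalized, which Pearson correlation would not do.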

2

Silfa, Franyell. "Energy-efficient architectures for recurrent neural networks." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671448.

Abstract:

Deep learning algorithms have been remarkably successful in applications such as automatic speech recognition and machine translation, and these applications are now ubiquitous in our lives, running on a plethora of devices. These algorithms are built on deep neural networks (DNNs), such as convolutional neural networks and recurrent neural networks (RNNs), which have a large number of parameters and require a large number of computations. Hence, evaluating DNNs is challenging because of their large memory and power requirements. RNNs are used to solve sequence-to-sequence problems such as machine translation. Because their time-steps have data dependencies, the available parallelism is severely limited, which makes evaluating RNNs in an energy-efficient manner more challenging than evaluating other DNNs. This thesis studies RNN applications in order to improve their energy efficiency on specialized architectures. Specifically, we propose novel energy-saving techniques and highly efficient architectures tailored to RNN evaluation, focusing on the most successful RNN topologies: Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). First, we characterize a set of RNNs running on a modern SoC and find that accessing memory to fetch the model weights is the main source of energy consumption, accounting for up to 80% of it. We therefore propose E-PUR, an energy-efficient processing unit for RNN inference. E-PUR achieves a 6.8x speedup and reduces energy consumption by 88x compared to the SoC. These benefits come from improving the temporal locality of the model weights. In E-PUR, fetching the parameters is still the main source of energy consumption, so we strive to reduce memory accesses and propose a scheme to reuse previous computations.
Our observation is that, when an RNN model evaluates its input sequences, the output of a given neuron tends to change only slightly between consecutive evaluations. We therefore develop a scheme that caches neuron outputs and reuses them whenever it detects that the change between the current and the previously computed output of a neuron is small, thereby avoiding fetching the weights. To decide when to reuse a previous value, we employ a binary neural network (BNN) as a reusability predictor; this low-cost BNN is suitable because its output is highly correlated with the output of the RNN. Our proposal avoids more than 24.2% of the computations, reducing energy consumption by 18.5% on average with a 1.35x speedup. The memory footprint of RNN models is usually reduced by using low precision for evaluation and storage. In this case, the minimum precision is identified offline and set so that the model maintains its accuracy, and the same precision is used to compute all time-steps. Yet we observe that some time-steps can be evaluated at a lower precision while preserving accuracy. Thus, we propose a technique that dynamically selects the precision used to compute each time-step. A challenge of this proposal is choosing a lower bit-width, which we address by recognizing that information from a previous evaluation can be used to determine the precision required at the current time-step. Our scheme evaluates 57% of the computations at a bit-width lower than the fixed precision used by static methods. Implemented on E-PUR, it provides a 1.46x speedup and 19.2% energy savings on average.
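The reuse idea can be illustrated with a single neuron: a cheap binarized predictor (binary weights and binary inputs, standing in for the BNN) is evaluated on every input, and the full-precision output is recomputed only when the predictor's value moves by more than a threshold from the value recorded at the last full computation. This is a minimal sketch under simplifying assumptions (one neuron, a tanh activation, a sign-based predictor, and the hypothetical class name `ReuseNeuron`); it is not E-PUR's actual hardware scheme.

```python
import math

def _sign(v):
    return 1.0 if v >= 0 else -1.0

class ReuseNeuron:
    """Hypothetical neuron that reuses its cached output when a cheap binarized predictor barely changes."""

    def __init__(self, weights, threshold):
        self.weights = list(weights)
        self.binary_weights = [_sign(w) for w in weights]  # binarized copy used by the cheap predictor
        self.threshold = threshold
        self.cached_output = None
        self.cached_prediction = None
        self.computed = 0  # full evaluations (weights fetched)
        self.reused = 0    # evaluations skipped by reuse

    def __call__(self, x):
        # Cheap predictor: dot product of binarized weights and binarized inputs
        prediction = sum(bw * _sign(xi) for bw, xi in zip(self.binary_weights, x))
        if (self.cached_output is not None
                and abs(prediction - self.cached_prediction) <= self.threshold):
            self.reused += 1
            return self.cached_output
        # Full-precision evaluation, which in hardware requires fetching the weights
        output = math.tanh(sum(w * xi for w, xi in zip(self.weights, x)))
        self.cached_output, self.cached_prediction = output, prediction
        self.computed += 1
        return output
```

Comparing against the prediction stored at the last full computation, rather than at the previous time-step, keeps small changes from accumulating unnoticed over many reused steps; the `computed` and `reused` counters show how many weight fetches were avoided.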

3

Oyharcabal, Astorga Nicolás. "Convolutional recurrent neural networks for remaining useful life prediction in mechanical systems." Tesis, Universidad de Chile, 2018. http://repositorio.uchile.cl/handle/2250/168514.

Abstract:

Thesis submitted for the degree of Ingeniero Civil Mecánico (Mechanical Engineering).
Determining the remaining useful life (RUL) of a machine, piece of equipment, device, or mechanical component has been the subject of work in recent years and is crucial for the future of any industry that requires it. Continuous monitoring of machines combined with good RUL prediction makes it possible to minimize maintenance costs and reduce exposure to failures. However, monitoring data are varied, noisy, and sequential, and do not always bear a strict relationship to the RUL, which makes its estimation a difficult problem. For this reason, various classes of neural networks are currently used. In particular, sequential problems are modeled with recurrent neural networks (RNNs) such as LSTM (Long Short-Term Memory) or JANET (Just Another NETwork), because of their ability to autonomously identify patterns in time sequences. Alongside these networks, alternatives that use convolution as the operation in each RNN cell are also used; these are known as ConvRNNs (Convolutional Recurrent Neural Networks). ConvRNNs outperform their convolutional and recurrent counterparts in certain cases that require processing image sequences and, in the particular case of this work, time series of monitoring data that are smoothed by the convolution and processed by the recurrence. The general objective of this work is to determine the best ConvRNN option for estimating the RUL of a turbofan from time series in the C-MAPSS database.
This work also studies how to edit the database to improve the accuracy of a ConvRNN, and the application of convolution as a primary operation on a time series whose parameters describe the behavior of a turbofan. To this end, a Convolutional LSTM, a Convolutional Encoder-Decoder LSTM, a Convolutional JANET, and a Convolutional Encoder-Decoder JANET are implemented. The Convolutional Encoder-Decoder JANET model gives the best results in terms of average accuracy and number of required parameters (fewer is better, since less memory is needed), and it is also able to handle all of the C-MAPSS datasets. It is also found that the RUL labels of the database can be modified for data preceding failure. The networks were programmed and run on the computers of the Laboratorio de Integración de Confiabilidad y Mantenimiento Inteligente (ICMI) of the Department of Mechanical Engineering at the Universidad de Chile.
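Because the abstract states only that the RUL labels can be modified for data preceding failure, the sketch below assumes one widely used modification for C-MAPSS: a piecewise-linear target that clips the RUL at a constant ceiling, so that early, healthy cycles share the same label. The ceiling of 125 cycles and the function name `piecewise_rul` are illustrative assumptions, not values taken from the thesis.

```python
def piecewise_rul(num_cycles, cap=125):
    """Piecewise-linear RUL labels for one engine that fails after num_cycles cycles.

    Cycle c (1-based) has true RUL num_cycles - c; labels are clipped at `cap`,
    so early, healthy cycles share the same constant target.
    """
    if num_cycles < 1:
        raise ValueError("num_cycles must be positive")
    return [min(num_cycles - cycle, cap) for cycle in range(1, num_cycles + 1)]
```

For instance, `piecewise_rul(5, cap=3)` yields `[3, 3, 2, 1, 0]`. The usual motivation for clipping is that the monitored signals show little degradation early in an engine's life, so asking the network to tell those cycles apart adds noise rather than information.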

4

Ljubenkov, Davor. "Optimizing Bike Sharing System Flows using Graph Mining, Convolutional and Recurrent Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-257783.

Abstract:

A bicycle-sharing system (BSS) is a popular service deployed in cities of various sizes around the world. Although docked systems are the most popular model, they still have a number of weaknesses that could be addressed by investigating the properties of bike-sharing networks and the evolution of the observed patterns. The main problem is keeping the system as balanced as possible, so predicting or minimizing the manual transportation of bikes across the city is the prime objective for saving logistics costs for operating companies. The purpose of this thesis is two-fold. First, it visualizes bike flows using data exploration methods and statistical analysis to better understand mobility characteristics with respect to distance, duration, time of day, spatial distribution, weather, and other attributes. Second, the flow visualizations make it possible to focus on specific directed sub-graphs containing only those pairs of stations whose mutual flow difference is the most asymmetric, and to apply graph mining and machine learning techniques to these unbalanced stations. Spatial structures and their changes are captured by a convolutional neural network (CNN) that takes adjacency-matrix snapshots of the unbalanced sub-graphs. The structure generated by this step is then fed to a long short-term memory (LSTM) recurrent neural network to find and predict its dynamic patterns. As a result, we predict bike flows for each node in the possible future sub-graph configuration, which lets bicycle-sharing operators plan in advance. This combination of methods tells them which areas they should focus on and how many bike relocation phases to expect.
Methods are evaluated using cross-validation (CV) and the root mean square error (RMSE) and mean absolute error (MAE) metrics. Benefits are identified both for urban planning and for bike-sharing companies, which can save time and minimize costs.
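Selecting the most asymmetric station pairs can be sketched as follows: given a directed trip-count matrix, compute the net flow |F[i][j] − F[j][i]| for every unordered pair of stations, orient each pair in the direction of its net flow, and keep the top k. This is an illustrative sketch; the function name `most_asymmetric_pairs`, the list-of-lists input, and the top-k selection rule are assumptions, since the abstract does not say how "most asymmetric" is thresholded.

```python
def most_asymmetric_pairs(flows, k):
    """Return the k station pairs with the largest net flow as (source, destination, net_flow).

    flows[i][j] is the number of trips from station i to station j.
    """
    n = len(flows)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            diff = flows[i][j] - flows[j][i]
            # Orient the edge in the direction of net flow (bikes accumulate at dst)
            src, dst = (i, j) if diff >= 0 else (j, i)
            pairs.append((abs(diff), src, dst))
    pairs.sort(reverse=True)  # largest imbalance first
    return [(src, dst, net) for net, src, dst in pairs[:k]]
```

With `flows = [[0, 10, 2], [1, 0, 5], [2, 6, 0]]` and `k = 2`, the result is `[(0, 1, 9), (2, 1, 1)]`. The selected (source, destination) pairs define the directed sub-graph whose adjacency-matrix snapshots would then be passed to the CNN.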

5

Tan, Ke. "Convolutional and recurrent neural networks for real-time speech separation in the complex domain." The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1626983471600193.

6

Daliparthi, Venkata Satya Sai Ajay. "Semantic Segmentation of Urban Scene Images Using Recurrent Neural Networks." Thesis, Blekinge Tekniska Högskola, Institutionen för datavetenskap, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20651.

Abstract:

Background: In autonomous vehicles, the onboard computer receives pixel-wise data from RGB cameras, point-wise depth information from the cameras, and other sensor data as input, processes it, and produces the desired outputs, such as steering angle, torque, and braking. To make accurate decisions, this computer must be fully aware of the vehicle's surroundings and understand every pixel of the driving scene. Semantic segmentation is the task of assigning a class label (such as car, road, pedestrian, or sky) to each pixel of an image, so a better-performing semantic segmentation algorithm contributes to the advancement of autonomous driving. Research Gap: Traditional methods, such as handcrafted features and feature extraction methods, were originally the main approach to semantic segmentation. Since the rise of deep learning, most work has used deep learning instead, with the Convolutional Neural Network (CNN) as the most common architecture. Although some works have used Recurrent Neural Networks (RNNs), their effect on semantic segmentation has not yet been thoroughly studied. Our study addresses this research gap. Idea: After reviewing the existing literature, we came up with the idea of “using RNNs as an add-on module to augment the skip connections in semantic segmentation networks through residual connections.” Objectives and Method: The main objective of our work is to improve the performance of semantic segmentation networks using RNNs. We chose experimentation as the research methodology. We propose three novel architectures, UR-Net, UAR-Net, and DLR-Net, by applying our idea to the existing networks U-Net, Attention U-Net, and DeepLabV3+, respectively.
Results and Findings: We show empirically that the proposed architectures improve the segmentation of edges and boundaries. We also find a trade-off between using RNNs and the model's inference time: improving semantic segmentation networks with RNNs costs some extra seconds of inference. Conclusion: Our findings will therefore not benefit autonomous driving, where better performance is needed in real time, but they will contribute to biomedical image segmentation, where doctors can accept a few extra seconds of inference in exchange for better performance.
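The add-on idea (an RNN that augments a skip connection through a residual connection) can be sketched on a toy single-channel feature map: a simple tanh RNN scans each row left to right, and its hidden state at every position is added back to the original skip features. The scalar weights, single scan direction, single channel, and function names are simplifying assumptions for illustration; the actual UR-Net, UAR-Net, and DLR-Net modules are multi-channel learned layers whose exact design is not given in the abstract.

```python
import math

def row_rnn(row, w_in, w_rec):
    """Simple tanh RNN scanned left to right along one row of a feature map."""
    h, states = 0.0, []
    for x in row:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def rnn_augmented_skip(feature_map, w_in=0.5, w_rec=0.5):
    """Residual add-on: skip features plus the RNN's output at every position."""
    return [[x + h for x, h in zip(row, row_rnn(row, w_in, w_rec))]
            for row in feature_map]
```

Because the RNN output is added rather than substituted, the decoder still receives the original skip features; the recurrent term only contributes context propagated along each row, at the cost of a sequential scan that cannot be parallelized across positions, which is consistent with the extra inference time the thesis reports.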

7

Hanson, Jack. "Protein Structure Prediction by Recurrent and Convolutional Deep Neural Network Architectures." Thesis, Griffith University, 2018. http://hdl.handle.net/10072/382722.

Abstract:

In this thesis, the application of convolutional and recurrent machine learning techniques to several key structural properties of proteins is explored. Chapter 2 presents the first application of an LSTM-BRNN in structural bioinformatics. The method, called SPOT-Disorder, predicts the per-residue probability of a protein being intrinsically disordered (i.e., unstructured or flexible). Using this methodology, SPOT-Disorder achieved the highest accuracy in the literature without separating short and long disordered regions during training, as was required in previous models, and was additionally shown to be capable of indirectly discerning functional sites located in disordered regions. Chapter 3 extends the application of an LSTM-BRNN to a two-dimensional problem: the prediction of protein contact maps. A protein contact map records, for every pair of residues in a sequence, whether their distance falls within a distance cutoff, providing key restraints on the possible conformations of a protein. This work, entitled SPOT-Contact, introduced the coupling of two-dimensional LSTM-BRNNs with ResNets to maximise dependency propagation, achieving the highest reported contact map precision. Several models of varying architectures were trained and combined into an ensemble predictor in order to minimise incorrect generalisations. Chapter 4 discusses the use of an ensemble of LSTM-BRNNs and ResNets to predict one-dimensional local structural properties of proteins. The method, called SPOT-1D, predicts a wide range of local structural descriptors, including several solvent exposure metrics, secondary structure, and real-valued backbone angles. SPOT-1D was significantly improved by including the outputs of SPOT-Contact in its input features. This topology led to the best reported accuracy metrics for all predicted properties.
The protein structures built from the backbone angles predicted by SPOT-1D achieved the lowest average error from their native structures in the literature. Chapter 5 presents an update of SPOT-Disorder that uses the inputs from SPOT-1D together with an ensemble of LSTM-BRNNs and Inception Residual Squeeze-and-Excitation networks to predict protein intrinsic disorder. This model confirmed the improvement provided by the coupled architectures over an LSTM-BRNN alone, while also introducing a new convolutional format to the bioinformatics field. The work in Chapter 6 uses the same topology as SPOT-1D for single-sequence prediction of protein intrinsic disorder in SPOT-Disorder-Single. Single-sequence prediction is the prediction of a protein's properties without the use of evolutionary information. While evolutionary information generally improves the performance of a computational model, it comes at the cost of a greatly increased computational and time load. Removing it from the model enables genome-scale protein analysis at a minor drop in accuracy. However, models trained without evolutionary profiles can be more accurate for proteins with limited, and therefore unreliable, evolutionary information.
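A contact map can be derived from a known structure as a binary matrix, which is the kind of target SPOT-Contact learns to predict. The sketch below assumes one 3D coordinate per residue in ångströms (commonly the Cβ atom, or Cα for glycine) and an 8 Å cutoff, a widely used convention; the abstract does not state the cutoff the thesis uses, and the function name and `min_separation` parameter are hypothetical.

```python
import math

def contact_map(coords, cutoff=8.0, min_separation=1):
    """Binary contact map from per-residue 3D coordinates (assumed to be in angstroms).

    Residues i and j are in contact when their distance is below `cutoff` and
    they are at least `min_separation` positions apart in the sequence.
    """
    n = len(coords)
    return [
        [1 if abs(i - j) >= min_separation and math.dist(coords[i], coords[j]) < cutoff else 0
         for j in range(n)]
        for i in range(n)
    ]
```

With the default `min_separation=1`, only the diagonal is excluded; larger values are sometimes used to ignore residues that are trivially close because they are neighbours in the sequence. The resulting matrix is symmetric, which is why two-dimensional models can treat it like an image.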
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Eng & Built Env
Science, Environment, Engineering and Technology

8

Holm, Noah, and Emil Plynning. "Spatio-temporal prediction of residential burglaries using convolutional LSTM neural networks." Thesis, KTH, Geoinformatik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229952.

Abstract:

The low clearance rate for residential burglaries calls for new and innovative methods of preventing and investigating these crimes. There were 22,600 reported residential burglaries in Sweden in 2017, but only four to five percent of these will ever be solved. There are many initiatives in Sweden and abroad to reduce the number of residential burglaries, and one approach being tested is the use of prediction methods to make preventive actions more efficient. This thesis investigates a potential prediction method that uses neural networks to identify areas with a higher risk of burglary on a daily basis. The model uses reported burglaries to learn patterns in both space and time. The rationale for the existence of such patterns is based on near-repeat theory in criminology, which states that after a burglary, both the burgled victim and the surrounding area have an increased risk of additional burglaries. The work was conducted in cooperation with the Swedish Police Authority. The machine learning is implemented with convolutional long short-term memory (LSTM) neural networks with three-dimensional max pooling, trained on ten years of residential burglary data (2007-2016) from a study area in Stockholm, Sweden. The model's accuracy is measured by making daily burglary predictions for 2017. It classifies each cell of a 36x36 grid of 600 m × 600 m cells as an elevated-risk area or not. By classifying 4% of all grid cells during the year as risk areas, 43% of all burglaries are correctly predicted. The model's performance could potentially be improved by further tuning the neural network's parameters and by using more data on factors correlated with burglaries, such as weather; further work in these areas could therefore increase accuracy.
The conclusion is that neural networks, or machine learning in general, could be a powerful and innovative tool for the Swedish Police Authority to predict, and thereby prevent, certain crimes. This thesis serves as a first prototype of how such a system could be implemented and used.
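The headline figure (43% of burglaries captured when 4% of cells are flagged) is a hit rate: rank the grid cells by predicted risk, flag the top fraction, and count the share of burglaries that occur in flagged cells. The sketch below scores a single day under that reading; the dictionary-based inputs, the rounding of the flagged-cell count, and the function name are assumptions, and the thesis aggregates such daily predictions over the whole of 2017.

```python
def hit_rate(risk, burglary_cells, fraction):
    """Share of burglaries that fall in the top `fraction` of cells ranked by predicted risk.

    risk:           dict mapping cell id -> predicted risk score
    burglary_cells: list of cell ids where burglaries occurred (repeats allowed)
    """
    if not burglary_cells:
        raise ValueError("need at least one burglary to score")
    k = max(1, round(fraction * len(risk)))
    flagged = set(sorted(risk, key=risk.get, reverse=True)[:k])
    hits = sum(1 for cell in burglary_cells if cell in flagged)
    return hits / len(burglary_cells)
```

On the 36x36 grid, flagging 4% of the 1,296 cells corresponds to about 52 cells per day. Reporting the hit rate together with the flagged fraction matters, because flagging every cell would trivially capture every burglary.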

9

Fu, Xinyu. "Context-aware sentence categorisation : word mover's distance and character-level convolutional recurrent neural network." Thesis, University of Nottingham, 2018. http://eprints.nottingham.ac.uk/52054/.

Abstract:

Supervised k-nearest-neighbour and unsupervised hierarchical agglomerative clustering algorithms can be enhanced with a sentence distance metric based on word mover's distance to offer superior context-aware sentence categorisation. A neural network classifier that combines a recurrent unit with a sophisticated convolutional layer achieves competitive results on the benchmark streams. The continually increasing number of text snippets produced each year necessitates ever-improving information processing methods for searching, retrieving, and organising text. Central to these methods are sentence classification and clustering, which have become important applications of natural language processing and information retrieval. This work proposes three novel sentence categorisation frameworks: hierarchical agglomerative clustering-word mover's distance, k nearest neighbour-word mover's distance, and a convolutional recurrent neural network. Hierarchical agglomerative clustering-word mover's distance uses a word mover's distance distortion function to cluster unlabelled sentences around nearby centroids. K nearest neighbour-word mover's distance classifies test text snippets using word mover's distance-based sentence similarity. Both models are count-based, since they use term frequency statistics to build the vector space matrix. Experimental evaluation on two unsupervised learning datasets shows that hierarchical agglomerative clustering-word mover's distance outperforms its competitors on mean squared error, completeness score, homogeneity score, and V-measure. For k nearest neighbour-word mover's distance, experiments on two benchmark text streams verify its superior classification performance against comparison algorithms on precision, recall, and F1 score.
The performance comparison is statistically validated with the Mann-Whitney U test, and through extensive experiments and analysis, each research hypothesis is confirmed. Unlike a traditional single neural network, the convolutional recurrent neural network model combines a character-level convolutional network with a character-aware recurrent neural network in one framework. The proposed model benefits from the character-aware convolutional neural network in that only salient features are selected and fed into the integrated character-aware recurrent neural network, which effectively learns long-sequence semantics through its update mechanism. The experiments in this thesis compare the convolutional recurrent neural network framework against state-of-the-art text classification algorithms on four popular benchmark corpora. The work also analyses the impact of three different recurrent hidden cells on performance and runtime efficiency, observing that the minimal gated unit achieves the best runtime with performance comparable to the gated recurrent unit and long short-term memory. For the term frequency-inverse document frequency-based algorithms, the experiments examine word2vec, global vectors for word representation (GloVe), and sent2vec embeddings and report their performance differences. The performance comparison is statistically validated with the Mann-Whitney U test, and the corresponding hypotheses are confirmed by the reported statistical analysis.
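The minimal gated unit (MGU) owes its speed to having a single gate. In its commonly published form, a forget gate f_t = σ(W_f x_t + U_f h_{t−1} + b_f) both resets the previous state inside the candidate h̃_t = tanh(W_h x_t + U_h (f_t ⊙ h_{t−1}) + b_h) and interpolates the update h_t = (1 − f_t) ⊙ h_{t−1} + f_t ⊙ h̃_t. The sketch below implements these equations for one scalar hidden unit to keep it dependency-free; the parameter-dictionary keys and function names are hypothetical, and the thesis's character-aware cells are vector-valued and trained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mgu_step(x, h_prev, p):
    """One MGU step for a scalar input x and scalar state h_prev."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])                   # single (forget) gate
    h_tilde = math.tanh(p["wh"] * x + p["uh"] * (f * h_prev) + p["bh"])     # gated candidate state
    return (1.0 - f) * h_prev + f * h_tilde                                  # interpolated update

def mgu_run(xs, p, h0=0.0):
    """Run the cell over a sequence and return the final hidden state."""
    h = h0
    for x in xs:
        h = mgu_step(x, h, p)
    return h
```

With all parameters set to zero, the gate stays at 0.5 and the candidate at 0, so the state halves at each step: `mgu_run([0.0, 0.0], zeros, h0=1.0)` gives 0.25. Compared with a GRU, which has separate reset and update gates, MGU needs one fewer set of weight matrices per cell.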

10

Kvedaraite, Indre. "Sentiment Analysis of YouTube Public Videos based on their Comments." Thesis, Linnéuniversitetet, Institutionen för datavetenskap och medieteknik (DM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-105754.

Abstract:

With the rise of social media and publicly available data, opinion mining is more accessible than ever. It is valuable for content creators, companies, and advertisers to gain insight into what users think and feel. This work examines comments on YouTube videos and builds a deep learning classifier to determine their sentiment automatically. Four Long Short-Term Memory (LSTM)-based models are trained and evaluated, and experiments determine which model achieves the best accuracy, recall, precision, F1 score, and ROC curve on a labelled YouTube comment dataset. The results indicate that a BiLSTM-based model has the best overall performance, with an accuracy of 89%. Furthermore, the four LSTM-based models are evaluated on an IMDB movie review dataset, achieving an average accuracy of 87%, which shows that the models can predict the sentiment of different kinds of text. Finally, a statistical analysis of the YouTube videos reveals that videos with positive sentiment have a significantly higher number of upvotes and views. However, videos with negative sentiment do not have a significantly higher number of downvotes.
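Accuracy, precision, recall, and F1 score all derive from the counts of true/false positives and negatives. The sketch below computes them for a binary sentiment task, assuming label 1 means positive; the function name is hypothetical, and a multi-class setup would need per-class counts and an averaging scheme (macro or weighted), which the abstract does not specify.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Reporting precision and recall alongside accuracy matters when one sentiment class dominates the comments, since a classifier that always predicts the majority class can still score a high accuracy.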

11

Guan, Xiao. "Deterministic and Flexible Parallel Latent Feature Models Learning Framework for Probabilistic Knowledge Graph." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-35788.

Abstract:

Knowledge graphs are a rising topic in the field of Artificial Intelligence. As the current trend in knowledge representation, knowledge graph research makes use of the large knowledge bases freely available on the internet. Knowledge graphs also allow inspection, analysis and reasoning over real-world knowledge. To realize the ambitious idea of modeling the knowledge of the world, different theories and implementations have emerged. Nowadays, we have the opportunity to use freely available information from Wikipedia and Wikidata. This thesis investigates and formulates a theory of learning from knowledge graphs, specifically probabilistic knowledge graphs. It focuses only on one branch of learning probabilistic knowledge graphs, called latent feature models. These models aim to predict possible relationships between connected entities and relations, and many models exist for this task. The thesis describes the metrics and the training process in detail and improves them. Their efficiency and correctness allow more complex models to be built with confidence. The thesis also covers possible problems in the findings and proposes future work.
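Latent feature models score a candidate triple (head, relation, tail) from learned vectors. As one well-known example, chosen here for illustration because the abstract does not name the thesis's models, DistMult scores a triple by a trilinear product of the three vectors and squashes the result with a sigmoid. The embeddings are hypothetical.

```python
import math


def distmult_score(head, relation, tail):
    """DistMult plausibility: sum over k of head[k] * relation[k] * tail[k]."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def triple_probability(head, relation, tail):
    """Squash the score into (0, 1) so it can be read as P(triple is true)."""
    return 1.0 / (1.0 + math.exp(-distmult_score(head, relation, tail)))


# Hypothetical 3-dimensional latent features; real models learn these from data.
entity = {
    "Stockholm": [1.0, 0.5, -0.2],
    "Oslo": [-0.9, 0.3, 0.0],
    "Sweden": [0.8, 0.4, 0.1],
}
relation = {"capital_of": [1.0, 1.0, 0.5]}

p_true = triple_probability(entity["Stockholm"], relation["capital_of"], entity["Sweden"])
p_false = triple_probability(entity["Oslo"], relation["capital_of"], entity["Sweden"])
```

Because the product is symmetric in head and tail, DistMult gives `(Sweden, capital_of, Stockholm)` the same score; models such as RESCAL or ComplEx relax this restriction.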

12

Wang, Xutao. "Chinese Text Classification Based On Deep Learning." Thesis, Mittuniversitetet, Avdelningen för informationssystem och -teknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-35322.

Abstract:

Text classification has long been a concern in natural language processing, especially now that data volumes are growing massively with the development of the internet. The recurrent neural network (RNN) is one of the most popular methods for natural language processing because its recurrent architecture gives it the ability to process sequential information. Meanwhile, the convolutional neural network (CNN) has shown its ability to extract features from visual imagery. This paper combines the advantages of RNNs and CNNs and proposes a model called BLSTM-C for Chinese text classification. BLSTM-C begins with a bidirectional long short-term memory (BLSTM) layer, a special kind of RNN, to produce a sequence output based on both past and future context. It then feeds this sequence to a CNN layer, which extracts features from it. We evaluate the BLSTM-C model on several tasks, such as sentiment classification and category classification, and the results show our model's remarkable performance on these text tasks.
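In BLSTM-C, the CNN layer extracts features from the BLSTM output sequence. A minimal plain-Python sketch of one common way to do this: a 1-D convolution over time, followed by ReLU and max-over-time pooling. The abstract does not specify the pooling, so that part is an assumption, and the BLSTM outputs and filter weights are hypothetical.

```python
def conv1d_max_over_time(sequence, kernel, bias=0.0):
    """Apply one convolutional filter over time, then ReLU and max-over-time pooling.

    sequence: list of T vectors (e.g. BLSTM outputs, forward and backward states concatenated)
    kernel:   list of `width` weight vectors, each the same size as a sequence vector
    """
    width = len(kernel)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        value = bias + sum(
            w * x
            for weights, vector in zip(kernel, window)
            for w, x in zip(weights, vector)
        )
        activations.append(max(0.0, value))  # ReLU
    return max(activations)  # keep the strongest response anywhere in the text


# Hypothetical BLSTM output for a 3-token sentence: 2 forward + 2 backward units per token.
bilstm_out = [
    [0.1, 0.2, 0.0, 0.3],
    [0.5, -0.1, 0.2, 0.0],
    [0.0, 0.4, 0.1, 0.2],
]
filters = [
    [[1.0] * 4, [1.0] * 4],    # width-2 filter that sums its window
    [[-1.0] * 4, [-1.0] * 4],  # width-2 filter that fires only when its window sums to a negative value
]
features = [conv1d_max_over_time(bilstm_out, k) for k in filters]
```

Each filter contributes one number, so the resulting feature vector has a fixed length regardless of sentence length and can be passed to a softmax classifier.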

13

Hijazi, Issa, and Pontus Pettersson. "Animal ID Tag Recognition with Convolutional and Recurrent Neural Network : Identifying digits from a number sequence with RCNN." Thesis, Högskolan i Skövde, Institutionen för informationsteknologi, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-17031.

Abstract:

Major advances in machine learning have made image recognition applications based on artificial neural networks blossom in recent years. The aim of this thesis was to find a solution for recognizing the digits of a number sequence on an ID tag used to identify farm animals, with the help of image recognition. A Recurrent Convolutional Neural Network solution called PPNet was proposed and tested on a dataset called Animal Identification Tags. A transfer learning method was also tested to see whether it could help PPNet generalize and better recognize digits. PPNet was then compared against Microsoft Azure's own image recognition API to determine how PPNet compares to a general solution. Although PPNet did not perform as well, it still achieved results competitive with the Azure API.
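CNN+RNN recognizers for digit sequences are often trained with a connectionist temporal classification (CTC) loss and decoded by taking the best symbol per frame, collapsing repeats and removing blanks. The abstract does not say that PPNet works this way, so treat this greedy decoder as an illustrative assumption; the frame probabilities are made up.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Greedy (best-path) CTC decoding.

    frame_probs: one probability list per time step, indexed like `alphabet`
    alphabet:    symbols, with the CTC blank at index `blank`
    """
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded = []
    previous = None
    for index in best_path:
        # Collapse consecutive repeats, then drop blanks; a blank between two
        # identical symbols is what allows a doubled digit such as "11".
        if index != previous and index != blank:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)


ALPHABET = ["-"] + list("0123456789")  # index 0 is the blank


def peaked(index, size=len(ALPHABET)):
    """Hypothetical frame distribution that puts most of its mass on one symbol."""
    return [0.9 if i == index else 0.01 for i in range(size)]


frames = [peaked(i) for i in (2, 2, 0, 2, 5, 5, 0)]
print(ctc_greedy_decode(frames, ALPHABET))  # prints 114
```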

14

Chancan, Leon Marvin Aldo. "The role of motion-and-visual perception in robot place learning and navigation." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/229769/8/Marvin%20Aldo_Chancan%20Leon_Thesis.pdf.

Abstract:

This thesis took a step forward in developing new learning-based robot localisation and navigation systems using real-world data and simulation environments. Three new methods were proposed to provide new insights into the role of joint motion-and-vision-based end-to-end robot learning in both place recognition and navigation tasks, within modern reinforcement learning and deep learning frameworks. Inspired by the biological neural circuits underlying these complex tasks in insect and rat brains, these methods were shown to be orders of magnitude faster than classical techniques, while setting new state-of-the-art performance standards in terms of accuracy, throughput and latency.

15

Teuer, Lukáš. "Komprese obrazu pomocí neuronových sítí [Image compression using neural networks]." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-385964.

Abstract:

This document describes image compression using different types of neural networks. It also discusses properties of convolutional and recurrent networks and gives a detailed description of various neural network architectures and their inner workings. In addition, experiments are carried out on various neural network structures and parameters in order to find the most suitable properties for image compression. New concepts for image compression using neural networks are also proposed and immediately tested. Finally, a network is designed from the best concepts and components discovered during experimentation.
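Neural image compressors are commonly compared by bits per pixel (bpp). Assuming an autoencoder-style network whose bottleneck is quantized to a fixed number of bits per value and stored without entropy coding (the abstract does not describe the thesis's exact scheme), the rate follows directly from the latent shape; the image and latent sizes below are hypothetical.

```python
def bits_per_pixel(height, width, latent_shape, bits_per_value):
    """Size of a quantized latent code, in bits per image pixel (no entropy coding)."""
    latent_h, latent_w, channels = latent_shape
    total_bits = latent_h * latent_w * channels * bits_per_value
    return total_bits / (height * width)


def compression_ratio(bpp, original_bpp=24):
    """Ratio against an uncompressed 8-bit-per-channel RGB image (24 bpp)."""
    return original_bpp / bpp


# Hypothetical 256x256 image encoded to a 32x32x8 latent with 8-bit values.
bpp = bits_per_pixel(256, 256, (32, 32, 8), bits_per_value=8)
ratio = compression_ratio(bpp)
```

Shrinking the latent or using fewer bits per value lowers the rate at the cost of reconstruction quality, which is the trade-off such experiments explore.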

16

Job, Mirko. "Development of a real-time classifier for the identification of the Sit-To-Stand motion pattern." Doctoral thesis, Università degli studi di Genova, 2021. http://hdl.handle.net/11567/1049508.

Abstract:

The Sit-to-Stand (STS) movement is of significant importance in clinical practice, since it is an indicator of lower limb functionality. As an optimal trade-off between cost and accuracy, accelerometers have recently been used to synchronously recognise the STS transition in various Human Activity Recognition-based tasks. However, beyond the mere identification of the entire action, a major challenge remains the recognition of clinically relevant phases within the STS motion pattern, due to the intrinsic variability of the movement. This work presents the development process of a deep learning model aimed at recognising specific clinically valid phases of the STS, based on a pool of 39 young and healthy participants performing the task at self-paced (SP) and controlled (CT) speeds. The movements were recorded using a total of 6 inertial sensors, and the accelerometric data were labelled into four sequential STS phases according to the Ground Reaction Force profiles acquired through a force plate. The optimised architecture combined convolutional and recurrent neural networks in a hybrid approach and was able to correctly identify the four STS phases, in both SP and CT movements, relying on a single sensor placed on the chest. The overall accuracy estimate (median [95% confidence interval]) for the hybrid architecture was 96.09% [95.37–96.56%] in SP trials and 95.74% [95.39–96.21%] in CT trials. Moreover, the prediction delays (≅33 ms) were compatible with the temporal characteristics of the dataset, sampled at 10 Hz (100 ms). These results support the use of the proposed model in the development of digital rehabilitation solutions able to synchronously recognise the STS movement pattern, with the aim of effectively evaluating and correcting its execution.
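Before a hybrid CNN+RNN classifier can label STS phases in real time, the accelerometer stream has to be cut into fixed-length windows. A minimal sketch, assuming each window takes the phase label of its last sample so that prediction stays causal; the window length, step and readings are hypothetical, not the thesis's settings. At 10 Hz, a window of 10 samples would cover one second.

```python
def sliding_windows(samples, labels, window, step=1):
    """Cut a sampled signal into fixed-length windows for a CNN+RNN classifier.

    Each window is labelled with the phase of its most recent sample, so a
    prediction never depends on data recorded after the moment being labelled.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    return [
        (samples[end - window:end], labels[end - 1])
        for end in range(window, len(samples) + 1, step)
    ]


# Hypothetical chest-sensor readings (x, y, z acceleration in g) at 10 Hz,
# with phase labels 0-3 standing for the four sequential STS phases.
acc = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.3, 0.1, 1.1), (0.2, 0.2, 1.2), (0.0, 0.1, 1.0)]
phase = [0, 0, 1, 2, 3]
windows = sliding_windows(acc, phase, window=3)
```

Using a step of 1 yields one prediction per new sample, which matches the goal of recognising the phase while the movement is still in progress.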

17

Tirumaladasu, Sai Subhakar, and Shirdi Manjunath Adigarla. "Autonomous Driving: Traffic Sign Classification." Thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-17783.

Abstract:

Autonomous Driving and Advanced Driver Assistance Systems (ADAS) are revolutionizing the way we drive and the future of mobility. Among ADAS, Traffic Sign Classification is an important technique that helps the driver easily interpret traffic signs on the road. In this thesis, we used the powerful combination of Image Processing and Deep Learning to pre-process and classify traffic signs. Recent studies in Deep Learning show how well Convolutional Neural Networks (CNNs) perform at image classification, and several state-of-the-art models already exceed 99% classification accuracy. This led our thesis to focus on tackling current challenges and some open research cases. We focused on performance tuning by modifying existing architectures, trading off computation against accuracy. Our research areas include enhancement under low-light/noisy conditions by adding Recurrent Neural Network (RNN) connections, and a contribution to a universal-regional dataset using Generative Adversarial Networks (GANs). The results obtained on the test data are comparable to state-of-the-art models, and we reached accuracies above 98% after performance evaluation in different frameworks.

18

Kohút, Jan. "Aktivní učení pro rozpoznávání textu [Active learning for text recognition]." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403210.

Abstract:

The aim of this Master's thesis is to design active learning methods and to experiment with datasets of historical documents. A large and diverse dataset, IMPACT, with more than one million lines is used for the experiments. I use neural networks to check the readability of lines and the correctness of their annotations. First, I compare convolutional and recurrent neural network architectures with a bidirectional LSTM layer. Next, I study different ways of training neural networks using active learning methods. Mainly, I use active learning to adapt neural networks to documents that are not in the original training dataset; active learning is thus used to pick appropriate adaptation data. Convolutional neural networks achieve 98.6% accuracy, and recurrent neural networks achieve 99.5% accuracy. Active learning decreases the error by 26% compared to a random selection of adaptation data.
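Least-confidence sampling, a common active learning strategy assumed here for illustration (the abstract does not name the thesis's selection criteria), ranks candidate lines by how sure the recognizer is about them and sends the least certain ones for annotation. Line confidence is taken, as an assumption, to be the mean of the top character probability at each position; the distributions are hypothetical.

```python
def line_confidence(char_probs):
    """Mean of the highest class probability at each character position."""
    return sum(max(p) for p in char_probs) / len(char_probs)


def select_for_annotation(lines, k):
    """Return indices of the k lines the model is least confident about."""
    ranked = sorted(range(len(lines)), key=lambda i: line_confidence(lines[i]))
    return ranked[:k]


# Hypothetical per-character distributions over a 2-symbol alphabet for three lines.
lines = [
    [[0.9, 0.1], [0.8, 0.2]],  # confident
    [[0.5, 0.5], [0.6, 0.4]],  # uncertain
    [[0.99, 0.01]],            # very confident
]
picked = select_for_annotation(lines, k=2)
```

The random baseline mentioned in the abstract corresponds to replacing the sort with a random shuffle of the candidate indices.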

19

Wang, Wei. "Event Detection and Extraction from News Articles." Diss., Virginia Tech, 2018. http://hdl.handle.net/10919/82238.

Abstract:

Event extraction is a type of information extraction (IE) that extracts specific knowledge about certain incidents from texts. Nowadays, the amount of available information (such as news, blogs, and social media) grows exponentially. It therefore becomes imperative to develop algorithms that automatically extract machine-readable information from large volumes of text data. In this dissertation, we focus on three problems in obtaining event-related information from news articles. (1) The first effort is to comprehensively analyze the performance and challenges of current large-scale event encoding systems. (2) The second problem involves event detection and critical information extraction from news articles. (3) Third, the efforts concentrate on event encoding, which aims to extract event extents and arguments from texts. We start by investigating two large-scale event extraction systems (ICEWS and GDELT) in the political science domain. We design a set of experiments to evaluate the quality of the events extracted by the two target systems in terms of reliability and correctness. The results show significant discrepancies between the outputs of the automated systems and a hand-coded system, and the accuracy of both systems is far from satisfactory. These findings provide preliminary background and lay the foundation for using advanced machine learning algorithms for event-related information extraction. Inspired by the successful application of deep learning in Natural Language Processing (NLP), we propose a Multi-Instance Convolutional Neural Network (MI-CNN) model for event detection and critical sentence extraction without sentence-level labels. To evaluate the model, we run a set of experiments on a real-world protest event dataset. The results show that our model is able to outperform strong baseline models and extract meaningful key sentences without domain knowledge or manually designed features.
We also extend the MI-CNN model and propose an MIMTRNN model for event extraction with distant supervision, to overcome the lack of fine-grained labels and the small size of the training data. The proposed MIMTRNN model systematically integrates an RNN, Multi-Instance Learning, and Multi-Task Learning into a unified framework. The RNN module encodes into the representation of entity mentions both sequential information and the dependencies between event arguments, which are very useful in the event extraction task. The Multi-Instance Learning paradigm means that the system does not require precise entity-mention-level labels, which makes it well suited to distant supervision for event extraction. The Multi-Task Learning module is designed to alleviate the potential overfitting caused by the relatively small training data. Experiments on two real-world datasets (Cyber-Attack and Civil Unrest) show that our model benefits from the advantages of each component and significantly outperforms other baseline methods.
Ph. D.
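The abstract states that MI-CNN detects events and extracts key sentences without sentence-level labels. One standard way to achieve this in multi-instance learning is max pooling over instance scores; the sketch assumes this aggregation, which may differ from the one MI-CNN actually uses, and takes hypothetical sentence probabilities as input in place of a CNN sentence encoder.

```python
import math


def detect_event(sentence_probs, threshold=0.5):
    """Multi-instance detection: an article (bag) is as positive as its most
    positive sentence (instance); that sentence is returned as the key sentence."""
    best = max(range(len(sentence_probs)), key=sentence_probs.__getitem__)
    return sentence_probs[best] >= threshold, best


def bag_loss(sentence_probs, article_label, eps=1e-7):
    """Binary cross-entropy on the max-pooled bag probability; only the
    article-level label (1 = event, 0 = no event) is needed for training."""
    p = min(max(max(sentence_probs), eps), 1 - eps)
    return -(article_label * math.log(p) + (1 - article_label) * math.log(1 - p))


# Hypothetical per-sentence protest probabilities for one news article.
article = [0.05, 0.82, 0.30]
is_event, key_sentence = detect_event(article)
```

During training, the gradient of the max flows only to the highest-scoring sentence, which is how the model learns which sentences matter without ever seeing sentence labels.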

20

Kvita, Jakub. "Popis fotografií pomocí rekurentních neuronových sítí [Describing photographs using recurrent neural networks]." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2016. http://www.nusl.cz/ntk/nusl-255324.

Abstract:

This thesis deals with the automatic generation of image descriptions using several types of neural networks. The work is based on papers from the MS COCO Captioning Challenge 2015 and on character-level language models popularized by A. Karpathy. The proposed model combines a convolutional and a recurrent neural network in an encoder-decoder architecture. The vector representing the encoded image is passed to the language model as the memory values of the LSTM layers in the network. The thesis investigates how well a model with such a simple architecture can describe images and how it compares with other current models. One of the conclusions is that the proposed architecture is not sufficient for describing arbitrary images.
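The abstract states that the encoded image vector is passed to the language model as the memory values of its LSTM layers. This plain-Python sketch shows that idea for a single LSTM step: the image code becomes the initial cell state `c0`. The encoding, the hidden size of 2, the start-token embedding and the all-zero weights are hypothetical, chosen so the data flow is easy to trace (with zero weights every gate outputs 0.5 and the candidate is 0, so the step simply halves the image memory).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def lstm_step(x, h, c, params):
    """One LSTM time step; params maps each gate name to (W_x, W_h, bias)."""
    def gate(name, activation):
        w_x, w_h, bias = params[name]
        pre = [a + b + d for a, b, d in zip(matvec(w_x, x), matvec(w_h, h), bias)]
        return [activation(v) for v in pre]

    i = gate("input", sigmoid)   # how much new content to write
    f = gate("forget", sigmoid)  # how much of the old memory to keep
    o = gate("output", sigmoid)  # how much memory to expose as h
    g = gate("cell", math.tanh)  # candidate memory content
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def zero_params(hidden, inputs):
    """Hypothetical all-zero weights, only to make the data flow easy to trace."""
    def zeros(rows, cols):
        return [[0.0] * cols for _ in range(rows)]
    return {name: (zeros(hidden, inputs), zeros(hidden, hidden), [0.0] * hidden)
            for name in ("input", "forget", "output", "cell")}


# Hypothetical CNN image encoding, already projected to the LSTM hidden size (2).
image_code = [0.5, -0.3]
h0 = [0.0, 0.0]
c0 = list(image_code)          # the image enters the decoder as its initial memory
start_token = [1.0, 0.0, 0.0]  # hypothetical one-hot embedding of a <start> symbol
h1, c1 = lstm_step(start_token, h0, c0, zero_params(hidden=2, inputs=3))
```

With trained weights, the forget gate decides how long the image information survives in memory as the caption is generated character by character.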

21

Bahceci, Oktay. "Deep Neural Networks for Context Aware Personalized Music Recommendation : A Vector of Curation." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210252.

Abstract:

Information filtering and recommender systems have been used and implemented in various ways by various entities since the dawn of the Internet, and state-of-the-art approaches rely on Machine Learning and Deep Learning to create accurate and personalized recommendations for users in a given context. These models require large amounts of data with a variety of features, such as time, location and user data, in order to find correlations and patterns that classical models such as matrix factorization and collaborative filtering cannot. This thesis researches, implements and compares a variety of models, with a primary focus on Machine Learning and Deep Learning, for the task of music recommendation, and does so successfully by representing recommendation as an extreme multi-class classification task with 100 000 distinct labels. Across fourteen different experiments, all implemented models successfully learn features such as time, location, user features and previous listening history in order to create context-aware personalized music predictions, and they solve the cold start problem by using user demographic information. The best model captures the intended label in its top-100 list of recommended items for more than 1/3 of the unseen data in an offline evaluation on randomly selected examples from the unseen following week.
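The best model is evaluated by whether the intended track appears in its top-100 recommendations. Treating recommendation as extreme multi-class classification, as the abstract does, this metric (often called hit rate or recall at k) can be sketched as follows; the score rows stand in for a model's per-label outputs and are hypothetical, and k is reduced to 2 for the toy data.

```python
import heapq


def top_k(scores, k):
    """Indices of the k highest-scoring labels, best first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def hit_rate_at_k(score_rows, targets, k=100):
    """Fraction of examples whose held-out label appears in the model's top-k list."""
    hits = sum(1 for scores, target in zip(score_rows, targets) if target in top_k(scores, k))
    return hits / len(targets)


# Hypothetical scores over 5 tracks for 3 listening events, and the tracks actually played.
rows = [
    [0.1, 0.7, 0.3, 0.9, 0.2],
    [0.5, 0.1, 0.1, 0.1, 0.2],
    [0.2, 0.2, 0.9, 0.1, 0.3],
]
played = [1, 3, 4]
hit_rate = hit_rate_at_k(rows, played, k=2)
```

`heapq.nlargest` avoids sorting all labels, which matters when each row has 100 000 entries and only the top 100 are needed.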

22

Mishra, Vishal Vijayshankar. "Sequence-to-Sequence Learning using Deep Learning for Optical Character Recognition (OCR)." University of Toledo / OhioLINK, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1513273051760905.

23

Parakkal, Sreenivasan Akshai. "Deep learning prediction of Quantmap clusters." Thesis, Uppsala universitet, Institutionen för biologisk grundutbildning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445909.

Abstract:

The hypothesis that similar chemicals exert similar biological activities has been widely adopted in drug discovery and development. Quantitative Structure-Activity Relationship (QSAR) models are used ubiquitously in drug discovery to understand the function of chemicals in biological systems. A common QSAR modeling method calculates similarity scores between chemicals to assess their biological function. However, because some chemicals can be structurally similar yet have different biological activities, or structurally different yet have similar biological functions, various methods have been developed to quantify chemical similarity at the functional level instead. Quantmap is one such method: it uses biological databases to quantify the biological similarity between chemicals, applying quantitative molecular network topology analysis to cluster chemical substances based on their bioactivities. Unfortunately, this method by itself cannot assign new chemicals (those that may not yet have biological data) to the derived clusters. Because many chemicals lack biological data, this project explored deep learning models for their ability to correctly assign unknown chemicals to Quantmap clusters. The deep learning methods explored included both convolutional and recurrent neural networks. Transfer learning/pretraining-based approaches and data augmentation methods were also investigated. The best-performing model among those considered was the Seq2seq model (a recurrent neural network containing two joint networks, a perceiver and an interpreter network) without pretraining but with data augmentation.
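The structural similarity scores mentioned above are typically Tanimoto coefficients between binary molecular fingerprints. A minimal sketch, assuming fingerprints are given as sets of "on" bit indices (cheminformatics toolkits such as RDKit can generate real ones). The nearest-neighbour cluster assignment is a hypothetical structure-only baseline, not the thesis's deep learning models, and it is exactly the kind of approach that fails when structure and function diverge.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def assign_cluster(query_bits, reference):
    """Nearest-neighbour baseline: cluster label of the most similar reference chemical."""
    best_bits, best_cluster = max(reference, key=lambda item: tanimoto(query_bits, item[0]))
    return best_cluster


# Hypothetical fingerprints of chemicals with known (hypothetical) cluster labels.
reference = [({1, 2, 3, 8}, "cluster_A"), ({4, 5, 6}, "cluster_B")]
label = assign_cluster({1, 2, 8}, reference)
```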

24

AbuRa'ed, Ahmed Ghassan Tawfiq. "Automatic generation of descriptive related work reports." Doctoral thesis, Universitat Pompeu Fabra, 2020. http://hdl.handle.net/10803/669975.

Abstract:

A related work report is a section in a research paper that integrates key information from a list of related scientific papers, providing context for the work being presented. Related work reports can be either descriptive or integrative. Integrative related work reports provide a high-level overview and critique of the scientific papers by comparing them with each other, giving fewer details of individual studies. Descriptive related work reports, instead, provide more in-depth information about each mentioned study, such as the methods and results of the cited works. In order to write a related work report, scientists have to identify, condense/summarize, and combine relevant information from different scientific papers. However, this task is complicated by the sheer volume of available scientific papers. In this context, the automatic generation of related work reports is an important problem to tackle. It can be considered an instance of the multi-document summarization problem where, given a list of scientific papers, the main objective is to automatically summarize those papers and generate a related work report. In order to study the problem of related work generation, we have developed a manually annotated, machine-readable dataset of related work sections, cited papers (i.e., references) and sentences, together with an additional layer of papers citing the references. We have also investigated the relation between a citation context in a citing paper and the scientific paper it cites, so as to properly model cross-document relations and inform our summarization approach. Moreover, we have investigated the identification of explicit and implicit citations to a given scientific paper, which is an important task in several scientific text mining activities such as citation purpose identification, scientific opinion mining, and scientific summarization.
We present both extractive and abstractive methods for summarizing a list of scientific papers using their citation network. The extractive approach has three stages: scoring the sentences of the scientific papers based on their citation network, selecting sentences from each paper to be mentioned in the related work report, and generating an organized related work report by grouping together the sentences of the papers that belong to the same topic. The abstractive approach, on the other hand, attempts to generate citation sentences to be included in a related work report, taking advantage of current sequence-to-sequence neural architectures and resources that we created specifically for this task. The thesis also presents and discusses automatic and manual evaluations of the generated related work reports, showing the viability of the proposed approaches.
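The first stage of the extractive approach scores sentences of a cited paper using its citation network. As a hypothetical scoring function (the abstract does not give the thesis's actual features), each sentence can be scored by its average token overlap (Jaccard similarity) with the sentences that cite the paper, and the top-scoring sentences kept in their original order. The example sentences and the citing author are made up.

```python
import re


def tokens(text):
    """Lower-cased alphanumeric tokens of a sentence, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_sentences(sentences, citation_contexts):
    """Average Jaccard overlap between each sentence and every citing sentence."""
    contexts = [tokens(c) for c in citation_contexts]
    scores = []
    for sentence in sentences:
        words = tokens(sentence)
        overlaps = [len(words & c) / len(words | c) for c in contexts if words | c]
        scores.append(sum(overlaps) / len(contexts) if contexts else 0.0)
    return scores


def select_sentences(sentences, citation_contexts, k=1):
    """Keep the k best-scoring sentences, preserving their order in the paper."""
    scores = score_sentences(sentences, citation_contexts)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


# Hypothetical cited-paper sentences and one citing sentence.
paper = [
    "We propose a new dependency parser for Catalan.",
    "The weather data was collected in 2010.",
]
citing = ["Smith et al. propose a dependency parser for Catalan text."]
summary = select_sentences(paper, citing, k=1)
```

The intuition is that sentences echoed by citing authors are the ones the community considers the paper's contribution, which is what a descriptive related work report should mention.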

25

Santos, Claudio Filipi Gonçalves dos. "Optical character recognition using deep learning." Universidade Estadual Paulista (UNESP), 2018. http://hdl.handle.net/11449/154100.

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Optical Character Recognition (OCR) is the name given to the technology used to translate image data into a text file. The objective of this project is to use Deep Learning techniques to develop a software with the ability to segment images, detecting candidate characters and generating textthatisinthepicture. Since2006,DeepLearningorhierarchicallearning, emerged as a new machine learning area. Over recent years, the techniques developed from deep learning research have influenced and expanded scope, including key aspects of artificial intelligence and machine learning. A thorough study was carried out in order to develop an OCR system using only Deep Learning architectures. It is explained the evolution of these techniques, some past works and how they influenced thisframework’sdevelopment. Inthisthesisitisdemonstratedwithresults how a single character classifier was developed. Then it is explained how a neural network can be developed to be an object detector and how to transform this object detector into a text detector. After that it shows how a set of two Deep Learning techniques can be combined and used in the taskoftransformingacroppedregionofanimageinastringofcharacters. Finally, it demonstrates how the text detector and the Image-to-Text systemswerecombinedinordertodevelopafullend-to-endOCRsystemthat detects the regions of a given image containing text and what is written in this region. It shows the idea of using only Deep Learning structures can outperform other techniques based on other areas like image processing. In text detection it reached over 70% of precision when a more complex architecture was used, around 69% of correct translation of image-to-text areasandaround50%onend-to-endtaskofdetectingareasandtranslating them into text.
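The abstract does not name the two techniques combined for image-to-text. A widely used pairing for cropped text lines is a convolutional-recurrent network trained with Connectionist Temporal Classification (CTC). Assuming that pairing, the last step (turning per-column class scores into a string) can be sketched as greedy, best-path CTC decoding. The blank index 0 and the toy alphabet are assumptions for illustration, not details from the thesis.

```python
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding.

    frame_scores[t][k] is the network's score for class k at image column t;
    class 0 is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    # 1. Pick the most likely class in every column.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    chars: List[str] = []
    prev = None
    for k in best:
        # 2. Collapse consecutive repeats, then 3. drop blanks.
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)


# Toy example: 5 columns, classes = [blank, 'a', 'b', 'c'].
frames = [
    [0.1, 0.8, 0.05, 0.05],   # 'a'
    [0.1, 0.7, 0.1, 0.1],     # 'a' again: merged with the previous column
    [0.9, 0.05, 0.03, 0.02],  # blank: separates the two 'a's
    [0.1, 0.8, 0.05, 0.05],   # 'a'
    [0.1, 0.1, 0.1, 0.7],     # 'c'
]
print(ctc_greedy_decode(frames, "abc"))  # aac
```

Repeats are merged before blanks are dropped. This ordering is what lets CTC emit doubled letters: a blank column between two identical predictions keeps them as separate characters.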

26

Suchánek, Tomáš. "Detektor tempa hudebních nahrávek na bázi neuronové sítě." [Neural-network-based tempo detector for music recordings.] Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2021. http://www.nusl.cz/ntk/nusl-442576.

Abstract:

This Master's thesis deals with beat-tracking systems whose functionality is based on neural networks. It describes the structure of these systems and how the signal is processed in their individual blocks. Emphasis is placed on recurrent and temporal convolutional networks, which by their nature can effectively detect tempo and beats in audio recordings. The selected methods, network architectures and their modifications are then implemented within a comprehensive detection system, which is tested and evaluated through cross-validation on a genre-diverse dataset. The results show that the system with the proposed temporal convolutional network architecture produces results comparable to those of other published work. For example, it proved the most successful on the SMC dataset, whereas on the other datasets its accuracy was slightly below that of state-of-the-art systems. In addition, the proposed network retains low computational complexity despite an increased number of internal parameters.
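The abstract credits temporal convolutional networks (TCNs) with capturing tempo and beats. The core TCN building block is a dilated convolution. Stacking layers with exponentially growing dilations widens the temporal context without needing a very deep stack. The sketch below shows one such layer and the receptive-field arithmetic. It is not the thesis's architecture: the causal (left-padded) form, the kernel size and the dilation schedule are all illustrative assumptions.

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], w: Sequence[float], dilation: int) -> List[float]:
    """y[t] = sum_i w[i] * x[t - i * dilation], treating x[j] = 0 for j < 0 (causal padding)."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, wi in enumerate(w):
            j = t - i * dilation  # taps are spaced `dilation` frames apart
            if j >= 0:
                acc += wi * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input frames seen by one output of a stack of dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# An impulse shows the dilation: with w = [1, 1] and dilation 2, y[t] = x[t] + x[t - 2].
print(dilated_causal_conv1d([1, 0, 0, 0, 0], [1, 1], dilation=2))  # [1.0, 0.0, 1.0, 0.0, 0.0]
print(receptive_field(3, [2 ** i for i in range(10)]))            # 2047
```

With kernel size 3 and ten layers of dilations 1, 2, ..., 512, a single output sees 2047 frames. At an assumed 100 frames per second, that is about 20 s of audio, which spans many beats at typical tempi.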

27

Al Hajj, Hassan. "Video analysis for augmented cataract surgery." Thesis, Brest, 2018. http://www.theses.fr/2018BRES0041/document.

Abstract:

The digital era is increasingly changing the world because of the sheer volume of data produced every day. The medical domain is strongly affected by this revolution, because analysing these data can provide education and support for clinicians. In this thesis, we propose to reuse surgery videos recorded in operating rooms to build a computer-assisted surgery system. We are chiefly interested in recognizing the surgical gesture being performed at each instant in order to provide surgeons with relevant recommendations and information. To this end, the main objective of this thesis is to recognize surgical tools in cataract surgery videos. In the microscope video stream, these tools are only partially visible and some are highly similar to one another. To address these visual challenges, we propose to add a camera filming the surgical tray. Our goal is therefore to detect tool presence in two complementary types of video: tool-tissue interaction (microscope) videos and surgical tray videos. The former records the patient's eye and the latter records activity on the surgical tray. Two tasks are proposed for the surgical tray videos: tool change detection and tool presence detection.
First, we establish a similar pipeline for both tasks, based on standard classification methods applied to extracted visual features. It yields satisfactory results for tool change detection but performs poorly on tool presence detection on the tray. Second, to avoid the difficulty of manually designing visual features, we design deep learning architectures for surgical tool detection on both video types. To alleviate the inherent challenges of the surgical tray videos, we propose to generate simulated surgical tray scenes and to use a patch-based convolutional neural network (CNN). Finally, we exploit temporal information using a recurrent neural network (RNN) that processes the CNN outputs. Contrary to our initial hypothesis, the experiments show poor results for tool presence detection on the tray but very good results on the tool-tissue interaction videos. We achieve even better results in the surgical field after fusing the tool change information from the tray with the tool presence signals from the tool-tissue interaction videos.

28

Shahkarami, Abtin. "Complexity reduction over bi-RNN-based Kerr nonlinearity equalization in dual-polarization fiber-optic communications via a CRNN-based approach." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAT034.

Abstract:

The impairments arising from the Kerr nonlinearity in optical fibers limit the achievable information rates in fiber-optic communication. Linear effects, such as chromatic dispersion and polarization-mode dispersion, can be compensated via relatively simple linear equalization at the receiver. By contrast, the computational complexity of conventional nonlinearity mitigation techniques, such as digital backpropagation, can be substantial. In this context, neural networks have recently attracted attention for low-complexity nonlinearity mitigation in fiber-optic communications. This Ph.D. dissertation investigates recurrent neural networks for efficiently compensating nonlinear channel impairments in dual-polarization long-haul fiber-optic transmission. We present a hybrid convolutional recurrent neural network (CRNN) architecture, comprising a convolutional neural network (CNN)-based encoder followed by a recurrent layer working in tandem. The CNN-based encoder efficiently represents the short-term channel memory arising from chromatic dispersion, while transitioning the signal to a latent space with fewer relevant features. The subsequent recurrent layer is implemented as a unidirectional vanilla RNN, responsible for capturing the long-range interactions neglected by the CNN encoder. We demonstrate that the proposed CRNN achieves the performance of state-of-the-art equalizers in optical fiber communication, with significantly lower computational complexity depending on the system model. Finally, the performance-complexity trade-off is established for a number of models, including multi-layer fully connected neural networks, CNNs, bidirectional recurrent neural networks, bidirectional long short-term memory (bi-LSTM) networks, bidirectional gated recurrent units, convolutional bi-LSTM models, and the proposed hybrid model.
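The abstract describes a two-stage division of labour. A CNN encoder captures the short-term dispersion memory, and a unidirectional vanilla RNN then handles the longer-range interactions. This can be sketched as a NumPy forward pass. The four-channel input layout, the layer sizes, the stride of 2 (assuming two samples per symbol) and the random weights are illustrative assumptions. Training and the thesis's actual hyperparameters are not shown.

```python
import numpy as np


def conv1d_encoder(x: np.ndarray, kernels: np.ndarray, stride: int) -> np.ndarray:
    """Strided 'valid' 1-D convolution followed by ReLU.

    x: (T, C_in) received samples; kernels: (K, C_in, C_out).
    Returns a shorter latent sequence of shape (T_out, C_out).
    """
    k, _, c_out = kernels.shape
    starts = range(0, x.shape[0] - k + 1, stride)
    out = np.empty((len(starts), c_out))
    for n, t in enumerate(starts):
        # Each output mixes a window of K neighbouring samples (short-term memory).
        out[n] = np.einsum("kc,kco->o", x[t:t + k], kernels)
    return np.maximum(out, 0.0)


def vanilla_rnn(z: np.ndarray, w_x: np.ndarray, w_h: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Unidirectional Elman RNN: h_t = tanh(W_x z_t + W_h h_{t-1} + b)."""
    h = np.zeros(w_h.shape[0])
    states = []
    for z_t in z:
        # The hidden state carries information forward along the sequence (long-range).
        h = np.tanh(w_x @ z_t + w_h @ h + b)
        states.append(h)
    return np.stack(states)


# Illustrative sizes (assumptions): 4 input channels = I/Q of the X and Y polarizations,
# kernel of 9 samples, 8 latent features, 16 hidden units.
T, C_IN, K, C_LAT, H = 64, 4, 9, 8, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((T, C_IN))
kernels = 0.1 * rng.standard_normal((K, C_IN, C_LAT))
w_x = 0.1 * rng.standard_normal((H, C_LAT))
w_h = 0.1 * rng.standard_normal((H, H))
b = np.zeros(H)
w_out = 0.1 * rng.standard_normal((C_IN, H))

z = conv1d_encoder(x, kernels, stride=2)  # (28, 8): strided, so the sequence is shorter
hidden = vanilla_rnn(z, w_x, w_h, b)      # (28, 16) hidden states
y = hidden @ w_out.T                      # (28, 4) equalized estimates, one per output step
```

Most of the computational saving in such a design would come from the RNN running on the shorter, strided sequence. It also helps that a vanilla RNN cell is much cheaper per step than an LSTM or GRU cell.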

29

Semela, René. "Automatické tagování hudebních děl pomocí metod strojového učení." [Automatic tagging of musical works using machine learning methods.] Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-413253.

Abstract:

Automatic music tagging is one of the many challenges in machine learning, not least because of the complexity of the problem. Such systems can be used in practice for the content analysis of music or for sorting music libraries. This thesis deals with the design, training, testing and evaluation of artificial neural network architectures for automatic music tagging. It begins by laying the theoretical foundations of the field. In the practical part of the thesis, 8 neural network architectures are designed (4 fully convolutional and 4 convolutional recurrent). These architectures are trained on the MagnaTagATune dataset using mel spectrograms, and then tested and evaluated. The best results are achieved by the four-layer convolutional recurrent neural network (CRNN4), with ROC-AUC = 0.9046 ± 0.0016. As the next step of the practical part, a completely new dataset, Last.fm Dataset 2020, is created. It uses the Last.fm and Spotify APIs for data acquisition and contains 100 tags and 122,877 tracks. The most successful architectures are then trained, tested and evaluated on this new dataset. On this dataset, the best results are achieved by the six-layer fully convolutional neural network (FCNN6), with ROC-AUC = 0.8590 ± 0.0011. Finally, a simple application is introduced to conclude the thesis. It is designed for testing individual neural network architectures on a user-supplied audio file. Overall, the results are similar to those of other papers on the same topic, but the thesis brings several new findings and innovations. In particular, the complexity of the individual neural network architectures is significantly reduced while similar results are maintained.
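Results above are reported as ROC-AUC. For multi-label tagging this is commonly computed per tag and then averaged across tags. The thesis's exact averaging is not stated, so the macro average below is an assumption. The per-tag AUC uses its rank-statistic definition: the probability that a randomly chosen positive track scores higher than a randomly chosen negative one, with ties counting one half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:  # O(P * N): fine for a sketch, use ranks for large sets
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def macro_roc_auc(label_matrix: Sequence[Sequence[int]],
                  score_matrix: Sequence[Sequence[float]]) -> float:
    """Average the per-tag AUC over tags (columns); rows are tracks."""
    n_tags = len(label_matrix[0])
    aucs = [roc_auc([row[j] for row in label_matrix], [row[j] for row in score_matrix])
            for j in range(n_tags)]
    return sum(aucs) / n_tags


# Hypothetical data: 4 tracks, 2 tags.
labels = [[1, 0], [0, 1], [1, 0], [0, 1]]
scores = [[0.9, 0.1], [0.3, 0.8], [0.4, 0.2], [0.5, 0.7]]
print(macro_roc_auc(labels, scores))  # 0.875 (tag 0: 0.75, tag 1: 1.0)
```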

30

Talevi, Luca. "Decodifica di intenzioni di movimento dalla corteccia parietale posteriore di macaco attraverso il paradigma Deep Learning." [Decoding movement intentions from the macaque posterior parietal cortex through the Deep Learning paradigm.] Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17846/.

Abstract:

Invasive Brain Computer Interfaces (BCIs) make it possible to restore mobility to patients who have lost control of their limbs. They do this by decoding bioelectrical signals recorded from cortical areas of interest in order to drive a prosthetic limb. Decoding neural signals is therefore a critical point in BCIs and requires high-performing, reliable and robust algorithms. In many fields these requirements are met by Deep Neural Networks (DNNs). DNNs are adaptive algorithms whose performance scales with the amount of data provided, which fits the growing number of electrodes in implants. I used signals pre-recorded from the cortex of two macaques during reach-to-grasp movements towards 5 different objects. With these, I tested three basic but notable examples of DNNs (a multilayer dense network, a Convolutional Neural Network (CNN) and a Recurrent NN (RNN)) on the task of continuously discriminating, in real time, the intention to move towards each object. In particular, three things were tested. First, the ability of each model to decode a generic intention (single-class). Second, the performance of the best resulting network in discriminating between intentions (multi-class), with or without ensemble learning methods. Third, that network's response to degradation of the input signal. To ease comparison, each network was built and subjected to hyperparameter search following common criteria. The CNN architecture achieved particularly interesting results in the single-class case, with F-scores above 0.6 and AUCs above 0.9. It did so with half the parameters of the other networks, yet with greater robustness. It also showed a quasi-linear relationship with signal degradation, free of unpredictable performance collapses. The DNNs used proved to be high-performing and robust despite their simplicity. This makes purpose-built architectures promising for establishing a new state of the art in neuroprosthetic control.
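The single-class results are reported as F-score and AUC. The sketch below computes the F-score (F1) for a detector that flags, at each time step, whether an intention towards a given object is present. Treating the evaluation as per-time-step binary labels is an assumption about how it was set up, not something the abstract states.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary labels (1 = intention present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical decoder output over 5 time steps:
# 2 TP, 1 FP, 1 FN -> precision = recall = 2/3, so F1 = 2/3.
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Unlike AUC, the F-score depends on the decision threshold used to turn network outputs into 0/1 predictions. The two metrics can therefore rank models differently.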

31

Etienne, Caroline. "Apprentissage profond appliqué à la reconnaissance des émotions dans la voix." [Deep learning applied to emotion recognition in speech.] Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS517.

Abstract:

This thesis deals with the application of artificial intelligence to the automatic classification of audio sequences according to the emotional state of a customer during a phone conversation with a call-center agent. In 2016, the idea was to move beyond the data preprocessing and machine learning models already in use in the laboratory. The aim was to propose a model that performs as well as possible on the reference IEMOCAP audio dataset. We draw on previous work on deep neural networks for automatic speech recognition and extend it to the speech emotion recognition task. We are therefore interested in end-to-end neural architectures that perform the classification task while autonomously extracting acoustic features from the audio signal. Traditionally, the audio signal is preprocessed using paralinguistic features, as part of an expert approach. We choose a naive approach to data preprocessing that does not rely on specialized paralinguistic knowledge, and compare it with the expert approach. In this approach, the raw audio signal is transformed into a time-frequency spectrogram using a short-time Fourier transform. Applying a neural network to a precise prediction task requires several aspects to be considered. On the one hand, the best possible hyperparameters must be identified. On the other hand, biases present in the database should be minimized (non-discrimination), for example by adding data. The characteristics of the chosen dataset must also be taken into account. The aim is to optimize the classification algorithm as far as possible.
We study these aspects for an end-to-end neural architecture that combines convolutional layers, specialized in processing visual information, with recurrent layers, specialized in processing temporal information. We propose a deep supervised learning model that is competitive with the current state of the art on the IEMOCAP dataset, which justifies its use for the rest of the experiments. This classification model consists of a four-layer convolutional neural network and a bidirectional long short-term memory recurrent neural network (BLSTM). Our model is evaluated on two English-language audio databases proposed by the scientific community: IEMOCAP and MSP-IMPROV. A first contribution is to show that, with a deep neural network, we obtain high performance on IEMOCAP and promising results on MSP-IMPROV. Another contribution of this thesis is a comparative study of the output values of the layers of the convolutional module and the recurrent module. The comparison depends on how the speech signal was preprocessed: spectrograms (naive approach) or paralinguistic features (expert approach). Using the Euclidean distance, a deterministic proximity measure, we analyze the data according to the emotion associated with them. We try to understand the characteristics of the emotional information extracted autonomously by the network. The idea is to contribute to research focused on understanding the deep neural networks used in speech emotion recognition. We also aim to bring more transparency and explainability to these systems, whose decision-making mechanism is still largely not understood.
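The naive preprocessing step described above turns raw audio into a time-frequency spectrogram via a short-time Fourier transform. It can be sketched in NumPy as a magnitude spectrogram. The 16 kHz sampling rate, 512-sample Hann window and 160-sample hop are illustrative assumptions, not the thesis's settings.

```python
import numpy as np


def stft_magnitude(signal: np.ndarray, n_fft: int = 512, hop: int = 160) -> np.ndarray:
    """Magnitude spectrogram |STFT| of a 1-D signal, shape (n_frames, n_fft // 2 + 1)."""
    if len(signal) < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(n_fft)  # Hann window to limit spectral leakage
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # keep non-negative frequencies only


# Illustrative input (assumption): 1 s of a 440 Hz tone sampled at 16 kHz.
sr = 16_000
t = np.arange(sr) / sr
spec = stft_magnitude(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)  # (97, 257): 97 frames with a 10 ms hop, 257 frequency bins
```

Each frequency bin spans sr / n_fft = 31.25 Hz, so the 440 Hz tone peaks in bin 14. A logarithmic or mel scaling is often applied on top of this. The abstract does not say whether the thesis does so.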

32

Buratti, Luca. "Visualisation of Convolutional Neural Networks." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Abstract:

Neural Networks, and Convolutional Neural Networks in particular, have recently shown extraordinary results in various fields. Unfortunately, however, there is still no clear understanding of why these architectures work so well, and above all it is hard to explain their behaviour when they fail. This lack of clarity is what keeps these models from being applied in concrete, critical real-life scenarios such as healthcare or self-driving cars. For this reason, several studies have been carried out in recent years to create methods that better explain what is happening inside a neural network, or where the network is looking when it makes a given prediction. These techniques are the focus of this thesis and the bridge between the two case studies it presents. The aim of this work is therefore twofold. The first goal is to use these methods to analyse applications based on convolutional neural networks and so understand how to improve them. The second is to investigate the generalization ability of these architectures, again by means of these methods.

33

Mancevo del Castillo Ayala, Diego. "Compressing Deep Convolutional Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-217316.

Abstract:

Deep Convolutional Neural Networks, and "deep learning" in general, stand at the cutting edge of a range of applications. These range from image-based recognition and classification to natural language processing, speech and speaker recognition, and reinforcement learning. Very deep models, however, are often large, complex and computationally expensive to train and evaluate. Deep learning models are therefore seldom deployed natively in environments where computational resources are scarce or expensive. To address this problem, we turn our attention to a range of techniques that we collectively refer to as "model compression". In these techniques, a lighter student model is trained to approximate the output produced by the model we wish to compress. To this end, the output of the original model is used to craft the training labels for the smaller student model. This work contains experiments on CIFAR-10. It also demonstrates how to use these techniques to compress a people-counting model, improving its precision, recall and F1-score by as much as 14% over our baseline.
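The abstract describes training a student on labels crafted from the original model's outputs. One widely used way to do this is knowledge distillation with temperature-softened outputs. The abstract does not say which variant the thesis uses, so the temperature value and the pure soft-target loss below are assumptions. Raising the temperature flattens the teacher's distribution, exposing how similar it considers the non-target classes.

```python
import numpy as np


def log_softmax(logits, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable log-softmax of logits / temperature along the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def softmax(logits, temperature: float = 1.0) -> np.ndarray:
    return np.exp(log_softmax(logits, temperature))


def distillation_loss(student_logits, teacher_logits, temperature: float = 4.0) -> float:
    """Mean cross-entropy between the teacher's softened outputs (used as labels) and the student's."""
    soft_labels = softmax(teacher_logits, temperature)  # training labels crafted from the big model
    log_q = log_softmax(student_logits, temperature)
    return float(-(soft_labels * log_q).sum(axis=-1).mean())


teacher = np.array([[3.0, 1.0, 0.2]])  # hypothetical logits for one image, 3 classes
student = np.array([[0.5, 2.0, 0.1]])
print(softmax(teacher, 1.0).round(3))  # sharp distribution
print(softmax(teacher, 4.0).round(3))  # softer: relative class similarities become visible
print(distillation_loss(student, teacher) > distillation_loss(teacher, teacher))  # True
```

The loss is smallest when the student reproduces the teacher's softened distribution. In practice this term is often scaled by T² and combined with an ordinary hard-label loss; both refinements are omitted here.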

34

Long, Cameron E. "Quaternion Temporal Convolutional Neural Networks." University of Dayton / OhioLINK, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1565303216180597.

35

Żbikowski, Rafal Waclaw. "Recurrent neural networks: some control aspects." 1994. http://hdl.handle.net/1905/180.

36

Ahamed, Woakil Uddin. "Quantum recurrent neural networks for filtering." Thesis, University of Hull, 2009. http://hydra.hull.ac.uk/resources/hull:2411.

Abstract:

The essence of stochastic filtering is to compute the time-varying probability density function (pdf) for the measurements of the observed system. In this thesis, a filter is designed based on the principles of quantum mechanics, in which the Schrödinger wave equation (SWE) plays the key part. This equation is transformed to fit into the neural network architecture. Each neuron in the network mediates a spatio-temporal field with a unified quantum activation function that aggregates the pdf information of the observed signals. The activation function is the result of the solution of the SWE. Incorporating the SWE into the field of neural networks provides a framework called the quantum recurrent neural network (QRNN). A filter based on this approach is categorized as an intelligent filter, as the underlying formulation is based on an analogy to real neurons. In a QRNN filter, the interaction between the observed signal and the wave dynamics is governed by the SWE. A key issue, therefore, is obtaining a solution of the SWE that ensures the stability of the numerical scheme. Another important aspect in designing this filter is the way the wave function transforms the observed signal through the network. This research has shown that this transformation can be achieved in two different ways (a normal wave and a calm wave, Chapter 5), and that these wave packets play a critical role in the evolution of the pdf. In this context, this thesis investigates the following issues: existing filtering approaches to the evolution of the pdf, the architecture of the QRNN, the method of solving the SWE, the numerical stability of the solution, and the propagation of the waves in the well. The methods developed in this thesis have been tested with relevant simulations. The filter has also been tested on benchmark chaotic series, along with applications to real-world situations. Suggestions are made for the scope of further developments.
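The abstract stresses the numerical stability of the SWE solution without naming a scheme. A standard choice, assumed here, is the Crank-Nicolson discretization of the one-dimensional equation i ∂ψ/∂t = -(1/2) ∂²ψ/∂x² + V(x)ψ, in units where ħ = m = 1. Its update matrix is unitary, so the total probability ∫|ψ|² dx is conserved for any time step. The grid, the harmonic potential well and the initial Gaussian packet are illustrative assumptions. In the QRNN the observed signal would also enter the dynamics, which this sketch leaves out.

```python
import numpy as np


def hamiltonian(x: np.ndarray, potential: np.ndarray) -> np.ndarray:
    """Finite-difference H = -(1/2) d^2/dx^2 + V(x), with hbar = m = 1 and psi = 0 at the edges."""
    dx = x[1] - x[0]
    off = np.full(len(x) - 1, -0.5 / dx**2)
    return np.diag(1.0 / dx**2 + potential) + np.diag(off, 1) + np.diag(off, -1)


def crank_nicolson_step(psi: np.ndarray, h_mat: np.ndarray, dt: float) -> np.ndarray:
    """Solve (I + i dt H / 2) psi_new = (I - i dt H / 2) psi.

    The update is unitary for Hermitian H, so the total probability is
    conserved for any dt: the scheme is unconditionally stable.
    """
    eye = np.eye(len(psi))
    lhs = eye + 0.5j * dt * h_mat
    rhs = (eye - 0.5j * dt * h_mat) @ psi
    return np.linalg.solve(lhs, rhs)


x = np.linspace(-10.0, 10.0, 200)
dx = x[1] - x[0]
potential = 0.5 * x**2  # hypothetical fixed well; a QRNN filter would couple in the observed signal
psi = np.exp(-((x - 2.0) ** 2)).astype(complex)
psi /= np.sqrt(np.sum(np.abs(psi) ** 2) * dx)  # normalise so the pdf integrates to 1
h_mat = hamiltonian(x, potential)
for _ in range(100):
    psi = crank_nicolson_step(psi, h_mat, dt=0.05)
pdf = np.abs(psi) ** 2  # time-evolved probability density
print(np.sum(pdf) * dx)  # stays at 1 up to round-off
```

An explicit forward-Euler step would instead multiply the norm by a factor greater than 1 at every step and eventually diverge. This is why an implicit, norm-preserving scheme is the usual choice for the SWE.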

37

Zbikowski, Rafal Waclaw. "Recurrent neural networks: some control aspects." Thesis, University of Glasgow, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.390233.

38

Jacobsson, Henrik. "Rule extraction from recurrent neural networks." Thesis, University of Sheffield, 2006. http://etheses.whiterose.ac.uk/6081/.

39

Molin, David. "Pedestrian Detection Using Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-120019.

Abstract:

Pedestrian detection is an important field with applications in active safety systems for cars as well as in autonomous driving. Since autonomous driving and active safety are now becoming technically feasible, interest in these applications has increased dramatically. The aim of this thesis is to investigate convolutional neural networks (CNNs) for pedestrian detection, because CNNs have recently been applied successfully to several different computer vision problems. The main applications of pedestrian detection are in real-time systems. For this reason, this thesis investigates strategies for reducing the computational complexity of forward propagation in CNNs. The approach used in this thesis for extracting pedestrians is to use a CNN to find a probability map of where pedestrians are located. Bounding boxes for pedestrians are then generated from this probability map. A method for handling scale invariance for the objects of interest has also been developed in this thesis. Experiments show that this method gives significantly better results for pedestrian detection. The accuracy achieved in this thesis is similar to that of some other works that use CNNs.
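The abstract does not detail how bounding boxes are generated from the pedestrian probability map. One simple way, assumed here for illustration, is to threshold the map and take the bounding box of each connected above-threshold region. The 0.5 threshold and 4-connectivity are assumptions. A real detector would also map the map's coordinates back to image pixels and handle multiple scales.

```python
from collections import deque
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def boxes_from_probability_map(prob: Sequence[Sequence[float]], threshold: float = 0.5) -> List[Box]:
    """Bounding boxes of 4-connected regions where prob >= threshold."""
    rows, cols = len(prob), len(prob[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if prob[r][c] < threshold or seen[r][c]:
                continue
            # Breadth-first flood fill of one region, tracking its extent.
            queue = deque([(r, c)])
            seen[r][c] = True
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and prob[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return boxes


# Hypothetical 3x4 probability map with two separate pedestrian blobs.
prob = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.95],
]
print(boxes_from_probability_map(prob))  # [(0, 0, 1, 1), (2, 3, 2, 3)]
```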

40

Mattsson, Niklas. "Classification Performance of Convolutional Neural Networks." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-305342.

Abstract:

The purpose of this thesis is to determine the performance of convolutional neural networks on the GTX 960 and the Tegra X1. Performance is measured in classifications per millisecond, not in training time or accuracy. This is done by varying the parameters of the convolutional neural networks and using the function profiler of the Python framework Theano to measure the time taken by different networks. The results show that increasing any parameter of the convolutional neural network also increases the time required to classify an image. The parameters do not penalize the network equally, however. Convolutional layers and their depth have a far bigger negative impact on the network's performance than fully connected layers and the number of neurons in them. Additionally, the time needed to train the networks does not appear to correlate with the time needed for classification.

41

Jönsson, Jonatan, and Felix Stenbäck. "Fence surveillance with convolutional neural networks." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-37116.

Abstract:

Broken fences are a major security risk for any facility or area with strict security standards. In this report we suggest a machine learning approach to automate the surveillance of chain-link fences. The main challenge is to classify broken and non-broken fences with the help of a convolutional neural network. The data for this task was gathered by hand. The dataset consists of about 127 videos, 26 minutes long in total, recorded at 23 different locations. The model and dataset are tested on three performance traits: scaling, augmentation improvement and false rate. In these tests we concluded that nearest neighbor increased accuracy. Classifying fences that had been included in the training data gave a low false rate of about 1%. Classifying fences that are unknown to the model produced a false rate of about 90%. From these results we conclude that this method and dataset are useful under the right circumstances, but not in an unknown environment.

42

Katrenko, Maksim, and Максим Олександрович Катренко. "Convolutional neural networks during object identification." Thesis, National Aviation University, 2021. https://er.nau.edu.ua/handle/NAU/50753.

Abstract:

Nowadays, convolutional neural networks perform very well at identifying objects. Unfortunately, they have significant problems that are very difficult to overcome. Convolutional networks use multiple layers of feature detectors. Each feature detector is local, so feature detectors are replicated across space. Pooling gives some translational invariance in much deeper layers, but only in a crude way. The psychology of shape perception suggests that the human brain achieves translational invariance in a much better way. We know that, roughly speaking, the brain has two separate pathways, a “what” pathway and a “where” pathway. Neurons in the “what” pathway respond to a particular type of stimulus regardless of where it is in the visual field. Neurons in the “where” pathway are responsible for encoding where things are. As a side note, it is hypothesized that the “what” pathway has a lower resolution than the “where” pathway.
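The claim that pooling gives only crude translational invariance can be checked with a toy example. In the sketch below, non-overlapping 1D max pooling absorbs a one-step shift when the active feature stays inside the same pooling window. The same shift changes the output when the feature crosses a window boundary. The signals are hypothetical.

```python
from typing import List


def max_pool_1d(signal: List[float], size: int = 2) -> List[float]:
    """Non-overlapping 1D max pooling (stride equal to the window size)."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]


feature = [0, 1, 0, 0, 0, 0]    # a detector fires at position 1
shift_in = [1, 0, 0, 0, 0, 0]   # shifted by one, still inside the same window
shift_out = [0, 0, 1, 0, 0, 0]  # shifted by one, crossing into the next window

print(max_pool_1d(feature))    # [1, 0, 0]
print(max_pool_1d(shift_in))   # [1, 0, 0]  unchanged: invariant
print(max_pool_1d(shift_out))  # [0, 1, 0]  changed: not invariant
```

Whether a shift is absorbed therefore depends on where the feature falls relative to the pooling grid. In this sense the invariance is crude.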

43

Nikzad, Dehaji Mohammad. "Structural Improvements of Convolutional Neural Networks." Thesis, Griffith University, 2021. http://hdl.handle.net/10072/410448.

Abstract:

Over the last decade, deep learning has demonstrated outstanding performance in almost every application domain. Among the different types of deep frameworks, convolutional neural networks (CNNs), inspired by the biological process of the visual system, can learn to extract discriminative features from raw inputs without any prior manipulation. However, efficient information circulation and the ability to explore effective new features remain two key challenges for a successful deep neural network. In this thesis, we present novel structural improvements to CNN frameworks that make their feature exploration and exploitation more effective and efficient. To this end, we first propose a novel residual-dense lattice network (RDL-Net), a 2-dimensional triangular lattice of convolutional units connected by residual and dense connections. RDL-Net harnesses the advantages of both residual and dense aggregation without over-allocating parameters for feature reuse. This property improves the network's capacity to extract and exploit features both effectively and efficiently. Furthermore, our extensive experiments on processing 1D sequential speech signals show that RDL-Nets can achieve higher speech enhancement performance than many state-of-the-art CNN-based speech enhancement approaches. Next, we modify the RDL topology so that it applies to spatial (2D) signals. Inspired by the RDL-Net design, we present an attention-based pyramid dilated lattice network (APDL-Net) for blind image denoising. The proposed framework employs a novel pyramid dilated convolution strategy alongside a channel-wise attention mechanism to capture contextual information corresponding to different noise levels while training a single model. Extensive empirical studies on image denoising and JPEG artifact suppression tasks verify the effectiveness and efficiency of the APDL architecture.
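A dilated convolution spaces its kernel taps `dilation` samples apart. This enlarges the context seen by each output without adding weights, and running several dilation rates in parallel is one way to read "pyramid" here. The minimal 1D sketch below shows only that generic building block. It is not APDL-Net, and the helper names and kernel are hypothetical.

```python
from typing import Dict, List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """'Valid' 1D cross-correlation whose kernel taps are `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # input samples covered by one output
    return [
        sum(w * x[i + j * dilation] for j, w in enumerate(kernel))
        for i in range(len(x) - span + 1)
    ]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of a stack of stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


x = [float(v) for v in range(8)]
print(dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2))  # [6.0, 9.0, 12.0, 15.0]
print(receptive_field(3, [1, 2, 4]))                   # 15

# Parallel branches with different dilation rates over the same input.
# Without padding, larger dilations yield shorter outputs.
pyramid: Dict[int, List[float]] = {d: dilated_conv1d(x, [1.0, 1.0, 1.0], d) for d in (1, 2, 3)}
```

Stacking three 3-tap layers with dilations 1, 2 and 4 covers 15 samples, compared with 7 samples for three undilated layers.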
We also investigate the capability of the lattice topology for hyperspectral image classification. For this purpose, we introduce a new attention-based lattice network (ALN) that uses a joint spectral-spatial attention mechanism to capture spectral and spatial information effectively. The proposed ALN achieves better accuracy and computational efficiency than state-of-the-art deep learning benchmark approaches for hyperspectral image classification. In addition to these architectural improvements of CNNs, we propose a novel channel-wise spatially autocorrelated (CSA) attention mechanism inspired by geographical analysis. The proposed CSA exploits the spatial relationships between feature-map channels. It also employs a hybrid spatial contiguity measure based on directional metrics to measure the degree of spatial closeness between feature maps. Furthermore, CSA adds a negligible number of learnable parameters and little computational overhead to the deep model, which makes it a powerful yet efficient attention module. Experimental results on large-scale image classification and object detection datasets demonstrate that CSA-Nets consistently outperform various state-of-the-art attention-based CNNs. Besides these architectural and attention-based advances, this research presents a simple, novel feature pooling method called gradient-based pooling (GP). This method uses the spatial gradient of the pixels within a pooling region to select potentially discriminative information. In contrast, other common pooling methods mostly rely on pixel values. Experiments on different benchmark image classification tasks demonstrate the superiority of GP over other pooling methods.
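The abstract defines gradient-based pooling only loosely, as choosing values by spatial gradient rather than by magnitude. The sketch below is one hypothetical reading and not the thesis's definition. It computes an L1 gradient magnitude from forward differences and keeps, in each window, the pixel with the largest gradient.

```python
from typing import List

Grid = List[List[float]]


def gradient_magnitude(img: Grid) -> Grid:
    """L1 gradient magnitude from forward differences (0 at the far edges)."""
    rows, cols = len(img), len(img[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = img[r][c + 1] - img[r][c] if c + 1 < cols else 0.0
            gy = img[r + 1][c] - img[r][c] if r + 1 < rows else 0.0
            grad[r][c] = abs(gx) + abs(gy)
    return grad


def gradient_pool(img: Grid, size: int = 2) -> Grid:
    """Hypothetical GP: in each non-overlapping window, keep the pixel whose
    gradient magnitude is largest (ties go to the first in row-major order)."""
    grad = gradient_magnitude(img)
    out = []
    for r in range(0, len(img) - size + 1, size):
        row = []
        for c in range(0, len(img[0]) - size + 1, size):
            window = [(grad[r + i][c + j], img[r + i][c + j])
                      for i in range(size) for j in range(size)]
            row.append(max(window, key=lambda gv: gv[0])[1])
        out.append(row)
    return out


img = [[7, 7, 7],
       [7, 2, 7],
       [7, 7, 7]]
print(gradient_pool(img))  # [[2]]: the dark spot, where 2x2 max pooling would give 7
```

In this example the pooled value comes from the pixel that differs from its surroundings rather than from the brightest pixel, which matches the abstract's stated motivation.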
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Eng & Built Env
Science, Environment, Engineering and Technology

44

Gousseau, Clément. "Hyperparameter Optimization for Convolutional Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-272107.

Abstract:

Training algorithms for artificial neural networks depend on parameters called hyperparameters. Hyperparameters can strongly influence the trained model, but they are often chosen manually through trial-and-error experiments. This thesis, conducted at Orange Labs Lannion, presents and evaluates three algorithms that aim to automate this task: a naive approach (random search), a Bayesian approach (Tree-structured Parzen Estimator) and an evolutionary approach (Particle Swarm Optimization). A well-known dataset for handwritten digit recognition (MNIST) is used to compare these algorithms. The algorithms are also evaluated on audio classification, which is one of the main activities of the company team where the thesis was conducted. The evolutionary algorithm (PSO) showed better results than the two other methods.
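The sketch below shows a minimal global-best Particle Swarm Optimization of the kind compared in the thesis. The coefficient values (`w`, `c1`, `c2`) and the quadratic stand-in objective are assumptions. In an actual hyperparameter search, each call to `f` would train a network and return its validation error, which is far more expensive.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(f: Callable[[List[float]], float], dim: int, bounds: Tuple[float, float],
                 n_particles: int = 20, iters: int = 100,
                 w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Minimal global-best PSO over a box-shaped search space."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]               # each particle's personal best
    best_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]   # swarm's global best

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move, then clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Toy stand-in for "validation error as a function of two hyperparameters".
best, err = pso_minimize(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2, dim=2, bounds=(-5.0, 5.0))
```

Random search would instead sample each candidate independently from the same box. PSO differs by letting each particle's next sample depend on the best points found so far.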

45

Barhoumi, Amira. "Une approche neuronale pour l’analyse d’opinions en arabe." Thesis, Le Mans, 2020. http://www.theses.fr/2020LEMA1022.

Abstract:

My thesis is part of the field of Arabic sentiment analysis. Its aim is to determine the global polarity of a given textual statement written in Modern Standard Arabic (MSA) or in dialectal Arabic. This research area has been the subject of numerous studies, most of which deal with Indo-European languages, in particular English. One of the difficulties confronting this thesis is the processing of Arabic. Arabic is a morphologically rich language, which implies greater sparsity. We want to overcome this problem by producing new Arabic-specific embeddings in a completely automatic way. Our study focuses on the use of a neural approach to improve polarity detection using embeddings, which have proven fundamental in various natural language processing (NLP) tasks. Our contribution in this thesis concerns several axes. First, we begin with a preliminary study of the various existing pre-trained word embedding resources for Arabic. These embeddings treat words as space-separated units in order to capture semantic and syntactic similarities in the embedding space. Second, we focus on the specificities of the Arabic language and propose Arabic-specific embeddings that take into account the agglutination and morphological richness of Arabic. These specific embeddings have been used, alone and in combination, as input to two neural networks (one convolutional and one recurrent), improving polarity detection performance on a corpus of reviews. Finally, we evaluate the embeddings with intrinsic and extrinsic methods specific to the sentiment analysis task. For intrinsic evaluation, we propose a new protocol introducing the notion of sentiment stability in the embedding space. We also propose a qualitative extrinsic analysis of our embeddings using projection and visualization methods.
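The abstract describes embeddings fed to a convolutional network for polarity detection. The sketch below shows the generic pattern: look up one vector per token, slide filters over the sequence, and keep each filter's maximum response ("max-over-time" pooling). It is not the thesis's architecture. The English toy tokens, 2-dimensional vectors and filter weights are all hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def embed(tokens: List[str], table: Dict[str, Vector]) -> List[Vector]:
    """Look up one embedding vector per token."""
    return [table[t] for t in tokens]


def conv_max_over_time(embeddings: List[Vector], filters: List[List[Vector]]) -> List[float]:
    """Slide each filter (a list of `width` weight vectors) over the token
    embeddings and keep its maximum response: one feature per filter."""
    features = []
    for filt in filters:
        width = len(filt)
        responses = [
            sum(w * e for row, tok in zip(filt, embeddings[i:i + width])
                for w, e in zip(row, tok))
            for i in range(len(embeddings) - width + 1)
        ]
        features.append(max(responses))
    return features


table = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "film": [0.0, 1.0], "very": [0.0, 0.5]}
filters = [
    [[0.0, 1.0], [1.0, 0.0]],  # width 2: responds to an intensifier followed by a positive word
    [[-1.0, 0.0]],             # width 1: responds to negative words
]
features = conv_max_over_time(embed(["very", "good", "film"], table), filters)
print(features)  # [1.5, 0.0]
```

In a trained model, the filters are learned and the pooled features feed a classifier that outputs the polarity.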

46

Bonato, Tommaso. "Time Series Predictions With Recurrent Neural Networks." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2018.

Abstract:

The main objective of this thesis is to study how machine learning algorithms, and in particular LSTM (Long Short-Term Memory) neural networks, can be used to predict future values of a regular time series, such as the sine and cosine functions. A time series is defined as a sequence of observations s_t ordered in time. We also try to apply the same principles to predict the values of a time series built from the sales data of a cosmetic product over a period of three years. Before reaching the practical part of this thesis, it is necessary to introduce some fundamental concepts that are needed to develop the architecture and code of our model. In both the theoretical introduction and the practical part, the focus is on RNNs (Recurrent Neural Networks), since they are the neural networks best suited to this type of problem. A particular type of RNN, called Long Short-Term Memory (LSTM), is the main subject of this thesis, and one of its variants, the Gated Recurrent Unit (GRU), is also presented and used. In conclusion, this thesis confirms that LSTM and GRU are the best type of neural network for time-series prediction. In the last part, we analyze the differences between using a CPU and a GPU during the training phase of the neural network.
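To predict a series such as the sine function, the series is usually first cut into windows of past values paired with the next value. The sketch below shows that step and one forward step of an LSTM cell using the standard gate equations. It uses a scalar (size-1) cell for readability. All names and the untrained weights are hypothetical, and in practice the weights would be learned by backpropagation through time.

```python
import math
from typing import Dict, List, Tuple


def make_windows(series: List[float], window: int) -> List[Tuple[List[float], float]]:
    """Turn s_t into supervised pairs: (s_{t-window}, ..., s_{t-1}) -> s_t."""
    return [(series[t - window:t], series[t]) for t in range(window, len(series))]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h: float, c: float, p: Dict[str, float]) -> Tuple[float, float]:
    """One step of a scalar LSTM cell (input, hidden and cell size 1)."""
    i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])  # candidate cell value
    c = f * c + i * g                                   # keep part of old memory, add new
    h = o * math.tanh(c)                                # exposed hidden state
    return h, c


series = [math.sin(0.1 * t) for t in range(50)]
pairs = make_windows(series, window=10)

# Hypothetical untrained weights, all 0.5.
params = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                           "wo", "uo", "bo", "wg", "ug", "bg")}
h, c = 0.0, 0.0
for x in pairs[0][0]:  # feed the first window, one value per time step
    h, c = lstm_step(x, h, c, params)
```

A trained model would map the final `h` through a linear output layer to a prediction of `pairs[0][1]`. A GRU replaces the three gates and separate cell state with two gates acting on `h` alone.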

47

Brax, Christoffer. "Recurrent neural networks for time-series prediction." Thesis, University of Skövde, Department of Computer Science, 2000. http://urn.kb.se/resolve?urn=urn:nbn:se:his:diva-480.

Abstract:

Recurrent neural networks have been used for time-series prediction with good results. In this dissertation, recurrent neural networks are compared with time-delayed feed-forward networks, feed-forward networks and linear regression models on a prediction task. The data used in all experiments is real-world sales data containing two kinds of segments: campaign segments and non-campaign segments. The task is to predict sales during campaigns. The dissertation also evaluates whether more accurate predictions can be made using only the campaign segments of the data.

Throughout the project, a knowledge discovery process identified in the literature was used to structure the work. The results show that the recurrent network is not better than the other evaluated algorithms. In fact, the time-delayed feed-forward neural network gave the best predictions. The results also show that more accurate predictions could be made when using only information from campaign segments.
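A time-delayed feed-forward network receives a fixed window of lagged values through a tapped delay line rather than carrying a recurrent state. The linear regression baseline can be fitted on the same lags. The sketch below shows both ideas: building delay-line inputs and an ordinary-least-squares fit of a one-lag linear model. The helper names and the synthetic series are hypothetical, not the dissertation's data or setup.

```python
from typing import List, Tuple


def delay_line(series: List[float], taps: int) -> List[Tuple[List[float], float]]:
    """Input/target pairs as seen by a time-delayed network: the last `taps`
    values (most recent first) and the next value to predict."""
    return [(series[t - taps:t][::-1], series[t]) for t in range(taps, len(series))]


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for s_t = a * s_{t-1} + b (one-lag linear model)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    a = cov / var
    return a, my - a * mx


# Synthetic series that follows s_t = 0.5 * s_{t-1} + 2 exactly.
s = [0.0]
for _ in range(20):
    s.append(0.5 * s[-1] + 2.0)

print(delay_line([1.0, 2.0, 3.0, 4.0], taps=2))  # [([2.0, 1.0], 3.0), ([3.0, 2.0], 4.0)]
a, b = fit_ar1(s)
print(round(a, 6), round(b, 6))  # 0.5 2.0
```

A time-delayed feed-forward network replaces the linear map with a nonlinear one over the same delay-line inputs.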

48

Ljungehed, Jesper. "Predicting Customer Churn Using Recurrent Neural Networks." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210670.

Abstract:

Churn prediction is used to identify customers who are becoming less loyal. It is an important tool for companies that want to stay competitive in a rapidly growing market. In retail, a dynamic definition of churn is needed to identify churners correctly. Customer Lifetime Value (CLV) is the monetary value of a customer relationship. When a given customer's CLV stops changing, this indicates a decrease in loyalty. This thesis proposes a novel approach to churn prediction, in which a Recurrent Neural Network identifies churners through regression on Customer Lifetime Value time series. The results show that the model performs better than random. This thesis also investigated the use of the K-means algorithm as a replacement for a rule-extraction algorithm. The K-means algorithm provided a more comprehensive analytical context for the churn predictions of the proposed model.
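The abstract does not specify how K-means was applied. The sketch below shows plain Lloyd's K-means on hypothetical per-customer features: average CLV change and CLV change in the latest period. Under the abstract's definition, a cluster with CLV changes near zero would correspond to likely churners. The features, seeding rule and data are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Plain Lloyd's K-means in 2D; the first k points seed the centroids."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                      + (p[1] - centroids[j][1]) ** 2) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels


# Hypothetical customers: (average CLV change, CLV change in the latest period).
customers = [(5.0, 4.0), (0.1, 0.0), (4.5, 5.0), (0.0, 0.2), (5.5, 4.5), (0.2, 0.1)]
centroids, labels = kmeans(customers, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]: cluster 1 holds the customers whose CLV has stalled
```

Seeding with the first k points keeps the example deterministic. Implementations usually use random or k-means++ seeding instead.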

49

Rabi, Gihad. "Visual speech recognition by recurrent neural networks." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0010/MQ36169.pdf.

50

Miller, Paul Ian. "Recurrent neural networks and adaptive motor control." Thesis, University of Stirling, 1997. http://hdl.handle.net/1893/21520.

Abstract:

This thesis is concerned with the use of neural networks for motor control tasks. The main goal of the thesis is to investigate ways in which the biological notions of motor programs and Central Pattern Generators (CPGs) may be implemented in a neural network framework. Biological CPGs can be seen as components within a larger control scheme, which is basically modular in design. In this thesis, these ideas are investigated through the use of modular recurrent networks, which are applied to a variety of control tasks. The first experimental chapter deals with learning in recurrent networks and shows that CPGs may be easily implemented using the machinery of backpropagation. Using these CPGs can aid the learning of pattern generation tasks. It can also allow the other components in the system to be reduced in complexity, for example to a purely feedforward network. It is also shown that incremental learning, or 'shaping', is an effective method for building CPGs. Genetic algorithms (GAs) are also used to build CPGs. Although the computational effort prevents this from being a practical method, it does show that GAs are capable of optimising systems that operate in the context of a larger scheme. One interesting result from the GA is that optimal CPGs tend to have unstable dynamics, which may have implications for building modular neural controllers. The next chapter applies these ideas to some simple control tasks involving a highly redundant simulated robot arm. It is shown that it is relatively straightforward to build CPGs that represent elements of pattern generation, constraint satisfaction and local feedback. This is indirect control, in which errors are backpropagated through a plant model, as well as through the CPG itself, to give errors for the controller. Finally, the third experimental chapter takes an alternative approach and uses direct control methods, such as reinforcement learning.
In reinforcement learning, controller outputs have unmodelled effects; this allows us to build complex control systems, where outputs modulate the couplings between sets of dynamic systems. This was shown for a simple case, involving a system of coupled oscillators. A second set of experiments investigates the use of simplified models of behaviour; this is a reduced form of supervised learning, and the use of such models in control is discussed.
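The abstract mentions controller outputs that modulate the couplings between coupled oscillators. The sketch below uses a standard two-oscillator phase model (Kuramoto-style sine coupling), which is an assumption and not the thesis's system. It shows how the coupling strength `k` alone decides whether two oscillators with different natural frequencies settle into a fixed phase relation. The phase difference obeys d(phi)/dt = (omega2 - omega1) - 2k*sin(phi), so it locks at asin((omega2 - omega1) / (2k)) whenever 2k > |omega2 - omega1|.

```python
import math
from typing import Tuple


def simulate_coupled_oscillators(omega1: float, omega2: float, k: float,
                                 theta1: float = 0.0, theta2: float = 2.0,
                                 dt: float = 0.01, steps: int = 5000) -> Tuple[float, float]:
    """Euler-integrate two phase oscillators with symmetric sine coupling:
    d(theta1)/dt = omega1 + k*sin(theta2 - theta1)
    d(theta2)/dt = omega2 + k*sin(theta1 - theta2)"""
    for _ in range(steps):
        d1 = omega1 + k * math.sin(theta2 - theta1)
        d2 = omega2 + k * math.sin(theta1 - theta2)
        theta1, theta2 = theta1 + dt * d1, theta2 + dt * d2
    return theta1, theta2


def phase_difference(theta1: float, theta2: float) -> float:
    """Wrap theta2 - theta1 into (-pi, pi]."""
    return math.atan2(math.sin(theta2 - theta1), math.cos(theta2 - theta1))


# Identical frequencies, positive coupling: the oscillators synchronise.
print(phase_difference(*simulate_coupled_oscillators(1.0, 1.0, k=1.0)))
# Different frequencies: they lock at a constant offset of about asin(0.2).
print(phase_difference(*simulate_coupled_oscillators(1.0, 1.2, k=0.5)), math.asin(0.2))
```

In the thesis's setting, a controller would set a coupling such as `k` over time rather than fixing it, shaping the joint rhythm of the oscillators.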
