Dissertations / Theses on the topic 'Reinforcement learning (Machine learning)'
Consult this list of 50 dissertations and theses for your research on the topic 'Reinforcement learning (Machine learning)'. Full texts and abstracts are linked where available.
Hengst, Bernhard. "Discovering hierarchy in reinforcement learning." Thesis, University of New South Wales, Computer Science and Engineering, 2003. http://handle.unsw.edu.au/1959.4/20497.
Tabell Johnsson, Marco, and Ala Jafar. "Efficiency Comparison Between Curriculum Reinforcement Learning & Reinforcement Learning Using ML-Agents." Thesis, Blekinge Tekniska Högskola, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20218.
Akrour, Riad. "Robust Preference Learning-based Reinforcement Learning." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112236/document.
The thesis contributions revolve around sequential decision making, and more precisely Reinforcement Learning (RL). Rooted in Machine Learning in the same way as supervised and unsupervised learning, RL has quickly grown in popularity over the last two decades thanks to a handful of achievements on both the theoretical and the applied front. RL assumes that the learning agent and its environment follow a stochastic Markovian decision process over a state and action space. The process is a decision process because the agent must choose an action at each time step. It is stochastic because selecting a given action in a given state does not always yield the same next state but instead defines a distribution over the state space. It is Markovian because this distribution depends only on the current state-action pair. After each action the agent receives a reward. The goal of RL is then to solve the underlying optimization problem of finding the behaviour that maximizes the sum of rewards along the agent's interaction with its environment. From an applied point of view, a wide spectrum of problems can be cast as RL, from Backgammon (TD-Gammon, one of Machine Learning's first successes, produced a world-class player) to decision problems in industry and medicine. However, the optimization problem solved by RL depends on the prior definition of a reward function, which requires a certain level of domain expertise as well as knowledge of the internal quirks of RL algorithms. The first contribution of the thesis is therefore a learning framework that lightens the requirements placed on the user: the user no longer needs to know the exact solution of the problem, only to choose, between two behaviours exhibited by the agent, the one that more closely matches the solution. Learning is interactive between the agent and the user and revolves around the following three points: i) the agent demonstrates a behaviour; ii) the user compares it with the current best one; iii) the agent uses this feedback to update its preference model of the user and to select the next behaviour to demonstrate. To reduce the number of interactions needed before finding the optimal behaviour, the second contribution of the thesis is a theoretically sound criterion that trades off the sometimes contradictory desiderata of complying with the user's preferences and of demonstrating sufficiently different behaviours. The last contribution ensures the robustness of the algorithm with respect to the feedback errors that the user may make, which happen more often than not in practice, especially in the initial phase of the interaction, when all behaviours are far from the expected solution.
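The interactive loop this abstract describes (demonstrate, compare, update) can be illustrated with a toy preference-guided search. The sketch below is an assumption-laden illustration, closer to a noisy preference-based hill climber than to the thesis's actual algorithm: behaviours are reduced to parameter vectors, and the user is simulated by distance to a hidden target.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=4)        # hidden ideal behaviour; only the simulated user "knows" it

def user_prefers(a, b, error_rate=0.1):
    """Simulated user: prefers the behaviour closer to the target,
    but makes occasional feedback errors, as discussed in the abstract."""
    better = np.linalg.norm(a - target) < np.linalg.norm(b - target)
    return better if rng.random() > error_rate else not better

best = rng.normal(size=4)          # current best demonstrated behaviour
for _ in range(300):
    candidate = best + 0.3 * rng.normal(size=4)   # i) demonstrate a new behaviour
    if user_prefers(candidate, best):             # ii) the user compares the two
        best = candidate                          # iii) the feedback drives the update
print("distance to hidden target:", np.linalg.norm(best - target))
```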
Lee, Siu-keung (李少強). "Reinforcement learning for intelligent assembly automation." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2002. http://hub.hku.hk/bib/B31244397.
Tebbifakhr, Amirhossein. "Machine Translation For Machines." Doctoral thesis, Università degli studi di Trento, 2021. http://hdl.handle.net/11572/320504.
Yang, Zhaoyuan. "Adversarial Reinforcement Learning for Control System Design: A Deep Reinforcement Learning Approach." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu152411491981452.
Scholz, Jonathan. "Physics-based reinforcement learning for autonomous manipulation." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54366.
Cleland, Andrew Lewis. "Bounding Box Improvement with Reinforcement Learning." PDXScholar, 2018. https://pdxscholar.library.pdx.edu/open_access_etds/4438.
Piano, Francesco. "Deep Reinforcement Learning con PyTorch." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25340/.
Suggs, Sterling. "Reinforcement Learning with Auxiliary Memory." BYU ScholarsArchive, 2021. https://scholarsarchive.byu.edu/etd/9028.
Jesu, Alberto. "Reinforcement learning over encrypted data." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23257/.
Gustafsson, Robin, and Lucas Fröjdendahl. "Machine Learning for Traffic Control of Unmanned Mining Machines : Using the Q-learning and SARSA algorithms." Thesis, KTH, Hälsoinformatik och logistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260285.
Manual configuration of traffic control for unmanned mining machines can be a time-consuming process. If this configuration could be automated, there would be gains in both time and cost. This thesis presents a machine-learning solution using Q-learning and SARSA. The results show that the configuration time could potentially be cut from 1-2 weeks to, in the worst case, 6 hours, which would reduce the cost of deployment. Tests showed that the final solution could run continuously for 24 hours with at least 82% accuracy, compared to 100% when the manual configuration is used. The conclusion is that machine learning may be usable for automatic configuration of traffic control. Further work is needed to raise the accuracy to 100% so that it can replace manual configuration, and more studies should examine whether this also holds for more complex scenarios with larger mine layouts and more machines.
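For reference, the two update rules named in this thesis differ only in how they bootstrap. A minimal sketch, with hypothetical traffic-control states and actions (the thesis's own state and action spaces are not described here):

```python
import random
from collections import defaultdict

Q = defaultdict(float)                 # Q[(state, action)] -> estimated value
ACTIONS = ["stop", "go"]               # hypothetical traffic-control actions
alpha, gamma, eps = 0.1, 0.99, 0.1

def eps_greedy(state):
    """Behaviour policy used while collecting experience."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s2):
    # Off-policy: bootstrap from the best action in the next state.
    target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s2, a2):
    # On-policy: bootstrap from the action actually taken in the next state.
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```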
Mariani, Tommaso. "Deep reinforcement learning for industrial applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20548/.
Cleland, Benjamin George. "Reinforcement Learning for Racecar Control." The University of Waikato, 2006. http://hdl.handle.net/10289/2507.
Suay, Halit Bener. "Reinforcement Learning from Demonstration." Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-dissertations/173.
Pipe, Anthony Graham. "Reinforcement learning and knowledge transformation in mobile robotics." Thesis, University of the West of England, Bristol, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.364077.
Chalup, Stephan Konrad. "Incremental learning with neural networks, evolutionary computation and reinforcement learning algorithms." Thesis, Queensland University of Technology, 2001.
Le Piane, Fabio. "Training cognitivo adattativo mediante Reinforcement Learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/17289/.
Rouet-Leduc, Bertrand. "Machine learning for materials science." Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/267987.
Addis, Antonio. "Deep reinforcement learning optimization of video streaming." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.
Janagam, Anirudh, and Saddam Hossen. "Analysis of Network Intrusion Detection System with Machine Learning Algorithms (Deep Reinforcement Learning Algorithm)." Thesis, Blekinge Tekniska Högskola, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-17126.
Weideman, Ryan. "Robot Navigation in Cluttered Environments with Deep Reinforcement Learning." DigitalCommons@CalPoly, 2019. https://digitalcommons.calpoly.edu/theses/2011.
Pasqualini, Luca. "Real World Problems through Deep Reinforcement Learning." Doctoral thesis, Università di Siena, 2022. http://hdl.handle.net/11365/1192945.
Song, Yupu. "A Forex Trading System Using Evolutionary Reinforcement Learning." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-theses/1240.
Mitchell, Matthew Winston. "An architecture for situated learning agents." Monash University, School of Computer Science and Software Engineering, 2003. http://arrow.monash.edu.au/hdl/1959.1/5553.
Tarbouriech, Jean. "Goal-oriented exploration for reinforcement learning." Electronic thesis, Université de Lille, 2022. http://www.theses.fr/2022ULILB014.
Learning to reach goals is a competence of high practical relevance for intelligent agents to acquire. It encompasses many navigation tasks ("go to target X"), robotic manipulation ("attain position Y of the robotic arm"), and game-playing scenarios ("win the game by fulfilling objective Z"). As a living being interacting with the world, I am constantly driven by goals to reach, varying in scope and difficulty. Reinforcement Learning (RL) holds the promise to frame and learn goal-oriented behavior. Goals can be modeled as specific configurations of the environment that must be attained via sequential interaction and exploration of the unknown environment. Although various deep RL algorithms have been proposed for goal-oriented RL, existing methods often lack principled understanding, sample efficiency and general-purpose effectiveness; indeed, very little theoretical analysis of goal-oriented RL was available, even in the basic scenario of finitely many states and actions. We first focus on a supervised scenario of goal-oriented RL, where a goal state to be reached in minimum total expected cost is provided as part of the problem definition. After formalizing the online learning problem in this setting, often known as Stochastic Shortest Path (SSP), we introduce two no-regret algorithms (one is the first available in the literature, the other attains nearly optimal guarantees). Beyond training our RL agent to solve only one task, we then aspire for it to learn to autonomously solve a wide variety of tasks, in the absence of any reward supervision. In this challenging unsupervised RL scenario, we advocate to "Set Your Own Goals" (SYOG), which suggests that the agent learn the ability to intrinsically select and reach its own goal states. We derive finite-time guarantees of this popular heuristic in various settings, each with its specific learning objective and technical challenges. As an illustration, we propose a rigorous analysis of the algorithmic principle of targeting "uncertain" goals, which we also anchor in deep RL. The main focus and contribution of this thesis is to instigate a principled analysis of goal-oriented exploration in RL, both in the supervised and unsupervised scenarios. We hope that it helps suggest promising research directions to improve the interpretability and sample efficiency of goal-oriented RL algorithms in practical applications.
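The SSP objective mentioned in this abstract can be written compactly. The notation below is a standard formulation assumed for illustration, not taken from the thesis:

```latex
% Stochastic Shortest Path: reach the goal g at minimum total expected cost.
% \tau_g is the (random) first time step at which g is reached under policy \pi,
% and c(s_t, a_t) is the per-step cost.
\min_{\pi} \; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\tau_g - 1} c(s_t, a_t) \;\middle|\; s_0 \right]
```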
Irani, Arya John. "Utilizing negative policy information to accelerate reinforcement learning." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53481.
Tham, Chen Khong. "Modular on-line function approximation for scaling up reinforcement learning." Thesis, University of Cambridge, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.309702.
Dönmez, Halit Anil. "Collision Avoidance for Virtual Crowds Using Reinforcement Learning." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210560.
Virtual crowd simulation is used in a wide range of applications such as video games, architectural design, and film. It is important for creators to have a realistic crowd simulator that can generate the crowds needed to display the required behaviours, and to provide an easy-to-use crowd-generation tool that is fast and realistic. Reinforcement learning was proposed for training an agent to display a given behaviour. In this thesis, a reinforcement learning method was implemented to evaluate virtual crowds, with Q-learning chosen as the method and two different versions of it implemented. These versions were evaluated against state-of-the-art algorithms: Reciprocal Velocity Obstacles (RVO) and a copy-synthesis approach based on real data. The crowds were evaluated in a user study. The results showed that while the reinforcement learning method was not perceived as being as realistic as real crowds, it was perceived as almost as realistic as crowds generated with Reciprocal Velocity Obstacles. Another finding was that the perception of RVO changes with the environment: when only the paths were shown, it was perceived as more natural than when shown in a real-world environment with pedestrians. It was concluded that using Q-learning to generate crowds is a promising method that can be improved as a replacement for existing methods, and in some scenarios the Q-learning algorithm produced better collision avoidance and more realistic crowd simulation.
Svensson, Frida. "Scalable Distributed Reinforcement Learning for Radio Resource Management." Thesis, Linköpings universitet, Tillämpad matematik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177822.
There is great potential for automation and optimization in radio access networks (RAN) by using data-driven solutions to efficiently handle the increased complexity caused by growing traffic and the new technologies introduced with 5G. Reinforcement learning (RL) has natural connections to control problems on different time scales, such as link adaptation, interference management, and power control, all common in radio networks. Raising the status of data-driven solutions in radio networks will be necessary to meet the challenges of future 5G networks. In this work, we propose a systematic methodology for applying RL to a control problem. The proposed methodology is first applied to a well-known control problem and is later adapted to a real RAN scenario. The work includes extensive simulation results to show the effectiveness and potential of the proposed method. A workable methodology was created, but the results on the RAN simulator lacked maturity.
Larsson, Hannes. "Deep Reinforcement Learning for Cavity Filter Tuning." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-354815.
Renner, Michael Robert. "Machine Learning Simulation: Torso Dynamics of Robotic Biped." Thesis, Virginia Tech, 2007. http://hdl.handle.net/10919/34602.
Master of Science
Nikolic, Marko. "Single asset trading: a recurrent reinforcement learning approach." Thesis, Mälardalens högskola, Akademin för utbildning, kultur och kommunikation, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-47505.
Emenonye, Don-Roberts Ugochukwu. "Application of Machine Learning to Multi Antenna Transmission and Machine Type Resource Allocation." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/99956.
Master of Science
Wireless communication systems are a well-researched area of engineering that has continually evolved over the past decades. This constant evolution and development has led to well-formulated theoretical baselines in terms of reliability and efficiency. This two-part thesis investigates the possibility of improving these wireless systems with machine learning. First, with the goal of designing more resilient codes for transmission, we propose to redesign the transmit and receive blocks of the physical layer. We focus on jointly optimizing the transmit and receive blocks to produce a set of transmit codes that are resilient to channel impairments, and we compare our results to the conventional codes for various transmit and receive antenna configurations. The second part of this work investigates the possibility of designing a distributed multi-access scheme for machine-type devices (MTDs). In this scheme, MTDs transmit their data pseudo-randomly by selecting random time slots, which can result in a large number of collisions occurring within these slots. To alleviate the resulting congestion, we employ a heterogeneous network and investigate the optimal MTD-to-base-station (MTD-BS) association that minimizes the long-term congestion experienced in the overall network. Our results show that the optimal MTD-BS association can be derived when the number of MTDs is less than the total number of slots.
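The first part of this abstract, jointly optimizing the transmit and receive blocks, is commonly formulated in the literature as an end-to-end autoencoder trained through a noisy channel. A minimal sketch under that assumption (the dimensions, architecture, and AWGN channel below are illustrative, not the thesis's design):

```python
import torch
import torch.nn as nn

k, n = 4, 7                                  # 2^k messages sent over n channel uses (toy sizes)
encoder = nn.Sequential(nn.Linear(2**k, 16), nn.ReLU(), nn.Linear(16, n))
decoder = nn.Sequential(nn.Linear(n, 16), nn.ReLU(), nn.Linear(16, 2**k))
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

for _ in range(1000):
    msgs = torch.randint(0, 2**k, (128,))
    x = encoder(nn.functional.one_hot(msgs, 2**k).float())
    x = x / x.norm(dim=1, keepdim=True)      # power constraint on the transmit code
    y = x + 0.1 * torch.randn_like(x)        # channel impairment (AWGN)
    loss = nn.functional.cross_entropy(decoder(y), msgs)   # recover the sent message
    opt.zero_grad(); loss.backward(); opt.step()
```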
Barkino, Iliam. "Summary Statistic Selection with Reinforcement Learning." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-390838.
Cunha, João Alexandre da Silva Costa e. "Techniques for batch reinforcement learning in robotics." Doctoral thesis, Universidade de Aveiro, 2015. http://hdl.handle.net/10773/15735.
This thesis addresses Batch Reinforcement Learning methods in robotics. This sub-class of Reinforcement Learning has shown promising results and has been the focus of recent research. Three contributions are proposed that aim to extend the state-of-the-art methods, allowing for the faster and more stable learning process required for learning in robotics. The Q-learning update rule is widely applied, since it allows learning without a model of the environment. However, this update rule is transition-based and does not take advantage of the underlying episodic structure of the collected batch of interactions. The Q-Batch update rule is proposed in this thesis to process experiences along the trajectories collected in the interaction phase. This allows a faster propagation of obtained rewards and penalties, resulting in faster and more robust learning. Non-parametric function approximators, such as Gaussian Processes, are also explored. This type of approximator can encode prior knowledge about the latent function in the form of kernels, providing a higher level of flexibility and accuracy. The application of Gaussian Processes in Batch Reinforcement Learning showed higher performance in learning tasks than other function approximators used in the literature. Lastly, in order to extract more information from the experiences collected by the agent, model-learning techniques are incorporated to learn the system dynamics. In this way, it is possible to augment the set of collected experiences with experiences generated through planning with the learned models. Experiments were carried out mainly in simulation, with some tests on a physical robotic platform. The obtained results show that the proposed approaches are able to outperform the classical Fitted Q Iteration.
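The exact Q-Batch rule is defined in the thesis itself; a common way to exploit the episodic structure it targets is to sweep each collected trajectory backwards, so a reward obtained at the end of an episode propagates through the whole trajectory in one pass. A sketch under that assumption:

```python
from collections import defaultdict

def backward_sweep(Q, episode, actions, alpha=0.1, gamma=0.99):
    """Update Q along one episode from the last transition to the first,
    so the final reward has already been propagated into Q[(s_next, .)]
    by the time each earlier transition is processed. This illustrates
    the trajectory-based idea, not the thesis's exact update rule."""
    for s, a, r, s_next, done in reversed(episode):
        bootstrap = 0.0 if done else max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * bootstrap - Q[(s, a)])
    return Q

# Tiny usage example with hypothetical states and actions:
Q = defaultdict(float)
episode = [("s0", "a1", 0.0, "s1", False), ("s1", "a0", 1.0, "s1", True)]
backward_sweep(Q, episode, actions=("a0", "a1"))
```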
Crandall, Jacob W. "Learning Successful Strategies in Repeated General-sum Games." Diss., Brigham Young University, 2005. http://contentdm.lib.byu.edu/ETD/image/etd1156.pdf.
Wingate, David. "Solving Large MDPs Quickly with Partitioned Value Iteration." Diss., Brigham Young University, 2004. http://contentdm.lib.byu.edu/ETD/image/etd437.pdf.
Beretta, Davide. "Experience Replay in Sparse Rewards Problems using Deep Reinforcement Techniques." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17531/.
Vafaie, Parsa. "Learning in the Presence of Skew and Missing Labels Through Online Ensembles and Meta-reinforcement Learning." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42636.
Staffolani, Alessandro. "A Reinforcement Learning Agent for Distributed Task Allocation." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20051/.
Ceylan, Hakan. "Using Reinforcement Learning in Partial Order Plan Space." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5232/.
Dazeley, R. "Investigations into Playing Chess Endgames using Reinforcement Learning." Honours thesis, University of Tasmania, 2001. https://eprints.utas.edu.au/62/1/Final_Thesis.pdf.
Miller, Eric D. "Biased Exploration in Offline Hierarchical Reinforcement Learning." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case160768140424212.
Qi, Dehu. "Multi-agent systems: integrating reinforcement learning, bidding and genetic algorithms." Diss., University of Missouri-Columbia, 2002. http://wwwlib.umi.com/cr/mo/fullcit?p3060133.
Sharma, Aakanksha. "Machine learning-based optimal load balancing in software-defined networks." Thesis, Federation University Australia, 2022. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/188228.
Doctor of Philosophy
Ngai, Chi-kit (魏智傑). "Reinforcement-learning-based autonomous vehicle navigation in a dynamically changing environment." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B39707386.
Buzzoni, Michele. "Reinforcement Learning in problemi di controllo del bilanciamento." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/15539/.
Hayashi, Kazuki. "Reinforcement Learning for Optimal Design of Skeletal Structures." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263614.
Kuurne Uussilta, Dennis, and Viktor Olsson. "Deep Reinforcement Learning in Cart Pole and Pong." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-293856.
The goal of this project is to reproduce previous results achieved with Deep Reinforcement Learning. We present the Markov Decision Process model and the Q-learning and Deep Q-learning Network (DQN) algorithms. We implement a DQN agent, first in the CartPole environment and then in the game Pong. Our agent solved CartPole in fewer than 300 episodes. We assess the influence of certain parameters on the agent's performance: it is particularly sensitive to the value of the learning rate and appears to scale with the dimension of the neural network. The DQN agent implemented in Pong was unable to learn and played at the level of an agent acting at random, despite the various modifications we introduced. We discuss possible sources of error, including that the RAM used as input to the agent may lack sufficient information, and that further modifications may be needed to reach convergence, since convergence is not guaranteed for DQN. A minimal sketch of the DQN update described here follows this entry.
Bachelor's thesis in electrical engineering, 2020, KTH, Stockholm
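As referenced in the Kuurne Uussilta and Olsson abstract above, the core DQN update combines a replay buffer with a frozen target network. A minimal PyTorch sketch, assuming CartPole-sized inputs (4 state variables, 2 actions); the architecture and hyperparameters are illustrative, not the thesis's exact configuration:

```python
import random
from collections import deque

import torch
import torch.nn as nn

policy_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(policy_net.state_dict())   # frozen copy, re-synced periodically
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
# replay holds (state, action, reward, next_state, done) tensors;
# action as a long tensor, done as a 0/1 float tensor.
replay = deque(maxlen=10_000)
gamma = 0.99

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q = policy_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q of the actions taken
    with torch.no_grad():                                    # target network is not trained
        target = r + gamma * target_net(s2).max(1).values * (1 - done)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```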