Theses on the topic "Reinforcement Learning"
Cite a source in APA, MLA, Chicago, Harvard, and many other citation styles
See the top 50 theses (master's and doctoral dissertations) on the research topic "Reinforcement Learning".
Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a .pdf file and read the abstract (summary) of the work online, if one is included in the metadata.
Browse theses from many fields of scholarship and compile a correct bibliography.
Izquierdo, Ayala Pablo. "Learning comparison: Reinforcement Learning vs Inverse Reinforcement Learning : How well does inverse reinforcement learning perform in simple markov decision processes in comparison to reinforcement learning?" Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259371.
This study is a qualitative comparison between two different learning approaches, Reinforcement Learning (RL) and Inverse Reinforcement Learning (IRL), using "Gridworld", a Markov Decision Process. The focus is on the latter algorithm, IRL, since it is considered relatively new and few studies have so far been devoted to it. In the study, RL proves more advantageous than IRL, producing a correct solution in every scenario presented. The behaviour of the IRL algorithm can nevertheless be improved, which is also shown and analysed in this study.
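For orientation, the RL baseline in such a comparison is typically tabular Q-learning on the grid. A minimal sketch (grid size, reward placement, and hyperparameters are illustrative assumptions, not the thesis's actual setup) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
SIZE = 5                                        # 5x5 grid, goal in the bottom-right corner
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right

def step(state, action):
    """Deterministic grid dynamics: move, clip at the walls, reward 1 at the goal."""
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = np.zeros((SIZE, SIZE, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(2000):
    state, done = (0, 0), False
    while not done:
        a = rng.integers(len(ACTIONS)) if rng.random() < eps else int(np.argmax(Q[state]))
        nxt, reward, done = step(state, a)
        target = reward + (0.0 if done else gamma * np.max(Q[nxt]))
        Q[state][a] += alpha * (target - Q[state][a])   # one-step TD update
        state = nxt

print(np.argmax(Q, axis=2))   # greedy action per cell
```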
Seymour, B. J. "Aversive reinforcement learning". Thesis, University College London (University of London), 2010. http://discovery.ucl.ac.uk/800107/.
Akrour, Riad. "Robust Preference Learning-based Reinforcement Learning". Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112236/document.
The thesis contributions revolve around sequential decision making, and more precisely Reinforcement Learning (RL). Rooted in Machine Learning in the same way as supervised and unsupervised learning, RL has quickly grown in popularity over the last two decades thanks to a handful of achievements on both the theoretical and the applied front. RL assumes that the learning agent and its environment follow a stochastic Markovian decision process over a state and action space. The process is called a decision process because the agent is asked to choose an action at each time step. It is called stochastic because selecting a given action in a given state does not systematically yield the same next state but rather defines a distribution over the state space. It is called Markovian because this distribution depends only on the current state-action pair. Following the choice of an action, the agent receives a reward. The RL goal is then to solve the underlying optimisation problem of finding the behaviour that maximises the sum of rewards over the whole interaction of the agent with its environment. From an applied point of view, a large spectrum of problems can be cast as RL problems, from Backgammon (TD-Gammon, one of Machine Learning's first successes, gave rise to a world-class player of advanced level) to decision problems in industry and medicine. However, the optimisation problem solved by RL depends on the prior definition of a reward function, which requires a certain level of domain expertise and also knowledge of the internal quirks of RL algorithms. The first contribution of the thesis is therefore to propose a learning framework that lightens the requirements placed on the user. The user no longer needs to know the exact solution of the problem, but only to be able to choose, between two behaviours exhibited by the agent, the one that matches the solution more closely. Learning is interactive between the agent and the user and revolves around the following three main points: i) the agent demonstrates a behaviour; ii) the user compares it with the current best one; iii) the agent uses this feedback to update its preference model of the user and uses it to find the next behaviour to demonstrate. To reduce the number of interactions required before finding the optimal behaviour, the second contribution of the thesis is to define a theoretically sound criterion that makes the trade-off between the sometimes contradictory desires of complying with the user's preferences and of demonstrating sufficiently different behaviours. The last contribution is to ensure the robustness of the algorithm with respect to the feedback errors that the user might make, which happens more often than not in practice, especially in the initial phase of the interaction, when all behaviours are far from the expected solution.
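The three-step interaction loop described in this abstract (demonstrate, compare, update) can be sketched in a toy form. Here the behaviours are plain feature vectors, the user is simulated by a hidden utility with occasional mistakes, and every name and parameter is an illustrative assumption rather than the thesis's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def demonstrate(weights, n_candidates=20, dim=4):
    """Sample candidate behaviours (feature vectors) and return the one
    the current preference model scores highest."""
    candidates = rng.normal(size=(n_candidates, dim))
    return candidates[np.argmax(candidates @ weights)]

def user_prefers(new, best, true_weights, noise=0.1):
    """Simulated user: compares two behaviours, with occasional mistakes."""
    better = (new @ true_weights) > (best @ true_weights)
    return better if rng.random() > noise else not better

true_weights = np.array([1.0, -0.5, 0.2, 0.0])   # hidden user preferences
weights = np.zeros(4)                            # agent's preference model
best = rng.normal(size=4)                        # current best behaviour

for _ in range(50):
    new = demonstrate(weights)                   # i) agent demonstrates
    if user_prefers(new, best, true_weights):    # ii) user compares
        weights += 0.1 * (new - best)            # iii) update preference model
        best = new
```

Even in this toy form, the loop shows why robustness to feedback errors matters: the simulated user answers incorrectly 10% of the time.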
Tabell, Johnsson Marco, and Ala Jafar. "Efficiency Comparison Between Curriculum Reinforcement Learning & Reinforcement Learning Using ML-Agents". Thesis, Blekinge Tekniska Högskola, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20218.
Yang, Zhaoyuan Yang. "Adversarial Reinforcement Learning for Control System Design: A Deep Reinforcement Learning Approach". The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu152411491981452.
Cortesi, Daniele. "Reinforcement Learning in Rogue". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16138/.
Girgin, Sertan. "Abstraction In Reinforcement Learning". Phd thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12608257/index.pdf.
Suay, Halit Bener. "Reinforcement Learning from Demonstration". Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-dissertations/173.
Gao, Yang. "Argumentation accelerated reinforcement learning". Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/26603.
Alexander, John W. "Transfer in reinforcement learning". Thesis, University of Aberdeen, 2015. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=227908.
Leslie, David S. "Reinforcement learning in games". Thesis, University of Bristol, 2004. http://hdl.handle.net/1983/420b3f4b-a8b3-4a65-be23-6d21f6785364.
Schneider, Markus. "Reinforcement Learning für Laufroboter". [S.l. : s.n.], 2007. http://nbn-resolving.de/urn:nbn:de:bsz:747-opus-344.
Wülfing, Jan [Verfasser], and Martin [Akademischer Betreuer] Riedmiller. "Stable deep reinforcement learning". Freiburg : Universität, 2019. http://d-nb.info/1204826188/34.
Zhang, Jingwei [Verfasser], and Wolfram [Akademischer Betreuer] Burgard. "Learning navigation policies with deep reinforcement learning". Freiburg : Universität, 2021. http://d-nb.info/1235325571/34.
Rottmann, Axel [Verfasser], and Wolfram [Akademischer Betreuer] Burgard. "Approaches to online reinforcement learning for miniature airships = Online Reinforcement Learning Verfahren für Miniaturluftschiffe". Freiburg : Universität, 2012. http://d-nb.info/1123473560/34.
Hengst, Bernhard Computer Science & Engineering Faculty of Engineering UNSW. "Discovering hierarchy in reinforcement learning". Awarded by:University of New South Wales. Computer Science and Engineering, 2003. http://handle.unsw.edu.au/1959.4/20497.
Blixt, Rikard, and Anders Ye. "Reinforcement learning AI to Hive". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-134908.
Testo completoDenna rapport handlar om det unika brädspelet Hive. Rapporten kommer först berätta om vad Hive är och sedan gå in på detalj hur vi implementerar spelet, vad för problem vi stötte på och hur dessa problem löstes. Även så försökte vi göra en AI som lärde sig med hjälp av förstärkningslärning för att bli bra på spelet. Mer exakt så använde vi två AI som inte kunde något alls om Hive förutom spelreglerna. Detta visades vara omöjligt att genomföra inom rimlig tid, vår uppskattning är att det skulle ha tagit en bra stationär hemdator minst 140 år att lära en AI spel Hive på en godtagbar nivå.
Borgstrand, Richard, and Patrik Servin. "Reinforcement Learning AI till Fightingspel". Thesis, Blekinge Tekniska Högskola, Sektionen för datavetenskap och kommunikation, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-3113.
Arnekvist, Isac. "Reinforcement learning for robotic manipulation". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-216386.
Testo completoReinforcement learning har nyligen använts framgångsrikt för att lära icke-simulerade robotar uppgifter med hjälp av en normalized advantage function-algoritm (NAF), detta utan att använda mänskliga demonstrationer. Restriktioner på funktionsytorna som använts kan dock visa sig vara problematiska för generalisering till andra uppgifter. För poseestimering har i liknande sammanhang convolutional neural networks använts med bilder från kamera med konstant position. I vissa applikationer kan dock inte kameran garanteras hålla en konstant position och studier har visat att kvaliteten på policys kraftigt förvärras när kameran förflyttas. Denna uppsats undersöker användandet av NAF för att lära in en ”pushing”-uppgift med tydliga multimodala egenskaper. Resultaten jämförs med användandet av en deterministisk policy med minimala restriktioner på Q-funktionsytan. Vidare undersöks användandet av convolutional neural networks för pose-estimering, särskilt med hänsyn till slumpmässigt placerade kameror med okänd placering. Genom att definiera koordinatramen för objekt i förhållande till ett synligt referensobjekt så tros relativ pose-estimering kunna utföras även när kameran är rörlig och förflyttningen är okänd. NAF appliceras i denna uppsats framgångsrikt på enklare problem där datainsamling är distribuerad över flera robotar och inlärning sker på en central server. Vid applicering på ”pushing”- uppgiften misslyckas dock NAF, både vid träning på riktiga robotar och i simulering. Deep deterministic policy gradient (DDPG) appliceras istället på problemet och lär sig framgångsrikt att lösa problemet i simulering. Den inlärda policyn appliceras sedan framgångsrikt på riktiga robotar. Pose-estimering genom att använda en fast kamera implementeras också framgångsrikt. Genom att definiera ett koordinatsystem från ett föremål i bilden med känd position, i detta fall robotarmen, kan andra föremåls positioner beskrivas i denna koordinatram med hjälp av neurala nätverk. Dock så visar sig precisionen vara för låg för att appliceras på robotar. Resultaten visar ändå att denna metod, med ytterligare utökningar och modifikationer, skulle kunna lösa problemet.
Cleland, Benjamin George. "Reinforcement Learning for Racecar Control". The University of Waikato, 2006. http://hdl.handle.net/10289/2507.
Kim, Min Sub Computer Science & Engineering Faculty of Engineering UNSW. "Reinforcement learning by incremental patching". Awarded by:University of New South Wales, 2007. http://handle.unsw.edu.au/1959.4/39716.
Patrascu, Relu-Eugen. "Adaptive exploration in reinforcement learning". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp01/MQ35921.pdf.
Li, Jingxian. "Reinforcement learning using sensorimotor traces". Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/45590.
Rummery, Gavin Adrian. "Problem solving with reinforcement learning". Thesis, University of Cambridge, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.363828.
McCabe, Jonathan Aiden. "Reinforcement learning in virtual reality". Thesis, University of Cambridge, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.608852.
Budhraja, Karan Kumar. "Neuroevolution Based Inverse Reinforcement Learning". Thesis, University of Maryland, Baltimore County, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10140581.
Motivated by such learning in nature, the problem of Learning from Demonstration is targeted at learning to perform tasks based on observed examples. One of the approaches to Learning from Demonstration is Inverse Reinforcement Learning, in which actions are observed to infer rewards. This work combines a feature-based state evaluation approach to Inverse Reinforcement Learning with neuroevolution, a paradigm for modifying neural networks based on their performance on a given task. Neural networks are used to learn from a demonstrated expert policy and are evolved to generate a policy similar to the demonstration. The algorithm is discussed and evaluated against competitive feature-based Inverse Reinforcement Learning approaches. At the cost of execution time, neural networks allow for non-linear combinations of features in state evaluations; these valuations may correspond to state value or state reward. This results in better correspondence to observed examples than linear combinations. This work also extends existing work on Bayesian Non-Parametric Feature construction for Inverse Reinforcement Learning by using non-linear combinations of intermediate data to improve performance. The algorithm is observed to be specifically suitable for linearly solvable non-deterministic Markov Decision Processes in which multiple rewards are sparsely scattered in state space. Performance of the algorithm is shown to be limited by the parameters used, implying adjustable capability. A conclusive performance hierarchy between evaluated algorithms is constructed.
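A rough sketch of the neuroevolution component, evolving network weights so that the induced policy agrees with a demonstrated expert policy, is given below; the linear policy, fitness definition, and mutation scheme are simplified assumptions for illustration, not the algorithm evaluated in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 8, 4
expert_policy = rng.integers(N_ACTIONS, size=N_STATES)   # demonstrated action per state
states = np.eye(N_STATES)                                 # one-hot state features

def fitness(weights):
    """Agreement between the evolved policy and the demonstrated expert policy."""
    scores = states @ weights.reshape(N_STATES, N_ACTIONS)
    return np.mean(np.argmax(scores, axis=1) == expert_policy)

pop = rng.normal(size=(50, N_STATES * N_ACTIONS))         # population of weight vectors
for generation in range(200):
    fits = np.array([fitness(w) for w in pop])
    elite = pop[np.argsort(fits)[-10:]]                    # keep the 10 best individuals
    children = elite[rng.integers(10, size=40)] + 0.1 * rng.normal(size=(40, pop.shape[1]))
    pop = np.vstack([elite, children])                     # next generation: elites + mutants

print("best agreement with expert:", max(fitness(w) for w in pop))
```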
Piano, Francesco. "Deep Reinforcement Learning con PyTorch". Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25340/.
Kozlova, Olga. "Hierarchical and factored reinforcement learning". Paris 6, 2010. http://www.theses.fr/2010PA066196.
Blows, Curtly. "Reinforcement learning for telescope optimisation". Master's thesis, Faculty of Science, 2019. http://hdl.handle.net/11427/31352.
Stigenberg, Jakob. "Scheduling using Deep Reinforcement Learning". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-284506.
Testo completoI takt med radionätverks fortsatta utveckling under de senaste decenniernahar även komplexiteten och svårigheten i att effektivt utnyttja de tillgängligaresurserna ökat. I varje trådlöst nätverk finns en schemaläggare som styrtrafikflödet genom nätverket. Schemaläggaren är därmed en nyckelkomponentnär det kommer till att effektivt utnyttja de tillgängliga nätverksresurserna. Ien given nätverkspecifikation, t.ex. Long-Term Evoluation eller New Radio,är det givet vilka möjligheter till allokering som schemaläggaren kan använda.Hur schemaläggaren utnyttjar dessa möjligheter, det vill säga implementationenav schemaläggaren, är helt upp till varje enskild tillverkare. I tidigarearbete har fokus främst legat på att manuellt definera sorteringsvikter baseratpå, bland annat, Quality of Service (QoS) -klass, kanalkvalitet och fördröjning.Nätverkspaket skickas sedan givet viktordningen. I detta examensarbetepresenteras en ny metod för schemaläggning baserat på förstärkande inlärning.Metoden hanterar resursallokeraren som en svart låda och lär sig denbästa sorteringen direkt från indata (end-to-end) och hanterar även kontrollpaket.Ramverket utvärderades med ett Deep Q-Network i ett scenario medflera fördröjningskänsliga röstanvändare tillsammans med en (oändligt) storfilnedladdning. Algoritmen lärde sig att minska mängden försenade röstpaket,alltså öka QoS, med 29.6% samtidigt som den ökade total överföringshastighetmed 20.5, 23.5 och 16.2% i den 10:e, 50:e samt 90:e kvantilen.
Jesu, Alberto. "Reinforcement learning over encrypted data". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23257/.
Suggs, Sterling. "Reinforcement Learning with Auxiliary Memory". BYU ScholarsArchive, 2021. https://scholarsarchive.byu.edu/etd/9028.
Liu, Chong. "Reinforcement learning with time perception". Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/reinforcement-learning-with-time-perception(a03580bd-2dd6-4172-a061-90e8ac3022b8).html.
Tluk, von Toschanowitz Katharina. "Relevance determination in reinforcement learning". Tönning Lübeck Marburg Der Andere Verl, 2009. http://d-nb.info/993341128/04.
Bonneau, Maxime. "Reinforcement Learning for 5G Handover". Thesis, Linköpings universitet, Statistik och maskininlärning, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-140816.
Ovidiu, Chelcea Vlad, and Björn Ståhl. "Deep Reinforcement Learning for Snake". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239362.
Edlund, Joar, and Jack Jönsson. "Reinforcement Learning for Video Games". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239363.
Magnusson, Björn, and Måns Forslund. "SAFE AND EFFICIENT REINFORCEMENT LEARNING". Thesis, Örebro universitet, Institutionen för naturvetenskap och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-76588.
Testo completoFörprogrammering av en robot kan vara effektiv i viss utsträckning, men eftersom en människa har programmerat roboten kommer den bara att vara lika effektiv som programmet är skrivet. Problemet kan lösas genom att använda maskininlärning. Detta gör att roboten kan lära sig det effektivaste sättet på sitt sätt. Denna avhandling är fortsättning på ett tidigare arbete som täckte utvecklingen av ramverket Safe-To-Explore-State-Spaces (STESS) för säker robot manipulation. Denna avhandling utvärderar effektiviteten hos Q-Learning with normalized advantage function (NAF), en deep reinforcement learning algoritm, när den integreras med ramverket STESS. Det gör detta genom att utföra en 2D-uppgift där roboten flyttar sitt verktyg på ett plan från punkt A till punkt B i en förbestämd arbetsyta. För att testa effektiviteten presenterades olika scenarier för roboten. Inga hinder, hinder med sfärisk form och hinder med cylindrisk form. Deep reinforcement learning algoritmen visste bara startpositionen och STESS-fördefinierade arbetsytan och begränsade de områden som roboten inte fick beträda. Genom att uppfylla dessa hinder kunde roboten utforska och lära sig det mest effektiva sättet att utföra sin uppgift. Resultaten visar att NAF-algoritmen i simulering lär sig snabbt och effektivt, samtidigt som man undviker hindren utan kollision.
Liu, Bai S. M. Massachusetts Institute of Technology. "Reinforcement learning in network control". Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122414.
Testo completoCataloged from PDF version of thesis.
Includes bibliographical references (pages 59-91).
With the rapid growth of information technology, network systems have become increasingly complex. In particular, designing network control policies requires knowledge of underlying network dynamics, which are often unknown, and need to be learned. Existing reinforcement learning methods such as Q-Learning, Actor-Critic, etc. are heuristic and do not offer performance guarantees. In contrast, model-based learning methods offer performance guarantees, but can only be applied with bounded state spaces. In the thesis, we propose to use model-based reinforcement learning. By applying Lyapunov analysis, our algorithm can be applied to queueing networks with unbounded state spaces. We prove that under our algorithm, the average queue backlog can get arbitrarily close to the optimal result. We also implement simulations to illustrate the effectiveness of our algorithm.
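A toy sketch of the model-based idea, estimating the transition model of a single truncated queue from interaction and replanning with discounted value iteration (the thesis itself uses Lyapunov-based, average-cost arguments on unbounded state spaces), could look like this; all parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, ARRIVAL = 20, 0.4                       # truncated queue length, arrival probability
SERVE = [0.3, 0.8]                         # service success probability per action
COST = [0.0, 0.5]                          # extra cost of the faster (second) server

def true_step(s, a):
    s = min(s + (rng.random() < ARRIVAL), N)       # arrival
    s = max(s - (rng.random() < SERVE[a]), 0)      # service attempt
    return s, -(s + COST[a])                       # reward = negative backlog and cost

counts = np.ones((N + 1, 2, N + 1)) * 1e-3         # smoothed transition counts
rew_sum = np.zeros((N + 1, 2))
rew_n = np.ones((N + 1, 2)) * 1e-3

def plan(gamma=0.99, iters=500):
    """Value iteration on the model estimated from interaction data."""
    P = counts / counts.sum(axis=2, keepdims=True)
    R = rew_sum / rew_n
    V = np.zeros(N + 1)
    for _ in range(iters):
        V = np.max(R + gamma * P @ V, axis=1)
    return np.argmax(R + gamma * P @ V, axis=1)

policy, s = np.zeros(N + 1, dtype=int), 0
for t in range(20000):
    a = rng.integers(2) if rng.random() < 0.1 else policy[s]   # epsilon-greedy exploration
    s2, r = true_step(s, a)
    counts[s, a, s2] += 1
    rew_sum[s, a] += r
    rew_n[s, a] += 1
    if t % 1000 == 0:
        policy = plan()                                        # replan from the learned model
    s = s2
```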
Garcelon, Evrard. "Constrained Exploration in Reinforcement Learning". Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAG007.
A major application of machine learning is to provide personalized content to different users. In general, the algorithms powering those recommendations are supervised learning algorithms; that is to say, the data used to train them are assumed to be sampled from the same distribution. However, the data are generated through interactions between the users and the recommendation algorithms, so recommendations made to a user at time t can have an impact on the set of relevant recommendations at a later time. It is therefore necessary to take those interactions into account. This setting is reminiscent of online learning. Among online learning algorithms, Reinforcement Learning (RL) algorithms look the most promising to replace supervised learning algorithms for applications requiring a certain degree of personalization. Deploying RL algorithms in production presents some challenges, such as guaranteeing a certain level of performance during exploration phases or guaranteeing the privacy of the data collected by RL algorithms. In this thesis, we consider different constraints limiting the use of RL algorithms and provide both empirical and theoretical results on the impact of those constraints on the learning process.
Wei, Ermo. "Learning to Play Cooperative Games via Reinforcement Learning". Thesis, George Mason University, 2019. http://pqdtopen.proquest.com/#viewpdf?dispub=13420351.
Being able to accomplish tasks with multiple learners through learning has long been a goal of the multiagent systems and machine learning communities. One of the main approaches people have taken is reinforcement learning, but due to certain conditions and restrictions, applying reinforcement learning in a multiagent setting has not achieved the same level of success when compared to its single agent counterparts.
This thesis aims to make coordination better for agents in cooperative games by improving on reinforcement learning algorithms in several ways. I begin by examining certain pathologies that can lead to the failure of reinforcement learning in cooperative games, and in particular the pathology of relative overgeneralization. In relative overgeneralization, agents do not learn to optimally collaborate because during the learning process each agent instead converges to behaviors which are robust in conjunction with the other agent's exploratory (and thus random), rather than optimal, choices. One solution to this is so-called lenient learning, where agents are forgiving of the poor choices of their teammates early in the learning cycle. In the first part of the thesis, I develop a lenient learning method to deal with relative overgeneralization in independent learner settings with small stochastic games and discrete actions.
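A toy illustration of the leniency idea on the classic climbing matrix game, where mis-coordination near the optimal joint action is heavily punished, is sketched below; the payoff matrix, decay schedule, and update rule are illustrative assumptions, not the method developed in the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
# Climbing game: mis-coordination around the optimum (0, 0) is heavily punished,
# which drives plain independent learners toward the safer joint action (2, 2).
PAYOFF = np.array([[ 11, -30,   0],
                   [-30,   7,   6],
                   [  0,   0,   5]], dtype=float)

def train(lenient, episodes=5000, alpha=0.1, eps=0.2):
    qa, qb = np.zeros(3), np.zeros(3)                 # two independent learners
    for t in range(episodes):
        leniency = 0.95 * (1 - t / episodes)          # forgiveness decays over time
        a = rng.integers(3) if rng.random() < eps else int(np.argmax(qa))
        b = rng.integers(3) if rng.random() < eps else int(np.argmax(qb))
        r = PAYOFF[a, b]
        for q, act in ((qa, a), (qb, b)):
            delta = r - q[act]
            # Lenient update: early on, mostly ignore updates that would lower Q,
            # so a good action is not punished for the partner's exploratory mistakes.
            if delta >= 0 or not lenient or rng.random() > leniency:
                q[act] += alpha * delta
    return int(np.argmax(qa)), int(np.argmax(qb))

print("plain independent Q-learning:  ", train(lenient=False))
print("lenient independent Q-learning:", train(lenient=True))
```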
I then examine certain issues in a more complex multiagent domain involving parameterized action Markov decision processes, motivated by the RoboCup 2D simulation league. I propose two methods, one batch method and one actor-critic method, based on state of the art reinforcement learning algorithms, and show experimentally that the proposed algorithms can train the agents in a significantly more sample-efficient way than more common methods.
I then broaden the parameterized-action scenario to consider both repeated and stochastic games with continuous actions. I show how relative overgeneralization prevents the multiagent actor-critic model from learning optimal behaviors and demonstrate how to use Soft Q-Learning to solve this problem in repeated games.
Finally, I extend imitation learning to the multiagent setting to solve related issues in stochastic games, and prove that, given demonstrations from an expert, multiagent Imitation Learning is exactly the multiagent actor-critic model in the Maximum Entropy Reinforcement Learning framework. I further show that when the demonstration samples meet certain conditions, the relative overgeneralization problem can be avoided during the learning process.
Stachenfeld, Kimberly. "Learning Neural Representations that Support Efficient Reinforcement Learning". Thesis, Princeton University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10824319.
RL has been transformative for neuroscience by providing a normative anchor for interpreting neural and behavioral data. End-to-end RL methods have scored impressive victories with minimal compromises in autonomy, hand-engineering, and generality. The cost of this minimalism in practice is that model-free RL methods are slow to learn and generalize poorly. Humans and animals exhibit substantially improved flexibility and generalize learned information rapidly to new environments by learning invariants of the environment and features of the environment that support fast learning and rapid transfer in new environments. An important question for both neuroscience and machine learning is what kind of "representational objectives" encourage humans and other animals to encode structure about the world. This can be formalized as "representation feature learning," in which the animal or agent learns to form representations with information potentially relevant to the downstream RL process. We overview different representational objectives that have received attention in neuroscience and in machine learning. The focus of this overview is first to highlight conditions under which these seemingly unrelated objectives are actually mathematically equivalent. We use this to motivate a breakdown of properties of different learned representations that are meaningfully different and can be used to inform contrasting hypotheses for neuroscience. We then use this perspective to motivate our model of the hippocampus. A cognitive map has long been the dominant metaphor for hippocampal function, embracing the idea that place cells encode a geometric representation of space. However, evidence for predictive coding, reward sensitivity, and policy dependence in place cells suggests that the representation is not purely spatial. We approach the problem of understanding hippocampal representations from a reinforcement learning perspective, focusing on what kind of spatial representation is most useful for maximizing future reward. We show that the answer takes the form of a predictive representation. This representation captures many aspects of place cell responses that fall outside the traditional view of a cognitive map. We go on to argue that entorhinal grid cells encode a low-dimensional basis set for the predictive representation, useful for suppressing noise in predictions and extracting multiscale structure for hierarchical planning.
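The predictive representation discussed here is closely related to the successor representation, which for a fixed policy can be computed in closed form; a minimal sketch on a small ring of states (environment and discount are illustrative assumptions) is:

```python
import numpy as np

n, gamma = 8, 0.9
# Random-walk policy on a ring of n states: move left or right with equal probability.
T = np.zeros((n, n))
for s in range(n):
    T[s, (s - 1) % n] = 0.5
    T[s, (s + 1) % n] = 0.5

# Successor representation: M[s, s'] = expected discounted future occupancy of s' from s,
# given by M = (I - gamma * T)^{-1} for a fixed policy.
M = np.linalg.inv(np.eye(n) - gamma * T)

R = np.zeros(n)
R[3] = 1.0                     # a single rewarded state
V = M @ R                      # state values follow directly from the predictive map
print(np.round(V, 2))
```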
Effraimidis, Dimitros. "Computation approaches for continuous reinforcement learning problems". Thesis, University of Westminster, 2016. https://westminsterresearch.westminster.ac.uk/item/q0y82/computation-approaches-for-continuous-reinforcement-learning-problems.
Le, Piane Fabio. "Training cognitivo adattativo mediante Reinforcement Learning". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/17289/.
Mariani, Tommaso. "Deep reinforcement learning for industrial applications". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20548/.
Rossi, Martina. "Opponent Modelling using Inverse Reinforcement Learning". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22263/.
Borga, Magnus. "Reinforcement Learning Using Local Adaptive Models". Licentiate thesis, Linköping University, Linköping University, Computer Vision, 1995. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-53352.
In this thesis, the theory of reinforcement learning is described and its relation to learning in biological systems is discussed. Some basic issues in reinforcement learning, the credit assignment problem and perceptual aliasing, are considered. The methods of temporal difference are described. Three important design issues are discussed: information representation and system architecture, rules for improving the behaviour and rules for the reward mechanisms. The use of local adaptive models in reinforcement learning is suggested and exemplified by some experiments. This idea is behind all the work presented in this thesis. A method for learning to predict the reward called the prediction matrix memory is presented. This structure is similar to the correlation matrix memory but differs in that it is not only able to generate responses to given stimuli but also to predict the rewards in reinforcement learning. The prediction matrix memory uses the channel representation, which is also described. A dynamic binary tree structure that uses the prediction matrix memories as local adaptive models is presented. The theory of canonical correlation is described and its relation to the generalized eigenproblem is discussed. It is argued that the directions of canonical correlations can be used as linear models in the input and output spaces respectively in order to represent input and output signals that are maximally correlated. It is also argued that this is a better representation in a response generating system than, for example, principal component analysis since the energy of the signals has nothing to do with their importance for the response generation. An iterative method for finding the canonical correlations is presented. Finally, the possibility of using the canonical correlation for response generation in a reinforcement learning system is indicated.
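The canonical correlations mentioned in this abstract can be obtained from a generalized eigenproblem; the following small numerical sketch on random toy data illustrates that formulation (it is not the iterative method presented in the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: x and y share one latent signal, plus noise.
z = rng.normal(size=500)
X = np.column_stack([z + 0.1 * rng.normal(size=500), rng.normal(size=500)])
Y = np.column_stack([rng.normal(size=500), z + 0.1 * rng.normal(size=500)])
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

Cxx, Cyy = X.T @ X / len(X), Y.T @ Y / len(Y)
Cxy = X.T @ Y / len(X)

# Canonical correlations: the eigenvalues of Cxx^{-1} Cxy Cyy^{-1} Cyx are the squared
# correlations; the eigenvectors give the linear models in the input space.
Mx = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
eigvals, eigvecs = np.linalg.eig(Mx)
order = np.argsort(-eigvals.real)
print("canonical correlations:", np.sqrt(np.clip(eigvals.real[order], 0, 1)))
```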
Mastour, Eshgh Somayeh Sadat. "Distributed Reinforcement Learning for Overlay Networks". Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2011. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-92131.
Testo completoHumphrys, Mark. "Action selection methods using reinforcement learning". Thesis, University of Cambridge, 1996. https://www.repository.cam.ac.uk/handle/1810/252269.
Testo completoNamvar, Gharehshiran Omid. "Reinforcement learning in non-stationary games". Thesis, University of British Columbia, 2015. http://hdl.handle.net/2429/51993.
Testo completoApplied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate