Dissertations / Theses on the topic 'Reinforcement learning (Machine learning)'
Consult this list of 50 dissertations and theses for your research on the topic 'Reinforcement learning (Machine learning)'. Full texts and abstracts are linked where available.
Hengst, Bernhard. "Discovering hierarchy in reinforcement learning." Thesis, University of New South Wales, Computer Science and Engineering, 2003. http://handle.unsw.edu.au/1959.4/20497.
Tabell Johnsson, Marco, and Ala Jafar. "Efficiency Comparison Between Curriculum Reinforcement Learning & Reinforcement Learning Using ML-Agents." Thesis, Blekinge Tekniska Högskola, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20218.
Akrour, Riad. "Robust Preference Learning-based Reinforcement Learning." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112236/document.
The thesis contributions revolve around sequential decision making, and more precisely Reinforcement Learning (RL). Rooted in Machine Learning in the same way as supervised and unsupervised learning, RL has quickly grown in popularity over the last two decades thanks to a handful of achievements on both the theoretical and the applied front. RL assumes that the learning agent and its environment follow a stochastic Markovian decision process over a state and action space. The process is a decision process because the agent must choose an action at each time step. It is stochastic because selecting a given action in a given state does not always yield the same next state but instead defines a distribution over the state space. It is Markovian because this distribution depends only on the current state-action pair. After each action the agent receives a reward. The goal of RL is then to solve the underlying optimization problem of finding the behaviour that maximizes the sum of rewards along the agent's interaction with its environment. From an applied point of view, a wide spectrum of problems can be cast as RL, from Backgammon (TD-Gammon, one of Machine Learning's first successes, produced a world-class player) to decision problems in industry and medicine. However, the optimization problem solved by RL depends on the prior definition of a reward function, which requires a certain level of domain expertise as well as knowledge of the internal quirks of RL algorithms. The first contribution of the thesis is therefore a learning framework that lightens the requirements placed on the user: the user no longer needs to know the exact solution of the problem, only to choose, between two behaviours exhibited by the agent, the one that more closely matches the solution. Learning is interactive between the agent and the user and revolves around the following three points: i) the agent demonstrates a behaviour; ii) the user compares it with the current best one; iii) the agent uses this feedback to update its preference model of the user and to select the next behaviour to demonstrate. To reduce the number of interactions needed before finding the optimal behaviour, the second contribution of the thesis is a theoretically sound criterion that trades off the sometimes contradictory desiderata of complying with the user's preferences and of demonstrating sufficiently different behaviours. The last contribution ensures the robustness of the algorithm with respect to the feedback errors that the user may make, which happen more often than not in practice, especially in the initial phase of the interaction, when all behaviours are far from the expected solution.
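The interactive loop this abstract describes (demonstrate, compare, update) can be illustrated with a toy preference-guided search. The sketch below is an assumption-laden illustration, closer to a noisy preference-based hill climber than to the thesis's actual algorithm: behaviours are reduced to parameter vectors, and the user is simulated by distance to a hidden target.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=4)        # hidden ideal behaviour; only the simulated user "knows" it

def user_prefers(a, b, error_rate=0.1):
    """Simulated user: prefers the behaviour closer to the target,
    but makes occasional feedback errors, as discussed in the abstract."""
    better = np.linalg.norm(a - target) < np.linalg.norm(b - target)
    return better if rng.random() > error_rate else not better

best = rng.normal(size=4)          # current best demonstrated behaviour
for _ in range(300):
    candidate = best + 0.3 * rng.normal(size=4)   # i) demonstrate a new behaviour
    if user_prefers(candidate, best):             # ii) the user compares the two
        best = candidate                          # iii) the feedback drives the update
print("distance to hidden target:", np.linalg.norm(best - target))
```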
Lee, Siu-keung (李少強). "Reinforcement learning for intelligent assembly automation." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2002. http://hub.hku.hk/bib/B31244397.
Tebbifakhr, Amirhossein. "Machine Translation For Machines." Doctoral thesis, Università degli studi di Trento, 2021. http://hdl.handle.net/11572/320504.
Yang, Zhaoyuan. "Adversarial Reinforcement Learning for Control System Design: A Deep Reinforcement Learning Approach." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu152411491981452.
Scholz, Jonathan. "Physics-based reinforcement learning for autonomous manipulation." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54366.
Cleland, Andrew Lewis. "Bounding Box Improvement with Reinforcement Learning." PDXScholar, 2018. https://pdxscholar.library.pdx.edu/open_access_etds/4438.
Piano, Francesco. "Deep Reinforcement Learning con PyTorch." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25340/.
Suggs, Sterling. "Reinforcement Learning with Auxiliary Memory." BYU ScholarsArchive, 2021. https://scholarsarchive.byu.edu/etd/9028.
Jesu, Alberto. "Reinforcement learning over encrypted data." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/23257/.
Gustafsson, Robin, and Lucas Fröjdendahl. "Machine Learning for Traffic Control of Unmanned Mining Machines : Using the Q-learning and SARSA algorithms." Thesis, KTH, Hälsoinformatik och logistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-260285.
Manual configuration of traffic control for unmanned mining machines can be a time-consuming process. If this configuration could be automated, there would be gains in both time and cost. This thesis presents a machine-learning solution using Q-learning and SARSA. The results show that the configuration time could potentially be cut from 1-2 weeks to, in the worst case, 6 hours, which would reduce the cost of deployment. Tests showed that the final solution could run continuously for 24 hours with at least 82% accuracy, compared to 100% when the manual configuration is used. The conclusion is that machine learning may be usable for automatic configuration of traffic control. Further work is needed to raise the accuracy to 100% so that it can replace manual configuration, and more studies should examine whether this also holds for more complex scenarios with larger mine layouts and more machines.
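For reference, the two update rules named in this thesis differ only in how they bootstrap. A minimal sketch, with hypothetical traffic-control states and actions (the thesis's own state and action spaces are not described here):

```python
import random
from collections import defaultdict

Q = defaultdict(float)                 # Q[(state, action)] -> estimated value
ACTIONS = ["stop", "go"]               # hypothetical traffic-control actions
alpha, gamma, eps = 0.1, 0.99, 0.1

def eps_greedy(state):
    """Behaviour policy used while collecting experience."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_learning_update(s, a, r, s2):
    # Off-policy: bootstrap from the best action in the next state.
    target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(s, a, r, s2, a2):
    # On-policy: bootstrap from the action actually taken in the next state.
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```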
Mariani, Tommaso. "Deep reinforcement learning for industrial applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20548/.
Cleland, Benjamin George. "Reinforcement Learning for Racecar Control." The University of Waikato, 2006. http://hdl.handle.net/10289/2507.
Suay, Halit Bener. "Reinforcement Learning from Demonstration." Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-dissertations/173.
Pipe, Anthony Graham. "Reinforcement learning and knowledge transformation in mobile robotics." Thesis, University of the West of England, Bristol, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.364077.
Chalup, Stephan Konrad. "Incremental learning with neural networks, evolutionary computation and reinforcement learning algorithms." Thesis, Queensland University of Technology, 2001.
Le Piane, Fabio. "Training cognitivo adattativo mediante Reinforcement Learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/17289/.
Rouet-Leduc, Bertrand. "Machine learning for materials science." Thesis, University of Cambridge, 2017. https://www.repository.cam.ac.uk/handle/1810/267987.
Addis, Antonio. "Deep reinforcement learning optimization of video streaming." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019.
Janagam, Anirudh, and Saddam Hossen. "Analysis of Network Intrusion Detection System with Machine Learning Algorithms (Deep Reinforcement Learning Algorithm)." Thesis, Blekinge Tekniska Högskola, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-17126.
Weideman, Ryan. "Robot Navigation in Cluttered Environments with Deep Reinforcement Learning." DigitalCommons@CalPoly, 2019. https://digitalcommons.calpoly.edu/theses/2011.
Pasqualini, Luca. "Real World Problems through Deep Reinforcement Learning." Doctoral thesis, Università di Siena, 2022. http://hdl.handle.net/11365/1192945.
Song, Yupu. "A Forex Trading System Using Evolutionary Reinforcement Learning." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-theses/1240.
Mitchell, Matthew Winston. "An architecture for situated learning agents." Monash University, School of Computer Science and Software Engineering, 2003. http://arrow.monash.edu.au/hdl/1959.1/5553.
Tarbouriech, Jean. "Goal-oriented exploration for reinforcement learning." Electronic thesis, Université de Lille, 2022. http://www.theses.fr/2022ULILB014.
Learning to reach goals is a competence of high practical relevance for intelligent agents to acquire. It encompasses many navigation tasks ("go to target X"), robotic manipulation ("attain position Y of the robotic arm"), and game-playing scenarios ("win the game by fulfilling objective Z"). As a living being interacting with the world, I am constantly driven by goals to reach, varying in scope and difficulty. Reinforcement Learning (RL) holds the promise to frame and learn goal-oriented behavior. Goals can be modeled as specific configurations of the environment that must be attained via sequential interaction and exploration of the unknown environment. Although various deep RL algorithms have been proposed for goal-oriented RL, existing methods often lack principled understanding, sample efficiency and general-purpose effectiveness; indeed, very little theoretical analysis of goal-oriented RL was available, even in the basic scenario of finitely many states and actions. We first focus on a supervised scenario of goal-oriented RL, where a goal state to be reached in minimum total expected cost is provided as part of the problem definition. After formalizing the online learning problem in this setting, often known as Stochastic Shortest Path (SSP), we introduce two no-regret algorithms (one is the first available in the literature, the other attains nearly optimal guarantees). Beyond training our RL agent to solve only one task, we then aspire for it to learn to autonomously solve a wide variety of tasks, in the absence of any reward supervision. In this challenging unsupervised RL scenario, we advocate to "Set Your Own Goals" (SYOG), which suggests that the agent learn the ability to intrinsically select and reach its own goal states. We derive finite-time guarantees of this popular heuristic in various settings, each with its specific learning objective and technical challenges. As an illustration, we propose a rigorous analysis of the algorithmic principle of targeting "uncertain" goals, which we also anchor in deep RL. The main focus and contribution of this thesis is to instigate a principled analysis of goal-oriented exploration in RL, both in the supervised and unsupervised scenarios. We hope that it helps suggest promising research directions to improve the interpretability and sample efficiency of goal-oriented RL algorithms in practical applications.
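The SSP objective mentioned in this abstract can be written compactly. The notation below is a standard formulation assumed for illustration, not taken from the thesis:

```latex
% Stochastic Shortest Path: reach the goal g at minimum total expected cost.
% \tau_g is the (random) first time step at which g is reached under policy \pi,
% and c(s_t, a_t) is the per-step cost.
\min_{\pi} \; \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\tau_g - 1} c(s_t, a_t) \;\middle|\; s_0 \right]
```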
Irani, Arya John. "Utilizing negative policy information to accelerate reinforcement learning." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53481.
Tham, Chen Khong. "Modular on-line function approximation for scaling up reinforcement learning." Thesis, University of Cambridge, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.309702.
Dönmez, Halit Anil. "Collision Avoidance for Virtual Crowds Using Reinforcement Learning." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210560.
Virtual crowd simulation is used in a wide range of applications such as video games, architectural design, and film. It is important for creators to have a realistic crowd simulator that can generate the crowds needed to display the required behaviours, and to provide an easy-to-use crowd-generation tool that is fast and realistic. Reinforcement learning was proposed for training an agent to display a given behaviour. In this thesis, a reinforcement learning method was implemented to evaluate virtual crowds, with Q-learning chosen as the method and two different versions of it implemented. These versions were evaluated against state-of-the-art algorithms: Reciprocal Velocity Obstacles (RVO) and a copy-synthesis approach based on real data. The crowds were evaluated in a user study. The results showed that while the reinforcement learning method was not perceived as being as realistic as real crowds, it was perceived as almost as realistic as crowds generated with Reciprocal Velocity Obstacles. Another finding was that the perception of RVO changes with the environment: when only the paths were shown, it was perceived as more natural than when shown in a real-world environment with pedestrians. It was concluded that using Q-learning to generate crowds is a promising method that can be improved as a replacement for existing methods, and in some scenarios the Q-learning algorithm produced better collision avoidance and more realistic crowd simulation.
Svensson, Frida. "Scalable Distributed Reinforcement Learning for Radio Resource Management." Thesis, Linköpings universitet, Tillämpad matematik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177822.
There is great potential for automation and optimization in radio access networks (RAN) by using data-driven solutions to efficiently handle the increased complexity caused by growing traffic and the new technologies introduced with 5G. Reinforcement learning (RL) has natural connections to control problems on different time scales, such as link adaptation, interference management, and power control, all common in radio networks. Raising the status of data-driven solutions in radio networks will be necessary to meet the challenges of future 5G networks. In this work, we propose a systematic methodology for applying RL to a control problem. The proposed methodology is first applied to a well-known control problem and is later adapted to a real RAN scenario. The work includes extensive simulation results to show the effectiveness and potential of the proposed method. A workable methodology was created, but the results on the RAN simulator lacked maturity.
Larsson, Hannes. "Deep Reinforcement Learning for Cavity Filter Tuning." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-354815.
Renner, Michael Robert. "Machine Learning Simulation: Torso Dynamics of Robotic Biped." Thesis, Virginia Tech, 2007. http://hdl.handle.net/10919/34602.
Master of Science
Nikolic, Marko. "Single asset trading: a recurrent reinforcement learning approach." Thesis, Mälardalens högskola, Akademin för utbildning, kultur och kommunikation, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-47505.
Emenonye, Don-Roberts Ugochukwu. "Application of Machine Learning to Multi Antenna Transmission and Machine Type Resource Allocation." Thesis, Virginia Tech, 2020. http://hdl.handle.net/10919/99956.
Master of Science
Wireless communication systems are a well-researched area of engineering that has continually evolved over the past decades. This constant evolution and development has led to well-formulated theoretical baselines in terms of reliability and efficiency. This two-part thesis investigates the possibility of improving these wireless systems with machine learning. First, with the goal of designing more resilient codes for transmission, we propose to redesign the transmit and receive blocks of the physical layer. We focus on jointly optimizing the transmit and receive blocks to produce a set of transmit codes that are resilient to channel impairments, and we compare our results to the conventional codes for various transmit and receive antenna configurations. The second part of this work investigates the possibility of designing a distributed multi-access scheme for machine-type devices (MTDs). In this scheme, MTDs transmit their data pseudo-randomly by selecting random time slots, which can result in a large number of collisions occurring within these slots. To alleviate the resulting congestion, we employ a heterogeneous network and investigate the optimal MTD-to-base-station (MTD-BS) association that minimizes the long-term congestion experienced in the overall network. Our results show that the optimal MTD-BS association can be derived when the number of MTDs is less than the total number of slots.
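The first part of this abstract, jointly optimizing the transmit and receive blocks, is commonly formulated in the literature as an end-to-end autoencoder trained through a noisy channel. A minimal sketch under that assumption (the dimensions, architecture, and AWGN channel below are illustrative, not the thesis's design):

```python
import torch
import torch.nn as nn

k, n = 4, 7                                  # 2^k messages sent over n channel uses (toy sizes)
encoder = nn.Sequential(nn.Linear(2**k, 16), nn.ReLU(), nn.Linear(16, n))
decoder = nn.Sequential(nn.Linear(n, 16), nn.ReLU(), nn.Linear(16, 2**k))
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

for _ in range(1000):
    msgs = torch.randint(0, 2**k, (128,))
    x = encoder(nn.functional.one_hot(msgs, 2**k).float())
    x = x / x.norm(dim=1, keepdim=True)      # power constraint on the transmit code
    y = x + 0.1 * torch.randn_like(x)        # channel impairment (AWGN)
    loss = nn.functional.cross_entropy(decoder(y), msgs)   # recover the sent message
    opt.zero_grad(); loss.backward(); opt.step()
```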
Barkino, Iliam. "Summary Statistic Selection with Reinforcement Learning." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-390838.
Cunha, João Alexandre da Silva Costa e. "Techniques for batch reinforcement learning in robotics." Doctoral thesis, Universidade de Aveiro, 2015. http://hdl.handle.net/10773/15735.
This thesis addresses Batch Reinforcement Learning methods in robotics. This sub-class of Reinforcement Learning has shown promising results and has been the focus of recent research. Three contributions are proposed that aim to extend the state-of-the-art methods, allowing for the faster and more stable learning process required for learning in robotics. The Q-learning update rule is widely applied, since it allows learning without a model of the environment. However, this update rule is transition-based and does not take advantage of the underlying episodic structure of the collected batch of interactions. The Q-Batch update rule is proposed in this thesis to process experiences along the trajectories collected in the interaction phase. This allows a faster propagation of obtained rewards and penalties, resulting in faster and more robust learning. Non-parametric function approximators, such as Gaussian Processes, are also explored. This type of approximator can encode prior knowledge about the latent function in the form of kernels, providing a higher level of flexibility and accuracy. The application of Gaussian Processes in Batch Reinforcement Learning showed higher performance in learning tasks than other function approximators used in the literature. Lastly, in order to extract more information from the experiences collected by the agent, model-learning techniques are incorporated to learn the system dynamics. In this way, it is possible to augment the set of collected experiences with experiences generated through planning with the learned models. Experiments were carried out mainly in simulation, with some tests on a physical robotic platform. The obtained results show that the proposed approaches are able to outperform the classical Fitted Q Iteration.
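The exact Q-Batch rule is defined in the thesis itself; a common way to exploit the episodic structure it targets is to sweep each collected trajectory backwards, so a reward obtained at the end of an episode propagates through the whole trajectory in one pass. A sketch under that assumption:

```python
from collections import defaultdict

def backward_sweep(Q, episode, actions, alpha=0.1, gamma=0.99):
    """Update Q along one episode from the last transition to the first,
    so the final reward has already been propagated into Q[(s_next, .)]
    by the time each earlier transition is processed. This illustrates
    the trajectory-based idea, not the thesis's exact update rule."""
    for s, a, r, s_next, done in reversed(episode):
        bootstrap = 0.0 if done else max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * bootstrap - Q[(s, a)])
    return Q

# Tiny usage example with hypothetical states and actions:
Q = defaultdict(float)
episode = [("s0", "a1", 0.0, "s1", False), ("s1", "a0", 1.0, "s1", True)]
backward_sweep(Q, episode, actions=("a0", "a1"))
```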
Crandall, Jacob W. "Learning Successful Strategies in Repeated General-sum Games." Diss., Brigham Young University, 2005. http://contentdm.lib.byu.edu/ETD/image/etd1156.pdf.
Wingate, David. "Solving Large MDPs Quickly with Partitioned Value Iteration." Diss., Brigham Young University, 2004. http://contentdm.lib.byu.edu/ETD/image/etd437.pdf.
Beretta, Davide. "Experience Replay in Sparse Rewards Problems using Deep Reinforcement Techniques." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17531/.
Vafaie, Parsa. "Learning in the Presence of Skew and Missing Labels Through Online Ensembles and Meta-reinforcement Learning." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42636.
Staffolani, Alessandro. "A Reinforcement Learning Agent for Distributed Task Allocation." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20051/.
Ceylan, Hakan. "Using Reinforcement Learning in Partial Order Plan Space." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5232/.
Dazeley, R. "Investigations into Playing Chess Endgames using Reinforcement Learning." Honours thesis, University of Tasmania, 2001. https://eprints.utas.edu.au/62/1/Final_Thesis.pdf.
Miller, Eric D. "Biased Exploration in Offline Hierarchical Reinforcement Learning." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case160768140424212.
Qi, Dehu. "Multi-agent systems: integrating reinforcement learning, bidding and genetic algorithms." Diss., University of Missouri-Columbia, 2002. http://wwwlib.umi.com/cr/mo/fullcit?p3060133.
Sharma, Aakanksha. "Machine learning-based optimal load balancing in software-defined networks." Thesis, Federation University Australia, 2022. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/188228.
Doctor of Philosophy
Ngai, Chi-kit (魏智傑). "Reinforcement-learning-based autonomous vehicle navigation in a dynamically changing environment." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B39707386.
Buzzoni, Michele. "Reinforcement Learning in problemi di controllo del bilanciamento." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/15539/.
Hayashi, Kazuki. "Reinforcement Learning for Optimal Design of Skeletal Structures." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/263614.
Kuurne Uussilta, Dennis, and Viktor Olsson. "Deep Reinforcement Learning in Cart Pole and Pong." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-293856.
The goal of this project is to reproduce previous results achieved with Deep Reinforcement Learning. We present the Markov Decision Process model and the Q-learning and Deep Q-learning Network (DQN) algorithms. We implement a DQN agent, first in the CartPole environment and then in the game Pong. Our agent solved CartPole in fewer than 300 episodes. We assess the influence of certain parameters on the agent's performance: it is particularly sensitive to the value of the learning rate and appears to scale with the dimension of the neural network. The DQN agent implemented in Pong was unable to learn and played at the level of an agent acting at random, despite the various modifications we introduced. We discuss possible sources of error, including that the RAM used as input to the agent may lack sufficient information, and that further modifications may be needed to reach convergence, since convergence is not guaranteed for DQN. A minimal sketch of the DQN update described here follows this entry.
Bachelor's thesis in electrical engineering, 2020, KTH, Stockholm
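As referenced in the Kuurne Uussilta and Olsson abstract above, the core DQN update combines a replay buffer with a frozen target network. A minimal PyTorch sketch, assuming CartPole-sized inputs (4 state variables, 2 actions); the architecture and hyperparameters are illustrative, not the thesis's exact configuration:

```python
import random
from collections import deque

import torch
import torch.nn as nn

policy_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(policy_net.state_dict())   # frozen copy, re-synced periodically
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
# replay holds (state, action, reward, next_state, done) tensors;
# action as a long tensor, done as a 0/1 float tensor.
replay = deque(maxlen=10_000)
gamma = 0.99

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = map(torch.stack, zip(*random.sample(replay, batch_size)))
    q = policy_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q of the actions taken
    with torch.no_grad():                                    # target network is not trained
        target = r + gamma * target_net(s2).max(1).values * (1 - done)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```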