Dissertations / Theses on the topic 'Policy gradients'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 30 dissertations / theses for your research on the topic 'Policy gradients.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Crowley, Mark. "Equilibrium policy gradients for spatiotemporal planning." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/38971.
Full text
Sehnke, Frank [Verfasser], Patrick van der [Akademischer Betreuer] Smagt, and Jürgen [Akademischer Betreuer] Schmidhuber. "Parameter Exploring Policy Gradients and their Implications / Frank Sehnke. Gutachter: Jürgen Schmidhuber. Betreuer: Patrick van der Smagt." München : Universitätsbibliothek der TU München, 2012. http://d-nb.info/1030099820/34.
Full text
Tolman, Deborah A. "Environmental Gradients, Community Boundaries, and Disturbance: the Darlingtonia Fens of Southwestern Oregon." PDXScholar, 2004. https://pdxscholar.library.pdx.edu/open_access_etds/3013.
Full text
Masoudi, Mohammad Amin. "Robust Deep Reinforcement Learning for Portfolio Management." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42743.
Full text
Jacobzon, Gustaf, and Martin Larsson. "Generalizing Deep Deterministic Policy Gradient." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239365.
Full text
Ковальов, Костянтин Миколайович. "Комп'ютерна система управління промисловим роботом" [Computer control system for an industrial robot]. Bachelor's thesis, КПІ ім. Ігоря Сікорського, 2019. https://ela.kpi.ua/handle/123456789/28610.
Full text
Qualifying work includes an explanatory note (56 pp., 2 appendices). The object of the study is reinforcement learning algorithms for the task of controlling an industrial robotic arm. Continuous control of an industrial robotic arm for non-trivial tasks is too complicated, or even unsolvable, for classical methods of robotics. Reinforcement learning methods can be used in this case: they are quite simple to implement, allow generalization to unseen cases, and learn from high-dimensional data. We implement the deep deterministic policy gradient algorithm, which is suitable for complex continuous control tasks. During the study:
• An analysis of existing classical methods for the problem of industrial robot control was conducted
• An analysis of existing reinforcement learning algorithms and their use in the field of robotics was conducted
• The deep deterministic policy gradient algorithm was implemented
• The implemented algorithm was tested on a simplified environment
• A neural network architecture was proposed for solving the problem
• The algorithm was tested on the training set of objects
• The algorithm was tested for its generalization ability on the test set
It was shown that the deep deterministic policy gradient algorithm, with a neural network as the policy approximator, is able to solve the problem with an image as input and to generalize to objects not seen before.
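The deterministic policy-gradient update at the core of the algorithm named above can be sketched with linear actor/critic approximators standing in for the deep networks. All names, dimensions, and the linear forms are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2

# Linear stand-ins for the deep networks: mu(s) = W_actor @ s,
# Q(s, a) = w_critic . [s; a].
W_actor = rng.normal(scale=0.1, size=(action_dim, state_dim))
w_critic = rng.normal(scale=0.1, size=state_dim + action_dim)

def actor(s):
    return W_actor @ s

def critic(s, a):
    return w_critic @ np.concatenate([s, a])

def ddpg_actor_update(s, lr=1e-2):
    # Deterministic policy gradient: gradient ascent on Q(s, mu(s)).
    # For this linear critic, dQ/da is the critic's action weights,
    # and d mu / d W_actor contributes outer(dQ/da, s).
    dq_da = w_critic[state_dim:]
    return W_actor + lr * np.outer(dq_da, s)

s = rng.normal(size=state_dim)
W_new = ddpg_actor_update(s)
```

For this linear critic the update cannot decrease Q(s, mu(s)); the full algorithm adds replay buffers, target networks, and exploration noise on top of this core step.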
Greensmith, Evan, and evan greensmith@gmail com. "Policy Gradient Methods: Variance Reduction and Stochastic Convergence." The Australian National University. Research School of Information Sciences and Engineering, 2005. http://thesis.anu.edu.au./public/adt-ANU20060106.193712.
Full text
Greensmith, Evan. "Policy gradient methods : variance reduction and stochastic convergence /." View thesis entry in Australian Digital Theses Program, 2005. http://thesis.anu.edu.au/public/adt-ANU20060106.193712/index.html.
Full text
Aberdeen, Douglas Alexander, and doug aberdeen@anu edu au. "Policy-Gradient Algorithms for Partially Observable Markov Decision Processes." The Australian National University. Research School of Information Sciences and Engineering, 2003. http://thesis.anu.edu.au./public/adt-ANU20030410.111006.
Full text
Aberdeen, Douglas Alexander. "Policy-gradient algorithms for partially observable Markov decision processes /." View thesis entry in Australian Digital Theses Program, 2003. http://thesis.anu.edu.au/public/adt-ANU20030410.111006/index.html.
Full text
Lidström, Christian, and Hannes Leskelä. "Learning for RoboCup Soccer : Policy Gradient Reinforcement Learning in multi-agent systems." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-157469.
Full text
RoboCup Soccer is an annual worldwide robotics competition in which teams of autonomous robot agents play football against each other. This report focuses on the 2D simulator, a variant in which no physical robots are needed; instead, the player clients communicate with a server that keeps track of the game state. RoboCup Soccer 2D simulation has become a major subject of research on artificial intelligence, on cooperation and behaviour in multi-agent systems, and on how these are learned. Some form of machine learning is a requirement for competing at the highest level, as the problem is too complex for the decision-making to be programmed manually. This report finds that PGRL is a common machine learning method among RoboCup teams and is used by some of the best teams in RoboCup. The report also finds that PGRL is an efficient form of machine learning in terms of learning speed, but that many factors can affect this. A trade-off must usually be made between learning speed and precision.
GAVELLI, VIKTOR, and ALEXANDER GOMEZ. "Multi-agent system with Policy Gradient Reinforcement Learning for RoboCup Soccer Simulator." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-157418.
Full text
RoboCup Soccer Simulator is a multi-agent football simulator used in competitions to simulate robots playing football. These competitions are held mainly to promote research in robotics and artificial intelligence by providing a cheap and accessible way to program robot-like agents. This report describes and tests an implementation of a multi-agent football team. Policy Gradient Reinforcement Learning (PGRL) is used to train and modify the team's behaviour. The results show that PGRL improves the team's performance, but when the team's performance differs considerably from the opponent's, the results become inconclusive.
Poulin, Nolan. "Proactive Planning through Active Policy Inference in Stochastic Environments." Digital WPI, 2018. https://digitalcommons.wpi.edu/etd-theses/1267.
Full text
Pianazzi, Enrico. "A deep reinforcement learning approach based on policy gradient for mobile robot navigation." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.
Find full text
Fleming, Brian James. "The social gradient in health : trends in C20th ideas, Australian Health Policy 1970-1998, and a health equity policy evaluation of Australian aged care planning /." Title page, abstract and table of contents only, 2003. http://web4.library.adelaide.edu.au/theses/09PH/09phf5971.pdf.
Full text
Björnberg, Adam, and Haris Poljo. "Impact of observation noise and reward sparseness on Deep Deterministic Policy Gradient when applied to inverted pendulum stabilization." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259758.
Full text
Deep reinforcement learning (RL) algorithms have been shown to solve complex problems. Deep Deterministic Policy Gradient (DDPG) is a modern deep RL algorithm that can handle environments with continuous action spaces. This study evaluates how the DDPG algorithm performs, in terms of solution rate and outcome, depending on observation noise and reward sparseness in a simple environment. A threshold for how much Gaussian noise can be added to observations before the algorithm's performance begins to degrade was found between standard deviations of 0.025 and 0.05. It was also concluded that reward sparseness leads to inconsistent results and irreproducibility, which shows the importance of a well-designed reward function. Further tests are required to thoroughly evaluate the effect of combining noisy observations and sparse reward signals.
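The noise-threshold experiment described in the abstract above amounts to corrupting each observation with Gaussian noise before the agent sees it. A minimal sketch of such a wrapper, with a toy stand-in environment (all names and the environment interface are illustrative assumptions, not the thesis code):

```python
import numpy as np

class NoisyObservations:
    """Wrap an environment so its observations are corrupted with
    Gaussian noise of a chosen standard deviation (e.g. 0.025 vs. 0.05)."""
    def __init__(self, env, std, seed=0):
        self.env = env
        self.std = std
        self.rng = np.random.default_rng(seed)

    def step(self, action):
        obs, reward, done = self.env.step(action)
        noisy = obs + self.rng.normal(0.0, self.std, size=obs.shape)
        return noisy, reward, done

class ConstantEnv:
    """Toy stand-in environment that always returns a zero observation."""
    def step(self, action):
        return np.zeros(3), 0.0, False

env = NoisyObservations(ConstantEnv(), std=0.05)
obs, _, _ = env.step(None)
```

Sweeping `std` over a grid and measuring the solution rate at each value is then enough to locate the kind of degradation threshold the study reports.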
Tagesson, Dennis. "A Comparison Between Deep Q-learning and Deep Deterministic Policy Gradient for an Autonomous Drone in a Simulated Environment." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-55134.
Full text
Kaisaravalli, Bhojraj Gokul, and Yeswanth Surya Achyut Markonda. "Policy-based Reinforcement learning control for window opening and closing in an office building." Thesis, Högskolan Dalarna, Mikrodataanalys, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:du-34420.
Full text
Olafsson, Björgvin. "Partially Observable Markov Decision Processes for Faster Object Recognition." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632.
Full text
Cox, Carissa. "Spatial Patterns in Development Regulation: Tree Preservation Ordinances of the DFW Metropolitan Area." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc84194/.
Full text
McDowell, Journey. "Comparison of Modern Controls and Reinforcement Learning for Robust Control of Autonomously Backing Up Tractor-Trailers to Loading Docks." DigitalCommons@CalPoly, 2019. https://digitalcommons.calpoly.edu/theses/2100.
Full text
Michaud, Brianna. "A Habitat Analysis of Estuarine Fishes and Invertebrates, with Observations on the Effects of Habitat-Factor Resolution." Scholar Commons, 2016. http://scholarcommons.usf.edu/etd/6543.
Full text
Olsson, Anton, and Felix Rosberg. "Domain Transfer for End-to-end Reinforcement Learning." Thesis, Högskolan i Halmstad, Akademin för informationsteknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-43042.
Full text
Aklil, Nassim. "Apprentissage actif sous contrainte de budget en robotique et en neurosciences computationnelles. Localisation robotique et modélisation comportementale en environnement non stationnaire." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066225/document.
Full text
Decision-making is a highly researched field in science, be it in neuroscience, to understand the processes underlying animal decision-making, or in robotics, to model efficient and rapid decision-making processes in real environments. In neuroscience, this problem is addressed online with sequential decision-making models based on reinforcement learning. In robotics, the primary objective is efficiency, so that systems can be deployed in real environments. However, what can be called the budget in robotics (the limitations inherent to the hardware, such as computation time, the limited actions available to the robot, or the lifetime of the robot's battery) is often not taken into account at present. In this thesis we propose to introduce the notion of budget as an explicit constraint in robotic learning processes applied to a localization task, by implementing a model based on work developed in statistical learning that processes data under explicit constraints, limiting the input of data or imposing a more explicit time constraint. In order to discuss the online operation of this type of budgeted learning algorithm, we also discuss some possible inspirations that could be drawn from computational neuroscience. In this context, the alternation between retrieving information for localization and deciding to move may, for a robot, be indirectly linked to the notion of the exploration-exploitation trade-off. We present our contribution to the modeling of this trade-off in animals in a non-stationary task involving different levels of uncertainty, and we make the link with multi-armed bandit methods.
Su, Xiaoshan. "Three Essays on the Design, Pricing, and Hedging of Insurance Contracts." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE2065.
Full text
This thesis makes use of theoretical tools from finance, decision theory, and machine learning to improve the design, pricing, and hedging of insurance contracts. Chapter 3 develops closed-form pricing formulas for participating life insurance contracts, based on matrix Wiener-Hopf factorization, where multiple risk sources, such as credit, market, and economic risks, are considered. The pricing method proves to be accurate and efficient. Dynamic and semi-static hedging strategies are introduced to help insurance companies reduce the risk exposure arising from issuing participating contracts. Chapter 4 discusses optimal contract design when the insured is third-degree risk averse. The results show that dual limited stop-loss, change-loss, dual change-loss, and stop-loss can be optimal contracts favored by both risk averters and risk lovers in different settings. Chapter 5 develops a stochastic gradient boosting frequency-severity model, which improves on the important and popular GLM and GAM frequency-severity models. This model fully inherits the advantages of the gradient boosting algorithm, overcoming the restrictive linear or additive forms of the GLM and GAM frequency-severity models by learning the model structure from data. Further, our model can also capture the flexible nonlinear dependence between claim frequency and severity.
Cai, Bo-Yin, and 蔡博胤. "A Behavior Fusion Approach Based on Policy Gradient." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/u6ctx3.
Full text
National Sun Yat-sen University (國立中山大學)
Department of Electrical Engineering (電機工程學系研究所)
Academic year 107 (ROC calendar)
In this study, we propose a behavior fusion algorithm based on the policy gradient. We use an Actor-Critic algorithm to train the sub-tasks; after training is completed, the behavior fusion algorithm proposed in this paper is used for learning complex tasks. We obtain the state-value function of each sub-task in each state by reading the trained sub-task neural networks, then calculate the return of each sub-task, and pass the normalized returns to the behavior fusion algorithm as a policy gradient. When reinforcement learning is applied to a complex task, the reward function is often difficult to design. With a sparse reward, although the best solution can be reached in theory, training takes a long time; with a dense reward, although training is faster, the agent easily gets stuck in a local minimum. If the complex task is decomposed into several sub-tasks for training, the reward functions of the sub-tasks are easier to design, and after training is completed these sub-tasks can be merged to achieve the complex task. In this study, we use the wafer probe simulator designed by our laboratory and Pong from the Atari games as test environments. The wafer probe simulator simulates how the probe moves when the fab inspects chips; the goal is to have every chip on the wafer checked exactly once, without repeatedly checking the same chip. The Pong environment is about letting the agent learn on its own to defeat the computer.
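The fusion step described above, normalized sub-task returns weighting the sub-policies, can be sketched as follows. The softmax normalization and all names are assumptions for illustration, not the thesis's exact scheme:

```python
import numpy as np

def fuse_policies(sub_values, sub_action_probs):
    """Blend pre-trained sub-policies into one action distribution.

    sub_values: (k,) state values reported by the k sub-task critics.
    sub_action_probs: (k, n_actions) action distributions of the k
        sub-policies in the current state.
    Returns a single fused action distribution over n_actions.
    """
    v = np.asarray(sub_values, dtype=float)
    # Softmax-normalize the returns so higher-value sub-tasks get
    # more weight in the fused behavior.
    weights = np.exp(v - v.max())
    weights /= weights.sum()
    fused = weights @ np.asarray(sub_action_probs, dtype=float)
    return fused / fused.sum()

# Two sub-policies: the second has the higher state value, so its
# preferred action should dominate the fused distribution.
probs = fuse_policies([1.0, 3.0], [[0.9, 0.1], [0.2, 0.8]])
```

The fused distribution can then be sampled directly, or used as the target of a policy-gradient update as the abstract suggests.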
Chen, Yi-Ching, and 陳怡靜. "Solving Rubik's Cube by Policy Gradient Based Reinforcement Learning." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/t842yt.
Full text
National Tsing Hua University (國立清華大學)
Department of Computer Science (資訊工程學系所)
Academic year 107 (ROC calendar)
Reinforcement Learning provides a mechanism for training an agent to interact with its environment. Policy gradient methods make the right actions more probable. We propose using a linear policy gradient method in deep neural network-based reinforcement learning. The proposed method employs an intensifying reward function to increase the probabilities of right actions for solving Rubik's Cube problems. Experiments show that our proposed neural network learned to solve some Rubik's Cube states. For more difficult initial states, the network still cannot always give the correct suggestion.
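The core mechanism the abstract describes, making right actions more probable via the policy gradient, can be sketched with a REINFORCE-style update on a two-action toy problem. The linear softmax policy and reward here are illustrative; the thesis uses a deep network on Rubik's Cube states:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.zeros(2)  # one logit per action

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_step(theta, lr=0.5):
    """Score-function (REINFORCE) update: sample an action, then push
    probability mass toward it in proportion to the reward received."""
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    reward = 1.0 if a == 0 else 0.0   # action 0 is the "right" move
    grad_log = -probs                  # d log pi(a) / d theta ...
    grad_log[a] += 1.0                 # ... = one-hot(a) - probs
    return theta + lr * reward * grad_log

for _ in range(200):
    theta = reinforce_step(theta)
```

Only the rewarded action's log-probability is ever reinforced, so the policy concentrates on it; an intensifying reward function as in the abstract would scale `reward` up as the cube gets closer to solved.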
Kiah-Yang Chong and 張家揚. "Design and Implementation of Fuzzy Policy Gradient Gait Learning Method for Humanoid Robot." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/90100127378597192142.
Full text
National Cheng Kung University (國立成功大學)
Department of Electrical Engineering, Master's and Doctoral Program (電機工程學系碩博士班)
Academic year 98 (ROC calendar)
The design and implementation of a Fuzzy Policy Gradient Learning (FPGL) method for a small-sized humanoid robot is proposed in this thesis. The thesis not only introduces the mechanical structure of the humanoid robot, named aiRobots-V, and the hardware system used on it, but also improves and parameterizes the robot's gait pattern. Arm movement is added to the gait pattern to reduce the tilt of the trunk while walking. FPGL is an integrated machine learning method based on the Policy Gradient Reinforcement Learning (PGRL) method and fuzzy logic concepts, intended to improve the efficiency and speed of gait learning computation. The humanoid robot is trained with FPGL, using the walking distance over a constant number of walking cycles as the reward, to learn a faster and more stable gait automatically. The tilt of the trunk is chosen as the reward for learning the arm movement within the walking cycle. The experimental results show that FPGL could train the gait pattern from a walking speed of 9.26 mm/s to 162.27 mm/s in about an hour. The training data also show that this method can improve the efficiency of the basic PGRL method by up to 13%. The effect of arm movement in reducing the tilt of the trunk is likewise confirmed by the experimental results. The robot was also entered in the throw-in technical challenge of RoboCup 2010.
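PGRL-style gait learning of the kind FPGL builds on is typically a finite-difference policy gradient over the gait parameters: perturb the parameters, score each perturbed gait with the walking-distance reward, and step along the estimated gradient. A toy sketch, where the quadratic `walk_reward` is a stand-in for real robot rollouts and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def walk_reward(params):
    # Stand-in for a robot rollout: walking distance peaks at some
    # unknown optimal gait-parameter vector.
    optimum = np.array([0.3, -0.1, 0.5])
    return -np.sum((params - optimum) ** 2)

def fd_policy_gradient_step(params, eps=0.05, lr=0.2, n_perturb=12):
    """One finite-difference policy-gradient step over gait parameters."""
    grad = np.zeros_like(params)
    base = walk_reward(params)
    for _ in range(n_perturb):
        # Perturb each parameter by -eps, 0, or +eps and credit the
        # perturbation direction with the observed reward change.
        delta = rng.choice([-eps, 0.0, eps], size=params.shape)
        grad += (walk_reward(params + delta) - base) * delta
    return params + lr * grad / (n_perturb * eps ** 2)

params = np.zeros(3)
for _ in range(50):
    params = fd_policy_gradient_step(params)
```

The fuzzy-logic component of FPGL would then shape step sizes or perturbation magnitudes; this sketch shows only the underlying PGRL loop.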
"Adaptive Curvature for Stochastic Optimization." Master's thesis, 2019. http://hdl.handle.net/2286/R.I.53675.
Full text
Dissertation/Thesis
Master's Thesis, Computer Science, 2019
Pereira, Bruno Alexandre Barbosa. "Deep reinforcement learning for robotic manipulation tasks." Master's thesis, 2021. http://hdl.handle.net/10773/33654.
Full text
Recent advances in Artificial Intelligence (AI) open up a set of new opportunities for robotics. Deep Reinforcement Learning (DRL) is a subfield of AI that results from the combination of Deep Learning (DL) with Reinforcement Learning (RL). This subfield defines machine learning algorithms that learn directly from experience and offers a comprehensive approach to studying the interaction between learning, representation, and decision-making. These algorithms have already been used successfully in different domains. Notably, DRL agents learned to play video games for the Atari 2600 console directly from pixels and reached human-comparable performance on 49 of those games. More recently, DRL together with other techniques produced agents capable of playing the board game Go at a professional level, something that until then was seen as a problem too complex to solve because of its enormous search space. In robotics, DRL has been used in problems of planning, navigation, optimal control, and others. In these applications, the excellent function approximation and representation learning capabilities of Deep Neural Networks allow RL to scale to problems with multidimensional state and action spaces. Additionally, properties inherent to DRL make transfer learning useful when moving from simulation to the real world. This dissertation aims to investigate the applicability and effectiveness of DRL techniques for learning successful policies in the domain of robotic manipulation tasks. Initially, a set of three classic RL problems was solved using RL and DRL algorithms in order to explore their practical implementation and arrive at a class of algorithms appropriate for these robotics tasks.
Subsequently, a task was defined in simulation in which an agent must control a manipulator with 6 degrees of freedom so that its end-effector reaches a target. This task is used to evaluate the effect on performance of different state representations, hyperparameters, and state-of-the-art DRL algorithms, which resulted in agents with high success rates. The focus is then placed on the speed and time constraints of end-effector positioning. To this end, different reward systems were tested so that an agent can learn a modified version of the previous task at higher joint velocities. In this scenario, several improvements over the original reward system were observed. Finally, an application of the best agent obtained in the previous experiments is demonstrated in a ball-catching scenario.
Master's in Computer and Telematics Engineering (Mestrado em Engenharia de Computadores e Telemática)