Academic literature on the topic 'Reinforcement Learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference papers, reports, and other scholarly sources on the topic 'Reinforcement Learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Reinforcement Learning"

1. Deora, Merin, and Sumit Mathur. "Reinforcement Learning." IJARCCE 6, no. 4 (April 30, 2017): 178–81. http://dx.doi.org/10.17148/ijarcce.2017.6433.

2. Barto, Andrew G. "Reinforcement Learning." IFAC Proceedings Volumes 31, no. 29 (October 1998): 5. http://dx.doi.org/10.1016/s1474-6670(17)38315-5.

3. Woergoetter, Florentin, and Bernd Porr. "Reinforcement learning." Scholarpedia 3, no. 3 (2008): 1448. http://dx.doi.org/10.4249/scholarpedia.1448.

4. Moore, Brett L., Anthony G. Doufas, and Larry D. Pyeatt. "Reinforcement Learning." Anesthesia & Analgesia 112, no. 2 (February 2011): 360–67. http://dx.doi.org/10.1213/ane.0b013e31820334a7.

5. Liaq, Mudassar, and Yungcheol Byun. "Autonomous UAV Navigation Using Reinforcement Learning." International Journal of Machine Learning and Computing 9, no. 6 (December 2019): 756–61. http://dx.doi.org/10.18178/ijmlc.2019.9.6.869.

6. Alrammal, Muath, and Munir Naveed. "Monte-Carlo Based Reinforcement Learning (MCRL)." International Journal of Machine Learning and Computing 10, no. 2 (February 2020): 227–32. http://dx.doi.org/10.18178/ijmlc.2020.10.2.924.

7. Nurmuhammet, Abdullayev. "DEEP REINFORCEMENT LEARNING ON STOCK DATA." Alatoo Academic Studies 23, no. 2 (June 30, 2023): 505–18. http://dx.doi.org/10.17015/aas.2023.232.49.

Abstract: This study proposes using Deep Reinforcement Learning (DRL) for stock trading decisions and prediction. DRL is a machine learning technique that enables agents to learn optimal strategies by interacting with their environment. The proposed model surpasses traditional models and can make informed trading decisions in real time. The study highlights the feasibility of applying DRL in financial markets and its advantages in strategic decision-making. The model's ability to learn from market dynamics makes it a promising approach for stock market forecasting. Overall, this paper provides valuable insights into the use of DRL for stock trading decisions and prediction, establishing a strong case for its adoption in financial markets. Keywords: reinforcement learning, stock market, deep reinforcement learning.

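The abstract above does not describe a specific architecture, so the following is only a rough sketch of the general idea of learning a trading policy by interaction: a tabular Q-learning agent on a synthetic price series. The state discretization, reward definition, and hyperparameters are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Toy illustration (not the paper's model): tabular Q-learning on a synthetic
# price series. State = sign of the last price change; actions = hold/buy/sell.
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0, 1, 1000)) + 100.0

n_states, n_actions = 3, 3            # falling / flat / rising ; hold / buy / sell
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def state(t):
    d = prices[t] - prices[t - 1]
    return 0 if d < -0.1 else (2 if d > 0.1 else 1)

position = 0                          # shares held (0 or 1)
for t in range(1, len(prices) - 1):
    s = state(t)
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    if a == 1:                        # buy
        position = 1
    elif a == 2:                      # sell
        position = 0
    # Reward: price change captured while holding a position.
    r = position * (prices[t + 1] - prices[t])
    s_next = state(t + 1)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

print("Learned Q-table:\n", Q)
```
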
8. Likas, Aristidis. "A Reinforcement Learning Approach to Online Clustering." Neural Computation 11, no. 8 (November 1, 1999): 1915–32. http://dx.doi.org/10.1162/089976699300016025.

Abstract: A general technique is proposed for embedding online clustering algorithms based on competitive learning in a reinforcement learning framework. The basic idea is that the clustering system can be viewed as a reinforcement learning system that learns through reinforcements to follow the clustering strategy we wish to implement. In this sense, the reinforcement-guided competitive learning (RGCL) algorithm is proposed, which constitutes a reinforcement-based adaptation of learning vector quantization (LVQ) with enhanced clustering capabilities. In addition, we suggest extensions of RGCL and LVQ that are characterized by the property of sustained exploration and significantly improve the performance of those algorithms, as indicated by experimental tests on well-known data sets.

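To make the idea in this abstract concrete (competitive learning driven by a scalar reinforcement signal rather than a fixed update rule), here is a hypothetical, simplified sketch in the spirit of reinforcement-guided prototype updates; it is not the published RGCL algorithm, and the reinforcement rule, exploration scheme, and constants are assumptions.

```python
import numpy as np

# Illustrative reinforcement-driven competitive learning (not the exact RGCL of
# the paper): a stochastically chosen prototype is pushed toward the input when
# the choice agrees with the intended clustering strategy, and away otherwise.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in ((0, 0), (3, 3), (0, 3))])
rng.shuffle(X)

K, lr, eps = 3, 0.05, 0.2
W = X[rng.choice(len(X), K, replace=False)].copy()    # prototype vectors

for x in X:
    d = np.linalg.norm(W - x, axis=1)
    # Exploratory "action": usually the nearest prototype, sometimes a random one.
    k = rng.integers(K) if rng.random() < eps else int(d.argmin())
    # Reinforcement: +1 if the chosen prototype is in fact the nearest, else -1.
    r = 1.0 if k == d.argmin() else -1.0
    W[k] += lr * r * (x - W[k])

print("Learned prototypes:\n", W)
```
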
9. Mardhatillah, Elsy. "Teacher's Reinforcement in English Classroom in MTSS Darul Makmur Sungai Cubadak." Indonesian Research Journal On Education 3, no. 1 (January 2, 2022): 825–32. http://dx.doi.org/10.31004/irje.v3i1.202.

Abstract: This research was prompted by several problems found in MTsS Darul Makmur. First, some students were not motivated to learn. Second, the teacher sometimes still used Indonesian when giving reinforcement. Third, some students did not care about the teacher's reinforcement. This study aimed to identify the types of reinforcement used by the teacher, the types of reinforcement the teacher used often and rarely, the reasons the teacher used certain reinforcements, and how the teacher understands reinforcement. This research used a qualitative approach with a descriptive design, since the researcher described the teacher's use of reinforcement in the English classroom. Interviews and observation sheets were used to collect the data. The researcher found that the teacher used both positive and negative reinforcement. First, the teacher used two types of positive reinforcement, namely verbal and non-verbal reinforcement. The verbal reinforcement most often used took the form of words and phrases, while verbal reinforcement in the form of sentences was never used in the learning process. The non-verbal reinforcement most often used was gestural, activity, and proximity reinforcement. Second, the negative reinforcement most often used was warnings, gestures, and eye contact, while speech volume and punishment were rarely used. Third, the teacher gave reinforcement to motivate students and to make them feel appreciated and happy while learning.

10. Fan, ZiSheng. "An exploration of reinforcement learning and deep reinforcement learning." Applied and Computational Engineering 73, no. 1 (July 5, 2024): 154–59. http://dx.doi.org/10.54254/2755-2721/73/20240386.

Abstract: Today, machine learning is evolving so quickly that new algorithms are constantly appearing. Deep neural networks in particular have shown positive outcomes in a variety of areas, including computer vision, natural language processing, and time series prediction. Reinforcement learning, however, develops at a much slower pace because of its high entry threshold, so a thorough examination of the field is warranted. This paper examines both deep learning algorithms and the operational procedure of reinforcement learning, and identifies information retrieval, data mining, intelligent speech, natural language processing, and reinforcement learning as key technologies. The scientific study of reinforcement learning has advanced remarkably quickly, and it is now being used to tackle important decision-optimization problems in conference and journal research on computer networks, computer graphics, and related areas. Brief introductions and reviews of both types of models are provided in this paper, along with an overview of some of the most cutting-edge reinforcement learning applications and approaches.


Dissertations / Theses on the topic "Reinforcement Learning"

1. Izquierdo Ayala, Pablo. "Learning comparison: Reinforcement Learning vs Inverse Reinforcement Learning: How well does inverse reinforcement learning perform in simple Markov decision processes in comparison to reinforcement learning?" Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259371.

Abstract: This research project elaborates a qualitative comparison between two different learning approaches, Reinforcement Learning (RL) and Inverse Reinforcement Learning (IRL), over the Gridworld Markov Decision Process. The focus is on the second learning paradigm, IRL, as it is considered relatively new and little work has been done in this field of study. As observed, RL outperforms IRL, obtaining a correct solution in all the different scenarios studied. However, the behaviour of the IRL algorithms can be improved, and this is shown and analyzed as part of the scope.
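For readers unfamiliar with the Gridworld setting used in such comparisons, below is a minimal tabular Q-learning sketch on a small grid. The grid size, rewards, and hyperparameters are arbitrary illustrative choices, not the thesis's experimental setup.

```python
import numpy as np

# Minimal Gridworld Q-learning sketch (illustrative only).
# 4x4 grid, start at (0, 0), goal at (3, 3); actions: up, down, left, right.
rng = np.random.default_rng(0)
N, goal = 4, (3, 3)
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
Q = np.zeros((N, N, 4))
alpha, gamma, eps = 0.1, 0.9, 0.1

for _ in range(2000):                         # episodes
    r_, c_ = 0, 0
    for _ in range(200):                      # step cap per episode
        s = (r_, c_)
        a = rng.integers(4) if rng.random() < eps else int(Q[s].argmax())
        dr, dc = moves[a]
        nr, nc = min(max(r_ + dr, 0), N - 1), min(max(c_ + dc, 0), N - 1)
        reward = 1.0 if (nr, nc) == goal else -0.01   # small step cost, goal bonus
        Q[s][a] += alpha * (reward + gamma * Q[nr, nc].max() - Q[s][a])
        r_, c_ = nr, nc
        if (r_, c_) == goal:
            break

print("Greedy action per cell:\n", Q.argmax(axis=2))
```
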
2. Seymour, B. J. "Aversive reinforcement learning." Thesis, University College London (University of London), 2010. http://discovery.ucl.ac.uk/800107/.

Abstract: We hypothesise that human aversive learning can be described algorithmically by Reinforcement Learning models. Our first experiment uses a second-order conditioning design to study sequential outcome prediction. We show that aversive prediction errors are expressed robustly in the ventral striatum, supporting the validity of temporal difference algorithms (as in reward learning) and suggesting a putative critical area for appetitive-aversive interactions. With this in mind, the second experiment explores the nature of pain relief, which, as expounded in theories of motivational opponency, is rewarding. In a Pavlovian conditioning task with phasic relief of tonic noxious thermal stimulation, we show that both appetitive and aversive prediction errors are co-expressed in anatomically dissociable regions (in a mirror opponent pattern) and that striatal activity appears to reflect integrated appetitive-aversive processing. Next we designed a Pavlovian task in which cues predicted either financial gains, losses, or both, thereby forcing integration of both motivational streams. This showed anatomical dissociation of aversive and appetitive predictions along a posterior-anterior gradient within the striatum. Lastly, we studied aversive instrumental control (avoidance). We designed a simultaneous pain-avoidance and financial-reward learning task, in which subjects had to learn about each independently and trade off aversive and appetitive predictions. We show that predictions for both converge on the medial head of the caudate nucleus, suggesting that this is a critical site for appetitive-aversive integration in instrumental decision making. We also tested whether serotonin (5HT) modulates either phasic or tonic opponency using acute tryptophan depletion. Both behavioural and imaging data confirm the latter, in which it appears to mediate an average reward term, providing an aspiration level against which the benefits of exploration are judged. In summary, our data provide a basic computational and neuroanatomical framework for human aversive learning. We demonstrate the algorithmic and implementational validity of reinforcement learning models for both aversive prediction and control, illustrate the nature and neuroanatomy of appetitive-aversive integration, and discover the critical (and somewhat unexpected) central role of the striatum.

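The temporal-difference prediction error that this abstract relates to striatal activity has a simple computational form. The snippet below is a generic TD(0) value-learning illustration with a made-up aversive outcome, not the thesis's imaging model.

```python
# Minimal TD(0) value learning: delta is the prediction error that
# reinforcement-learning accounts relate to striatal responses.
V = {"cue": 0.0, "delay": 0.0, "outcome": 0.0}
alpha, gamma = 0.1, 0.95

# Toy episode: cue -> delay -> outcome, with an aversive outcome worth -1.
episode = [("cue", 0.0), ("delay", 0.0), ("outcome", -1.0)]

for _ in range(500):
    for i, (s, r) in enumerate(episode):
        v_next = V[episode[i + 1][0]] if i + 1 < len(episode) else 0.0
        delta = r + gamma * v_next - V[s]   # TD prediction error
        V[s] += alpha * delta

print(V)  # negative value propagates back from the aversive outcome to the cue
```
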
3. Akrour, Riad. "Robust Preference Learning-based Reinforcement Learning." Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112236/document.

Abstract: The thesis contributions revolve around sequential decision making and more precisely Reinforcement Learning (RL). Taking its root in Machine Learning in the same way as supervised and unsupervised learning, RL quickly grew in popularity within the last two decades due to a handful of achievements on both the theoretical and applicative fronts. RL supposes that the learning agent and its environment follow a stochastic Markovian decision process over a state and action space. The process is a decision process because the agent is asked to choose, at each time step, an action to take. It is stochastic because selecting a given action in a given state does not systematically yield the same next state but rather defines a distribution over the state space. It is Markovian because this distribution only depends on the current state-action pair. As a consequence of the choice of an action, the agent receives a reward. The RL goal is then to solve the underlying optimization problem of finding the behaviour that maximizes the sum of rewards all along the interaction of the agent with its environment. From an applicative point of view, a large spectrum of problems can be cast as an RL problem, from Backgammon (TD-Gammon, one of Machine Learning's first successes, giving rise to a world-class player) to decision problems in the industrial and medical worlds. However, the optimization problem solved by RL depends on the prior definition of a reward function, which requires a certain level of domain expertise as well as knowledge of the internal quirks of RL algorithms. As such, the first contribution of the thesis was to propose a learning framework that lightens the requirements made of the user. The latter no longer needs to know the exact solution of the problem, but only to be able to choose, between two behaviours exhibited by the agent, the one that more closely matches the solution. Learning is interactive between the agent and the user and revolves around the following three points: i) the agent demonstrates a behaviour; ii) the user compares it to the current best one; iii) the agent uses this feedback to update its preference model of the user and uses the model to choose the next behaviour to demonstrate. To reduce the number of interactions required before finding the optimal behaviour, the second contribution of the thesis was to define a theoretically sound criterion making the trade-off between the sometimes contradictory desires of complying with the user's preferences and demonstrating sufficiently different behaviours. The last contribution was to ensure the robustness of the algorithm with respect to the feedback errors that the user might make, which happens more often than not in practice, especially in the initial phase of the interaction, when all the behaviours are far from the expected solution.

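The three-step interaction loop described in the abstract (demonstrate, compare, update) can be sketched schematically. The snippet below is a deliberately simplified stand-in with a hand-coded 'user' preference and random perturbation search over a parameter vector; it is not the thesis's algorithm, and all names and constants are assumptions.

```python
import numpy as np

# Schematic preference-based policy search: the agent proposes a new behaviour,
# a (simulated) user says whether it beats the current best, and the agent keeps
# the preferred one. Here a "behaviour" is just a parameter vector and the
# user's hidden utility is closeness to a fixed target.
rng = np.random.default_rng(0)

def user_prefers(candidate, incumbent):
    # Stand-in for the human comparison step.
    target = np.array([1.0, -2.0, 0.5])
    return np.linalg.norm(candidate - target) < np.linalg.norm(incumbent - target)

best = rng.normal(size=3)                                  # initial behaviour
for _ in range(200):
    candidate = best + rng.normal(scale=0.3, size=3)       # i) demonstrate a new behaviour
    if user_prefers(candidate, best):                      # ii) user compares to the best so far
        best = candidate                                   # iii) update and iterate
print("Behaviour selected after preference feedback:", best)
```
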
4. Tabell Johnsson, Marco, and Ala Jafar. "Efficiency Comparison Between Curriculum Reinforcement Learning & Reinforcement Learning Using ML-Agents." Thesis, Blekinge Tekniska Högskola, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20218.

5. Yang, Zhaoyuan. "Adversarial Reinforcement Learning for Control System Design: A Deep Reinforcement Learning Approach." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu152411491981452.

6. Cortesi, Daniele. "Reinforcement Learning in Rogue." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16138/.

Abstract: In this work we use Reinforcement Learning to play the famous Rogue, a dungeon-crawler videogame and father of the rogue-like genre. By employing different algorithms we substantially improve on the results obtained in previous work, addressing and solving the problems that had arisen. We then devise and perform new experiments to test the limits of our solution, encountering additional and unexpected issues in the process. In one of the investigated scenarios we clearly see that our approach is not yet enough even to perform better than a random agent, and we propose ideas for future work.

7. Girgin, Sertan. "Abstraction In Reinforcement Learning." PhD thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12608257/index.pdf.

Abstract: Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. Generally, the problem to be solved contains subtasks that repeat at different regions of the state space. Without any guidance, an agent has to learn the solutions of all subtask instances independently, which degrades the learning performance. In this thesis, we propose two approaches that build connections between different regions of the search space, leading to better utilization of gained experience and accelerated learning. In the first approach, we extend the existing work of McGovern and propose the formalization of stochastic conditionally terminating sequences with higher representational power. We then describe how to efficiently discover and employ useful abstractions during learning based on such sequences. The method constructs a tree structure to keep track of frequently used action sequences together with visited states; this tree is then used to select actions to be executed at each step. In the second approach, we propose a novel method to identify states with similar sub-policies, and show how they can be integrated into the reinforcement learning framework to improve learning performance. The method uses an efficient data structure to find common action sequences starting from observed states and defines a similarity function between states based on the number of such sequences. Using this similarity function, updates on the action-value function of a state are reflected onto all similar states; this, consequently, allows experience acquired during learning to be applied in a broader context. The effectiveness of both approaches is demonstrated empirically by conducting extensive experiments on various domains.

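As a rough illustration of the second approach described in this abstract (reflecting an action-value update onto states judged similar), here is a hypothetical sketch; the state names, similarity scores, and update rule are assumptions for illustration, not the thesis's actual data structures.

```python
# Hypothetical sketch: a Q-update on one state is also applied, scaled by a
# similarity weight, to states that share frequently used action sequences,
# so experience generalizes across repeated subtasks.
similarity = {("roomA_door", "roomB_door"): 0.8}   # assumed similarity scores

def similar_states(s):
    for (a, b), w in similarity.items():
        if s == a:
            yield b, w
        elif s == b:
            yield a, w

Q = {}

def td_update(s, a, target, alpha=0.1):
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
    for s_sim, w in similar_states(s):             # reflect the update to similar states
        Q[(s_sim, a)] = Q.get((s_sim, a), 0.0) + alpha * w * (target - Q.get((s_sim, a), 0.0))

td_update("roomA_door", "open", target=1.0)
print(Q)
```
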
8. Suay, Halit Bener. "Reinforcement Learning from Demonstration." Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-dissertations/173.

Abstract: Off-the-shelf Reinforcement Learning (RL) algorithms suffer from slow learning performance, partly because they are expected to learn a task from scratch merely through an agent's own experience. In this thesis, we show that learning from scratch is a limiting factor for learning performance, and that when prior knowledge is available RL agents can learn a task faster. We evaluate relevant previous work and our own algorithms in various experiments. Our first contribution is the first implementation and evaluation of an existing interactive RL algorithm in a real-world domain with a humanoid robot. Interactive RL had previously been evaluated in a simulated domain, which motivated us to evaluate its practicality on a robot. Our evaluation shows that guidance reduces learning time and that its positive effects increase with state space size. A natural follow-up question after our first evaluation was how other previous works compare to interactive RL. Our second contribution is an analysis of a user study in which naïve human teachers demonstrated real-world object catching with a humanoid robot. We present the first comparison of several previous works in a common real-world domain with a user study. One conclusion of the user study was the high potential of RL despite poor usability due to the slow learning rate. As an effort to improve the learning efficiency of RL learners, our third contribution is a novel human-agent knowledge transfer algorithm. Using demonstrations from three teachers with varying expertise in a simulated domain, we show that, regardless of skill level, human demonstrations can improve the asymptotic performance of an RL agent. As an alternative approach to encoding human knowledge in RL, we investigated the use of reward shaping. Our final contributions are the Static Inverse Reinforcement Learning Shaping and Dynamic Inverse Reinforcement Learning Shaping algorithms, which use human demonstrations to recover a shaping reward function. Our experiments in simulated domains show that our approach outperforms the state of the art in cumulative reward, learning rate, and asymptotic performance. Overall, we show that human demonstrators with varying skills can help RL agents learn tasks more efficiently.

9. Gao, Yang. "Argumentation accelerated reinforcement learning." Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/26603.

Abstract: Reinforcement Learning (RL) is a popular statistical Artificial Intelligence (AI) technique for building autonomous agents, but it suffers from the curse of dimensionality: the computational requirement for obtaining optimal policies grows exponentially with the size of the state space. Integrating heuristics into RL has proven to be an effective approach to combat this curse, but deriving high-quality heuristics from people's (typically conflicting) domain knowledge is challenging and has received little research attention. Argumentation theory is a logic-based AI technique well known for its conflict-resolution capability and intuitive appeal. In this thesis, we investigate the integration of argumentation frameworks into RL algorithms so as to improve their convergence speed. In particular, we propose a variant of the Value-based Argumentation Framework (VAF) to represent domain knowledge and to derive heuristics from this knowledge. We prove that the heuristics derived from this framework can effectively instruct individual learning agents as well as multiple cooperative learning agents. In addition, we propose the Argumentation Accelerated RL (AARL) framework to integrate these heuristics into different RL algorithms via Potential Based Reward Shaping (PBRS) techniques: we use classical PBRS techniques for flat-RL-based AARL (e.g. SARSA(λ)), and propose a novel PBRS technique for MAXQ-0, a hierarchical RL (HRL) algorithm, so as to implement HRL-based AARL. We empirically test two AARL implementations - SARSA(λ)-based AARL and MAXQ-based AARL - in multiple application domains, including single-agent and multi-agent learning problems. Empirical results indicate that AARL can improve the convergence speed of RL and can also be used easily by people who have little background in Argumentation and RL.

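The AARL framework described above injects heuristics through potential-based reward shaping (PBRS). As a generic illustration of that mechanism only, the sketch below uses an arbitrary distance-based potential rather than one derived from an argumentation framework.

```python
# Generic potential-based reward shaping: the shaped reward adds
# F(s, s') = gamma * phi(s') - phi(s), which preserves the optimal policy
# of the underlying task.
gamma = 0.95

def phi(state):
    # Hypothetical heuristic potential, e.g. negative Manhattan distance to a goal.
    goal = (3, 3)
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaped_reward(reward, state, next_state):
    return reward + gamma * phi(next_state) - phi(state)

# Example: a transition that moves closer to the goal receives a small bonus.
print(shaped_reward(0.0, (0, 0), (0, 1)))
```
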
10. Alexander, John W. "Transfer in reinforcement learning." Thesis, University of Aberdeen, 2015. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=227908.

Abstract: The problem of developing skill repertoires autonomously in robotics and artificial intelligence is becoming ever more pressing. Currently, the issues of how to apply prior knowledge to new situations and which knowledge to apply have not been sufficiently studied. We present a transfer setting where a reinforcement learning agent faces multiple problem-solving tasks drawn from an unknown generative process, where each task has similar dynamics. The task dynamics are changed by varying the transition function between states. The tasks are presented sequentially, with the latest task presented considered as the target for transfer. We describe two approaches to solving this problem. First, we present an algorithm for transfer of the function encoding the state-action value, defined as value function transfer. This algorithm uses the value function of a source policy to initialise the policy of a target task. We varied the type of basis the algorithm used to approximate the value function. Empirical results in several well-known domains showed that the learners benefited from the transfer in the majority of cases. Results also showed that the Radial basis performed better in general than the Fourier, although, contrary to expectation, the Fourier basis benefited most from the transfer. Second, we present an algorithm for learning an informative prior which encodes beliefs about the underlying dynamics shared across all tasks. We call this agent the Informative Prior agent (IP). The prior is learnt through experience and captures the commonalities in the transition dynamics of the domain, allowing for a quantification of the agent's uncertainty about these. By using a sparse distribution of the uncertainty in the dynamics as a prior, the IP agent can successfully learn a model of 1) the set of feasible transitions rather than the set of possible transitions, and 2) the likelihood of each of the feasible transitions. Analysis focusing on the accuracy of the learned model showed that IP had a very good accuracy bound, expressible in terms of only the permissible error and the diffusion, a factor that describes the concentration of the prior mass around the truth and which decreases as the number of tasks experienced grows. The empirical evaluation of IP showed that an agent which uses the informative prior outperforms several existing Bayesian reinforcement learning algorithms on tasks with shared structure, in a domain where multiple related tasks were presented only once to the learners. IP is a step towards the autonomous acquisition of behaviours in artificial intelligence. IP also provides a contribution towards the analysis of exploration and exploitation in the transfer paradigm.

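Value function transfer, as described in the first part of this abstract, amounts to initialising the learner for the target task with the source task's learned values. The sketch below is a hypothetical tabular version with toy dynamics (the function names, environment, and parameters are assumptions), not the thesis's function-approximation setup.

```python
import numpy as np

# Hypothetical value function transfer between two tasks sharing a state-action
# space: the target learner starts from the source task's Q-table instead of
# from zeros, then continues learning as usual.
n_states, n_actions = 25, 4

def q_learning(Q, env_step, episodes=200, alpha=0.1, gamma=0.9, eps=0.1):
    rng = np.random.default_rng(0)
    for _ in range(episodes):
        s = 0
        for _ in range(50):
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s_next, r, done = env_step(s, a)
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
            if done:
                break
    return Q

def toy_step(s, a):
    # Placeholder dynamics; a real experiment would plug the actual task in here.
    s_next = (s + a + 1) % n_states
    done = s_next == n_states - 1
    return s_next, 1.0 if done else 0.0, done

Q_source = q_learning(np.zeros((n_states, n_actions)), toy_step)
Q_target = q_learning(Q_source.copy(), toy_step)   # transfer: warm-start from the source
```
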

Books on the topic "Reinforcement Learning"

1. Sutton, Richard S., ed. Reinforcement learning. Boston: Kluwer Academic Publishers, 1992.

2. Sutton, Richard S. Reinforcement Learning. Boston, MA: Springer US, 1992.

3. Wiering, Marco, and Martijn van Otterlo, eds. Reinforcement Learning. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-27645-3.

4. Sutton, Richard S., ed. Reinforcement Learning. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5.

5. Lorenz, Uwe. Reinforcement Learning. Berlin, Heidelberg: Springer Berlin Heidelberg, 2020. http://dx.doi.org/10.1007/978-3-662-61651-2.

6. Nandy, Abhishek, and Manisha Biswas. Reinforcement Learning. Berkeley, CA: Apress, 2018. http://dx.doi.org/10.1007/978-1-4842-3285-9.

7. Li, Jinna, Frank L. Lewis, and Jialu Fan. Reinforcement Learning. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-28394-9.

8. Lorenz, Uwe. Reinforcement Learning. Berlin, Heidelberg: Springer Berlin Heidelberg, 2024. http://dx.doi.org/10.1007/978-3-662-68311-8.

9. Merrick, Kathryn, and Mary Lou Maher. Motivated Reinforcement Learning. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-540-89187-1.

10. Dong, Hao, Zihan Ding, and Shanghang Zhang, eds. Deep Reinforcement Learning. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-4095-0.

Book chapters on the topic "Reinforcement Learning"

1. Sutton, Richard S. "Introduction: The Challenge of Reinforcement Learning." In Reinforcement Learning, 1–3. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_1.

2. Williams, Ronald J. "Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning." In Reinforcement Learning, 5–32. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_2.

3. Tesauro, Gerald. "Practical Issues in Temporal Difference Learning." In Reinforcement Learning, 33–53. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_3.

4. Watkins, Christopher J. C. H., and Peter Dayan. "Technical Note." In Reinforcement Learning, 55–68. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_4.

5. Lin, Long-Ji. "Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching." In Reinforcement Learning, 69–97. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_5.

6. Singh, Satinder Pal. "Transfer of Learning by Composing Solutions of Elemental Sequential Tasks." In Reinforcement Learning, 99–115. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_6.

7. Dayan, Peter. "The Convergence of TD(λ) for General λ." In Reinforcement Learning, 117–38. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_7.

8. Millán, José R., and Carme Torras. "A Reinforcement Connectionist Approach to Robot Path Finding in Non-Maze-Like Environments." In Reinforcement Learning, 139–71. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_8.

9. Lorenz, Uwe. "Bestärkendes Lernen als Teilgebiet des Maschinellen Lernens" [Reinforcement learning as a subfield of machine learning]. In Reinforcement Learning, 1–11. Berlin, Heidelberg: Springer Berlin Heidelberg, 2020. http://dx.doi.org/10.1007/978-3-662-61651-2_1.

10. Lorenz, Uwe. "Grundbegriffe des Bestärkenden Lernens" [Basic concepts of reinforcement learning]. In Reinforcement Learning, 13–20. Berlin, Heidelberg: Springer Berlin Heidelberg, 2020. http://dx.doi.org/10.1007/978-3-662-61651-2_2.

Conference papers on the topic "Reinforcement Learning"

1. Yang, Kun, Chengshuai Shi, and Cong Shen. "Teaching Reinforcement Learning Agents via Reinforcement Learning." In 2023 57th Annual Conference on Information Sciences and Systems (CISS). IEEE, 2023. http://dx.doi.org/10.1109/ciss56502.2023.10089695.

2. Doshi, Finale, Joelle Pineau, and Nicholas Roy. "Reinforcement learning with limited reinforcement." In the 25th international conference. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1390156.1390189.

3. Li, Zhiyi. "Reinforcement Learning." In SIGCSE '19: The 50th ACM Technical Symposium on Computer Science Education. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3287324.3293703.

4. Shen, Shitian, and Min Chi. "Reinforcement Learning." In UMAP '16: User Modeling, Adaptation and Personalization Conference. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2930238.2930247.

5. Kuroe, Yasuaki, and Kenya Takeuchi. "Sophisticated Swarm Reinforcement Learning by Incorporating Inverse Reinforcement Learning." In 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2023. http://dx.doi.org/10.1109/smc53992.2023.10394525.

6. Lyu, Le, Yang Shen, and Sicheng Zhang. "The Advance of Reinforcement Learning and Deep Reinforcement Learning." In 2022 IEEE International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA). IEEE, 2022. http://dx.doi.org/10.1109/eebda53927.2022.9744760.

7. Epshteyn, Arkady, Adam Vogel, and Gerald DeJong. "Active reinforcement learning." In the 25th international conference. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1390156.1390194.

8. Epshteyn, Arkady, and Gerald DeJong. "Qualitative reinforcement learning." In the 23rd international conference. New York, New York, USA: ACM Press, 2006. http://dx.doi.org/10.1145/1143844.1143883.

9. Vargas, Danilo Vasconcellos. "Evolutionary reinforcement learning." In GECCO '18: Genetic and Evolutionary Computation Conference. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3205651.3207865.

10. Langford, John. "Contextual reinforcement learning." In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017. http://dx.doi.org/10.1109/bigdata.2017.8257902.

Reports on the topic "Reinforcement Learning"

1. Singh, Satinder, Andrew G. Barto, and Nuttapong Chentanez. Intrinsically Motivated Reinforcement Learning. Fort Belvoir, VA: Defense Technical Information Center, January 2005. http://dx.doi.org/10.21236/ada440280.

2. Ghavamzadeh, Mohammad, and Sridhar Mahadevan. Hierarchical Multiagent Reinforcement Learning. Fort Belvoir, VA: Defense Technical Information Center, January 2004. http://dx.doi.org/10.21236/ada440418.

3. Harmon, Mance E., and Stephanie S. Harmon. Reinforcement Learning: A Tutorial. Fort Belvoir, VA: Defense Technical Information Center, January 1997. http://dx.doi.org/10.21236/ada323194.

4. Tadepalli, Prasad, and Alan Fern. Partial Planning Reinforcement Learning. Fort Belvoir, VA: Defense Technical Information Center, August 2012. http://dx.doi.org/10.21236/ada574717.

5. Ghavamzadeh, Mohammad, and Sridhar Mahadevan. Hierarchical Average Reward Reinforcement Learning. Fort Belvoir, VA: Defense Technical Information Center, June 2003. http://dx.doi.org/10.21236/ada445728.

6. Johnson, Daniel W. Drive-Reinforcement Learning System Applications. Fort Belvoir, VA: Defense Technical Information Center, July 1992. http://dx.doi.org/10.21236/ada264514.

7. Cleland, Andrew. Bounding Box Improvement With Reinforcement Learning. Portland State University Library, January 2000. http://dx.doi.org/10.15760/etd.6322.

8. Li, Jiajie. Learning Financial Investment Strategies using Reinforcement Learning and 'Chan theory'. Ames (Iowa): Iowa State University, August 2022. http://dx.doi.org/10.31274/cc-20240624-946.

9. Baird, Leemon C., III, and A. H. Klopf. Reinforcement Learning With High-Dimensional, Continuous Actions. Fort Belvoir, VA: Defense Technical Information Center, November 1993. http://dx.doi.org/10.21236/ada280844.

10. Obert, James, and Angie Shia. Optimizing Dynamic Timing Analysis with Reinforcement Learning. Office of Scientific and Technical Information (OSTI), November 2019. http://dx.doi.org/10.2172/1573933.