Academic literature on the topic 'RL ALGORITHMS'

Below are lists of relevant journal articles, theses, book chapters, conference papers, and reports on the topic 'RL ALGORITHMS', with abstracts where available.

Journal articles on the topic "RL ALGORITHMS"

1

Lahande, Prathamesh, Parag Kaveri, and Jatinderkumar Saini. "Reinforcement Learning for Reducing the Interruptions and Increasing Fault Tolerance in the Cloud Environment." Informatics 10, no. 3 (August 2, 2023): 64. http://dx.doi.org/10.3390/informatics10030064.

Abstract:
Cloud computing delivers robust computational services by processing tasks on its virtual machines (VMs) using resource-scheduling algorithms. The cloud’s existing algorithms provide limited results due to inappropriate resource scheduling. Additionally, these algorithms cannot process tasks generating faults while being computed. The primary reason for this is that these existing algorithms need an intelligence mechanism to enhance their abilities. To provide an intelligence mechanism to improve the resource-scheduling process and provision the fault-tolerance mechanism, an algorithm named reinforcement learning-shortest job first (RL-SJF) has been implemented by integrating the RL technique with the existing SJF algorithm. An experiment was conducted in a simulation platform to compare the working of RL-SJF with SJF, and challenging tasks were computed in multiple scenarios. The experimental results convey that the RL-SJF algorithm enhances the resource-scheduling process by improving the aggregate cost by 14.88% compared to the SJF algorithm. Additionally, the RL-SJF algorithm provided a fault-tolerance mechanism by computing 55.52% of the total tasks compared to 11.11% of the SJF algorithm. Thus, the RL-SJF algorithm improves the overall cloud performance and provides the ideal quality of service (QoS).
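
The abstract above describes coupling a learned value estimate with the classical shortest-job-first rule. As a rough illustration only (not the authors' RL-SJF implementation), the sketch below shows a tabular Q-learning loop that picks a VM for each job taken in SJF order; run_task and make_state are hypothetical hooks standing in for a real cloud simulator.

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount, exploration rate
    q_table = defaultdict(float)            # Q[(state, vm_id)] -> estimated value

    def choose_vm(state, vm_ids):
        """Epsilon-greedy choice of a VM for the current job."""
        if random.random() < EPSILON:
            return random.choice(vm_ids)
        return max(vm_ids, key=lambda v: q_table[(state, v)])

    def schedule_episode(jobs, vm_ids, run_task, make_state):
        """Process jobs in shortest-job-first order while learning which VM to use.
        run_task(job, vm) and make_state(job, vm_ids) are hypothetical environment
        hooks; run_task is assumed to return (reward, next_state), e.g. negative cost
        with a penalty when the task faults."""
        for job in sorted(jobs, key=lambda j: j["length"]):       # SJF ordering
            state = make_state(job, vm_ids)
            vm = choose_vm(state, vm_ids)
            reward, next_state = run_task(job, vm)
            best_next = max(q_table[(next_state, v)] for v in vm_ids)
            td_target = reward + GAMMA * best_next
            q_table[(state, vm)] += ALPHA * (td_target - q_table[(state, vm)])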
2

Trella, Anna L., Kelly W. Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, and Susan A. Murphy. "Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines." Algorithms 15, no. 8 (July 22, 2022): 255. http://dx.doi.org/10.3390/a15080255.

Abstract:
Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. To guide how one can tackle these challenges, we extend the PCS (predictability, computability, stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning to the design of RL algorithms for the digital interventions setting. Furthermore, we provide guidelines on how to design simulation environments, a crucial tool for evaluating RL candidate algorithms using the PCS framework. We show how we used the PCS framework to design an RL algorithm for Oralytics, a mobile health study aiming to improve users’ tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022.
3

Rodríguez Sánchez, Francisco, Ildeberto Santos-Ruiz, Joaquín Domínguez-Zenteno, and Francisco Ronay López-Estrada. "Control Applications Using Reinforcement Learning: An Overview." Memorias del Congreso Nacional de Control Automático 5, no. 1 (October 17, 2022): 67–72. http://dx.doi.org/10.58571/cnca.amca.2022.019.

Abstract:
This article presents the general formulation and terminology of reinforcement learning (RL) from the perspective of Bellman’s equations based on a reward function, together with its learning methods and algorithms. A key element of RL is the calculation of state-value and state-action value functions, which are used to find, compare, and improve policies for the learning agent through value-based and policy-based methods such as Q-learning. The deep deterministic policy gradient (DDPG) algorithm, based on an actor-critic structure, is also described as one way of training the RL agent. RL algorithms can be used to design closed-loop controllers. Using the DDPG algorithm, an inverted-pendulum control application is demonstrated in simulation, showing that training completes in reasonable time and illustrating how RL algorithms, combined with control techniques, can address this type of problem.
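
For readers new to the terminology in this overview, the value-based update it refers to is the tabular Q-learning rule. A minimal, generic sketch is shown below; it assumes a classic Gym-style environment with discrete, hashable states and is not tied to the paper's DDPG pendulum example.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning for an environment exposing the classic Gym interface:
        reset() -> state, step(action) -> (next_state, reward, done, info)."""
        Q = defaultdict(float)
        actions = list(range(env.action_space.n))
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                if random.random() < epsilon:                       # explore
                    action = random.choice(actions)
                else:                                               # exploit
                    action = max(actions, key=lambda a: Q[(state, a)])
                next_state, reward, done, _ = env.step(action)
                best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
                # Bellman backup: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q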
4

Abbass, Mahmoud Abdelkader Bashery, and Hyun-Soo Kang. "Drone Elevation Control Based on Python-Unity Integrated Framework for Reinforcement Learning Applications." Drones 7, no. 4 (March 24, 2023): 225. http://dx.doi.org/10.3390/drones7040225.

Abstract:
Reinforcement learning (RL) applications require a huge effort to become established in real-world environments, due to the injury and breakdown risks during interactions between the RL agent and the environment in the online training process. In addition, the RL platform tools intended to reduce these real-world challenges (e.g., Python OpenAI’s Gym, Unity ML-Agents, PyBullet, DART, MuJoCo, RaiSim, Isaac, and AirSim) suffer from drawbacks such as a limited number of examples and applications, and difficulties in implementing RL algorithms due to the programming language. This paper presents an integrated RL framework, based on Python–Unity interaction, that demonstrates the ability to create a new RL platform tool by establishing a stable user datagram protocol (UDP) communication link between the RL agent algorithm (developed in the Python programming language as a server) and the simulation environment (created in the Unity simulation software as a client). This Python–Unity integration increases the flexibility, scalability, and robustness of the overall RL platform, supports the creation of different environment specifications, and eases the implementation and development of RL algorithms. The proposed framework is validated by applying two popular deep RL algorithms, Vanilla Policy Gradient (VPG) and Actor-Critic (A2C), to an elevation control challenge for a quadcopter drone. The validation results show the suitability of the proposed framework for RL applications, because both implemented algorithms achieve high stability by converging to the required performance through the semi-online training process.
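
The core of the framework summarized above is a UDP link between a Python RL process and a Unity simulation. The sketch below shows one possible Python-side server loop; the address, message format (comma-separated observation, reward, and done flag, with the action echoed back), and the select_action policy hook are assumptions for illustration, not the paper's actual protocol.

    import socket

    HOST, PORT = "127.0.0.1", 9999        # assumed address the Unity client sends to

    def serve_one_episode(select_action):
        """Receive observations from the Unity client over UDP and reply with actions.
        select_action(obs) is a placeholder for the RL policy (e.g. VPG or A2C)."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((HOST, PORT))
        try:
            while True:
                data, client_addr = sock.recvfrom(1024)               # blocking receive
                fields = data.decode("utf-8").strip().split(",")      # "o1,o2,...,reward,done"
                *obs, reward, done = [float(x) for x in fields]
                if done:                                              # episode finished
                    break
                action = select_action(obs)
                sock.sendto(str(action).encode("utf-8"), client_addr)
        finally:
            sock.close()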
5

Mann, Timothy, and Yoonsuck Choe. "Scaling Up Reinforcement Learning through Targeted Exploration." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (August 4, 2011): 435–40. http://dx.doi.org/10.1609/aaai.v25i1.7929.

Abstract:
Recent Reinforcement Learning (RL) algorithms, such as R-MAX, make (with high probability) only a small number of poor decisions. In practice, these algorithms do not scale well as the number of states grows because the algorithms spend too much effort exploring. We introduce an RL algorithm State TArgeted R-MAX (STAR-MAX) that explores a subset of the state space, called the exploration envelope ξ. When ξ equals the total state space, STAR-MAX behaves identically to R-MAX. When ξ is a subset of the state space, to keep exploration within ξ, a recovery rule β is needed. We compared existing algorithms with our algorithm employing various exploration envelopes. With an appropriate choice of ξ, STAR-MAX scales far better than existing RL algorithms as the number of states increases. A possible drawback of our algorithm is its dependence on a good choice of ξ and β. However, we show that an effective recovery rule β can be learned on-line and ξ can be learned from demonstrations. We also find that even randomly sampled exploration envelopes can improve cumulative rewards compared to R-MAX. We expect these results to lead to more efficient methods for RL in large-scale problems.
6

Cheng, Richard, Gábor Orosz, Richard M. Murray, and Joel W. Burdick. "End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3387–95. http://dx.doi.org/10.1609/aaai.v33i01.33013387.

Abstract:
Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
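
The central mechanism in RL-CBF is a safety filter that keeps the executed action close to the RL proposal while respecting a barrier condition. The sketch below is a heavily simplified, discretized stand-in, not the paper's CBF quadratic program or its Gaussian-process dynamics model: it assumes a known one-step model f(x, u), a barrier function h with h(x) >= 0 on safe states, and a bounded one-dimensional action set.

    import numpy as np

    def cbf_safety_filter(x, u_rl, f, h, alpha=0.1, u_candidates=None):
        """Return the action closest to the RL proposal u_rl that satisfies a
        discrete-time barrier condition h(f(x, u)) >= (1 - alpha) * h(x).
        f, h, and the candidate grid are illustrative assumptions."""
        if u_candidates is None:
            u_candidates = np.linspace(-1.0, 1.0, 41)        # assumed bounded 1-D action set
        safe = [u for u in u_candidates if h(f(x, u)) >= (1.0 - alpha) * h(x)]
        if not safe:                                         # no certified action: fall back
            return max(u_candidates, key=lambda u: h(f(x, u)))
        return min(safe, key=lambda u: abs(u - u_rl))        # minimal deviation from u_rl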
7

Kirsch, Louis, Sebastian Flennerhag, Hado van Hasselt, Abram Friesen, Junhyuk Oh, and Yutian Chen. "Introducing Symmetries to Black Box Meta Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7202–10. http://dx.doi.org/10.1609/aaai.v36i7.20681.

Abstract:
Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform compared to human-engineered RL algorithms in terms of generalisation to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries (specifically the reuse of the learning rule, and invariance to input and output permutations) that are not present in typical black-box meta RL systems. We hypothesise that these symmetries can play an important role in meta-generalisation. Building off recent work in black-box supervised meta learning, we develop a black-box meta RL system that exhibits these same symmetries. We show through careful experimentation that incorporating these symmetries can lead to algorithms with a greater ability to generalise to unseen action & observation spaces, tasks, and environments.
8

Kim, Hyun-Su, and Uksun Kim. "Development of a Control Algorithm for a Semi-Active Mid-Story Isolation System Using Reinforcement Learning." Applied Sciences 13, no. 4 (February 4, 2023): 2053. http://dx.doi.org/10.3390/app13042053.

Abstract:
The semi-active control system is widely used to reduce the seismic response of building structures. Its control performance mainly depends on the applied control algorithms. Various semi-active control algorithms have been developed to date. Recently, machine learning has been applied to various engineering fields and provided successful results. Because reinforcement learning (RL) has shown good performance for real-time decision-making problems, structural control engineers have become interested in RL. In this study, RL was applied to the development of a semi-active control algorithm. Among various RL methods, a Deep Q-network (DQN) was selected because of its successful application to many control problems. A sample building structure was constructed by using a semi-active mid-story isolation system (SMIS) with a magnetorheological damper. Artificial ground motions were generated for numerical simulation. In this study, the sample building structure and seismic excitation were used to make the RL environment. The reward of RL was designed to reduce the peak story drift and the isolation story drift. Skyhook and groundhook control algorithms were applied for comparative study. Based on numerical results, this paper shows that the proposed control algorithm can effectively reduce the seismic responses of building structures with a SMIS.
9

Prakash, Kritika, Fiza Husain, Praveen Paruchuri, and Sujit Gujar. "How Private Is Your RL Policy? An Inverse RL Based Analysis Framework." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 8009–16. http://dx.doi.org/10.1609/aaai.v36i7.20772.

Abstract:
Reinforcement Learning (RL) enables agents to learn how to perform various tasks from scratch. In domains like autonomous driving, recommendation systems, and more, optimal RL policies learned could cause a privacy breach if the policies memorize any part of the private reward. We study the set of existing differentially-private RL policies derived from various RL algorithms such as Value Iteration, Deep-Q Networks, and Vanilla Proximal Policy Optimization. We propose a new Privacy-Aware Inverse RL analysis framework (PRIL) that involves performing reward reconstruction as an adversarial attack on private policies that the agents may deploy. For this, we introduce the reward reconstruction attack, wherein we seek to reconstruct the original reward from a privacy-preserving policy using the Inverse RL algorithm. An adversary must do poorly at reconstructing the original reward function if the agent uses a tightly private policy. Using this framework, we empirically test the effectiveness of the privacy guarantee offered by the private algorithms on instances of the FrozenLake domain of varying complexities. Based on the analysis performed, we infer a gap between the current standard of privacy offered and the standard of privacy needed to protect reward functions in RL. We do so by quantifying the extent to which each private policy protects the reward function by measuring distances between the original and reconstructed rewards.
10

Niazi, Abdolkarim, Norizah Redzuan, Raja Ishak Raja Hamzah, and Sara Esfandiari. "Improvement on Supporting Machine Learning Algorithm for Solving Problem in Immediate Decision Making." Advanced Materials Research 566 (September 2012): 572–79. http://dx.doi.org/10.4028/www.scientific.net/amr.566.572.

Abstract:
In this paper, a new algorithm based on case-based reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of RL algorithms. RL algorithms are useful for solving a wide variety of decision problems when models are not available and correct decisions must be made in every state of the system, for example in multi-agent systems, control systems, robotics, and tool-condition monitoring. The proposed method investigates how to improve action selection in an RL algorithm: a combined model using case-based reasoning and a new optimized function is used to select the action, which improves the convergence of Q-learning-based algorithms. The algorithm was applied to cooperative Markov games, one of the models of Markov-based multi-agent systems. Experimental results indicate that the proposed algorithm outperforms existing algorithms in terms of the speed and accuracy of reaching the optimal policy.

Dissertations / Theses on the topic "RL ALGORITHMS"

1

Marcus, Elwin. "Simulating market maker behaviour using Deep Reinforcement Learning to understand market microstructure." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-240682.

Abstract:
Market microstructure studies the process of exchanging assets under explicit trading rules. With algorithmic trading and high-frequency trading, modern financial markets have seen profound changes in market microstructure in the last 5 to 10 years. As a result, previously established methods in the field of market microstructure often become faulty or insufficient. Machine learning and, in particular, reinforcement learning have become more ubiquitous in both finance and other fields today, with applications in trading and optimal execution. This thesis uses reinforcement learning to understand market microstructure by simulating a stock market based on NASDAQ Nordics and training market maker agents on this stock market. Simulations are run on both a dealer market and a limit order book market, differentiating it from previous studies, using DQN and PPO algorithms on these simulated environments, where stochastic optimal control theory has mainly been used before. The market maker agents successfully reproduce stylized facts in historical trade data from each simulation, such as mean-reverting prices and absence of linear autocorrelations in price changes, as well as beating random policies employed on these markets with a positive profit & loss of maximum 200%. Other trading dynamics found in real-world markets have also been exhibited via the agents' interactions, mainly: bid-ask spread clustering, optimal inventory management, declining spreads, and independence of inventory and spreads, indicating that reinforcement learning with PPO and DQN are relevant choices when modelling market microstructure.
Market microstructure studies how the exchange of financial assets takes place according to explicit rules. Algorithmic and high-frequency trading have changed the structures of modern financial markets over the last 5 to 10 years. This has also affected the reliability of previously used methods, for example from econometrics, for studying market microstructure. Machine learning and reinforcement learning have become more popular, with many areas of application both in finance and in other fields. Within finance, these methods have mainly been used in trading and optimal execution of orders. In this thesis, reinforcement learning and market microstructure are combined in order to simulate a stock market based on NASDAQ Nordics, where market maker agents are trained via reinforcement learning with the goal of understanding the market microstructure that arises through the agents' interactions. The agents are evaluated and tested on a dealer market together with a limit order book, which, together with the two algorithms DQN and PPO, distinguishes this study from previous work, where stochastic optimization has mainly been used for similar problems. The agents successfully reproduce properties of financial time series such as mean reversion and absence of linear autocorrelation. They also beat random strategies, with a maximum profit of 200%. Finally, the agents exhibit other trading dynamics expected in a real market, mainly: clustering of spreads, optimal inventory management, and declining spreads during the simulations. This shows that reinforcement learning with PPO or DQN is a relevant choice when modelling market microstructure.
2

ALI, FAIZ MOHAMMAD. "CART POLE SYSTEM ANALYSIS AND CONTROL USING MACHINE LEARNING ALGORITHMS." Thesis, 2022. http://dspace.dtu.ac.in:8080/jspui/handle/repository/19298.

Abstract:
Balancing the cart and pole system is a classical benchmark problem in control theory, also referred to as the inverted pendulum. It is a prototype laboratory model of an unstable mechanical system, mainly used to model the control problems of rockets and missiles in the initial stages of their launch. The system is unstable because an external force is required to keep the pendulum in a vertically upright position while the cart moves on a horizontal track. Designing optimal controllers for the cart and pole system is a challenging and complex problem, as it is an inherently nonlinear system. The principal advantage of reinforcement learning (RL) is its ability to learn from interaction with the environment and provide an optimal control strategy. In this project, RL is explored in the context of controlling the benchmark cart-pole dynamical system. RL algorithms such as Q-learning, SARSA, and value-function approximation applied to Q-learning are implemented in this context. Using a fixed force value of +10 N or -10 N, decided by a policy that maximizes the approximate value function, the agent achieves optimal control of the system.
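
As a concrete illustration of the setup described above (a sketch under stated assumptions, not the thesis code), the snippet below discretizes CartPole's continuous state and runs on-policy SARSA; it assumes the classic Gym CartPole-v0 interface, whose two discrete actions correspond to the fixed +10 N / -10 N force mentioned in the abstract.

    import random
    from collections import defaultdict
    import gym
    import numpy as np

    BINS = [np.linspace(-2.4, 2.4, 9),      # cart position
            np.linspace(-3.0, 3.0, 9),      # cart velocity
            np.linspace(-0.21, 0.21, 9),    # pole angle (rad)
            np.linspace(-3.5, 3.5, 9)]      # pole angular velocity

    def discretize(obs):
        return tuple(int(np.digitize(o, b)) for o, b in zip(obs, BINS))

    def sarsa(episodes=2000, alpha=0.1, gamma=0.99, epsilon=0.1):
        env = gym.make("CartPole-v0")       # assumes the classic 4-tuple step() API
        Q = defaultdict(float)

        def policy(s):
            if random.random() < epsilon:
                return env.action_space.sample()
            return max((0, 1), key=lambda a: Q[(s, a)])

        for _ in range(episodes):
            s = discretize(env.reset())
            a = policy(s)
            done = False
            while not done:
                obs, r, done, _ = env.step(a)             # action 0/1 -> force of -10 N / +10 N
                s2 = discretize(obs)
                a2 = policy(s2)
                target = r + (0.0 if done else gamma * Q[(s2, a2)])
                Q[(s, a)] += alpha * (target - Q[(s, a)])  # on-policy SARSA update
                s, a = s2, a2
        return Q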

Book chapters on the topic "RL ALGORITHMS"

1

Ahlawat, Samit. "Recent RL Algorithms." In Reinforcement Learning for Finance, 349–402. Berkeley, CA: Apress, 2022. http://dx.doi.org/10.1007/978-1-4842-8835-1_6.

2

Nandy, Abhishek, and Manisha Biswas. "RL Theory and Algorithms." In Reinforcement Learning, 19–69. Berkeley, CA: Apress, 2017. http://dx.doi.org/10.1007/978-1-4842-3285-9_2.

3

Hahn, Ernst Moritz, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. "Mungojerrie: Linear-Time Objectives in Model-Free Reinforcement Learning." In Tools and Algorithms for the Construction and Analysis of Systems, 527–45. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-30823-9_27.

Abstract:
Mungojerrie is an extensible tool that provides a framework to translate linear-time objectives into reward for reinforcement learning (RL). The tool provides convergent RL algorithms for stochastic games, reference implementations of existing reward translations for ω-regular objectives, and an internal probabilistic model checker for ω-regular objectives. This functionality is modular and operates on shared data structures, which enables fast development of new translation techniques. Mungojerrie supports finite models specified in PRISM and ω-automata specified in the HOA format, with an integrated command line interface to external linear temporal logic translators. Mungojerrie is distributed with a set of benchmarks for ω-regular objectives in RL.
4

Ramponi, Giorgia. "Learning in the Presence of Multiple Agents." In Special Topics in Information Technology, 93–103. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-15374-7_8.

Abstract:
Reinforcement Learning (RL) has emerged as a powerful tool to solve sequential decision-making problems, where a learning agent interacts with an unknown environment in order to maximize its rewards. Although most RL real-world applications involve multiple agents, the Multi-Agent Reinforcement Learning (MARL) framework is still poorly understood from a theoretical point of view. In this manuscript, we take a step toward solving this problem, providing theoretically sound algorithms for three RL sub-problems with multiple agents: Inverse Reinforcement Learning (IRL), online learning in MARL, and policy optimization in MARL. We start by considering the IRL problem, providing novel algorithms in two different settings: the first considers how to recover and cluster the intentions of a set of agents given demonstrations of near-optimal behavior; the second aims at inferring the reward function optimized by an agent while observing its actual learning process. Then, we consider online learning in MARL. We showed how the presence of other agents can increase the hardness of the problem while proposing statistically efficient algorithms in two settings: Non-cooperative Configurable Markov Decision Processes and Turn-based Markov Games. As the third sub-problem, we study MARL from an optimization viewpoint, showing the difficulties that arise from multiple function optimization problems and providing a novel algorithm for this scenario.
5

Metelli, Alberto Maria. "Configurable Environments in Reinforcement Learning: An Overview." In Special Topics in Information Technology, 101–13. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-85918-3_9.

Abstract:
Reinforcement Learning (RL) has emerged as an effective approach to address a variety of complex control tasks. In a typical RL problem, an agent interacts with the environment by perceiving observations and performing actions, with the ultimate goal of maximizing the cumulative reward. In the traditional formulation, the environment is assumed to be a fixed entity that cannot be externally controlled. However, there exist several real-world scenarios in which the environment offers the opportunity to configure some of its parameters, with diverse effects on the agent’s learning process. In this contribution, we provide an overview of the main aspects of environment configurability. We start by introducing the formalism of the Configurable Markov Decision Processes (Conf-MDPs) and we illustrate the solution concepts. Then, we revise the algorithms for solving the learning problem in Conf-MDPs. Finally, we present two applications of Conf-MDPs: policy space identification and control frequency adaptation.
6

Gros, Timo P., Holger Hermanns, Jörg Hoffmann, Michaela Klauck, Maximilian A. Köhl, and Verena Wolf. "MoGym: Using Formal Models for Training and Verifying Decision-making Agents." In Computer Aided Verification, 430–43. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-13188-2_21.

Abstract:
MoGym is an integrated toolbox enabling the training and verification of machine-learned decision-making agents based on formal models, for the purpose of sound use in the real world. Given a formal representation of a decision-making problem in the JANI format and a reach-avoid objective, MoGym (a) enables training a decision-making agent with respect to that objective directly on the model using reinforcement learning (RL) techniques, and (b) it supports rigorous assessment of the quality of the induced decision-making agent by means of deep statistical model checking (DSMC). MoGym implements the standard interface for training environments established by OpenAI Gym, thereby connecting to the vast body of existing work in the RL community. In return, it makes accessible the large set of existing JANI model checking benchmarks to machine learning research. It thereby contributes an efficient feedback mechanism for improving in particular reinforcement learning algorithms. The connective part is implemented on top of Momba. For the DSMC quality assurance of the learned decision-making agents, a variant of the statistical model checker modes of the Modest Toolset is leveraged, which has been extended by two new resolution strategies for non-determinism when encountered during statistical evaluation.
7

Du, Huaiyu, and Rafał Jóźwiak. "Representation of Observations in Reinforcement Learning for Playing Arcade Fighting Game." In Digital Interaction and Machine Intelligence, 45–55. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37649-8_5.

Abstract:
Reinforcement learning (RL) is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. Reinforcement learning algorithms have become very popular in simple computer games and games like chess and GO. However, playing classical arcade fighting games would be challenging because of the complexity of the command system (the character makes moves according to the sequence of input) and combo system. In this paper, a creation of a game environment of The King of Fighters ’97 (KOF ’97), which implements the open gym env interface, is described. Based on the characteristics of the game, an innovative approach to represent the observations from the last few steps has been proposed, which guarantees the preservation of Markov’s property. The observations are coded using the “one-hot encoding” technique to form a binary vector, while the sequence of stacked vectors from successive steps creates a binary image. This image encodes the character’s input and behavioural pattern, which are then retrieved and recognized by the CNN network. A network structure based on the Advantage Actor-Critic network was proposed. In the experimental verification, the RL agent performing basic combos and complex moves (including the so-called “desperation moves”) was able to defeat characters using the highest level of AI built into the game.
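
The observation representation described above (one-hot encoded steps stacked into a binary image) can be illustrated in a few lines of NumPy. The field names and vocabulary sizes below are hypothetical, not those of the KOF '97 environment.

    import numpy as np

    # Assumed (illustrative) sizes: 16 possible controller inputs, 32 discretized
    # positions, and 8 character states per step.
    FIELD_SIZES = {"input": 16, "position": 32, "char_state": 8}

    def encode_step(step):
        """One-hot encode a single step's observation dict into a binary vector."""
        parts = []
        for name, size in FIELD_SIZES.items():
            v = np.zeros(size, dtype=np.uint8)
            v[step[name]] = 1
            parts.append(v)
        return np.concatenate(parts)                 # length = sum of field sizes (56 here)

    def stack_history(history):
        """Stack the last N encoded steps into a binary 'image' for a CNN:
        rows = time steps, columns = one-hot features."""
        return np.stack([encode_step(s) for s in history], axis=0)

    # Example: the 8 most recent steps form an 8 x 56 binary image
    history = [{"input": 3, "position": 10, "char_state": 1} for _ in range(8)]
    print(stack_history(history).shape)              # (8, 56)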
8

Bugaenko, Andrey A. "Replacing the Reinforcement Learning (RL) to the Auto Reinforcement Learning (AutoRL) Algorithms to Find the Optimal Structure of Business Processes in the Bank." In Software Engineering Application in Informatics, 15–22. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-90318-3_2.

9

Wang, Dasong, and Roland Snooks. "Artificial Intuitions of Generative Design: An Approach Based on Reinforcement Learning." In Proceedings of the 2020 DigitalFUTURES, 189–98. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-33-4400-6_18.

Abstract:
This paper proposes a Reinforcement Learning (RL) based design approach that augments existing algorithmic generative processes through the emergence of a form of artificial design intuition. The research presented in the paper is embedded within a highly speculative research project, Artificial Agency, exploring the operation of Machine Learning (ML) in generative design and digital fabrication. After describing the inherent limitations of contemporary generative design processes, the paper compares the three fundamental types of machine learning frameworks in terms of their characteristics and potential impact on generative design. A theoretical framework is defined to demonstrate the methodology of integrating RL with existing generative design procedures, which is further explained with a Random Walk based experimental design example. The paper includes detailed RL definitions as well as critical reflections on its impact and the effects of its implementation. The proposed artificial intuition within this generative approach is currently being further developed through a series of ongoing and proposed research trajectories noted in the conclusion. The ambition of this research is to deepen the integration of intention with machine learning in generative design.
10

Zhang, Sizhe, Haitao Wang, Jian Wen, and Hejun Wu. "A Deep RL Algorithm for Location Optimization of Regional Express Distribution Center Using IoT Data." In Lecture Notes in Electrical Engineering, 377–84. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-0416-7_38.


Conference papers on the topic "RL ALGORITHMS"

1

Simão, Thiago D. "Safe and Sample-Efficient Reinforcement Learning Algorithms for Factored Environments." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/919.

Abstract:
Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) where the transition function is unknown. In situations where an arbitrary policy pi is already in execution and the experiences with the environment were recorded in a batch D, an RL algorithm can use D to compute a new policy pi'. However, the policy computed by traditional RL algorithms might have worse performance compared to pi. Our goal is to develop safe RL algorithms, where the agent has a high confidence that the performance of pi' is better than the performance of pi given D. To develop sample-efficient and safe RL algorithms we combine ideas from exploration strategies in RL with a safe policy improvement method.
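
The "high confidence that pi' beats pi" test at the heart of safe batch RL is often phrased as an off-policy estimate plus a concentration bound. The sketch below shows one generic way to write such a check, using clipped per-trajectory importance sampling and a Hoeffding-style lower bound; it is illustrative only and not the specific method of this abstract.

    import math

    def accept_new_policy(episodes, pi_new, pi_old, baseline, delta=0.05, clip=10.0):
        """Accept pi_new only if a (1 - delta) lower confidence bound on its
        importance-sampled return exceeds the baseline return of pi_old.
        episodes: list of trajectories [(s, a, r), ...] collected under pi_old.
        pi_new(a, s) and pi_old(a, s) return action probabilities (assumed interfaces)."""
        estimates = []
        for traj in episodes:
            weight, ret = 1.0, 0.0
            for s, a, r in traj:
                weight *= pi_new(a, s) / pi_old(a, s)      # per-step importance ratio
                ret += r
            estimates.append(min(weight, clip) * ret)      # clip weights to curb variance
        n = len(estimates)
        mean = sum(estimates) / n
        value_range = max(abs(e) for e in estimates) + 1e-8   # crude, assumed range bound
        lower = mean - value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
        return lower >= baseline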
2

Chrabąszcz, Patryk, Ilya Loshchilov, and Frank Hutter. "Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/197.

Abstract:
Evolution Strategies (ES) have recently been demonstrated to be a viable alternative to reinforcement learning (RL) algorithms on a set of challenging deep learning problems, including Atari games and MuJoCo humanoid locomotion benchmarks. While the ES algorithms in that work belonged to the specialized class of natural evolution strategies (which resemble approximate gradient RL algorithms, such as REINFORCE), we demonstrate that even a very basic canonical ES algorithm can achieve the same or even better performance. This success of a basic ES algorithm suggests that the state-of-the-art can be advanced further by integrating the many advances made in the field of ES in the last decades. We also demonstrate that ES algorithms have very different performance characteristics than traditional RL algorithms: on some games, they learn to exploit the environment and perform much better while on others they can get stuck in suboptimal local minima. Combining their strengths and weaknesses with those of traditional RL algorithms is therefore likely to lead to new advances in the state-of-the-art for solving RL problems.
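
For reference, the "very basic canonical ES" benchmarked above is essentially a (mu, lambda) evolution strategy with rank-weighted recombination. A compact, generic sketch (with an arbitrary fitness function rather than an Atari policy) might look like this:

    import numpy as np

    def canonical_es(fitness, dim, iterations=100, lam=20, mu=5, sigma=0.1, seed=0):
        """Canonical (mu, lambda)-ES: sample lam perturbations of the mean, keep the
        best mu, and move the mean by their rank-weighted average. fitness maps a
        parameter vector to a score (higher is better)."""
        rng = np.random.default_rng(seed)
        theta = np.zeros(dim)
        weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
        weights /= weights.sum()                          # rank-based recombination weights
        for _ in range(iterations):
            noise = rng.standard_normal((lam, dim))
            candidates = theta + sigma * noise
            scores = np.array([fitness(c) for c in candidates])
            top = np.argsort(scores)[::-1][:mu]           # indices of the mu best offspring
            theta = theta + sigma * weights @ noise[top]  # weighted step toward the elites
        return theta

    # Usage example on a toy objective (optimum at the origin)
    best = canonical_es(lambda x: -np.sum(x ** 2), dim=10)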
3

Arusoaie, Andrei, David Nowak, Vlad Rusu, and Dorel Lucanu. "A Certified Procedure for RL Verification." In 2017 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). IEEE, 2017. http://dx.doi.org/10.1109/synasc.2017.00031.

4

Gajane, Pratik, Peter Auer, and Ronald Ortner. "Autonomous Exploration for Navigating in MDPs Using Blackbox RL Algorithms." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/413.

Abstract:
We consider the problem of navigating in a Markov decision process where extrinsic rewards are either absent or ignored. In this setting, the objective is to learn policies to reach all the states that are reachable within a given number of steps (in expectation) from a starting state. We introduce a novel meta-algorithm which can use any online reinforcement learning algorithm (with appropriate regret guarantees) as a black-box. Our algorithm demonstrates a method for transforming the output of online algorithms to a batch setting. We prove an upper bound on the sample complexity of our algorithm in terms of the regret bound of the used black-box RL algorithm. Furthermore, we provide experimental results to validate the effectiveness of our algorithm and correctness of our theoretical results.
5

Lin, Zichuan, Tianqi Zhao, Guangwen Yang, and Lintao Zhang. "Episodic Memory Deep Q-Networks." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/337.

Abstract:
Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN). Despite the success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interactions with the environments to obtain satisfactory performances. Recently, episodic memory based RL has attracted attention due to its ability to latch on good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to supervise an agent during training. Experiments show that our proposed method leads to better sample efficiency and is more likely to find good policy. It only requires 1/5 of the interactions of DQN to achieve many state-of-the-art performances on Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms.
6

Martin, Jarryd, Suraj Narayanan S., Tom Everitt, and Marcus Hutter. "Count-Based Exploration in Feature Space for Reinforcement Learning." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/344.

Abstract:
We introduce a new count-based optimistic exploration algorithm for Reinforcement Learning (RL) that is feasible in environments with high-dimensional state-action spaces. The success of RL algorithms in these domains depends crucially on generalisation from limited training experience. Function approximation techniques enable RL agents to generalise in order to estimate the value of unvisited states, but at present few methods enable generalisation regarding uncertainty. This has prevented the combination of scalable RL algorithms with efficient exploration strategies that drive the agent to reduce its uncertainty. We present a new method for computing a generalised state visit-count, which allows the agent to estimate the uncertainty associated with any state. Our φ-pseudocount achieves generalisation by exploiting the same feature representation of the state space that is used for value function approximation. States that have less frequently observed features are deemed more uncertain. The φ-Exploration-Bonus algorithm rewards the agent for exploring in feature space rather than in the untransformed state space. The method is simpler and less computationally expensive than some previous proposals, and achieves near state-of-the-art results on high-dimensional RL benchmarks.
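
The sketch below illustrates the general idea of counting feature activations rather than raw states; the naive factored density model and bonus form are simplifications assumed for illustration, not the exact φ-pseudocount construction of the paper.

    import numpy as np
    from collections import defaultdict

    class FeatureCountBonus:
        """Track visit counts per (feature index, value) and award a bonus that
        shrinks as the features of a visited state become familiar."""
        def __init__(self, beta=0.05):
            self.beta = beta
            self.counts = defaultdict(float)
            self.total = 0.0

        def bonus(self, phi):
            """phi: binary feature vector of the visited state (assumed representation)."""
            self.total += 1.0
            for i, active in enumerate(phi):
                self.counts[(i, int(active))] += 1.0
            # Naive factored density: product of per-feature visit frequencies
            log_rho = sum(np.log(self.counts[(i, int(a))] / self.total)
                          for i, a in enumerate(phi))
            pseudo_count = self.total * np.exp(log_rho)   # crude pseudo-count ~ n * rho(s)
            return self.beta / np.sqrt(pseudo_count + 0.01)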
7

Da Silva, Felipe Leno, and Anna Helena Reali Costa. "Methods and Algorithms for Knowledge Reuse in Multiagent Reinforcement Learning." In Concurso de Teses e Dissertações da SBC. Sociedade Brasileira de Computação - SBC, 2020. http://dx.doi.org/10.5753/ctd.2020.11360.

Abstract:
Reinforcement Learning (RL) is a powerful tool that has been used to solve increasingly complex tasks. RL operates through repeated interactions of the learning agent with the environment, via trial and error. However, this learning process is extremely slow, requiring many interactions. In this thesis, we leverage previous knowledge so as to accelerate learning in multiagent RL problems. We propose knowledge reuse both from previous tasks and from other agents. Several flexible methods are introduced so that each of these two types of knowledge reuse is possible. This thesis adds important steps towards more flexible and broadly applicable multiagent transfer learning methods.
8

Gao, Yang, Christian M. Meyer, Mohsen Mesgar, and Iryna Gurevych. "Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/326.

Abstract:
Document summarisation can be formulated as a sequential decision-making problem, which can be solved by Reinforcement Learning (RL) algorithms. The predominant RL paradigm for summarisation learns a cross-input policy, which requires considerable time, data and parameter tuning due to the huge search spaces and the delayed rewards. Learning input-specific RL policies is a more efficient alternative, but so far depends on handcrafted rewards, which are difficult to design and yield poor performance. We propose RELIS, a novel RL paradigm that learns a reward function with Learning-to-Rank (L2R) algorithms at training time and uses this reward function to train an input-specific RL policy at test time. We prove that RELIS guarantees to generate near-optimal summaries with appropriate L2R and RL algorithms. Empirically, we evaluate our approach on extractive multi-document summarisation. We show that RELIS reduces the training time by two orders of magnitude compared to the state-of-the-art models while performing on par with them.
9

Zhao, Enmin, Shihong Deng, Yifan Zang, Yongxin Kang, Kai Li, and Junliang Xing. "Potential Driven Reinforcement Learning for Hard Exploration Tasks." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/290.

Abstract:
Experience replay plays a crucial role in Reinforcement Learning (RL), enabling the agent to remember and reuse experience from the past. Most previous methods sample experience transitions using simple heuristics like uniformly sampling or prioritizing those good ones. Since humans can learn from both good and bad experiences, more sophisticated experience replay algorithms need to be developed. Inspired by the potential energy in physics, this work introduces the artificial potential field into experience replay and develops Potentialized Experience Replay (PotER) as a new and effective sampling algorithm for RL in hard exploration tasks with sparse rewards. PotER defines a potential energy function for each state in experience replay and helps the agent to learn from both good and bad experiences using intrinsic state supervision. PotER can be combined with different RL algorithms as well as the self-imitation learning algorithm. Experimental analyses and comparisons on multiple challenging hard exploration environments have verified its effectiveness and efficiency.
10

Sarafian, Elad, Aviv Tamar, and Sarit Kraus. "Constrained Policy Improvement for Efficient Reinforcement Learning." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/396.

Abstract:
We propose a policy improvement algorithm for Reinforcement Learning (RL) termed Rerouted Behavior Improvement (RBI). RBI is designed to take into account the evaluation errors of the Q-function. Such errors are common in RL when learning the Q-value from finite experience data. Greedy policies or even constrained policy optimization algorithms that ignore these errors may suffer from an improvement penalty (i.e., a policy impairment). To reduce the penalty, the idea of RBI is to attenuate rapid policy changes to actions that were rarely sampled. This approach is shown to avoid catastrophic performance degradation and reduce regret when learning from a batch of transition samples. Through a two-armed bandit example, we show that it also increases data efficiency when the optimal action has a high variance. We evaluate RBI in two tasks in the Atari Learning Environment: (1) learning from observations of multiple behavior policies and (2) iterative RL. Our results demonstrate the advantage of RBI over greedy policies and other constrained policy optimization algorithms both in learning from observations and in RL tasks.

Reports on the topic "RL ALGORITHMS"

1

A Decision-Making Method for Connected Autonomous Driving Based on Reinforcement Learning. SAE International, December 2020. http://dx.doi.org/10.4271/2020-01-5154.

Abstract:
At present, with the development of Intelligent Vehicle Infrastructure Cooperative Systems (IVICS), decision-making for automated vehicles based on connected environment conditions has attracted more attention. Reliability, efficiency, and generalization performance are the basic requirements for a vehicle decision-making system. Therefore, this paper proposes a decision-making method for connected autonomous driving based on the Wasserstein Generative Adversarial Nets-Deep Deterministic Policy Gradient (WGAIL-DDPG) algorithm, in which the key component of the reinforcement learning (RL) model, the reward function, is designed from the aspect of vehicle serviceability, covering safety, ride comfort, and handling stability. To reduce the complexity of the proposed model, an imitation learning strategy is introduced to improve the RL training process. Meanwhile, a model training strategy based on cloud computing effectively solves the problem of insufficient computing resources of the vehicle-mounted system. Test results show that the proposed method improves the efficiency of the RL training process, delivers reliable decision-making performance, and exhibits excellent generalization capability.
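
The reward design described above combines safety, ride comfort, and handling stability into one scalar. A generic weighted-sum sketch is shown below; the term definitions, thresholds, and weights are hypothetical placeholders, not the WGAIL-DDPG paper's actual reward function.

    def driving_reward(state, w_safety=1.0, w_comfort=0.3, w_stability=0.3):
        """Illustrative scalar reward for a connected-driving agent. `state` is assumed
        to expose time-to-collision (s), longitudinal jerk (m/s^3), and yaw-rate error
        (rad/s); all field names are placeholders."""
        safety = -1.0 if state["time_to_collision"] < 2.0 else 0.0   # penalize near-collisions
        comfort = -abs(state["jerk"])                                # smoother is better
        stability = -abs(state["yaw_rate_error"])                    # track the intended yaw rate
        return w_safety * safety + w_comfort * comfort + w_stability * stability

    # Example call with a hypothetical state snapshot
    r = driving_reward({"time_to_collision": 1.5, "jerk": 0.8, "yaw_rate_error": 0.05})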