Journal articles on the topic "RL ALGORITHMS"


Consult the top 50 journal articles for your research on the topic "RL ALGORITHMS".


1

Lahande, Prathamesh, Parag Kaveri, and Jatinderkumar Saini. "Reinforcement Learning for Reducing the Interruptions and Increasing Fault Tolerance in the Cloud Environment." Informatics 10, no. 3 (August 2, 2023): 64. http://dx.doi.org/10.3390/informatics10030064.

Abstract:
Cloud computing delivers robust computational services by processing tasks on its virtual machines (VMs) using resource-scheduling algorithms. The cloud’s existing algorithms provide limited results due to inappropriate resource scheduling. Additionally, these algorithms cannot process tasks generating faults while being computed. The primary reason for this is that these existing algorithms need an intelligence mechanism to enhance their abilities. To provide an intelligence mechanism to improve the resource-scheduling process and provision the fault-tolerance mechanism, an algorithm named reinforcement learning-shortest job first (RL-SJF) has been implemented by integrating the RL technique with the existing SJF algorithm. An experiment was conducted in a simulation platform to compare the working of RL-SJF with SJF, and challenging tasks were computed in multiple scenarios. The experimental results convey that the RL-SJF algorithm enhances the resource-scheduling process by improving the aggregate cost by 14.88% compared to the SJF algorithm. Additionally, the RL-SJF algorithm provided a fault-tolerance mechanism by computing 55.52% of the total tasks compared to 11.11% of the SJF algorithm. Thus, the RL-SJF algorithm improves the overall cloud performance and provides the ideal quality of service (QoS).
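For readers who want to see the flavour of such an approach, the sketch below shows a toy tabular Q-learning layer choosing a VM for the next (shortest) task; the state encoding, reward, and cost model are illustrative assumptions, not the RL-SJF design from the paper.

```python
import random
from collections import defaultdict

# Toy sketch: a tabular Q-learning layer that picks a VM for the next (shortest)
# task. States, rewards, and the cost model are illustrative, not RL-SJF's design.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
q_table = defaultdict(float)                      # (state, vm) -> estimated value

def choose_vm(state, vms):
    if random.random() < EPSILON:
        return random.choice(vms)                 # explore
    return max(vms, key=lambda vm: q_table[(state, vm)])  # exploit

def update(state, vm, reward, next_state, vms):
    best_next = max(q_table[(next_state, v)] for v in vms)
    q_table[(state, vm)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, vm)])

vms = [0, 1, 2]
for length in sorted([5.0, 2.0, 9.0]):            # SJF ordering by task length
    vm = choose_vm("idle", vms)
    reward = -length / (vm + 1)                   # hypothetical cost signal
    update("idle", vm, reward, "idle", vms)
```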
2

Trella, Anna L., Kelly W. Zhang, Inbal Nahum-Shani, Vivek Shetty, Finale Doshi-Velez, and Susan A. Murphy. "Designing Reinforcement Learning Algorithms for Digital Interventions: Pre-Implementation Guidelines." Algorithms 15, no. 8 (July 22, 2022): 255. http://dx.doi.org/10.3390/a15080255.

Abstract:
Online reinforcement learning (RL) algorithms are increasingly used to personalize digital interventions in the fields of mobile health and online education. Common challenges in designing and testing an RL algorithm in these settings include ensuring the RL algorithm can learn and run stably under real-time constraints, and accounting for the complexity of the environment, e.g., a lack of accurate mechanistic models for the user dynamics. To guide how one can tackle these challenges, we extend the PCS (predictability, computability, stability) framework, a data science framework that incorporates best practices from machine learning and statistics in supervised learning to the design of RL algorithms for the digital interventions setting. Furthermore, we provide guidelines on how to design simulation environments, a crucial tool for evaluating RL candidate algorithms using the PCS framework. We show how we used the PCS framework to design an RL algorithm for Oralytics, a mobile health study aiming to improve users’ tooth-brushing behaviors through the personalized delivery of intervention messages. Oralytics will go into the field in late 2022.
3

Rodríguez Sánchez, Francisco, Ildeberto Santos-Ruiz, Joaquín Domínguez-Zenteno, and Francisco Ronay López-Estrada. "Control Applications Using Reinforcement Learning: An Overview." Memorias del Congreso Nacional de Control Automático 5, no. 1 (October 17, 2022): 67–72. http://dx.doi.org/10.58571/cnca.amca.2022.019.

Abstract:
This article presents the general formulation and terminology of reinforcement learning (RL) from the perspective of Bellman's equations based on a reward function, along with its learning methods and algorithms. The key step in RL is the calculation of state-value and state-action value functions, which are used to find, compare, and improve policies for the learning agent through value-based and policy-based methods such as Q-learning. The deep deterministic policy gradient (DDPG) learning algorithm, based on an actor-critic structure, is also described as one way of training the RL agent. RL algorithms can be used to design closed-loop controllers. Using the DDPG algorithm, an inverted-pendulum application is demonstrated in simulation, showing that training is completed in a reasonable time and illustrating the role and importance of RL algorithms as tools that, combined with control, can address this type of problem.
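As a concrete reminder of the state-action value update the abstract refers to, here is a minimal tabular Q-learning step (a generic sketch, not the DDPG actor-critic scheme used in the paper's pendulum example):

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One Bellman backup on the state-action value table Q[(state, action)]."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)   # temporal-difference error
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error

# toy usage on a two-action problem
Q = {}
q_learning_update(Q, s="upright", a="push_left", r=1.0, s_next="tilted",
                  actions=["push_left", "push_right"])
```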
4

Abbass, Mahmoud Abdelkader Bashery, and Hyun-Soo Kang. "Drone Elevation Control Based on Python-Unity Integrated Framework for Reinforcement Learning Applications." Drones 7, no. 4 (March 24, 2023): 225. http://dx.doi.org/10.3390/drones7040225.

Abstract:
Reinforcement learning (RL) applications require a huge effort to become established in real-world environments, due to the injury and breakdown risks during interactions between the RL agent and the environment in the online training process. In addition, the RL platform tools (e.g., Python OpenAI Gym, Unity ML-Agents, PyBullet, DART, MuJoCo, RaiSim, Isaac, and AirSim) that are required to reduce the real-world challenges suffer from drawbacks (e.g., a limited number of examples and applications, and difficulties in implementing RL algorithms due to the programming language). This paper presents an integrated RL framework, based on Python–Unity interaction, to demonstrate the ability to create a new RL platform tool, based on a stable user datagram protocol (UDP) communication between the RL agent algorithm (developed using the Python programming language as a server) and the simulation environment (created using the Unity simulation software as a client). This Python–Unity integration increases the advantages of the overall RL platform (i.e., flexibility, scalability, and robustness), with the ability to create different environment specifications. The challenge of implementing and developing RL algorithms is also addressed. The proposed framework is validated by applying two popular deep RL algorithms, Vanilla Policy Gradient (VPG) and Actor-Critic (A2C), to an elevation control challenge for a quadcopter drone. The validation results for these experimental tests demonstrate the usefulness of the proposed framework for RL applications, because both implemented algorithms achieve high stability by converging to the required performance through the semi-online training process.
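The Python-to-Unity exchange described can be pictured with a rough UDP loop like the one below; the port number, JSON message schema, and field names are assumptions for illustration rather than the paper's exact protocol.

```python
import json
import socket

HOST, PORT = "127.0.0.1", 5005    # illustrative endpoint, not the paper's configuration

def run_server(policy):
    """Receive observations from the Unity client over UDP and reply with actions."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((HOST, PORT))
    while True:
        data, client_addr = sock.recvfrom(4096)               # blocking receive from Unity
        obs = json.loads(data.decode("utf-8"))                # e.g. {"altitude": 1.2, "velocity": -0.3}
        action = policy(obs)                                  # RL agent chooses the thrust command
        sock.sendto(json.dumps({"action": action}).encode("utf-8"), client_addr)

# toy policy: proportional correction toward a 1.5 m target altitude
if __name__ == "__main__":
    run_server(lambda obs: 0.5 * (1.5 - obs["altitude"]))
```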
5

Mann, Timothy, and Yoonsuck Choe. "Scaling Up Reinforcement Learning through Targeted Exploration." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (August 4, 2011): 435–40. http://dx.doi.org/10.1609/aaai.v25i1.7929.

Abstract:
Recent Reinforcement Learning (RL) algorithms, such as R-MAX, make (with high probability) only a small number of poor decisions. In practice, these algorithms do not scale well as the number of states grows because the algorithms spend too much effort exploring. We introduce an RL algorithm State TArgeted R-MAX (STAR-MAX) that explores a subset of the state space, called the exploration envelope ξ. When ξ equals the total state space, STAR-MAX behaves identically to R-MAX. When ξ is a subset of the state space, to keep exploration within ξ, a recovery rule β is needed. We compared existing algorithms with our algorithm employing various exploration envelopes. With an appropriate choice of ξ, STAR-MAX scales far better than existing RL algorithms as the number of states increases. A possible drawback of our algorithm is its dependence on a good choice of ξ and β. However, we show that an effective recovery rule β can be learned on-line and ξ can be learned from demonstrations. We also find that even randomly sampled exploration envelopes can improve cumulative rewards compared to R-MAX. We expect these results to lead to more efficient methods for RL in large-scale problems.
6

Cheng, Richard, Gábor Orosz, Richard M. Murray, and Joel W. Burdick. "End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3387–95. http://dx.doi.org/10.1609/aaai.v33i01.33013387.

Abstract:
Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
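A heavily simplified sketch of the "filter the RL action through a barrier condition" pattern is shown below; the real RL-CBF controller solves a small optimization problem against GP-modelled dynamics, whereas this toy version only screens a discrete set of candidate actions under assumed names.

```python
def safety_filter(state, rl_action, barrier, dynamics, candidates):
    """Pick the candidate action closest to the RL action whose predicted next
    state keeps the barrier function h(x) >= 0 (illustrative, not the paper's QP)."""
    safe = [a for a in candidates if barrier(dynamics(state, a)) >= 0.0]
    if not safe:                      # no certified-safe action: fall back to the least unsafe one
        return max(candidates, key=lambda a: barrier(dynamics(state, a)))
    return min(safe, key=lambda a: abs(a - rl_action))

# toy example: keep a 1-D position below 1.0
h = lambda x: 1.0 - x                             # barrier: positive while x < 1.0
f = lambda x, a: x + 0.1 * a                      # simple assumed dynamics
print(safety_filter(0.95, rl_action=2.0, barrier=h, dynamics=f,
                    candidates=[-1.0, -0.5, 0.0, 0.5, 1.0, 2.0]))   # -> 0.5
```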
7

Kirsch, Louis, Sebastian Flennerhag, Hado van Hasselt, Abram Friesen, Junhyuk Oh, and Yutian Chen. "Introducing Symmetries to Black Box Meta Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7202–10. http://dx.doi.org/10.1609/aaai.v36i7.20681.

Abstract:
Meta reinforcement learning (RL) attempts to discover new RL algorithms automatically from environment interaction. In so-called black-box approaches, the policy and the learning algorithm are jointly represented by a single neural network. These methods are very flexible, but they tend to underperform compared to human-engineered RL algorithms in terms of generalisation to new, unseen environments. In this paper, we explore the role of symmetries in meta-generalisation. We show that a recent successful meta RL approach that meta-learns an objective for backpropagation-based learning exhibits certain symmetries (specifically the reuse of the learning rule, and invariance to input and output permutations) that are not present in typical black-box meta RL systems. We hypothesise that these symmetries can play an important role in meta-generalisation. Building off recent work in black-box supervised meta learning, we develop a black-box meta RL system that exhibits these same symmetries. We show through careful experimentation that incorporating these symmetries can lead to algorithms with a greater ability to generalise to unseen action & observation spaces, tasks, and environments.
8

Kim, Hyun-Su, and Uksun Kim. "Development of a Control Algorithm for a Semi-Active Mid-Story Isolation System Using Reinforcement Learning." Applied Sciences 13, no. 4 (February 4, 2023): 2053. http://dx.doi.org/10.3390/app13042053.

Abstract:
The semi-active control system is widely used to reduce the seismic response of building structures. Its control performance mainly depends on the applied control algorithms. Various semi-active control algorithms have been developed to date. Recently, machine learning has been applied to various engineering fields and provided successful results. Because reinforcement learning (RL) has shown good performance for real-time decision-making problems, structural control engineers have become interested in RL. In this study, RL was applied to the development of a semi-active control algorithm. Among various RL methods, a Deep Q-network (DQN) was selected because of its successful application to many control problems. A sample building structure was constructed by using a semi-active mid-story isolation system (SMIS) with a magnetorheological damper. Artificial ground motions were generated for numerical simulation. In this study, the sample building structure and seismic excitation were used to make the RL environment. The reward of RL was designed to reduce the peak story drift and the isolation story drift. Skyhook and groundhook control algorithms were applied for comparative study. Based on numerical results, this paper shows that the proposed control algorithm can effectively reduce the seismic responses of building structures with a SMIS.
9

Prakash, Kritika, Fiza Husain, Praveen Paruchuri, and Sujit Gujar. "How Private Is Your RL Policy? An Inverse RL Based Analysis Framework." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 8009–16. http://dx.doi.org/10.1609/aaai.v36i7.20772.

Abstract:
Reinforcement Learning (RL) enables agents to learn how to perform various tasks from scratch. In domains like autonomous driving, recommendation systems, and more, optimal RL policies learned could cause a privacy breach if the policies memorize any part of the private reward. We study the set of existing differentially-private RL policies derived from various RL algorithms such as Value Iteration, Deep-Q Networks, and Vanilla Proximal Policy Optimization. We propose a new Privacy-Aware Inverse RL analysis framework (PRIL) that involves performing reward reconstruction as an adversarial attack on private policies that the agents may deploy. For this, we introduce the reward reconstruction attack, wherein we seek to reconstruct the original reward from a privacy-preserving policy using the Inverse RL algorithm. An adversary must do poorly at reconstructing the original reward function if the agent uses a tightly private policy. Using this framework, we empirically test the effectiveness of the privacy guarantee offered by the private algorithms on instances of the FrozenLake domain of varying complexities. Based on the analysis performed, we infer a gap between the current standard of privacy offered and the standard of privacy needed to protect reward functions in RL. We do so by quantifying the extent to which each private policy protects the reward function by measuring distances between the original and reconstructed rewards.
10

Niazi, Abdolkarim, Norizah Redzuan, Raja Ishak Raja Hamzah, and Sara Esfandiari. "Improvement on Supporting Machine Learning Algorithm for Solving Problem in Immediate Decision Making." Advanced Materials Research 566 (September 2012): 572–79. http://dx.doi.org/10.4028/www.scientific.net/amr.566.572.

Abstract:
In this paper, a new algorithm based on case-based reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of RL algorithms. RL algorithms are very useful for solving a wide variety of decision problems when models are not available and decisions must be made correctly in every state of the system, as in multi-agent systems, artificial control systems, robotics, tool condition monitoring, and so on. The proposed method investigates how to improve action selection in an RL algorithm: a combined model using case-based reasoning and a new optimized selection function is used to choose actions, which improves Q-learning-based algorithms. The algorithm was used to solve cooperative Markov games, one of the models of Markov-based multi-agent systems. The experimental results indicated that the proposed algorithms perform better than the existing algorithms in terms of speed and accuracy of reaching the optimal policy.
11

Mu, Tong, Georgios Theocharous, David Arbour, and Emma Brunskill. "Constraint Sampling Reinforcement Learning: Incorporating Expertise for Faster Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7841–49. http://dx.doi.org/10.1609/aaai.v36i7.20753.

Abstract:
Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance. To address this, we introduce a practical algorithm for incorporating human insight to speed learning. Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy. It takes in multiple potential policy constraints to maintain robustness to misspecification of individual constraints while leveraging helpful ones to learn quickly. Given a base RL learning algorithm (ex. UCRL, DQN, Rainbow) we propose an upper confidence with elimination scheme that leverages the relationship between the constraints, and their observed performance, to adaptively switch among them. We instantiate our algorithm with DQN-type algorithms and UCRL as base algorithms, and evaluate our algorithm in four environments, including three simulators based on real data: recommendations, educational activity sequencing, and HIV treatment sequencing. In all cases, CSRL learns a good policy faster than baselines.
12

Kołota, Jakub, and Turhan Can Kargin. "Comparison of Various Reinforcement Learning Environments in the Context of Continuum Robot Control." Applied Sciences 13, no. 16 (August 11, 2023): 9153. http://dx.doi.org/10.3390/app13169153.

Abstract:
Controlling flexible and continuously structured continuum robots is a challenging task in the field of robotics and control systems. This study explores the use of reinforcement learning (RL) algorithms in controlling a three-section planar continuum robot. The study aims to investigate the impact of various reward functions on the performance of the RL algorithm. The RL algorithm utilized in this study is the Deep Deterministic Policy Gradient (DDPG), which can be applied to both continuous-state and continuous-action problems. The study’s findings reveal that the design of the RL environment, including the selection of reward functions, significantly influences the performance of the RL algorithm. The study provides significant information on the design of RL environments for the control of continuum robots, which may be valuable to researchers and practitioners in the field of robotics and control systems.
13

Jang, Sun-Ho, Woo-Jin Ahn, Yu-Jin Kim, Hyung-Gil Hong, Dong-Sung Pae, and Myo-Taeg Lim. "Stable and Efficient Reinforcement Learning Method for Avoidance Driving of Unmanned Vehicles." Electronics 12, no. 18 (September 6, 2023): 3773. http://dx.doi.org/10.3390/electronics12183773.

Abstract:
Reinforcement learning (RL) has demonstrated considerable potential in solving challenges across various domains, notably in autonomous driving. Nevertheless, implementing RL in autonomous driving comes with its own set of difficulties, such as the overestimation phenomenon, extensive learning time, and sparse reward problems. Although solutions like hindsight experience replay (HER) have been proposed to alleviate these issues, the direct utilization of RL in autonomous vehicles remains constrained due to the intricate fusion of information and the possibility of system failures during the learning process. In this paper, we present a novel RL-based autonomous driving system technology that combines obstacle-dependent Gaussian (ODG) RL, soft actor-critic (SAC), and meta-learning algorithms. Our approach addresses key issues in RL, including the overestimation phenomenon and sparse reward problems, by incorporating prior knowledge derived from the ODG algorithm. With these solutions in place, the ultimate aim of this work is to improve the performance of reinforcement learning and develop a swift, stable, and robust learning method for implementing autonomous driving systems that can effectively adapt to various environments and overcome the constraints of direct RL utilization in autonomous vehicles. We evaluated our proposed algorithm on official F1 circuits, using high-fidelity racing simulations with complex dynamics. The results demonstrate exceptional performance, with our method achieving up to 89% faster learning speed compared to existing algorithms in these environments.
14

Peng, Zhiyong, Changlin Han, Yadong Liu, and Zongtan Zhou. "Weighted Policy Constraints for Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9435–43. http://dx.doi.org/10.1609/aaai.v37i8.26130.

Abstract:
Offline reinforcement learning (RL) aims to learn policy from the passively collected offline dataset. Applying existing RL methods on the static dataset straightforwardly will raise distribution shift, causing these unconstrained RL methods to fail. To cope with the distribution shift problem, a common practice in offline RL is to constrain the policy explicitly or implicitly close to behavioral policy. However, the available dataset usually contains sub-optimal or inferior actions, constraining the policy near all these actions will make the policy inevitably learn inferior behaviors, limiting the performance of the algorithm. Based on this observation, we propose a weighted policy constraints (wPC) method that only constrains the learned policy to desirable behaviors, making room for policy improvement on other parts. Our algorithm outperforms existing state-of-the-art offline RL algorithms on the D4RL offline gym datasets. Moreover, the proposed algorithm is simple to implement with few hyper-parameters, making the proposed wPC algorithm a robust offline RL method with low computational complexity.
15

Tessler, Chen, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, and Shie Mannor. "Reinforcement Learning for Datacenter Congestion Control." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 11 (June 28, 2022): 12615–21. http://dx.doi.org/10.1609/aaai.v36i11.21535.

Abstract:
We approach the task of network congestion control in datacenters using Reinforcement Learning (RL). Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. Evidently, the most popular recent deployments rely on rule-based heuristics that are tested on a predetermined set of benchmarks. Consequently, these heuristics do not generalize well to newly-seen scenarios. Contrarily, we devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We overcome challenges such as partial-observability, non-stationarity, and multi-objectiveness. We further propose a policy gradient algorithm that leverages the analytical structure of the reward function to approximate its derivative and improve stability. We show that these challenges prevent standard RL algorithms from operating within this domain. Our experiments, conducted on a realistic simulator that emulates communication networks' behavior, show that our method exhibits improved performance concurrently on the multiple considered metrics compared to the popular algorithms deployed today in real datacenters. Our algorithm is being productized to replace heuristics in some of the largest datacenters in the world.
16

Jiang, Ju, Mohamed S. Kamel, and Lei Chen. "Aggregation of Multiple Reinforcement Learning Algorithms." International Journal on Artificial Intelligence Tools 15, no. 05 (October 2006): 855–61. http://dx.doi.org/10.1142/s0218213006002990.

Abstract:
Reinforcement learning (RL) has been successfully used in many fields. With the increasing complexity of environments and tasks, it is difficult for a single learning algorithm to cope with complicated problems with high performance. This paper proposes a new multiple learning architecture, "Aggregated Multiple Reinforcement Learning System (AMRLS)", which aggregates different RL algorithms in each learning step to make more appropriate sequential decisions than those made by individual learning algorithms. This architecture was tested on a Cart-Pole system. The presented simulation results confirm our prediction and reveal that aggregation not only provides robustness and fault tolerance ability, but also produces more smooth learning curves and needs fewer learning steps than individual learning algorithms.
17

Chen, Feng, Chenghe Wang, Fuxiang Zhang, Hao Ding, Qiaoyong Zhong, Shiliang Pu, and Zongzhang Zhang. "Towards Deployment-Efficient and Collision-Free Multi-Agent Path Finding (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16182–83. http://dx.doi.org/10.1609/aaai.v37i13.26951.

Abstract:
Multi-agent pathfinding (MAPF) is essential to large-scale robotic coordination tasks. Planning-based algorithms show their advantages in collision avoidance while avoiding exponential growth in the number of agents. Reinforcement-learning (RL)-based algorithms can be deployed efficiently but cannot prevent collisions entirely due to the lack of hard constraints. This paper combines the merits of planning-based and RL-based MAPF methods to propose a deployment-efficient and collision-free MAPF algorithm. The experiments show the effectiveness of our approach.
18

Guo, Kun, and Qishan Zhang. "A Discrete Artificial Bee Colony Algorithm for the Reverse Logistics Location and Routing Problem." International Journal of Information Technology & Decision Making 16, no. 05 (September 2017): 1339–57. http://dx.doi.org/10.1142/s0219622014500126.

Abstract:
Reverse logistics (RL) emerges as a hot topic in both research and business with the increasing attention on the collection and recycling of the waste products. Since Location and Routing Problem (LRP) in RL is NP-complete, heuristic algorithms, especially those built upon swarm intelligence, are very popular in this research. In this paper, both Vehicle Routing Problem (RP) and Location Allocation Problem (LAP) of RL are considered as a whole. First, the features of LRP in RL are analyzed. Second, a mathematical model of the problem is developed. Then, a novel discrete artificial bee colony (ABC) algorithm with greedy adjustment is proposed. The experimental results show that the new algorithm can approach the optimal solutions efficiently and effectively.
19

Padakandla, Sindhu. "A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments." ACM Computing Surveys 54, no. 6 (July 2021): 1–25. http://dx.doi.org/10.1145/3459991.

Abstract:
Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing, and robotics. The real-world complications arising in these domains makes them difficult to solve with the basic assumptions underlying classical RL algorithms. RL agents in these applications often need to react and adapt to changing operating conditions. A significant part of research on single-agent RL techniques focuses on developing algorithms when the underlying assumption of stationary environment model is relaxed. This article provides a survey of RL methods developed for handling dynamically varying environment models. The goal of methods not limited by the stationarity assumption is to help autonomous agents adapt to varying operating conditions. This is possible either by minimizing the rewards lost during learning by RL agent or by finding a suitable policy for the RL agent that leads to efficient operation of the underlying system. A representative collection of these algorithms is discussed in detail in this work along with their categorization and their relative merits and demerits. Additionally, we also review works that are tailored to application domains. Finally, we discuss future enhancements for this field.
20

Gaon, Maor, and Ronen Brafman. "Reinforcement Learning with Non-Markovian Rewards." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 3980–87. http://dx.doi.org/10.1609/aaai.v34i04.5814.

Abstract:
The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is that the rewards depend on the last state and action only. Yet, many real-world rewards are non-Markovian. For example, a reward for bringing coffee only if requested earlier and not yet served, is non-Markovian if the state only records current requests and deliveries. Past work considered the problem of modeling and solving MDPs with non-Markovian rewards (NMR), but we know of no principled approaches for RL with NMR. Here, we address the problem of policy learning from experience with such rewards. We describe and evaluate empirically four combinations of the classical RL algorithm Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR. We also prove that some of these variants converge to an optimal policy in the limit.
21

Sun, Peiquan, Wengang Zhou, and Houqiang Li. "Attentive Experience Replay." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 5900–5907. http://dx.doi.org/10.1609/aaai.v34i04.6049.

Abstract:
Experience replay (ER) has become an important component of deep reinforcement learning (RL) algorithms. ER enables RL algorithms to reuse past experiences for the update of current policy. By reusing a previous state for training, the RL agent would learn more accurate value estimation and better decision on that state. However, as the policy is continually updated, some states in past experiences become rarely visited, and optimization over these states might not improve the overall performance of current policy. To tackle this issue, we propose a new replay strategy to prioritize the transitions that contain states frequently visited by current policy. We introduce Attentive Experience Replay (AER), a novel experience replay algorithm that samples transitions according to the similarities between their states and the agent's state. We couple AER with different off-policy algorithms and demonstrate that AER makes consistent improvements on the suite of OpenAI gym tasks.
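The core sampling idea can be sketched in a few lines: weight each stored transition by the similarity of its state to the agent's current state before drawing a batch. The buffer layout and similarity function below are illustrative assumptions, not the paper's exact scheme.

```python
import random

def attentive_sample(buffer, current_state, k, similarity):
    """Sample k transitions, favouring those whose stored state is similar to the
    agent's current state (simplified sketch of the attentive-replay idea)."""
    weights = [similarity(t["state"], current_state) for t in buffer]
    return random.choices(buffer, weights=weights, k=k)

# toy usage with 1-D states and an inverse-distance similarity
buffer = [{"state": s, "action": 0, "reward": 0.0, "next_state": s} for s in [0.1, 0.5, 2.0, 2.1]]
batch = attentive_sample(buffer, current_state=2.0, k=2,
                         similarity=lambda a, b: 1.0 / (1e-6 + abs(a - b)))
```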
22

Chen, Zaiwei. "A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms." ACM SIGMETRICS Performance Evaluation Review 50, no. 3 (December 30, 2022): 12–15. http://dx.doi.org/10.1145/3579342.3579346.

Abstract:
Reinforcement learning (RL) is a paradigm where an agent learns to accomplish tasks by interacting with the environment, similar to how humans learn. RL is therefore viewed as a promising approach to achieve artificial intelligence, as evidenced by the remarkable empirical successes. However, many RL algorithms are theoretically not well-understood, especially in the setting where function approximation and off-policy sampling are employed. My thesis [1] aims at developing a thorough theoretical understanding of the performance of various RL algorithms through finite-sample analysis. Since most RL algorithms are essentially stochastic approximation (SA) algorithms for solving variants of the Bellman equation, the first part of the thesis is dedicated to the analysis of general SA involving a contraction operator, and under Markovian noise. We develop a Lyapunov approach where we construct a novel Lyapunov function called the generalized Moreau envelope. The results on SA enable us to establish finite-sample bounds of various RL algorithms in the tabular setting (cf. Part II of the thesis) and when using function approximation (cf. Part III of the thesis), which in turn provide theoretical insights into several important problems in the RL community, such as the efficiency of bootstrapping, the bias-variance trade-off in off-policy learning, and the stability of off-policy control. The main body of this document provides an overview of the contributions of my thesis.
23

Yau, Kok-Lim Alvin, Geong-Sen Poh, Su Fong Chien, and Hasan A. A. Al-Rawi. "Application of Reinforcement Learning in Cognitive Radio Networks: Models and Algorithms." Scientific World Journal 2014 (2014): 1–23. http://dx.doi.org/10.1155/2014/209810.

Abstract:
Cognitive radio (CR) enables unlicensed users to exploit the underutilized spectrum in licensed spectrum whilst minimizing interference to licensed users. Reinforcement learning (RL), which is an artificial intelligence approach, has been applied to enable each unlicensed user to observe and carry out optimal actions for performance enhancement in a wide range of schemes in CR, such as dynamic channel selection and channel sensing. This paper presents new discussions of RL in the context of CR networks. It provides an extensive review on how most schemes have been approached using the traditional and enhanced RL algorithms through state, action, and reward representations. Examples of the enhancements on RL, which do not appear in the traditional RL approach, are rules and cooperative learning. This paper also reviews performance enhancements brought about by the RL algorithms and open issues. This paper aims to establish a foundation in order to spark new research interests in this area. Our discussion has been presented in a tutorial manner so that it is comprehensive to readers outside the specialty of RL and CR.
24

Tessler, Chen, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, and Shie Mannor. "Reinforcement Learning for Datacenter Congestion Control." ACM SIGMETRICS Performance Evaluation Review 49, no. 2 (January 17, 2022): 43–46. http://dx.doi.org/10.1145/3512798.3512815.

Abstract:
We approach the task of network congestion control in datacenters using Reinforcement Learning (RL). Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. Evidently, the most popular recent deployments rely on rule-based heuristics that are tested on a predetermined set of benchmarks. Consequently, these heuristics do not generalize well to newly-seen scenarios. Contrarily, we devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We overcome challenges such as partial-observability, nonstationarity, and multi-objectiveness. We further propose a policy gradient algorithm that leverages the analytical structure of the reward function to approximate its derivative and improve stability. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training. Our experiments, conducted on a realistic simulator that emulates communication networks' behavior, exhibit improved performance concurrently on the multiple considered metrics compared to the popular algorithms deployed today in real datacenters. Our algorithm is being productized to replace heuristics in some of the largest datacenters in the world.
25

Jin, Zengwang, Menglu Ma, Shuting Zhang, Yanyan Hu, Yanning Zhang, and Changyin Sun. "Secure State Estimation of Cyber-Physical System under Cyber Attacks: Q-Learning vs. SARSA." Electronics 11, no. 19 (October 1, 2022): 3161. http://dx.doi.org/10.3390/electronics11193161.

Abstract:
This paper proposes a reinforcement learning (RL) algorithm for the security problem of state estimation of cyber-physical system (CPS) under denial-of-service (DoS) attacks. The security of CPS will inevitably decline when faced with malicious cyber attacks. In order to analyze the impact of cyber attacks on CPS performance, a Kalman filter, as an adaptive state estimation technology, is combined with an RL method to evaluate the issue of system security, where estimation performance is adopted as an evaluation criterion. Then, the transition of estimation error covariance under a DoS attack is described as a Markov decision process, and the RL algorithm could be applied to resolve the optimal countermeasures. Meanwhile, the interactive combat between defender and attacker could be regarded as a two-player zero-sum game, where the Nash equilibrium policy exists but needs to be solved. Considering the energy constraints, the action selection of both sides will be restricted by setting certain cost functions. The proposed RL approach is designed from three different perspectives, including the defender, the attacker and the interactive game of two opposite sides. In addition, the framework of Q-learning and state–action–reward–state–action (SARSA) methods are investigated separately in this paper to analyze the influence of different RL algorithms. The results show that both algorithms obtain the corresponding optimal policy and the Nash equilibrium policy of the zero-sum interactive game. Through comparative analysis of two algorithms, it is verified that the differences between Q-Learning and SARSA could be applied effectively into the secure state estimation in CPS.
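The distinction the paper exploits is visible directly in the two update rules: Q-learning bootstraps from the greedy next action (off-policy), while SARSA bootstraps from the action actually taken next (on-policy). A minimal tabular sketch:

```python
def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    # off-policy: bootstrap from the best next action, regardless of what is actually taken
    target = r + gamma * max(Q.get((s_next, b), 0.0) for b in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

def sarsa_step(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    # on-policy: bootstrap from the action the current policy actually selects next
    target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
```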
26

Li, Shaodong, Xiaogang Yuan, and Jie Niu. "Robotic Peg-in-Hole Assembly Strategy Research Based on Reinforcement Learning Algorithm." Applied Sciences 12, no. 21 (November 3, 2022): 11149. http://dx.doi.org/10.3390/app122111149.

Abstract:
To improve the robotic assembly effects in unstructured environments, a reinforcement learning (RL) algorithm is introduced to realize a variable admittance control. In this article, the mechanisms of a peg-in-hole assembly task and admittance model are first analyzed to guide the control strategy and experimental parameters design. Then, the admittance parameter identification process is defined as the Markov decision process (MDP) problem and solved with the RL algorithm. Furthermore, a fuzzy reward system is established to evaluate the action–state value to solve the complex reward establishment problem, where the fuzzy reward includes a process reward and a failure punishment. Finally, four sets of experiments are carried out, including assembly experiments based on the position control, fuzzy control, and RL algorithm. The necessity of compliance control is demonstrated in the first experiment. The advantages of the proposed algorithms are validated by comparing them with different experimental results. Moreover, the generalization ability of the RL algorithm is tested in the last two experiments. The results indicate that the proposed RL algorithm effectively improves the robotic compliance assembly ability.
27

Pan, Yaozong, Jian Zhang, Chunhui Yuan, and Haitao Yang. "Supervised Reinforcement Learning via Value Function." Symmetry 11, no. 4 (April 24, 2019): 590. http://dx.doi.org/10.3390/sym11040590.

Abstract:
Using expert samples to improve the performance of reinforcement learning (RL) algorithms has become one of the focuses of research nowadays. However, in different application scenarios, it is hard to guarantee both the quantity and quality of expert samples, which prohibits the practical application and performance of such algorithms. In this paper, a novel RL decision optimization method is proposed. The proposed method is capable of reducing the dependence on expert samples via incorporating the decision-making evaluation mechanism. By introducing supervised learning (SL), our method optimizes the decision making of the RL algorithm by using demonstrations or expert samples. Experiments are conducted in Pendulum and Puckworld scenarios to test the proposed method, and we use representative algorithms such as deep Q-network (DQN) and Double DQN (DDQN) as benchmarks. The results demonstrate that the method adopted in this paper can effectively improve the decision-making performance of agents even when the expert samples are not available.
28

Kabanda, Gabriel, Colletor Tendeukai Chipfumbu, and Tinashe Chingoriwo. "A Reinforcement Learning Paradigm for Cybersecurity Education and Training." Oriental Journal of Computer Science and Technology 16, no. 01 (May 30, 2023): 12–45. http://dx.doi.org/10.13005/ojcst16.01.02.

Abstract:
Reinforcement learning (RL) is a type of ML, which involves learning from interactions with the environment to accomplish certain long-term objectives connected to the environmental condition. RL takes place when action sequences, observations, and rewards are used as inputs, and is hypothesis-based and goal-oriented. The key asynchronous RL algorithms are Asynchronous one-step Q learning, Asynchronous one-step SARSA, Asynchronous n-step Q-learning and Asynchronous Advantage Actor-Critic (A3C). The paper ascertains the Reinforcement Learning (RL) paradigm for cybersecurity education and training. The research was conducted using a largely positivism research philosophy, which focuses on quantitative approaches of determining the RL paradigm for cybersecurity education and training. The research design was an experiment that focused on implementing the RL Q-Learning and A3C algorithms using Python. The Asynchronous Advantage Actor-Critic (A3C) Algorithm is much faster, simpler, and scores higher on Deep Reinforcement Learning task. The research was descriptive, exploratory and explanatory in nature. A survey was conducted on the cybersecurity education and training as exemplified by Zimbabwean commercial banks. The study population encompassed employees and customers from five commercial banks in Zimbabwe, where the sample size was 370. Deep reinforcement learning (DRL) has been used to address a variety of issues in the Internet of Things. DRL heavily utilizes A3C algorithm with some Q-Learning, and this can be used to fight against intrusions into host computers or networks and fake data in IoT devices.
29

Yousif, Ayman Basheer, Hassan Jaleel Hassan, and Gaida Muttasher. "Applying reinforcement learning for random early detaction algorithm in adaptive queue management systems." Indonesian Journal of Electrical Engineering and Computer Science 26, no. 3 (June 1, 2022): 1684. http://dx.doi.org/10.11591/ijeecs.v26.i3.pp1684-1691.

Abstract:
Recently, the use of the internet has increased everywhere: in households, companies, government departments, video games, and so on. This has increased network traffic, generating congestion issues and packet drops at the nodes. Certain algorithms are used to solve this problem, and active queue management (AQM) is one of the most important of them. For effective network management, RL is used to adapt the parameters of these algorithms: the suggested deep Q-network (DQN) algorithm relies on reinforcement learning (RL) to reduce drops and delay, and the random early detection (RED) AQM algorithm is adopted within an NS3 simulation.
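For context, the classic RED drop decision that such an adaptive scheme tunes can be sketched as follows; an RL agent (for example a DQN) would treat the thresholds and maximum drop probability as the quantities to adjust online.

```python
import random

def red_drop(avg_queue, min_th, max_th, max_p):
    """Classic RED drop decision: no drops below min_th, probabilistic drops
    between the thresholds, and certain drop above max_th."""
    if avg_queue < min_th:
        return False
    if avg_queue >= max_th:
        return True
    p = max_p * (avg_queue - min_th) / (max_th - min_th)  # linear ramp of drop probability
    return random.random() < p

# an adaptive AQM controller could expose (min_th, max_th, max_p) as its action
print(red_drop(avg_queue=30.0, min_th=20.0, max_th=60.0, max_p=0.1))
```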
30

Szita, István, and András Lörincz. "Learning Tetris Using the Noisy Cross-Entropy Method." Neural Computation 18, no. 12 (December 2006): 2936–41. http://dx.doi.org/10.1162/neco.2006.18.12.2936.

Abstract:
The cross-entropy method is an efficient and general optimization algorithm. However, its applicability in reinforcement learning (RL) seems to be limited because it often converges to suboptimal policies. We apply noise for preventing early convergence of the cross-entropy method, using Tetris, a computer game, for demonstration. The resulting policy outperforms previous RL algorithms by almost two orders of magnitude.
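The mechanism is simple enough to sketch: fit a Gaussian over policy parameter vectors, keep the elite samples, refit, and add extra variance noise so the distribution does not collapse prematurely. The constant noise schedule and toy objective below are illustrative choices, not the paper's exact setup.

```python
import numpy as np

def noisy_cem(evaluate, dim, iterations=50, pop=100, elite_frac=0.1, noise=4.0):
    """Cross-entropy method with added variance noise to delay premature convergence.
    `evaluate` maps a parameter vector to a scalar score (e.g. Tetris lines cleared)."""
    mean, var = np.zeros(dim), np.full(dim, 100.0)
    n_elite = int(pop * elite_frac)
    for _ in range(iterations):
        samples = np.random.randn(pop, dim) * np.sqrt(var) + mean
        scores = np.array([evaluate(w) for w in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]     # keep the best-scoring samples
        mean = elite.mean(axis=0)
        var = elite.var(axis=0) + noise                     # the extra noise term is the key trick
    return mean

# toy objective: maximise -||w - 3||^2
best = noisy_cem(lambda w: -np.sum((w - 3.0) ** 2), dim=5)
```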
31

Ye, Weicheng, and Dangxing Chen. "Analysis of Performance Measure in Q Learning with UCB Exploration." Mathematics 10, no. 4 (February 12, 2022): 575. http://dx.doi.org/10.3390/math10040575.

Abstract:
Compared to model-based Reinforcement Learning (RL) approaches, model-free RL algorithms, such as Q-learning, require less space and are more expressive, since specifying value functions or policies is more flexible than specifying the model for the environment. This makes model-free algorithms more prevalent in modern deep RL. However, model-based methods can more efficiently extract the information from available data. The Upper Confidence Bound (UCB) bandit can improve the exploration bonuses, and hence increase the data efficiency in the Q-learning framework. The cumulative regret of the Q-learning algorithm with an UCB exploration policy in the episodic Markov Decision Process has recently been explored in the underlying environment of finite state-action space. In this paper, we study the regret bound of the Q-learning algorithm with UCB exploration in the scenario of compact state-action metric space. We present an algorithm that adaptively discretizes the continuous state-action space and iteratively updates Q-values. The algorithm is able to efficiently optimize rewards and minimize cumulative regret.
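In the tabular case, the UCB-style exploration discussed amounts to adding a count-based optimism bonus at action-selection time; the sketch below illustrates that idea only and is not the paper's metric-space algorithm.

```python
import math

def ucb_action(Q, counts, s, actions, c=2.0):
    """Select the action maximising the Q-value plus a count-based exploration bonus."""
    total = sum(counts.get((s, a), 0) for a in actions) + 1
    def score(a):
        n = counts.get((s, a), 0)
        if n == 0:
            return float("inf")                        # untried actions are maximally optimistic
        return Q.get((s, a), 0.0) + c * math.sqrt(math.log(total) / n)
    return max(actions, key=score)
```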
32

Lin, Xingbin, Deyu Yuan, and Xifei Li. "Reinforcement Learning with Dual Safety Policies for Energy Savings in Building Energy Systems." Buildings 13, no. 3 (February 21, 2023): 580. http://dx.doi.org/10.3390/buildings13030580.

Abstract:
Reinforcement learning (RL) is being gradually applied in the control of heating, ventilation and air-conditioning (HVAC) systems to learn the optimal control sequences for energy savings. However, due to the “trial and error” issue, the output sequences of RL may cause potential operational safety issues when RL is applied in real systems. To solve those problems, an RL algorithm with dual safety policies for energy savings in HVAC systems is proposed. In the proposed dual safety policies, the implicit safety policy is a part of the RL model, which integrates safety into the optimization target of RL, by adding penalties in reward for actions that exceed the safety constraints. In explicit safety policy, an online safety classifier is built to filter the actions outputted by RL; thus, only those actions that are classified as safe and have the highest benefits will be finally selected. In this way, the safety of controlled HVAC systems running with proposed RL algorithms can be effectively satisfied while reducing the energy consumptions. To verify the proposed algorithm, we implemented the control algorithm in a real existing commercial building. After a certain period of self-studying, the energy consumption of HVAC had been reduced by more than 15.02% compared to the proportional–integral–derivative (PID) control. Meanwhile, compared to the independent application of the RL algorithm without safety policy, the proportion of indoor temperature not meeting the demand is reduced by 25.06%.
33

Li, Luchen, and A. Aldo Faisal. "Bayesian Distributional Policy Gradients." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 8429–37. http://dx.doi.org/10.1609/aaai.v35i10.17024.

Abstract:
Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and policy learning in general. Previous works in distributional RL focused mainly on computing the state-action-return distributions, here we model the state-return distributions. This enables us to translate successful conventional RL algorithms that are based on state values into distributional RL. We formulate the distributional Bellman operation as an inference-based auto-encoding process that minimises Wasserstein metrics between target/model return distributions. The proposed algorithm, BDPG (Bayesian Distributional Policy Gradients), uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns. Moreover, we can now interpret the return prediction uncertainty as an information gain, which allows to obtain a new curiosity measure that helps BDPG steer exploration actively and efficiently. We demonstrate in a suite of Atari 2600 games and MuJoCo tasks, including well known hard-exploration challenges, how BDPG learns generally faster and with higher asymptotic performance than reference distributional RL algorithms.
34

Grewal, Yashvir S., Frits De Nijs, and Sarah Goodwin. "Evaluating Meta-Reinforcement Learning through a HVAC Control Benchmark (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 15785–86. http://dx.doi.org/10.1609/aaai.v35i18.17889.

Abstract:
Meta-Reinforcement Learning (RL) algorithms promise to leverage prior task experience to quickly learn new unseen tasks. Unfortunately, evaluating meta-RL algorithms is complicated by a lack of suitable benchmarks. In this paper we propose adapting a challenging real-world heating, ventilation and air-conditioning (HVAC) control benchmark for meta-RL. Unlike existing benchmark problems, HVAC control has a broader task distribution, and sources of exogenous stochasticity from price and weather predictions which can be shared across task definitions. This can enable greater differentiation between the performance of current meta-RL approaches, and open the way for future research into algorithms that can adapt to entirely new tasks not sampled from the current task distribution.
35

Villalpando-Hernandez, Rafaela, Cesar Vargas-Rosales, and David Munoz-Rodriguez. "Localization Algorithm for 3D Sensor Networks: A Recursive Data Fusion Approach." Sensors 21, no. 22 (November 17, 2021): 7626. http://dx.doi.org/10.3390/s21227626.

Abstract:
Location-based applications for security and assisted living, such as human location tracking, pet tracking and others, have increased considerably in the last few years, enabled by the fast growth of sensor networks. Sensor location information is essential for several network protocols and applications such as routing and energy harvesting, among others. Therefore, there is a need for developing new alternative localization algorithms suitable for rough, changing environments. In this paper, we formulate the Recursive Localization (RL) algorithm, based on the recursive coordinate data fusion using at least three anchor nodes (ANs), combined with a multiplane location estimation, suitable for 3D ad hoc environments. The novelty of the proposed algorithm is the recursive fusion technique to obtain a reliable location estimation of a node by combining noisy information from several nodes. The feasibility of the RL algorithm under several network environments was examined through analytic formulation and simulation processes. The proposed algorithm improved the location accuracy for all the scenarios analyzed. Comparing with other 3D range-based positioning algorithms, we observe that the proposed RL algorithm presents several advantages, such as a smaller number of required ANs and a better position accuracy for the worst cases analyzed. On the other hand, compared to other 3D range-free positioning algorithms, we can see an improvement by around 15.6% in terms of positioning accuracy.
36

Zhao, Richard, and Duane Szafron. "Learning Character Behaviors Using Agent Modeling in Games." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 5, no. 1 (October 16, 2009): 179–85. http://dx.doi.org/10.1609/aiide.v5i1.12369.

Abstract:
Our goal is to provide learning mechanisms to game agents so they are capable of adapting to new behaviors based on the actions of other agents. We introduce a new on-line reinforcement learning (RL) algorithm, ALeRT-AM, that includes an agent-modeling mechanism. We implemented this algorithm in BioWare Corp.’s role-playing game, Neverwinter Nights to evaluate its effectiveness in a real game. Our experiments compare agents who use ALeRT-AM with agents that use the non-agent modeling ALeRT RL algorithm and two other non-RL algorithms. We show that an ALeRT-AM agent is able to rapidly learn a winning strategy against other agents in a combat scenario and to adapt to changes in the environment.
37

Hu and Xu. "Fuzzy Reinforcement Learning and Curriculum Transfer Learning for Micromanagement in Multi-Robot Confrontation." Information 10, no. 11 (November 2, 2019): 341. http://dx.doi.org/10.3390/info10110341.

Abstract:
Multi-Robot Confrontation on physics-based simulators is a complex and time-consuming task, but simulators are required to evaluate the performance of the advanced algorithms. Recently, a few advanced algorithms have been able to produce considerably complex levels in the context of the robot confrontation system when the agents are facing multiple opponents. Meanwhile, the current confrontation decision-making system suffers from difficulties in optimization and generalization. In this paper, fuzzy reinforcement learning (RL) and curriculum transfer learning are applied to micromanagement for the robot confrontation system. Firstly, an improved Q-learning in the semi-Markov decision-making process is designed to train the agent, and an efficient RL model is defined to avoid the curse of dimensionality. Secondly, a multi-agent RL algorithm with parameter sharing is proposed to train the agents. We use a neural network with adaptive momentum acceleration as a function approximator to estimate the state-action function. Then, a method of fuzzy logic is used to regulate the learning rate of RL. Thirdly, a curriculum transfer learning method is used to extend the RL model to more difficult scenarios, which ensures the generalization of the decision-making system. The experimental results show that the proposed method is effective.
Styles APA, Harvard, Vancouver, ISO, etc.
38

Shen, Haocheng, Jason Yosinski, Petar Kormushev, Darwin G. Caldwell et Hod Lipson. « Learning Fast Quadruped Robot Gaits with the RL PoWER Spline Parameterization ». Cybernetics and Information Technologies 12, no 3 (1 septembre 2012) : 66–75. http://dx.doi.org/10.2478/cait-2012-0022.

Texte intégral
Résumé :
Legged robots are uniquely privileged over their wheeled counterparts in their potential to access rugged terrain. However, designing walking gaits by hand for legged robots is a difficult and time-consuming process, so we seek algorithms for learning such gaits automatically through real-world experimentation. Numerous previous studies have examined a variety of algorithms for learning gaits, using an assortment of different robots. It is often difficult to compare the algorithmic results from one study to the next, because the conditions and robots used vary. With this in mind, we have used an open-source, 3D-printed quadruped robot called QuadraTot, so the results may be verified, and hopefully improved upon, by any group so desiring. Because many robots do not have accurate simulators, we test gait-learning algorithms entirely on the physical robot. Previous studies using the QuadraTot have compared parameterized splines, the HyperNEAT generative encoding, and a genetic algorithm. Among these, the research on the genetic algorithm was conducted by Glette et al. (2012) in a simulator and tested on a real robot. Here we compare these results to an algorithm called Policy learning by Weighting Exploration with the Returns, or RL PoWER. We report that this algorithm has learned, through physical experiments alone, the fastest gait yet reported in the literature, 16.3% faster than reported for HyperNEAT. In addition, the learned gaits are less taxing on the robot and more repeatable than previous record-breaking gaits.
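For context, PoWER updates the policy (here, spline) parameters by return-weighted averaging of the exploration noise of the best rollouts; the sketch below follows this commonly published form of the update, but the rollout count, noise scale, and spline dimensionality are illustrative assumptions rather than the exact QuadraTot setup.

```python
import numpy as np

def power_update(theta, rollouts, k_best=5):
    """One PoWER-style update of spline parameters `theta`.

    `rollouts` is a list of (perturbation, return) pairs from noisy
    executions of the current gait.  The perturbations of the best
    rollouts are averaged, weighted by their returns (an illustrative
    simplification of the importance-sampling form of PoWER).
    """
    best = sorted(rollouts, key=lambda r: r[1], reverse=True)[:k_best]
    num = sum(ret * delta for delta, ret in best)
    den = sum(ret for _, ret in best) + 1e-12
    return theta + num / den

# Hypothetical use: 8 spline control points, Gaussian exploration noise,
# return = measured walking speed of the rollout
theta = np.zeros(8)
rollouts = [(np.random.normal(0.0, 0.1, size=8), np.random.rand())
            for _ in range(10)]
theta = power_update(theta, rollouts)
```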
Styles APA, Harvard, Vancouver, ISO, etc.
39

Shaposhnikova, Sofiia, et Dmytro Omelian. « TOWARDS EFFECTIVE STRATEGIES FOR MOBILE ROBOT USING REINFORCEMENT LEARNING AND GRAPH ALGORITHMS ». Automation of technological and business processes 15, no 2 (19 juin 2023) : 24–34. http://dx.doi.org/10.15673/atbp.v15i2.2522.

Texte intégral
Résumé :
This research paper explores the use of Reinforcement Learning (RL) and traditional graph algorithms such as A* for path planning and strategy development for mobile robots. The paper conducts a comprehensive analysis of these algorithms by evaluating their performance in terms of efficiency, scalability, and applicability in real-world scenarios, with the aim of identifying their benefits and limitations and contributing to the development of more effective and practical solutions for mobile robots. The results of the study show that while both RL and A* algorithms have their benefits and limitations, RL algorithms have the potential to provide more effective and scalable solutions for mobile robots in real-world applications. The paper also provides ongoing research directions aimed at improving the performance of these algorithms and concludes by offering valuable insights for researchers and practitioners working in the field of mobile robots.
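Since A* is one of the baselines discussed, a minimal grid version is sketched below for reference; the 4-connected occupancy grid, unit step costs, and Manhattan heuristic are assumptions for illustration, not the paper's exact planning setup.

```python
import heapq

def a_star(grid, start, goal):
    """Plain A* on a 4-connected occupancy grid (1 = obstacle).
    Manhattan distance is used as the admissible heuristic."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, None)]
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue
        came_from[node] = parent
        if node == goal:                               # reconstruct the path
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        x, y = node
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < len(grid) and 0 <= ny < len(grid[0]) and grid[nx][ny] == 0:
                ng = g + 1
                if ng < g_cost.get((nx, ny), float("inf")):
                    g_cost[(nx, ny)] = ng
                    heapq.heappush(open_set, (ng + h((nx, ny)), ng, (nx, ny), node))
    return None  # no path found

grid = [[0, 0, 0], [1, 1, 0], [0, 0, 0]]
print(a_star(grid, (0, 0), (2, 0)))                    # path around the wall
```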
Styles APA, Harvard, Vancouver, ISO, etc.
40

Liao, Hanlin. « Urban Intersection Simulation and Verification via Deep Reinforcement Learning Algorithms ». Journal of Physics : Conference Series 2435, no 1 (1 février 2023) : 012019. http://dx.doi.org/10.1088/1742-6596/2435/1/012019.

Texte intégral
Résumé :
Reinforcement Learning (RL) uses rewards to iteratively update the next state during training in an unknown and complex environment. This paper aims to find a possible solution to the traffic congestion problem and trains four Deep Reinforcement Learning (DRL) algorithms to verify an urban intersection simulation environment along the dimensions discussed, including practicability, efficiency, safety, complexity, and limitations. The experimental results show that the four DRL algorithms are efficient in the RL intersection simulation. The paper verifies this RL environment through the comparison and expands on the experiment with three conclusions. First, the agent can be trained with the Deep Q-Network (DQN), Double DQN, DuelingNet DQN, and Categorical DQN algorithms and be practical and efficient. Second, as the experimental results show, DuelingNet takes less time to finish training, and Categorical DQN reduces the collision rate after a while. Third, the RL simulation environment lacks complexity, which limits its use for more complex problems, including the lack of simulation of pedestrian behaviors and the prediction of emergency events. The paper recommends creating a more complex urban intersection simulation that includes exceptional cases for the RL agent environment and more traffic pressure at the intersection, to enable faster and safer responses in future autonomous driving.
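The abstract does not spell out the update rules of the compared agents, but the difference between the plain DQN and Double DQN targets can be summarized in a few lines; the tensor shapes and network handles below are illustrative assumptions.

```python
import torch

def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    # Standard DQN: the target network both selects and evaluates the next action.
    # `done` is a 0/1 float tensor marking terminal transitions.
    q_next = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * q_next

def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    # Double DQN: the online network selects the action and the target network
    # evaluates it, which reduces overestimation bias.
    best_action = online_net(next_state).argmax(dim=1, keepdim=True)
    q_next = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * q_next
```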
Styles APA, Harvard, Vancouver, ISO, etc.
41

Ding, Yuhao, Ming Jin et Javad Lavaei. « Non-stationary Risk-Sensitive Reinforcement Learning : Near-Optimal Dynamic Regret, Adaptive Detection, and Separation Design ». Proceedings of the AAAI Conference on Artificial Intelligence 37, no 6 (26 juin 2023) : 7405–13. http://dx.doi.org/10.1609/aaai.v37i6.25901.

Texte intégral
Résumé :
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs). Both the reward functions and the state transition kernels are unknown and allowed to vary arbitrarily over time with a budget on their cumulative variations. When this variation budget is known a priori, we propose two restart-based algorithms, namely Restart-RSMB and Restart-RSQ, and establish their dynamic regrets. Based on these results, we further present a meta-algorithm that does not require any prior knowledge of the variation budget and can adaptively detect the non-stationarity on the exponential value functions. A dynamic regret lower bound is then established for non-stationary risk-sensitive RL to certify the near-optimality of the proposed algorithms. Our results also show that the risk control and the handling of the non-stationarity can be separately designed in the algorithm if the variation budget is known a priori, while the non-stationary detection mechanism in the adaptive algorithm depends on the risk parameter. This work offers the first non-asymptotic theoretical analyses for non-stationary risk-sensitive RL in the literature.
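For readers unfamiliar with the risk measure named above, the entropic risk of a return X with risk parameter β ≠ 0 is commonly written as follows; this is the standard textbook definition, and the paper's episodic objective may differ in normalization or sign convention.

```latex
\[
  \rho_\beta(X) \;=\; \frac{1}{\beta}\,\log \mathbb{E}\!\left[e^{\beta X}\right]
\]
% As beta -> 0 this recovers the risk-neutral objective E[X];
% beta < 0 yields risk-averse and beta > 0 risk-seeking behavior.
```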
Styles APA, Harvard, Vancouver, ISO, etc.
42

Sarkar, Soumyadip. « Quantitative Trading using Deep Q Learning ». International Journal for Research in Applied Science and Engineering Technology 11, no 4 (30 avril 2023) : 731–38. http://dx.doi.org/10.22214/ijraset.2023.50170.

Texte intégral
Résumé :
Reinforcement learning (RL) is a branch of machine learning that has been used in a variety of applications such as robotics, game playing, and autonomous systems. In recent years, there has been growing interest in applying RL to quantitative trading, where the goal is to make profitable trades in financial markets. This paper explores the use of RL in quantitative trading and presents a case study of an RL-based trading algorithm. The results show that RL can be a powerful tool for quantitative trading and that it has the potential to outperform traditional trading algorithms. The use of reinforcement learning in quantitative trading represents a promising area of research that can potentially lead to the development of more sophisticated and effective trading systems. Future work could explore the use of alternative reinforcement learning algorithms, incorporate additional data sources, and test the system on different asset classes. Overall, our research demonstrates the potential of using reinforcement learning in quantitative trading and highlights the importance of continued research and development in this area. By developing more sophisticated and effective trading systems, we can potentially improve the efficiency of financial markets and generate greater returns for investors.
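As a hedged illustration of what an RL-based trading agent of this kind might look like, the sketch below defines a tiny Q-network over a window of price returns with the discrete actions sell/hold/buy; the feature window, layer sizes, action set, and ε-greedy policy are assumptions, not the paper's exact design.

```python
import numpy as np
import torch
import torch.nn as nn

ACTIONS = ("sell", "hold", "buy")   # assumed discrete action set

class TradingQNet(nn.Module):
    """Tiny Q-network over a window of recent price returns (illustrative)."""
    def __init__(self, window=30, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(window, hidden), nn.ReLU(),
            nn.Linear(hidden, len(ACTIONS)),
        )

    def forward(self, x):
        return self.net(x)

def act(q_net, returns_window, epsilon=0.1):
    """Epsilon-greedy action over the last `window` price returns."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(ACTIONS))
    with torch.no_grad():
        q = q_net(torch.as_tensor(returns_window, dtype=torch.float32))
    return int(q.argmax())

q_net = TradingQNet()
print(ACTIONS[act(q_net, np.random.randn(30))])   # hypothetical feature window
```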
Styles APA, Harvard, Vancouver, ISO, etc.
43

Zhang, Ningyan. « Analysis of reinforce learning in medical treatment ». Applied and Computational Engineering 5, no 1 (14 juin 2023) : 48–53. http://dx.doi.org/10.54254/2755-2721/5/20230527.

Texte intégral
Résumé :
As humanity approaches the big-data era, artificial intelligence is becoming dominant in almost every domain. As a part of machine learning, reinforcement learning (RL) is intended to utilize interaction experience with the world and assess feedback to strengthen human decision-making ability. Unlike traditional supervised learning, RL is able to sample, assess, and order delayed-feedback decisions at the same time. This characteristic of RL makes it powerful when it comes to exploring solutions in the medical field. This paper investigates the wide application of RL in the medical field. Covering two major parts of the medical field, artificial diagnosis and precision medicine, this paper first introduces several RL algorithms in each part, then states the inefficiencies and unsolved difficulties in this area, together with future investigation directions for RL. This paper provides researchers with multiple feasible algorithms, supporting methods, and theoretical analysis, which pave the way for the future development of reinforcement learning in the medical field.
Styles APA, Harvard, Vancouver, ISO, etc.
44

Puspitasari, Annisa Anggun, et Byung Moo Lee. « A Survey on Reinforcement Learning for Reconfigurable Intelligent Surfaces in Wireless Communications ». Sensors 23, no 5 (24 février 2023) : 2554. http://dx.doi.org/10.3390/s23052554.

Texte intégral
Résumé :
A reconfigurable intelligent surface (RIS) is a development of conventional relay technology that can send a signal by reflecting the signal received from a transmitter to a receiver without additional power. RISs are a promising technology for future wireless communication due to their improvement of the quality of the received signal, energy efficiency, and power allocation. In addition, machine learning (ML) is widely used in many technologies because it can create machines that mimic human mindsets with mathematical algorithms without requiring direct human assistance. Meanwhile, it is necessary to implement a subfield of ML, reinforcement learning (RL), to allow a machine to make decisions automatically based on real-time conditions. However, few studies have provided comprehensive information related to RL algorithms, especially deep RL (DRL), for RIS technology. Therefore, in this study, we provide an overview of RISs and an explanation of the operations and implementations of RL algorithms for optimizing the parameters of RIS technology. Optimizing the parameters of RISs can offer several benefits for communication systems, such as the maximization of the sum rate, user power allocation, and energy efficiency, or the minimization of the age of information. Finally, we highlight several issues to consider when implementing RL algorithms for RIS technology in wireless communications in the future and provide possible solutions.
Styles APA, Harvard, Vancouver, ISO, etc.
45

Delipetrev, Blagoj, Andreja Jonoski et Dimitri P. Solomatine. « A novel nested stochastic dynamic programming (nSDP) and nested reinforcement learning (nRL) algorithm for multipurpose reservoir optimization ». Journal of Hydroinformatics 19, no 1 (17 septembre 2016) : 47–61. http://dx.doi.org/10.2166/hydro.2016.243.

Texte intégral
Résumé :
In this article we present two novel multipurpose reservoir optimization algorithms named nested stochastic dynamic programming (nSDP) and nested reinforcement learning (nRL). Both algorithms are built as a combination of two algorithms: in the nSDP case, (1) stochastic dynamic programming (SDP) and (2) the nested optimal allocation algorithm (nOAA); in the nRL case, (1) reinforcement learning (RL) and (2) the nOAA. The nOAA is implemented with linear and non-linear optimization. The main novel idea is to include the nOAA at each SDP and RL state transition, which decreases the starting problem dimension and alleviates the curse of dimensionality. Both nSDP and nRL can solve multi-objective optimization problems without significant computational expense and algorithmic complexity and can handle dense and irregular variable discretization. The two algorithms were coded in Java as a prototype application and applied to the Knezevo reservoir, located in the Republic of Macedonia. The nSDP and nRL optimal reservoir policies were compared with nested dynamic programming policies, and the overall conclusion is that nRL is more powerful, but significantly more complex, than nSDP.
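The defining idea, solving a nested allocation problem inside every state transition, can be sketched as follows; the inner allocation is written here as a simple linear program splitting a release among users, which is only a stand-in for the paper's nOAA, and all variable names and numbers are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def nested_allocation(release, demands, benefits):
    """Inner step of a nested approach: split a reservoir release among
    users to maximize total benefit (a simple LP stand-in for the nOAA)."""
    res = linprog(c=-np.asarray(benefits),                 # maximize total benefit
                  A_ub=[np.ones(len(demands))], b_ub=[release],
                  bounds=[(0, d) for d in demands])
    return res.x, -res.fun

def transition(storage, inflow, release, demands, benefits):
    """Outer SDP/RL state transition: allocate the release, then update storage."""
    allocation, reward = nested_allocation(release, demands, benefits)
    next_storage = storage + inflow - release
    return next_storage, allocation, reward

# Hypothetical two-user example
print(transition(storage=100.0, inflow=20.0, release=30.0,
                 demands=[15.0, 25.0], benefits=[1.0, 2.0]))
```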
Styles APA, Harvard, Vancouver, ISO, etc.
46

Wang, Mengmei. « Optimizing Multitask Assignment of Internet of Things Devices by Reinforcement Learning in Mobile Crowdsensing Scenes ». Security and Communication Networks 2022 (17 août 2022) : 1–10. http://dx.doi.org/10.1155/2022/6202237.

Texte intégral
Résumé :
The objective is to optimize the multitask assignment (MTA) in mobile crowdsensing (MCS) scenarios. From the perspective of reinforcement learning (RL), an Internet of Things (IoT) devices-oriented MTA model is established using MCS, IoT technology, and other related theories. Then, the data collected by the University of Cambridge and the University of St. Andrews are chosen to verify the three MTA algorithms on IoT devices. They are multistage online task assignment (MOTA), average makespan-sensitive online task assignment (AOTA), and water filling (WF). Experiments are designed by considering the different algorithms' MTA time consumption and accuracy in simple and complex task scenarios. The results show that with a constant load or task quantity, the MOTA algorithm takes the shortest time to assign tasks. In simple task scenarios, MOTA is compared with WF; the MOTA algorithm's total moving distance is relatively short, and its task completion degree is the highest. The AOTA algorithm is best suited to complex tasks, with the highest MTA accuracy and the shortest time consumption. Therefore, the research on IoT devices' MTA optimization based on RL in the MCS scenario provides a certain theoretical basis for subsequent MTA studies.
Styles APA, Harvard, Vancouver, ISO, etc.
47

Гайнетдинов, А. Ф. « NeRF IN REINFORCEMENT LEARNING FOR IMAGE RECOGNITION ». Южно-Сибирский научный вестник, no 2(48) (30 avril 2023) : 63–72. http://dx.doi.org/10.25699/sssb.2023.48.2.011.

Texte intégral
Résumé :
This study discusses image recognition methods that exploit neural networks of different architectures, including reinforcement learning with Q-Learning. The algorithms were trained and tested on images depicting 6 classes of forest animals. A total of 6 dataset variants with different amounts of training data (40 to 80%) were used. Seven image recognition techniques were analyzed: CNN-AE and two algorithms for visual continuous control (NeRF-RL and DRQ-V2), each trained on two- and three-dimensional convolutional neural networks (CNNs), as well as with Q-Learning. All models showed high accuracy regardless of the train/test split; CNN-AE exhibited the lowest recognition accuracy, whilst NeRF-RL and DRQ-V2 based on 2D and 3D CNNs were more accurate. NeRF-RL and DRQ-V2 trained with the Q-Learning method yielded the highest accuracy, and using Q-Learning to train the NeRF-RL algorithm provided the best result. This architecture was applied for animal recognition and image classification into classes. Based on this research, the combination of NeRF algorithms and reinforcement learning is an effective and promising image recognition method for detecting forest animals in camera-trap images.
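For reference, the standard tabular form of the Q-Learning update used as the training rule is given below; how states, actions, and rewards are mapped onto the NeRF-RL image-recognition task is not specified in the abstract, so the sketch is generic.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard tabular Q-Learning update; the coupling to the NeRF-based
    representation in the paper is not given, so this is only the generic form."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    return Q

# Hypothetical toy usage with 5 states and 3 actions
Q = np.zeros((5, 3))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
```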
Styles APA, Harvard, Vancouver, ISO, etc.
48

Nicola, Marcel, et Claudiu-Ionel Nicola. « Improvement of Linear and Nonlinear Control for PMSM Using Computational Intelligence and Reinforcement Learning ». Mathematics 10, no 24 (9 décembre 2022) : 4667. http://dx.doi.org/10.3390/math10244667.

Texte intégral
Résumé :
Starting from the nonlinear operating equations of the permanent magnet synchronous motor (PMSM) and from the global strategy of field-oriented control (FOC), this article compares the linear and nonlinear control of a PMSM. It presents the linear quadratic regulator (LQR) algorithm as a linear control algorithm, in addition to the one obtained through feedback linearization (FL). Naturally, the nonlinear approach through the Lyapunov and Hamiltonian functions leads to results that are superior to those of the linear algorithms. With the particle swarm optimization (PSO), simulated annealing (SA), genetic algorithm (GA), and gray wolf optimization (GWO) computational intelligence (CI) algorithms, the performance of the PMSM–control system (CS) was optimized by obtaining the parameter vectors of the control algorithms through the optimization of specific performance indices. Superior performance of the PMSM–CS was also obtained by using reinforcement learning (RL) algorithms, which provided correction command signals (CCSs) after the training stages. Starting from the PMSM–CS performance obtained for a benchmark, the article presents four types of linear and nonlinear control algorithms for the PMSM, together with the means of improving the PMSM–CS performance by using CI algorithms and RL–twin delayed deep deterministic policy gradient (TD3) agent algorithms. The article also presents experimental results that confirm the superiority of the PMSM–CS–CI over classical PI-type controllers.
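As background for the LQR component, the continuous-time optimal gain follows from the algebraic Riccati equation; the sketch below uses a placeholder double-integrator model rather than a PMSM, so the matrices and weights are assumptions for illustration only.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Placeholder double-integrator model (not a PMSM): x_dot = A x + B u
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 1.0])       # state weighting (assumed)
R = np.array([[0.1]])          # control effort weighting (assumed)

P = solve_continuous_are(A, B, Q, R)      # solve the algebraic Riccati equation
K = np.linalg.inv(R) @ B.T @ P            # optimal state-feedback gain, u = -K x
print(K)
```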
Styles APA, Harvard, Vancouver, ISO, etc.
49

You, Haoyi, Beichen Yu, Haiming Jin, Zhaoxing Yang et Jiahui Sun. « User-Oriented Robust Reinforcement Learning ». Proceedings of the AAAI Conference on Artificial Intelligence 37, no 12 (26 juin 2023) : 15269–77. http://dx.doi.org/10.1609/aaai.v37i12.26781.

Texte intégral
Résumé :
Recently, improving the robustness of policies across different environments has attracted increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim to achieve max-min robustness by optimizing the policy's performance in the worst-case environment. However, in practice, a user of an RL policy may have different preferences over its performance across environments. Clearly, the aforementioned max-min robustness is oftentimes too conservative to satisfy user preference. Therefore, in this paper, we integrate user preference into policy learning in robust RL and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric. To optimize the UOR metric, we develop two different UOR-RL training algorithms for the scenarios with or without an a priori known environment distribution, respectively. Theoretically, we prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate or no knowledge at all of the environment distribution. Furthermore, we carry out extensive experimental evaluations in 6 MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to the state-of-the-art baselines under the average-case and worst-case performance metrics and, more importantly, establishes new state-of-the-art performance under the UOR metric.
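The abstract does not state the metric's closed form, so the expression below is only one plausible reading of "allocating different weights to the environments according to user preference": a preference-weighted aggregate of per-environment performance whose limiting case, putting all weight on the worst environment, recovers max-min robustness.

```latex
\[
  J_{\mathrm{UOR}}(\pi) \;=\; \sum_{e \in \mathcal{E}} w_e \, J_e(\pi),
  \qquad w_e \ge 0,\; \sum_{e} w_e = 1
\]
% With all the weight concentrated on the worst-performing environment,
% this reduces to the max-min objective  max_pi  min_e  J_e(pi).
```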
Styles APA, Harvard, Vancouver, ISO, etc.
50

Yang, Bin, Muhammad Haseeb Arshad et Qing Zhao. « Packet-Level and Flow-Level Network Intrusion Detection Based on Reinforcement Learning and Adversarial Training ». Algorithms 15, no 12 (30 novembre 2022) : 453. http://dx.doi.org/10.3390/a15120453.

Texte intégral
Résumé :
Powered by advances in information and internet technologies, network-based applications have developed rapidly, and cybersecurity has grown more critical. Inspired by the success of Reinforcement Learning (RL) in many domains, this paper proposes an Intrusion Detection System (IDS) to improve cybersecurity. The IDS, based on two RL algorithms, i.e., Deep Q-Learning and Policy Gradient, is carefully formulated, strategically designed, and thoroughly evaluated at the packet level and flow level using the CICDDoS2019 dataset. Compared to other work in a similar line of research, this paper focuses on providing a systematic and complete design paradigm for an IDS based on RL algorithms, at both the packet and flow levels. For the packet-level RL-based IDS, the session data are first transformed into images via an image embedding method proposed in this work. A comparison between 1D Convolutional Neural Networks (1D-CNNs) and CNNs for extracting features from these images (for further RL agent training) is drawn from the quantitative results. In addition, an anomaly detection module is designed to detect unknown network traffic. For the flow-level IDS, a Conditional Generative Adversarial Network (CGAN) and the ε-greedy strategy are adopted in designing the exploration module for RL agent training. To improve the robustness of the intrusion detection, a sample agent with a reward policy complementary to that of the RL agent is introduced for the purpose of adversarial training. The experimental results of the proposed RL-based IDS show improvements over the state-of-the-art algorithms presented in the literature for packet-level and flow-level IDS.
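Two ingredients mentioned above, the ε-greedy exploration and the complement-reward sample agent, can be illustrated with a short sketch; the reward values, function names, and action-selection details are assumptions for illustration, not the paper's exact formulation.

```python
import random

def detector_reward(prediction, label):
    """Reward for the detection agent: +1 for a correct decision, -1 otherwise (assumed values)."""
    return 1.0 if prediction == label else -1.0

def sample_agent_reward(prediction, label):
    """Complement reward for the adversarial sample agent: it is rewarded exactly
    when the detector is penalized (an illustrative reading of the 'complement
    reward policy' described above)."""
    return -detector_reward(prediction, label)

def epsilon_greedy(q_values, epsilon=0.1):
    """Epsilon-greedy exploration used when training the RL agents."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

print(epsilon_greedy([0.2, 0.7, 0.1]))   # usually picks action 1, sometimes explores
```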
Styles APA, Harvard, Vancouver, ISO, etc.
