Academic literature on the topic 'Constrained RL'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Constrained RL.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Constrained RL"

1

HasanzadeZonuzy, Aria, Archana Bura, Dileep Kalathil, and Srinivas Shakkottai. "Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 7667–74. http://dx.doi.org/10.1609/aaai.v35i9.16937.

Full text
Abstract:
Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process (CMDP). We focus on the case where the CMDP is unknown, and RL algorithms obtain samples to discover the model and compute an optimal constrained policy. Our goal is to characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy, in terms of both objective maximization and constraint satisfaction, in a PAC sense. We explore two classes of RL algorithms, namely, (i) a generative model based approach, wherein samples are taken initially to estimate a model, and (ii) an online approach, wherein the model is updated as samples are obtained. Our main finding is that, compared to the best known bounds of the unconstrained regime, the sample complexity of constrained RL algorithms is increased by a factor that is logarithmic in the number of constraints, which suggests that the approach may be easily utilized in real systems.
APA, Harvard, Vancouver, ISO, and other styles
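
For readers new to the formalism this abstract relies on, the standard discounted CMDP problem (textbook form, not the authors' specific notation or bounds) is:

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m.
```

A PAC guarantee in this setting asks that, with probability at least 1 − δ, the returned policy is within ε of the optimal constrained value and violates each of the m constraints by at most ε; per the abstract, the extra sample cost of the constraints enters only logarithmically in m.
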
2

Zhang, Renchi, Runsheng Yu, and Wei Xia. "Constraint-aware Policy Optimization to Solve the Vehicle Routing Problem with Time Windows." Information Technology and Control 51, no. 1 (March 26, 2022): 126–38. http://dx.doi.org/10.5755/j01.itc.51.1.29924.

Full text
Abstract:
The vehicle routing problem with time windows (VRPTW), one of the best-known combinatorial optimization (CO) problems, is considered a tough issue in practice, and its main challenge is to find approximate solutions within a reasonable time. In recent years, reinforcement learning (RL) based methods have gained increasing attention in many CO problems, such as the vehicle routing problems (VRP), due to their enormous potential to efficiently generate high-quality solutions. However, neglecting the information between the constraints and the solutions makes the performance of previous approaches unsatisfactory in some strongly constrained problems, like VRPTW. We present the constraint-aware policy optimization (CPO) for VRPTW that lets the agent learn the constraints as a representation of the whole environment to improve the generalization of RL methods. Extensive experiments on both the Solomon benchmark and the generated datasets demonstrate that our approach significantly outperforms other competing methods.
APA, Harvard, Vancouver, ISO, and other styles
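
A common way to make an RL routing policy respect time windows, shown here as a minimal illustrative sketch rather than the CPO architecture of the paper, is to mask out customers whose windows can no longer be met during decoding. The array names (`travel_time`, `earliest`, `latest`) are assumptions made for the illustration.

```python
import numpy as np

def feasible_mask(current, time_now, unvisited, travel_time, earliest, latest):
    """Boolean mask over customer nodes that can still be served without
    violating their time windows (illustrative VRPTW feasibility check)."""
    mask = np.zeros(len(earliest), dtype=bool)
    for j in unvisited:
        arrival = time_now + travel_time[current, j]
        service_start = max(arrival, earliest[j])   # wait if arriving early
        mask[j] = service_start <= latest[j]        # too late -> infeasible
    return mask

# During decoding, a policy would set the logits of infeasible nodes to -inf
# so the softmax assigns them zero probability.
```
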
3

Hu, Jingwei, Zhu Liu, Chichuan Jin, and Weimin Yuan. "Relativistic Fe Kα line in the composite X-ray spectra of radio-loud active galactic nuclei." Monthly Notices of the Royal Astronomical Society 488, no. 3 (July 25, 2019): 4378–88. http://dx.doi.org/10.1093/mnras/stz2030.

Full text
Abstract:
While a broad Fe Kα emission line is generally found in the X-ray spectra of radio-quiet (RQ) active galactic nuclei (AGNs), this feature, commonly thought to be broadened by the relativistic effects near the central black hole, appears to be rare in their radio-loud (RL) counterparts. In this paper, we carry out a detailed study of the ensemble property of the X-ray spectra, focusing on the Fe line, of 97 RL AGNs by applying the spectral stacking method to the spectra obtained with XMM–Newton. For comparison, the same analysis is also performed for 193 RQ AGNs. Both a narrow and a broad component of the Fe Kα line are detected at high significance in the stacked spectra of both samples. The broad lines can be well fitted with relativistically broadened line profiles. Our results suggest that, as in their RQ counterparts, a relativistic Fe line component is commonly present in RL AGNs, though it may not be detected unambiguously in individual objects with spectra of relatively low signal-to-noise ratio. We try to constrain the average spin of the black holes for both the RL and RQ AGN samples by modelling their composite Fe line spectral profiles with relativistic disc line models. For the RL sample, the average spin is loosely constrained and a wide range is allowed except for very fast spins (<0.78, 90 per cent confidence), while for the RQ sample, it is constrained to be low or moderate (<0.24). We conclude that a more precise measurement of the black hole spins in RL AGNs must await the advent of future high-throughput X-ray telescopes.
APA, Harvard, Vancouver, ISO, and other styles
4

Bhatia, Abhinav, Pradeep Varakantham, and Akshat Kumar. "Resource Constrained Deep Reinforcement Learning." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 610–20. http://dx.doi.org/10.1609/icaps.v29i1.3528.

Full text
Abstract:
In urban environments, resources have to be constantly matched to the “right” locations where customer demand is present. For instance, ambulances have to be matched to base stations regularly so as to reduce response time for emergency incidents in ERS (Emergency Response Systems); vehicles (cars, bikes, among others) have to be matched to docking stations to reduce lost demand in shared mobility systems. Such problems are challenging owing to the demand uncertainty, combinatorial action spaces and constraints on allocation of resources (e.g., total resources, minimum and maximum number of resources at locations and regions). Existing systems typically employ myopic and greedy optimization approaches to optimize resource allocation. Such approaches are typically unable to handle surges or variances in demand patterns well. Recent work has demonstrated the ability of Deep RL methods to adapt well to highly uncertain environments. However, existing Deep RL methods are unable to handle combinatorial action spaces and constraints on allocation of resources. To that end, we have developed three approaches on top of the well-known actor-critic approach, DDPG (Deep Deterministic Policy Gradient), that are able to handle constraints on resource allocation. We also demonstrate that they are able to outperform leading approaches on simulators validated on semi-real and real data sets.
APA, Harvard, Vancouver, ISO, and other styles
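
One simple way to keep a continuous allocation produced by an actor network within resource constraints, sketched below purely as an illustration and not as one of the paper's three DDPG-based approaches, is to post-process the action so that per-location bounds and the total budget hold.

```python
import numpy as np

def project_allocation(raw_action, total, lo, hi, iters=50):
    """Heuristically map a raw allocation onto
    {x : lo <= x <= hi, sum(x) ~= total} by alternating clip and rescale.
    Illustrative post-processing only, not the paper's mechanism."""
    x = np.clip(raw_action, lo, hi)
    for _ in range(iters):
        s = x.sum()
        if np.isclose(s, total):
            break
        x = np.clip(x * (total / max(s, 1e-8)), lo, hi)
    return x

alloc = project_allocation(np.array([5.0, 1.0, 0.2]), total=6.0,
                           lo=np.array([0.5, 0.5, 0.5]),
                           hi=np.array([4.0, 4.0, 4.0]))
```

An exact Euclidean projection would instead solve a small quadratic program; the clip-and-rescale loop is only a cheap heuristic.
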
5

Gu, Shangding, Guang Chen, Lijun Zhang, Jing Hou, Yingbai Hu, and Alois Knoll. "Constrained Reinforcement Learning for Vehicle Motion Planning with Topological Reachability Analysis." Robotics 11, no. 4 (August 16, 2022): 81. http://dx.doi.org/10.3390/robotics11040081.

Full text
Abstract:
Rule-based traditional motion planning methods usually perform well with prior knowledge of the macro-scale environments but encounter challenges in unknown and uncertain environments. Deep reinforcement learning (DRL) is a solution that can effectively deal with micro-scale unknown and uncertain environments. Nevertheless, DRL is unstable and lacks interpretability. Therefore, it raises a new challenge: how to combine the effectiveness of the two methods and overcome their drawbacks while guaranteeing stability in uncertain environments. In this study, a multi-constraint and multi-scale motion planning method, named RLTT, is proposed for automated driving with the use of constrained reinforcement learning (RL); it comprises RL, a topological reachability analysis used for vehicle path space (TPS), and a trajectory lane model (TLM). First, a dynamic model of vehicles is formulated; then, TLM is developed on the basis of the dynamic model, thus constraining the RL action and state space. Second, macro-scale path planning is achieved through TPS, and in the micro-scale range, discrete routing points are obtained via RLTT. Third, the proposed motion planning method is designed by combining sophisticated rules, and a theoretical analysis is provided to guarantee the efficiency of our method. Finally, related experiments are conducted to evaluate the effectiveness of the proposed method; our method reduces the distance cost by 19.9% in the experiments compared with the traditional method. Experimental results indicate that the proposed method can help mitigate the gap between data-driven and traditional methods, provide better performance for automated driving, and facilitate the use of RL methods in more fields.
APA, Harvard, Vancouver, ISO, and other styles
6

Wang, Ru-Min, Jin-Huan Sheng, Jie Zhu, Ying-Ying Fan, and Yuan-Guo Xu. "Decays $D^+_{(s)}\to \pi(K)^{+}\ell^+\ell^-$ and $D^0\to\ell^+\ell^-$ in the MSSM with and without R-parity." International Journal of Modern Physics A 30, no. 12 (April 28, 2015): 1550063. http://dx.doi.org/10.1142/s0217751x15500633.

Full text
Abstract:
We study the rare decays D+→π+ℓ+ℓ-, [Formula: see text] and D0→ℓ+ℓ- (ℓ = e, μ) in the minimal supersymmetric standard model with and without R-parity. Using the strong constraints on relevant supersymmetric parameters from [Formula: see text] mixing and [Formula: see text] decay, we examine constrained supersymmetry contributions to relevant branching ratios, direct CP violations and ratios of [Formula: see text] and [Formula: see text] decay rates. We find that both R-parity conserving LR as well as RL mass insertions and R-parity violating squark exchange couplings have huge effects on the direct CP violations of [Formula: see text]; moreover, the constrained LR and RL mass insertions still have obvious effects on the ratios of [Formula: see text] and [Formula: see text] decay rates. The direct CP asymmetries and the ratios of [Formula: see text] and [Formula: see text] decay rates are very sensitive to both the moduli and phases of the relevant supersymmetric parameters. In addition, the differential direct CP asymmetries of [Formula: see text] are studied in detail.
APA, Harvard, Vancouver, ISO, and other styles
7

Wei, Honghao, Xin Liu, and Lei Ying. "A Provably-Efficient Model-Free Algorithm for Infinite-Horizon Average-Reward Constrained Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 4 (June 28, 2022): 3868–76. http://dx.doi.org/10.1609/aaai.v36i4.20302.

Full text
Abstract:
This paper presents a model-free reinforcement learning (RL) algorithm for infinite-horizon average-reward Constrained Markov Decision Processes (CMDPs). Considering a learning horizon K, which is sufficiently large, the proposed algorithm achieves sublinear regret and zero constraint violation. The bounds depend on the number of states S, the number of actions A, and two constants which are independent of the learning horizon K.
APA, Harvard, Vancouver, ISO, and other styles
8

Lee, Xian Yeow, Sambit Ghadai, Kai Liang Tan, Chinmay Hegde, and Soumik Sarkar. "Spatiotemporally Constrained Action Space Attacks on Deep Reinforcement Learning Agents." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 4577–84. http://dx.doi.org/10.1609/aaai.v34i04.5887.

Full text
Abstract:
The robustness of Deep Reinforcement Learning (DRL) algorithms towards adversarial attacks in real-world applications, such as those deployed in cyber-physical systems (CPS), is of increasing concern. Numerous studies have investigated the mechanisms of attacks on the RL agent's state space. Nonetheless, attacks on the RL agent's action space (corresponding to actuators in engineering systems) are equally perverse, but such attacks are relatively less studied in the ML literature. In this work, we first frame the problem as an optimization problem of minimizing the cumulative reward of an RL agent with decoupled constraints as the budget of attack. We propose the white-box Myopic Action Space (MAS) attack algorithm that distributes the attacks across the action space dimensions. Next, we reformulate the optimization problem above with the same objective function, but with a temporally coupled constraint on the attack budget to take into account the approximated dynamics of the agent. This leads to the white-box Look-ahead Action Space (LAS) attack algorithm that distributes the attacks across the action and temporal dimensions. Our results show that, using the same amount of resources, the LAS attack deteriorates the agent's performance significantly more than the MAS attack. This reveals the possibility that, with limited resources, an adversary can utilize the agent's dynamics to malevolently craft attacks that cause the agent to fail. Additionally, we leverage these attack strategies as a possible tool to gain insights on the potential vulnerabilities of DRL agents.
APA, Harvard, Vancouver, ISO, and other styles
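
The myopic, white-box flavor of such an action-space attack can be sketched as projected gradient descent on the agent's own critic. This is an illustrative reconstruction that assumes access to a differentiable critic `q_net(state, action)` returning a scalar; it is not the authors' exact MAS algorithm.

```python
import torch

def myopic_action_attack(q_net, state, action, budget=0.1, steps=10, lr=0.05):
    """Perturb `action` to minimize the critic value Q(s, a + delta)
    while keeping ||delta||_2 <= budget (white-box, gradient-based sketch)."""
    delta = torch.zeros_like(action, requires_grad=True)
    for _ in range(steps):
        q = q_net(state, action + delta)   # assumed scalar output
        q.backward()                       # gradient of Q w.r.t. delta
        with torch.no_grad():
            delta -= lr * delta.grad       # descend on Q to degrade the agent
            norm = delta.norm()
            if norm > budget:              # project back onto the budget ball
                delta *= budget / norm
        delta.grad.zero_()
    return (action + delta).detach()
```
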
9

Delgrange, Florent, Ann Nowé, and Guillermo A. Pérez. "Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (June 28, 2022): 6497–505. http://dx.doi.org/10.1609/aaai.v36i6.20602.

Full text
Abstract:
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
APA, Harvard, Vancouver, ISO, and other styles
10

Ding, Zhenhuan, Xiaoge Huang, and Zhao Liu. "Active Exploration by Chance-Constrained Optimization for Voltage Regulation with Reinforcement Learning." Energies 15, no. 2 (January 16, 2022): 614. http://dx.doi.org/10.3390/en15020614.

Full text
Abstract:
Voltage regulation in distribution networks encounters the challenge of handling uncertainties caused by the high penetration of photovoltaics (PV). This research proposes an active exploration (AE) method based on reinforcement learning (RL) to respond to the uncertainties by regulating the voltage of a distribution network with battery energy storage systems (BESS). The proposed method integrates engineering knowledge to accelerate the training process of RL; the engineering knowledge takes the form of chance-constrained optimization. We formulate the problem as a chance-constrained optimization with a linear load flow approximation. The optimization results are used to guide the action selection of the exploration, improving training efficiency and reducing conservativeness. The comparison of methods focuses on how BESSs are used, training efficiency, and robustness under varying uncertainties and BESS sizes. We implement the proposed algorithm, a chance-constrained optimization, and a traditional Q-learning in the IEEE 13 Node Test Feeder. Our evaluation shows that the proposed AE method achieves better training efficiency than traditional Q-learning. Meanwhile, the proposed method is less conservative in its BESS usage than the chance-constrained optimization.
APA, Harvard, Vancouver, ISO, and other styles
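
The guidance idea, steering exploratory moves toward the optimizer's suggestion instead of drawing them uniformly, can be sketched as below. `solver_action` stands in for the output of a chance-constrained solver, and the dictionary Q-table is a simplification of the learner described in the paper.

```python
import random

def guided_epsilon_greedy(q_table, state, actions, solver_action,
                          epsilon=0.2, guide_prob=0.5):
    """Epsilon-greedy selection whose exploration is biased toward the action
    proposed by a chance-constrained optimization (illustrative sketch)."""
    if random.random() < epsilon:                      # explore
        if random.random() < guide_prob:
            return solver_action                       # follow the optimizer's hint
        return random.choice(actions)                  # otherwise explore uniformly
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))  # exploit
```
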

Books on the topic "Constrained RL"

1

Denneheuvel, Sieger van. Constraint solving on data base systems: Design and implementation of the rule language RL/1. [Netherlands]: s.n., 1991.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Constrained RL"

1

Du, Zhiyong, Bin Jiang, Qihui Wu, Yuhua Xu, and Kun Xu. "Learning the Optimal Network with Handoff Constraint: MAB RL Based Network Selection." In Towards User-Centric Intelligent Network Selection in 5G Heterogeneous Wireless Networks, 13–31. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-15-1120-2_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Sadeghi, Saeid, Maghsoud Amiri, and Farzaneh Mansoori Mooseloo. "Artificial Intelligence and Its Application in Optimization under Uncertainty." In Artificial Intelligence. IntechOpen, 2021. http://dx.doi.org/10.5772/intechopen.98628.

Full text
Abstract:
Nowadays, the increase in data acquisition and availability, together with the growing complexity of optimization problems, makes it imperative to jointly use artificial intelligence (AI) and optimization for devising data-driven and intelligent decision support systems (DSS). A DSS can be successful if large amounts of interactive data are processed quickly and robustly to extract useful information and knowledge that help decision-making. In this context, the data-driven approach has gained prominence due to its provision of insights for decision-making and easy implementation. The data-driven approach can discover various database patterns without relying on prior knowledge while also handling flexible objectives and multiple scenarios. This chapter reviews recent advances in data-driven optimization, highlighting the promise of data-driven optimization that integrates mathematical programming and machine learning (ML) for decision-making under uncertainty, and identifies potential research opportunities. This chapter provides guidelines and implications for researchers, managers, and practitioners in operations research who want to advance their decision-making capabilities under uncertainty concerning data-driven optimization. Then, a comprehensive review and classification of the relevant publications on data-driven stochastic programming, data-driven robust optimization, and data-driven chance-constrained programming are presented. This chapter also identifies fertile avenues for future research that focus on deep data-driven optimization, deep data-driven models, as well as online learning-based data-driven optimization. Perspectives on reinforcement learning (RL)-based data-driven optimization and deep RL for solving NP-hard problems are discussed. We investigate the application of data-driven optimization in different case studies to demonstrate improvements in operational performance over conventional optimization methodology. Finally, some managerial implications and some future directions are provided.
APA, Harvard, Vancouver, ISO, and other styles
3

Daneshfar, Fatemeh, and Vafa Maihami. "Distributed Learning Algorithm Applications to the Scheduling of Wireless Sensor Networks." In Mobile Computing and Wireless Networks, 1049–81. IGI Global, 2016. http://dx.doi.org/10.4018/978-1-4666-8751-6.ch045.

Full text
Abstract:
A Wireless Sensor Network (WSN) is a network of devices, denoted as nodes, that can sense the environment and communicate gathered data through a wireless medium to a sink node. It is a wireless network with low power consumption, small size, and reasonable price that has a variety of applications in monitoring and tracking. However, a WSN is characterized by constrained energy because its nodes are battery-powered and energy recharging is difficult in most applications. Moreover, reducing energy consumption often introduces additional data-delivery latency. To address this, many scheduling approaches have been proposed. In this paper, the authors discuss the applicability of Reinforcement Learning (RL) to multiple access design in order to reduce energy consumption and to achieve low latency in WSNs. In this learning strategy, an agent becomes knowledgeable in selecting actions through interacting with the environment. As a result of the rewards received in response to its actions, the agent asymptotically reaches the optimal policy. This policy maximizes the long-term expected return value of the agent.
APA, Harvard, Vancouver, ISO, and other styles
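
The tabular Q-learning update underlying such RL-based scheduling is compact enough to state directly; the sleep/listen/transmit action names below are illustrative placeholders rather than the chapter's exact formulation.

```python
def q_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.95):
    """One Q-learning step on a dict-based table:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Actions might be a node's duty-cycle choices, e.g. "sleep", "listen", "transmit"."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q
```
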
4

Daneshfar, Fatemeh, and Vafa Maihami. "Distributed Learning Algorithm Applications to the Scheduling of Wireless Sensor Networks." In Handbook of Research on Novel Soft Computing Intelligent Algorithms, 860–91. IGI Global, 2014. http://dx.doi.org/10.4018/978-1-4666-4450-2.ch028.

Full text
Abstract:
A Wireless Sensor Network (WSN) is a network of devices, denoted as nodes, that can sense the environment and communicate gathered data through a wireless medium to a sink node. It is a wireless network with low power consumption, small size, and reasonable price that has a variety of applications in monitoring and tracking. However, a WSN is characterized by constrained energy because its nodes are battery-powered and energy recharging is difficult in most applications. Moreover, reducing energy consumption often introduces additional data-delivery latency. To address this, many scheduling approaches have been proposed. In this paper, the authors discuss the applicability of Reinforcement Learning (RL) to multiple access design in order to reduce energy consumption and to achieve low latency in WSNs. In this learning strategy, an agent becomes knowledgeable in selecting actions through interacting with the environment. As a result of the rewards received in response to its actions, the agent asymptotically reaches the optimal policy. This policy maximizes the long-term expected return value of the agent.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Constrained RL"

1

Sarafian, Elad, Aviv Tamar, and Sarit Kraus. "Constrained Policy Improvement for Efficient Reinforcement Learning." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/396.

Full text
Abstract:
We propose a policy improvement algorithm for Reinforcement Learning (RL) termed Rerouted Behavior Improvement (RBI). RBI is designed to take into account the evaluation errors of the Q-function. Such errors are common in RL when learning the Q-value from finite experience data. Greedy policies or even constrained policy optimization algorithms that ignore these errors may suffer from an improvement penalty (i.e., a policy impairment). To reduce the penalty, the idea of RBI is to attenuate rapid policy changes to actions that were rarely sampled. This approach is shown to avoid catastrophic performance degradation and reduce regret when learning from a batch of transition samples. Through a two-armed bandit example, we show that it also increases data efficiency when the optimal action has a high variance. We evaluate RBI in two tasks in the Atari Learning Environment: (1) learning from observations of multiple behavior policies and (2) iterative RL. Our results demonstrate the advantage of RBI over greedy policies and other constrained policy optimization algorithms both in learning from observations and in RL tasks.
APA, Harvard, Vancouver, ISO, and other styles
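
The core intuition, attenuating policy changes for actions that were rarely sampled, can be pictured with a count-dependent cap on how much an action's probability may grow in a single update. This is a loose illustration only, not the rerouting rule actually derived in the paper.

```python
import numpy as np

def attenuated_policy_update(pi_old, pi_target, counts, c=1.0):
    """Move pi_old toward pi_target while capping the growth of probabilities
    for rarely sampled actions (illustrative, not the exact RBI rule).
    `counts[a]` is how often action a appears in the batch."""
    freq = counts / (counts.sum() + 1e-8)
    max_ratio = 1.0 + c * np.sqrt(freq)          # rare actions get a tight cap
    pi_new = np.minimum(pi_target, pi_old * max_ratio)
    return pi_new / pi_new.sum()                 # renormalize to a distribution
```
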
2

Wang, Zhaorong, Meng Wang, Jingqi Zhang, Yingfeng Chen, and Chongjie Zhang. "Reward-Constrained Behavior Cloning." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/436.

Full text
Abstract:
Deep reinforcement learning (RL) has demonstrated success in challenging decision-making/control tasks. However, RL methods, which solve tasks through maximizing the expected reward, may generate undesirable behaviors due to inferior local convergence or incompetent reward design. These undesirable behaviors of agents may not reduce the total reward but destroy the user experience of the application. For example, in the autonomous driving task, the policy actuated by a speed reward performs many more sudden brakes, while human drivers generally do not do that. To overcome this problem, we present a novel method named Reward-Constrained Behavior Cloning (RCBC), which synthesizes imitation learning and constrained reinforcement learning. RCBC leverages human demonstrations to induce desirable or human-like behaviors and employs lower-bound reward constraints for policy optimization to maximize the expected reward. Empirical results on popular benchmark environments show that RCBC learns significantly more human-desired policies with performance guarantees that meet the lower-bound reward constraints, while performing better than or as well as baseline methods in terms of reward maximization.
APA, Harvard, Vancouver, ISO, and other styles
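
A generic way to combine imitation with a lower-bound reward constraint, given here as an illustrative Lagrangian sketch rather than the exact RCBC objective, is to add a multiplier-weighted penalty whenever the policy's estimated return falls below the required floor. `log_probs_demo`, `estimated_return`, and `reward_floor` are assumed quantities, not names from the paper.

```python
import torch

def rcbc_style_loss(log_probs_demo, estimated_return, reward_floor, lam):
    """Behavior-cloning loss plus a penalty on violating the reward floor:
    minimize  -E[log pi(a_demo|s_demo)] + lam * max(0, floor - J(pi))."""
    bc_loss = -log_probs_demo.mean()
    violation = torch.clamp(reward_floor - estimated_return, min=0.0)
    return bc_loss + lam * violation

def update_multiplier(lam, estimated_return, reward_floor, lr=1e-3):
    """Dual ascent on the violation; the max keeps the multiplier non-negative."""
    return max(0.0, lam + lr * (reward_floor - float(estimated_return)))
```
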
3

Peruvemba, Yasasvi V., Shubham Rai, Kapil Ahuja, and Akash Kumar. "RL-Guided Runtime-Constrained Heuristic Exploration for Logic Synthesis." In 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD). IEEE, 2021. http://dx.doi.org/10.1109/iccad51958.2021.9643530.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

HasanzadeZonuzy, Aria, Dileep Kalathil, and Srinivas Shakkottai. "Model-Based Reinforcement Learning for Infinite-Horizon Discounted Constrained Markov Decision Processes." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/347.

Full text
Abstract:
In many real-world reinforcement learning (RL) problems, in addition to maximizing the objective, the learning agent has to maintain some necessary safety constraints. We formulate the problem of learning a safe policy as an infinite-horizon discounted Constrained Markov Decision Process (CMDP) with an unknown transition probability matrix, where the safety requirements are modeled as constraints on expected cumulative costs. We propose two model-based constrained reinforcement learning (CRL) algorithms for learning a safe policy, namely, (i) the GM-CRL algorithm, where the algorithm has access to a generative model, and (ii) the UC-CRL algorithm, where the algorithm learns the model using an upper confidence style online exploration method. We characterize the sample complexity of these algorithms, i.e., the number of samples needed to ensure a desired level of accuracy with high probability, both with respect to objective maximization and constraint satisfaction.
APA, Harvard, Vancouver, ISO, and other styles
5

Liu, Yongshuai, Avishai Halev, and Xin Liu. "Policy Learning with Constraints in Model-free Reinforcement Learning: A Survey." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/614.

Full text
Abstract:
Reinforcement Learning (RL) algorithms have had tremendous success in simulated domains. These algorithms, however, often cannot be directly applied to physical systems, especially in cases where there are constraints to satisfy (e.g. to ensure safety or limit resource consumption). In standard RL, the agent is incentivized to explore any policy with the sole goal of maximizing reward; in the real world, however, ensuring satisfaction of certain constraints in the process is also necessary and essential. In this article, we overview existing approaches addressing constraints in model-free reinforcement learning. We model the problem of learning with constraints as a Constrained Markov Decision Process and consider two main types of constraints: cumulative and instantaneous. We summarize existing approaches and discuss their pros and cons. To evaluate policy performance under constraints, we introduce a set of standard benchmarks and metrics. We also summarize limitations of current methods and present open questions for future research.
APA, Harvard, Vancouver, ISO, and other styles
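
The two constraint classes the survey distinguishes can be written side by side; in both, c is a per-step cost signal and d a designer-chosen threshold:

```latex
\text{cumulative:}\quad \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d,
\qquad
\text{instantaneous:}\quad c(s_t, a_t) \le d \ \ \text{for every } t.
```

Cumulative constraints are typically handled with Lagrangian or constrained-policy-optimization methods, while instantaneous constraints usually call for action masking, safety layers, or shielding.
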
6

Alonso, Eloi, Maxim Peter, David Goumard, and Joshua Romoff. "Deep Reinforcement Learning for Navigation in AAA Video Games." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/294.

Full text
Abstract:
In video games, non-player characters (NPCs) are used to enhance the players' experience in a variety of ways, e.g., as enemies, allies, or innocent bystanders. A crucial component of NPCs is navigation, which allows them to move from one point to another on the map. The most popular approach for NPC navigation in the video game industry is to use a navigation mesh (NavMesh), which is a graph representation of the map, with nodes and edges indicating traversable areas. Unfortunately, complex navigation abilities that extend the character's capacity for movement, e.g., grappling hooks, jetpacks, teleportation, or double-jumps, increase the complexity of the NavMesh, making it intractable in many practical scenarios. Game designers are thus constrained to only add abilities that can be handled by a NavMesh. As an alternative to the NavMesh, we propose to use Deep Reinforcement Learning (Deep RL) to learn how to navigate 3D maps in video games using any navigation ability. We test our approach on complex 3D environments that are notably an order of magnitude larger than maps typically used in the Deep RL literature. One of these environments is from a recently released AAA video game called Hyper Scape. We find that our approach performs surprisingly well, achieving at least 90% success rate in a variety of scenarios using complex navigation abilities.
APA, Harvard, Vancouver, ISO, and other styles
7

Jiang, Di, Yuan Cao, and Qiang Yang. "On the Channel Pruning using Graph Convolution Network for Convolutional Neural Network Acceleration." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/431.

Full text
Abstract:
Network pruning is considered efficient for sparsification and acceleration of Convolutional Neural Network (CNN) based models that can be adopted in resource-constrained environments. Inspired by two popular pruning criteria, i.e. magnitude and similarity, this paper proposes a novel structural pruning method based on Graph Convolution Network (GCN) to further promote compression performance. The channel features are firstly extracted by Global Average Pooling (GAP) from a batch of samples, and a graph model for each layer is generated based on the similarity of features. A set of agents for individual CNN layers are implemented by GCN and utilized to synthesize comprehensive channel information and determine the pruning scheme for the overall CNN model. The training process of each agent is carried out using Reinforcement Learning (RL) to ensure their convergence and adaptability to various network architectures. The proposed solution is assessed based on a range of image classification datasets i.e., CIFAR and Tiny-ImageNet. The numerical results indicate that the proposed pruning method outperforms the pure magnitude-based or similarity-based pruning solutions and other SOTA methods (e.g., HRank and SCP). For example, the proposed method can prune VGG16 by removing 93% of the model parameters without any accuracy reduction in the CIFAR10 dataset.
APA, Harvard, Vancouver, ISO, and other styles
8

Back, Philipp. "Real-World Reinforcement Learning: Observations from Two Successful Cases." In Digital Support from Crisis to Progressive Change. University of Maribor Press, 2021. http://dx.doi.org/10.18690/978-961-286-485-9.20.

Full text
Abstract:
Reinforcement Learning (RL) is a machine learning technique that enables artificial agents to learn optimal strategies for sequential decision-making problems. RL has achieved superhuman performance in artificial domains, yet real-world applications remain rare. We explore the drivers of successful RL adoption for solving practical business problems. We rely on publicly available secondary data on two cases: data center cooling at Google and trade order execution at JPMorgan. We perform thematic analysis using a pre-defined coding framework based on the known challenges to real-world RL by Dulac-Arnold, Mankowitz, & Hester (2019). First, we find that RL works best when the problem dynamics can be simulated. Second, the ability to encode the desired agent behavior as a reward function is critical. Third, safety constraints are often necessary in the context of trial-and-error learning. Our work is amongst the first in Information Systems to discuss the practical business value of the emerging AI subfield of RL.
APA, Harvard, Vancouver, ISO, and other styles
9

Sohn, Sungryull, Yinlam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, Ed Chi, and Craig Boutilier. "BRPO: Batch Residual Policy Optimization." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/391.

Full text
Abstract:
In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at each state. This can cause batch RL to be overly conservative, unable to exploit large policy changes at frequently-visited, high-confidence states without risking poor performance at sparsely-visited states. To remedy this, we propose residual policies, where the allowable deviation of the learned policy is state-action-dependent. We derive a new RL method, BRPO, which learns both the policy and allowable deviation that jointly maximize a lower bound on policy performance. We show that BRPO achieves state-of-the-art performance in a number of tasks.
APA, Harvard, Vancouver, ISO, and other styles
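
The contrast the abstract draws, a fixed deviation budget versus a state-dependent one, can be pictured with the toy sketch below, where better-visited states earn a larger allowed departure from the behavior policy. The visit-count schedule and the total-variation mixing step are illustrative assumptions, not BRPO's learned residual policies.

```python
import numpy as np

def allowed_deviation(visit_count, base_eps=0.05, scale=1.0):
    """State-dependent deviation budget: frequently visited (high-confidence)
    states allow larger departures from the behavior policy."""
    return base_eps + scale * (1.0 - 1.0 / np.sqrt(1.0 + visit_count))

def constrain_to_behavior(pi_learned, pi_behavior, eps):
    """Mix the learned policy with the behavior policy so their total-variation
    distance is at most eps (simple projection sketch)."""
    tv = 0.5 * np.abs(pi_learned - pi_behavior).sum()
    if tv <= eps:
        return pi_learned
    w = eps / tv
    return w * pi_learned + (1.0 - w) * pi_behavior
```
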
10

Fedullo, Tommaso, Alberto Morato, Federico Tramarin, Paolo Ferrari, and Emiliano Sisinni. "Smart Measurement Systems Exploiting Adaptive LoRaWAN Under Power Consumption Constraints: a RL Approach." In 2022 IEEE International Workshop on Metrology for Industry 4.0 & IoT (MetroInd4.0&IoT). IEEE, 2022. http://dx.doi.org/10.1109/metroind4.0iot54413.2022.9831487.

Full text
APA, Harvard, Vancouver, ISO, and other styles
