Academic literature on the topic 'Constrained Reinforcement Learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Constrained Reinforcement Learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Constrained Reinforcement Learning"

1

Pankayaraj, Pathmanathan, and Pradeep Varakantham. "Constrained Reinforcement Learning in Hard Exploration Problems." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (2023): 15055–63. http://dx.doi.org/10.1609/aaai.v37i12.26757.

Full text
Abstract:
One approach to guaranteeing safety in Reinforcement Learning is through cost constraints that are dependent on the policy. Recent works in constrained RL have developed methods that ensure constraints are enforced even at learning time while maximizing the overall value of the policy. Unfortunately, as demonstrated in our experimental results, such approaches do not perform well on complex multi-level tasks, with longer episode lengths or sparse rewards. To that end, we propose a scalable hierarchical approach for constrained RL problems that employs backward cost value functions in the conte
APA, Harvard, Vancouver, ISO, and other styles
2

HasanzadeZonuzy, Aria, Archana Bura, Dileep Kalathil, and Srinivas Shakkottai. "Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (2021): 7667–74. http://dx.doi.org/10.1609/aaai.v35i9.16937.

Full text
Abstract:
Many physical systems have underlying safety considerations that require that the policy employed ensures the satisfaction of a set of constraints. The analytical formulation usually takes the form of a Constrained Markov Decision Process (CMDP). We focus on the case where the CMDP is unknown, and RL algorithms obtain samples to discover the model and compute an optimal constrained policy. Our goal is to characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy---both objective maximization and constraint satisfaction---in a
APA, Harvard, Vancouver, ISO, and other styles
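The formulation referenced in the abstract above is the standard Constrained Markov Decision Process. As a point of reference (a generic textbook form, not necessarily the exact setting analysed in the paper), the infinite-horizon discounted CMDP objective can be written as:

    \max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
    \quad \text{subject to} \quad
    \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\Big] \le d_i, \qquad i = 1, \dots, m,

where r is the reward, the c_i are cost signals encoding the safety requirements, and the d_i are the corresponding budgets.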
3

Dai, Juntao, Jiaming Ji, Long Yang, Qian Zheng, and Gang Pan. "Augmented Proximal Policy Optimization for Safe Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (2023): 7288–95. http://dx.doi.org/10.1609/aaai.v37i6.25888.

Full text
Abstract:
Safe reinforcement learning considers practical scenarios that maximize the return while satisfying safety constraints. Current algorithms, which suffer from training oscillations or approximation errors, still struggle to update the policy efficiently with precise constraint satisfaction. In this article, we propose Augmented Proximal Policy Optimization (APPO), which augments the Lagrangian function of the primal constrained problem via attaching a quadratic deviation term. The constructed multiplier-penalty function dampens cost oscillation for stable convergence while being equivalent to t
APA, Harvard, Vancouver, ISO, and other styles
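The "multiplier-penalty function" mentioned in the APPO abstract above is an augmented-Lagrangian construction. A generic single-constraint version (an illustrative form, not necessarily the authors' exact objective) attaches a quadratic deviation term to the ordinary Lagrangian:

    \mathcal{L}_{\mu}(\theta, \lambda) \;=\; J_r(\theta) \;-\; \lambda\,\big(J_c(\theta) - d\big) \;-\; \frac{\mu}{2}\,\big(J_c(\theta) - d\big)^{2},

where J_r and J_c are the expected return and expected cost under policy parameters θ, d is the cost limit, λ ≥ 0 is the multiplier, and μ > 0 weights the quadratic term that damps oscillation of the cost around the threshold; practical inequality-constrained variants typically clip the penalised violation at zero.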
4

Bhatia, Abhinav, Pradeep Varakantham, and Akshat Kumar. "Resource Constrained Deep Reinforcement Learning." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 610–20. http://dx.doi.org/10.1609/icaps.v29i1.3528.

Full text
Abstract:
In urban environments, resources have to be constantly matched to the “right” locations where customer demand is present. For instance, ambulances have to be matched to base stations regularly so as to reduce response time for emergency incidents in ERS (Emergency Response Systems); vehicles (cars, bikes among others) have to be matched to docking stations to reduce lost demand in shared mobility systems. Such problems are challenging owing to the demand uncertainty, combinatorial action spaces and constraints on allocation of resources (e.g., total resources, minimum and maximum number of res
APA, Harvard, Vancouver, ISO, and other styles
5

Yang, Qisong, Thiago D. Simão, Simon H. Tindemans, and Matthijs T. J. Spaan. "WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (2021): 10639–46. http://dx.doi.org/10.1609/aaai.v35i12.17272.

Full text
Abstract:
Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without considering the tail of the distribution. For instance, in safety-critical domains, worst-case analysis is required to avoid disastrous results. We present a novel reinforcement learning algorithm called Worst-Case Soft Actor Critic, which extends the Soft Act
APA, Harvard, Vancouver, ISO, and other styles
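WCSAC constrains a worst-case (tail) measure of the cost distribution rather than its mean. As a minimal, hypothetical illustration of that idea (not the authors' implementation), the Conditional Value-at-Risk of sampled episode costs can be estimated as follows:

    import numpy as np

    def cvar(costs, alpha=0.1):
        # Mean of the worst alpha-fraction of sampled episode costs
        # (smaller alpha = more pessimistic risk level).
        costs = np.sort(np.asarray(costs, dtype=float))
        var = np.quantile(costs, 1.0 - alpha)   # Value-at-Risk threshold
        tail = costs[costs >= var]              # worst alpha-fraction of episodes
        return tail.mean()

A worst-case-constrained learner would then require cvar(costs, alpha) <= d instead of costs.mean() <= d.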
6

Zhou, Zixian, Mengda Huang, Feiyang Pan, et al. "Gradient-Adaptive Pareto Optimization for Constrained Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (2023): 11443–51. http://dx.doi.org/10.1609/aaai.v37i9.26353.

Full text
Abstract:
Constrained Reinforcement Learning (CRL), which pursues maximizing long-term returns while constraining costs, has attracted broad interest in recent years. Although CRL can be cast as a multi-objective optimization problem, it still faces the key challenge that gradient-based Pareto optimization methods tend to stick to known Pareto-optimal solutions even when they yield poor returns (e.g., the safest self-driving car that never moves) or violate the constraints (e.g., the record-breaking racer that crashes the car). In this paper, we propose Gradient-adaptive Constrained Policy Optimization (G
APA, Harvard, Vancouver, ISO, and other styles
7

He, Tairan, Weiye Zhao, and Changliu Liu. "AutoCost: Evolving Intrinsic Cost for Zero-Violation Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (2023): 14847–55. http://dx.doi.org/10.1609/aaai.v37i12.26734.

Full text
Abstract:
Safety is a critical hurdle that limits the application of deep reinforcement learning to real-world control tasks. To this end, constrained reinforcement learning leverages cost functions to improve safety in constrained Markov decision process. However, constrained methods fail to achieve zero violation even when the cost limit is zero. This paper analyzes the reason for such failure, which suggests that a proper cost function plays an important role in constrained RL. Inspired by the analysis, we propose AutoCost, a simple yet effective framework that automatically searches for cost functio
APA, Harvard, Vancouver, ISO, and other styles
8

Yang, Zhaoxing, Haiming Jin, Rong Ding, et al. "DeCOM: Decomposed Policy for Constrained Cooperative Multi-Agent Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (2023): 10861–70. http://dx.doi.org/10.1609/aaai.v37i9.26288.

Full text
Abstract:
In recent years, multi-agent reinforcement learning (MARL) has presented impressive performance in various applications. However, physical limitations, budget restrictions, and many other factors usually impose constraints on a multi-agent system (MAS), which cannot be handled by traditional MARL frameworks. Specifically, this paper focuses on constrained MASes where agents work cooperatively to maximize the expected team-average return under various constraints on expected team-average costs, and develops a constrained cooperative MARL framework, named DeCOM, for such MASes. In particular, De
APA, Harvard, Vancouver, ISO, and other styles
9

Martins, Miguel S. E., Joaquim L. Viegas, Tiago Coito, et al. "Reinforcement Learning for Dual-Resource Constrained Scheduling." IFAC-PapersOnLine 53, no. 2 (2020): 10810–15. http://dx.doi.org/10.1016/j.ifacol.2020.12.2866.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Guenter, Florent, Micha Hersch, Sylvain Calinon, and Aude Billard. "Reinforcement learning for imitating constrained reaching movements." Advanced Robotics 21, no. 13 (2007): 1521–44. http://dx.doi.org/10.1163/156855307782148550.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Constrained Reinforcement Learning"

1

Chung, Jen Jen. "Learning to soar: exploration strategies in reinforcement learning for resource-constrained missions." Thesis, The University of Sydney, 2014. http://hdl.handle.net/2123/11733.

Full text
Abstract:
An unpowered aerial glider learning to soar in a wind field presents a new manifestation of the exploration-exploitation trade-off. This thesis proposes a directed, adaptive and nonmyopic exploration strategy in a temporal difference reinforcement learning framework for tackling the resource-constrained exploration-exploitation task of this autonomous soaring problem. The complete learning algorithm is developed in a SARSA(λ) framework, which uses a Gaussian process with a squared exponential covariance function to approximate the value function. The three key contributions of this thesis for
APA, Harvard, Vancouver, ISO, and other styles
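The thesis abstract above builds its learner on SARSA(λ) with a Gaussian-process value approximation. For readers unfamiliar with the base algorithm, a minimal tabular SARSA(λ) update with accumulating eligibility traces (a hypothetical sketch; the thesis itself uses function approximation rather than a table) looks like this:

    import numpy as np

    def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next,
                            alpha=0.1, gamma=0.99, lam=0.9):
        # One on-policy TD update applied to every state-action pair
        # in proportion to its eligibility trace.
        delta = r + gamma * Q[s_next, a_next] - Q[s, a]
        E[s, a] += 1.0          # accumulate trace for the visited pair
        Q += alpha * delta * E  # spread the TD error along the traces
        E *= gamma * lam        # decay all traces
        return Q, E

Here Q and E are (n_states, n_actions) arrays holding action values and eligibility traces.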
2

Araújo, Anderson Viçoso de. "ERG-ARCH: a reinforcement learning architecture for propositionally constrained multi-agent state spaces." Instituto Tecnológico de Aeronáutica, 2014. http://www.bd.bibl.ita.br/tde_busca/arquivo.php?codArquivo=3096.

Full text
Abstract:
The main goal of this work is to present an approach that finds an appropriate set of sequential actions for a group of cooperative agents interacting over a constrained environment. This search is considered a complex task for autonomous agents, and it is not possible to use default reinforcement learning algorithms to learn an adequate policy. In this thesis, a technique that deals with propositionally constrained state spaces and makes use of a Reinforcement Learning algorithm based on a Markov Decision Process is proposed. A new model is also presented which formally defines this restricted searc
APA, Harvard, Vancouver, ISO, and other styles
3

Pavesi, Alessandro. "Design and implementation of a Reinforcement Learning framework for iOS devices." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022. http://amslaurea.unibo.it/25811/.

Full text
Abstract:
Reinforcement Learning is an increasingly popular area of Artificial Intelligence. The applications of this learning paradigm are many, but its application in mobile computing is in its infancy. This study aims to provide an overview of current Reinforcement Learning applications on mobile devices, as well as to introduce a new framework for iOS devices: Swift-RL Lib. This new Swift package allows developers to easily support and integrate two of the most common RL algorithms, Q-Learning and Deep Q-Network, in a fully customizable environment. All processes are performed on the device, without
APA, Harvard, Vancouver, ISO, and other styles
4

Watanabe, Takashi. "Regret analysis of constrained irreducible MDPs with reset action." Kyoto University, 2020. http://hdl.handle.net/2433/253371.

Full text
Abstract:
Kyoto University (京都大学). Doctoral dissertation (new-system course doctorate), Doctor of Human and Environmental Studies, degree no. 甲第22535号 / 人博第938号, Graduate School of Human and Environmental Studies, 2019. Examining committee: Associate Professor 櫻川 貴司 (chief examiner), Professor 立木 秀樹, Professor 日置 尋久. Conferred under Article 4, Paragraph 1 of the Degree Regulations.
APA, Harvard, Vancouver, ISO, and other styles
5

Allmendinger, Richard. "Tuning evolutionary search for closed-loop optimization." Thesis, University of Manchester, 2012. https://www.research.manchester.ac.uk/portal/en/theses/tuning-evolutionary-search-for-closedloop-optimization(d54e63e2-7927-42aa-b974-c41e717298cb).html.

Full text
Abstract:
Closed-loop optimization deals with problems in which candidate solutions are evaluated by conducting experiments, e.g. physical or biochemical experiments. Although this form of optimization is becoming more popular across the sciences, it may be subject to rather unexplored resourcing issues, as any experiment may require resources in order to be conducted. In this thesis we are concerned with understanding how evolutionary search is affected by three particular resourcing issues -- ephemeral resource constraints (ERCs), changes of variables, and lethal environments -- and the development of
APA, Harvard, Vancouver, ISO, and other styles
6

Irani, Arya John. "Utilizing negative policy information to accelerate reinforcement learning." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53481.

Full text
Abstract:
A pilot study by Subramanian et al. on Markov decision problem task decomposition by humans revealed that participants break down tasks into both short-term subgoals with a defined end-condition (such as "go to food") and long-term considerations and invariants with no end-condition (such as "avoid predators"). In the context of Markov decision problems, behaviors having clear start and end conditions are well-modeled by an abstraction known as options, but no abstraction exists in the literature for continuous constraints imposed on the agent's behavior. We propose two representations to
APA, Harvard, Vancouver, ISO, and other styles
7

Acevedo Valle, Juan Manuel. "Sensorimotor exploration: constraint awareness and social reinforcement in early vocal development." Doctoral thesis, Universitat Politècnica de Catalunya, 2018. http://hdl.handle.net/10803/667500.

Full text
Abstract:
This research is motivated by the benefits that knowledge regarding early development in infants may provide to different fields of science. In particular, early sensorimotor exploration behaviors are studied in the framework of developmental robotics. The main objective is about understanding the role of motor constraint awareness and imitative behaviors during sensorimotor exploration. Particular emphasis is placed on prelinguistic vocal development because during this stage infants start to master the motor systems that will later allow them to pronounce their first words. Previous works
APA, Harvard, Vancouver, ISO, and other styles
8

Racey, Deborah Elaine. "Effects of Response Frequency Constraints on Learning in a Non-Stationary Multi-Armed Bandit Task." OpenSIUC, 2009. https://opensiuc.lib.siu.edu/dissertations/86.

Full text
Abstract:
An n-armed bandit task was used to investigate the trade-off between exploratory (choosing lesser-known options) and exploitive (choosing options with the greatest probability of reinforcement) human choice in a trial-and-error learning problem. In Experiment 1 a different probability of reinforcement was assigned to each of 8 response options using random-ratios (RRs), and participants chose by clicking buttons in a circular display on a computer screen using a computer mouse. Relative frequency thresholds (ranging from .10 to 1.0) were randomly assigned to each participant and acted as task
APA, Harvard, Vancouver, ISO, and other styles
9

Cline, Tammy Lynn. "Effects of Training Accurate Component Strokes Using Response Constraint and Self-evaluation on Whole Letter Writing." Thesis, University of North Texas, 2006. https://digital.library.unt.edu/ark:/67531/metadc5472/.

Full text
Abstract:
This study analyzed the effects of a training package containing response constraint, self-evaluation, reinforcement, and a fading procedure on written letter components and whole letter writing in four elementary school participants. The effect on accuracy of written components was evaluated using a multiple-baseline-across components and a continuous probe design of components, as well as pre-test, baseline, and post-test measures. The results of this study show that response constraint and self-evaluation quickly improved students' performance in writing components. Fading of the interventi
APA, Harvard, Vancouver, ISO, and other styles
10

Hester, Todd. "Texplore : temporal difference reinforcement learning for robots and time-constrained domains." Thesis, 2012. http://hdl.handle.net/2152/ETD-UT-2012-12-6763.

Full text
Abstract:
Robots have the potential to solve many problems in society, because of their ability to work in dangerous places doing necessary jobs that no one wants or is able to do. One barrier to their widespread deployment is that they are mainly limited to tasks where it is possible to hand-program behaviors for every situation that may be encountered. For robots to meet their potential, they need methods that enable them to learn and adapt to novel situations that they were not programmed for. Reinforcement learning (RL) is a paradigm for learning sequential decision making processes and could solve
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Constrained Reinforcement Learning"

1

Hester, Todd. TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains. Springer International Publishing, 2013. http://dx.doi.org/10.1007/978-3-319-01168-4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Hester, Todd. TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains. Springer International Publishing, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Hester, Todd. TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains. Springer, 2013.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Hester, Todd. TEXPLORE: Temporal Difference Reinforcement Learning for Robots and Time-Constrained Domains. Springer, 2016.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Constrained Reinforcement Learning"

1

Junges, Sebastian, Nils Jansen, Christian Dehnert, Ufuk Topcu, and Joost-Pieter Katoen. "Safety-Constrained Reinforcement Learning for MDPs." In Tools and Algorithms for the Construction and Analysis of Systems. Springer Berlin Heidelberg, 2016. http://dx.doi.org/10.1007/978-3-662-49674-9_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Huiwei, Huaqing Li, and Bo Zhou. "Reinforcement Learning for Constrained Games with Incomplete Information." In Distributed Optimization, Game and Learning Algorithms. Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-33-4528-7_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Ling, Jiajing, Arambam James Singh, Nguyen Duc Thien, and Akshat Kumar. "Constrained Multiagent Reinforcement Learning for Large Agent Population." In Machine Learning and Knowledge Discovery in Databases. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26412-2_12.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Winkel, David, Niklas Strauß, Matthias Schubert, Yunpu Ma, and Thomas Seidl. "Constrained Portfolio Management Using Action Space Decomposition for Reinforcement Learning." In Advances in Knowledge Discovery and Data Mining. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-33377-4_29.

Full text
Abstract:
Financial portfolio managers typically face multi-period optimization tasks such as short-selling or investing at least a particular portion of the portfolio in a specific industry sector. A common approach to tackle these problems is to use constrained Markov decision process (CMDP) methods, which may suffer from sample inefficiency, hyperparameter tuning, and lack of guarantees for constraint violations. In this paper, we propose Action Space Decomposition Based Optimization (ADBO) for optimizing a more straightforward surrogate task that allows actions to be mapped back to the original task. We examine our method on two real-world data portfolio construction tasks. The results show that our new approach consistently outperforms state-of-the-art benchmark approaches for general CMDPs.
APA, Harvard, Vancouver, ISO, and other styles
5

Li, Wei, and Waleed Meleis. "Adaptive Adjacency Kanerva Coding for Memory-Constrained Reinforcement Learning." In Machine Learning and Data Mining in Pattern Recognition. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-96136-1_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Hasanbeig, Mohammadhosein, Daniel Kroening, and Alessandro Abate. "LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning." In Quantitative Evaluation of Systems. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-16336-4_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Ferrari, Silvia, Keith Rudd, and Gianluca Di Muro. "A Constrained Backpropagation Approach to Function Approximation and Approximate Dynamic Programming." In Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. John Wiley & Sons, Inc., 2013. http://dx.doi.org/10.1002/9781118453988.ch8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Sharif, Muddsair, Charitha Buddhika Heendeniya, and Gero Lückemeyer. "ARaaS: Context-Aware Optimal Charging Distribution Using Deep Reinforcement Learning." In iCity. Transformative Research for the Livable, Intelligent, and Sustainable City. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-92096-8_12.

Full text
Abstract:
Electromobility has profound economic and ecological impacts on human society. Much of the mobility sector’s transformation is catalyzed by digitalization, enabling many stakeholders, such as vehicle users and infrastructure owners, to interact with each other in real time. This article presents a new concept based on deep reinforcement learning to optimize agent interactions and decision-making in a smart mobility ecosystem. The algorithm performs context-aware, constrained optimization that fulfills on-demand requests from each agent. The algorithm can learn from the surrounding environment until the agent interactions reach an optimal equilibrium point in a given context. The methodology implements an automatic template-based approach via a continuous integration and delivery (CI/CD) framework using a GitLab runner and transfers highly computationally intensive tasks over a high-performance computing cluster automatically without manual intervention.
APA, Harvard, Vancouver, ISO, and other styles
9

Jędrzejowicz, Piotr, and Ewa Ratajczak-Ropel. "Reinforcement Learning Strategy for A-Team Solving the Resource-Constrained Project Scheduling Problem." In Computational Collective Intelligence. Technologies and Applications. Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40495-5_46.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Dutta, Hrishikesh, Amit Kumar Bhuyan, and Subir Biswas. "Reinforcement Learning for Protocol Synthesis in Resource-Constrained Wireless Sensor and IoT Networks." In Ubiquitous Networking. Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-29419-8_14.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Constrained Reinforcement Learning"

1

Hu, Chengpeng, Jiyuan Pei, Jialin Liu, and Xin Yao. "Evolving Constrained Reinforcement Learning Policy." In 2023 International Joint Conference on Neural Networks (IJCNN). IEEE, 2023. http://dx.doi.org/10.1109/ijcnn54540.2023.10191982.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Zhang, Linrui, Li Shen, Long Yang, et al. "Penalized Proximal Policy Optimization for Safe Reinforcement Learning." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/520.

Full text
Abstract:
Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle for efficient policy updates with hard constraint satisfaction. In this paper, we propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem. Specifically, P3O utilizes a simple yet effective penalty approach to eliminate cost constraints and removes the trust-region constraint by the clipped s
APA, Harvard, Vancouver, ISO, and other styles
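The P3O abstract above describes replacing the cost constraint with a penalty inside a clipped surrogate objective. A hypothetical sketch in that spirit (illustrative only, not the authors' code; the function name and the penalty coefficient kappa are assumptions) could look like:

    import torch

    def penalized_clipped_loss(ratio, adv_r, adv_c, ep_cost, cost_limit,
                               clip=0.2, kappa=20.0):
        # PPO-style clipped surrogate for the reward advantage.
        surr_r = torch.min(ratio * adv_r,
                           torch.clamp(ratio, 1 - clip, 1 + clip) * adv_r).mean()
        # First-order estimate of the new policy's cost, penalised only
        # when it exceeds the cost limit (exact-penalty / ReLU form).
        surr_c = (ratio * adv_c).mean()
        violation = torch.relu(ep_cost + surr_c - cost_limit)
        return -(surr_r - kappa * violation)  # loss to minimise

Here ratio is the likelihood ratio of the new to the old policy, adv_r and adv_c are reward and cost advantages, and ep_cost is the measured episodic cost of the current policy.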
3

HasanzadeZonuzy, Aria, Dileep Kalathil, and Srinivas Shakkottai. "Model-Based Reinforcement Learning for Infinite-Horizon Discounted Constrained Markov Decision Processes." In Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21). International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/347.

Full text
Abstract:
In many real-world reinforcement learning (RL) problems, in addition to maximizing the objective, the learning agent has to maintain some necessary safety constraints. We formulate the problem of learning a safe policy as an infinite-horizon discounted Constrained Markov Decision Process (CMDP) with an unknown transition probability matrix, where the safety requirements are modeled as constraints on expected cumulative costs. We propose two model-based constrained reinforcement learning (CRL) algorithms for learning a safe policy, namely, (i) GM-CRL algorithm, where the algorithm has access to
APA, Harvard, Vancouver, ISO, and other styles
4

Skalse, Joar, Lewis Hammond, Charlie Griffin, and Alessandro Abate. "Lexicographic Multi-Objective Reinforcement Learning." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/476.

Full text
Abstract:
In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward signal, and subject to this constraint also maximises the second reward signal, and so on. We present a family of both action-value and policy gradient algorithms that can be used to solve such problems, and prove that they converge to policies that are lexicographically optimal. We evaluate the scalability and performance of these algorithms empirically, a
APA, Harvard, Vancouver, ISO, and other styles
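The lexicographic criterion described above orders the objectives by priority: later objectives only matter among actions that are (near-)optimal for earlier ones. A minimal illustrative action-selection rule in that spirit (not the paper's action-value or policy-gradient algorithms; the slack tolerance is an assumption) is:

    import numpy as np

    def lexicographic_greedy(q_values, slack=1e-3):
        # q_values: array of shape (n_objectives, n_actions),
        # ordered from highest to lowest priority.
        candidates = np.arange(q_values.shape[1])
        for q in q_values:
            best = q[candidates].max()
            # keep only actions near-optimal for this objective
            candidates = candidates[q[candidates] >= best - slack]
            if len(candidates) == 1:
                break
        return int(candidates[0])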
5

Zhao, Weiye, Tairan He, Rui Chen, Tianhao Wei, and Changliu Liu. "State-wise Safe Reinforcement Learning: A Survey." In Thirty-Second International Joint Conference on Artificial Intelligence (IJCAI-23). International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/763.

Full text
Abstract:
Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, in other words, constraint satisfaction. State-wise constraints are one of the most common constraints in real-world applications and one of the most challenging constraints in Safe RL. Enforcing state-wise constraints is necessary and essential to many challenging tasks such as autonomous driving and robot manipulation. This paper provides a comprehensive review of existing approaches that address stat
APA, Harvard, Vancouver, ISO, and other styles
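The distinction the survey draws can be summarised, in one common (though not the only) notation, by contrasting the usual cumulative-expectation constraint with a state-wise one:

    \text{cumulative:}\;\; \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t} c(s_t, a_t)\Big] \le d
    \qquad\text{vs.}\qquad
    \text{state-wise:}\;\; c(s_t, a_t) \le w \;\;\text{for all } t \;(\text{with probability } 1),

so a state-wise constraint must hold at every step of every trajectory rather than only on average.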
6

Sarafian, Elad, Aviv Tamar, and Sarit Kraus. "Constrained Policy Improvement for Efficient Reinforcement Learning." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/396.

Full text
Abstract:
We propose a policy improvement algorithm for Reinforcement Learning (RL) termed Rerouted Behavior Improvement (RBI). RBI is designed to take into account the evaluation errors of the Q-function. Such errors are common in RL when learning the Q-value from finite experience data. Greedy policies or even constrained policy optimization algorithms that ignore these errors may suffer from an improvement penalty (i.e., a policy impairment). To reduce the penalty, the idea of RBI is to attenuate rapid policy changes to actions that were rarely sampled. This approach is shown to avoid catastrophic pe
APA, Harvard, Vancouver, ISO, and other styles
7

Lee, Jongmin, Youngsoo Jang, Pascal Poupart, and Kee-Eung Kim. "Constrained Bayesian Reinforcement Learning via Approximate Linear Programming." In Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/290.

Full text
Abstract:
In this paper, we consider the safe learning scenario where we need to restrict the exploratory behavior of a reinforcement learning agent. Specifically, we treat the problem as a form of Bayesian reinforcement learning in an environment that is modeled as a constrained MDP (CMDP) where the cost function penalizes undesirable situations. We propose a model-based Bayesian reinforcement learning (BRL) algorithm for such an environment, eliciting risk-sensitive exploration in a principled way. Our algorithm efficiently solves the constrained BRL problem by approximate linear programming, and gene
APA, Harvard, Vancouver, ISO, and other styles
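Approximate-linear-programming approaches such as the one above start from the classical exact LP for a discounted CMDP over occupancy measures (a standard formulation, written here for a single cost constraint, not the paper's belief-space version):

    \max_{\rho \ge 0} \;\; \sum_{s,a} \rho(s,a)\, r(s,a)
    \quad \text{subject to} \quad
    \sum_{a'} \rho(s',a') \;=\; \mu_0(s') + \gamma \sum_{s,a} P(s' \mid s, a)\, \rho(s,a) \;\;\; \forall s',
    \qquad
    \sum_{s,a} \rho(s,a)\, c(s,a) \le d,

where ρ is the discounted state-action occupancy measure, μ_0 the initial-state distribution, and d the cost budget; approximate versions restrict ρ (or the dual value function) to a tractable parametric class.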
8

Chen, Weiqin, Dharmashankar Subramanian, and Santiago Paternain. "Policy Gradients for Probabilistic Constrained Reinforcement Learning." In 2023 57th Annual Conference on Information Sciences and Systems (CISS). IEEE, 2023. http://dx.doi.org/10.1109/ciss56502.2023.10089763.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Abe, Naoki, Melissa Kowalczyk, Mark Domick, et al. "Optimizing debt collections using constrained reinforcement learning." In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 2010. http://dx.doi.org/10.1145/1835804.1835817.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Malik, Shehryar, Muhammad Umair Haider, Omer Iqbal, and Murtaza Taj. "Neural Network Pruning Through Constrained Reinforcement Learning." In 2022 26th International Conference on Pattern Recognition (ICPR). IEEE, 2022. http://dx.doi.org/10.1109/icpr56361.2022.9956050.

Full text
APA, Harvard, Vancouver, ISO, and other styles