Journal articles on the topic 'Safe Reinforcement Learning'

Consult the top 50 journal articles for your research on the topic 'Safe Reinforcement Learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Horie, Naoto, Tohgoroh Matsui, Koichi Moriyama, Atsuko Mutoh, and Nobuhiro Inuzuka. "Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning." Artificial Life and Robotics 24, no. 3 (February 8, 2019): 352–59. http://dx.doi.org/10.1007/s10015-019-00523-3.

2

Yang, Yongliang, Kyriakos G. Vamvoudakis, and Hamidreza Modares. "Safe reinforcement learning for dynamical games." International Journal of Robust and Nonlinear Control 30, no. 9 (March 25, 2020): 3706–26. http://dx.doi.org/10.1002/rnc.4962.

3

Xu, Haoran, Xianyuan Zhan, and Xiangyu Zhu. "Constraints Penalized Q-learning for Safe Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8753–60. http://dx.doi.org/10.1609/aaai.v36i8.20855.

Abstract:
We study the problem of safe offline reinforcement learning (RL), in which the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints, given only offline data and no further interaction with the environment. This setting is appealing for real-world RL applications in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in the offline setting, because a potentially large discrepancy between the policy distribution and the data distribution causes errors in estimating the value of the safety constraints. We show that naïve approaches that combine techniques from safe RL and offline RL learn only sub-optimal solutions. We therefore develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach learns robustly across a variety of benchmark control tasks, outperforming several baselines.
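To make the core idea concrete, the sketch below shows one generic way a Q-learning backup can penalize successor actions whose estimated safety cost is too high; it illustrates the concept only, and the cost critic, threshold, and penalty value are assumptions rather than the authors' CPQ update.

```python
import numpy as np

def penalized_q_target(reward, next_q, next_cost_q, cost_limit,
                       gamma=0.99, penalty=100.0):
    """Bellman target that penalizes successor actions whose estimated safety
    cost exceeds a limit (an illustrative sketch, not the exact CPQ rule)."""
    unsafe = (np.asarray(next_cost_q) > cost_limit).astype(float)
    # Unsafe (or out-of-distribution) next actions contribute a penalized value,
    # steering the learned policy away from constraint-violating regions.
    safe_next_value = (1.0 - unsafe) * np.asarray(next_q) - unsafe * penalty
    return np.asarray(reward) + gamma * safe_next_value
```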
4

García, Javier, and Fernando Fernández. "Probabilistic Policy Reuse for Safe Reinforcement Learning." ACM Transactions on Autonomous and Adaptive Systems 13, no. 3 (March 28, 2019): 1–24. http://dx.doi.org/10.1145/3310090.

5

Mannucci, Tommaso, Erik-Jan van Kampen, Cornelis de Visser, and Qiping Chu. "Safe Exploration Algorithms for Reinforcement Learning Controllers." IEEE Transactions on Neural Networks and Learning Systems 29, no. 4 (April 2018): 1069–81. http://dx.doi.org/10.1109/tnnls.2017.2654539.

6

Karthikeyan, P., Wei-Lun Chen, and Pao-Ann Hsiung. "Autonomous Intersection Management by Using Reinforcement Learning." Algorithms 15, no. 9 (September 13, 2022): 326. http://dx.doi.org/10.3390/a15090326.

Abstract:
Developing a safer and more effective intersection-control system is essential given the trends of rising populations and vehicle numbers. Additionally, as vehicle communication and self-driving technologies evolve, a more intelligent control system could reduce traffic accidents. We propose deep-reinforcement-learning-inspired autonomous intersection management (DRLAIM) to improve the efficiency and safety of the traffic environment. The methodology comprises three primary models: priority assignment, intersection-control model learning, and brake-safe control. The brake-safe control module ensures that each vehicle travels safely, and we train the system with reinforcement learning to obtain an effective control model. We simulated the proposed method using the Simulation of Urban MObility (SUMO) tool. Experimental results show that our approach outperforms the traditional method.
7

Mazouchi, Majid, Subramanya Nageshrao, and Hamidreza Modares. "Conflict-Aware Safe Reinforcement Learning: A Meta-Cognitive Learning Framework." IEEE/CAA Journal of Automatica Sinica 9, no. 3 (March 2022): 466–81. http://dx.doi.org/10.1109/jas.2021.1004353.

8

Cowen-Rivers, Alexander I., Daniel Palenicek, Vincent Moens, Mohammed Amin Abdullah, Aivar Sootla, Jun Wang, and Haitham Bou-Ammar. "SAMBA: safe model-based & active reinforcement learning." Machine Learning 111, no. 1 (January 2022): 173–203. http://dx.doi.org/10.1007/s10994-021-06103-6.

9

Serrano-Cuevas, Jonathan, Eduardo F. Morales, and Pablo Hernández-Leal. "Safe reinforcement learning using risk mapping by similarity." Adaptive Behavior 28, no. 4 (July 18, 2019): 213–24. http://dx.doi.org/10.1177/1059712319859650.

Abstract:
Reinforcement learning (RL) has been used to successfully solve sequential decision problems. However, accounting for risk during the learning process remains an open research problem. In this work, we are interested in the type of risk that can lead to a catastrophic state. Related works that aim to deal with risk propose complex models. In contrast, we follow a simple yet effective idea: similar states might lead to similar risk. Using this idea, we propose risk mapping by similarity (RMS), an algorithm for discrete scenarios that infers the risk of newly discovered states by analyzing how similar they are to previously known risky states. In general terms, the RMS algorithm transfers the knowledge the agent has gathered about risk to newly discovered states. We contribute a new approach to considering risk based on similarity, and RMS, which is simple and generalizable as long as the premise that similar states yield similar risk holds. RMS is not an RL algorithm but a method for generating a risk-aware reward-shaping signal that can be used with an RL algorithm to produce risk-aware policies.
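A minimal sketch of this similarity idea is shown below; the Gaussian similarity kernel, bandwidth, and shaping coefficient are illustrative assumptions rather than the RMS definitions from the paper.

```python
import numpy as np

def estimated_risk(state, known_risky_states, bandwidth=1.0):
    """Infer the risk of a newly discovered state from its similarity to
    previously known risky states (kernel-based illustration)."""
    if len(known_risky_states) == 0:
        return 0.0
    dists = np.linalg.norm(np.asarray(known_risky_states) - np.asarray(state), axis=1)
    # Closer to a known risky state -> higher inferred risk, bounded in [0, 1].
    return float(np.max(np.exp(-(dists ** 2) / (2.0 * bandwidth ** 2))))

def shaped_reward(env_reward, state, known_risky_states, risk_weight=1.0):
    """Risk-aware reward-shaping signal that any RL algorithm can consume."""
    return env_reward - risk_weight * estimated_risk(state, known_risky_states)
```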
10

Andersen, Per-Arne, Morten Goodwin, and Ole-Christoffer Granmo. "Towards safe reinforcement-learning in industrial grid-warehousing." Information Sciences 537 (October 2020): 467–84. http://dx.doi.org/10.1016/j.ins.2020.06.010.

11

Carr, Steven, Nils Jansen, Sebastian Junges, and Ufuk Topcu. "Safe Reinforcement Learning via Shielding under Partial Observability." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 14748–56. http://dx.doi.org/10.1609/aaai.v37i12.26723.

Abstract:
Safe exploration is a common problem in reinforcement learning (RL) that aims to prevent agents from making disastrous decisions while exploring their environment. A family of approaches to this problem assumes domain knowledge in the form of a (partial) model of the environment to decide upon the safety of an action. A so-called shield forces the RL agent to select only safe actions. However, for adoption in various applications, one must look beyond enforcing safety and also ensure the applicability of RL with good performance. We extend the applicability of shields via tight integration with state-of-the-art deep RL, and provide an extensive, empirical study in challenging, sparse-reward environments under partial observability. We show that a carefully integrated shield ensures safety and can improve the convergence rate and final performance of RL agents. We furthermore show that a shield can be used to bootstrap state-of-the-art RL agents: they remain safe after initial learning in a shielded setting, allowing us to eventually disable a potentially over-conservative shield.
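The basic shielding mechanism can be pictured as a filter between the agent and the environment, as in the generic sketch below; the safety oracle `safe_actions_fn` is an assumed stand-in for the model-based shield, and the paper's construction under partial observability is more involved.

```python
import random

def shielded_action(proposed_action, state, safe_actions_fn):
    """Pass the agent's action through only if the shield deems it safe;
    otherwise substitute a shield-approved action (generic illustration)."""
    safe_actions = list(safe_actions_fn(state))  # assumed non-empty safe set
    if proposed_action in safe_actions:
        return proposed_action
    # Override the unsafe choice with any action the shield certifies as safe.
    return random.choice(safe_actions)
```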
12

Dai, Juntao, Jiaming Ji, Long Yang, Qian Zheng, and Gang Pan. "Augmented Proximal Policy Optimization for Safe Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 7288–95. http://dx.doi.org/10.1609/aaai.v37i6.25888.

Abstract:
Safe reinforcement learning considers practical scenarios that maximize the return while satisfying safety constraints. Current algorithms, which suffer from training oscillations or approximation errors, still struggle to update the policy efficiently with precise constraint satisfaction. In this article, we propose Augmented Proximal Policy Optimization (APPO), which augments the Lagrangian function of the primal constrained problem by attaching a quadratic deviation term. The constructed multiplier-penalty function dampens cost oscillation for stable convergence while remaining equivalent to the primal constrained problem, allowing safety costs to be controlled precisely. APPO alternately updates the policy and the Lagrangian multiplier by solving the constructed augmented primal-dual problem, which can be implemented easily with any first-order optimizer. We apply APPO to diverse safety-constrained tasks, setting a new state of the art compared with a comprehensive list of safe RL baselines. Extensive experiments verify the merits of our method in easy implementation, stable convergence, and precise cost control.
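For a single expected-cost constraint J_C(π) ≤ d, a standard augmented Lagrangian with a quadratic deviation term has the form below (a generic construction for orientation; APPO's exact multiplier-penalty function may differ):

```latex
\mathcal{L}_{\rho}(\pi,\lambda)
  = -J_R(\pi) + \lambda\bigl(J_C(\pi) - d\bigr)
  + \frac{\rho}{2}\bigl(J_C(\pi) - d\bigr)^{2},
\qquad
\lambda \leftarrow \max\bigl(0,\; \lambda + \rho\bigl(J_C(\pi) - d\bigr)\bigr)
```

Here the policy step minimizes \mathcal{L}_{\rho} for a fixed multiplier and the multiplier step applies the displayed update, which is the kind of alternating primal-dual scheme the abstract refers to.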
13

Marchesini, Enrico, Davide Corsi, and Alessandro Farinelli. "Exploring Safer Behaviors for Deep Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7701–9. http://dx.doi.org/10.1609/aaai.v36i7.20737.

Abstract:
We consider Reinforcement Learning (RL) problems where an agent attempts to maximize a reward signal while minimizing a cost function that models unsafe behaviors. Such a formalization is addressed in the literature using constrained optimization on the cost, which limits exploration and leads to a significant trade-off between cost and reward. In contrast, we propose a Safety-Oriented Search that complements Deep RL algorithms to bias the policy toward safety within an evolutionary cost optimization. We leverage the exploration benefits of evolutionary methods to design a novel concept of safe mutations that use visited unsafe states to explore safer actions. We further characterize the behaviors of the policies over desired specifications with a sample-based bound estimation, which makes prior verification analysis tractable in the training loop and hence drives the learning process toward safer regions of the policy space. Empirical evidence on the Safety Gym benchmark shows that we successfully avoid drawbacks on the return while improving the safety of the policy.
14

Chen, Hongyi, Yu Zhang, Uzair Aslam Bhatti, and Mengxing Huang. "Safe Decision Controller for Autonomous Driving Based on Deep Reinforcement Learning in Nondeterministic Environment." Sensors 23, no. 3 (January 20, 2023): 1198. http://dx.doi.org/10.3390/s23031198.

Abstract:
Autonomous driving systems are crucial, complicated cyber-physical systems that combine physical environment awareness with cognitive computing. Deep reinforcement learning is currently commonly used in the decision-making of such systems. However, black-box deep reinforcement learning systems do not guarantee system safety or the interpretability of reward-function settings in the face of complex environments and uncontrolled uncertainties. Therefore, a formal safe reinforcement learning method is proposed. First, we propose an environment-modeling approach based on the influence of nondeterministic environmental factors, which enables precise quantification of environmental issues. Second, we use the environment model to formalize the structure of the reward machine, which guides the reward-function setting in reinforcement learning. Third, we generate a control barrier function to ensure a safer state-behavior policy for reinforcement learning. Finally, we verify the method's effectiveness in intelligent driving using overtaking and lane-changing scenarios.
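The role of a control barrier function as a runtime safety filter can be sketched as below, using the common discrete-time condition h(x') ≥ (1 − α)·h(x); the barrier, the one-step predictor, and the candidate-action set are assumptions for illustration, not the paper's construction.

```python
def cbf_filter(proposed_action, state, candidate_actions,
               barrier_h, predict_next, alpha=0.1):
    """Accept the RL policy's action only if it satisfies the barrier condition
    h(x') >= (1 - alpha) * h(x); otherwise fall back to the candidate action
    with the largest predicted barrier value (illustrative safety filter)."""
    threshold = (1.0 - alpha) * barrier_h(state)
    if barrier_h(predict_next(state, proposed_action)) >= threshold:
        return proposed_action
    # Fallback: the most conservative candidate according to the barrier.
    return max(candidate_actions, key=lambda a: barrier_h(predict_next(state, a)))
```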
15

Ryu, Yoon-Ha, Doukhi Oualid, and Deok-Jin Lee. "Research on Safe Reinforcement Controller Using Deep Reinforcement Learning with Control Barrier Function." Journal of Institute of Control, Robotics and Systems 28, no. 11 (November 30, 2022): 1013–21. http://dx.doi.org/10.5302/j.icros.2022.22.0187.

16

Thananjeyan, Brijen, Ashwin Balakrishna, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, and Ken Goldberg. "Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones." IEEE Robotics and Automation Letters 6, no. 3 (July 2021): 4915–22. http://dx.doi.org/10.1109/lra.2021.3070252.

17

Cui, Wenqi, Jiayi Li, and Baosen Zhang. "Decentralized safe reinforcement learning for inverter-based voltage control." Electric Power Systems Research 211 (October 2022): 108609. http://dx.doi.org/10.1016/j.epsr.2022.108609.

18

Basso, Rafael, Balázs Kulcsár, Ivan Sanchez-Diaz, and Xiaobo Qu. "Dynamic stochastic electric vehicle routing with safe reinforcement learning." Transportation Research Part E: Logistics and Transportation Review 157 (January 2022): 102496. http://dx.doi.org/10.1016/j.tre.2021.102496.

19

Peng, Pai, Fei Zhu, Quan Liu, Peiyao Zhao, and Wen Wu. "Achieving Safe Deep Reinforcement Learning via Environment Comprehension Mechanism." Chinese Journal of Electronics 30, no. 6 (November 2021): 1049–58. http://dx.doi.org/10.1049/cje.2021.07.025.

20

Mowbray, M., P. Petsagkourakis, E. A. del Rio-Chanona, and D. Zhang. "Safe chance constrained reinforcement learning for batch process control." Computers & Chemical Engineering 157 (January 2022): 107630. http://dx.doi.org/10.1016/j.compchemeng.2021.107630.

21

Zhao, Qingye, Yi Zhang, and Xuandong Li. "Safe reinforcement learning for dynamical systems using barrier certificates." Connection Science 34, no. 1 (December 12, 2022): 2822–44. http://dx.doi.org/10.1080/09540091.2022.2151567.

22

Gros, Sebastien, Mario Zanon, and Alberto Bemporad. "Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?" IFAC-PapersOnLine 53, no. 2 (2020): 8076–81. http://dx.doi.org/10.1016/j.ifacol.2020.12.2276.

23

Lu, Xiaozhen, Liang Xiao, Guohang Niu, Xiangyang Ji, and Qian Wang. "Safe Exploration in Wireless Security: A Safe Reinforcement Learning Algorithm With Hierarchical Structure." IEEE Transactions on Information Forensics and Security 17 (2022): 732–43. http://dx.doi.org/10.1109/tifs.2022.3149396.

24

Yuan, Zhaocong, Adam W. Hall, Siqi Zhou, Lukas Brunke, Melissa Greeff, Jacopo Panerati, and Angela P. Schoellig. "Safe-Control-Gym: A Unified Benchmark Suite for Safe Learning-Based Control and Reinforcement Learning in Robotics." IEEE Robotics and Automation Letters 7, no. 4 (October 2022): 11142–49. http://dx.doi.org/10.1109/lra.2022.3196132.

25

Garcia, J., and F. Fernandez. "Safe Exploration of State and Action Spaces in Reinforcement Learning." Journal of Artificial Intelligence Research 45 (December 19, 2012): 515–64. http://dx.doi.org/10.1613/jair.3761.

Abstract:
In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some states may result in damage to the learning system (or any other system). Consequently, when an agent begins an interaction with a dangerous and high-dimensional state-action space, an important question arises; namely, that of how to avoid (or at least minimize) damage caused by the exploration of the state-action space. We introduce the PI-SRL algorithm which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.
26

Ma, Yecheng Jason, Andrew Shen, Osbert Bastani, and Dinesh Jayaraman. "Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 5 (June 28, 2022): 5404–12. http://dx.doi.org/10.1609/aaai.v36i5.20478.

Abstract:
Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective. Model-based RL algorithms hold promise for reducing unsafe real-world actions: they may synthesize policies that obey all constraints using simulated samples from a learned model. However, imperfect models can result in real-world constraint violations even for actions that are predicted to satisfy all constraints. We propose Conservative and Adaptive Penalty (CAP), a model-based safe RL framework that accounts for potential modeling errors by capturing model uncertainty and adaptively exploiting it to balance the reward and the cost objectives. First, CAP inflates predicted costs using an uncertainty-based penalty. Theoretically, we show that policies that satisfy this conservative cost constraint are guaranteed to also be feasible in the true environment. We further show that this guarantees the safety of all intermediate solutions during RL training. Further, CAP adaptively tunes this penalty during training using true cost feedback from the environment. We evaluate this conservative and adaptive penalty-based approach for model-based safe RL extensively on state and image-based environments. Our results demonstrate substantial gains in sample-efficiency while incurring fewer violations than prior safe RL algorithms. Code is available at: https://github.com/Redrew/CAP
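The conservative-and-adaptive-penalty idea can be summarized by the toy sketch below: predicted costs are inflated by an uncertainty term, and the penalty coefficient is adjusted using true cost feedback. The names and the proportional update rule are illustrative assumptions, not CAP's exact formulation.

```python
def conservative_cost(predicted_cost, model_uncertainty, kappa):
    """Inflate the model's predicted safety cost with an uncertainty-based penalty."""
    return predicted_cost + kappa * model_uncertainty

def adapt_kappa(kappa, observed_episode_cost, cost_limit, step_size=0.01):
    """Raise the penalty when the true environment cost exceeds the limit and
    relax it otherwise (simple proportional adaptation for illustration)."""
    return max(0.0, kappa + step_size * (observed_episode_cost - cost_limit))
```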
27

Chen, Hongyi, and Changliu Liu. "Safe and Sample-Efficient Reinforcement Learning for Clustered Dynamic Environments." IEEE Control Systems Letters 6 (2022): 1928–33. http://dx.doi.org/10.1109/lcsys.2021.3136486.

28

Yang, Yongliang, Kyriakos G. Vamvoudakis, Hamidreza Modares, Yixin Yin, and Donald C. Wunsch. "Safe Intermittent Reinforcement Learning With Static and Dynamic Event Generators." IEEE Transactions on Neural Networks and Learning Systems 31, no. 12 (December 2020): 5441–55. http://dx.doi.org/10.1109/tnnls.2020.2967871.

29

Li, Hepeng, Zhiqiang Wan, and Haibo He. "Constrained EV Charging Scheduling Based on Safe Deep Reinforcement Learning." IEEE Transactions on Smart Grid 11, no. 3 (May 2020): 2427–39. http://dx.doi.org/10.1109/tsg.2019.2955437.

30

Hailemichael, Habtamu, Beshah Ayalew, Lindsey Kerbel, Andrej Ivanco, and Keith Loiselle. "Safe Reinforcement Learning for an Energy-Efficient Driver Assistance System." IFAC-PapersOnLine 55, no. 37 (2022): 615–20. http://dx.doi.org/10.1016/j.ifacol.2022.11.250.

31

Minamoto, Gaku, Toshimitsu Kaneko, and Noriyuki Hirayama. "Autonomous driving with safe reinforcement learning using rule-based judgment." Proceedings of JSME Annual Conference on Robotics and Mechatronics (Robomec) 2022 (2022): 2A2-K03. http://dx.doi.org/10.1299/jsmermd.2022.2a2-k03.

32

Pathak, Shashank, Luca Pulina, and Armando Tacchella. "Verification and repair of control policies for safe reinforcement learning." Applied Intelligence 48, no. 4 (August 5, 2017): 886–908. http://dx.doi.org/10.1007/s10489-017-0999-8.

33

Dong, Wenbo, Shaofan Liu, and Shiliang Sun. "Safe batch constrained deep reinforcement learning with generative adversarial network." Information Sciences 634 (July 2023): 259–70. http://dx.doi.org/10.1016/j.ins.2023.03.108.

34

Kondrup, Flemming, Thomas Jiralerspong, Elaine Lau, Nathan De Lara, Jacob Shkrob, My Duc Tran, Doina Precup, and Sumana Basu. "Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 15696–702. http://dx.doi.org/10.1609/aaai.v37i13.26862.

Abstract:
Mechanical ventilation is a key form of life support for patients with pulmonary impairment. Healthcare workers must continuously adjust ventilator settings for each patient, a challenging and time-consuming task. Hence, it would be beneficial to develop an automated decision-support tool to optimize ventilation treatment. We present DeepVent, a Conservative Q-Learning (CQL) based offline Deep Reinforcement Learning (DRL) agent that learns to predict the optimal ventilator parameters for a patient to promote 90-day survival. We design a clinically relevant intermediate reward that encourages continuous improvement of the patient's vitals and addresses the challenge of sparse reward in RL. We find that DeepVent recommends ventilation parameters within safe ranges, as outlined in recent clinical trials. The CQL algorithm offers additional safety by mitigating the overestimation of value estimates for out-of-distribution states/actions. We evaluate our agent using Fitted Q Evaluation (FQE) and demonstrate that it outperforms physicians from the MIMIC-III dataset.
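The conservatism mentioned above can be illustrated with the generic discrete-action CQL-style regularizer below, which pushes down Q-values over all actions while pushing up those of actions seen in the offline data; this is a schematic of the general CQL idea, not DeepVent's clinical objective.

```python
import torch

def cql_regularizer(q_all_actions, q_data_actions, alpha=1.0):
    """Generic CQL-style penalty added to the usual Bellman loss.

    q_all_actions:  tensor of shape (batch, num_actions) from the Q-network.
    q_data_actions: tensor of shape (batch,) with Q-values of the actions
                    actually taken in the offline dataset.
    """
    logsumexp_q = torch.logsumexp(q_all_actions, dim=1)  # soft maximum over actions
    return alpha * (logsumexp_q - q_data_actions).mean()
```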
35

Fu, Yanbo, Wenjie Zhao, and Liu Liu. "Safe Reinforcement Learning for Transition Control of Ducted-Fan UAVs." Drones 7, no. 5 (May 22, 2023): 332. http://dx.doi.org/10.3390/drones7050332.

Abstract:
Ducted-fan tail-sitter unmanned aerial vehicles (UAVs) provide versatility and unique benefits, attracting significant attention in various applications. This study focuses on developing a safe reinforcement learning method for back-transition control between level flight mode and hover mode for ducted-fan tail-sitter UAVs. Our method enables transition control with a minimal altitude change and transition time while adhering to the velocity constraint. We employ the Trust Region Policy Optimization, Proximal Policy Optimization with Lagrangian, and Constrained Policy Optimization (CPO) algorithms for controller training, showcasing the superiority of the CPO algorithm and the necessity of the velocity constraint. The transition trajectory achieved using the CPO algorithm closely resembles the optimal trajectory obtained via the well-known GPOPS-II software with the SNOPT solver. Meanwhile, the CPO algorithm also exhibits strong robustness under unknown perturbations of UAV model parameters and wind disturbance.
36

Xiao, Xinhang. "Reinforcement Learning Optimized Intelligent Electricity Dispatching System." Journal of Physics: Conference Series 2215, no. 1 (February 1, 2022): 012013. http://dx.doi.org/10.1088/1742-6596/2215/1/012013.

Abstract:
With the rapid development of artificial intelligence, changes are coming to all walks of life, including traditional manufacturing, bio-pharmaceuticals, and the electric-power industry. In the power sector, many machine learning algorithms have been used to achieve intelligent dispatch of electricity, intelligent fault diagnosis for electrical equipment, and more optimized customer management, and the automation of the power network is one of the most important issues for ensuring the safe operation of the electricity grid. In recent years, intelligent dispatch methods based on big-data analysis have been applied to electricity scheduling and have advanced significantly. However, such methods require tremendous amounts of historical data, which are sometimes difficult to obtain, and they may fail when a low-probability event occurs. In this paper, we therefore propose a new intelligent dispatch model based on reinforcement learning that is more robust, safer, and more efficient. According to the empirical results, our model outperforms traditional methods: the average economic benefit in the test area increases by more than 25%, the distribution fluctuation is more stable than before, and carbon emissions decrease by about 30%.
37

Yoon, Jae Ung, and Juhong Lee. "Uncertainty Sequence Modeling Approach for Safe and Effective Autonomous Driving." Korean Institute of Smart Media 11, no. 9 (October 31, 2022): 9–20. http://dx.doi.org/10.30693/smj.2022.11.9.9.

Abstract:
Deep reinforcement learning (RL) is an end-to-end, data-driven control method that is widely used in the autonomous driving domain. However, conventional RL approaches are difficult to apply to autonomous driving tasks because of problems such as inefficiency, instability, and uncertainty, all of which play an important role in this domain. Although recent studies have attempted to solve these problems, they are computationally expensive and rely on special assumptions. In this paper, we propose a new algorithm, MCDT, that addresses inefficiency, instability, and uncertainty by introducing a method called uncertainty sequence modeling to the autonomous driving domain. The sequence-modeling approach, which views reinforcement learning as a problem of generating decisions that obtain high rewards, avoids the disadvantages of existing studies and guarantees efficiency and stability, while integrated uncertainty-estimation techniques account for safety. The proposed method was tested in the OpenAI Gym CarRacing environment, and the experimental results show that the MCDT algorithm provides efficient, stable, and safe performance compared with the existing reinforcement learning method.
38

Perk, Baris Eren, and Gokhan Inalhan. "Safe Motion Planning and Learning for Unmanned Aerial Systems." Aerospace 9, no. 2 (January 22, 2022): 56. http://dx.doi.org/10.3390/aerospace9020056.

Abstract:
To control unmanned aerial systems, we rarely have a perfect system model, and safe yet aggressive planning is challenging for nonlinear and under-actuated systems. Expert pilots, however, demonstrate maneuvers that are deemed to be at the edge of the flight envelope. Inspired by biological systems, in this paper we introduce a framework that leverages methods from control theory and reinforcement learning to generate feasible, possibly aggressive, trajectories. For the control policies, Dynamic Movement Primitives (DMPs) imitate pilot-induced primitives, and DMPs are combined in parallel to generate trajectories that reach the original or different goal points. The stability properties of DMPs and their overall systems are analyzed using contraction theory. For reinforcement learning, Policy Improvement with Path Integrals (PI2) was used for the maneuvers. The results in this paper show that PI2-updated policies are feasible, and that a parallel combination of different updated primitives transfers the learning within the contraction regions. Our proposed methodology can be used to imitate, reshape, and improve feasible, possibly aggressive, maneuvers. In addition, we can exploit trajectories generated by optimization methods, such as Model Predictive Control (MPC), and a library of maneuvers can be generated instantly. For application, 3-DOF (degrees-of-freedom) helicopter and 2D-UAV (unmanned aerial vehicle) models are used to demonstrate the main results.
39

Ugurlu, Halil Ibrahim, Xuan Huy Pham, and Erdal Kayacan. "Sim-to-Real Deep Reinforcement Learning for Safe End-to-End Planning of Aerial Robots." Robotics 11, no. 5 (October 13, 2022): 109. http://dx.doi.org/10.3390/robotics11050109.

Abstract:
In this study, a novel end-to-end path planning algorithm based on deep reinforcement learning is proposed for aerial robots deployed in dense environments. The learning agent finds an obstacle-free way around the provided rough, global path by only depending on the observations from a forward-facing depth camera. A novel deep reinforcement learning framework is proposed to train the end-to-end policy with the capability of safely avoiding obstacles. The Webots open-source robot simulator is utilized for training the policy, introducing highly randomized environmental configurations for better generalization. The training is performed without dynamics calculations through randomized position updates to minimize the amount of data processed. The trained policy is first comprehensively evaluated in simulations involving physical dynamics and software-in-the-loop flight control. The proposed method is proven to have a 38% and 50% higher success rate compared to both deep reinforcement learning-based and artificial potential field-based baselines, respectively. The generalization capability of the method is verified in simulation-to-real transfer without further training. Real-time experiments are conducted with several trials in two different scenarios, showing a 50% higher success rate of the proposed method compared to the deep reinforcement learning-based baseline.
40

Lu, Songtao, Kaiqing Zhang, Tianyi Chen, Tamer Başar, and Lior Horesh. "Decentralized Policy Gradient Descent Ascent for Safe Multi-Agent Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 8767–75. http://dx.doi.org/10.1609/aaai.v35i10.17062.

Abstract:
This paper deals with distributed reinforcement learning problems with safety constraints. In particular, we consider that a team of agents cooperate in a shared environment, where each agent has its individual reward function and safety constraints that involve all agents' joint actions. As such, the agents aim to maximize the team-average long-term return, subject to all the safety constraints. More intriguingly, no central controller is assumed to coordinate the agents, and both the rewards and constraints are only known to each agent locally/privately. Instead, the agents are connected by a peer-to-peer communication network to share information with their neighbors. In this work, we first formulate this problem as a distributed constrained Markov decision process (D-CMDP) with networked agents. Then, we propose a decentralized policy gradient (PG) method, Safe Dec-PG, to perform policy optimization based on this D-CMDP model over a network. Convergence guarantees, together with numerical results, showcase the superiority of the proposed algorithm. To the best of our knowledge, this is the first decentralized PG algorithm that accounts for the coupled safety constraints with a quantifiable convergence rate in multi-agent reinforcement learning. Finally, we emphasize that our algorithm is also novel in solving a class of decentralized stochastic nonconvex-concave minimax optimization problems, where both the algorithm design and corresponding theoretical analysis are of independent interest.
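Using the notation suggested by the abstract (N agents, local returns J_{R_i}, cost functions J_{C_i} over joint actions, and limits c_i, all symbols assumed for illustration), the team objective can be written as:

```latex
\max_{\theta}\;\; \frac{1}{N}\sum_{i=1}^{N} J_{R_i}(\theta)
\quad \text{subject to} \quad
J_{C_i}(\theta) \le c_i, \qquad i = 1,\dots,N
```

with each agent holding its J_{R_i}, J_{C_i}, and c_i privately and exchanging information only with its neighbors over the peer-to-peer network.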
41

Ji, Guanglin, Junyan Yan, Jingxin Du, Wanquan Yan, Jibiao Chen, Yongkang Lu, Juan Rojas, and Shing Shin Cheng. "Towards Safe Control of Continuum Manipulator Using Shielded Multiagent Reinforcement Learning." IEEE Robotics and Automation Letters 6, no. 4 (October 2021): 7461–68. http://dx.doi.org/10.1109/lra.2021.3097660.

42

Savage, Thomas, Dongda Zhang, Max Mowbray, and Ehecatl Antonio Del Río Chanona. "Model-free safe reinforcement learning for chemical processes using Gaussian processes." IFAC-PapersOnLine 54, no. 3 (2021): 504–9. http://dx.doi.org/10.1016/j.ifacol.2021.08.292.

43

Du, Bin, Bin Lin, Chenming Zhang, Botao Dong, and Weidong Zhang. "Safe deep reinforcement learning-based adaptive control for USV interception mission." Ocean Engineering 246 (February 2022): 110477. http://dx.doi.org/10.1016/j.oceaneng.2021.110477.

44

Kim, Dohyeong, and Songhwai Oh. "TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning." IEEE Robotics and Automation Letters 7, no. 2 (April 2022): 2621–28. http://dx.doi.org/10.1109/lra.2022.3141829.

45

García, Javier, and Diogo Shafie. "Teaching a humanoid robot to walk faster through Safe Reinforcement Learning." Engineering Applications of Artificial Intelligence 88 (February 2020): 103360. http://dx.doi.org/10.1016/j.engappai.2019.103360.

46

Cohen, Max H., and Calin Belta. "Safe exploration in model-based reinforcement learning using control barrier functions." Automatica 147 (January 2023): 110684. http://dx.doi.org/10.1016/j.automatica.2022.110684.

47

Selvaraj, Dinesh Cyril, Shailesh Hegde, Nicola Amati, Francesco Deflorio, and Carla Fabiana Chiasserini. "A Deep Reinforcement Learning Approach for Efficient, Safe and Comfortable Driving." Applied Sciences 13, no. 9 (April 23, 2023): 5272. http://dx.doi.org/10.3390/app13095272.

Abstract:
Sensing, computing, and communication advancements allow vehicles to generate and collect massive amounts of data on their state and surroundings. Such richness of information fosters data-driven decision-making model development that considers the vehicle’s environmental context. We propose a data-centric application of Adaptive Cruise Control employing Deep Reinforcement Learning (DRL). Our DRL approach considers multiple objectives, including safety, passengers’ comfort, and efficient road capacity usage. We compare the proposed framework’s performance to traditional ACC approaches by incorporating such schemes into the CoMoVe framework, which realistically models communication, traffic, and vehicle dynamics. Our solution offers excellent performance concerning stability, comfort, and efficient traffic flow in diverse real-world driving conditions. Notably, our DRL scheme can meet the desired values of road usage efficiency most of the time during the lead vehicle’s speed-variation phases, with less than 40% surpassing the desirable headway. In contrast, its alternatives increase headway during such transient phases, exceeding the desired range 85% of the time, thus degrading performance by over 300% and potentially contributing to traffic instability. Furthermore, our results emphasize the importance of vehicle connectivity in collecting more data to enhance the ACC’s performance.
48

Vasilenko, Elizaveta, Niki Vazou, and Gilles Barthe. "Safe couplings: coupled refinement types." Proceedings of the ACM on Programming Languages 6, ICFP (August 29, 2022): 596–624. http://dx.doi.org/10.1145/3547643.

Abstract:
We enhance refinement types with mechanisms to reason about relational properties of probabilistic computations. Our mechanisms, which are inspired from probabilistic couplings, are applicable to a rich set of probabilistic properties, including expected sensitivity, which ensures that the distance between outputs of two probabilistic computations can be controlled from the distance between their inputs. We implement our mechanisms in the type system of Liquid Haskell and we use them to formally verify Haskell implementations of two classic machine learning algorithms: Temporal Difference (TD) reinforcement learning and stochastic gradient descent (SGD). We formalize a fragment of our system for discrete distributions and we prove soundness with respect to a set-theoretical semantics.
49

Xiao, Wenli, Yiwei Lyu, and John M. Dolan. "Tackling Safe and Efficient Multi-Agent Reinforcement Learning via Dynamic Shielding (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16362–63. http://dx.doi.org/10.1609/aaai.v37i13.27041.

Abstract:
Multi-agent Reinforcement Learning (MARL) has been increasingly used in safety-critical applications but has no safety guarantees, especially during training. In this paper, we propose dynamic shielding, a novel decentralized MARL framework to ensure safety in both training and deployment phases. Our framework leverages Shield, a reactive system running in parallel with the reinforcement learning algorithm to monitor and correct agents' behavior. In our algorithm, shields dynamically split and merge according to the environment state in order to maintain decentralization and avoid conservative behaviors while enjoying formal safety guarantees. We demonstrate the effectiveness of MARL with dynamic shielding in the mobile navigation scenario.
50

Yang, Yanhua, and Ligang Yao. "Optimization Method of Power Equipment Maintenance Plan Decision-Making Based on Deep Reinforcement Learning." Mathematical Problems in Engineering 2021 (March 15, 2021): 1–8. http://dx.doi.org/10.1155/2021/9372803.

Abstract:
The safe and reliable operation of power grid equipment is the basis for ensuring the safe operation of the power system. At present, traditional periodic maintenance suffers from problems such as insufficient maintenance and excessive maintenance. Based on a multi-agent deep reinforcement learning decision-making optimization algorithm, a method for the decision-making and optimization of power grid equipment maintenance plans is proposed. In this paper, an optimization model of the power grid equipment maintenance plan that accounts for the reliability and economics of power grid operation is constructed, with maintenance constraints and power grid safety constraints as its constraints. Deep distributed recurrent Q-networks, a multi-agent deep reinforcement learning method, is adopted to solve the optimization model: it uses the high-dimensional feature-extraction capabilities of deep learning and the decision-making capabilities of reinforcement learning to solve the multi-objective decision-making problem of power grid maintenance planning. Through case analysis, the comparative results show that the proposed algorithm has better optimization and decision-making ability as well as lower maintenance cost, so it can realize the optimal decision for a power grid equipment maintenance plan. The expected value of power shortage and the maintenance cost obtained by the proposed method are 71.75 MW·h and 496,000 yuan, respectively.