Journal articles on the topic "Safe Reinforcement Learning"

Consult the top 50 journal articles for your research on the topic "Safe Reinforcement Learning".

You can also download the full text of each academic publication in PDF format and read its abstract online whenever it is available in the metadata.

Explore journal articles on a wide variety of disciplines and organize your bibliography correctly.

1. Horie, Naoto, Tohgoroh Matsui, Koichi Moriyama, Atsuko Mutoh, and Nobuhiro Inuzuka. "Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning." Artificial Life and Robotics 24, no. 3 (February 8, 2019): 352–59. http://dx.doi.org/10.1007/s10015-019-00523-3.

2. Yang, Yongliang, Kyriakos G. Vamvoudakis, and Hamidreza Modares. "Safe reinforcement learning for dynamical games." International Journal of Robust and Nonlinear Control 30, no. 9 (March 25, 2020): 3706–26. http://dx.doi.org/10.1002/rnc.4962.

3. Xu, Haoran, Xianyuan Zhan, and Xiangyu Zhu. "Constraints Penalized Q-learning for Safe Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8753–60. http://dx.doi.org/10.1609/aaai.v36i8.20855.

Abstract
We study the problem of safe offline reinforcement learning (RL), where the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This setting is especially appealing for real-world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potentially large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that naïve approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly across a variety of benchmark control tasks, outperforming several baselines.
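As a rough illustration of the constraint-penalization idea described in this abstract, the sketch below maintains two tabular critics and blocks reward bootstrapping through next-state actions whose estimated cost exceeds a safety budget. It is a schematic of the general technique, not the authors' CPQ implementation; `cost_limit`, `penalty`, and the tabular setting are illustrative assumptions.

```python
import numpy as np

# Illustrative tabular sketch of a constraints-penalized Q-update (not the paper's code).
n_states, n_actions = 10, 4
q_reward = np.zeros((n_states, n_actions))   # critic for return
q_cost = np.zeros((n_states, n_actions))     # critic for cumulative safety cost
gamma, lr = 0.99, 0.1
cost_limit = 1.0    # safety budget (assumed)
penalty = 100.0     # value assigned when no next action looks safe (assumed)

def update(s, a, r, c, s_next, done):
    """One TD update from a logged offline transition (s, a, r, c, s_next)."""
    # Cost critic: ordinary bootstrap of expected future cost.
    cost_target = c + (0.0 if done else gamma * q_cost[s_next].min())
    q_cost[s, a] += lr * (cost_target - q_cost[s, a])

    # Reward critic: only bootstrap through next actions the cost critic deems safe;
    # actions predicted to violate the budget are penalized instead of propagated.
    safe = q_cost[s_next] <= cost_limit
    next_value = q_reward[s_next][safe].max() if safe.any() else -penalty
    reward_target = r + (0.0 if done else gamma * next_value)
    q_reward[s, a] += lr * (reward_target - q_reward[s, a])

update(s=0, a=1, r=1.0, c=0.0, s_next=2, done=False)  # toy call on a fake transition
```

A greedy policy derived from these critics would then act only among actions whose estimated cost stays within the budget.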
4. García, Javier, and Fernando Fernández. "Probabilistic Policy Reuse for Safe Reinforcement Learning." ACM Transactions on Autonomous and Adaptive Systems 13, no. 3 (March 28, 2019): 1–24. http://dx.doi.org/10.1145/3310090.

5. Mannucci, Tommaso, Erik-Jan van Kampen, Cornelis de Visser, and Qiping Chu. "Safe Exploration Algorithms for Reinforcement Learning Controllers." IEEE Transactions on Neural Networks and Learning Systems 29, no. 4 (April 2018): 1069–81. http://dx.doi.org/10.1109/tnnls.2017.2654539.

6. Karthikeyan, P., Wei-Lun Chen, and Pao-Ann Hsiung. "Autonomous Intersection Management by Using Reinforcement Learning." Algorithms 15, no. 9 (September 13, 2022): 326. http://dx.doi.org/10.3390/a15090326.

Abstract
Developing a safer and more effective intersection-control system is essential given the trends of rising populations and vehicle numbers. Additionally, as vehicle communication and self-driving technologies evolve, we may create a more intelligent control system to reduce traffic accidents. We recommend deep reinforcement learning-inspired autonomous intersection management (DRLAIM) to improve traffic environment efficiency and safety. The three primary models used in this methodology are the priority assignment model, the intersection-control model learning, and safe brake control. The brake-safe control module is utilized to make sure that each vehicle travels safely, and we train the system to acquire an effective model by using reinforcement learning. We have simulated our proposed method by using a simulation of urban mobility tools. Experimental results show that our approach outperforms the traditional method.
7. Mazouchi, Majid, Subramanya Nageshrao, and Hamidreza Modares. "Conflict-Aware Safe Reinforcement Learning: A Meta-Cognitive Learning Framework." IEEE/CAA Journal of Automatica Sinica 9, no. 3 (March 2022): 466–81. http://dx.doi.org/10.1109/jas.2021.1004353.

8. Cowen-Rivers, Alexander I., Daniel Palenicek, Vincent Moens, Mohammed Amin Abdullah, Aivar Sootla, Jun Wang, and Haitham Bou-Ammar. "SAMBA: safe model-based & active reinforcement learning." Machine Learning 111, no. 1 (January 2022): 173–203. http://dx.doi.org/10.1007/s10994-021-06103-6.

9. Serrano-Cuevas, Jonathan, Eduardo F. Morales, and Pablo Hernández-Leal. "Safe reinforcement learning using risk mapping by similarity." Adaptive Behavior 28, no. 4 (July 18, 2019): 213–24. http://dx.doi.org/10.1177/1059712319859650.

Abstract
Reinforcement learning (RL) has been used to successfully solve sequential decision problems. However, considering risk during the learning process remains an open research problem. In this work, we are interested in the type of risk that can lead to a catastrophic state. Related works that aim to deal with risk propose complex models. In contrast, we follow a simple, yet effective, idea: similar states might lead to similar risk. Using this idea, we propose risk mapping by similarity (RMS), an algorithm for discrete scenarios which infers the risk of newly discovered states by analyzing how similar they are to previously known risky states. In general terms, the RMS algorithm transfers the knowledge gathered by the agent regarding risk to newly discovered states. We contribute a new approach to considering risk based on similarity, together with RMS, which is simple and generalizable as long as the premise that similar states yield similar risk holds. RMS is not an RL algorithm, but a method to generate a risk-aware reward-shaping signal that can be used with an RL algorithm to generate risk-aware policies.
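A minimal sketch of the similarity-based risk transfer described above, assuming a Gaussian similarity kernel over state features and a simple shaping penalty (the paper's exact formulation may differ; `bandwidth` and `risk_weight` are made-up parameters):

```python
import numpy as np

# Known risky states and their observed risk levels (e.g., states that led to catastrophe).
risky_states = np.array([[0.9, 0.1], [0.8, 0.2]])
risky_levels = np.array([1.0, 0.7])

def inferred_risk(state, bandwidth=0.25):
    """Transfer risk from known risky states via a similarity kernel (assumed form)."""
    dists = np.linalg.norm(risky_states - state, axis=1)
    similarity = np.exp(-(dists / bandwidth) ** 2)
    return float((similarity * risky_levels).max())

def shaped_reward(reward, state, risk_weight=2.0):
    """Risk-aware reward-shaping signal usable with any RL algorithm."""
    return reward - risk_weight * inferred_risk(np.asarray(state))

print(shaped_reward(1.0, [0.85, 0.15]))  # heavily penalized: close to a known risky state
print(shaped_reward(1.0, [0.1, 0.9]))    # almost unpenalized: far from all risky states
```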
10. Andersen, Per-Arne, Morten Goodwin, and Ole-Christoffer Granmo. "Towards safe reinforcement-learning in industrial grid-warehousing." Information Sciences 537 (October 2020): 467–84. http://dx.doi.org/10.1016/j.ins.2020.06.010.

11. Carr, Steven, Nils Jansen, Sebastian Junges, and Ufuk Topcu. "Safe Reinforcement Learning via Shielding under Partial Observability." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 14748–56. http://dx.doi.org/10.1609/aaai.v37i12.26723.

Abstract
Safe exploration is a common problem in reinforcement learning (RL) that aims to prevent agents from making disastrous decisions while exploring their environment. A family of approaches to this problem assume domain knowledge in the form of a (partial) model of this environment to decide upon the safety of an action. A so-called shield forces the RL agent to select only safe actions. However, for adoption in various applications, one must look beyond enforcing safety and also ensure the applicability of RL with good performance. We extend the applicability of shields via tight integration with state-of-the-art deep RL, and provide an extensive, empirical study in challenging, sparse-reward environments under partial observability. We show that a carefully integrated shield ensures safety and can improve the convergence rate and final performance of RL agents. We furthermore show that a shield can be used to bootstrap state-of-the-art RL agents: they remain safe after initial learning in a shielded setting, allowing us to disable a potentially too conservative shield eventually.
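The shielding mechanism summarized in this abstract can be sketched as an action filter sitting between the policy and the environment. The `is_safe` predicate stands in for the model-based safety check, and the fallback action is an assumption of this sketch rather than part of the paper:

```python
def shielded_action(policy_ranking, state, is_safe, fallback):
    """Pick the highest-ranked action the shield accepts; otherwise a safe fallback.

    policy_ranking: actions ordered by the RL agent's preference in `state`.
    is_safe(state, action): model-based check that the action cannot lead to a bad state.
    fallback(state): an action assumed to be always safe (e.g., stop / hold position).
    """
    for action in policy_ranking:
        if is_safe(state, action):
            return action
    return fallback(state)

# Toy usage: in state 3, action 1 is unsafe, so the shield overrides the agent's first choice.
ranking = [1, 0, 2]
safe = lambda s, a: not (s == 3 and a == 1)
print(shielded_action(ranking, 3, safe, fallback=lambda s: 0))  # -> 0
```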
12. Dai, Juntao, Jiaming Ji, Long Yang, Qian Zheng, and Gang Pan. "Augmented Proximal Policy Optimization for Safe Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 7288–95. http://dx.doi.org/10.1609/aaai.v37i6.25888.

Abstract
Safe reinforcement learning considers practical scenarios that maximize the return while satisfying safety constraints. Current algorithms, which suffer from training oscillations or approximation errors, still struggle to update the policy efficiently with precise constraint satisfaction. In this article, we propose Augmented Proximal Policy Optimization (APPO), which augments the Lagrangian function of the primal constrained problem via attaching a quadratic deviation term. The constructed multiplier-penalty function dampens cost oscillation for stable convergence while being equivalent to the primal constrained problem to precisely control safety costs. APPO alternately updates the policy and the Lagrangian multiplier via solving the constructed augmented primal-dual problem, which can be easily implemented by any first-order optimizer. We apply our APPO methods in diverse safety-constrained tasks, setting a new state of the art compared with a comprehensive list of safe RL baselines. Extensive experiments verify the merits of our method in easy implementation, stable convergence, and precise cost control.
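The augmented-Lagrangian idea, adding a quadratic penalty on the constraint violation on top of the usual multiplier term, can be seen on a toy scalar problem. This is a schematic of the general primal-dual technique rather than the APPO algorithm itself; the objective, cost, budget, and learning rates below are all made up for illustration:

```python
# Toy constrained problem standing in for policy optimization:
# a scalar "policy parameter" theta, return J_r(theta) = -(theta - 3)^2,
# cost J_c(theta) = theta, budget d = 2. Constrained optimum: theta = 2.
j_r = lambda th: -(th - 3.0) ** 2
j_c = lambda th: th
d, rho = 2.0, 1.0          # cost budget and quadratic-penalty weight (assumed)

theta, lam = 0.0, 0.0
theta_lr, lam_lr = 0.05, 0.1
for _ in range(500):
    violation = j_c(theta) - d
    # Gradient of the augmented Lagrangian w.r.t. theta (computed by hand here):
    # dL/dtheta = dJ_r/dtheta - lam * dJ_c/dtheta - rho * max(0, violation) * dJ_c/dtheta
    grad = -2.0 * (theta - 3.0) - lam - rho * max(0.0, violation)
    theta += theta_lr * grad                      # primal ascent step
    lam = max(0.0, lam + lam_lr * violation)      # dual ascent, projected to lam >= 0

print(round(theta, 2), round(lam, 2))  # theta settles near the constraint boundary at 2
```

The quadratic term damps the oscillation of the cost and multiplier around the constraint boundary, which is the stabilizing effect the abstract refers to.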
13. Marchesini, Enrico, Davide Corsi, and Alessandro Farinelli. "Exploring Safer Behaviors for Deep Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7701–9. http://dx.doi.org/10.1609/aaai.v36i7.20737.

Abstract
We consider Reinforcement Learning (RL) problems where an agent attempts to maximize a reward signal while minimizing a cost function that models unsafe behaviors. Such formalization is addressed in the literature using constrained optimization on the cost, limiting exploration and leading to a significant trade-off between cost and reward. In contrast, we propose a Safety-Oriented Search that complements Deep RL algorithms to bias the policy toward safety within an evolutionary cost optimization. We leverage the exploration benefits of evolutionary methods to design a novel concept of safe mutations that use visited unsafe states to explore safer actions. We further characterize the behaviors of the policies over desired specifics with a sample-based bound estimation, which makes prior verification analysis tractable in the training loop, hence driving the learning process towards safer regions of the policy space. Empirical evidence on the Safety Gym benchmark shows that we successfully avoid drawbacks on the return while improving the safety of the policy.
14. Chen, Hongyi, Yu Zhang, Uzair Aslam Bhatti, and Mengxing Huang. "Safe Decision Controller for Autonomous Driving Based on Deep Reinforcement Learning in Nondeterministic Environment." Sensors 23, no. 3 (January 20, 2023): 1198. http://dx.doi.org/10.3390/s23031198.

Abstract
Autonomous driving systems are crucial complicated cyber–physical systems that combine physical environment awareness with cognitive computing. Deep reinforcement learning is currently commonly used in the decision-making of such systems. However, black-box-based deep reinforcement learning systems do not guarantee system safety and the interpretability of the reward-function settings in the face of complex environments and the influence of uncontrolled uncertainties. Therefore, a formal security reinforcement learning method is proposed. First, we propose an environmental modeling approach based on the influence of nondeterministic environmental factors, which enables the precise quantification of environmental issues. Second, we use the environment model to formalize the reward machine’s structure, which is used to guide the reward-function setting in reinforcement learning. Third, we generate a control barrier function to ensure a safer state behavior policy for reinforcement learning. Finally, we verify the method’s effectiveness in intelligent driving using overtaking and lane-changing scenarios.
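A minimal sketch of the control-barrier-function filtering step mentioned in this abstract: candidate actions proposed by the RL policy are accepted only if they satisfy a standard discrete-time CBF condition, h(x_next) >= (1 - alpha) * h(x). The headway barrier, toy dynamics, and candidate set below are assumptions of this sketch, not the paper's model:

```python
# Toy 1-D car: state x = distance to the vehicle ahead, action a = relative-speed change.
# Barrier h(x) = x - x_min >= 0 encodes "keep at least x_min meters of headway".
x_min, alpha, dt = 5.0, 0.2, 0.1

def h(x):
    return x - x_min

def step(x, rel_speed):
    return x + dt * rel_speed   # toy relative dynamics (assumption)

def cbf_filter(x, rl_action, candidates):
    """Return the candidate closest to the RL action that satisfies the CBF condition
    h(x_next) >= (1 - alpha) * h(x); otherwise fall back to the most conservative one."""
    ok = [a for a in candidates if h(step(x, a)) >= (1.0 - alpha) * h(x)]
    if not ok:
        return max(candidates)   # open the gap as fast as possible (toy fallback)
    return min(ok, key=lambda a: abs(a - rl_action))

candidates = [-3.0, -1.0, 0.0, 1.0]   # relative-speed changes the controller may pick
print(cbf_filter(x=6.0, rl_action=-3.0, candidates=candidates))  # -> -1.0, the safe compromise
```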
15. Ryu, Yoon-Ha, Doukhi Oualid, and Deok-Jin Lee. "Research on Safe Reinforcement Controller Using Deep Reinforcement Learning with Control Barrier Function." Journal of Institute of Control, Robotics and Systems 28, no. 11 (November 30, 2022): 1013–21. http://dx.doi.org/10.5302/j.icros.2022.22.0187.

16. Thananjeyan, Brijen, Ashwin Balakrishna, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, and Ken Goldberg. "Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones." IEEE Robotics and Automation Letters 6, no. 3 (July 2021): 4915–22. http://dx.doi.org/10.1109/lra.2021.3070252.

17. Cui, Wenqi, Jiayi Li, and Baosen Zhang. "Decentralized safe reinforcement learning for inverter-based voltage control." Electric Power Systems Research 211 (October 2022): 108609. http://dx.doi.org/10.1016/j.epsr.2022.108609.

18. Basso, Rafael, Balázs Kulcsár, Ivan Sanchez-Diaz, and Xiaobo Qu. "Dynamic stochastic electric vehicle routing with safe reinforcement learning." Transportation Research Part E: Logistics and Transportation Review 157 (January 2022): 102496. http://dx.doi.org/10.1016/j.tre.2021.102496.

19. Peng, Pai, Fei Zhu, Quan Liu, Peiyao Zhao, and Wen Wu. "Achieving Safe Deep Reinforcement Learning via Environment Comprehension Mechanism." Chinese Journal of Electronics 30, no. 6 (November 2021): 1049–58. http://dx.doi.org/10.1049/cje.2021.07.025.

20. Mowbray, M., P. Petsagkourakis, E. A. del Rio-Chanona, and D. Zhang. "Safe chance constrained reinforcement learning for batch process control." Computers & Chemical Engineering 157 (January 2022): 107630. http://dx.doi.org/10.1016/j.compchemeng.2021.107630.

21. Zhao, Qingye, Yi Zhang, and Xuandong Li. "Safe reinforcement learning for dynamical systems using barrier certificates." Connection Science 34, no. 1 (December 12, 2022): 2822–44. http://dx.doi.org/10.1080/09540091.2022.2151567.

22. Gros, Sebastien, Mario Zanon, and Alberto Bemporad. "Safe Reinforcement Learning via Projection on a Safe Set: How to Achieve Optimality?" IFAC-PapersOnLine 53, no. 2 (2020): 8076–81. http://dx.doi.org/10.1016/j.ifacol.2020.12.2276.

23. Lu, Xiaozhen, Liang Xiao, Guohang Niu, Xiangyang Ji, and Qian Wang. "Safe Exploration in Wireless Security: A Safe Reinforcement Learning Algorithm With Hierarchical Structure." IEEE Transactions on Information Forensics and Security 17 (2022): 732–43. http://dx.doi.org/10.1109/tifs.2022.3149396.

24. Yuan, Zhaocong, Adam W. Hall, Siqi Zhou, Lukas Brunke, Melissa Greeff, Jacopo Panerati, and Angela P. Schoellig. "Safe-Control-Gym: A Unified Benchmark Suite for Safe Learning-Based Control and Reinforcement Learning in Robotics." IEEE Robotics and Automation Letters 7, no. 4 (October 2022): 11142–49. http://dx.doi.org/10.1109/lra.2022.3196132.

25. Garcia, J., and F. Fernandez. "Safe Exploration of State and Action Spaces in Reinforcement Learning." Journal of Artificial Intelligence Research 45 (December 19, 2012): 515–64. http://dx.doi.org/10.1613/jair.3761.

Abstract
In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some states may result in damage to the learning system (or any other system). Consequently, when an agent begins an interaction with a dangerous and high-dimensional state-action space, an important question arises; namely, that of how to avoid (or at least minimize) damage caused by the exploration of the state-action space. We introduce the PI-SRL algorithm which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.
26. Ma, Yecheng Jason, Andrew Shen, Osbert Bastani, and Dinesh Jayaraman. "Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 5 (June 28, 2022): 5404–12. http://dx.doi.org/10.1609/aaai.v36i5.20478.

Abstract
Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective. Model-based RL algorithms hold promise for reducing unsafe real-world actions: they may synthesize policies that obey all constraints using simulated samples from a learned model. However, imperfect models can result in real-world constraint violations even for actions that are predicted to satisfy all constraints. We propose Conservative and Adaptive Penalty (CAP), a model-based safe RL framework that accounts for potential modeling errors by capturing model uncertainty and adaptively exploiting it to balance the reward and the cost objectives. First, CAP inflates predicted costs using an uncertainty-based penalty. Theoretically, we show that policies that satisfy this conservative cost constraint are guaranteed to also be feasible in the true environment. We further show that this guarantees the safety of all intermediate solutions during RL training. Further, CAP adaptively tunes this penalty during training using true cost feedback from the environment. We evaluate this conservative and adaptive penalty-based approach for model-based safe RL extensively on state and image-based environments. Our results demonstrate substantial gains in sample-efficiency while incurring fewer violations than prior safe RL algorithms. Code is available at: https://github.com/Redrew/CAP
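The conservative-and-adaptive-penalty idea can be sketched in a few lines: inflate the model-predicted cost by an uncertainty term scaled by a multiplier kappa, and adapt kappa from true cost feedback collected in the environment. The update rule and constants below are illustrative assumptions, not the paper's exact procedure:

```python
def conservative_cost(pred_cost, uncertainty, kappa):
    """Inflate the model-predicted cost by an uncertainty-based penalty (CAP-style idea)."""
    return pred_cost + kappa * uncertainty

def adapt_kappa(kappa, true_episode_cost, cost_budget, step=0.1, kappa_min=0.0):
    """Raise the penalty when the true environment cost exceeds the budget,
    relax it when the agent is comfortably within the budget (illustrative rule)."""
    if true_episode_cost > cost_budget:
        return kappa + step
    return max(kappa_min, kappa - step)

# Toy usage: planning inside the learned model would use conservative_cost(...),
# while adapt_kappa(...) is called once per real-environment episode.
kappa = 1.0
for true_cost in [30.0, 28.0, 22.0, 18.0]:   # observed episode costs (made up)
    kappa = adapt_kappa(kappa, true_cost, cost_budget=25.0)
print(kappa)  # the penalty grew for the violating episodes, then relaxed again
```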
27. Chen, Hongyi, and Changliu Liu. "Safe and Sample-Efficient Reinforcement Learning for Clustered Dynamic Environments." IEEE Control Systems Letters 6 (2022): 1928–33. http://dx.doi.org/10.1109/lcsys.2021.3136486.

28. Yang, Yongliang, Kyriakos G. Vamvoudakis, Hamidreza Modares, Yixin Yin, and Donald C. Wunsch. "Safe Intermittent Reinforcement Learning With Static and Dynamic Event Generators." IEEE Transactions on Neural Networks and Learning Systems 31, no. 12 (December 2020): 5441–55. http://dx.doi.org/10.1109/tnnls.2020.2967871.

29. Li, Hepeng, Zhiqiang Wan, and Haibo He. "Constrained EV Charging Scheduling Based on Safe Deep Reinforcement Learning." IEEE Transactions on Smart Grid 11, no. 3 (May 2020): 2427–39. http://dx.doi.org/10.1109/tsg.2019.2955437.

30. Hailemichael, Habtamu, Beshah Ayalew, Lindsey Kerbel, Andrej Ivanco, and Keith Loiselle. "Safe Reinforcement Learning for an Energy-Efficient Driver Assistance System." IFAC-PapersOnLine 55, no. 37 (2022): 615–20. http://dx.doi.org/10.1016/j.ifacol.2022.11.250.

31. Minamoto, Gaku, Toshimitsu Kaneko, and Noriyuki Hirayama. "Autonomous driving with safe reinforcement learning using rule-based judgment." Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2022 (2022): 2A2-K03. http://dx.doi.org/10.1299/jsmermd.2022.2a2-k03.

32. Pathak, Shashank, Luca Pulina, and Armando Tacchella. "Verification and repair of control policies for safe reinforcement learning." Applied Intelligence 48, no. 4 (August 5, 2017): 886–908. http://dx.doi.org/10.1007/s10489-017-0999-8.

33. Dong, Wenbo, Shaofan Liu, and Shiliang Sun. "Safe batch constrained deep reinforcement learning with generative adversarial network." Information Sciences 634 (July 2023): 259–70. http://dx.doi.org/10.1016/j.ins.2023.03.108.

34. Kondrup, Flemming, Thomas Jiralerspong, Elaine Lau, Nathan De Lara, Jacob Shkrob, My Duc Tran, Doina Precup, and Sumana Basu. "Towards Safe Mechanical Ventilation Treatment Using Deep Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 15696–702. http://dx.doi.org/10.1609/aaai.v37i13.26862.

Abstract
Mechanical ventilation is a key form of life support for patients with pulmonary impairment. Healthcare workers are required to continuously adjust ventilator settings for each patient, a challenging and time consuming task. Hence, it would be beneficial to develop an automated decision support tool to optimize ventilation treatment. We present DeepVent, a Conservative Q-Learning (CQL) based offline Deep Reinforcement Learning (DRL) agent that learns to predict the optimal ventilator parameters for a patient to promote 90 day survival. We design a clinically relevant intermediate reward that encourages continuous improvement of the patient vitals as well as addresses the challenge of sparse reward in RL. We find that DeepVent recommends ventilation parameters within safe ranges, as outlined in recent clinical trials. The CQL algorithm offers additional safety by mitigating the overestimation of the value estimates of out-of-distribution states/actions. We evaluate our agent using Fitted Q Evaluation (FQE) and demonstrate that it outperforms physicians from the MIMIC-III dataset.
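The conservative Q-learning regularizer mentioned above (discouraging overestimation of out-of-distribution actions) can be rendered, in a simplified tabular form, as an extra gradient term added to the ordinary TD update. This is a generic sketch of the CQL idea, not the DeepVent implementation; all constants are placeholders:

```python
import numpy as np

def cql_update(q, s, a, r, s_next, done, gamma=0.99, lr=0.1, alpha=1.0):
    """One tabular Q-update with a CQL-style conservative term (illustrative).

    The conservative term lowers Q-values of all actions in state s (via the softmax
    gradient of logsumexp) and raises the Q-value of the action actually present in
    the offline data, so out-of-distribution actions are not overestimated.
    """
    target = r + (0.0 if done else gamma * q[s_next].max())
    td_grad = q[s, a] - target                    # gradient of 0.5 * (q - target)^2

    soft = np.exp(q[s] - q[s].max())
    soft /= soft.sum()                            # gradient of logsumexp(q[s, :])
    conservative_grad = soft.copy()
    conservative_grad[a] -= 1.0                   # minus gradient of q[s, a]

    q[s] -= lr * alpha * conservative_grad        # conservative regularization step
    q[s, a] -= lr * td_grad                       # ordinary TD step
    return q

q = np.zeros((5, 3))
q = cql_update(q, s=0, a=1, r=1.0, s_next=2, done=False)  # toy call on a fake transition
```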
35. Fu, Yanbo, Wenjie Zhao, and Liu Liu. "Safe Reinforcement Learning for Transition Control of Ducted-Fan UAVs." Drones 7, no. 5 (May 22, 2023): 332. http://dx.doi.org/10.3390/drones7050332.

Abstract
Ducted-fan tail-sitter unmanned aerial vehicles (UAVs) provide versatility and unique benefits, attracting significant attention in various applications. This study focuses on developing a safe reinforcement learning method for back-transition control between level flight mode and hover mode for ducted-fan tail-sitter UAVs. Our method enables transition control with a minimal altitude change and transition time while adhering to the velocity constraint. We employ the Trust Region Policy Optimization, Proximal Policy Optimization with Lagrangian, and Constrained Policy Optimization (CPO) algorithms for controller training, showcasing the superiority of the CPO algorithm and the necessity of the velocity constraint. The transition trajectory achieved using the CPO algorithm closely resembles the optimal trajectory obtained via the well-known GPOPS-II software with the SNOPT solver. Meanwhile, the CPO algorithm also exhibits strong robustness under unknown perturbations of UAV model parameters and wind disturbance.
36. Xiao, Xinhang. "Reinforcement Learning Optimized Intelligent Electricity Dispatching System." Journal of Physics: Conference Series 2215, no. 1 (February 1, 2022): 012013. http://dx.doi.org/10.1088/1742-6596/2215/1/012013.

Abstract
With the rapid development of artificial intelligence, new changes are coming to many sectors, including traditional manufacturing, bio-pharmaceuticals, and electric power. In the power industry, many machine learning algorithms have been used to achieve intelligent dispatch of electricity, intelligent fault diagnosis of power equipment, and more optimized customer management, among which the automation of the power network is one of the most important issues for ensuring the safe operation of the electricity grid. In recent years, intelligent dispatch methods based on big data analysis have been applied to electricity scheduling and have brought significant improvements. However, intelligent dispatch based on big data analysis requires a large amount of historical data that is sometimes not easy to obtain, and once a low-probability event happens, such a method may lose efficacy. In this paper, we therefore propose a new intelligent dispatch model based on reinforcement learning that is more robust, safer, and more efficient. According to the empirical study, our model outperforms traditional methods: the average economic benefit in the test area increases by more than 25%, the fluctuation of distribution is more stable than before, and carbon emissions decrease by about 30%.
37. Yoon, Jae Ung, and Juhong Lee. "Uncertainty Sequence Modeling Approach for Safe and Effective Autonomous Driving." Korean Institute of Smart Media 11, no. 9 (October 31, 2022): 9–20. http://dx.doi.org/10.30693/smj.2022.11.9.9.

Abstract
Deep reinforcement learning (RL) is an end-to-end, data-driven control method that is widely used in the autonomous driving domain. However, conventional RL approaches are difficult to apply to autonomous driving tasks due to problems such as inefficiency, instability, and uncertainty, all of which play an important role in this domain. Although recent studies have attempted to solve these problems, they are computationally expensive and rely on special assumptions. In this paper, we propose a new algorithm, MCDT, that addresses inefficiency, instability, and uncertainty by introducing a method called uncertainty sequence modeling to the autonomous driving domain. The sequence modeling method, which views reinforcement learning as a decision-generation problem aimed at obtaining high rewards, avoids the disadvantages of existing studies and guarantees efficiency and stability, while also accounting for safety by integrating uncertainty estimation techniques. The proposed method was tested in the OpenAI Gym CarRacing environment, and the experimental results show that the MCDT algorithm provides efficient, stable, and safe performance compared to the existing reinforcement learning method.
38. Perk, Baris Eren, and Gokhan Inalhan. "Safe Motion Planning and Learning for Unmanned Aerial Systems." Aerospace 9, no. 2 (January 22, 2022): 56. http://dx.doi.org/10.3390/aerospace9020056.

Abstract
To control unmanned aerial systems, we rarely have a perfect system model. Safe and aggressive planning is also challenging for nonlinear and under-actuated systems. Expert pilots, however, demonstrate maneuvers that are deemed at the edge of the plane envelope. Inspired by biological systems, in this paper we introduce a framework that leverages methods from control theory and reinforcement learning to generate feasible, possibly aggressive, trajectories. For the control policies, Dynamic Movement Primitives (DMPs) imitate pilot-induced primitives, and DMPs are combined in parallel to generate trajectories to reach original or different goal points. The stability properties of DMPs and their overall systems are analyzed using contraction theory. For reinforcement learning, Policy Improvement with Path Integrals (PI2) was used for the maneuvers. The results in this paper show that PI2-updated policies are feasible and that a parallel combination of different updated primitives transfers the learning within the contraction regions. Our proposed methodology can be used to imitate, reshape, and improve feasible, possibly aggressive, maneuvers. In addition, we can exploit trajectories generated by optimization methods, such as Model Predictive Control (MPC), and a library of maneuvers can be instantly generated. For application, 3-DOF (degrees of freedom) helicopter and 2D-UAV (unmanned aerial vehicle) models are utilized to demonstrate the main results.
39. Ugurlu, Halil Ibrahim, Xuan Huy Pham, and Erdal Kayacan. "Sim-to-Real Deep Reinforcement Learning for Safe End-to-End Planning of Aerial Robots." Robotics 11, no. 5 (October 13, 2022): 109. http://dx.doi.org/10.3390/robotics11050109.

Abstract
In this study, a novel end-to-end path planning algorithm based on deep reinforcement learning is proposed for aerial robots deployed in dense environments. The learning agent finds an obstacle-free way around the provided rough, global path by only depending on the observations from a forward-facing depth camera. A novel deep reinforcement learning framework is proposed to train the end-to-end policy with the capability of safely avoiding obstacles. The Webots open-source robot simulator is utilized for training the policy, introducing highly randomized environmental configurations for better generalization. The training is performed without dynamics calculations through randomized position updates to minimize the amount of data processed. The trained policy is first comprehensively evaluated in simulations involving physical dynamics and software-in-the-loop flight control. The proposed method is proven to have a 38% and 50% higher success rate compared to both deep reinforcement learning-based and artificial potential field-based baselines, respectively. The generalization capability of the method is verified in simulation-to-real transfer without further training. Real-time experiments are conducted with several trials in two different scenarios, showing a 50% higher success rate of the proposed method compared to the deep reinforcement learning-based baseline.
40. Lu, Songtao, Kaiqing Zhang, Tianyi Chen, Tamer Başar, and Lior Horesh. "Decentralized Policy Gradient Descent Ascent for Safe Multi-Agent Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 8767–75. http://dx.doi.org/10.1609/aaai.v35i10.17062.

Abstract
This paper deals with distributed reinforcement learning problems with safety constraints. In particular, we consider that a team of agents cooperate in a shared environment, where each agent has its individual reward function and safety constraints that involve all agents' joint actions. As such, the agents aim to maximize the team-average long-term return, subject to all the safety constraints. More intriguingly, no central controller is assumed to coordinate the agents, and both the rewards and constraints are only known to each agent locally/privately. Instead, the agents are connected by a peer-to-peer communication network to share information with their neighbors. In this work, we first formulate this problem as a distributed constrained Markov decision process (D-CMDP) with networked agents. Then, we propose a decentralized policy gradient (PG) method, Safe Dec-PG, to perform policy optimization based on this D-CMDP model over a network. Convergence guarantees, together with numerical results, showcase the superiority of the proposed algorithm. To the best of our knowledge, this is the first decentralized PG algorithm that accounts for the coupled safety constraints with a quantifiable convergence rate in multi-agent reinforcement learning. Finally, we emphasize that our algorithm is also novel in solving a class of decentralized stochastic nonconvex-concave minimax optimization problems, where both the algorithm design and corresponding theoretical analysis are of independent interest.
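A rough sketch of the decentralized primal-dual pattern described above: each agent takes a local policy-gradient step on its own Lagrangian and then averages (consensus) its parameters and multipliers with its neighbors, so no central controller is needed. This is a schematic of the general pattern, not the Safe Dec-PG algorithm; the mixing matrix, gradient oracles, and toy constraint are assumptions:

```python
import numpy as np

n_agents, dim = 4, 3
rng = np.random.default_rng(0)
theta = rng.normal(size=(n_agents, dim))      # local policy parameters
lam = np.zeros(n_agents)                      # local dual variables
# Doubly stochastic mixing matrix for a fully connected communication graph (assumption).
W = np.full((n_agents, n_agents), 1.0 / n_agents)

# Placeholder local oracles standing in for policy-gradient estimates (assumptions):
# reward objective -0.5 * ||theta_i||^2, shared safety constraint sum(theta_i) >= 1.
def reward_grad(th):
    return -th
def constraint_violation(th):
    return 1.0 - th.sum()
def constraint_grad(th):
    return -np.ones_like(th)

eta, eta_lam = 0.1, 0.05
for _ in range(300):
    # Local primal step: ascend each agent's own Lagrangian.
    grads = np.stack([reward_grad(theta[i]) - lam[i] * constraint_grad(theta[i])
                      for i in range(n_agents)])
    theta = theta + eta * grads
    # Local dual step: multipliers rise while the local constraint is violated.
    lam = np.maximum(0.0, lam + eta_lam * np.array([constraint_violation(theta[i])
                                                    for i in range(n_agents)]))
    # Consensus step: agents average parameters and multipliers with neighbors.
    theta = W @ theta
    lam = W @ lam

print(theta.round(2))  # all agents end up with (nearly) identical, constraint-respecting parameters
```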
41. Ji, Guanglin, Junyan Yan, Jingxin Du, Wanquan Yan, Jibiao Chen, Yongkang Lu, Juan Rojas, and Shing Shin Cheng. "Towards Safe Control of Continuum Manipulator Using Shielded Multiagent Reinforcement Learning." IEEE Robotics and Automation Letters 6, no. 4 (October 2021): 7461–68. http://dx.doi.org/10.1109/lra.2021.3097660.

42. Savage, Thomas, Dongda Zhang, Max Mowbray, and Ehecatl Antonio Del Río Chanona. "Model-free safe reinforcement learning for chemical processes using Gaussian processes." IFAC-PapersOnLine 54, no. 3 (2021): 504–9. http://dx.doi.org/10.1016/j.ifacol.2021.08.292.

43. Du, Bin, Bin Lin, Chenming Zhang, Botao Dong, and Weidong Zhang. "Safe deep reinforcement learning-based adaptive control for USV interception mission." Ocean Engineering 246 (February 2022): 110477. http://dx.doi.org/10.1016/j.oceaneng.2021.110477.

44. Kim, Dohyeong, and Songhwai Oh. "TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning." IEEE Robotics and Automation Letters 7, no. 2 (April 2022): 2621–28. http://dx.doi.org/10.1109/lra.2022.3141829.

45. García, Javier, and Diogo Shafie. "Teaching a humanoid robot to walk faster through Safe Reinforcement Learning." Engineering Applications of Artificial Intelligence 88 (February 2020): 103360. http://dx.doi.org/10.1016/j.engappai.2019.103360.

46. Cohen, Max H., and Calin Belta. "Safe exploration in model-based reinforcement learning using control barrier functions." Automatica 147 (January 2023): 110684. http://dx.doi.org/10.1016/j.automatica.2022.110684.

47. Selvaraj, Dinesh Cyril, Shailesh Hegde, Nicola Amati, Francesco Deflorio, and Carla Fabiana Chiasserini. "A Deep Reinforcement Learning Approach for Efficient, Safe and Comfortable Driving." Applied Sciences 13, no. 9 (April 23, 2023): 5272. http://dx.doi.org/10.3390/app13095272.

Abstract
Sensing, computing, and communication advancements allow vehicles to generate and collect massive amounts of data on their state and surroundings. Such richness of information fosters data-driven decision-making model development that considers the vehicle’s environmental context. We propose a data-centric application of Adaptive Cruise Control employing Deep Reinforcement Learning (DRL). Our DRL approach considers multiple objectives, including safety, passengers’ comfort, and efficient road capacity usage. We compare the proposed framework’s performance to traditional ACC approaches by incorporating such schemes into the CoMoVe framework, which realistically models communication, traffic, and vehicle dynamics. Our solution offers excellent performance concerning stability, comfort, and efficient traffic flow in diverse real-world driving conditions. Notably, our DRL scheme can meet the desired values of road usage efficiency most of the time during the lead vehicle’s speed-variation phases, with less than 40% surpassing the desirable headway. In contrast, its alternatives increase headway during such transient phases, exceeding the desired range 85% of the time, thus degrading performance by over 300% and potentially contributing to traffic instability. Furthermore, our results emphasize the importance of vehicle connectivity in collecting more data to enhance the ACC’s performance.
48. Vasilenko, Elizaveta, Niki Vazou, and Gilles Barthe. "Safe couplings: coupled refinement types." Proceedings of the ACM on Programming Languages 6, ICFP (August 29, 2022): 596–624. http://dx.doi.org/10.1145/3547643.

Abstract
We enhance refinement types with mechanisms to reason about relational properties of probabilistic computations. Our mechanisms, which are inspired from probabilistic couplings, are applicable to a rich set of probabilistic properties, including expected sensitivity, which ensures that the distance between outputs of two probabilistic computations can be controlled from the distance between their inputs. We implement our mechanisms in the type system of Liquid Haskell and we use them to formally verify Haskell implementations of two classic machine learning algorithms: Temporal Difference (TD) reinforcement learning and stochastic gradient descent (SGD). We formalize a fragment of our system for discrete distributions and we prove soundness with respect to a set-theoretical semantics.
49. Xiao, Wenli, Yiwei Lyu, and John M. Dolan. "Tackling Safe and Efficient Multi-Agent Reinforcement Learning via Dynamic Shielding (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16362–63. http://dx.doi.org/10.1609/aaai.v37i13.27041.

Abstract
Multi-agent Reinforcement Learning (MARL) has been increasingly used in safety-critical applications but has no safety guarantees, especially during training. In this paper, we propose dynamic shielding, a novel decentralized MARL framework to ensure safety in both training and deployment phases. Our framework leverages Shield, a reactive system running in parallel with the reinforcement learning algorithm to monitor and correct agents' behavior. In our algorithm, shields dynamically split and merge according to the environment state in order to maintain decentralization and avoid conservative behaviors while enjoying formal safety guarantees. We demonstrate the effectiveness of MARL with dynamic shielding in the mobile navigation scenario.
50. Yang, Yanhua, and Ligang Yao. "Optimization Method of Power Equipment Maintenance Plan Decision-Making Based on Deep Reinforcement Learning." Mathematical Problems in Engineering 2021 (March 15, 2021): 1–8. http://dx.doi.org/10.1155/2021/9372803.

Abstract
The safe and reliable operation of power grid equipment is the basis for ensuring the safe operation of the power system. At present, traditional periodic maintenance has exposed problems such as insufficient maintenance and excessive maintenance. Based on a multiagent deep reinforcement learning decision-making optimization algorithm, a method for the decision-making and optimization of power grid equipment maintenance plans is proposed. In this paper, an optimization model of the power grid equipment maintenance plan that takes into account the reliability and economics of power grid operation is constructed, with maintenance constraints and power grid safety constraints as its constraints. Deep distributed recurrent Q-networks multiagent deep reinforcement learning is adopted to solve the optimization model, using the high-dimensional feature-extraction capabilities of deep learning and the decision-making capabilities of reinforcement learning to address the multiobjective decision-making problem of power grid maintenance planning. Through case analysis, the comparative results show that the proposed algorithm has better optimization and decision-making ability, as well as lower maintenance cost, and can therefore realize the optimal decision for a power grid equipment maintenance plan. The expected value of power shortage and the maintenance cost obtained by the proposed method are 71.75 MW·h and 496,000 yuan, respectively.