Academic literature on the topic 'Safe Reinforcement Learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Safe Reinforcement Learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Safe Reinforcement Learning"

1

Horie, Naoto, Tohgoroh Matsui, Koichi Moriyama, Atsuko Mutoh, and Nobuhiro Inuzuka. "Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning." Artificial Life and Robotics 24, no. 3 (February 8, 2019): 352–59. http://dx.doi.org/10.1007/s10015-019-00523-3.

2

Yang, Yongliang, Kyriakos G. Vamvoudakis, and Hamidreza Modares. "Safe reinforcement learning for dynamical games." International Journal of Robust and Nonlinear Control 30, no. 9 (March 25, 2020): 3706–26. http://dx.doi.org/10.1002/rnc.4962.

3

Xu, Haoran, Xianyuan Zhan, and Xiangyu Zhu. "Constraints Penalized Q-learning for Safe Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8753–60. http://dx.doi.org/10.1609/aaai.v36i8.20855.

Abstract:
We study the problem of safe offline reinforcement learning (RL), where the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints, given only offline data and no further interaction with the environment. This setting is appealing for real-world RL applications in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, because a potentially large discrepancy between the policy distribution and the data distribution causes errors in estimating the value of the safety constraints. We show that naïve approaches combining techniques from safe RL and offline RL learn only sub-optimal solutions. We therefore develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach learns robustly across a variety of benchmark control tasks, outperforming several baselines.
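The core idea behind CPQ can be sketched in a few lines of toy, tabular Python (an illustrative rendering only, not the authors' code; the cost threshold, learning rate, and dictionary-based critics are our own simplifications): maintain a reward critic and a cost critic, and let the reward critic bootstrap only through next actions whose estimated cost is within the limit.

```python
# Toy sketch of the constraints-penalized idea: unsafe next actions
# (cost critic above the threshold) contribute no value to the target.

def cpq_update(q_r, q_c, s, a, r, c, s2, actions,
               alpha=0.5, gamma=0.9, cost_limit=0.5):
    """One tabular update of the reward critic q_r and cost critic q_c."""
    # Cost critic: ordinary Bellman update on the single-stage cost c.
    q_c[(s, a)] = q_c.get((s, a), 0.0) + alpha * (
        c + gamma * max(q_c.get((s2, b), 0.0) for b in actions)
        - q_c.get((s, a), 0.0))
    # Reward critic: bootstrap only through actions deemed safe.
    safe_vals = [q_r.get((s2, b), 0.0) for b in actions
                 if q_c.get((s2, b), 0.0) <= cost_limit]
    target = r + gamma * (max(safe_vals) if safe_vals else 0.0)
    q_r[(s, a)] = q_r.get((s, a), 0.0) + alpha * (target - q_r.get((s, a), 0.0))
    return q_r, q_c
```

In an offline loop, these updates would be applied over transitions sampled from the fixed dataset rather than from fresh environment interaction.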
4

García, Javier, and Fernando Fernández. "Probabilistic Policy Reuse for Safe Reinforcement Learning." ACM Transactions on Autonomous and Adaptive Systems 13, no. 3 (March 28, 2019): 1–24. http://dx.doi.org/10.1145/3310090.

5

Mannucci, Tommaso, Erik-Jan van Kampen, Cornelis de Visser, and Qiping Chu. "Safe Exploration Algorithms for Reinforcement Learning Controllers." IEEE Transactions on Neural Networks and Learning Systems 29, no. 4 (April 2018): 1069–81. http://dx.doi.org/10.1109/tnnls.2017.2654539.

6

Karthikeyan, P., Wei-Lun Chen, and Pao-Ann Hsiung. "Autonomous Intersection Management by Using Reinforcement Learning." Algorithms 15, no. 9 (September 13, 2022): 326. http://dx.doi.org/10.3390/a15090326.

Abstract:
Developing a safer and more effective intersection-control system is essential given the trends of rising populations and vehicle numbers. Additionally, as vehicle communication and self-driving technologies evolve, we may create a more intelligent control system to reduce traffic accidents. We recommend deep reinforcement learning-inspired autonomous intersection management (DRLAIM) to improve traffic environment efficiency and safety. The three primary models used in this methodology are the priority-assignment model, the intersection-control learning model, and brake-safe control. The brake-safe control module is utilized to make sure that each vehicle travels safely, and we train the system to acquire an effective model by using reinforcement learning. We have simulated our proposed method by using the Simulation of Urban MObility (SUMO) tool. Experimental results show that our approach outperforms the traditional method.
7

Mazouchi, Majid, Subramanya Nageshrao, and Hamidreza Modares. "Conflict-Aware Safe Reinforcement Learning: A Meta-Cognitive Learning Framework." IEEE/CAA Journal of Automatica Sinica 9, no. 3 (March 2022): 466–81. http://dx.doi.org/10.1109/jas.2021.1004353.

8

Cowen-Rivers, Alexander I., Daniel Palenicek, Vincent Moens, Mohammed Amin Abdullah, Aivar Sootla, Jun Wang, and Haitham Bou-Ammar. "SAMBA: safe model-based & active reinforcement learning." Machine Learning 111, no. 1 (January 2022): 173–203. http://dx.doi.org/10.1007/s10994-021-06103-6.

9

Serrano-Cuevas, Jonathan, Eduardo F. Morales, and Pablo Hernández-Leal. "Safe reinforcement learning using risk mapping by similarity." Adaptive Behavior 28, no. 4 (July 18, 2019): 213–24. http://dx.doi.org/10.1177/1059712319859650.

Abstract:
Reinforcement learning (RL) has been used to successfully solve sequential decision problems. However, considering risk during the learning process is an open research problem. In this work, we are interested in the type of risk that can lead to a catastrophic state. Related works that aim to deal with risk propose complex models. In contrast, we follow a simple, yet effective, idea: similar states might lead to similar risk. Using this idea, we propose risk mapping by similarity (RMS), an algorithm for discrete scenarios which infers the risk of newly discovered states by analyzing how similar they are to previously known risky states. In general terms, the RMS algorithm transfers the knowledge gathered by the agent regarding the risk to newly discovered states. We contribute a new approach to consider risk based on similarity, and with RMS an algorithm that is simple and generalizable as long as the premise that similar states yield similar risk holds. RMS is not an RL algorithm, but a method to generate a risk-aware reward-shaping signal that can be used with an RL algorithm to generate risk-aware policies.
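The "similar states yield similar risk" premise can be sketched as follows (purely illustrative; the 1-D state feature, Gaussian similarity kernel, and penalty weight are our assumptions, not details from the paper): estimate the risk of a new state as a similarity-weighted average over known risky states, then fold that estimate into a shaping signal.

```python
import math

def infer_risk(state, known_risks, sigma=1.0):
    """Similarity-weighted risk estimate for a new state.
    known_risks is a list of (state, risk) pairs already observed."""
    if not known_risks:
        return 0.0
    weights = [math.exp(-((state - s) ** 2) / (2 * sigma ** 2))
               for s, _ in known_risks]
    total = sum(weights)
    return sum(w * r for w, (_, r) in zip(weights, known_risks)) / total

def shaped_reward(reward, risk, penalty=10.0):
    # Risk-aware shaping: subtract a penalty proportional to inferred risk.
    return reward - penalty * risk
```

A state close to a known catastrophic state inherits most of its risk, while a state halfway between a risky and a safe state receives an intermediate estimate.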
10

Andersen, Per-Arne, Morten Goodwin, and Ole-Christoffer Granmo. "Towards safe reinforcement-learning in industrial grid-warehousing." Information Sciences 537 (October 2020): 467–84. http://dx.doi.org/10.1016/j.ins.2020.06.010.


Dissertations / Theses on the topic "Safe Reinforcement Learning"

1

Magnusson, Björn, and Måns Forslund. "SAFE AND EFFICIENT REINFORCEMENT LEARNING." Thesis, Örebro universitet, Institutionen för naturvetenskap och teknik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-76588.

Abstract:
Pre-programming a robot may be efficient to some extent, but since a human has coded the robot, it will only be as efficient as the programming. The problem can be solved by using machine learning, which lets the robot learn the most efficient way by itself. This thesis is a continuation of a previous work that covered the development of the framework Safe-To-Explore-State-Spaces (STESS) for safe robot manipulation. This thesis evaluates the efficiency of Q-Learning with normalized advantage function (NAF), a deep reinforcement learning algorithm, when integrated with the safety framework STESS. It does this by performing a 2D task where the robot moves the tooltip on a plane from point A to point B in a set workspace. To test the viability, different scenarios were presented to the robot: no obstacles, sphere obstacles, and cylinder obstacles. The reinforcement learning algorithm knew only the starting position, and STESS pre-defined the workspace, constraining the areas which the robot could not enter. By satisfying these constraints, the robot could explore and learn the most efficient way to complete its task. The results show that in simulation the NAF algorithm learns quickly and efficiently, while avoiding the obstacles without collision.
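The NAF form the thesis evaluates can be sketched in one dimension: the Q-function is decomposed as Q(s, a) = V(s) + A(s, a), with a quadratic advantage A(s, a) = -0.5 * p(s) * (a - mu(s))^2, so the greedy continuous action is simply mu(s). The toy closures below stand in for the learned networks (they are illustrative, not from the thesis).

```python
# Minimal 1-D sketch of the normalized advantage function (NAF) form:
# the quadratic advantage is always <= 0, so argmax_a Q(s, a) = mu(s).

def make_naf(v_fn, mu_fn, p_fn):
    def q(s, a):
        adv = -0.5 * p_fn(s) * (a - mu_fn(s)) ** 2
        return v_fn(s) + adv
    def greedy_action(s):
        return mu_fn(s)  # closed-form maximizer, no search needed
    return q, greedy_action

# Toy "networks": value 2s, preferred action 0.5s, positive curvature 1.0.
q, act = make_naf(lambda s: 2.0 * s, lambda s: 0.5 * s, lambda s: 1.0)
```

This closed-form maximization is what makes NAF practical for continuous-action control: the greedy action never requires an inner optimization loop.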
2

Mason, George. "Safe reinforcement learning using formally verified abstract policies." Thesis, University of York, 2018. http://etheses.whiterose.ac.uk/22450/.

Abstract:
Reinforcement learning (RL) is an artificial intelligence technique for finding optimal solutions for sequential decision-making problems modelled as Markov decision processes (MDPs). Objectives are represented as numerical rewards in the model, where positive values represent achievements and negative values represent failures. An autonomous agent explores the model to locate rewards with the goal of learning behaviour which will accumulate the largest reward possible. Despite RL successes in applications ranging from robotics and planning systems to sensing, it has so far had little appeal in mission- and safety-critical systems where unpredictable agent actions could lead to mission failure, risks to humans, the agent itself or other systems, or violations of legal requirements. This is due to the difficulty of encoding non-trivial requirements of agent behaviour through rewards alone. This thesis introduces assured reinforcement learning (ARL), a safe RL approach that restricts agent actions during and after learning. This restriction is based on formally verified policies synthesised for a high-level, abstract MDP that models the safety-relevant aspects of the RL problem. The resulting actions form overall solutions whose properties satisfy strict safety and optimality requirements. Next, ARL with knowledge revision is introduced, allowing ARL still to be used if the initial knowledge for generating action constraints proves to be incorrect. Additionally, two case studies are introduced to test the efficacy of ARL: the first is an adaptation of the benchmark flag-collection navigation task and the second is an assisted-living planning system. Finally, an architecture for runtime ARL is proposed to allow ARL to be utilised in real-time systems.
ARL is empirically evaluated and is shown to successfully satisfy strict safety and optimality requirements and, furthermore, with knowledge revision and action reuse, it can be successfully applied in environments where initial information may prove incomplete or incorrect.
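The restriction mechanism described above can be sketched as action masking (an illustrative reading, not the thesis' implementation; the abstraction function, allowed-action sets, and Q-table are our own stand-ins): a formally verified abstract policy yields, per abstract state, a set of permitted actions, and the learner may only choose among those.

```python
# Sketch of constraining a learner with a verified abstract policy:
# the learner's greedy choice is masked to the permitted action set.

def constrained_greedy(q, s, actions, allowed, abstract):
    """Pick the highest-valued action among those the verified abstract
    policy permits in the abstract state covering s (assumed non-empty)."""
    permitted = [a for a in actions if a in allowed[abstract(s)]]
    return max(permitted, key=lambda a: q.get((s, a), 0.0))
```

Even if the Q-values strongly favour a forbidden action, the mask keeps the learned behaviour within the verified safety envelope.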
3

Iakovidis, Grigorios. "Safe Reinforcement Learning for Remote Electrical Tilt Optimization." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-294161.

Abstract:
The adjustment of the vertical tilt angle of Base Station (BS) antennas, also known as Remote Electrical Tilt (RET) optimization, is a simple and efficient method of optimizing modern telecommunications networks. Reinforcement Learning (RL) is a machine learning framework that can solve complex problems like RET optimization due to its capability to learn from experience and adapt to dynamic environments. However, conventional RL methods involve trial-and-error processes which can result in short periods of poor network performance, which is unacceptable to mobile network operators. This unreliability has prevented RL solutions from being deployed in real-world mobile networks. In this thesis, we formulate the RET optimization problem as a Safe Reinforcement Learning (SRL) problem and attempt to train an RL policy that can offer performance improvement guarantees with respect to an existing baseline policy. We utilize a recent SRL method called Safe Policy Improvement with Baseline Bootstrapping (SPIBB) to improve over a baseline by training an RL agent on an offline dataset of environment interactions gathered by the baseline. We evaluate our solution using a simulated environment and show that it is effective at improving a tilt update policy in a safe manner, thus providing a more reliable RL solution to the RET optimization problem and potentially enabling future real-world deployment.
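The SPIBB principle can be sketched in tabular form (a hedged, illustrative rendering; the count threshold, dictionary API, and tie-breaking are our assumptions): the improved policy may deviate from the baseline only on state-action pairs that are well supported by the offline dataset; elsewhere it bootstraps, i.e. copies, the baseline's probabilities.

```python
# Toy SPIBB-style policy improvement: keep baseline mass on rarely
# observed actions, and put the remaining mass on the best-supported one.

def spibb_policy(q, baseline, counts, s, actions, n_min=5):
    """Return action probabilities for state s."""
    rare = [a for a in actions if counts.get((s, a), 0) < n_min]
    # Baseline probability is preserved on poorly supported actions...
    probs = {a: (baseline[(s, a)] if a in rare else 0.0) for a in actions}
    free_mass = 1.0 - sum(probs.values())
    # ...and the remaining mass goes to the best well-supported action.
    supported = [a for a in actions if a not in rare]
    if supported:
        best = max(supported, key=lambda a: q.get((s, a), 0.0))
        probs[best] += free_mass
    return probs
```

Note that an action with high estimated value but almost no support in the data keeps its baseline probability: this is exactly the safety mechanism that prevents the learner from trusting poorly estimated values.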
4

Geramifard, Alborz 1980. "Practical reinforcement learning using representation learning and safe exploration for large scale Markov decision processes." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/71455.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2012.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 157-168).
While creating intelligent agents that can solve stochastic sequential decision-making problems through interacting with the environment is the promise of Reinforcement Learning (RL), scaling existing RL methods to realistic domains such as planning for multiple unmanned aerial vehicles (UAVs) has remained a challenge due to three main factors: 1) RL methods often require a plethora of data to find reasonable policies, 2) the agent has limited computation time between interactions, and 3) while exploration is necessary to avoid convergence to local optima, in sensitive domains visiting all parts of the planning space may lead to catastrophic outcomes. To address the first two challenges, this thesis introduces incremental Feature Dependency Discovery (iFDD) as a representation expansion method with cheap per-timestep computational complexity that can be combined with any online, value-based reinforcement learning method using binary features. In addition to convergence and computational complexity guarantees, when coupled with SARSA, iFDD achieves much faster learning (i.e., requires far fewer data samples) in planning domains including two multi-UAV mission planning scenarios with hundreds of millions of state-action pairs. In particular, in a UAV mission planning domain, iFDD performed more than 12 times better than the best competitor given the same number of samples. The third challenge is addressed through a constructive relationship between a planner and a learner in order to mitigate the learning risk while boosting the asymptotic performance and safety of an agent's behavior. The framework is an instance of the intelligent cooperative control architecture in which a learner initially follows a safe policy generated by a planner. The learner incrementally improves this baseline policy through interaction, while avoiding behaviors believed to be risky. The new approach is demonstrated to be superior in two multi-UAV task assignment scenarios.
For example in one case, the proposed method reduced the risk by 8%, while improving the performance of the planner up to 30%.
by Alborz Geramifard.
Ph.D.
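The iFDD expansion rule described in the abstract can be sketched as follows (the relevance accumulator and threshold are our own illustrative reading, not the thesis' exact bookkeeping): accumulate TD error on conjunctions of simultaneously active binary features, and promote a conjunction to a new feature once its accumulated error crosses a threshold.

```python
from itertools import combinations

# Sketch of incremental Feature Dependency Discovery (iFDD): feature
# conjunctions whose accumulated TD error is large become new features.

def ifdd_step(active, td_error, relevance, features, threshold=1.0):
    """active: set of currently active base features for this timestep."""
    for pair in combinations(sorted(active), 2):
        if pair in features:
            continue
        relevance[pair] = relevance.get(pair, 0.0) + abs(td_error)
        if relevance[pair] >= threshold:
            features.add(pair)  # discovered a useful feature conjunction
    return features
```

Because only pairs of currently active features are touched, the per-timestep cost stays cheap, which is the property the thesis emphasizes.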
5

Heidenreich, Caroline. "Safe learning for control: Combining disturbance estimation, reachability analysis and reinforcement learning with systematic exploration." Thesis, KTH, Reglerteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-214080.

Abstract:
Learning to control an uncertain system is a problem with a plethora of applications in various engineering fields. In the majority of practical scenarios, one wishes that the learning process terminates quickly and does not violate safety limits on key variables. It is particularly appealing to learn the control policy directly from experiments, since this eliminates the need to first derive an accurate physical model of the system. The main challenge when using such an approach is to ensure safety constraints during the learning process. This thesis investigates an approach to safe learning that relies on a partly known state-space model of the system and regards the unknown dynamics as an additive bounded disturbance. Based on an initial conservative disturbance estimate, a safe set and the corresponding safe control are calculated using a Hamilton-Jacobi-Isaacs reachability analysis. Within the computed safe set, a variant of the celebrated Q-learning algorithm, which systematically explores the uncertain areas of the state space, is employed to learn a control policy. Whenever the system state hits the boundary of the safe set, a safety-preserving control is applied to bring the system back to safety. The initial disturbance range is updated on-line using Gaussian Process regression based on the measured data. This less conservative disturbance estimate is used to increase the size of the safe set. To the best of our knowledge, this thesis provides the first attempt towards combining these theoretical tools from reinforcement learning and reachability analysis for safe learning. We evaluate our approach on an inverted pendulum system. The proposed algorithm manages to learn a policy that does not violate the pre-specified safety constraints. We observe that performance is significantly improved when we incorporate systematic exploration to make sure that an optimal policy is learned everywhere in the safe set. Finally, we outline some promising directions for future research beyond the scope of this thesis.
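The switching logic at the heart of the loop above can be sketched in a few lines (the 1-D safe set, controllers, and dynamics below are illustrative stand-ins, not the thesis' reachability-based constructions): explore with the learning controller strictly inside the safe set, and hand control to the safety-preserving controller at the boundary.

```python
# Sketch of the safe-learning switching rule: learning controller in the
# interior of the safe set, safety-preserving controller at the boundary.

def safe_learning_step(x, in_safe_interior, explore_ctrl, safe_ctrl, dynamics):
    u = explore_ctrl(x) if in_safe_interior(x) else safe_ctrl(x)
    return dynamics(x, u), u

# Toy 1-D instantiation (all stand-ins): safe interior |x| < 1,
# exploring controller pushes right, safe controller drives back to 0.
interior = lambda x: abs(x) < 1.0
explore = lambda x: 0.5
safe_ctrl = lambda x: -x
dynamics = lambda x, u: x + u
```

In the thesis, `interior` would be derived from the Hamilton-Jacobi-Isaacs safe set and `safe_ctrl` from the corresponding optimal safety-preserving control; here they are placeholders that only illustrate the switching behaviour.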
6

Ohnishi, Motoya. "Safety-aware Adaptive Reinforcement Learning with Applications to Brushbot Navigation." Thesis, KTH, Reglerteknik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-226591.

Abstract:
This thesis presents a safety-aware learning framework that employs an adaptive model learning method together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique, and the resulting model will be used in combination with control barrier certificates which constrain feedback controllers only when safety is about to be violated. Under some mild assumptions, solutions to the constrained feedback-controller optimization are guaranteed to be globally optimal, and the monotonic improvement of a feedback controller is thus ensured. In addition, we reformulate the (action-)value function approximation to make any kernel-based nonlinear function estimation method applicable. We then employ a state-of-the-art kernel adaptive filtering technique for the (action-)value function approximation. The resulting framework is verified experimentally on a brushbot, whose dynamics is unknown and highly complex.
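The "constrain only when safety is about to be violated" behaviour of a control barrier certificate can be illustrated in one dimension (a hedged sketch under assumed choices: system x' = u, barrier h(x) = 1 - x^2, and a closed-form correction in place of the usual quadratic program): the nominal control passes through unchanged whenever it already satisfies dh/dx * u >= -alpha * h(x).

```python
# 1-D sketch of barrier-certificate filtering: modify the nominal control
# only when the barrier condition dh * u >= -alpha * h is about to fail.

def cbf_filter(x, u_nominal, alpha=1.0):
    h = 1.0 - x ** 2          # barrier: safe set is h(x) >= 0, i.e. |x| <= 1
    dh = -2.0 * x
    if dh * u_nominal >= -alpha * h:   # condition holds: leave control alone
        return u_nominal
    return -alpha * h / dh             # minimal correction on the boundary
```

Pushing outward near the boundary gets clipped, while pushing inward, or acting far from the boundary, is untouched, which is the minimally invasive behaviour the abstract describes.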
7

Ho, Chang-An, and 何長安. "Safe Reinforcement Learning based Sequential Perturbation Learning Algorithm." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/63234750154932788712.

Abstract:
Master's thesis, National Chiao Tung University, Department of Electrical and Control Engineering, 2008 (ROC year 97).
This article presents a sequential perturbation learning architecture based on safe reinforcement learning (SRL-SP), which uses the concept of linear search to apply perturbations to each weight value of the neural network. After the perturbations are applied, the values of the function for the pre-perturbation and post-perturbation networks are evaluated and compared in order to update the weights. Applying perturbations prevents the solution from falling into local optima and from oscillating in the solution space, which would decrease learning efficiency. In addition, within the reinforcement learning structure, Lyapunov design methods are used to set the learning objective and a pre-defined set of goal states. This greatly reduces the learning time; in other words, it can rapidly guide the plant's state into the goal state. The simulations use an n-mass inverted pendulum model to perform an experiment on a humanoid robot model, demonstrating that the proposed method learns more effectively.
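The perturb-evaluate-keep cycle described above can be sketched in plain Python (illustrative only; the evaluation function, step size, and accept-if-better rule are our assumptions, not the article's exact procedure): perturb each weight in turn and retain a perturbation only when the evaluated value improves.

```python
# Sketch of sequential weight perturbation: try +/- step on each weight
# and keep the change only if the evaluation improves.

def sequential_perturbation(weights, evaluate, step=0.1):
    best = evaluate(weights)
    for i in range(len(weights)):
        for delta in (step, -step):
            trial = list(weights)
            trial[i] += delta
            value = evaluate(trial)
            if value > best:          # keep the improving perturbation
                weights, best = trial, value
                break
    return weights, best
```

Repeating such sweeps, possibly with a shrinking step, moves the network out of flat or oscillatory regions that gradient-only updates can get stuck in.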
8

Everitt, Tom. "Towards Safe Artificial General Intelligence." Phd thesis, 2018. http://hdl.handle.net/1885/164227.

Abstract:
The field of artificial intelligence has recently experienced a number of breakthroughs thanks to progress in deep learning and reinforcement learning. Computer algorithms now outperform humans at Go, Jeopardy, image classification, and lip reading, and are becoming very competent at driving cars and interpreting natural language. The rapid development has led many to conjecture that artificial intelligence with greater-than-human ability on a wide range of tasks may not be far. This in turn raises concerns whether we know how to control such systems, in case we were to successfully build them. Indeed, if humanity would find itself in conflict with a system of much greater intelligence than itself, then human society would likely lose. One way to make sure we avoid such a conflict is to ensure that any future AI system with potentially greater-than-human-intelligence has goals that are aligned with the goals of the rest of humanity. For example, it should not wish to kill humans or steal their resources. The main focus of this thesis will therefore be goal alignment, i.e. how to design artificially intelligent agents with goals coinciding with the goals of their designers. Focus will mainly be directed towards variants of reinforcement learning, as reinforcement learning currently seems to be the most promising path towards powerful artificial intelligence. We identify and categorize goal misalignment problems in reinforcement learning agents as designed today, and give examples of how these agents may cause catastrophes in the future. We also suggest a number of reasonably modest modifications that can be used to avoid or mitigate each identified misalignment problem. Finally, we also study various choices of decision algorithms, and conditions for when a powerful reinforcement learning system will permit us to shut it down. 
The central conclusion is that while reinforcement learning systems as designed today are inherently unsafe to scale to human levels of intelligence, there are ways to potentially address many of these issues without straying too far from the currently so successful reinforcement learning paradigm. Much work remains in turning the high-level proposals suggested in this thesis into practical algorithms, however.
9

Jayant, Ashish. "Model-based Safe Deep Reinforcement Learning and Empirical Analysis of Safety via Attribution." Thesis, 2022. https://etd.iisc.ac.in/handle/2005/5849.

Abstract:
During the initial iterations of training in most Reinforcement Learning (RL) algorithms, agents perform a significant number of random exploratory steps, which in the real world limits the practicality of these algorithms, as it can lead to potentially dangerous behavior. Hence safe exploration is a critical issue in applying RL algorithms in the real world. This problem is well studied in the literature under the Constrained Markov Decision Process (CMDP) framework, where in addition to single-stage rewards, state transitions receive single-stage costs as well. The prescribed cost functions are responsible for mapping undesirable behavior at any given time-step to a scalar value. We then aim to find a feasible policy that maximizes reward returns and keeps cost returns below a prescribed threshold during training as well as deployment. We propose a novel on-policy model-based safe deep RL algorithm in which we learn the transition dynamics of the environment in an online manner and find a feasible optimal policy using Lagrangian relaxation-based Proximal Policy Optimization. This combination of transition-dynamics learning and a safety-promoting RL algorithm leads to 3-4 times fewer environment interactions and fewer cumulative hazard violations compared to the model-free approach. We use an ensemble of neural networks with different initializations to tackle the epistemic and aleatoric uncertainty issues faced during environment model learning. We present our results on a challenging safe reinforcement learning benchmark, the OpenAI Safety Gym. In addition, we perform an attribution analysis of the actions taken by the deep neural network-based policy at each time step. This analysis helps us to: 1. identify the feature in the state representation which is significantly responsible for the current action; and 2. empirically provide evidence of the safety-aware agent's ability to deal with hazards in the environment, provided that hazard information is present in the state representation. To perform the above analysis, we assume the state representation has meaningful information about hazards and goals. We then calculate an attribution vector of the same dimension as the state using a well-known attribution technique known as Integrated Gradients. The resultant attribution vector provides the importance of each state feature for the current action.
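Integrated Gradients, the attribution technique named in the abstract, is standard and can be sketched directly (the toy linear policy and finite-difference gradients below are our assumptions; real implementations use automatic differentiation): attribution_i = (x_i - b_i) times the path integral of the partial derivative of f along the straight line from a baseline b to the input x, approximated with a Riemann sum.

```python
# Integrated Gradients with finite-difference gradients (illustrative).
# For each state feature i: attr_i ≈ (x_i - b_i) * mean of df/dx_i
# evaluated at `steps` points on the straight path from b to x.

def integrated_gradients(f, x, baseline, steps=100, eps=1e-5):
    n = len(x)
    attr = [0.0] * n
    for k in range(1, steps + 1):
        point = [baseline[i] + (k / steps) * (x[i] - baseline[i])
                 for i in range(n)]
        for i in range(n):
            up = list(point); up[i] += eps
            dn = list(point); dn[i] -= eps
            grad = (f(up) - f(dn)) / (2 * eps)   # central finite difference
            attr[i] += grad * (x[i] - baseline[i]) / steps
    return attr
```

A useful sanity check is the completeness property: the attributions sum (approximately) to f(x) - f(baseline), so the per-feature importances account for the policy output's full change from the baseline.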
10

Hsu, Yung-Chi, and 徐永吉. "Improved Safe Reinforcement Learning Based Self Adaptive Evolutionary Algorithms for Neuro-Fuzzy Controller Design." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/43659775487135397105.

Abstract:
Doctoral dissertation, National Chiao Tung University, Department of Electrical and Control Engineering, 2008 (ROC year 97).
In this dissertation, improved safe reinforcement learning based self-adaptive evolutionary algorithms (ISRL-SAEAs) are proposed for TSK-type neuro-fuzzy controller design. The ISRL-SAEAs improve both the design of the reinforcement signal and traditional evolutionary algorithms. There are two parts in the proposed ISRL-SAEAs. In the first part, the SAEAs are proposed to solve the following problems: 1) all the fuzzy rules are encoded into one chromosome; 2) the number of fuzzy rules has to be assigned in advance; and 3) the population cannot evaluate each fuzzy rule locally. The second part of the ISRL-SAEAs is the ISRL. In the ISRL, two different strategies (judgment and evaluation) are used to design the reinforcement signal, and Lyapunov stability is also considered. To demonstrate the performance of the proposed method, the inverted pendulum control system and the tandem pendulum control system are presented. As shown in simulation, the ISRL-SAEAs perform better than other reinforcement evolution methods.
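One plausible reading of the two-strategy reinforcement signal can be sketched as follows (the specific judgment and evaluation rules here are our illustrative guesses, not the dissertation's design): a "judgment" strategy issues a hard penalty on failure, while an "evaluation" strategy grades admissible states, for instance by rewarding a Lyapunov-like energy decrease.

```python
# Sketch of a two-part reinforcement signal: judgment (hard failure
# penalty) plus evaluation (graded reward for energy decrease).

def reinforcement_signal(prev_energy, energy, failed):
    if failed:                 # judgment: catastrophic state detected
        return -1.0
    # evaluation: reward progress toward the goal (energy decreasing)
    return 1.0 if energy < prev_energy else 0.0
```

Coupling the evaluation term to a Lyapunov-style energy function is one way such a signal can steer the plant's state toward a pre-defined goal set quickly.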
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Safe Reinforcement Learning"

1

Trappenberg, Thomas P. Fundamentals of Machine Learning. Oxford University Press, 2019. http://dx.doi.org/10.1093/oso/9780198828044.001.0001.

Full text
Abstract:
Machine learning is exploding, both in research and in industrial applications. This book aims to be a brief introduction to the area, given the importance of the topic in many disciplines, from the sciences to engineering, and its broader impact on our society. The book adopts a style that balances brevity of explanation, rigor of mathematical argument, and the outlining of principal ideas. At the same time, it gives a comprehensive overview of a variety of methods and how they relate within this area, including an introduction to Bayesian approaches to modeling as well as deep learning. Writing small programs to apply machine learning techniques is made easy today by the availability of high-level programming systems; this book offers examples in Python with the machine learning libraries sklearn and Keras. The first four chapters concentrate largely on the practical side of applying machine learning techniques. The book then discusses more fundamental concepts and their formulation in a probabilistic context, followed by chapters on advanced models: recurrent neural networks and reinforcement learning. The book closes with a brief discussion of the impact of machine learning and AI on our society.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Safe Reinforcement Learning"

1

Zhang, Jianyi, and Paul Weng. "Safe Distributional Reinforcement Learning." In Lecture Notes in Computer Science, 107–28. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-94662-3_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Neufeld, Emery A., Ezio Bartocci, and Agata Ciabattoni. "On Normative Reinforcement Learning via Safe Reinforcement Learning." In PRIMA 2022: Principles and Practice of Multi-Agent Systems, 72–89. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-21203-1_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Fulton, Nathan, and André Platzer. "Verifiably Safe Off-Model Reinforcement Learning." In Tools and Algorithms for the Construction and Analysis of Systems, 413–30. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-17462-0_28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Bragg, John, and Ibrahim Habli. "What Is Acceptably Safe for Reinforcement Learning?" In Developments in Language Theory, 418–30. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-99229-7_35.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Bacci, Edoardo, and David Parker. "Probabilistic Guarantees for Safe Deep Reinforcement Learning." In Lecture Notes in Computer Science, 231–48. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-57628-8_14.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Cheng, Jiangchang, Fumin Yu, Hongliang Zhang, and Yinglong Dai. "Skill Reward for Safe Deep Reinforcement Learning." In Communications in Computer and Information Science, 203–13. Singapore: Springer Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0468-4_15.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Liu, Shaofan, and Shiliang Sun. "Safe Offline Reinforcement Learning Through Hierarchical Policies." In Advances in Knowledge Discovery and Data Mining, 380–91. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-05936-0_30.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Cohen, Max, and Calin Belta. "Safe Exploration in Model-Based Reinforcement Learning." In Adaptive and Learning-Based Control of Safety-Critical Systems, 133–63. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-29310-8_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Pecka, Martin, and Tomas Svoboda. "Safe Exploration Techniques for Reinforcement Learning – An Overview." In Modelling and Simulation for Autonomous Systems, 357–75. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-13823-7_31.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Cohen, Max, and Calin Belta. "Temporal Logic Guided Safe Model-Based Reinforcement Learning." In Adaptive and Learning-Based Control of Safety-Critical Systems, 165–92. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-29310-8_9.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Safe Reinforcement Learning"

1

Padakandla, Sindhu, Prabuchandran K. J, Sourav Ganguly, and Shalabh Bhatnagar. "Data Efficient Safe Reinforcement Learning." In 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2022. http://dx.doi.org/10.1109/smc53654.2022.9945313.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Isele, David, Alireza Nakhaei, and Kikuo Fujimura. "Safe Reinforcement Learning on Autonomous Vehicles." In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018. http://dx.doi.org/10.1109/iros.2018.8593420.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Calvo-Fullana, Miguel, Luiz F. O. Chamon, and Santiago Paternain. "Towards Safe Continuing Task Reinforcement Learning." In 2021 American Control Conference (ACC). IEEE, 2021. http://dx.doi.org/10.23919/acc50511.2021.9482748.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Jia, Yan, John Burden, Tom Lawton, and Ibrahim Habli. "Safe Reinforcement Learning for Sepsis Treatment." In 2020 IEEE International Conference on Healthcare Informatics (ICHI). IEEE, 2020. http://dx.doi.org/10.1109/ichi48887.2020.9374367.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Yang, Tsung-Yen, Tingnan Zhang, Linda Luu, Sehoon Ha, Jie Tan, and Wenhao Yu. "Safe Reinforcement Learning for Legged Locomotion." In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022. http://dx.doi.org/10.1109/iros47612.2022.9982038.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kim, Dohyeong, Jaeseok Heo, and Songhwai Oh. "SafeTAC: Safe Tsallis Actor-Critic Reinforcement Learning for Safer Exploration." In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022. http://dx.doi.org/10.1109/iros47612.2022.9982140.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Yang, Wen-Chi, Giuseppe Marra, Gavin Rens, and Luc De Raedt. "Safe Reinforcement Learning via Probabilistic Logic Shields." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/637.

Full text
Abstract:
Safe Reinforcement Learning (Safe RL) aims at learning optimal policies while staying safe. A popular solution to Safe RL is shielding, which uses a logical safety specification to prevent an RL agent from taking unsafe actions. However, traditional shielding techniques are difficult to integrate with continuous, end-to-end deep RL methods. To this end, we introduce Probabilistic Logic Policy Gradient (PLPG). PLPG is a model-based Safe RL technique that uses probabilistic logic programming to model logical safety constraints as differentiable functions. Therefore, PLPG can be seamlessly applied to any policy gradient algorithm while still providing the same convergence guarantees. In our experiments, we show that PLPG learns safer and more rewarding policies compared to other state-of-the-art shielding techniques.
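The core intuition behind probabilistic shielding can be shown in a few lines. The sketch below simply reweights a policy's action distribution by per-action safety probabilities and renormalizes; it is a simplified stand-in for the general idea, not PLPG's actual differentiable logic layer, and the numbers are hypothetical.

```python
import numpy as np

def shielded_policy(pi, p_safe):
    """Reweight a policy distribution by per-action safety probabilities.

    Actions are weighted by how likely they are to be safe, then the
    result is renormalized into a proper distribution.
    """
    pi, p_safe = np.asarray(pi, float), np.asarray(p_safe, float)
    w = pi * p_safe
    return w / w.sum()

pi = np.array([0.6, 0.3, 0.1])      # task policy over three actions
p_safe = np.array([0.1, 0.9, 0.9])  # hypothetical safety estimates
shielded = shielded_policy(pi, p_safe)
# probability mass shifts away from the likely-unsafe action 0
```

Because the reweighting is a smooth function of the policy probabilities, a layer like this can sit inside a policy gradient computation, which is what makes the shielded policy end-to-end trainable.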
APA, Harvard, Vancouver, ISO, and other styles
8

Rahman, Md Asifur, Tongtong Liu, and Sarra Alqahtani. "Adversarial Behavior Exclusion for Safe Reinforcement Learning." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/54.

Full text
Abstract:
Learning by exploration makes reinforcement learning (RL) potentially attractive for many real-world applications. However, this learning process makes RL inherently too vulnerable to be used in real-world applications where safety is of utmost importance. Most prior studies consider exploration at odds with safety and thereby restrict it using either joint optimization of task and safety or imposing constraints for safe exploration. This paper migrates from the current convention to using exploration as a key to safety by learning safety as a robust behavior that completely excludes any behavioral pattern responsible for safety violations. Adversarial Behavior Exclusion for Safe RL (AdvEx-RL) learns a behavioral representation of the agent's safety violations by approximating an optimal adversary utilizing exploration and later uses this representation to learn a separate safety policy that excludes those unsafe behaviors. In addition, AdvEx-RL ensures safety in a task-agnostic manner by acting as a safety firewall and therefore can be integrated with any RL task policy. We demonstrate the robustness of AdvEx-RL via comprehensive experiments in standard constrained Markov decision processes (CMDP) environments under 2 white-box action space perturbations as well as with changes in environment dynamics against 7 baselines. Consistently, AdvEx-RL outperforms the baselines by achieving an average safety performance of over 75% in the continuous action space with 10 times more variations in the testing environment dynamics. By using a standalone safety policy independent of conflicting objectives, AdvEx-RL also paves the way for interpretable safety behavior analysis as we show in our user study.
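The "safety firewall" idea in the abstract above, a standalone safety policy that overrides any task policy in risky states, can be sketched as follows. All names and the toy corridor environment are illustrative, not the paper's API.

```python
def firewall_action(state, task_policy, safety_policy, is_risky):
    """Task-agnostic safety firewall in the spirit of AdvEx-RL.

    The task policy proposes an action; whenever the current state is
    flagged as risky, the separate safety policy overrides it.
    """
    if is_risky(state):
        return safety_policy(state)
    return task_policy(state)

# Toy 1-D corridor: the task policy always pushes right; the safety
# policy backs away once the agent is within one step of the wall at 10.
task_policy = lambda s: +1
safety_policy = lambda s: -1
is_risky = lambda s: s >= 9
actions = [firewall_action(s, task_policy, safety_policy, is_risky)
           for s in range(8, 11)]  # states 8, 9, 10 -> [1, -1, -1]
```

Because the firewall only intercepts at execution time, the same safety policy can be reused with any task policy, which is the task-agnostic property the abstract emphasizes.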
APA, Harvard, Vancouver, ISO, and other styles
9

Umemoto, Takumi, Tohgoroh Matsui, Atsuko Mutoh, Koichi Moriyama, and Nobuhiro Inuzuka. "Safe Reinforcement Learning in Continuous State Spaces." In 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE). IEEE, 2019. http://dx.doi.org/10.1109/gcce46687.2019.9014637.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Yang, Yongliang, Kyriakos G. Vamvoudakis, Hamidreza Modares, Wei He, Yixin Yin, and Donald C. Wunsch. "Safe Intermittent Reinforcement Learning for Nonlinear Systems." In 2019 IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019. http://dx.doi.org/10.1109/cdc40024.2019.9030210.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Safe Reinforcement Learning"

1

Miles, Gaines E., Yael Edan, F. Tom Turpin, Avshalom Grinstein, Thomas N. Jordan, Amots Hetzroni, Stephen C. Weller, Marvin M. Schreiber, and Okan K. Ersoy. Expert Sensor for Site Specification Application of Agricultural Chemicals. United States Department of Agriculture, August 1995. http://dx.doi.org/10.32747/1995.7570567.bard.

Full text
Abstract:
In this work, multispectral reflectance images are used in conjunction with a neural network classifier to detect and classify weeds under real field conditions. Multispectral reflectance images containing different combinations of weeds and crops were taken under actual field conditions. This multispectral reflectance information was used to develop algorithms that could segment the plants from the background as well as classify them into weeds or crops. To segment the plants from the background, the multispectral reflectance of plants and background was studied and a relationship was derived. It was found that using a ratio of two wavelength reflectance images (750 nm and 670 nm) it was possible to segment the plants from the background. Once this was accomplished, it was possible to classify the segmented images into weed or crop using the neural network. The neural network developed for this work is a modification of the standard learning vector quantization algorithm, obtained by replacing the time-varying adaptation gain with a constant adaptation gain and a binary reinforcement function. This improved accuracy and training time and introduced several new properties, such as hill climbing and momentum addition. The network was trained and tested with different wavelength combinations to find the best results. Finally, the results of the classifier were evaluated using a pixel-based method and a block-based method. In the pixel-based method, every single pixel is evaluated to test whether it was classified correctly or not; the best weed classification result was 81%, with an associated crop classification accuracy of 57%. In the block-based classification method, the image was divided into blocks and each block was evaluated to determine whether it contained weeds or not. Different block sizes and thresholds were tested. The best results for this method were 97% for a block size of 8 inches and a pixel threshold of 60. A simulation model was developed to 1) quantify the effectiveness of a site-specific sprayer and 2) evaluate the influence of different design parameters on the efficiency of the site-specific sprayer. In each iteration of this model, infected areas (weed patches) in the field were randomly generated and the amount of herbicide required to spray these areas was calculated. The effectiveness of the sprayer was estimated for different stain sizes, nozzle types (conic and flat), nozzle sizes, and stain detection levels of the identification system. Simulation results indicated that the flat nozzle is much more effective than the conic nozzle, and its relative efficiency is greater for small nozzle sizes. By using a site-specific sprayer, the average ratio between the sprayed areas and the stain areas is about 1.1 to 1.8, which can save up to 92% of herbicides, especially when the proportion of the stain areas is small.
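The modified learning-vector-quantization update the report describes (constant adaptation gain plus a binary reinforcement signal) can be sketched as a single training step. The prototype values, labels, and gain below are invented for illustration.

```python
import numpy as np

def lvq_update(prototypes, labels, x, y, gain=0.05):
    """One LVQ-style update with a constant adaptation gain and a
    binary reinforcement signal, echoing the report's modification of
    standard learning vector quantization.

    The nearest prototype moves toward the sample when its label
    matches y (reinforcement +1) and away from it otherwise (-1).
    Returns the index of the winning prototype.
    """
    i = int(np.argmin(np.linalg.norm(prototypes - x, axis=1)))
    r = 1.0 if labels[i] == y else -1.0
    prototypes[i] += gain * r * (x - prototypes[i])
    return i

prototypes = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = ["weed", "crop"]
winner = lvq_update(prototypes, labels, np.array([0.1, 0.1]), "weed")
# winner == 0, and prototypes[0] has moved toward the sample
```

Keeping the gain constant (instead of decaying it over time) is what the report credits for the faster training and the hill-climbing behavior it mentions.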
APA, Harvard, Vancouver, ISO, and other styles
