Academic literature on the topic 'Safe RL'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Safe RL.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Safe RL"

1

Carr, Steven, Nils Jansen, Sebastian Junges, and Ufuk Topcu. "Safe Reinforcement Learning via Shielding under Partial Observability." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 14748–56. http://dx.doi.org/10.1609/aaai.v37i12.26723.

Abstract:
Safe exploration is a common problem in reinforcement learning (RL) that aims to prevent agents from making disastrous decisions while exploring their environment. A family of approaches to this problem assume domain knowledge in the form of a (partial) model of this environment to decide upon the safety of an action. A so-called shield forces the RL agent to select only safe actions. However, for adoption in various applications, one must look beyond enforcing safety and also ensure the applicability of RL with good performance. We extend the applicability of shields via tight integration with state-of-the-art deep RL, and provide an extensive, empirical study in challenging, sparse-reward environments under partial observability. We show that a carefully integrated shield ensures safety and can improve the convergence rate and final performance of RL agents. We furthermore show that a shield can be used to bootstrap state-of-the-art RL agents: they remain safe after initial learning in a shielded setting, allowing us to disable a potentially too conservative shield eventually.
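To make the shielding idea above concrete, here is a minimal sketch (not the paper's implementation) of how a shield can restrict an agent to safe actions at decision time; the mask and fallback rule are assumptions for illustration.

```python
import numpy as np

def shielded_action(q_values, safe_action_mask):
    """Pick the best action among those a model-based shield certifies as safe.

    `q_values` and `safe_action_mask` are illustrative names, not from the paper:
    the mask would come from a (partial) environment model over belief states.
    """
    safe_idx = np.flatnonzero(safe_action_mask)
    if safe_idx.size == 0:
        # No certified-safe action: this sketch simply falls back to the greedy action.
        return int(np.argmax(q_values))
    return int(safe_idx[np.argmax(q_values[safe_idx])])

# Example: four actions, the shield forbids actions 1 and 3.
print(shielded_action(np.array([0.2, 0.9, 0.4, 0.8]),
                      np.array([True, False, True, False])))  # -> 2
```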
2

Ma, Yecheng Jason, Andrew Shen, Osbert Bastani, and Jayaraman Dinesh. "Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 5 (June 28, 2022): 5404–12. http://dx.doi.org/10.1609/aaai.v36i5.20478.

Abstract:
Reinforcement Learning (RL) agents in the real world must satisfy safety constraints in addition to maximizing a reward objective. Model-based RL algorithms hold promise for reducing unsafe real-world actions: they may synthesize policies that obey all constraints using simulated samples from a learned model. However, imperfect models can result in real-world constraint violations even for actions that are predicted to satisfy all constraints. We propose Conservative and Adaptive Penalty (CAP), a model-based safe RL framework that accounts for potential modeling errors by capturing model uncertainty and adaptively exploiting it to balance the reward and the cost objectives. First, CAP inflates predicted costs using an uncertainty-based penalty. Theoretically, we show that policies that satisfy this conservative cost constraint are guaranteed to also be feasible in the true environment. We further show that this guarantees the safety of all intermediate solutions during RL training. Further, CAP adaptively tunes this penalty during training using true cost feedback from the environment. We evaluate this conservative and adaptive penalty-based approach for model-based safe RL extensively on state and image-based environments. Our results demonstrate substantial gains in sample-efficiency while incurring fewer violations than prior safe RL algorithms. Code is available at: https://github.com/Redrew/CAP
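As a rough illustration of the conservative-and-adaptive penalty described above (a sketch under assumed names, not the authors' code), the predicted cost is inflated by an uncertainty term whose weight is tuned from true cost feedback:

```python
def conservative_cost(predicted_cost, model_uncertainty, kappa):
    # Inflate the model-predicted cost by an uncertainty-scaled penalty, so that
    # satisfying the inflated constraint is conservative w.r.t. the true cost.
    return predicted_cost + kappa * model_uncertainty

def adapt_kappa(kappa, observed_episode_cost, cost_budget, step_size=0.01):
    # Adaptive tuning from true cost feedback: grow the penalty when the
    # environment cost overshoots the budget, shrink it (never below zero) otherwise.
    return max(0.0, kappa + step_size * (observed_episode_cost - cost_budget))

# Example: planning would see a cost of 1.0 + 2.0 * 0.3 = 1.6 instead of 1.0.
print(conservative_cost(1.0, 0.3, kappa=2.0))
print(adapt_kappa(2.0, observed_episode_cost=5.0, cost_budget=4.0))  # -> 2.01
```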
3

Xu, Haoran, Xianyuan Zhan, and Xiangyu Zhu. "Constraints Penalized Q-learning for Safe Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8753–60. http://dx.doi.org/10.1609/aaai.v36i8.20855.

Abstract:
We study the problem of safe offline reinforcement learning (RL), in which the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is more appealing for real-world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potentially large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that naïve approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly across a variety of benchmark control tasks, outperforming several baselines.
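A toy, discrete-action sketch of the constraint-penalizing idea (names and the zero fallback are assumptions; CPQ itself works with learned critics and out-of-distribution penalties):

```python
import numpy as np

def constraints_penalized_target(reward, next_q_reward, next_q_cost, cost_limit, gamma=0.99):
    """One-step bootstrap target that ignores constraint-violating next actions.

    Actions whose estimated cost value exceeds `cost_limit` are excluded from the
    maximisation; if no action is admissible the sketch bootstraps with zero.
    """
    admissible = next_q_cost <= cost_limit
    bootstrap = np.max(next_q_reward[admissible]) if admissible.any() else 0.0
    return reward + gamma * bootstrap

print(constraints_penalized_target(1.0,
                                   next_q_reward=np.array([5.0, 3.0]),
                                   next_q_cost=np.array([2.0, 0.5]),
                                   cost_limit=1.0))  # bootstraps from 3.0, not 5.0
```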
4

Thananjeyan, Brijen, Ashwin Balakrishna, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, and Ken Goldberg. "Recovery RL: Safe Reinforcement Learning With Learned Recovery Zones." IEEE Robotics and Automation Letters 6, no. 3 (July 2021): 4915–22. http://dx.doi.org/10.1109/lra.2021.3070252.

5

Serrano-Cuevas, Jonathan, Eduardo F. Morales, and Pablo Hernández-Leal. "Safe reinforcement learning using risk mapping by similarity." Adaptive Behavior 28, no. 4 (July 18, 2019): 213–24. http://dx.doi.org/10.1177/1059712319859650.

Abstract:
Reinforcement learning (RL) has been used to successfully solve sequential decision problems. However, considering risk at the same time as the learning process is an open research problem. In this work, we are interested in the type of risk that can lead to a catastrophic state. Related works that aim to deal with risk propose complex models. In contrast, we follow a simple, yet effective, idea: similar states might lead to similar risk. Using this idea, we propose risk mapping by similarity (RMS), an algorithm for discrete scenarios which infers the risk of newly discovered states by analyzing how similar they are to previously known risky states. In general terms, the RMS algorithm transfers the knowledge gathered by the agent regarding the risk to newly discovered states. We contribute with a new approach to consider risk based on similarity and with RMS, which is simple and generalizable as long as the premise that similar states yield similar risk holds. RMS is not an RL algorithm, but a method to generate a risk-aware reward shaping signal that can be used with an RL algorithm to generate risk-aware policies.
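The 'similar states yield similar risk' premise can be sketched with a simple similarity kernel (the RBF choice, names, and shaping form are assumptions of this illustration, not the paper's exact algorithm):

```python
import numpy as np

def similarity_risk(state, risky_states, length_scale=1.0):
    # Risk of a newly discovered state = highest RBF similarity to any state
    # previously associated with a catastrophic outcome.
    if not risky_states:
        return 0.0
    sq_dists = [np.sum((np.asarray(state) - np.asarray(s)) ** 2) for s in risky_states]
    return float(np.exp(-min(sq_dists) / (2.0 * length_scale ** 2)))

def risk_aware_reward(reward, state, risky_states, risk_weight=1.0):
    # RMS-style shaping signal that can be combined with any RL algorithm.
    return reward - risk_weight * similarity_risk(state, risky_states)

print(risk_aware_reward(1.0, state=[0.1, 0.0], risky_states=[[0.0, 0.0], [5.0, 5.0]]))
```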
6

Cheng, Richard, Gábor Orosz, Richard M. Murray, and Joel W. Burdick. "End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3387–95. http://dx.doi.org/10.1609/aaai.v33i01.33013387.

Abstract:
Reinforcement Learning (RL) algorithms have found limited success beyond simulated applications, and one main reason is the absence of safety guarantees during the learning process. Real-world systems would realistically fail or break before an optimal controller can be learned. To address this issue, we propose a controller architecture that combines (1) a model-free RL-based controller with (2) model-based controllers utilizing control barrier functions (CBFs) and (3) online learning of the unknown system dynamics, in order to ensure safety during learning. Our general framework leverages the success of RL algorithms to learn high-performance controllers, while the CBF-based controllers both guarantee safety and guide the learning process by constraining the set of explorable policies. We utilize Gaussian Processes (GPs) to model the system dynamics and its uncertainties. Our novel controller synthesis algorithm, RL-CBF, guarantees safety with high probability during the learning process, regardless of the RL algorithm used, and demonstrates greater policy exploration efficiency. We test our algorithm on (1) control of an inverted pendulum and (2) autonomous car-following with wireless vehicle-to-vehicle communication, and show that our algorithm attains much greater sample efficiency in learning than other state-of-the-art algorithms and maintains safety during the entire learning process.
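A minimal one-dimensional sketch of a CBF-style safety filter layered on top of an RL action (the dynamics, barrier, and gains are illustrative assumptions; RL-CBF itself uses GP dynamics models and solves a small optimization at each step):

```python
def cbf_filtered_action(x, u_rl, x_max=1.0, alpha=0.2, dt=0.1):
    """Minimally modify the RL action so a discrete-time barrier condition holds.

    Toy system x_{t+1} = x_t + dt * u with barrier h(x) = x_max - x >= 0.
    Requiring h(x_{t+1}) >= (1 - alpha) * h(x_t) is equivalent to
    u <= alpha * (x_max - x) / dt, so the 'filter' is a one-sided clamp here.
    """
    u_safe_bound = alpha * (x_max - x) / dt
    return min(u_rl, u_safe_bound)

print(cbf_filtered_action(x=0.9, u_rl=1.0))  # clipped to 0.2 to stay behind the barrier
```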
7

Jurj, Sorin Liviu, Dominik Grundt, Tino Werner, Philipp Borchers, Karina Rothemann, and Eike Möhlmann. "Increasing the Safety of Adaptive Cruise Control Using Physics-Guided Reinforcement Learning." Energies 14, no. 22 (November 12, 2021): 7572. http://dx.doi.org/10.3390/en14227572.

Abstract:
This paper presents a novel approach for improving the safety of vehicles equipped with Adaptive Cruise Control (ACC) by making use of Machine Learning (ML) and physical knowledge. More exactly, we train a Soft Actor-Critic (SAC) Reinforcement Learning (RL) algorithm that makes use of physical knowledge such as the jam-avoiding distance in order to automatically adjust the ideal longitudinal distance between the ego- and leading-vehicle, resulting in a safer solution. In our use case, the experimental results indicate that the physics-guided (PG) RL approach is better at avoiding collisions at any selected deceleration level and any fleet size when compared to a pure RL approach, proving that a physics-informed ML approach is more reliable when developing safe and efficient Artificial Intelligence (AI) components in autonomous vehicles (AVs).
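One way to picture the physics-guided shaping described above is a reward that penalizes deviation from a jam-avoiding following distance; the distance formula and weights below are assumptions for illustration, not the paper's exact terms:

```python
def jam_avoiding_distance(v_ego, time_gap=1.5, standstill_gap=2.0):
    # Assumed form: constant standstill gap plus a speed-proportional time gap (metres).
    return standstill_gap + time_gap * v_ego

def physics_guided_reward(gap, v_ego, collided):
    # Penalise collisions heavily and otherwise penalise deviation from the
    # jam-avoiding distance; an SAC agent would maximise this shaped signal.
    if collided:
        return -100.0
    return -abs(gap - jam_avoiding_distance(v_ego))

print(physics_guided_reward(gap=20.0, v_ego=10.0, collided=False))  # -> -3.0
```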
8

Sakrihei, Helen. "Using automatic storage for ILL – experiences from the National Repository Library in Norway." Interlending & Document Supply 44, no. 1 (February 15, 2016): 14–16. http://dx.doi.org/10.1108/ilds-11-2015-0035.

Abstract:
Purpose – The purpose of this paper is to share the Norwegian Repository Library (RL)’s experiences with an automatic storage for interlibrary lending (ILL). Design/methodology/approach – This paper describes how the RL uses the automatic storage to deliver ILL services to Norwegian libraries. Chaos storage is the main principle for storage. Findings – Using automatic storage for ILL is efficient, cost-effective and safe. Originality/value – The RL has used automatic storage since 2003, and it is one of a few libraries using this technology.
9

Ding, Yuhao, and Javad Lavaei. "Provably Efficient Primal-Dual Reinforcement Learning for CMDPs with Non-stationary Objectives and Constraints." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 7396–404. http://dx.doi.org/10.1609/aaai.v37i6.25900.

Abstract:
We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, which plays a central role in ensuring the safety of RL in time-varying environments. In this problem, the reward/utility functions and the state transition functions are both allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain known variation budgets. Designing safe RL algorithms in time-varying environments is particularly challenging because of the need to integrate constraint violation reduction, safe exploration, and adaptation to the non-stationarity. To this end, we identify two alternative conditions on the time-varying constraints under which we can guarantee safety in the long run. We also propose the Periodically Restarted Optimistic Primal-Dual Proximal Policy Optimization (PROPD-PPO) algorithm that can coordinate with both conditions. Furthermore, a dynamic regret bound and a constraint violation bound are established for the proposed algorithm in both the linear kernel CMDP function approximation setting and the tabular CMDP setting under the two alternative conditions. This paper provides the first provably efficient algorithm for non-stationary CMDPs with safe exploration.
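The primal-dual mechanics referenced above can be sketched as a projected dual-ascent update on a Lagrange multiplier plus a periodic restart to handle non-stationarity (variable names and the restart rule are illustrative, not the paper's exact algorithm):

```python
def dual_update(lam, avg_episode_cost, cost_budget, step_size=0.05):
    # Projected dual ascent: raise the multiplier when the constraint is violated,
    # lower it (never below zero) when there is slack.
    return max(0.0, lam + step_size * (avg_episode_cost - cost_budget))

def should_restart(episode, restart_period):
    # Periodic restart: forget stale optimistic estimates so the algorithm can
    # track objectives and constraints that drift over time.
    return episode % restart_period == 0

print(dual_update(lam=0.5, avg_episode_cost=3.0, cost_budget=2.0))  # -> 0.55
print(should_restart(episode=200, restart_period=100))              # -> True
```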
10

Tubeuf, Carlotta, Felix Birkelbach, Anton Maly, and René Hofmann. "Increasing the Flexibility of Hydropower with Reinforcement Learning on a Digital Twin Platform." Energies 16, no. 4 (February 11, 2023): 1796. http://dx.doi.org/10.3390/en16041796.

Abstract:
The increasing demand for flexibility in hydropower systems requires pumped storage power plants to change operating modes and compensate reactive power more frequently. In this work, we demonstrate the potential of applying reinforcement learning (RL) to control the blow-out process of a hydraulic machine during pump start-up and when operating in synchronous condenser mode. Even though RL is a promising method that is currently getting much attention, safety concerns are stalling research on RL for the control of energy systems. Therefore, we present a concept that enables process control with RL through the use of a digital twin platform. This enables the safe and effective transfer of the algorithm’s learning strategy from a virtual test environment to the physical asset. The successful implementation of RL in a test environment is presented and an outlook on future research on the transfer to a model test rig is given.

Dissertations / Theses on the topic "Safe RL"

1

Gowda, Malali, R. C. Venu, Mohan Raghupathy, Kan Nobuta, Huameng Li, Rod Wing, Eric Stahlberg, et al. "Deep and comparative analysis of the mycelium and appressorium transcriptomes of Magnaporthe grisea using MPSS, RL-SAGE, and oligoarray methods." BioMed Central, 2006. http://hdl.handle.net/10150/610397.

Abstract:
BACKGROUND: Rice blast, caused by the fungal pathogen Magnaporthe grisea, is a devastating disease causing tremendous yield loss in rice production. The public availability of the complete genome sequence of M. grisea provides ample opportunities to understand the molecular mechanism of its pathogenesis on rice plants at the transcriptome level. To identify all the expressed genes encoded in the fungal genome, we have analyzed the mycelium and appressorium transcriptomes using massively parallel signature sequencing (MPSS), robust-long serial analysis of gene expression (RL-SAGE) and oligoarray methods. RESULTS: The MPSS analyses identified 12,531 and 12,927 distinct significant tags from mycelia and appressoria, respectively, while the RL-SAGE analysis identified 16,580 distinct significant tags from the mycelial library. When matching these 12,531 mycelial and 12,927 appressorial significant tags to the annotated CDS, 500 bp upstream and 500 bp downstream of CDS, 6,735 unique genes in mycelia and 7,686 unique genes in appressoria were identified. A total of 7,135 mycelium-specific and 7,531 appressorium-specific significant MPSS tags were identified, which correspond to 2,088 and 1,784 annotated genes, respectively, when matching to the same set of reference sequences. Nearly 85% of the significant MPSS tags from mycelia and appressoria and 65% of the significant tags from the RL-SAGE mycelium library matched to the M. grisea genome. MPSS and RL-SAGE methods supported the expression of more than 9,000 genes, representing over 80% of the predicted genes in M. grisea. About 40% of the MPSS tags and 55% of the RL-SAGE tags represent novel transcripts since they had no matches in the existing M. grisea EST collections. Over 19% of the annotated genes were found to produce both sense and antisense tags in the protein-coding region. The oligoarray analysis identified the expression of 3,793 mycelium-specific and 4,652 appressorium-specific genes. A total of 2,430 mycelial genes and 1,886 appressorial genes were identified by both MPSS and oligoarray. CONCLUSION: The comprehensive and deep transcriptome analysis by MPSS and RL-SAGE methods identified many novel sense and antisense transcripts in the M. grisea genome at two important growth stages. The differentially expressed transcripts that were identified, especially those specifically expressed in appressoria, represent a genomic resource useful for gaining a better understanding of the molecular basis of M. grisea pathogenicity. Further analysis of the novel antisense transcripts will provide new insights into the regulation and function of these genes in fungal growth, development and pathogenesis in the host plants.

Book chapters on the topic "Safe RL"

1

Lenka, Lalu Prasad, and Mélanie Bouroche. "Safe Lane-Changing in CAVs Using External Safety Supervisors: A Review." In Communications in Computer and Information Science, 527–38. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_41.

Abstract:
Connected autonomous vehicles (CAVs) can exploit information received from other vehicles in addition to their sensor information to make decisions. For this reason, their deployment is expected to improve traffic safety and efficiency. Safe lane-changing is a significant challenge for CAVs, particularly in mixed traffic, i.e. with human-driven vehicles (HDVs) on the road, as the set of vehicles around them varies very quickly, and they can only communicate with a fraction of them. Many approaches have been proposed, with most recent work adopting a multi-agent reinforcement learning (MARL) approach, but those do not provide safety guarantees making them unsuitable for such a safety-critical application. A number of external safety techniques for reinforcement learning have been proposed, such as shielding, control barrier functions, model predictive control and recovery RL, but those have not been applied to CAV lane changing. This paper investigates whether external safety supervisors could be used to provide safety guarantees for MARL-based CAV lane changing (LC-CAV). For this purpose, a MARL approach to CAV lane changing (MARL-CAV) is designed, using parameter sharing and a replay buffer to motivate cooperative behaviour and collaboration among CAVs. This is then used as a baseline to discuss the applicability of the state-of-the-art external safety techniques for reinforcement learning to MARL-CAV. Comprehensive analysis shows that integrating an external safety technique to MARL for lane changing in CAVs is challenging, and none of the existing external safety techniques can be directly applied to MARL-CAV as these safety techniques require prior knowledge of unsafe states and recovery policies.
2

Gowda, Malali, and Guo-Liang Wang. "Robust-LongSAGE (RL-SAGE)." In Methods in Molecular Biology, 25–38. Totowa, NJ: Humana Press, 2008. http://dx.doi.org/10.1007/978-1-59745-454-4_2.

3

Piepenbrock, Jelle, Tom Heskes, Mikoláš Janota, and Josef Urban. "Guiding an Automated Theorem Prover with Neural Rewriting." In Automated Reasoning, 597–617. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-10769-6_35.

Abstract:
Automated theorem provers (ATPs) are today used to attack open problems in several areas of mathematics. An ongoing project by Kinyon and Veroff uses Prover9 to search for the proof of the Abelian Inner Mapping (AIM) Conjecture, one of the top open conjectures in quasigroup theory. In this work, we improve Prover9 on a benchmark of AIM problems by neural synthesis of useful alternative formulations of the goal. In particular, we design the 3SIL (stratified shortest solution imitation learning) method. 3SIL trains a neural predictor through a reinforcement learning (RL) loop to propose correct rewrites of the conjecture that guide the search. 3SIL is first developed on a simpler, Robinson arithmetic rewriting task for which the reward structure is similar to theorem proving. There we show that 3SIL outperforms other RL methods. Next we train 3SIL on the AIM benchmark and show that the final trained network, deciding what actions to take within the equational rewriting environment, proves 70.2% of problems, outperforming Waldmeister (65.5%). When we combine the rewrites suggested by the network with Prover9, we prove 8.3% more theorems than Prover9 in the same time, bringing the performance of the combined system to 90%.
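A rough sketch of the 'shortest solution imitation' ingredient mentioned above: keep, per problem, the shortest successful action sequence found so far and train the predictor by imitation on those (the data layout and names are assumptions of this sketch, not the 3SIL pipeline itself):

```python
def shortest_solution_dataset(episodes):
    """Build an imitation dataset from the shortest successful episode per problem.

    Each episode is assumed to be a dict with keys 'problem', 'solved',
    'states', and 'actions'.
    """
    best = {}
    for ep in episodes:
        if not ep["solved"]:
            continue
        key = ep["problem"]
        if key not in best or len(ep["actions"]) < len(best[key]["actions"]):
            best[key] = ep
    # Flatten into (state, action) pairs for supervised training of the policy.
    return [(s, a) for ep in best.values() for s, a in zip(ep["states"], ep["actions"])]

episodes = [
    {"problem": "p1", "solved": True, "states": ["s0", "s1"], "actions": ["rw1", "rw2"]},
    {"problem": "p1", "solved": True, "states": ["s0"], "actions": ["rw3"]},
]
print(shortest_solution_dataset(episodes))  # -> [('s0', 'rw3')]
```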
4

Swan, Jerry, Eric Nivel, Neel Kant, Jules Hedges, Timothy Atkinson, and Bas Steunebrink. "Where is My Mind?" In The Road to General Intelligence, 17–22. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-08020-3_3.

Abstract:
The research field of AI is concerned with devising theories, methods, and workflows for producing software artifacts which behave as intelligent subjects. Evidently, intelligence, as the property of an agent, is not of necessity inherited from the methods used to construct it: that a car has been assembled by robots does not make it a robot. Unfortunately, even this obvious distinction can sometimes be erased in some prominent published work. To wit: the statement, "an agent that performs sufficiently well on a sufficiently wide range of tasks is classified as intelligent" was recently published by DeepMind [273] to give context to a paper claiming to have developed "the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games" [14]. This invites the inference that the range of the tasks (57 games) that have been achieved warrants calling the advertised agent 'intelligent'. However, careful reading of the paper reveals that the authors have in fact developed 57 different agents. Granted, this was achieved using the same development method and system architecture, but 57 agents were nonetheless trained, rather than the claimed single agent. Here is a prime example of distilled confusion: a property (applicability to 57 tasks) of one construction method (instantiating the Agent57 system architecture) has just been 'magically' transferred to some 57 artifacts produced by the method.
5

Jamal, Mansoor, Zaib Ullah, and Musarat Abbas. "Self-Adapted Resource Allocation in V2X Communication." In Workshop Proceedings of the 19th International Conference on Intelligent Environments (IE2023). IOS Press, 2023. http://dx.doi.org/10.3233/aise230018.

Abstract:
The intelligent transportation system (ITS), along with vehicular communications, is making our daily life safer and easier, e.g., through time savings, traffic control, and safe driving. Many transmission mode selection and resource allocation schemes aim to fulfill the quality-of-service requirements with minimum latency and negligible interference. For message transmission to far-away vehicles, realistic cellular link and mode selection is a bottleneck; for nearby devices, however, safety-critical information needs V2V links. Reinforcement learning (RL) and deep learning (DL) have reshaped vehicular communication into a new model where vehicles act like human beings and take their decisions autonomously without human intervention. In our work, we investigate vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication where each link decides on the optimal subband and power level. We investigate the case where each V2V link coexists with V2I links while satisfying stringent latency constraints and minimizing interference. We combine RL with DL so that agents effectively learn to select the transmission mode and share the spectrum between V2V and V2I.
6

Jara, Claudia, Débora Buendía, Alvaro Ardiles, Pablo Muñoz, and Cheril Tapia-Rojas. "Transcranial Red LED Therapy: A Promising Non-Invasive Treatment to Prevent Age-Related Hippocampal Memory Impairment." In Hippocampus - Cytoarchitecture and Diseases. IntechOpen, 2022. http://dx.doi.org/10.5772/intechopen.100620.

Abstract:
The hippocampus is an integral portion of the limbic system and plays a critical role in spatial and recognition learning, memory encoding, and memory consolidation. Hippocampal aging shows neurobiological alterations, including increased oxidative stress, altered intracellular signaling pathways, synaptic impairment, and organelle deterioration such as mitochondrial dysfunction. These alterations lead to hippocampal cognitive decline during aging. Therefore, the search for new non-invasive therapies focused on preserving or attenuating age-related hippocampal memory impairment could have a great impact on aging, considering the increasing life expectancy in the world. Red light Transcranial LED therapy (RL-TCLT) is a promising but little-explored strategy that involves red-light LED irradiation without surgical procedures and is safe and low-cost. Nevertheless, the precise mechanism involved and its real impact on age-related cognitive impairment is unclear, due to differences in protocol, wavelength applied, and time. Therefore, in this chapter, we will discuss the evidence about RL-TCLT and its effects on the hippocampal structure and function, and how this therapy could be used as a promising treatment for memory loss during aging and in age-related diseases such as Alzheimer's Disease (AD). Finally, we will mention our advances in Red 630-light-Transcranial LED therapy on the hippocampus in aging and AD.
7

Jiang, Haoge, Xudong Jiang, Kong-Wah Wan, and Han Wang. "Deep Reinforcement Learning Based Crowd Navigation via Feature Aggregation from Graph Convolutional Networks." In Advances in Transdisciplinary Engineering. IOS Press, 2023. http://dx.doi.org/10.3233/atde230066.

Abstract:
In this paper, we use a graph convolutional network (GCN) for feature aggregation. Our approach, termed GCN-RL, can be deployed directly on a holonomic mobile robot without any tuning. We first use the GCN to extract the hidden features among the robot and humans. These extracted features, which represent the spatial relationships and agent-agent interactions, are then fed into the actor-critic learning framework. Finally, the deep RL network is optimized based on the aggregated features from the GCN and the actor-critic framework. GCN-RL enables a safer and more efficient navigation policy than the other RL navigation methods. The experimental results show that the proposed learning approach significantly outperforms ORCA and other RL navigation methods.
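The feature-aggregation step described above can be illustrated with a standard graph-convolution layer (the propagation rule below is the common GCN formulation; the chapter's exact architecture and learned weights are not reproduced here):

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    # Standard GCN propagation: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W),
    # aggregating each agent's features with those of its neighbours.
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(0.0, d_inv_sqrt @ a_hat @ d_inv_sqrt @ features @ weights)

# Example: a robot connected to two humans, 4-d input features, 2-d aggregated output.
adjacency = np.array([[0., 1., 1.],
                      [1., 0., 0.],
                      [1., 0., 0.]])
rng = np.random.default_rng(0)
print(gcn_layer(adjacency, rng.random((3, 4)), rng.random((4, 2))).shape)  # -> (3, 2)
```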
8

Melo-Pfeifer, Sílvia. "Translanguaging in Multilingual Chat Interaction." In Advances in Educational Technologies and Instructional Design, 188–207. IGI Global, 2016. http://dx.doi.org/10.4018/978-1-5225-0177-0.ch009.

Abstract:
In this contribution, intercomprehension between Romance Languages (RL) will be analyzed as a particular setting of multilingual interaction in the globalized and digital world. Intercomprehension is a multilingual practice where interlocutors collaboratively achieve meaning through the use of typologically related languages and other semiotic resources, exploiting the similarities existing across languages and the opportunities of transfer they offer. The communicative contract underlying this particular typology of multilingual interaction stresses that each interlocutor should master at least one RL and use it productively and, at the same time, try to understand the RL of the other speakers. Through the analysis of multilingual exchanges in chat-rooms of the platform Galanet, the need to take a more open stance towards the communicative contract will be evinced. Particularly, three behaviors related to the breakdown of the communicative contract – and respective consequences – will be critically analyzed: the use of a taboo language (English), the use of other linguistic resources not included in the contract and the production of utterances in target languages. These communicative behaviors will justify the need to enrich the understanding of intercomprehension by adopting a translingual lens and, thus, by abandoning a still prevalent monoglossic orientation in research dealing with this multilingual communicative context.
9

Hai-Jew, Shalin. "Modeling the Relationship between a Human and a Malicious Artificial Intelligence, Natural-Language ’Bot in an Immersive Virtual World." In Digital Democracy and the Impact of Technology on Governance and Politics, 287–306. IGI Global, 2013. http://dx.doi.org/10.4018/978-1-4666-3637-8.ch016.

Abstract:
People go to virtual immersive spaces online to socialize through their human-embodied avatars. Through the “passing stranger” phenomenon, many make fast relationships and share intimate information with the idea that they will not deal with the individual again. Others, though, pursue longer-term relationships from the virtual into Real Life (RL). Many do not realize that they are interacting with artificial intelligence ’bots with natural language capabilities. This chapter models some implications of malicious AI natural language ’bots in immersive virtual worlds (as socio-technical spaces). For simplicity, this is referred to as a one-on-one, but that is not to assume that various combinations of malicious ’bots or those that are occasionally human-embodied may not be deployed for the same deceptive purposes.
10

Iachello, F., and R. D. Levine. "Four-Body Algebraic Theory." In Algebraic Theory of Molecules. Oxford University Press, 1995. http://dx.doi.org/10.1093/oso/9780195080919.003.0008.

Abstract:
In tetratomic molecules, there are three independent vector coordinates, r1, r2, and r3, which we can think of as three bonds. The general algebraic theory tells us that a quantization of these coordinates (and associated momenta) leads to the algebra G = U1(4) ⊗ U2(4) ⊗ U3(4) (5.1). As in the previous case of two bonds, discussed in Chapter 4, we introduce boson operators for each bond, σ†i, π†iμ, μ = 0, ±1, with i = 1, 2, 3 (5.2), together with the corresponding annihilation operators σi, πiμ. The elements of the algebras Ui(4) are the same as in Table 2.1, except that a bond index i = 1, 2, 3 is attached to them.
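For readability, the two equations quoted in the excerpt can be retypeset in display form (a plain transcription of the excerpt's notation, not additional material):

```latex
% Equations (5.1)-(5.2) as quoted in the excerpt above.
\begin{equation}
  \mathcal{G} = U_1(4) \otimes U_2(4) \otimes U_3(4)
  \tag{5.1}
\end{equation}
\begin{equation}
  \sigma_i^{\dagger},\ \pi_{i\mu}^{\dagger}, \qquad \mu = 0, \pm 1, \qquad i = 1, 2, 3
  \tag{5.2}
\end{equation}
```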

Conference papers on the topic "Safe RL"

1

Yang, Wen-Chi, Giuseppe Marra, Gavin Rens, and Luc De Raedt. "Safe Reinforcement Learning via Probabilistic Logic Shields." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/637.

Abstract:
Safe Reinforcement learning (Safe RL) aims at learning optimal policies while staying safe. A popular solution to Safe RL is shielding, which uses a logical safety specification to prevent an RL agent from taking unsafe actions. However, traditional shielding techniques are difficult to integrate with continuous, end-to-end deep RL methods. To this end, we introduce Probabilistic Logic Policy Gradient (PLPG). PLPG is a model-based Safe RL technique that uses probabilistic logic programming to model logical safety constraints as differentiable functions. Therefore, PLPG can be seamlessly applied to any policy gradient algorithm while still providing the same convergence guarantees. In our experiments, we show that PLPG learns safer and more rewarding policies compared to other state-of-the-art shielding techniques.
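The probabilistic-shield intuition above can be sketched by reweighting a stochastic policy with each action's safety probability (a simplification; PLPG's actual contribution is making this differentiable through a probabilistic logic program inside the policy-gradient loss):

```python
import numpy as np

def probabilistically_shielded_policy(action_probs, safety_probs):
    # Multiply each action's probability by P(safe | state, action) and renormalise;
    # actions that are almost certainly unsafe receive almost no probability mass.
    weighted = np.asarray(action_probs) * np.asarray(safety_probs)
    total = weighted.sum()
    return weighted / total if total > 0 else np.asarray(action_probs)

print(probabilistically_shielded_policy([0.5, 0.3, 0.2], [0.05, 0.9, 0.9]))
```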
2

Rahman, Md Asifur, Tongtong Liu, and Sarra Alqahtani. "Adversarial Behavior Exclusion for Safe Reinforcement Learning." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/54.

Abstract:
Learning by exploration makes reinforcement learning (RL) potentially attractive for many real-world applications. However, this learning process makes RL inherently too vulnerable to be used in real-world applications where safety is of utmost importance. Most prior studies consider exploration at odds with safety and thereby restrict it using either joint optimization of task and safety or imposing constraints for safe exploration. This paper migrates from the current convention to using exploration as a key to safety by learning safety as a robust behavior that completely excludes any behavioral pattern responsible for safety violations. Adversarial Behavior Exclusion for Safe RL (AdvEx-RL) learns a behavioral representation of the agent's safety violations by approximating an optimal adversary utilizing exploration and later uses this representation to learn a separate safety policy that excludes those unsafe behaviors. In addition, AdvEx-RL ensures safety in a task-agnostic manner by acting as a safety firewall and therefore can be integrated with any RL task policy. We demonstrate the robustness of AdvEx-RL via comprehensive experiments in standard constrained Markov decision processes (CMDP) environments under 2 white-box action space perturbations as well as with changes in environment dynamics against 7 baselines. Consistently, AdvEx-RL outperforms the baselines by achieving an average safety performance of over 75% in the continuous action space with 10 times more variations in the testing environment dynamics. By using a standalone safety policy independent of conflicting objectives, AdvEx-RL also paves the way for interpretable safety behavior analysis as we show in our user study.
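The 'safety firewall' arrangement described above, where a standalone safety policy overrides the task policy, can be pictured with a small stand-in (the risk estimator and threshold are assumptions; AdvEx-RL derives its unsafe-behaviour representation from an approximated adversary, which this toy check does not reproduce):

```python
def firewall_action(obs, task_policy, safety_policy, risk_estimator, threshold=0.5):
    # Use the task policy by default; hand control to the safety policy whenever
    # the proposed action is flagged as resembling known safety-violating behaviour.
    action = task_policy(obs)
    if risk_estimator(obs, action) > threshold:
        return safety_policy(obs)
    return action

# Toy usage with stand-in callables:
print(firewall_action(obs=0.0,
                      task_policy=lambda o: "accelerate",
                      safety_policy=lambda o: "brake",
                      risk_estimator=lambda o, a: 0.9))  # -> 'brake'
```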
3

Simão, Thiago D. "Safe and Sample-Efficient Reinforcement Learning Algorithms for Factored Environments." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/919.

Abstract:
Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) where the transition function is unknown. In situations where an arbitrary policy pi is already in execution and the experiences with the environment were recorded in a batch D, an RL algorithm can use D to compute a new policy pi'. However, the policy computed by traditional RL algorithms might have worse performance compared to pi. Our goal is to develop safe RL algorithms, where the agent has a high confidence that the performance of pi' is better than the performance of pi given D. To develop sample-efficient and safe RL algorithms we combine ideas from exploration strategies in RL with a safe policy improvement method.
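A common way to instantiate the high-confidence requirement described above (deploy the new policy only when it is confidently better than the behavior policy, given the batch D) is a lower confidence bound on off-policy return estimates; the t-based bound below is one standard choice, shown as a sketch rather than the thesis's exact procedure:

```python
import numpy as np
from scipy import stats

def approve_candidate(weighted_returns, baseline_performance, delta=0.05):
    """Deploy the candidate policy only if a (1 - delta) lower confidence bound on
    its expected return, estimated by importance sampling from the batch, exceeds
    the behaviour policy's performance."""
    x = np.asarray(weighted_returns, dtype=float)
    lcb = x.mean() - stats.t.ppf(1 - delta, x.size - 1) * x.std(ddof=1) / np.sqrt(x.size)
    return bool(lcb >= baseline_performance)

rng = np.random.default_rng(0)
print(approve_candidate(rng.normal(1.2, 0.5, size=200), baseline_performance=1.0))
```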
4

Zhao, Weiye, Tairan He, Rui Chen, Tianhao Wei, and Changliu Liu. "State-wise Safe Reinforcement Learning: A Survey." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/763.

Abstract:
Despite the tremendous success of Reinforcement Learning (RL) algorithms in simulation environments, applying RL to real-world applications still faces many challenges. A major concern is safety, in other words, constraint satisfaction. State-wise constraints are one of the most common constraints in real-world applications and one of the most challenging constraints in Safe RL. Enforcing state-wise constraints is necessary and essential to many challenging tasks such as autonomous driving and robot manipulation. This paper provides a comprehensive review of existing approaches that address state-wise constraints in RL. Under the framework of State-wise Constrained Markov Decision Process (SCMDP), we will discuss the connections, differences, and trade-offs of existing approaches in terms of (i) safety guarantee and scalability, (ii) safety and reward performance, and (iii) safety after convergence and during training. We also summarize limitations of current methods and discuss potential future directions.
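The state-wise constrained setting the survey covers is usually written as a constrained optimization in which the cost bound must hold at every step, not merely in expectation over a trajectory; a generic formulation (notation assumed) is:

```latex
% Generic state-wise constrained RL objective (notation is illustrative).
\[
\begin{aligned}
  \max_{\pi}\quad & \mathbb{E}_{\tau \sim \pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right] \\
  \text{s.t.}\quad & c_i(s_t, a_t) \le w_i \quad \text{for every step } t \text{ and every constraint } i.
\end{aligned}
\]
```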
5

Bektas, Kemal, and H. Isil Bozma. "APF-RL: Safe Mapless Navigation in Unknown Environments." In 2022 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2022. http://dx.doi.org/10.1109/icra46639.2022.9811537.

6

Perepu, Satheesh K., and M. Saravanan. "Optimize Next State Prediction in Safe RL for 5G Ecosystem." In 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS). IEEE, 2022. http://dx.doi.org/10.1109/comsnets53615.2022.9668344.

7

Houston, Vern L., Carl P. Mason, Luigi Arena, Gangming Luo, Aaron C. Beattie, MaryAnne Garbarini, and Chaiya Thongpop. "Experimental Assessment and FEA Prediction of the Effects of Prosthetic Socket Geometry on Transtibial Amputee Residual Limb Circulation." In ASME 2001 International Mechanical Engineering Congress and Exposition. American Society of Mechanical Engineers, 2001. http://dx.doi.org/10.1115/imece2001/bed-23093.

Abstract:
Abstract Eighty plus percent of all lower limb amputations performed in the United States each year result from complications of peripheral vascular disease (PVD) [1]. PVD amputees’ vascular systems are significantly compromised and can withstand little extraneous insult. Clinical studies have shown that ill-fitting prosthetic sockets, and/or excessive prosthetics loads can traumatize amputees’ residual limb (RL) tissues [2]. Unfortunately, there are no decisive clinical measures currently available that can be used to ensure prosthesis geometry and applied prosthetics loads are safe and will not compromise limb circulation and cause tissue injury. Minus egregious symptoms of ischemic pain and tissue trauma, it is commonly assumed that the tissue stresses and strains caused by differences between RL and prosthetic socket geometries, and from prosthetics loads incurred during stance and gait, have minimal effect upon RL tissue circulation and health. The objective of this study was to investigate the validity of this assumption by quantitatively measuring the effects prosthetic socket design geometry and applied loads have on transtibial amputee RL tissue circulation, and to determine if tissue circulation sensitivities to variations in socket design were sufficient to enable optimization of prosthesis design parameters.
8

Hopkins, Greg. "Design Qualification and Manufacturing of RTP-1 Tanks and Vessels." In ASME 2002 Pressure Vessels and Piping Conference. ASMEDC, 2002. http://dx.doi.org/10.1115/pvp2002-1250.

Abstract:
The ASME accredits RL Industries to fabricate both Section X, Class II and RTP-1 tanks and vessels, one of only two manufacturers in the world to hold this distinction. The design, quality assurance, inspection and testing requirements for the two stamps are similar, but not identical. The same manufacturing processes can be used to produce equipment to either standard, but the record keeping required by the ASME is somewhat different. Similarly, the design methods are not identical. This presentation describes the similarities and differences. Both standards are much newer than the other Sections of the Boiler and Pressure Vessel Code, and RL Industries pioneered in their introduction. The manufacturing, quality assurance, testing and inspection of FRP to the standards will be described. Vessels with a Section X or RTP-1 stamp are as safe and reliable as equipment built to other Sections of the Code.
9

Altmann, Philipp, Fabian Ritz, Leonard Feuchtinger, Jonas Nüßlein, Claudia Linnhoff-Popien, and Thomy Phan. "CROP: Towards Distributional-Shift Robust Reinforcement Learning Using Compact Reshaped Observation Processing." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/380.

Abstract:
The safe application of reinforcement learning (RL) requires generalization from limited training data to unseen scenarios. Yet, fulfilling tasks under changing circumstances is a key challenge in RL. Current state-of-the-art approaches for generalization apply data augmentation techniques to increase the diversity of training data. Even though this prevents overfitting to the training environment(s), it hinders policy optimization. Crafting a suitable observation, only containing crucial information, has been shown to be a challenging task itself. To improve data efficiency and generalization capabilities, we propose Compact Reshaped Observation Processing (CROP) to reduce the state information used for policy optimization. By providing only relevant information, overfitting to a specific training layout is precluded and generalization to unseen environments is improved. We formulate three CROPs that can be applied to fully observable observation- and action-spaces and provide methodical foundation. We empirically show the improvements of CROP in a distributionally shifted safety gridworld. We furthermore provide benchmark comparisons to full observability and data-augmentation in two different-sized procedurally generated mazes.
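A toy version of the observation-reduction idea (an egocentric crop of a grid observation; CROP's concrete reshaping operators differ, and all names here are assumptions):

```python
import numpy as np

def crop_observation(grid, center, radius):
    # Return only a (2*radius+1)^2 window around the agent, zero-padded at the
    # borders, so the policy never conditions on layout-specific far-away cells.
    row, col = center
    padded = np.pad(grid, radius, mode="constant", constant_values=0)
    return padded[row:row + 2 * radius + 1, col:col + 2 * radius + 1]

grid = np.arange(25).reshape(5, 5)
print(crop_observation(grid, center=(2, 2), radius=1))  # 3x3 neighbourhood of cell 12
```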
10

Hutter, Marcus, Samuel Yang-Zhao, and Sultan Javed Majeed. "Conditions on Features for Temporal Difference-Like Methods to Converge." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/357.

Abstract:
The convergence of many reinforcement learning (RL) algorithms with linear function approximation has been investigated extensively but most proofs assume that these methods converge to a unique solution. In this paper, we provide a complete characterization of non-uniqueness issues for a large class of reinforcement learning algorithms, simultaneously unifying many counter-examples to convergence in a theoretical framework. We achieve this by proving a new condition on features that can determine whether the convergence assumptions are valid or non-uniqueness holds. We consider a general class of RL methods, which we call natural algorithms, whose solutions are characterized as the fixed point of a projected Bellman equation. Our main result proves that natural algorithms converge to the correct solution if and only if all the value functions in the approximation space satisfy a certain shape. This implies that natural algorithms are, in general, inherently prone to converge to the wrong solution for most feature choices even if the value function can be represented exactly. Given our results, we show that state aggregation-based features are a safe choice for natural algorithms and also provide a condition for finding convergent algorithms under other feature constructions.
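The 'state aggregation-based features' identified above as a safe choice are simply one-hot indicators of which cluster a state belongs to; a minimal sketch (the particular partition is an assumption here):

```python
import numpy as np

def aggregation_features(state, partition):
    # One-hot feature vector over a state aggregation: phi(s) has a single 1 in
    # the coordinate of the cluster containing s.
    num_clusters = max(partition.values()) + 1
    phi = np.zeros(num_clusters)
    phi[partition[state]] = 1.0
    return phi

# Example: four states aggregated into two clusters.
print(aggregation_features(2, {0: 0, 1: 0, 2: 1, 3: 1}))  # -> [0. 1.]
```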

Reports on the topic "Safe RL"

1

Kiefner and Vieth. L51688B A Modified Criterion for Evaluating the Remaining Strength of Corroded Pipe. Chantilly, Virginia: Pipeline Research Council International, Inc. (PRCI), December 1989. http://dx.doi.org/10.55274/r0011347.

Abstract:
The RSTRENG software program and user's manual were developed to enhance the usability and options available in the original program RSTRENG. RSTRENG was developed as part of AGA PR-3-805, "A Modified Criterion for Evaluating the Remaining Strength of Corroded Pipe." RSTRENG can be used by pipeline operators to evaluate the remaining strength of corroded pipe. The required user inputs are pipe diameter, wall thickness, grade of the pipe material, and a series of pit depth and length measurements. The new version of the RSTRENG software produces the same results as the first version; the improvements serve to make the software more versatile and user-friendly. Includes soft-cover manual and CD. It also addresses some problems encountered in its use (principally through misunderstanding or misuse). There will be flags alerting the user that he is attempting to extend the RSTRENG equation beyond its validation points and that the results may be non-conservative in a given application. A new version of the software was released in August 1999. (Includes: L51688, RSTRENG 3.0 Users Manual and Software) (https://prci.quickbase.com/db/bc5ycaqrp?a=dr&r=hh3&rl=bf8j)