Academic literature on the topic 'Factored reinforcement learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Factored reinforcement learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of an academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Factored reinforcement learning"

1

Wu, Bo, Yan Peng Feng, and Hong Yan Zheng. "A Model-Based Factored Bayesian Reinforcement Learning Approach." Applied Mechanics and Materials 513-517 (February 2014): 1092–95. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.1092.

Full text
Abstract:
Bayesian reinforcement learning has turned out to be an effective solution to the optimal tradeoff between exploration and exploitation. However, in practical applications, the exponential growth of the learning parameters is the main impediment to online planning and learning. To overcome this problem, we bring factored representations, model-based learning, and Bayesian reinforcement learning together in a new approach. Firstly, we exploit a factored representation to describe the states so as to reduce the number of learning parameters, and adopt a Bayesian inference method to learn the unknown struc
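
The parameter saving the abstract alludes to is easy to make concrete. Below is a minimal, hypothetical sketch (not the authors' implementation): each state variable keeps Dirichlet-style transition counts conditioned only on a small set of parent variables, so the model's parameter count grows with the largest parent set rather than with the full joint state space. All names, sizes, and the parent structure are invented.

```python
from collections import defaultdict

class FactoredTransitionModel:
    """Each state variable i has its own conditional P(x_i' | parents_i, a),
    so parameters grow with the largest parent set, not the full state space."""

    def __init__(self, n_vars, n_vals, parents, prior=1.0):
        self.n_vars = n_vars    # number of state variables
        self.n_vals = n_vals    # values each variable can take
        self.parents = parents  # parents[i] = tuple of parent variable indices
        self.prior = prior      # symmetric Dirichlet pseudo-count
        # counts[i][(parent_values, action)][next_value] = observed frequency
        self.counts = [defaultdict(lambda: defaultdict(float))
                       for _ in range(n_vars)]

    def update(self, state, action, next_state):
        """Record one transition; the Bayesian posterior is prior + counts."""
        for i in range(self.n_vars):
            key = (tuple(state[j] for j in self.parents[i]), action)
            self.counts[i][key][next_state[i]] += 1.0

    def prob(self, state, action, next_state):
        """Posterior-predictive P(s' | s, a) as a product of local factors."""
        p = 1.0
        for i in range(self.n_vars):
            key = (tuple(state[j] for j in self.parents[i]), action)
            c = self.counts[i][key]
            total = sum(c.values()) + self.prior * self.n_vals
            p *= (c[next_state[i]] + self.prior) / total
        return p

# Four binary variables with at most two parents each: O(n * 2^2) parameters
# per action instead of the O(2^4) rows of a flat transition table.
model = FactoredTransitionModel(n_vars=4, n_vals=2,
                                parents=[(0,), (0, 1), (1, 2), (2, 3)])
model.update((0, 0, 1, 1), action=0, next_state=(1, 0, 1, 0))
print(model.prob((0, 0, 1, 1), 0, (1, 0, 1, 0)))
```
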
APA, Harvard, Vancouver, ISO, and other styles
2

Li, Chao, Yupeng Zhang, Jianqi Wang, et al. "Optimistic Value Instructors for Cooperative Multi-Agent Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (2024): 17453–60. http://dx.doi.org/10.1609/aaai.v38i16.29694.

Full text
Abstract:
In cooperative multi-agent reinforcement learning, decentralized agents hold the promise of overcoming the combinatorial explosion of joint action space and enabling greater scalability. However, they are susceptible to a game-theoretic pathology called relative overgeneralization that shadows the optimal joint action. Although recent value-decomposition algorithms guide decentralized agents by learning a factored global action value function, the representational limitation and the inaccurate sampling of optimal joint actions during the learning process make this problem still. To address thi
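
The "factored global action value function" that value-decomposition methods learn is easiest to see in the additive (VDN-style) special case, sketched below with invented sizes and rewards; this is background intuition, not this paper's algorithm. Because the joint value is a sum of per-agent utilities, each decentralized agent's own greedy action also maximizes the joint value.

```python
import numpy as np

n_states, n_actions = 4, 3
Q = [np.zeros((n_states, n_actions)) for _ in range(2)]  # one table per agent

def joint_q(s, a1, a2):
    """Factored global value: Q_tot(s, a1, a2) = Q_1(s, a1) + Q_2(s, a2)."""
    return Q[0][s, a1] + Q[1][s, a2]

def td_update(s, acts, r, s2, alpha=0.1, gamma=0.95):
    # Under additive factoring the joint greedy action is just each agent's
    # own argmax, which is what makes decentralized execution tractable.
    target = r + gamma * (Q[0][s2].max() + Q[1][s2].max())
    delta = target - joint_q(s, *acts)
    for i, a in enumerate(acts):   # the shared TD error updates every factor
        Q[i][s, a] += alpha * delta

td_update(s=0, acts=(1, 2), r=1.0, s2=3)
print(joint_q(0, 1, 2))
```

The representational limitation the abstract mentions is visible here: an additive Q_tot cannot represent payoff structures whose optimum requires coordinated deviations, which is exactly where relative overgeneralization bites.
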
APA, Harvard, Vancouver, ISO, and other styles
3

Kveton, Branislav, and Georgios Theocharous. "Structured Kernel-Based Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (2013): 569–75. http://dx.doi.org/10.1609/aaai.v27i1.8669.

Full text
Abstract:
Kernel-based reinforcement learning (KBRL) is a popular approach to learning non-parametric value function approximations. In this paper, we present structured KBRL, a paradigm for kernel-based RL that allows for modeling independencies in the transition and reward models of problems. Real-world problems often exhibit this structure and can be solved more efficiently when it is modeled. We make three contributions. First, we motivate our work, define a structured backup operator, and prove that it is a contraction. Second, we show how to evaluate our operator efficiently. Our analysis reveals
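
For intuition, here is plain (unstructured) KBRL, the baseline this paper generalizes: the value of a query state is a normalized-kernel average over sampled transitions, and value iteration runs on the resulting finite kernel MDP. The structured backup operator the authors introduce additionally exploits independencies in the transition and reward models; this sketch, with invented one-dimensional data, does not.

```python
import numpy as np

gamma, bandwidth = 0.9, 0.3
rng = np.random.default_rng(0)

# One batch of sampled transitions (s, r, s') per action on the unit interval.
S = {a: rng.uniform(0, 1, 20) for a in (0, 1)}
R = {a: np.where(S[a] > 0.8, 1.0, 0.0) for a in (0, 1)}
S2 = {a: np.clip(S[a] + (0.1 if a == 1 else -0.1), 0, 1) for a in (0, 1)}

def weights(s, centers):
    """Normalized Gaussian kernel weights of a query state over the samples."""
    k = np.exp(-((s - centers) ** 2) / (2 * bandwidth ** 2))
    return k / k.sum()

def value(s, V):
    """KBRL backup: max over actions of kernel-averaged one-step returns."""
    return max(weights(s, S[a]) @ (R[a] + gamma * V[a]) for a in (0, 1))

V = {a: np.zeros(20) for a in (0, 1)}  # values at the sampled successor states
for _ in range(100):                   # value iteration on the finite kernel MDP
    V = {a: np.array([value(s, V) for s in S2[a]]) for a in (0, 1)}

print(value(0.75, V))
```
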
APA, Harvard, Vancouver, ISO, and other styles
4

Simão, Thiago D., and Matthijs T. J. Spaan. "Safe Policy Improvement with Baseline Bootstrapping in Factored Environments." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4967–74. http://dx.doi.org/10.1609/aaai.v33i01.33014967.

Full text
Abstract:
We present a novel safe reinforcement learning algorithm that exploits the factored dynamics of the environment to become less conservative. We focus on problem settings in which a policy is already running and the interaction with the environment is limited. In order to safely deploy an updated policy, it is necessary to provide a confidence level regarding its expected performance. However, algorithms for safe policy improvement might require a large number of past experiences to become confident enough to change the agent’s behavior. Factored reinforcement learning, on the other hand, is kn
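
A schematic rendering of the baseline-bootstrapping idea (illustrative, not the paper's exact algorithm): the updated policy may redistribute probability mass only on state-action pairs the batch has visited often enough, and keeps the running baseline's probabilities everywhere else. All counts, thresholds, and value estimates below are invented.

```python
import numpy as np

n_states, n_actions, n_min = 5, 3, 10  # n_min: visits needed to trust a pair

counts = np.random.default_rng(1).integers(0, 30, (n_states, n_actions))
baseline = np.full((n_states, n_actions), 1.0 / n_actions)  # running policy
Q_hat = np.random.default_rng(2).normal(size=(n_states, n_actions))  # batch Q

new_policy = np.zeros_like(baseline)
for s in range(n_states):
    trusted = counts[s] >= n_min                     # enough data here?
    new_policy[s, ~trusted] = baseline[s, ~trusted]  # bootstrap rare pairs
    if trusted.any():
        # Put the remaining probability mass on the best trusted action.
        best = np.flatnonzero(trusted)[np.argmax(Q_hat[s, trusted])]
        new_policy[s, best] += 1.0 - new_policy[s].sum()
    else:
        new_policy[s] = baseline[s]                  # keep the baseline

print(new_policy.round(2))
```

A factored model helps exactly here: per-factor counts accumulate much faster than joint state-action counts, so fewer pairs stay untrusted and the resulting update is less conservative.
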
APA, Harvard, Vancouver, ISO, and other styles
5

Truong, Van Binh, and Long Bao Le. "Electric vehicle charging design: The factored action based reinforcement learning approach." Applied Energy 359 (April 2024): 122737. http://dx.doi.org/10.1016/j.apenergy.2024.122737.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Simm, Jaak, Masashi Sugiyama, and Hirotaka Hachiya. "Multi-Task Approach to Reinforcement Learning for Factored-State Markov Decision Problems." IEICE Transactions on Information and Systems E95.D, no. 10 (2012): 2426–37. http://dx.doi.org/10.1587/transinf.e95.d.2426.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Du, Juan, Anshuang Yu, Hao Zhou, Qianli Jiang, and Xueying Bai. "Research on Integrated Control Strategy for Highway Merging Bottlenecks Based on Collaborative Multi-Agent Reinforcement Learning." Applied Sciences 15, no. 2 (2025): 836. https://doi.org/10.3390/app15020836.

Full text
Abstract:
The merging behavior of vehicles at entry ramps and the speed differences between ramps and mainline traffic cause merging traffic bottlenecks. Current research, primarily focusing on single traffic control strategies, fails to achieve the desired outcomes. To address this issue, this paper explores an integrated control strategy combining Variable Speed Limits (VSL) and Lane Change Control (LCC) to optimize traffic efficiency in ramp merging areas. For scenarios involving multiple ramp merges, a multi-agent reinforcement learning approach is introduced to optimize control strategies in these
APA, Harvard, Vancouver, ISO, and other styles
8

Wang, Zizhao, Caroline Wang, Xuesu Xiao, Yuke Zhu, and Peter Stone. "Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (2024): 15778–86. http://dx.doi.org/10.1609/aaai.v38i14.29507.

Full text
Abstract:
Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is to learn state abstractions, which only keep the necessary variables for learning the tasks at hand. This paper introduces Causal Bisimulation Modeling (CBM), a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction. CBM leverages and imp
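
A toy rendering of the task-specific abstraction idea, with invented variables rather than CBM itself: given learned causal parents for each state variable and for the reward, keep exactly the variables with a causal path into the reward and mask the rest out of the state.

```python
parents = {                # learned one-step causal parents in the dynamics
    "arm_pos": {"arm_pos", "action"},
    "object_pos": {"object_pos", "arm_pos"},
    "light_on": {"light_on"},          # distractor variable
}
reward_parents = {"object_pos"}        # the reward depends on the object only

def causal_ancestors(targets, parents):
    """All variables with a directed causal path into any target."""
    keep, frontier = set(), set(targets)
    while frontier:
        v = frontier.pop()
        keep.add(v)
        frontier |= parents.get(v, set()) - keep
    return keep

relevant = causal_ancestors(reward_parents, parents) - {"action"}
state = {"arm_pos": 0.3, "object_pos": 0.7, "light_on": 1.0}
abstract_state = {k: v for k, v in state.items() if k in relevant}
print(abstract_state)  # light_on is dropped from the abstraction
```
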
APA, Harvard, Vancouver, ISO, and other styles
9

Mohamad Hafiz Abu Bakar, Abu Ubaidah bin Shamsudin, Ruzairi Abdul Rahim, Zubair Adil Soomro, and Andi Adrianshah. "Comparison Method Q-Learning and SARSA for Simulation of Drone Controller using Reinforcement Learning." Journal of Advanced Research in Applied Sciences and Engineering Technology 30, no. 3 (2023): 69–78. http://dx.doi.org/10.37934/araset.30.3.6978.

Full text
Abstract:
Nowadays, the advancement of drones is also a factor in the development of a world surrounded by technologies. One of the aspects emphasized here is the difficulty of controlling the drone, as the developed system is still under the full control of the user. Reinforcement Learning is used to enable the system to operate automatically: the drone learns its next movement based on the interaction between the agent and the environment. In this study, Q-Learning and State-Action-Reward-State-Action (SARSA) are used, and the comparison of results involving both the perfo
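
The algorithmic difference the study measures comes down to one line in each update rule: Q-learning bootstraps from the greedy next action (off-policy), whereas SARSA bootstraps from the action the agent actually takes next (on-policy). A minimal tabular sketch with invented constants, leaving out any drone dynamics:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS, N_ACTIONS = 0.1, 0.95, 0.1, 4  # invented constants

def eps_greedy(Q, s):
    """Epsilon-greedy behaviour policy used by both learners."""
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[(s, a)])

def q_learning_update(Q, s, a, r, s2):
    # Off-policy: bootstrap from the best next action, whatever gets executed.
    best_next = max(Q[(s2, b)] for b in range(N_ACTIONS))
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s2, a2):
    # On-policy: bootstrap from a2, the action the agent will actually take.
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, a2)] - Q[(s, a)])

Q = defaultdict(float)
s = 0
a = eps_greedy(Q, s)
q_learning_update(Q, s, a, r=1.0, s2=1)
a2 = eps_greedy(Q, 1)
sarsa_update(Q, s, a, r=1.0, s2=1, a2=a2)
```
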
APA, Harvard, Vancouver, ISO, and other styles
10

Kong, Minseok, and Jungmin So. "Empirical Analysis of Automated Stock Trading Using Deep Reinforcement Learning." Applied Sciences 13, no. 1 (2023): 633. http://dx.doi.org/10.3390/app13010633.

Full text
Abstract:
There are several automated stock trading programs using reinforcement learning, one of which is an ensemble strategy. The main idea of the ensemble strategy is to train DRL agents and make an ensemble with three different actor–critic algorithms: Advantage Actor–Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO). This novel idea is the main concept used in this paper. However, we did not stop there; we refined the automated stock trading in two areas. First, we made another DRL-based ensemble and employed it as a new trading agent. We named
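
The ensemble strategy referenced here periodically backtests each trained agent on a validation window and hands trading to the best performer. The sketch below is schematic: the Sharpe-ratio selection rule follows the general ensemble-strategy recipe, the returns are invented, and none of this paper's refinements are reproduced.

```python
import numpy as np

def sharpe(returns, eps=1e-8):
    """Risk-adjusted selection metric for the validation window."""
    return returns.mean() / (returns.std() + eps)

# Invented validation returns for three independently trained DRL agents.
validation_returns = {
    "A2C":  np.array([0.010, -0.020, 0.015, 0.003]),
    "DDPG": np.array([0.020, -0.010, 0.005, 0.012]),
    "PPO":  np.array([0.005, 0.004, 0.006, 0.005]),
}
chosen = max(validation_returns, key=lambda k: sharpe(validation_returns[k]))
print(f"trade the next window with {chosen}")
```
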
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Factored reinforcement learning"

1

Kozlova, Olga. "Hierarchical and factored reinforcement learning." Paris 6, 2010. http://www.theses.fr/2010PA066196.

Full text
Abstract:
Hierarchical and factored reinforcement learning (HFRL) methods are based on the formalisms of factored Markov decision processes (FMDPs) and hierarchical MDPs (HMDPs). In this thesis, we propose an HFRL method that uses indirect reinforcement learning approaches and the options formalism to solve decision-making problems in dynamic environments without a priori knowledge of the problem structure. In the first contribution of this thesis, we show how to model problems where certain combinations
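
For readers unfamiliar with the options formalism the thesis builds on, the sketch below renders it minimally in code: an option is a temporally extended action with an initiation set, an internal policy, and a termination condition. The toy chain MDP and all names are invented; this is not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    initiation: Set[int]               # states where the option may be invoked
    policy: Callable[[int], int]       # internal policy: state -> primitive action
    terminates: Callable[[int], bool]  # beta(s): termination condition

def run_option(opt, s, step, gamma=0.95):
    """Run an option to termination; return (next state, discounted reward)."""
    assert s in opt.initiation
    total, g = 0.0, 1.0
    while not opt.terminates(s):
        s, r = step(s, opt.policy(s))
        total += g * r
        g *= gamma
    return s, total

# Toy chain MDP: action 1 moves right, action 0 left; reward on reaching 5.
def step(s, a):
    s2 = min(s + 1, 5) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 5 and s != 5 else 0.0)

go_right = Option(initiation={0, 1, 2}, policy=lambda s: 1,
                  terminates=lambda s: s >= 5)
print(run_option(go_right, 0, step))
```
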
APA, Harvard, Vancouver, ISO, and other styles
2

Tournaire, Thomas. "Model-based reinforcement learning for dynamic resource allocation in cloud environments." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAS004.

Full text
Abstract:
The emergence of new technologies requires efficient resource allocation to satisfy demand. However, these new needs require high computing power, implying greater energy consumption, particularly in cloud infrastructures and data centers. It is therefore essential to find new solutions that can satisfy these needs while reducing the energy consumption of the resources. In this thesis, we propose and compare new AI solutions (reinforcement learning, RL) to orchestrate virtual resources
APA, Harvard, Vancouver, ISO, and other styles
3

Magnan, Jean-Christophe. "Représentations graphiques de fonctions et processus décisionnels Markoviens factorisés." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066042/document.

Full text
Abstract:
In decision-theoretic planning, the Factored Markov Decision Process (FMDP) framework has produced efficient algorithms for solving sequential decision problems under uncertainty. The efficiency of these algorithms relies on data structures such as Decision Trees or Algebraic Decision Diagrams (ADDs). These planning techniques are used in Reinforcement Learning by the SDYNA architecture in order to solve unknown large-scale problems. However, the state of the art of
APA, Harvard, Vancouver, ISO, and other styles
4

Magnan, Jean-Christophe. "Représentations graphiques de fonctions et processus décisionnels Markoviens factorisés." Electronic Thesis or Diss., Paris 6, 2016. http://www.theses.fr/2016PA066042.

Full text
Abstract:
In decision-theoretic planning, the Factored Markov Decision Process (FMDP) framework has produced efficient algorithms for solving sequential decision problems under uncertainty. The efficiency of these algorithms relies on data structures such as Decision Trees or Algebraic Decision Diagrams (ADDs). These planning techniques are used in Reinforcement Learning by the SDYNA architecture in order to solve unknown large-scale problems. However, the state of the art of
APA, Harvard, Vancouver, ISO, and other styles
5

Heron, Michael James. "The ACCESS Framework : reinforcement learning for accessibility and cognitive support for older adults." Thesis, University of Dundee, 2011. https://discovery.dundee.ac.uk/en/studentTheses/0952d5ff-7a23-4c29-b050-fd799035652c.

Full text
Abstract:
This dissertation focuses on the ACCESS Framework, which is an open-source software framework designed to address four issues with regard to older and novice users with accessibility needs – that they often do not know what support is available within their systems, that they often do not know how to change those settings they know exist, that they often lack the confidence to make the changes they know how to make, and are often unable to physically enable accessibility support. The software discussed in this dissertation serves as a bridge between what users are expected to know and what they
APA, Harvard, Vancouver, ISO, and other styles
6

Al-Safi, Abdullah Taha. "Social reinforcement and risk-taking factors to enhance creativity in Saudi Arabian school children." Thesis, Cardiff University, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.296226.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Abed-Alguni, Bilal Hashem Kalil. "Cooperative reinforcement learning for independent learners." Thesis, 2014. http://hdl.handle.net/1959.13/1052917.

Full text
Abstract:
Research Doctorate - Doctor of Philosophy (PhD). Machine learning in multi-agent domains poses several research challenges. One challenge is how to model cooperation between reinforcement learners. Cooperation between independent reinforcement learners is known to accelerate convergence to optimal solutions. In large state space problems, independent reinforcement learners normally cooperate to accelerate the learning process using decomposition techniques or knowledge sharing strategies. This thesis presents two techniques for multi-agent reinforcement learning and a comparison study. The fi
APA, Harvard, Vancouver, ISO, and other styles
8

Baker, Travis Edward. "Genetics, drugs, and cognitive control: uncovering individual differences in substance dependence." Thesis, 2012. http://hdl.handle.net/1828/4265.

Full text
Abstract:
Why is it that only some people who use drugs actually become addicted? In fact, addiction depends on a complicated process involving a confluence of risk factors related to biology, cognition, behaviour, and personality. Notably, all addictive drugs act on a neural system for reinforcement learning called the midbrain dopamine system, which projects to and regulates the brain's system for cognitive control, called frontal cortex and basal ganglia. Further, the development and expression of the dopamine system is determined in part by genetic factors that vary across individuals such that dopa
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Factored reinforcement learning"

1

Sallans, Brian. Reinforcement learning for factored Markov decision processes. 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Factored reinforcement learning"

1

Sigaud, Olivier, Martin V. Butz, Olga Kozlova, and Christophe Meyer. "Anticipatory Learning Classifier Systems and Factored Reinforcement Learning." In Anticipatory Behavior in Adaptive Learning Systems. Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-02565-5_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Kozlova, Olga, Olivier Sigaud, and Christophe Meyer. "TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs." In From Animals to Animats 11. Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-15193-4_46.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Kozlova, Olga, Olivier Sigaud, Pierre-Henri Wuillemin, and Christophe Meyer. "Considering Unseen States as Impossible in Factored Reinforcement Learning." In Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-04180-8_64.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Degris, Thomas, Olivier Sigaud, and Pierre-Henri Wuillemin. "Exploiting Additive Structure in Factored MDPs for Reinforcement Learning." In Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-89722-4_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Coqueret, Guillaume, and Tony Guida. "Reinforcement learning." In Machine Learning for Factor Investing. Chapman and Hall/CRC, 2023. http://dx.doi.org/10.1201/9781003121596-20.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Wang, Guanchao, Yuchong Huo, Qun Li, and Qiang Li. "Maximizing Wind Farm Power Capture Based on Deep Reinforcement Learning." In Lecture Notes in Electrical Engineering. Springer Nature Singapore, 2025. https://doi.org/10.1007/978-981-96-4856-6_19.

Full text
Abstract:
The power capture capability of wind farms is often constrained by various factors. To maximize the power output of wind farms and address the wake effects and random wind speeds, this paper proposes a control scheme for wind farms based on deep reinforcement learning, integrating both model-based and model-free methods within a TD3 network framed by Actor-Critic architecture. This study improves the Jensen wake model by enhancing its accuracy through the consideration of time delays. Delay sensitivity is introduced as a factor in deep reinforcement learning, allowing for the optimiza
APA, Harvard, Vancouver, ISO, and other styles
7

Klar, M., J. Mertes, M. Glatt, B. Ravani, and J. C. Aurich. "A Holistic Framework for Factory Planning Using Reinforcement Learning." In Proceedings of the 3rd Conference on Physical Modeling for Virtual Manufacturing Systems and Processes. Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-35779-4_8.

Full text
Abstract:
The generation of an optimized factory layout is a central element of the factory planning process. The generated factory layout predefines multiple characteristics of the future factory, such as the operational costs and proper resource allocations. However, manual layout planning is often time and resource-consuming and involves creative processes. In order to reduce the manual planning effort, automated, computer-aided planning approaches can support the factory planner to deal with this complexity by generating valuable solutions in the early phase of factory layout planning. Novel
APA, Harvard, Vancouver, ISO, and other styles
8

Wu, Tingyao, and Werner Van Leekwijck. "Factor Selection for Reinforcement Learning in HTTP Adaptive Streaming." In MultiMedia Modeling. Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-04114-8_47.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Alur, Rajeev, Osbert Bastani, Kishor Jothimurugan, Mateo Perez, Fabio Somenzi, and Ashutosh Trivedi. "Policy Synthesis and Reinforcement Learning for Discounted LTL." In Computer Aided Verification. Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37706-8_21.

Full text
Abstract:
The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL). However, LTL has the downside that it is sensitive to small perturbations in the transition probabilities, which prevents probably approximately correct (PAC) learning without additional assumptions. Time discounting provides a way of removing this sensitivity, while retaining the high expressivity of the logic. We study the use of discounted LTL for policy synthesis in Markov decision processes with unknown transition
APA, Harvard, Vancouver, ISO, and other styles
10

Hammler, Patric, Nicolas Riesterer, Gang Mu, and Torsten Braun. "Multi-Echelon Inventory Optimization Using Deep Reinforcement Learning." In Quantitative Models in Life Science Business. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-11814-2_5.

Full text
Abstract:
In this chapter, we provide an overview of inventory management within the pharmaceutical industry and how to model and optimize it. Inventory management is a highly relevant topic, as it causes high costs such as holding, shortage, and reordering costs. Especially the event of a stock-out can cause damage that goes beyond monetary damage in the form of lost sales. To minimize those costs is the task of an optimized reorder policy. A reorder policy is optimal when it minimizes the accumulated cost in every situation. However, finding an optimal policy is not trivial. First, the problem
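
To make the holding/shortage/reorder trade-off concrete, here is a schematic cost simulation of a classical single-echelon (s, S) reorder policy; the parameters, demand distribution, and lost-sales assumption are invented, and the chapter's multi-echelon setting is deliberately simplified away.

```python
import random

HOLD, SHORT, ORDER = 1.0, 10.0, 50.0  # unit holding, unit shortage, fixed order cost
s_point, S_level = 20, 60             # reorder point s and order-up-to level S

def simulate(days=365, seed=0):
    rnd, stock, cost = random.Random(seed), S_level, 0.0
    for _ in range(days):
        demand = rnd.randint(0, 10)
        stock -= demand
        cost += SHORT * max(-stock, 0) + HOLD * max(stock, 0)
        stock = max(stock, 0)         # lost sales: no backorders are kept
        if stock <= s_point:          # reorder (lead time ignored for brevity)
            cost += ORDER
            stock = S_level
    return cost

print(f"total yearly cost: {simulate():.0f}")
```

A reinforcement learning agent replaces the fixed (s, S) rule with a learned reorder decision per state, which is where the optimization headroom lies.
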
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Factored reinforcement learning"

1

Parzeller, Rafael, Elisa Schuster, Axel Busboom, and Detlef Gerhard. "Assembly Sequence Planning by Reinforcement Learning and Accessibility Checking using RRT*." In 2024 IEEE 29th International Conference on Emerging Technologies and Factory Automation (ETFA). IEEE, 2024. http://dx.doi.org/10.1109/etfa61755.2024.10710703.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Batta, Mohamed Sofiane, Alexandre Moral, and Rahim Kacimi. "Dynamic Spreading Factor and Power Allocation in LoRaWAN Networks Using Reinforcement Learning." In 2025 International Wireless Communications and Mobile Computing (IWCMC). IEEE, 2025. https://doi.org/10.1109/iwcmc65282.2025.11059434.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zebenholzer, Moritz, Lukas Kasper, Alexander Schirrer, and René Hofmann. "Optimal Energy Scheduling for Battery and Hydrogen Storage Systems Using Reinforcement Learning." In The 35th European Symposium on Computer Aided Process Engineering. PSE Press, 2025. https://doi.org/10.69997/sct.134052.

Full text
Abstract:
Optimal energy scheduling for sector-coupled multi-energy systems is becoming increasingly important as renewable energies such as wind and photovoltaics continue to expand. They are very volatile and difficult to predict. This creates a deviation between generation and demand that can be compensated for by energy storage technologies. For these, rule-based control is well established in industry, and mixed-integer model predictive control (MPC) is an area of research that promises the best results, usually regarding minimal costs. Drawbacks of MPC include the need for an adequate system model
APA, Harvard, Vancouver, ISO, and other styles
4

Pawlak, Iga, Hamid Reza Feyzmahdavian, and Soroush Rastegarpour. "Safe Reinforcement Learning for Level Control of Nonlinear Spherical Tank with Actuator Delays." In 2024 IEEE 29th International Conference on Emerging Technologies and Factory Automation (ETFA). IEEE, 2024. http://dx.doi.org/10.1109/etfa61755.2024.10710948.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Al-Sakkari, Eslam G., Ahmed Ragab, Mohamed Ali, Hanane Dagdougui, Daria C. Boffito, and Mouloud Amazouz. "Learn-To-Design: Reinforcement Learning-Assisted Chemical Process Optimization." In Foundations of Computer-Aided Process Design. PSE Press, 2024. http://dx.doi.org/10.69997/sct.103483.

Full text
Abstract:
This paper proposes an AI-assisted approach aimed at accelerating chemical process design through causal incremental reinforcement learning (CIRL) where an intelligent agent is interacting iteratively with a process simulation environment (e.g., Aspen HYSYS, DWSIM, etc.). The proposed approach is based on an incremental learnable optimizer capable of guiding multi-objective optimization towards optimal design variable configurations, depending on several factors including the problem complexity, selected RL algorithm and hyperparameters tuning. One advantage of this approach is that the agent-
APA, Harvard, Vancouver, ISO, and other styles
6

Strehl, Alexander L. "Model-Based Reinforcement Learning in Factored-State MDPs." In 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning. IEEE, 2007. http://dx.doi.org/10.1109/adprl.2007.368176.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Sahin, Coskun, Erkin Cilden, and Faruk Polat. "Memory efficient factored abstraction for reinforcement learning." In 2015 IEEE 2nd International Conference on Cybernetics (CYBCONF). IEEE, 2015. http://dx.doi.org/10.1109/cybconf.2015.7175900.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Yao, Hengshuai, Csaba Szepesvari, Bernardo Avila Pires, and Xinhua Zhang. "Pseudo-MDPs and factored linear action models." In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE, 2014. http://dx.doi.org/10.1109/adprl.2014.7010633.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Wu, Bo, and Yanpeng Feng. "Monte-Carlo Bayesian Reinforcement Learning Using a Compact Factored Representation." In 2017 4th International Conference on Information Science and Control Engineering (ICISCE). IEEE, 2017. http://dx.doi.org/10.1109/icisce.2017.104.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Simão, Thiago D. "Safe and Sample-Efficient Reinforcement Learning Algorithms for Factored Environments." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/919.

Full text
Abstract:
Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) where the transition function is unknown. In situations where an arbitrary policy pi is already in execution and the experiences with the environment were recorded in a batch D, an RL algorithm can use D to compute a new policy pi'. However, the policy computed by traditional RL algorithms might have worse performance compared to pi. Our goal is to develop safe RL algorithms, where the agent has a high confidence that the performance of pi' is better than the performance of pi given D. To dev
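
One standard way to obtain the high-confidence comparison described here (illustrative, not necessarily this paper's test) is per-trajectory importance sampling over the batch D: estimate the return of pi', form a lower confidence bound, and accept pi' only if the bound beats the behaviour policy's observed performance. Everything below, including the uniform behaviour policy, is invented.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_estimates(batch, pi_new):
    """Per-trajectory importance-sampled return estimates for pi'."""
    ests = []
    for traj in batch:
        w = 1.0
        for s, a, _ in traj:          # product of per-step likelihood ratios
            w *= pi_new[s][a] / 0.25  # behaviour policy: uniform over 4 actions
        ests.append(w * sum(r for _, _, r in traj))
    return np.array(ests)

# Batch D: 200 short trajectories of (state, action, reward) tuples.
batch = [[(int(rng.integers(3)), int(rng.integers(4)), float(rng.normal(0.1)))
          for _ in range(5)] for _ in range(200)]

pi_new = {s: np.full(4, 0.25) for s in range(3)}  # start from the behaviour policy
pi_new[0] = np.array([0.4, 0.2, 0.2, 0.2])        # candidate deviation in state 0

ests = is_estimates(batch, pi_new)
lower = ests.mean() - 1.645 * ests.std(ddof=1) / np.sqrt(len(ests))  # ~95% LCB
baseline = np.mean([sum(r for _, _, r in t) for t in batch])
print("accept pi'" if lower > baseline else "keep pi")
```
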
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Factored reinforcement learning"

1

Rinaudo, Christina, William Leonard, Jaylen Hopson, Christopher Morey, Robert Hilborn, and Theresa Coumbe. Enabling understanding of artificial intelligence (AI) agent wargaming decisions through visualizations. Engineer Research and Development Center (U.S.), 2024. http://dx.doi.org/10.21079/11681/48418.

Full text
Abstract:
The process to develop options for military planning course of action (COA) development and analysis relies on human subject matter expertise. Analyzing COAs requires examining several factors and understanding complex interactions and dependencies associated with actions, reactions, proposed counteractions, and multiple reasonable outcomes. In Fiscal Year 2021, the Institute for Systems Engineering Research team completed efforts resulting in a wargaming maritime framework capable of training an artificial intelligence (AI) agent with deep reinforcement learning (DRL) techniques within a mari
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!