Journal articles on the topic 'Reinforcement Learning Generalization'

Consult the top 50 journal articles for your research on the topic 'Reinforcement Learning Generalization.'

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Kwon, Sunggyu, and Kwang Y. Lee. "GENERALIZATION OF REINFORCEMENT LEARNING WITH CMAC." IFAC Proceedings Volumes 38, no. 1 (2005): 360–65. http://dx.doi.org/10.3182/20050703-6-cz-1902.01138.

2

Wu, Keyu, Min Wu, Zhenghua Chen, Yuecong Xu, and Xiaoli Li. "Generalizing Reinforcement Learning through Fusing Self-Supervised Learning into Intrinsic Motivation." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8683–90. http://dx.doi.org/10.1609/aaai.v36i8.20847.

Abstract:
Despite the great potential of reinforcement learning (RL) in solving complex decision-making problems, generalization remains one of its key challenges, leading to difficulty in deploying learned RL policies to new environments. In this paper, we propose to improve the generalization of RL algorithms through fusing Self-supervised learning into Intrinsic Motivation (SIM). Specifically, SIM boosts representation learning through driving the cross-correlation matrix between the embeddings of augmented and non-augmented samples close to the identity matrix. This aims to increase the similarity between the embedding vectors of a sample and its augmented version while minimizing the redundancy between the components of these vectors. Meanwhile, the redundancy reduction based self-supervised loss is converted to an intrinsic reward to further improve generalization in RL via an auxiliary objective. As a general paradigm, SIM can be implemented on top of any RL algorithm. Extensive evaluations have been performed on a diversity of tasks. Experimental results demonstrate that SIM consistently outperforms the state-of-the-art methods and exhibits superior generalization capability and sample efficiency.
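
The redundancy-reduction objective described in this abstract resembles a Barlow Twins-style loss. The sketch below is a minimal NumPy illustration of that general idea rather than the authors' implementation; the function name, the weighting term lam, and the conversion of the loss into an intrinsic reward are assumptions.

    import numpy as np

    def redundancy_reduction_loss(z_a, z_b, lam=0.005):
        """Drive the cross-correlation matrix between two batches of embeddings
        (augmented vs. non-augmented views) toward the identity matrix.
        z_a, z_b have shape (batch, dim); lam is an assumed weighting."""
        z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-8)  # standardize each dimension
        z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-8)
        n = z_a.shape[0]
        c = z_a.T @ z_b / n                                  # cross-correlation matrix
        on_diag = ((np.diag(c) - 1.0) ** 2).sum()            # pull diagonal toward 1
        off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # push off-diagonal toward 0
        return on_diag + lam * off_diag

    # The negated loss could then be used as an intrinsic reward, e.g.
    # r_intrinsic = -redundancy_reduction_loss(embed(augment(obs)), embed(obs)),
    # where embed and augment are placeholder functions.
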
3

Wimmer, G. Elliott, Nathaniel D. Daw, and Daphna Shohamy. "Generalization of value in reinforcement learning by humans." European Journal of Neuroscience 35, no. 7 (April 2012): 1092–104. http://dx.doi.org/10.1111/j.1460-9568.2012.08017.x.

4

Hashemzadeh, Maryam, Reshad Hosseini, and Majid Nili Ahmadabadi. "Clustering subspace generalization to obtain faster reinforcement learning." Evolving Systems 11, no. 1 (July 4, 2019): 89–103. http://dx.doi.org/10.1007/s12530-019-09290-9.

5

Gershman, Samuel J., and Yael Niv. "Novelty and Inductive Generalization in Human Reinforcement Learning." Topics in Cognitive Science 7, no. 3 (March 23, 2015): 391–415. http://dx.doi.org/10.1111/tops.12138.

6

Matiisen, Tambet, Aqeel Labash, Daniel Majoral, Jaan Aru, and Raul Vicente. "Do Deep Reinforcement Learning Agents Model Intentions?" Stats 6, no. 1 (December 28, 2022): 50–66. http://dx.doi.org/10.3390/stats6010004.

Abstract:
Inferring other agents’ mental states, such as their knowledge, beliefs and intentions, is thought to be essential for effective interactions with other agents. Recently, multi-agent systems trained via deep reinforcement learning have been shown to succeed in solving various tasks. Still, how each agent models or represents other agents in their environment remains unclear. In this work, we test whether deep reinforcement learning agents trained with the multi-agent deep deterministic policy gradient (MADDPG) algorithm explicitly represent other agents’ intentions (their specific aims or plans) during a task in which the agents have to coordinate the covering of different spots in a 2D environment. In particular, we tracked over time the performance of a linear decoder trained to predict the final targets of all agents from the hidden-layer activations of each agent’s neural network controller. We observed that the hidden layers of agents represented explicit information about other agents’ intentions, i.e., the target landmark the other agent ended up covering. We also performed a series of experiments in which some agents were replaced by others with fixed targets to test the levels of generalization of the trained agents. We noticed that during the training phase, the agents developed a preference for each landmark, which hindered generalization. To alleviate the above problem, we evaluated simple changes to the MADDPG training algorithm which lead to better generalization against unseen agents. Our method for confirming intention modeling in deep learning agents is simple to implement and can be used to improve the generalization of multi-agent systems in fields such as robotics, autonomous vehicles and smart cities.
7

Fang, Qiang, Wenzhuo Zhang, and Xitong Wang. "Visual Navigation Using Inverse Reinforcement Learning and an Extreme Learning Machine." Electronics 10, no. 16 (August 18, 2021): 1997. http://dx.doi.org/10.3390/electronics10161997.

Abstract:
In this paper, we focus on the challenges of training efficiency, the design of reward functions, and generalization in reinforcement learning for visual navigation and propose a regularized extreme learning machine-based inverse reinforcement learning approach (RELM-IRL) to improve the navigation performance. Our contributions are mainly three-fold: First, a framework combining extreme learning machine with inverse reinforcement learning is presented. This framework can improve the sample efficiency, obtain the reward function directly from the image information observed by the agent, and improve generalization to new targets and new environments. Second, the extreme learning machine is regularized by multi-response sparse regression and the leave-one-out method, which can further improve the generalization ability. Simulation experiments in the AI-THOR environment showed that the proposed approach outperformed previous end-to-end approaches, thus demonstrating the effectiveness and efficiency of our approach.
8

Hatcho, Yasuyo, Kiyohiko Hattori, and Keiki Takadama. "Time Horizon Generalization in Reinforcement Learning: Generalizing Multiple Q-Tables in Q-Learning Agents." Journal of Advanced Computational Intelligence and Intelligent Informatics 13, no. 6 (November 20, 2009): 667–74. http://dx.doi.org/10.20965/jaciii.2009.p0667.

Abstract:
This paper focuses on generalization in reinforcement learning from the time horizon viewpoint, exploring the method that generalizes multiple Q-tables in the multiagent reinforcement learning domain. For this purpose, we propose time horizon generalization for reinforcement learning, which consists of (1) a Q-table selection method and (2) a Q-table merge timing method, enabling agents to (1) select which Q-tables can be generalized from among many Q-tables and (2) determine when the selected Q-tables should be generalized. Intensive simulations on the bargaining game, a sequential interaction game, have revealed the following implications: (1) both the Q-table selection and merge timing methods help replicate the subject experimental results without ad hoc parameter setting; and (2) agents using the proposed methods achieve such replication with smaller numbers of Q-tables.
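
As a deliberately simplified illustration of what merging Q-tables can mean, the sketch below averages a selected subset of tabular value functions; the selection criterion and merge timing proposed in the paper are more elaborate, so the helper and its arguments are assumptions.

    import numpy as np

    def merge_q_tables(q_tables, selected_indices):
        """Merge the selected Q-tables by element-wise averaging.
        q_tables: list of dicts mapping state -> array of action values.
        selected_indices: indices of the tables chosen for generalization."""
        grouped = {}
        for idx in selected_indices:
            for state, values in q_tables[idx].items():
                grouped.setdefault(state, []).append(values)
        return {state: np.mean(vals, axis=0) for state, vals in grouped.items()}
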
9

Kaelbling, L. P., M. L. Littman, and A. W. Moore. "Reinforcement Learning: A Survey." Journal of Artificial Intelligence Research 4 (May 1, 1996): 237–85. http://dx.doi.org/10.1613/jair.301.

Abstract:
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
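
For readers new to the area, the trial-and-error setting the survey describes is often introduced with tabular Q-learning and epsilon-greedy exploration, as in the minimal sketch below; the environment interface (reset/step) is an assumption, and the survey itself covers far more general settings.

    import numpy as np

    def q_learning(env, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning with epsilon-greedy exploration. env is assumed to
        expose reset() -> state and step(action) -> (next_state, reward, done)."""
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Explore with probability epsilon, otherwise exploit.
                if np.random.rand() < epsilon:
                    action = np.random.randint(n_actions)
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done = env.step(action)
                # Temporal-difference update toward the bootstrapped target.
                target = reward + gamma * np.max(Q[next_state]) * (not done)
                Q[state, action] += alpha * (target - Q[state, action])
                state = next_state
        return Q
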
10

Kim, Minbeom, Kyeongha Rho, Yong-duk Kim, and Kyomin Jung. "Action-driven contrastive representation for reinforcement learning." PLOS ONE 17, no. 3 (March 18, 2022): e0265456. http://dx.doi.org/10.1371/journal.pone.0265456.

Abstract:
In reinforcement learning, reward-driven feature learning directly from high-dimensional images faces two challenges: sample-efficiency for solving control tasks and generalization to unseen observations. In prior works, these issues have been addressed through learning representation from pixel inputs. However, their representation faced the limitations of being vulnerable to the high diversity inherent in environments or not taking the characteristics for solving control tasks. To attenuate these phenomena, we propose the novel contrastive representation method, Action-Driven Auxiliary Task (ADAT), which forces a representation to concentrate on essential features for deciding actions and ignore control-irrelevant details. In the augmented state-action dictionary of ADAT, the agent learns representation to maximize agreement between observations sharing the same actions. The proposed method significantly outperforms model-free and model-based algorithms in the Atari and OpenAI ProcGen, widely used benchmarks for sample-efficiency and generalization.
11

Matsushima, Hiroyasu, Kiyohiko Hattori, and Keiki Takadama. "Exemplar Generalization in Reinforcement Learning: Improving Performance with Fewer Exemplars." Journal of Advanced Computational Intelligence and Intelligent Informatics 13, no. 6 (November 20, 2009): 683–90. http://dx.doi.org/10.20965/jaciii.2009.p0683.

Abstract:
This paper focuses on the generalization of exemplars (i.e., good rules) in the reinforcement learning framework and proposes Exemplar Generalization in Reinforcement Learning (EGRL), which extracts useful exemplars from a large set of exemplars provided as prior knowledge and generalizes them by deleting unnecessary (overlapping) exemplars as much as possible. Through intensive simulation of a simple cargo layout problem to validate EGRL effectiveness, the following implications have been revealed: (1) EGRL derives good performance with fewer exemplars than using the efficient numbers of exemplars and randomly selected exemplars and (2) integration of covering, deletion, and subsumption mechanisms in EGRL is critical for improving EGRL performance and generalization.
12

Wang, Cong, Qifeng Zhang, Qiyan Tian, Shuo Li, Xiaohui Wang, David Lane, Yvan Petillot, and Sen Wang. "Learning Mobile Manipulation through Deep Reinforcement Learning." Sensors 20, no. 3 (February 10, 2020): 939. http://dx.doi.org/10.3390/s20030939.

Abstract:
Mobile manipulation has a broad range of applications in robotics. However, it is usually more challenging than fixed-base manipulation due to the complex coordination of a mobile base and a manipulator. Although recent works have demonstrated that deep reinforcement learning is a powerful technique for fixed-base manipulation tasks, most of them are not applicable to mobile manipulation. This paper investigates how to leverage deep reinforcement learning to tackle whole-body mobile manipulation tasks in unstructured environments using only on-board sensors. A novel mobile manipulation system which integrates the state-of-the-art deep reinforcement learning algorithms with visual perception is proposed. It has an efficient framework decoupling visual perception from the deep reinforcement learning control, which enables its generalization from simulation training to real-world testing. Extensive simulation and experiment results show that the proposed mobile manipulation system is able to grasp different types of objects autonomously in various simulation and real-world scenarios, verifying the effectiveness of the proposed mobile manipulation system.
13

Williams, Arthur, and Joshua Phillips. "Transfer Reinforcement Learning Using Output-Gated Working Memory." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 02 (April 3, 2020): 1324–31. http://dx.doi.org/10.1609/aaai.v34i02.5488.

Abstract:
Transfer learning allows for knowledge to generalize across tasks, resulting in increased learning speed and/or performance. These tasks must have commonalities that allow for knowledge to be transferred. The main goal of transfer learning in the reinforcement learning domain is to train and learn on one or more source tasks in order to learn a target task that exhibits better performance than if transfer was not used (Taylor and Stone 2009). Furthermore, the use of output-gated neural network models of working memory has been shown to increase generalization for supervised learning tasks (Kriete and Noelle 2011; Kriete et al. 2013). We propose that working memory-based generalization plays a significant role in a model's ability to transfer knowledge successfully across tasks. Thus, we extended the Holographic Working Memory Toolkit (HWMtk) (Dubois and Phillips 2017; Phillips and Noelle 2005) to utilize the generalization benefits of output gating within a working memory system. Finally, the model's utility was tested on a temporally extended, partially observable 5x5 2D grid-world maze task that required the agent to learn 3 tasks over the duration of the training period. The results indicate that the addition of output gating increases the initial learning performance of an agent in target tasks and decreases the learning time required to reach a fixed performance threshold.
14

Botteghi, N., B. Sirmacek, R. Schulte, M. Poel, and C. Brune. "REINFORCEMENT LEARNING HELPS SLAM: LEARNING TO BUILD MAPS." ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLIII-B4-2020 (August 25, 2020): 329–35. http://dx.doi.org/10.5194/isprs-archives-xliii-b4-2020-329-2020.

Abstract:
In this research, we investigate the use of Reinforcement Learning (RL) for an effective and robust solution for exploring unknown and indoor environments and reconstructing their maps. We benefit from a Simultaneous Localization and Mapping (SLAM) algorithm for real-time robot localization and mapping. Three different reward functions are compared and tested in different environments with growing complexity. The performances of the three different RL-based path planners are assessed not only on the training environments, but also on an a priori unseen environment to test the generalization properties of the policies. The results indicate that RL-based planners trained to maximize the coverage of the map are able to consistently explore and construct the maps of different indoor environments.
15

Francois-Lavet, Vincent, Yoshua Bengio, Doina Precup, and Joelle Pineau. "Combined Reinforcement Learning via Abstract Representations." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3582–89. http://dx.doi.org/10.1609/aaai.v33i01.33013582.

Abstract:
In the quest for efficient and robust reinforcement learning methods, both model-free and model-based approaches offer advantages. In this paper we propose a new way of explicitly bridging both approaches via a shared low-dimensional learned encoding of the environment, meant to capture summarizing abstractions. We show that the modularity brought by this approach leads to good generalization while being computationally efficient, with planning happening in a smaller latent state space. In addition, this approach recovers a sufficient low-dimensional representation of the environment, which opens up new strategies for interpretable AI, exploration and transfer learning.
16

Graham, Robert B. "A Computer Tutorial on the Principles of Stimulus Generalization." Teaching of Psychology 25, no. 2 (April 1998): 149–51. http://dx.doi.org/10.1207/s15328023top2502_21.

Abstract:
In this article, I describe a computer tutorial that teaches the fundamentals of stimulus generalization in operant learning. The content is appropriate for courses in general psychology, learning, and behavioral programming. Concepts covered include reinforcement, discrimination learning, stimulus continua, generalization, generalization gradients, and peak shift. The tutorial also reviews applications in animal and human situations. Student reaction to this form of presentation was very favorable.
17

Tamar, Aviv, Daniel Soudry, and Ev Zisselman. "Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8423–31. http://dx.doi.org/10.1609/aaai.v36i8.20818.

Abstract:
In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem parameters -- the rewards and transitions -- is assumed, and a policy that optimizes the (posterior) expected return is sought. A common approximation, which has been recently popularized as meta-RL, is to train the agent on a sample of N problem instances from the prior, with the hope that for large enough N, good generalization behavior to an unseen test instance will be obtained. In this work, we study generalization in Bayesian RL under the probably approximately correct (PAC) framework, using the method of algorithmic stability. Our main contribution is showing that by adding regularization, the optimal policy becomes uniformly stable in an appropriate sense. Most stability results in the literature build on strong convexity of the regularized loss -- an approach that is not suitable for RL as Markov decision processes (MDPs) are not convex. Instead, building on recent results of fast convergence rates for mirror descent in regularized MDPs, we show that regularized MDPs satisfy a certain quadratic growth criterion, which is sufficient to establish stability. This result, which may be of independent interest, allows us to study the effect of regularization on generalization in the Bayesian RL setting.
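
For orientation, a common instance of the regularization discussed above is an entropy penalty added to the expected return; the paper's exact regularizer and analysis differ, so the LaTeX formula below is only an illustrative example of a regularized objective:

    J_{\lambda}(\pi) \;=\; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \bigl( r(s_t, a_t) + \lambda\, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \bigr) \right],

where \mathcal{H} denotes the entropy of the policy at state s_t and \lambda > 0 controls the regularization strength.
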
18

Vickery, Timothy, and Kyle Friedman. "Generalization of value to visual statistical associates during reinforcement learning." Journal of Vision 15, no. 12 (September 1, 2015): 1350. http://dx.doi.org/10.1167/15.12.1350.

19

Wen, Zheng, and Benjamin Van Roy. "Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization." Mathematics of Operations Research 42, no. 3 (August 2017): 762–82. http://dx.doi.org/10.1287/moor.2016.0826.

20

Hirashima, Yoichi. "A New Reinforcement Learning for Train Marshaling with Generalization Capability." Advanced Materials Research 974 (June 2014): 269–73. http://dx.doi.org/10.4028/www.scientific.net/amr.974.269.

Abstract:
This paper proposes a new marshaling method for assembling an outgoing train. In the proposed method, each set of freight cars with the same destination makes a group, and the desirable group layout constitutes the best outgoing train. The incoming freight cars are classified into several "sub-tracks", searching for a better assignment in order to reduce the transfer distance of the locomotive. Classifications and marshaling plans based on the transfer distance of a locomotive are obtained autonomously by a reinforcement learning system. Then, the number of sub-tracks utilized in the classification is determined by the learning system in order to yield generalization capability.
21

Goto, Ryo, and Hiroshi Matsuo. "State generalization method with support vector machines in reinforcement learning." Systems and Computers in Japan 37, no. 9 (2006): 77–86. http://dx.doi.org/10.1002/scj.20140.

22

Perrin, Sarah, Mathieu Laurière, Julien Pérolat, Romuald Élie, Matthieu Geist, and Olivier Pietquin. "Generalization in Mean Field Games by Learning Master Policies." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (June 28, 2022): 9413–21. http://dx.doi.org/10.1609/aaai.v36i9.21173.

Abstract:
Mean Field Games (MFGs) can potentially scale multi-agent systems to extremely large populations of agents. Yet, most of the literature assumes a single initial distribution for the agents, which limits the practical applications of MFGs. Machine Learning has the potential to solve a wider diversity of MFG problems thanks to its generalization capacities. We study how to leverage these generalization properties to learn policies enabling a typical agent to behave optimally against any population distribution. In reference to the Master equation in MFGs, we coin the term “Master policies” to describe them and we prove that a single Master policy provides a Nash equilibrium, whatever the initial distribution. We propose a method to learn such Master policies. Our approach relies on three ingredients: adding the current population distribution as part of the observation, approximating Master policies with neural networks, and training via Reinforcement Learning and Fictitious Play. We illustrate on numerical examples not only the efficiency of the learned Master policy but also its generalization capabilities beyond the distributions used for training.
23

Gershman, Samuel J., Christopher D. Moore, Michael T. Todd, Kenneth A. Norman, and Per B. Sederberg. "The Successor Representation and Temporal Context." Neural Computation 24, no. 6 (June 2012): 1553–68. http://dx.doi.org/10.1162/neco_a_00282.

Abstract:
The successor representation was introduced into reinforcement learning by Dayan (1993) as a means of facilitating generalization between states with similar successors. Although reinforcement learning in general has been used extensively as a model of psychological and neural processes, the psychological validity of the successor representation has yet to be explored. An interesting possibility is that the successor representation can be used not only for reinforcement learning but for episodic learning as well. Our main contribution is to show that a variant of the temporal context model (TCM; Howard & Kahana, 2002), an influential model of episodic memory, can be understood as directly estimating the successor representation using the temporal difference learning algorithm (Sutton & Barto, 1998). This insight leads to a generalization of TCM and new experimental predictions. In addition to casting a new normative light on TCM, this equivalence suggests a previously unexplored point of contact between different learning systems.
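
The step of directly estimating the successor representation with temporal-difference learning can be written compactly for a tabular state space with one-hot features; this simplification is an assumption made here for clarity and is not the paper's experimental setup.

    import numpy as np

    def sr_td_update(M, s, s_next, alpha=0.1, gamma=0.95):
        """One temporal-difference update of a tabular successor representation.
        M is an (n_states, n_states) matrix; row s estimates the expected
        discounted future occupancy of every state when starting from state s."""
        n_states = M.shape[0]
        one_hot = np.eye(n_states)[s]
        td_error = one_hot + gamma * M[s_next] - M[s]
        M[s] = M[s] + alpha * td_error
        return M
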
24

Wu, Haiping, Khimya Khetarpal, and Doina Precup. "Self-Supervised Attention-Aware Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10311–19. http://dx.doi.org/10.1609/aaai.v35i12.17235.

Abstract:
Visual saliency has emerged as a major visualization tool for interpreting deep reinforcement learning (RL) agents. However, much of the existing research uses it as an analyzing tool rather than an inductive bias for policy learning. In this work, we use visual attention as an inductive bias for RL agents. We propose a novel self-supervised attention learning approach which can 1. learn to select regions of interest without explicit annotations, and 2. act as a plug for existing deep RL methods to improve the learning performance. We empirically show that the self-supervised attention-aware deep RL methods outperform the baselines in the context of both the rate of convergence and performance. Furthermore, the proposed self-supervised attention is not tied with specific policies, nor restricted to a specific scene. We posit that the proposed approach is a general self-supervised attention module for multi-task learning and transfer learning, and empirically validate the generalization ability of the proposed method. Finally, we show that our method learns meaningful object keypoints highlighting improvements both qualitatively and quantitatively.
25

Gao, Junli, Weijie Ye, Jing Guo, and Zhongjuan Li. "Deep Reinforcement Learning for Indoor Mobile Robot Path Planning." Sensors 20, no. 19 (September 25, 2020): 5493. http://dx.doi.org/10.3390/s20195493.

Abstract:
This paper proposes a novel incremental training mode to address the problem of Deep Reinforcement Learning (DRL) based path planning for a mobile robot. Firstly, we evaluate the related graphic search algorithms and Reinforcement Learning (RL) algorithms in a lightweight 2D environment. Then, we design the algorithm based on DRL, including observation states, reward function, network structure as well as parameters optimization, in a 2D environment to circumvent the time-consuming works for a 3D environment. We transfer the designed algorithm to a simple 3D environment for retraining to obtain the converged network parameters, including the weights and biases of deep neural network (DNN), etc. Using these parameters as initial values, we continue to train the model in a complex 3D environment. To improve the generalization of the model in different scenes, we propose to combine the DRL algorithm Twin Delayed Deep Deterministic policy gradients (TD3) with the traditional global path planning algorithm Probabilistic Roadmap (PRM) as a novel path planner (PRM+TD3). Experimental results show that the incremental training mode can notably improve the development efficiency. Moreover, the PRM+TD3 path planner can effectively improve the generalization of the model.
26

Gustafson, Nicholas J., and Nathaniel D. Daw. "Grid Cells, Place Cells, and Geodesic Generalization for Spatial Reinforcement Learning." PLoS Computational Biology 7, no. 10 (October 27, 2011): e1002235. http://dx.doi.org/10.1371/journal.pcbi.1002235.

27

Fonteneau, R., D. Ernst, B. Boigelot, and Q. Louveaux. "Min Max Generalization for Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes." SIAM Journal on Control and Optimization 51, no. 5 (January 2013): 3355–85. http://dx.doi.org/10.1137/120867263.

28

Hashemzadeh, Maryam, Reshad Hosseini, and Majid Nili Ahmadabadi. "Exploiting Generalization in the Subspaces for Faster Model-Based Reinforcement Learning." IEEE Transactions on Neural Networks and Learning Systems 30, no. 6 (June 2019): 1635–50. http://dx.doi.org/10.1109/tnnls.2018.2869978.

29

Cilden, Erkin, and Faruk Polat. "Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning." IEEE Transactions on Cybernetics 45, no. 8 (August 2015): 1414–25. http://dx.doi.org/10.1109/tcyb.2014.2352038.

30

Hu and Xu. "Fuzzy Reinforcement Learning and Curriculum Transfer Learning for Micromanagement in Multi-Robot Confrontation." Information 10, no. 11 (November 2, 2019): 341. http://dx.doi.org/10.3390/info10110341.

Abstract:
Multi-Robot Confrontation on physics-based simulators is a complex and time-consuming task, but simulators are required to evaluate the performance of the advanced algorithms. Recently, a few advanced algorithms have been able to produce considerably complex levels in the context of the robot confrontation system when the agents are facing multiple opponents. Meanwhile, the current confrontation decision-making system suffers from difficulties in optimization and generalization. In this paper, a fuzzy reinforcement learning (RL) and the curriculum transfer learning are applied to the micromanagement for robot confrontation system. Firstly, an improved Q-learning in the semi-Markov decision-making process is designed to train the agent and an efficient RL model is defined to avoid the curse of dimensionality. Secondly, a multi-agent RL algorithm with parameter sharing is proposed to train the agents. We use a neural network with adaptive momentum acceleration as a function approximator to estimate the state-action function. Then, a method of fuzzy logic is used to regulate the learning rate of RL. Thirdly, a curriculum transfer learning method is used to extend the RL model to more difficult scenarios, which ensures the generalization of the decision-making system. The experimental results show that the proposed method is effective.
31

Zhang, Yichuan, Yixing Lan, Qiang Fang, Xin Xu, Junxiang Li, and Yujun Zeng. "Efficient Reinforcement Learning from Demonstration via Bayesian Network-Based Knowledge Extraction." Computational Intelligence and Neuroscience 2021 (September 24, 2021): 1–16. http://dx.doi.org/10.1155/2021/7588221.

Abstract:
Reinforcement learning from demonstration (RLfD) is considered to be a promising approach to improve reinforcement learning (RL) by leveraging expert demonstrations as the additional decision-making guidance. However, most existing RLfD methods only regard demonstrations as low-level knowledge instances under a certain task. Demonstrations are generally used to either provide additional rewards or pretrain the neural network-based RL policy in a supervised manner, usually resulting in poor generalization capability and weak robustness performance. Considering that human knowledge is not only interpretable but also suitable for generalization, we propose to exploit the potential of demonstrations by extracting knowledge from them via Bayesian networks and develop a novel RLfD method called Reinforcement Learning from demonstration via Bayesian Network-based Knowledge (RLBNK). The proposed RLBNK method takes advantage of node influence with the Wasserstein distance metric (NIW) algorithm to obtain abstract concepts from demonstrations and then a Bayesian network conducts knowledge learning and inference based on the abstract data set, which will yield the coarse policy with corresponding confidence. Once the coarse policy’s confidence is low, another RL-based refine module will further optimize and fine-tune the policy to form a (near) optimal hybrid policy. Experimental results show that the proposed RLBNK method improves the learning efficiency of corresponding baseline RL algorithms under both normal and sparse reward settings. Furthermore, we demonstrate that our RLBNK method delivers better generalization capability and robustness than baseline methods.
32

Zhou, Li, and Kevin Small. "Inverse Reinforcement Learning with Natural Language Goals." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 11116–24. http://dx.doi.org/10.1609/aaai.v35i12.17326.

Abstract:
Humans generally use natural language to communicate task requirements to each other. Ideally, natural language should also be usable for communicating goals to autonomous machines (e.g., robots) to minimize friction in task specification. However, understanding and mapping natural language goals to sequences of states and actions is challenging. Specifically, existing work along these lines has encountered difficulty in generalizing learned policies to new natural language goals and environments. In this paper, we propose a novel adversarial inverse reinforcement learning algorithm to learn a language-conditioned policy and reward function. To improve generalization of the learned policy and reward function, we use a variational goal generator to relabel trajectories and sample diverse goals during training. Our algorithm outperforms multiple baselines by a large margin on a vision-based natural language instruction following dataset (Room-2-Room), demonstrating a promising advance in enabling the use of natural language instructions in specifying agent goals.
33

Ugurlu, Halil Ibrahim, Xuan Huy Pham, and Erdal Kayacan. "Sim-to-Real Deep Reinforcement Learning for Safe End-to-End Planning of Aerial Robots." Robotics 11, no. 5 (October 13, 2022): 109. http://dx.doi.org/10.3390/robotics11050109.

Abstract:
In this study, a novel end-to-end path planning algorithm based on deep reinforcement learning is proposed for aerial robots deployed in dense environments. The learning agent finds an obstacle-free way around the provided rough, global path by only depending on the observations from a forward-facing depth camera. A novel deep reinforcement learning framework is proposed to train the end-to-end policy with the capability of safely avoiding obstacles. The Webots open-source robot simulator is utilized for training the policy, introducing highly randomized environmental configurations for better generalization. The training is performed without dynamics calculations through randomized position updates to minimize the amount of data processed. The trained policy is first comprehensively evaluated in simulations involving physical dynamics and software-in-the-loop flight control. The proposed method is proven to have a 38% and 50% higher success rate compared to both deep reinforcement learning-based and artificial potential field-based baselines, respectively. The generalization capability of the method is verified in simulation-to-real transfer without further training. Real-time experiments are conducted with several trials in two different scenarios, showing a 50% higher success rate of the proposed method compared to the deep reinforcement learning-based baseline.
34

Landry, Jean-Francois, J. J. McArthur, Mikhail Genkin, and Karim El Mokhtari. "Development of the Reward Function to support Model-Free Reinforcement Learning for a Heat Recovery Chiller System Optimization." IOP Conference Series: Earth and Environmental Science 1101, no. 9 (November 1, 2022): 092027. http://dx.doi.org/10.1088/1755-1315/1101/9/092027.

Abstract:
Heat recovery chiller systems have significant strategic value to reduce building greenhouse gas emissions although this potential remains unrealized in practice. Real-time optimization using model-free reinforcement learning provides a potential solution to this challenge. A full-scale case study to implement reinforcement learning in a 6,000 m² academic laboratory is planned. This paper presents the methodology used to translate historical data correlations and expert input from operations personnel into the development of the reinforcement learning agent and associated reward function. This approach will permit a more stable and robust implementation of model-free reinforcement learning and the methodology presented will allow operator-identified constraints to be translated into reward functions more broadly, allowing for generalization to similar heat recovery chiller systems.
35

Herlau, Tue, and Rasmus Larsen. "Reinforcement Learning of Causal Variables Using Mediation Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (June 28, 2022): 6910–17. http://dx.doi.org/10.1609/aaai.v36i6.20648.

Abstract:
We consider the problem of acquiring causal representations and concepts in a reinforcement learning setting. Our approach defines a causal variable as being both manipulable by a policy, and able to predict the outcome. We thereby obtain a parsimonious causal graph in which interventions occur at the level of policies. The approach avoids defining a generative model of the data, prior pre-processing, or learning the transition kernel of the Markov decision process. Instead, causal variables and policies are determined by maximizing a new optimization target inspired by mediation analysis, which differs from the expected return. The maximization is accomplished using a generalization of Bellman's equation which is shown to converge, and the method finds meaningful causal representations in a simulated environment.
36

Kim and Park. "Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning." Symmetry 11, no. 11 (November 1, 2019): 1352. http://dx.doi.org/10.3390/sym11111352.

Abstract:
In terms of deep reinforcement learning (RL), exploration is highly significant in achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can demonstrate efficient exploration behavior. A random ε-greedy policy exploits additional replay buffers in an environment of sparse and binary rewards, such as in the real-time online detection of network securities by verifying whether the network is “normal or anomalous.” Prior studies have illustrated that a prioritized replay memory attributed to a complex temporal difference error provides superior theoretical results. However, another implementation illustrated that in certain environments, the prioritized replay memory is not superior to the randomly-selected buffers of random ε-greedy policy. Moreover, a key challenge of hindsight experience replay inspires our objective by using additional buffers corresponding to each different goal. Therefore, we attempt to exploit multiple random ε-greedy buffers to enhance explorations for a more near-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of off-policy learning from our method through an experimental comparison of DQN and a deep deterministic policy gradient in terms of discrete action, as well as continuous control for complete symmetric environments.
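
The abstract does not spell out the buffer mechanics, so the sketch below is only one plausible reading: keep a separate replay buffer per exploration rate and sample a mixed minibatch across them. The class name, the set of epsilon values, and the capacity are assumptions.

    import random
    from collections import deque

    class MultiEpsilonBuffers:
        """Illustrative multi-buffer replay: one buffer per epsilon-greedy rate."""

        def __init__(self, epsilons=(0.05, 0.1, 0.2), capacity=10000):
            self.buffers = {eps: deque(maxlen=capacity) for eps in epsilons}

        def add(self, eps, transition):
            # Store a (state, action, reward, next_state, done) tuple under the
            # exploration rate that generated it.
            self.buffers[eps].append(transition)

        def sample(self, batch_size):
            # Draw roughly equal shares from every non-empty buffer.
            per_buffer = max(1, batch_size // len(self.buffers))
            batch = []
            for buf in self.buffers.values():
                if buf:
                    batch.extend(random.sample(list(buf), min(per_buffer, len(buf))))
            return batch
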
37

Devo, Alessandro, Giacomo Mezzetti, Gabriele Costante, Mario L. Fravolini, and Paolo Valigi. "Towards Generalization in Target-Driven Visual Navigation by Using Deep Reinforcement Learning." IEEE Transactions on Robotics 36, no. 5 (October 2020): 1546–61. http://dx.doi.org/10.1109/tro.2020.2994002.

38

Iima, Hitoshi, and Hiroya Oonishi. "Solution of an Optimal Routing Problem by Reinforcement Learning with Generalization Ability." IEEJ Transactions on Electronics, Information and Systems 139, no. 12 (December 1, 2019): 1494–500. http://dx.doi.org/10.1541/ieejeiss.139.1494.

39

Liu, Rongrong, Florent Nageotte, Philippe Zanne, Michel de Mathelin, and Birgitta Dresp-Langley. "Deep Reinforcement Learning for the Control of Robotic Manipulation: A Focussed Mini-Review." Robotics 10, no. 1 (January 24, 2021): 22. http://dx.doi.org/10.3390/robotics10010022.

Abstract:
Deep learning has provided new ways of manipulating, processing and analyzing data. It may sometimes achieve results comparable to, or surpassing, human expert performance, and has become a source of inspiration in the era of artificial intelligence. Another subfield of machine learning, reinforcement learning, tries to find an optimal behavior strategy through interactions with the environment. Combining deep learning and reinforcement learning permits resolving critical issues relative to the dimensionality and scalability of data in tasks with sparse reward signals, such as robotic manipulation and control tasks, that neither method permits resolving when applied on its own. In this paper, we present recent significant progress in deep reinforcement learning algorithms, which try to tackle problems in the domain of robotic manipulation control, such as sample efficiency and generalization. Despite these continuous improvements, the challenges of learning robust and versatile manipulation skills for robots with deep reinforcement learning are currently still far from being resolved for real-world applications.
40

Ikegami, Tsuyoshi, J. Randall Flanagan, and Daniel M. Wolpert. "Reach adaption to a visuomotor gain with terminal error feedback involves reinforcement learning." PLOS ONE 17, no. 6 (June 1, 2022): e0269297. http://dx.doi.org/10.1371/journal.pone.0269297.

Abstract:
Motor adaptation can be achieved through error-based learning, driven by sensory prediction errors, or reinforcement learning, driven by reward prediction errors. Recent work on visuomotor adaptation has shown that reinforcement learning leads to more persistent adaptation when visual feedback is removed, compared to error-based learning in which continuous visual feedback of the movement is provided. However, there is evidence that error-based learning with terminal visual feedback of the movement (provided at the end of movement) may be driven by both sensory and reward prediction errors. Here we examined the influence of feedback on learning using a visuomotor adaptation task in which participants moved a cursor to a single target while the gain between hand and cursor movement displacement was gradually altered. Different groups received either continuous error feedback (EC), terminal error feedback (ET), or binary reinforcement feedback (success/fail) at the end of the movement (R). Following adaptation we tested generalization to targets located in different directions and found that generalization in the ET group was intermediate between the EC and R groups. We then examined the persistence of adaptation in the EC and ET groups when the cursor was extinguished and only binary reward feedback was provided. Whereas performance was maintained in the ET group, it quickly deteriorated in the EC group. These results suggest that terminal error feedback leads to a more robust form of learning than continuous error feedback. In addition our findings are consistent with the view that error-based learning with terminal feedback involves both error-based and reinforcement learning.
41

Barreto, André, Shaobo Hou, Diana Borsa, David Silver, and Doina Precup. "Fast reinforcement learning with generalized policy updates." Proceedings of the National Academy of Sciences 117, no. 48 (August 17, 2020): 30079–87. http://dx.doi.org/10.1073/pnas.1907370117.

Abstract:
The combination of reinforcement learning with deep learning is a promising approach to tackle important sequential decision-making problems that are currently intractable. One obstacle to overcome is the amount of data needed by learning systems of this type. In this article, we propose to address this issue through a divide-and-conquer approach. We argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel. By associating each task with a reward function, this problem decomposition can be seamlessly accommodated within the standard reinforcement-learning formalism. The specific way we do so is through a generalization of two fundamental operations in reinforcement learning: policy improvement and policy evaluation. The generalized versions of these operations allow one to leverage the solution of some tasks to speed up the solution of others. If the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved, we can reduce a reinforcement-learning problem to a simpler linear regression. When this is not the case, the agent can still exploit the task solutions by using them to interact with and learn about the environment. Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
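
When each task's reward is (approximately) linear in shared features, the generalized policy improvement step described above reduces to taking a maximum over the action values of previously learned policies. The sketch below assumes successor features are already available and is an illustration of the operation, not the authors' code.

    import numpy as np

    def generalized_policy_improvement(psi, w):
        """psi: array of shape (n_policies, n_actions, n_features) holding the
        successor features of each known policy at the current state.
        w: reward weights of the new task, assuming a linear reward r = phi . w.
        Returns the action chosen by generalized policy improvement."""
        q = psi @ w                           # (n_policies, n_actions) action values
        return int(np.argmax(q.max(axis=0)))  # best action under the best old policy
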
42

Lou, Ping, Kun Xu, Xuemei Jiang, Zheng Xiao, and Junwei Yan. "Path planning in an unknown environment based on deep reinforcement learning with prior knowledge." Journal of Intelligent & Fuzzy Systems 41, no. 6 (December 16, 2021): 5773–89. http://dx.doi.org/10.3233/jifs-192171.

Abstract:
Path planning in an unknown environment is a basic capability that mobile robots need in order to complete tasks. As a typical deep reinforcement learning method, the deep Q-network (DQN) algorithm has gained wide popularity in path planning tasks due to its self-learning and adaptability to complex environments. However, most path planning algorithms based on DQN spend plenty of time on model training, and the learned model policy depends only on the information observed by sensors. This causes poor generalization to new tasks and wastes time on model retraining. Therefore, a new deep reinforcement learning method combining DQN with prior knowledge is proposed to reduce training time and enhance generalization capability. In this method, a fuzzy logic controller is designed to avoid obstacles and help the robot avoid blind exploration, reducing the training time. A target-driven approach is used to address the lack of generalization, in which the learned policy depends on the fusion of observed information and target information. Extensive experiments show that the proposed algorithm converges faster than the DQN algorithm in path planning tasks and that the target can be reached without retraining when the path planning task changes.
43

Gao, Xiaoyu, Shipin Yang, and Lijuan Li. "Optimization of flow shop scheduling based on genetic algorithm with reinforcement learning." Journal of Physics: Conference Series 2258, no. 1 (April 1, 2022): 012019. http://dx.doi.org/10.1088/1742-6596/2258/1/012019.

Abstract:
Genetic algorithms, as a kind of evolutionary algorithm, are easy to operate and search globally, but they are strongly stochastic and highly susceptible to their parameters. When facing a large-scale scheduling problem, a strategy is needed to improve parameter adaptability and make the solution more effective. Reinforcement learning, as an optimization method, has a strong autonomous learning capability. Therefore, this paper proposes a genetic algorithm based on reinforcement learning, which uses Q-learning to learn the crossover probability and improve the generalization ability of the genetic algorithm, so as to solve the large-scale permutation flow shop scheduling problem.
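
One minimal way to let Q-learning adapt a genetic algorithm's crossover probability is sketched below; the discrete set of probabilities, the state descriptor, and the reward definition are assumptions for illustration, not the paper's exact design.

    import numpy as np

    CROSSOVER_PROBS = (0.6, 0.7, 0.8, 0.9)   # assumed discrete action set

    def choose_crossover_prob(Q, state, epsilon=0.1):
        """Epsilon-greedy choice of a crossover probability. state indexes a
        coarse descriptor of the population (e.g. a binned diversity measure)."""
        if np.random.rand() < epsilon:
            action = np.random.randint(len(CROSSOVER_PROBS))
        else:
            action = int(np.argmax(Q[state]))
        return action, CROSSOVER_PROBS[action]

    def update_q(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
        """Standard Q-learning update; the reward could be, for example, the
        improvement of the best makespan between generations (an assumed choice)."""
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state])
                                     - Q[state, action])
        return Q
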
44

Keith, Kenneth D. "Peak Shift Phenomenon: A Teaching Activity for Basic Learning Theory." Teaching of Psychology 29, no. 4 (October 2002): 298–300. http://dx.doi.org/10.1207/s15328023top2904_09.

Abstract:
Stimulus discrimination is a standard subject in undergraduate courses presenting basic principles of learning, and a particularly interesting aspect of discrimination is the peak shift phenomenon. Peak shift occurs in generalization tests following intradimensional discrimination training as a displacement of peak responding away from the S+ (a stimulus signaling availability of reinforcement) in a direction opposite the S– (a stimulus signaling lack of reinforcement). This activity allows students to develop intradimensional discriminations that enable firsthand observation of the peak shift phenomenon. Evaluation of the activity suggests that it produces improved understanding of peak shift and that undergraduate students can demonstrate peak shift in simple discrimination tasks.
45

Gomolka, Zbigniew, Ewa Dudek-Dyduch, and Ewa Zeslawska. "Generalization of ALMM Based Learning Method for Planning and Scheduling." Applied Sciences 12, no. 24 (December 12, 2022): 12766. http://dx.doi.org/10.3390/app122412766.

Abstract:
This paper refers to a machine learning method for solving NP-hard discrete optimization problems, especially planning and scheduling. The method utilizes a special multistage decision process modeling paradigm referred to as the Algebraic Logical Metamodel of Multistage Decision Processes (ALMM). Hence, the name of the presented method is the ALMM Based Learning method. This learning method utilizes a specifically built local multicriterion optimization problem that is solved by means of scalarization. This paper describes both the development of such local optimization problems and the concept of the learning process with the fractional derivative mechanism itself. It includes proofs of theorems showing that the ALMM Based Learning method can be defined for a much broader problem class than initially assumed. This significantly extends the range of the prime learning method applications. New generalizations for the prime ALMM Based Learning method, as well as some essential comments on a comparison of Reinforcement Learning with the ALMM Based Learning, are also presented.
46

Li, Bo, Zhigang Gan, Daqing Chen, and Dyachenko Sergey Aleksandrovich. "UAV Maneuvering Target Tracking in Uncertain Environments Based on Deep Reinforcement Learning and Meta-Learning." Remote Sensing 12, no. 22 (November 18, 2020): 3789. http://dx.doi.org/10.3390/rs12223789.

Abstract:
This paper combines deep reinforcement learning (DRL) with meta-learning and proposes a novel approach, named meta twin delayed deep deterministic policy gradient (Meta-TD3), to realize the control of an unmanned aerial vehicle (UAV), allowing a UAV to quickly track a target in an environment where the motion of the target is uncertain. This approach can be applied to a variety of scenarios, such as wildlife protection, emergency aid, and remote sensing. We consider a multi-task experience replay buffer to provide data for the multi-task learning of the DRL algorithm, and we combine meta-learning to develop a multi-task reinforcement learning update method to ensure the generalization capability of reinforcement learning. Compared with the state-of-the-art algorithms, namely the deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3), experimental results show that the Meta-TD3 algorithm has achieved a great improvement in terms of both convergence value and convergence rate. In a UAV target tracking problem, Meta-TD3 requires only a few training steps to enable a UAV to adapt quickly to a new target movement mode and maintain better tracking effectiveness.
47

Murugesan, Keerthiram, Subhajit Chaudhury, and Kartik Talamadupula. "Eye of the Beholder: Improved Relation Generalization for Text-Based Reinforcement Learning Agents." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11094–102. http://dx.doi.org/10.1609/aaai.v36i10.21358.

Abstract:
Text-based games (TBGs) have become a popular proving ground for the demonstration of learning-based agents that make decisions in quasi real-world settings. The crux of the problem for a reinforcement learning agent in such TBGs is identifying the objects in the world, and those objects' relations with that world. While the recent use of text-based resources for increasing an agent's knowledge and improving its generalization have shown promise, we posit in this paper that there is much yet to be learned from visual representations of these same worlds. Specifically, we propose to retrieve images that represent specific instances of text observations from the world and train our agents on such images. This improves the agent's overall understanding of the game scene and objects' relationships to the world around them, and the variety of visual representations on offer allow the agent to generate a better generalization of a relationship. We show that incorporating such images improves the performance of agents in various TBG settings.
48

Li, Shang, Xin Chen, Min Zhang, Qingchen Jin, Yudi Guo, and Shunxiang Xing. "A UAV Coverage Path Planning Algorithm Based on Double Deep Q-Network." Journal of Physics: Conference Series 2216, no. 1 (March 1, 2022): 012017. http://dx.doi.org/10.1088/1742-6596/2216/1/012017.

Abstract:
The UAV path planning method has practical value in the military field and automated production. Based on deep reinforcement learning theory and the characteristics of coverage path planning, this paper designs and implements a set of deep reinforcement learning frameworks suitable for UAV coverage path planning and trains it in the abstract environment model built. The simulation experiment results show that the designed UAV coverage path planning frameworks can consider obstacles, no-fly zones and length constraints, plan a reasonable path to complete the coverage task, and have a certain generalization ability.
49

Kimura, Hajime, Kei Aoki, and Shigenobu Kobayashi. "Reinforcement Learning in Large Scale Systems Using State Generalization and Multi-Agent Techniques." IEEJ Transactions on Industry Applications 123, no. 10 (2003): 1091–96. http://dx.doi.org/10.1541/ieejias.123.1091.

50

Barbahan, Ibraheem, Vladimir Baikalov, Valeriy Vyatkin, and Andrey Filchenkov. "Multi-Agent Deep Reinforcement Learning-Based Algorithm For Fast Generalization On Routing Problems." Procedia Computer Science 193 (2021): 228–38. http://dx.doi.org/10.1016/j.procs.2021.10.023.
