Selection of scientific literature on the topic "Goal-conditioned reinforcement learning"
Cite a source in APA, MLA, Chicago, Harvard, and various other citation styles
Table of contents
Browse the lists of recent articles, books, theses, reports, and other scholarly sources on the topic "Goal-conditioned reinforcement learning".
Next to every entry in the bibliography, an "Add to bibliography" option is available. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scientific publication as a PDF and read its online abstract, provided the relevant parameters are present in the work's metadata.
Journal articles on the topic "Goal-conditioned reinforcement learning"
Yin, Xiangyu, Sihao Wu, Jiaxu Liu, Meng Fang, Xingyu Zhao, Xiaowei Huang, and Wenjie Ruan. "Representation-Based Robustness in Goal-Conditioned Reinforcement Learning". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21761–69. http://dx.doi.org/10.1609/aaai.v38i19.30176.
Levine, Alexander, and Soheil Feizi. "Goal-Conditioned Q-learning as Knowledge Distillation". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8500–8509. http://dx.doi.org/10.1609/aaai.v37i7.26024.
YAMADA, Takaya, and Koich OGAWARA. "Goal-Conditioned Reinforcement Learning with Latent Representations using Contrastive Learning". Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2021 (2021): 1P1-I15. http://dx.doi.org/10.1299/jsmermd.2021.1p1-i15.
Qian, Zhifeng, Mingyu You, Hongjun Zhou, and Bin He. "Weakly Supervised Disentangled Representation for Goal-Conditioned Reinforcement Learning". IEEE Robotics and Automation Letters 7, no. 2 (April 2022): 2202–9. http://dx.doi.org/10.1109/lra.2022.3141148.
TANIGUCHI, Asuto, Fumihiro SASAKI, and Ryota YAMASHINA. "Goal-Conditioned Reinforcement Learning with Extended Floyd-Warshall method". Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2020 (2020): 2A1-L01. http://dx.doi.org/10.1299/jsmermd.2020.2a1-l01.
Elguea-Aguinaco, Íñigo, Antonio Serrano-Muñoz, Dimitrios Chrysostomou, Ibai Inziarte-Hidalgo, Simon Bøgh, and Nestor Arana-Arexolaleiba. "Goal-Conditioned Reinforcement Learning within a Human-Robot Disassembly Environment". Applied Sciences 12, no. 22 (November 15, 2022): 11610. http://dx.doi.org/10.3390/app122211610.
Liu, Bo, Yihao Feng, Qiang Liu, and Peter Stone. "Metric Residual Network for Sample Efficient Goal-Conditioned Reinforcement Learning". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8799–806. http://dx.doi.org/10.1609/aaai.v37i7.26058.
Ding, Hongyu, Yuanze Tang, Qing Wu, Bo Wang, Chunlin Chen, and Zhi Wang. "Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning". IEEE/CAA Journal of Automatica Sinica 10, no. 12 (December 2023): 2233–47. http://dx.doi.org/10.1109/jas.2023.123477.
Xu, Jiawei, Shuxing Li, Rui Yang, Chun Yuan, and Lei Han. "Efficient Multi-Goal Reinforcement Learning via Value Consistency Prioritization". Journal of Artificial Intelligence Research 77 (June 5, 2023): 355–76. http://dx.doi.org/10.1613/jair.1.14398.
Faccio, Francesco, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, and Jürgen Schmidhuber. "Goal-Conditioned Generators of Deep Policies". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 7503–11. http://dx.doi.org/10.1609/aaai.v37i6.25912.
Dissertations on the topic "Goal-conditioned reinforcement learning"
Chenu, Alexandre. "Leveraging sequentiality in Robot Learning: Application of the Divide & Conquer paradigm to Neuro-Evolution and Deep Reinforcement Learning". Electronic thesis, Sorbonne Université, 2023. http://www.theses.fr/2023SORUS342.
"To succeed, planning alone is insufficient. One must improvise as well." This quote from Isaac Asimov, founding father of robotics and author of the Three Laws of Robotics, emphasizes the importance of being able to adapt and think on one's feet to achieve success. Although robots can nowadays solve highly complex tasks, they still lack the crucial adaptability skills needed for deployment on a larger scale. Robot Learning uses learning algorithms to tackle this lack of adaptability and to enable robots to solve complex tasks autonomously. Two types of learning algorithms are particularly suitable for robots to learn controllers autonomously: Deep Reinforcement Learning and Neuro-Evolution. However, both classes of algorithms often cannot solve Hard Exploration Problems, that is, problems with a long horizon and a sparse reward signal, unless they are guided in their learning process. Different approaches can be considered to tackle those problems. One option is to search for a diversity of behaviors rather than a specific one, the idea being that among this diversity, some behaviors will be able to solve the task. We call these algorithms Diversity Search algorithms. A second option consists in guiding the learning process using demonstrations provided by an expert. This is called Learning from Demonstration. However, searching for diverse behaviors or learning from demonstration can be inefficient in some contexts. Indeed, finding diverse behaviors can be tedious if the environment is complex, while learning from demonstration can be very difficult if only one demonstration is available. This thesis attempts to improve the effectiveness of Diversity Search and Learning from Demonstration when applied to Hard Exploration Problems. To do so, we assume that complex robotics behaviors can be decomposed into reaching simpler sub-goals.
Based on this sequential bias, we try to improve the sample efficiency of Diversity Search and Learning from Demonstration algorithms by adopting Divide & Conquer strategies, which are well known for their efficiency when the problem is composable. Throughout the thesis, we propose two main strategies. First, after identifying some limitations of Diversity Search algorithms based on Neuro-Evolution, we propose Novelty Search Skill Chaining. This algorithm combines Diversity Search with Skill-Chaining to efficiently navigate maze environments that are difficult to explore for state-of-the-art Diversity Search. In a second set of contributions, we propose the Divide & Conquer Imitation Learning algorithms. The key intuition behind those methods is to decompose the complex task of learning from a single demonstration into several simpler goal-reaching sub-tasks. DCIL-II, the most advanced variant, can learn walking behaviors for under-actuated humanoid robots with unprecedented efficiency. Beyond underlining the effectiveness of the Divide & Conquer paradigm in Robot Learning, this work also highlights the difficulties that can arise when composing behaviors, even in elementary environments. These difficulties will inevitably have to be addressed before applying these algorithms directly to real robots. Doing so may be necessary for the success of the next generations of robots, as outlined by Asimov.
Book chapters on the topic "Goal-conditioned reinforcement learning"
Steccanella, Lorenzo, and Anders Jonsson. "State Representation Learning for Goal-Conditioned Reinforcement Learning". In Machine Learning and Knowledge Discovery in Databases, 84–99. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26412-2_6.
Zou, Qiming, and Einoshin Suzuki. "Contrastive Goal Grouping for Policy Generalization in Goal-Conditioned Reinforcement Learning". In Neural Information Processing, 240–53. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-92185-9_20.
Der volle Inhalt der QuelleKonferenzberichte zum Thema "Goal-conditioned reinforcement learning"
Liu, Minghuan, Menghui Zhu, and Weinan Zhang. "Goal-Conditioned Reinforcement Learning: Problems and Solutions". In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/770.
Bortkiewicz, Michał, Jakub Łyskawa, Paweł Wawrzyński, Mateusz Ostaszewski, Artur Grudkowski, Bartłomiej Sobieski, and Tomasz Trzciński. "Subgoal Reachability in Goal Conditioned Hierarchical Reinforcement Learning". In 16th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, 2024. http://dx.doi.org/10.5220/0012326200003636.
Yu, Zhe, Kailai Sun, Chenghao Li, Dianyu Zhong, Yiqin Yang, and Qianchuan Zhao. "A Goal-Conditioned Reinforcement Learning Algorithm with Environment Modeling". In 2023 42nd Chinese Control Conference (CCC). IEEE, 2023. http://dx.doi.org/10.23919/ccc58697.2023.10240963.
Zou, Qiming, and Einoshin Suzuki. "Sample-Efficient Goal-Conditioned Reinforcement Learning via Predictive Information Bottleneck for Goal Representation Learning". In 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2023. http://dx.doi.org/10.1109/icra48891.2023.10161213.
Deng, Yuhong, Chongkun Xia, Xueqian Wang, and Lipeng Chen. "Deep Reinforcement Learning Based on Local GNN for Goal-Conditioned Deformable Object Rearranging". In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2022. http://dx.doi.org/10.1109/iros47612.2022.9981669.
Bagaria, Akhil, and Tom Schaul. "Scaling Goal-based Exploration via Pruning Proto-goals". In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/384.
Simmons-Edler, Riley, Ben Eisner, Daniel Yang, Anthony Bisulco, Eric Mitchell, Sebastian Seung, and Daniel Lee. "Reward Prediction Error as an Exploration Objective in Deep RL". In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/390.