Journal articles on the topic 'Goal-conditioned reinforcement learning'

Consult the top 37 journal articles for your research on the topic 'Goal-conditioned reinforcement learning.'

1

Yin, Xiangyu, Sihao Wu, Jiaxu Liu, Meng Fang, Xingyu Zhao, Xiaowei Huang, and Wenjie Ruan. "Representation-Based Robustness in Goal-Conditioned Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21761–69. http://dx.doi.org/10.1609/aaai.v38i19.30176.

Abstract:
While Goal-Conditioned Reinforcement Learning (GCRL) has gained attention, its algorithmic robustness against adversarial perturbations remains unexplored. The attacks and robust representation training methods that are designed for traditional RL become less effective when applied to GCRL. To address this challenge, we first propose the Semi-Contrastive Representation attack, a novel approach inspired by the adversarial contrastive attack. Unlike existing attacks in RL, it only necessitates information from the policy function and can be seamlessly implemented during deployment. Then, to mitigate the vulnerability of existing GCRL algorithms, we introduce Adversarial Representation Tactics, which combines Semi-Contrastive Adversarial Augmentation with Sensitivity-Aware Regularizer to improve the adversarial robustness of the underlying RL agent against various types of perturbations. Extensive experiments validate the superior performance of our attack and defence methods across multiple state-of-the-art GCRL algorithms. Our code is available at https://github.com/TrustAI/ReRoGCRL.
2

Levine, Alexander, and Soheil Feizi. "Goal-Conditioned Q-learning as Knowledge Distillation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8500–8509. http://dx.doi.org/10.1609/aaai.v37i7.26024.

Abstract:
Many applications of reinforcement learning can be formalized as goal-conditioned environments, where, in each episode, there is a "goal" that affects the rewards obtained during that episode but does not affect the dynamics. Various techniques have been proposed to improve performance in goal-conditioned environments, such as automatic curriculum generation and goal relabeling. In this work, we explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation. In particular: the current Q-value function and the target Q-value estimate are both functions of the goal, and we would like to train the Q-value function to match its target for all goals. We therefore apply Gradient-Based Attention Transfer (Zagoruyko and Komodakis 2017), a knowledge distillation technique, to the Q-function update. We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional. We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals, where the agent can attain a reward by achieving any one of a large set of objectives, all specified at test time. Finally, to provide theoretical support, we give examples of classes of environments where (under some assumptions) standard off-policy algorithms such as DDPG require at least O(d^2) replay buffer transitions to learn an optimal policy, while our proposed technique requires only O(d) transitions, where d is the dimensionality of the goal and state space. Code and appendix are available at https://github.com/alevine0/ReenGAGE.
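The definition at the start of this abstract (the goal affects the rewards but not the dynamics) can be illustrated with a minimal sketch. The `ChainEnv` class, its 1-D integer state, and the 0/-1 sparse reward are hypothetical choices made for illustration only; they are not taken from the cited paper.

```python
from dataclasses import dataclass


@dataclass
class ChainEnv:
    """Toy 1-D chain environment (hypothetical, for illustration only)."""
    size: int = 10
    state: int = 0

    def transition(self, state: int, action: int) -> int:
        # Dynamics depend only on (state, action); the goal never appears here.
        return min(self.size - 1, max(0, state + action))

    def reward(self, next_state: int, goal: int) -> float:
        # Sparse goal-conditioned reward: 0 on reaching the goal, -1 otherwise.
        return 0.0 if next_state == goal else -1.0

    def step(self, action: int, goal: int):
        self.state = self.transition(self.state, action)
        return self.state, self.reward(self.state, goal)


env_a, env_b = ChainEnv(), ChainEnv()
print(env_a.step(+1, goal=1))  # (1, 0.0): goal reached
print(env_b.step(+1, goal=5))  # (1, -1.0): same next state, different reward
```

Both calls land in the same next state because the transition ignores the goal; only the reward differs.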
3

Yamada, Takaya, and Koich Ogawara. "Goal-Conditioned Reinforcement Learning with Latent Representations using Contrastive Learning." Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2021 (2021): 1P1-I15. http://dx.doi.org/10.1299/jsmermd.2021.1p1-i15.

4

Qian, Zhifeng, Mingyu You, Hongjun Zhou, and Bin He. "Weakly Supervised Disentangled Representation for Goal-Conditioned Reinforcement Learning." IEEE Robotics and Automation Letters 7, no. 2 (April 2022): 2202–9. http://dx.doi.org/10.1109/lra.2022.3141148.

5

Taniguchi, Asuto, Fumihiro Sasaki, and Ryota Yamashina. "Goal-Conditioned Reinforcement Learning with Extended Floyd-Warshall method." Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2020 (2020): 2A1-L01. http://dx.doi.org/10.1299/jsmermd.2020.2a1-l01.

6

Elguea-Aguinaco, Íñigo, Antonio Serrano-Muñoz, Dimitrios Chrysostomou, Ibai Inziarte-Hidalgo, Simon Bøgh, and Nestor Arana-Arexolaleiba. "Goal-Conditioned Reinforcement Learning within a Human-Robot Disassembly Environment." Applied Sciences 12, no. 22 (November 15, 2022): 11610. http://dx.doi.org/10.3390/app122211610.

Abstract:
The introduction of collaborative robots in industrial environments reinforces the need to provide these robots with better cognition to accomplish their tasks while fostering worker safety without entering into safety shutdowns that reduce workflow and production times. This paper presents a novel strategy that combines the execution of contact-rich tasks, namely disassembly, with real-time collision avoidance through machine learning for safe human-robot interaction. Specifically, a goal-conditioned reinforcement learning approach is proposed, in which the removal direction of a peg, of varying friction, tolerance, and orientation, is subject to the location of a human collaborator with respect to a 7-degree-of-freedom manipulator at each time step. For this purpose, the suitability of three state-of-the-art actor-critic algorithms is evaluated, and results from simulation and real-world experiments are presented. In reality, the policy’s deployment is achieved through a new scalable multi-control framework that allows a direct transfer of the control policy to the robot and reduces response times. The results show the effectiveness, generalization, and transferability of the proposed approach with two collaborative robots against static and dynamic obstacles, leveraging the set of available solutions in non-monotonic tasks to avoid a potential collision with the human worker.
7

Liu, Bo, Yihao Feng, Qiang Liu, and Peter Stone. "Metric Residual Network for Sample Efficient Goal-Conditioned Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8799–806. http://dx.doi.org/10.1609/aaai.v37i7.26058.

Abstract:
Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications, including manipulation and navigation problems in robotics. Especially in such robotics tasks, sample efficiency is of the utmost importance for GCRL since, by default, the agent is only rewarded when it reaches its goal. While several methods have been proposed to improve the sample efficiency of GCRL, one relatively under-studied approach is the design of neural architectures to support sample efficiency. In this work, we introduce a novel neural architecture for GCRL that achieves significantly better sample efficiency than the commonly-used monolithic network architecture. The key insight is that the optimal action-value function must satisfy the triangle inequality in a specific sense. Furthermore, we introduce the metric residual network (MRN) that deliberately decomposes the action-value function into the negated summation of a metric plus a residual asymmetric component. MRN provably approximates any optimal action-value function, thus making it a fitting neural architecture for GCRL. We conduct comprehensive experiments across 12 standard benchmark environments in GCRL. The empirical results demonstrate that MRN uniformly outperforms other state-of-the-art GCRL neural architectures in terms of sample efficiency. The code is available at https://github.com/Cranial-XIX/metric-residual-network.
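The decomposition described in this abstract (the action-value as the negated sum of a metric and an asymmetric residual) can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: the Euclidean form of the metric part, the max-of-positive-gaps form of the asymmetric part, and the random linear maps standing in for learned networks are all assumptions, and `x`, `y` are hypothetical stand-ins for embeddings of the state-action pair and the goal.

```python
import math
import random


def linear(weights, x):
    """Apply a dense linear map given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def metric_part(u, v):
    # Symmetric component (assumed form): Euclidean distance between embeddings.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def asymmetric_part(u, v):
    # Asymmetric residual (assumed form): largest positive coordinate-wise gap.
    return max(max(a - b, 0.0) for a, b in zip(u, v))


def distance(x, y, w_sym, w_asym):
    """Metric plus asymmetric residual; both terms obey the triangle inequality."""
    return (metric_part(linear(w_sym, x), linear(w_sym, y))
            + asymmetric_part(linear(w_asym, x), linear(w_asym, y)))


def q_value(x, y, w_sym, w_asym):
    # The action-value is the negated sum of the two components.
    return -distance(x, y, w_sym, w_asym)


rng = random.Random(0)
w_sym = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
w_asym = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
x, y, z = ([rng.gauss(0, 1) for _ in range(3)] for _ in range(3))

# Triangle inequality check on one random triple.
lhs = distance(x, z, w_sym, w_asym)
rhs = distance(x, y, w_sym, w_asym) + distance(y, z, w_sym, w_asym)
print(lhs <= rhs + 1e-9)
```

Because each component satisfies the triangle inequality on its own, their sum does too, which is the structural property the abstract highlights.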
8

Ding, Hongyu, Yuanze Tang, Qing Wu, Bo Wang, Chunlin Chen, and Zhi Wang. "Magnetic Field-Based Reward Shaping for Goal-Conditioned Reinforcement Learning." IEEE/CAA Journal of Automatica Sinica 10, no. 12 (December 2023): 2233–47. http://dx.doi.org/10.1109/jas.2023.123477.

9

Xu, Jiawei, Shuxing Li, Rui Yang, Chun Yuan, and Lei Han. "Efficient Multi-Goal Reinforcement Learning via Value Consistency Prioritization." Journal of Artificial Intelligence Research 77 (June 5, 2023): 355–76. http://dx.doi.org/10.1613/jair.1.14398.

Abstract:
Goal-conditioned reinforcement learning (RL) with sparse rewards remains a challenging problem in deep RL. Hindsight Experience Replay (HER) has been demonstrated to be an effective solution, where HER replaces desired goals in failed experiences with practically achieved states. Existing approaches mainly focus on either exploration or exploitation to improve the performance of HER. From a joint perspective, exploiting specific past experiences can also implicitly drive exploration. Therefore, we concentrate on prioritizing both original and relabeled samples for efficient goal-conditioned RL. To achieve this, we propose a novel value consistency prioritization (VCP) method, where the priority of samples is determined by the consistency of ensemble Q-values. This distinguishes the VCP method from most existing prioritization approaches, which prioritize samples based on the uncertainty of ensemble Q-values. Through extensive experiments, we demonstrate that VCP achieves significantly higher sample efficiency than existing algorithms on a range of challenging goal-conditioned manipulation tasks. We also visualize how VCP prioritizes good experiences to enhance policy learning.
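The HER relabeling step mentioned in this abstract (replacing the desired goal of a failed episode with a state that was actually achieved) can be sketched as below. This shows only plain hindsight relabeling with the final achieved state, not the VCP prioritization the paper proposes; the `Transition` tuple, the integer states, and the 0/-1 sparse reward are illustrative assumptions.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def sparse_reward(achieved: int, goal: int) -> float:
    # Assumed convention: 0 when the goal is reached, -1 otherwise.
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Replace the desired goal with the state actually reached at the end."""
    new_goal = episode[-1].next_state
    return [
        t._replace(goal=new_goal, reward=sparse_reward(t.next_state, new_goal))
        for t in episode
    ]


# A failed episode: the agent wanted state 5 but only reached state 2.
failed = [
    Transition(0, 1, 1, goal=5, reward=-1.0),
    Transition(1, 1, 2, goal=5, reward=-1.0),
]
relabeled = relabel_with_final_state(failed)
print(relabeled[-1])  # final transition now succeeds with respect to goal 2
```

After relabeling, the last transition carries a success reward, so the otherwise uninformative episode yields a learning signal.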
10

Faccio, Francesco, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, and Jürgen Schmidhuber. "Goal-Conditioned Generators of Deep Policies." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 7503–11. http://dx.doi.org/10.1609/aaai.v37i6.25912.

Abstract:
Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return," our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies. A form of weight-sharing HyperNetworks and policy embeddings scales our method to generate deep NNs. Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks where it exhibits competitive performance. Our code is public.
11

Liu, Jinxin, Donglin Wang, Qiangxing Tian, and Zhengyu Chen. "Learn Goal-Conditioned Policy with Intrinsic Motivation for Deep Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7558–66. http://dx.doi.org/10.1609/aaai.v36i7.20721.

Abstract:
It is of significance for an agent to autonomously explore the environment and learn a widely applicable and general-purpose goal-conditioned policy that can achieve diverse goals including images and text descriptions. Considering such perceptually-specific goals, one natural approach is to reward the agent with a prior non-parametric distance over the embedding spaces of states and goals. However, this may be infeasible in some situations, either because it is unclear how to choose a suitable measurement, or because embedding (heterogeneous) goals and states is non-trivial. The key insight of this work is that we introduce a latent-conditioned policy to provide goals and intrinsic rewards for learning the goal-conditioned policy. As opposed to directly scoring current states with regard to goals, we obtain rewards by scoring current states with associated latent variables. We theoretically characterize the connection between our unsupervised objective and the multi-goal setting, and empirically demonstrate the effectiveness of our proposed method, which substantially outperforms prior techniques in a variety of tasks.
12

Jang, Seongwon, Hyemi Jeong, and Hyunseok Yang. "MURM: Utilization of Multi-Views for Goal-Conditioned Reinforcement Learning in Robotic Manipulation." Robotics 12, no. 4 (August 19, 2023): 119. http://dx.doi.org/10.3390/robotics12040119.

Abstract:
We present a novel framework, multi-view unified reinforcement learning for robotic manipulation (MURM), which efficiently utilizes multiple camera views to train a goal-conditioned policy for a robot to perform complex tasks. The MURM framework consists of three main phases: (i) demo collection from an expert, (ii) representation learning, and (iii) offline reinforcement learning. In the demo collection phase, we design a scripted expert policy that uses privileged information, such as Cartesian coordinates of a target and goal, to solve the tasks. We add noise to the expert policy to provide sufficient interactive information about the environment, as well as suboptimal behavioral trajectories. We designed three tasks in a Pybullet simulation environment, including placing an object in a desired goal position and picking up various objects that are randomly positioned in the environment. In the representation learning phase, we use a vector-quantized variational autoencoder (VQVAE) to learn a more structured latent representation that makes it feasible to train for RL compared to high-dimensional raw images. We train VQVAE models for each distinct camera view and define the best viewpoint settings for training. In the offline reinforcement learning phase, we use the Implicit Q-learning (IQL) algorithm as our baseline and introduce a separated Q-functions method and dropout method that can be implemented in multi-view settings to train the goal-conditioned policy with supervised goal images. We conduct experiments in simulation and show that the single-view baseline fails to solve complex tasks, whereas MURM is successful.
13

Colas, Cédric, Tristan Karch, Olivier Sigaud, and Pierre-Yves Oudeyer. "Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: A Short Survey." Journal of Artificial Intelligence Research 74 (July 9, 2022): 1159–99. http://dx.doi.org/10.1613/jair.1.13554.

Abstract:
Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autotelic agents: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has been leading to the emergence of a new field: developmental reinforcement learning. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem— the intrinsically motivated acquisition of open-ended repertoires of skills. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skills acquisition problem. It proceeds to present a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We finally close the paper by discussing some open challenges in the quest of intrinsically motivated skills acquisition.
14

Ma, Haozhe, Thanh Vinh Vo, and Tze-Yun Leong. "Human-AI Collaborative Sub-Goal Optimization in Hierarchical Reinforcement Learning." Proceedings of the AAAI Symposium Series 1, no. 1 (October 3, 2023): 86–89. http://dx.doi.org/10.1609/aaaiss.v1i1.27481.

Abstract:
Hierarchical reinforcement learning often involves human expertise in defining multiple sub-goals to decompose complex objectives into relevant sub-tasks. However, manually specifying these sub-goals is labor-intensive, costly, and prone to introducing biases or misleading the agent. To overcome these challenges, we propose a collaborative human-AI algorithm that seamlessly integrates with hierarchical models to automatically update prior knowledge and optimize candidate sub-goals. Our algorithm can be easily incorporated into a wide range of goal-conditioned frameworks. We evaluate our approach against relevant baselines and demonstrate the effectiveness of our algorithm in addressing and preventing negative inferences arising from confusing or conflicting sub-goals. Additionally, our algorithm shows robustness across different levels of human knowledge, accelerating convergence towards optimal sub-goal spaces and hierarchical policies.
15

Zhou, Li, and Kevin Small. "Inverse Reinforcement Learning with Natural Language Goals." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 11116–24. http://dx.doi.org/10.1609/aaai.v35i12.17326.

Abstract:
Humans generally use natural language to communicate task requirements to each other. Ideally, natural language should also be usable for communicating goals to autonomous machines (e.g., robots) to minimize friction in task specification. However, understanding and mapping natural language goals to sequences of states and actions is challenging. Specifically, existing work along these lines has encountered difficulty in generalizing learned policies to new natural language goals and environments. In this paper, we propose a novel adversarial inverse reinforcement learning algorithm to learn a language-conditioned policy and reward function. To improve generalization of the learned policy and reward function, we use a variational goal generator to relabel trajectories and sample diverse goals during training. Our algorithm outperforms multiple baselines by a large margin on a vision-based natural language instruction following dataset (Room-2-Room), demonstrating a promising advance in enabling the use of natural language instructions in specifying agent goals.
16

Lu, Yuxiao, Arunesh Sinha, and Pradeep Varakantham. "Handling Long and Richly Constrained Tasks through Constrained Hierarchical Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21368–77. http://dx.doi.org/10.1609/aaai.v38i19.30132.

Abstract:
Safety in goal-directed Reinforcement Learning (RL) settings has typically been handled through constraints over trajectories and has demonstrated good performance in primarily short-horizon tasks. In this paper, we are specifically interested in the problem of solving temporally extended decision-making problems, such as robots cleaning different areas in a house while avoiding slippery and unsafe areas (e.g., stairs) and retaining enough charge to move to a charging dock, in the presence of complex safety constraints. Our key contribution is a (safety) Constrained Search with Hierarchical Reinforcement Learning (CoSHRL) mechanism that combines an upper-level constrained search agent (which computes a reward-maximizing policy from a given start to a far-away goal state while satisfying cost constraints) with a low-level goal-conditioned RL agent (which estimates cost and reward values to move between nearby states). A major advantage of CoSHRL is that it can handle constraints on the cost value distribution (e.g., on Conditional Value at Risk, CVaR) and can adjust to flexible constraint thresholds without retraining. We perform extensive experiments with different types of safety constraints to demonstrate the utility of our approach over leading approaches in constrained and hierarchical RL.
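To make the CVaR constraint named in this abstract concrete, here is a minimal sketch of an empirical CVaR check over sampled costs. It is not the CoSHRL mechanism itself; the function names, the sample-based estimator, and the example numbers are assumptions for illustration.

```python
import math
from typing import Sequence


def empirical_cvar(costs: Sequence[float], alpha: float) -> float:
    """Mean of the worst (1 - alpha) fraction of sampled costs."""
    if not costs or not 0.0 <= alpha < 1.0:
        raise ValueError("need at least one sample and 0 <= alpha < 1")
    ordered = sorted(costs, reverse=True)  # worst (highest) costs first
    # The small epsilon guards against floating-point overshoot in the tail size.
    k = max(1, math.ceil((1.0 - alpha) * len(ordered) - 1e-9))
    return sum(ordered[:k]) / k


def satisfies_constraint(costs: Sequence[float], alpha: float,
                         threshold: float) -> bool:
    # The threshold is a plain argument, so it can be changed at query time
    # without recomputing the cost samples.
    return empirical_cvar(costs, alpha) <= threshold


samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(empirical_cvar(samples, 0.8))             # 9.5: mean of the two worst costs
print(satisfies_constraint(samples, 0.8, 9.0))  # False
print(satisfies_constraint(samples, 0.8, 10.0))  # True
```

Unlike a constraint on the expected cost (5.5 here), the CVaR constraint looks only at the tail of the cost distribution, which is why it is a common choice for risk-sensitive safety requirements.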
17

Kim, Sungyoon, Yunseon Choi, Daiki E. Matsunaga, and Kee-Eung Kim. "Stitching Sub-trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 12 (March 24, 2024): 13160–67. http://dx.doi.org/10.1609/aaai.v38i12.29215.

Abstract:
Offline Goal-Conditioned Reinforcement Learning (Offline GCRL) is an important problem in RL that focuses on acquiring diverse goal-oriented skills solely from pre-collected behavior datasets. In this setting, the reward feedback is typically absent except when the goal is achieved, which makes it difficult to learn policies especially from a finite dataset of suboptimal behaviors. In addition, realistic scenarios involve long-horizon planning, which necessitates the extraction of useful skills within sub-trajectories. Recently, the conditional diffusion model has been shown to be a promising approach to generate high-quality long-horizon plans for RL. However, their practicality for the goal-conditioned setting is still limited due to a number of technical assumptions made by the methods. In this paper, we propose SSD (Sub-trajectory Stitching with Diffusion), a model-based offline GCRL method that leverages the conditional diffusion model to address these limitations. In summary, we use the diffusion model that generates future plans conditioned on the target goal and value, with the target value estimated from the goal-relabeled offline dataset. We report state-of-the-art performance in the standard benchmark set of GCRL tasks, and demonstrate the capability to successfully stitch the segments of suboptimal trajectories in the offline data to generate high-quality plans.
18

Grossberg, Stephen, and John W. L. Merrill. "The Hippocampus and Cerebellum in Adaptively Timed Learning, Recognition, and Movement." Journal of Cognitive Neuroscience 8, no. 3 (July 1996): 257–77. http://dx.doi.org/10.1162/jocn.1996.8.3.257.

Abstract:
The concepts of declarative memory and procedural memory have been used to distinguish two basic types of learning. A neural network model suggests how such memory processes work together as recognition learning, reinforcement learning, and sensorimotor learning take place during adaptive behaviors. To coordinate these processes, the hippocampal formation and cerebellum each contains circuits that learn to adaptively time their outputs. Within the model, hippocampal timing helps to maintain attention on motivationally salient goal objects during variable task-related delays, and cerebellar timing controls the release of conditioned responses. This property is part of the model's description of how cognitive-emotional interactions focus attention on motivationally valued cues, and how this process breaks down due to hippocampal ablation. The model suggests that the hippocampal mechanisms that help to rapidly draw attention to salient cues could prematurely release motor commands were not the release of these commands adaptively timed by the cerebellum. The model hippocampal system modulates cortical recognition learning without actually encoding the representational information that the cortex encodes. These properties avoid the difficulties faced by several models that propose a direct hippocampal role in recognition learning. Learning within the model hippocampal system controls adaptive timing and spatial orientation. Model properties hereby clarify how hippocampal ablations cause amnesic symptoms and difficulties with tasks which combine task delays, novelty detection, and attention toward goal objects amid distractions. When these model recognition, reinforcement, sensorimotor, and timing processes work together, they suggest how the brain can accomplish conditioning of multiple sensory events to delayed rewards, as during serial compound conditioning.
19

Anam, Mamoona, Kantilal P. Rane, Ali Alenezi, Ruby Mishra, Swaminathan Ramamurthy, and Ferdin Joe John Joseph. "Content Classification Tasks with Data Preprocessing Manifestations." Webology 19, no. 1 (January 20, 2022): 1413–30. http://dx.doi.org/10.14704/web/v19i1/web19094.

Abstract:
Deep reinforcement learning has a major hurdle in terms of data efficiency. We solve this challenge by pretraining an encoder with unlabeled input, which is subsequently finetuned on a tiny quantity of task-specific input. We use a mixture of latent dynamics modelling and unsupervised goal-conditioned RL to encourage learning representations that capture various elements of the underlying MDP. Our approach significantly outperforms previous work combining offline representation pretraining with task-specific finetuning when limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience) and compares favourably with other pretraining methods that require orders of magnitude more data. When paired with larger models and more diverse, task-aligned observational data, our methodology shows great promise, nearing human-level performance and data efficiency on Atari in the best-case scenario.
20

Mutti, Mirco, and Marcello Restelli. "An Intrinsically-Motivated Approach for Learning Highly Exploring and Fast Mixing Policies." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 5232–39. http://dx.doi.org/10.1609/aaai.v34i04.5968.

Abstract:
What is a good exploration strategy for an agent that interacts with an environment in the absence of external rewards? Ideally, we would like to get a policy driving towards a uniform state-action visitation (highly exploring) in a minimum number of steps (fast mixing), in order to ease efficient learning of any goal-conditioned policy later on. Unfortunately, it is remarkably arduous to directly learn an optimal policy of this nature. In this paper, we propose a novel surrogate objective for learning highly exploring and fast mixing policies, which focuses on maximizing a lower bound to the entropy of the steady-state distribution induced by the policy. In particular, we introduce three novel lower bounds that lead to as many optimization problems, which trade off theoretical guarantees against computational complexity. Then, we present a model-based reinforcement learning algorithm, IDE3AL, to learn an optimal policy according to the introduced objective. Finally, we provide an empirical evaluation of this algorithm on a set of hard-exploration tasks.
21

Feng, Xiaoyun. "Consistent Experience Replay in High-Dimensional Continuous Control with Decayed Hindsights." Machines 10, no. 10 (September 26, 2022): 856. http://dx.doi.org/10.3390/machines10100856.

Abstract:
The manipulation of complex robotics, which is in general high-dimensional continuous control without an accurate dynamic model, summons studies and applications of reinforcement learning (RL) algorithms. Typically, RL learns with the objective of maximizing the accumulated rewards from interactions with the environment. In reality, external rewards are not trivial, which depend on either expert knowledge or domain priors. Recent advances on hindsight experience replay (HER) instead enable a robot to learn from the automatically generated sparse and binary rewards, indicating whether it reaches the desired goals or pseudo goals. However, HER inevitably introduces hindsight bias that skews the optimal control since the replays against the achieved pseudo goals may often differ from the exploration of the desired goals. To tackle the problem, we analyze the skewed objective and induce the decayed hindsight (DH), which enables consistent multi-goal experience replay via countering the bias between exploration and hindsight replay. We implement DH for goal-conditioned RL both in online and offline settings. Experiments on online robotic control tasks demonstrate that DH achieves the best average performance and is competitive with state-of-the-art replay strategies. Experiments on offline robotic control tasks show that DH substantially improves the ability to extract near-optimal policies from offline datasets.
22

Cichosz, Mariusz. "Individual, family and environment as the subject of research in social pedagogy – development and transformations." Papers of Social Pedagogy 7, no. 2 (January 28, 2018): 6–18. http://dx.doi.org/10.5604/01.3001.0010.8133.

Abstract:
The cognitive specificity of social pedagogy is its interest in the issues related to social conditionings of human development and, respectively, the specific social conditionings of the upbringing process. The notion has been developed in various directions since the very beginning of the discipline, yet the most clearly visible area seems to be the functioning of individuals, families and the broader environment. Simultaneously, it is possible to observe that the issues have been entangled in certain socio-political conditions, the knowledge of which is substantial for the reconstruction and identification of the research heritage of social pedagogy. All these interrelationships make it possible to distinguish particular stages of development of social pedagogy. Contemporarily, it is a discipline with decent scientific achievements which marks out and indicates new perspectives both in the field of educational practice and the theory of social activity. Social pedagogy, similarly to other areas (subdisciplines) of pedagogy, deals with the notion of upbringing in a certain aspect – in a certain problem inclination. It specializes in social and environmental conditionings of the upbringing process. It is the thread of the social context of upbringing that proves to be the crucial, basic and fundamental determinant of upbringing and, thus, a decisive factor for human development. This notion was always present in the general pedagogical thought; however, its organized and rationalized character surfaced only when social pedagogy was distinguished as a separate, systematic area of pedagogy. It occurred in Poland only at the beginning of the 19th century. From the very beginning the creators and precursors of this subdiscipline pointed out its relatively wide range. It has been the notion of individual – social conditionings of human development, yet social pedagogists were interested in humans at every stage of their lives, i.e. childhood, adolescence, adulthood and old age.
Another area of interest were the issues related to family as the most important “place” of human development and, in this respect, the issues connected with institutions undertaking various activities: help, care, support and animation. Finally, the scope of interest included issues related to the environment as the place where the upbringing process is supposed to take place and does take place. Since the very beginning of social pedagogy these have been the prominent threads for exploration. At the same time it ought to be stated that these threads have always been interwoven with various social-political conditions both with regards to their interpretation as well as possible and planned educational practice. Therefore social pedagogy and its findings must be always “read” in the context of social-political conditions which accompanied the creation of a given thought or realization of some educational practice. As these conditions have constantly been undergoing certain transformations one may clearly distinguish particular stages of development of social pedagogy. The stages reflect various approaches to exploring and describing the above-mentioned areas of this discipline. Following the assumptions regarding the chronology of social pedagogy development and the three distinguished stages of development, it seems worthwhile to study how the issues related to an individual, family and environment were shaped at these stages. The first stage, when social pedagogy was emerging, was mainly the time of Helena Radlińska’s activities as well as those of less popular and already forgotten Polish pedagogists – precursors of this discipline such as: Anna Chmielewska, Irena Jurgielewiczowa, Zofia Gulińska or Maria Korytowska. In that period social pedagogists mainly dealt with individuals, families and the functioning of environments in the context of educational activities aimed at arousing national identity and consciousness.
However, their work did not focus on indicating the layers of threats and deficits in functioning of individuals, social groups or families but on the possibilities to stimulate their development and cultural life. Therefore social pedagogy of those times was not as strongly related to social work as it currently is but dealt mainly with educational work. The classic example of such an approach in the research carried out in the social pedagogy of that time may be the early works by Helena Radlińska who undertook the narrow field of cultural-educational work targeted at all categories of people. The works described such issues as the organization of libraries, organizing extra-school education (H. Orsza, 1922, H. Orsza-Radlińska, 1925). It ought to be stated that this kind of work was regarded as public and educational work, whereas currently it exists under the name of social work. Frequently quoted works related to the issues of arising social pedagogy were also the works by Eustachy Nowicki e.g. “Extra-school education and its social-educational role in the contemporary Polish life” from 1923 or the works by Stefania Sempołowska, Jerzy Grodecki or Jadwiga Dziubińska. Such an approach and tendencies are clearly visible in a book from 1913 (a book which has been regarded by some pedagogists as the first synthetic presentation of social pedagogy). It is a group work entitled “Educational work – its tasks, methods and organization” (T. Bobrowski, Z. Daszyńska-Golińska, J. Dziubińska, Z. Gargasa, M. Heilperna, Z. Kruszewska, L. Krzywicki, M. Orsetti, H. Orsza, St. Posner, M. Stępkowski, T. Szydłowski, Wł. Weychert-Szymanowska, 1913). The problem of indicated and undertaken research areas and hence, the topics of works realized by the social pedagogists of that time changed immediately after regaining independence and before World War II.
It was the time when the area of social pedagogists’ interests started to include the issues of social inequality, poverty and, subsequently, the possibility of helping (with regards to the practical character of social pedagogy). The research works undertaken by social pedagogists were clearly of diagnostic, practical and praxeological character. They were aimed at seeking the causes of these phenomena with simultaneous identification and exploration of certain environmental factors as their sources. A classic example of such a paper – created before the war – under the editorial management of H. Radlińska was the work entitled “Social causes of school successes and failures” from 1937 (H. Radlińska, 1937). Well known are also the pre-war works written by the students of H. Radlińska which revealed diagnostic character such as: “The harm of a child” by Maria Korytowska (1937) or “A child of Polish countryside” edited by M. Librachowa and published in Warsaw in 1934 (M. Librachowa, 1934). Also noteworthy are the works by Czesław Wroczyński from 1935 entitled “Care of an unmarried mother and struggle against abandoning infants in Warsaw” or the research papers by E. Hryniewicz, J. Ryngmanowa and J. Czarnecka which touched upon the problem of neglected urban and rural families and the situation of an urban and rural child – frequently an orphaned child. As it may be inferred, the issues of poverty, inefficient families, single-parent families remained current and valid also after World War II. These phenomena were nothing but an outcome of various war events and became the main point of interest for researchers. Example works created in the circle of social pedagogists and dealing with these issues may be two books written in the closest scientific environment of Helena Radlińska – with her immense editorial impact. They are “Orphanage – scope and compensation” (H. Radlińska, J. Wojtyniak, 1964) and “Foster families in Łódź” (A.
Majewska, 1948), both published immediately after the war. Following the chronological approach I adopted, the next years mark the beginning of a relative stagnation in the research undertaken in the field of social pedagogy. Especially the 50’s – the years of notably strong political indoctrination and the Marxist ideological offensive which involved building the so called socialist educational society – by definition free from socio-educational problems in public life. The creation and conduction of research in this period was also hindered due to organizational and institutional reasons. The effect of the mentioned policy was also the liquidation of the majority of social sciences including research facilities – institutes, departments and units. An interesting and characteristic description of the situation may be the statement given by Professor J. Auletner who described the period from the perspective of development of social policy and said that: “During the Stalinist years scientific cultivation of social policy was factually forbidden”. During the period of real socialism it becomes truly difficult to explore the science of social policy. The name became mainly the synonym of the current activity of the state and a manifestation of struggles aimed at maintaining the existing status quo. The state authorities clearly wanted to subdue the science of social activities of the state […]. During the real socialism neither the freedom for scientific criticism of the reality nor the freedom of research in the field of social sciences existed. It was impossible (yet deliberated) to carry out a review of poverty and other drastic social issues” (J. Auletner, 2000). 
The situation changes at the beginning of the 60’s (which marks the second stage of development of social pedagogy) when certain socio-political transformations – on the one hand abandoning the limitation of the Stalinist period (1953 – the death of Stalin and political thaw), on the other – reinforcement of the idea of socialist education in social sciences lead to resuming environmental research. It was simultaneously the period of revival of Polish social pedagogy with regards to its institutional dimension as well as its ideological self-determination (M. Cichosz, 2006, 2014). The issues of individuals, families and environments was at that time explored with regards to the functioning of educational environments and in the context of exploring the environmental conditionings of the upbringing process. Typical examples here may be the research by Helena Izdebska entitled “The functioning of a family and childcare tasks” (H. Izdebska, 1967) and “The causes of conflicts in a family” (H. Izdebska, 1975) or research conducted by Anna Przecławska on adolescents and their participation in culture: “Book, youth and cultural transformations” (A. Przecławska, 1967) or e.g. “Cultural diversity of adolescents against upbringing problems” (A. Przecławska, 1976). A very frequent notion undertaken at that time and remaining within the scope of the indicated areas were the issues connected with organization and use of free time. This may be observed through research by T. Wujek: “Homework and active leisure of a student” (T. Wujek, 1969). Another frequently explored area was the problem of looking after children mainly in the papers by Albin Kelm or Marian Balcerek. 
It is worthwhile that the research on individuals, families or environments were carried out as part of the current pedagogical concepts of that time like: parallel education, permanent education, lifelong learning or the education of adults, whereas, the places indicated as the areas of human social functioning in which the environmental education took place were: family, school, housing estate, workplace, social associations. It may be inferred that from a certain (ideological) perspective at that time we witnessed a kind of modeling of social reality as, on the one hand particular areas were diagnosed, on the other – a desired (expected) model was built (designed) (with respect to the pragmatic function of practical pedagogy). A group work entitled “Upbringing and environment” edited by B. Passini and T. Pilch (B. Passini, T. Pilch, 1979) published in 1979 was a perfect illustration of these research areas. It ought to be stated that in those years a certain model of social diagnosis proper for undertaken social-pedagogical research was reinforced (M. Deptuła, 2005). Example paper could be the work by I. Lepalczyk and J. Badura entitled: “Elements of pedagogical diagnostics” (I. Lepalczyk, J. Badura, 1987). Finally, the social turning point in the 80’s and 90’s brought new approaches to the research on individuals, families and environments which may be considered as the beginning of the third stage of the development of social pedagogy. Breaking off the idea of socialist education meant abandoning the specific approach to research on the educational environment previously carried out within a holistic system of socio-educational influences (A. Przecławska, w. Theiss, 1995). The issues which dominated in the 90’s and still dominate in social pedagogy with regards to the functioning of individuals, families and local environments have been the issues connected with social welfare and security as well as education of adults. 
Research papers related to such approach may be the work by Józefa Brągiel: “Upbringing in a single-parent family” from 1990; the work edited by Zofia Brańka “The subjects of care and upbringing” from 2002 or a previous paper written in 1998 by the same author in collaboration with Mirosław Szymański “Aggression and violence in modern world” published in 1999 as well as the work by Danuta Marzec “Childcare at the time of social transformations” from 1999 or numerous works by St. Kawula, A. Janke. Also a growing interest in social welfare and social work is visible in the papers by J. Brągiel and P. Sikora “Social work, multiplicity of perspectives, family – multiculturalism – education” from 2004, E. Kanwicz and A. Olubiński: “Social activity in social welfare at the threshold of 21st century” from 2004 or numerous works on this topic created by the circles gathered around the Social Pedagogy Faculty in Łódź under the management of E. Marynowicz-Hetka. Current researchers also undertake the issues related to childhood (B. Smolińska-Theiss, 2014, B. Matyjas, 2014) and the conditionings of the lives of seniors (A. Baranowska, E. Kościńska, 2013). 
Ultimately, among the presented, yet not exclusive, research areas related to particular activities undertaken in human life environment (individuals, families) and fulfilled within the field of caregiving, social welfare, adult education, socio-cultural animation or health education one may distinguish the following notions: the functioning of extra-school education institutions, most frequently caregiving or providing help such as: orphanage, residential home, dormitory, community centre but also facilities aimed at animating culture like youth cultural centres, cultural centres, clubs etc.; the functioning of school, the realization of its functions (especially educational care), fulfilling and conditioning roles of student/teacher, the functioning of peer groups, collaboration with other institutions; the functioning (social conditionings) of family including various forms of families e.g. full families, single-parent families, separated families, families at risk (unemployment) and their functioning in the context of other institutions e.g. school; social pathologies, the issues of violence and aggression, youth subcultures; participation in culture, leisure time, the role of media; the functioning of the seniors – animation of activities in this field; various dimensions of social welfare, support, providing help, the conditionings of functioning of such jobs as the social welfare worker, culture animator, voluntary work. It might be concluded that the issues connected with individuals, families and environment have been the centre of interest of social pedagogy since the very beginning of this discipline. These were the planes on which social pedagogists most often identified and described social life – from the perspective of human participation.
On the course of describing the lives of individuals, families and broader educational environments social pedagogists figured out and elaborated on particular methods and ways of diagnosing social life. Is it possible to determine any regularities or tendencies in this respect? Unquestionably, at the initial stage of existence of this discipline, aimed at stimulating national consciousness and subsequent popularization of cultural achievements through certain activities – social and educational work, social pedagogists built certain models of these undertakings which were focused on stimulating particular social activity and conscious participation in social life. The issues concerning social diagnosis, though not as significant as during other stages, served these purposes and hence were, to a certain extent, ideologically engaged. The situation changed significantly before and shortly after the World War II. Facing particular conditions of social life – increase in many unfavourable phenomena, social pedagogists attempted to diagnose and describe them. It seems to have been the period of clear shaping and consolidation of the accepted model of empirical research in this respect. The model was widely accepted as dominating and has been developed in Polish social pedagogy during the second and subsequent stages of developing of this discipline. Practical and praxeological character of social pedagogy became the main direction of this development. Consequently, social diagnosis realized and undertaken with regard to social pedagogy was associated with the idea of a holistic system of education and extra-school educational influences and related educational environments. Therefore, the more and more clearly emphasized goal of environmental research – forecasting, was associated with the idea of building holistic, uniform educational impacts. After the systemic transformation which occurred in Poland in the 90’s, i.e. 
the third stage of social pedagogy development, abandoning the previous ideological solutions, environmental research including diagnosis was reassociated with social life problems mainly regarding social welfare and security. Individuals, families and environment have been and still seem to be the subject of research in the field of social pedagogy in Poland. These research areas are structurally bound with its acquired paradigm – of a science describing transformations of social life and formulating a directive of practical conduct regarding these transformations. A question arises about the development of social pedagogy as the one which charts the direction of transformations of practices within the undertaken research areas. If it may be considered as such, then it would be worthwhile to enquire about the directions of the accepted theoretical acknowledgments. On the one hand we may observe a relatively long tradition of specifically elaborated and developed concepts, on the other – there are still new challenges ahead. Observing the previous and current development of Polish social pedagogy it may be inferred that its achievements are not overextensive with regards to the described and acquired theoretical deliberations. Nevertheless, from the very beginning, it has generated certain, specific theoretical solutions attempting to describe and explain particular areas of social reality. Especially noteworthy is the first period of the existence of this discipline, the period of such social pedagogists as, among others, J.W. Dawid, A. Szycówna, I. Moszczeńska or Helena Radlińska. The variety of the reflections with typically philosophical background undertaken in their works (e.g. E. Abramowski) is stunning. Equally involving is the second stage of development of social pedagogy i.e. shortly after World War II, when Polish social pedagogy did not fully break with the heritage of previous philosophical reflections (A. Kamiński, R.
Wroczyński) yet was developed in the Marxist current. A question arises whether the area of education and the projects of its functioning of that time were also specific with regards to theory (it seems to be the problem of the whole Socialist pedagogy realised in Poland at that time). The following years of development of this discipline, especially at the turn of the 80’s and 90’s, were a period of various social ideas existing in social pedagogy – the influences of various concepts and theories in this field. The extent to which they were creatively adapted and included in the current of specific interpretations still requires detailed analysis, yet remains clearly visible. Another important area is the field of confronting the theories with the existing and undertaken solutions in the world pedagogy. A. Radziewicz-Winnicki refers to the views of the representatives of European and world social thought: P. Bourdieu, U. Beck, J. Baudrillard, Z. Bauman and M. Foucault, and tries to identify possible connections and relationships between these ideas and social pedagogy: “the ideas undertaken by the mentioned sociologists undoubtedly account for a significant source of inspiration for practical reflection within social pedagogy. Therefore, it is worthwhile to suggest certain propositions of their application in the field of the mentioned subdiscipline of pedagogy” (Radziewicz-Winnicki 2008). The contemporary social pedagogy in Poland constantly faces numerous challenges. W. Theiss analysed the contemporary social pedagogy with regards to its deficiencies but also the challenges imposed by globalisation and wrote: “Modern social pedagogy focuses mainly on the narrow empirical research and narrow practical activity and neglects research in the field of theory functioning separately from the realms of the global (or globalising) world or pays insufficient attention to these problems.
It leads to a certain self-marginalisation of our discipline which leaves us beyond the current of main socio-educational problems of modern times. In this respect, it seems worthwhile and necessary to carry out intensive conceptual and research work focused on e.g. the following issues: metatheory of social pedagogy and its relationship with modern trends in social sciences; the concepts of human and the world, the concepts of the hierarchy of values; the theory of upbringing, the theory of socialization, the theory of educational environment; a conceptual key of the modern reality; new terms and new meanings of classical concepts; socio-educational activities with direct and indirect macro range e.g. balanced development and its programmes, global school, intercultural education, inclusive education, professional education of emigrants”. Considering the currently undertaken research in this field and the accepted theoretical perspectives it is possible to indicate specific and elaborated concepts. They fluctuate around structural spheres of social pedagogy on the axis: human – environment – environmental transformations. It accounts for an ontological sphere of the acknowledged concepts and theories. Below, I am enumerating the concepts which are most commonly discussed in social pedagogy with regards to the acquired and accepted model. Currently discussed theoretical perspectives (contexts) in social pedagogy and the concepts within. I. The context of social personal relationships: social participation, social presence; social communication, interaction; reciprocity. II. The context of social activities (the organization of environment): institutionalisation; modernization; urbanization. III. The context of environment: space; place; locality. The socially conditioned process of human development is a process which constantly undergoes transformations.
The pedagogical description of this process ought to include these transformations also at the stage of formulating directives of practical activities – the educational practice. It is a major challenge for social pedagogy to avoid the limitations imposed by current social policy while responding to real social needs. It has been and remains a very important task for social pedagogy.
23

Li, Yao, YuHui Wang, and XiaoYang Tan. "Self-imitation guided goal-conditioned reinforcement learning." Pattern Recognition, August 2023, 109845. http://dx.doi.org/10.1016/j.patcog.2023.109845.

24

Zou, Qiming, and Einoshin Suzuki. "Compact Goal Representation Learning via Information Bottleneck in Goal-Conditioned Reinforcement Learning." IEEE Transactions on Neural Networks and Learning Systems, 2024, 1–14. http://dx.doi.org/10.1109/tnnls.2023.3344880.

25

Li, Jinning, Chen Tang, Masayoshi Tomizuka, and Wei Zhan. "Hierarchical Planning Through Goal-Conditioned Offline Reinforcement Learning." IEEE Robotics and Automation Letters, 2022, 1–8. http://dx.doi.org/10.1109/lra.2022.3190100.

26

Feng, Xiaoyun, Li Jiang, Xudong Yu, Haoran Xu, Xiaoyan Sun, Jie Wang, Xianyuan Zhan, and Wai Kin Victor Chan. "Curriculum Goal-Conditioned Imitation for Offline Reinforcement Learning." IEEE Transactions on Games, 2022, 1–11. http://dx.doi.org/10.1109/tg.2022.3224088.

27

Wang, Mianchu, Yue Jin, and Giovanni Montana. "Goal-conditioned offline reinforcement learning through state space partitioning." Machine Learning, February 5, 2024. http://dx.doi.org/10.1007/s10994-023-06500-z.

Abstract:
Offline reinforcement learning (RL) aims to create policies for sequential decision-making using exclusively offline datasets. This presents a significant challenge, especially when attempting to accomplish multiple distinct goals or outcomes within a given scenario while receiving sparse rewards. Prior methods using advantage weighting for offline goal-conditioned learning improve policies monotonically. However, they still face challenges from distribution shift and multi-modality that arise due to conflicting ways to reach a goal. This issue is especially challenging in long-horizon tasks, where the presence of multiple, often conflicting, solutions makes it hard to identify a single optimal policy for transitioning from a state to a desired goal. To address these challenges, we introduce a complementary advantage-based weighting scheme that incorporates an additional source of inductive bias. Given a value-based partitioning of the state space, the contribution of actions expected to lead to target regions that are easier to reach, compared to the final goal, is further increased. Our proposed approach, Dual-Advantage Weighted Offline Goal-conditioned RL, outperforms several competing offline algorithms in widely used benchmarks. Furthermore, we provide a theoretical guarantee that the learned policy will not be inferior to the underlying behavior policy.
28

Qian, Zhifeng, Mingyu You, Hongjun Zhou, Xuanhui Xu, and Bin He. "Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning." IEEE Robotics and Automation Letters, 2023, 1–8. http://dx.doi.org/10.1109/lra.2023.3287362.

29

Luo, Yu, Tianying Ji, Fuchun Sun, Huaping Liu, Jianwei Zhang, Mingxuan Jing, and Wenbing Huang. "Goal-Conditioned Hierarchical Reinforcement Learning With High-Level Model Approximation." IEEE Transactions on Neural Networks and Learning Systems, 2024, 1–15. http://dx.doi.org/10.1109/tnnls.2024.3354061.

30

Wu, Lisheng, and Ke Chen. "Goal exploration augmentation via pre-trained skills for sparse-reward long-horizon goal-conditioned reinforcement learning." Machine Learning, February 5, 2024. http://dx.doi.org/10.1007/s10994-023-06503-w.

Abstract:
Reinforcement learning often struggles to accomplish a sparse-reward long-horizon task in a complex environment. Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal. How to explore novel sub-goals efficiently is one of the most challenging issues in GCRL. Several goal exploration methods have been proposed to address this issue but still struggle to find the desired goals efficiently. In this paper, we propose a novel learning objective by optimizing the entropy of both achieved and new goals to be explored for more efficient goal exploration in sub-goal selection based GCRL. To optimize this objective, we first explore and exploit the frequently occurring goal-transition patterns mined in the environments similar to the current task to compose skills via skill learning. Then, the pre-trained skills are applied in goal exploration with theoretical justification. Evaluation on a variety of sparse-reward long-horizon benchmark tasks suggests that incorporating our method into several state-of-the-art GCRL baselines significantly boosts their exploration efficiency while improving or maintaining their performance.
31

Lee, Gyeong Taek, and Kangjin Kim. "A Controllable Agent by Subgoals in Path Planning using Goal-Conditioned Reinforcement Learning." IEEE Access, 2023, 1. http://dx.doi.org/10.1109/access.2023.3264264.

32

Song, Wongeun, and Jungwoo Lee. "Ricci planner: Zero-Shot Transfer for Goal-Conditioned Reinforcement Learning via Geometric Flow." IEEE Access, 2024, 1. http://dx.doi.org/10.1109/access.2024.3361478.

33

Lee, GyeongTaek, KangJin Kim, and Jaeyeon Jang. "Real-time path planning of controllable UAV by subgoals using goal-conditioned reinforcement learning." Applied Soft Computing, July 2023, 110660. http://dx.doi.org/10.1016/j.asoc.2023.110660.

34

He, Xiangkun, and Chen Lv. "Robotic Control in Adversarial and Sparse Reward Environments: A Robust Goal-Conditioned Reinforcement Learning Approach." IEEE Transactions on Artificial Intelligence, 2023, 1–10. http://dx.doi.org/10.1109/tai.2023.3237665.

35

Bougie, Nicolas, and Ryutaro Ichise. "Goal-driven active learning." Autonomous Agents and Multi-Agent Systems 35, no. 2 (August 16, 2021). http://dx.doi.org/10.1007/s10458-021-09527-5.

Abstract:
Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. In fact, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to lack of state coverage or distribution mismatch, i.e., when the learner’s goal deviates from the demonstrated behaviors. Besides, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the Mujoco domain. Experimental results show that our method outperforms prior imitation learning approaches in most of the tasks in terms of exploration efficiency and average scores.
36

Li, Jingchen, Haobin Shi, and Kao-Shing Hwang. "Using Goal-Conditioned Reinforcement Learning With Deep Imitation to Control Robot Arm in Flexible Flat Cable Assembly Task." IEEE Transactions on Automation Science and Engineering, 2023, 1–12. http://dx.doi.org/10.1109/tase.2023.3323307.

37

Han, Changlin, Zhiyong Peng, Yadong Liu, Jingsheng Tang, Yang Yu, and Zongtan Zhou. "Learning robotic manipulation skills with multiple semantic goals by conservative curiosity-motivated exploration." Frontiers in Neurorobotics 17 (March 7, 2023). http://dx.doi.org/10.3389/fnbot.2023.1089270.

Abstract:
Reinforcement learning (RL) empowers the agent to learn robotic manipulation skills autonomously. Compared with traditional single-goal RL, semantic-goal-conditioned RL expands the agent capacity to accomplish multiple semantic manipulation instructions. However, due to sparsely distributed semantic goals and sparse-reward agent-environment interactions, the hard exploration problem arises and impedes the agent training process. In traditional RL, curiosity-motivated exploration shows effectiveness in solving the hard exploration problem. However, in semantic-goal-conditioned RL, the performance of previous curiosity-motivated methods deteriorates, which we attribute to two defects: uncontrollability and distraction. To solve these defects, we propose a conservative curiosity-motivated method named mutual information motivation with hybrid policy mechanism (MIHM). MIHM mainly contributes two innovations: the decoupled-mutual-information-based intrinsic motivation, which prevents the agent from being motivated to explore dangerous states by uncontrollable curiosity; the precisely trained and automatically switched hybrid policy mechanism, which eliminates the distraction from the curiosity-motivated policy and achieves the optimal utilization of exploration and exploitation. Compared with four state-of-the-art curiosity-motivated methods in the sparse-reward robotic manipulation task with 35 valid semantic goals, including stacks of 2 or 3 objects and pyramids, our MIHM shows the fastest learning speed. Moreover, MIHM achieves the highest total success rate, 0.9, compared with at most 0.6 for the other methods. Among all the compared methods, MIHM is the only one that succeeds in stacking three objects.