Academic literature on the topic "Apprentissage par renforcement conditionné par des buts"
Below is a thematic list of theses and other academic sources on the topic "Apprentissage par renforcement conditionné par des buts".
The full text of each publication can be downloaded in PDF, and its abstract read online, whenever these are available in the metadata.
Theses on the topic "Apprentissage par renforcement conditionné par des buts"
Fournier, Pierre. "Intrinsically Motivated and Interactive Reinforcement Learning : a Developmental Approach". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS634.
Reinforcement learning (RL) is today more popular than ever, but certain basic skills are still out of reach of this paradigm: object manipulation, sensorimotor control, natural interaction with other agents. A possible approach to address these challenges consists in taking inspiration from human development, or even trying to reproduce it. In this thesis, we study the intersection of two crucial topics in developmental sciences, interactive learning and intrinsic motivation, and how to apply them to RL in order to tackle the aforementioned challenges. Interactive learning and intrinsic motivation have already been studied, separately, in combination with RL, but with the aim of quantitatively improving the performance of existing agents rather than of learning in a developmental fashion. We thus focus our efforts on the developmental aspect of these subjects. Our work addresses the self-organisation of learning into developmental trajectories through an intrinsic motivation for learning progress, and the interaction of this organisation with goal-directed learning and imitation learning. We show that these mechanisms, when implemented in open-ended environments with no predefined task, can interact to produce learning behaviors that are sound from a developmental standpoint and richer than those produced by each mechanism separately.
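The mechanism this abstract refers to, choosing what to practice next according to learning progress, can be illustrated with a minimal sketch. This is a hypothetical example, not the thesis' implementation; the class name, window size and epsilon value are assumptions made for the illustration.

```python
# Minimal sketch: sample the next goal module to practice in proportion to
# absolute learning progress (difference between recent and older competence).
import random
from collections import deque

class LearningProgressSelector:
    def __init__(self, module_names, window=20, epsilon=0.1):
        self.history = {m: deque(maxlen=2 * window) for m in module_names}
        self.window = window
        self.epsilon = epsilon  # share of uniform exploration over modules

    def update(self, module, success):
        """Record the outcome (0/1 or a competence score) of one attempt."""
        self.history[module].append(float(success))

    def _progress(self, module):
        h = list(self.history[module])
        if len(h) < 2 * self.window:
            return 0.0
        older, recent = h[: self.window], h[self.window:]
        return abs(sum(recent) / self.window - sum(older) / self.window)

    def choose(self):
        """Sample a module, mostly in proportion to its learning progress."""
        modules = list(self.history)
        if random.random() < self.epsilon:
            return random.choice(modules)
        progress = [self._progress(m) for m in modules]
        total = sum(progress)
        if total == 0.0:
            return random.choice(modules)
        r, acc = random.uniform(0, total), 0.0
        for m, p in zip(modules, progress):
            acc += p
            if r <= acc:
                return m
        return modules[-1]
```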
Chenu, Alexandre. "Leveraging sequentiality in Robot Learning : Application of the Divide & Conquer paradigm to Neuro-Evolution and Deep Reinforcement Learning". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS342.
"To succeed, planning alone is insufficient. One must improvise as well." This quote from Isaac Asimov, founding father of robotics and author of the Three Laws of Robotics, emphasizes the importance of being able to adapt and think on one's feet to achieve success. Although robots can nowadays solve highly complex tasks, they still lack the crucial adaptability skills needed to be deployed on a larger scale. Robot Learning uses learning algorithms to tackle this lack of adaptability and to enable robots to solve complex tasks autonomously. Two types of learning algorithms are particularly suitable for robots to learn controllers autonomously: Deep Reinforcement Learning and Neuro-Evolution. However, both classes of algorithms often cannot solve Hard Exploration Problems, that is, problems with a long horizon and a sparse reward signal, unless they are guided in their learning process. One can consider different approaches to tackle those problems. An option is to search for a diversity of behaviors rather than a specific one. The idea is that among this diversity, some behaviors will be able to solve the task. We call these algorithms Diversity Search algorithms. A second option consists in guiding the learning process using demonstrations provided by an expert. This is called Learning from Demonstration. However, searching for diverse behaviors or learning from demonstration can be inefficient in some contexts. Indeed, finding diverse behaviors can be tedious if the environment is complex. On the other hand, learning from demonstration can be very difficult if only one demonstration is available. This thesis attempts to improve the effectiveness of Diversity Search and Learning from Demonstration when applied to Hard Exploration Problems. To do so, we assume that complex robotic behaviors can be decomposed into reaching simpler sub-goals. Based on this sequential bias, we try to improve the sample efficiency of Diversity Search and Learning from Demonstration algorithms by adopting Divide & Conquer strategies, which are well known for their efficiency when the problem is composable. Throughout the thesis, we propose two main strategies. First, after identifying some limitations of Diversity Search algorithms based on Neuro-Evolution, we propose Novelty Search Skill Chaining. This algorithm combines Diversity Search with Skill-Chaining to efficiently navigate maze environments that are difficult to explore for state-of-the-art Diversity Search. In a second set of contributions, we propose the Divide & Conquer Imitation Learning algorithms. The key intuition behind those methods is to decompose the complex task of learning from a single demonstration into several simpler goal-reaching sub-tasks. DCIL-II, the most advanced variant, can learn walking behaviors for under-actuated humanoid robots with unprecedented efficiency. Beyond underlining the effectiveness of the Divide & Conquer paradigm in Robot Learning, this work also highlights the difficulties that can arise when composing behaviors, even in elementary environments. One will inevitably have to address these difficulties before applying these algorithms directly to real robots. It may be necessary for the success of the next generations of robots, as outlined by Asimov.
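The divide-and-conquer intuition described in this abstract, turning a single demonstration into a chain of goal-reaching sub-tasks, can be sketched as follows. This is a loose illustration of the general idea, not the DCIL or NSSC algorithms; the training function, stride value and environment interface are assumptions.

```python
# Minimal sketch: split one demonstration into a sequence of sub-goals and
# train one goal-reaching skill per segment, each starting where the
# previous sub-goal lies.

def extract_subgoals(demonstration, stride=10):
    """Take every `stride`-th state of the demonstration as a sub-goal,
    always ending with the final state of the demonstration."""
    subgoals = demonstration[stride:-1:stride]
    subgoals.append(demonstration[-1])
    return subgoals

def chain_skills(env, demonstration, train_goal_reaching_policy, stride=10):
    """Divide & Conquer: learn the demo as a chain of goal-reaching skills.
    `train_goal_reaching_policy(env, start, goal)` is a user-supplied trainer."""
    skills = []
    start_state = demonstration[0]
    for subgoal in extract_subgoals(demonstration, stride):
        policy = train_goal_reaching_policy(env, start=start_state, goal=subgoal)
        skills.append((policy, subgoal))
        start_state = subgoal  # the next skill starts from the reached sub-goal
    return skills
```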
Gueguen, Maëlle. "Dynamique intracérébrale de l'apprentissage par renforcement chez l'humain". Thesis, Université Grenoble Alpes (ComUE), 2017. http://www.theses.fr/2017GREAS042/document.
We make decisions every waking day of our life. Facing our options, we tend to pick the one most likely to yield the expected outcome. Taking into account our past experiences and their outcomes is necessary to identify the best option. This cognitive process is called reinforcement learning. To date, the underlying neural mechanisms are debated. Despite a consensus on the role of dopaminergic neurons in reward processing, several hypotheses on the neural bases of reinforcement learning coexist: either two distinct opposite systems covering cortical and subcortical areas, or a segregation of neurons within brain regions to process reward-based and punishment-avoidance learning. This PhD work aimed to identify the brain dynamics of human reinforcement learning. To unravel the neural mechanisms involved, we used intracerebral recordings in refractory epileptic patients during a probabilistic learning task. In the first study, we used a computational model to tackle the brain dynamics of reinforcement signal encoding, especially the encoding of reward and punishment prediction errors. Local field potentials revealed the central role of high-frequency gamma activity (50-150 Hz) in these encodings. We report a role of the ventromedial prefrontal cortex in reward prediction error encoding, while the anterior insula and the dorsolateral prefrontal cortex encoded punishment prediction errors. In addition, the magnitude of the neural response in the insula predicted behavioral learning and trial-to-trial behavioral adaptations. These results are consistent with the existence of two distinct opposite cortical systems processing rewards and punishments during reinforcement learning. In a second study, we recorded the neural activity of the anterior and dorsomedial nuclei of the thalamus during the same cognitive task. Local field potential recordings highlighted the role of low-frequency theta activity in punishment processing, supporting an involvement of these nuclei in punishment-avoidance learning. In a third, behavioral study, we investigated the influence of risk on reinforcement learning. We observed risk aversion during punishment avoidance, affecting performance, as well as risk-seeking behavior during reward seeking, revealed by increased reaction times towards appetitive risky choices. Taken together, these results suggest we are risk-seeking when we have something to gain and risk-averse when we have something to lose, in contrast to the prediction of prospect theory. Improving our common knowledge of the brain dynamics of human reinforcement learning could improve the understanding of the cognitive deficits of neurological patients, but also of the decision biases all human beings can exhibit.
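The prediction errors mentioned in this abstract are typically obtained by fitting a simple trial-by-trial learning model to behavior. The sketch below shows such a model in its most common form (a delta-rule value update with softmax choice); it is illustrative only, and the probabilities, payoffs and parameter values are assumptions, not those used in the thesis.

```python
# Minimal sketch: Q-learning-style model of a probabilistic learning task,
# returning the per-trial prediction errors (outcome minus expected value).
import math
import random

def softmax_choice(q_values, beta=3.0):
    """Pick an option with probability proportional to exp(beta * Q)."""
    weights = [math.exp(beta * q) for q in q_values]
    r, acc = random.uniform(0, sum(weights)), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def run_block(outcome_probs, payoffs, n_trials=100, alpha=0.3):
    """Simulate one learning block and collect the prediction errors."""
    q = [0.0] * len(outcome_probs)
    prediction_errors = []
    for _ in range(n_trials):
        choice = softmax_choice(q)
        # Outcome is the payoff (e.g. +1 reward or -1 punishment) with some probability.
        outcome = payoffs[choice] if random.random() < outcome_probs[choice] else 0.0
        delta = outcome - q[choice]   # prediction error on this trial
        q[choice] += alpha * delta    # incremental value update
        prediction_errors.append(delta)
    return prediction_errors

# Example: a gain condition (75% chance of +1) and a loss condition (75% chance of -1).
pe_gain = run_block(outcome_probs=[0.75, 0.25], payoffs=[1.0, 1.0])
pe_loss = run_block(outcome_probs=[0.75, 0.25], payoffs=[-1.0, -1.0])
```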
Tarbouriech, Jean. "Goal-oriented exploration for reinforcement learning". Electronic Thesis or Diss., Université de Lille (2022-....), 2022. http://www.theses.fr/2022ULILB014.
Learning to reach goals is a competence of high practical relevance for intelligent agents to acquire. For instance, this encompasses many navigation tasks ("go to target X"), robotic manipulation ("attain position Y of the robotic arm"), or game-playing scenarios ("win the game by fulfilling objective Z"). As a living being interacting with the world, I am constantly driven by goals to reach, varying in scope and difficulty. Reinforcement Learning (RL) holds the promise to frame and learn goal-oriented behavior. Goals can be modeled as specific configurations of the environment that must be attained via sequential interaction and exploration of the unknown environment. Although various deep RL algorithms have been proposed for goal-oriented RL, existing methods often lack principled understanding, sample efficiency and general-purpose effectiveness. In fact, very limited theoretical analysis of goal-oriented RL was available, even in the basic scenario of finitely many states and actions. We first focus on a supervised scenario of goal-oriented RL, where a goal state to be reached in minimum total expected cost is provided as part of the problem definition. After formalizing the online learning problem in this setting, often known as Stochastic Shortest Path (SSP), we introduce two no-regret algorithms (one is the first available in the literature, the other attains nearly optimal guarantees). Beyond training our RL agent to solve only one task, we then aspire for it to learn to autonomously solve a wide variety of tasks, in the absence of any reward supervision. In this challenging unsupervised RL scenario, we advocate to "Set Your Own Goals" (SYOG), which encourages the agent to learn the ability to intrinsically select and reach its own goal states. We derive finite-time guarantees for this popular heuristic in various settings, each with its specific learning objective and technical challenges. As an illustration, we propose a rigorous analysis of the algorithmic principle of targeting "uncertain" goals, which we also anchor in deep RL. The main focus and contribution of this thesis are to instigate a principled analysis of goal-oriented exploration in RL, both in the supervised and unsupervised scenarios. We hope that it helps suggest promising research directions to improve the interpretability and sample efficiency of goal-oriented RL algorithms in practical applications.
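To make the goal-conditioned setting discussed in this abstract concrete, the sketch below shows a tabular goal-conditioned Q-learning update together with a crude self-set-goal rule that targets the goal reached least often, a simple stand-in for "uncertain" goals. It is a minimal illustration under assumed interfaces (hashable states and goals, a sparse reaching reward), not the algorithms analyzed in the thesis.

```python
# Minimal sketch: tabular goal-conditioned Q-learning with a naive
# "set your own goals" rule based on visit counts.
import random
from collections import defaultdict

class GoalConditionedAgent:
    def __init__(self, n_actions, alpha=0.5, gamma=0.99, epsilon=0.1):
        self.q = defaultdict(float)           # (state, goal, action) -> value
        self.reach_counts = defaultdict(int)  # goal -> times it was reached
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select_goal(self, candidate_goals):
        """Prefer the goal reached least often so far (a proxy for uncertainty)."""
        return min(candidate_goals, key=lambda g: self.reach_counts[g])

    def act(self, state, goal):
        """Epsilon-greedy action with respect to the goal-conditioned values."""
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = [self.q[(state, goal, a)] for a in range(self.n_actions)]
        return values.index(max(values))

    def update(self, state, goal, action, next_state):
        reached = (next_state == goal)
        reward = 1.0 if reached else 0.0      # sparse goal-reaching reward
        if reached:
            self.reach_counts[goal] += 1
            target = reward                   # episode ends when the goal is reached
        else:
            target = reward + self.gamma * max(
                self.q[(next_state, goal, a)] for a in range(self.n_actions))
        key = (state, goal, action)
        self.q[key] += self.alpha * (target - self.q[key])
```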
Roussel, Edith. "Bases comportementales et neurobiologiques du conditionnement olfactif aversif chez l'abeille Apis mellifera". Toulouse 3, 2009. http://thesesups.ups-tlse.fr//.
This work aimed at understanding how the brain differentiates, processes and stores information acquired from positive and negative experiences. We worked on the honeybee Apis mellifera. Learning and memory studies in the honeybee mostly rely on an appetitive conditioning protocol. We therefore developed an olfactory aversive conditioning protocol, which consists in pairing an odorant with an electric shock eliciting the sting extension reflex. Bees learn to extend their sting in response to the odorant (I). This conditioning is indeed aversive, because it produces avoidance of the previously punished odorant when the animal is placed in a Y-maze after conditioning (II). The aversive reinforcement pathway depends on dopaminergic signalling, whereas appetitive conditioning depends on octopaminergic signalling. Bees can master appetitive and aversive associations simultaneously during the same conditioning experiment (I). Responsiveness to unconditioned appetitive and aversive stimuli is independent within the same bees. The more sensitive to shocks a bee is, the better it learns the aversive association: foragers, which are more sensitive to shocks than guards, learn aversive associations better (III). We described olfactory coding in the lateral horn (IV). In the antennal lobe and lateral horn, we did not find any learning-induced modification of odour-evoked activity during olfactory aversive conditioning (V). Our study contributes to a better understanding of how the brain differentiates and processes positive and negative experiences.
Forestier, Sébastien. "Intrinsically Motivated Goal Exploration in Child Development and Artificial Intelligence : Learning and Development of Speech and Tool Use". Thesis, Bordeaux, 2019. http://www.theses.fr/2019BORD0247.
Babies and children are curious, active explorers of their world. One of their challenges is to learn the relations between their actions, such as the use of tools or speech, and the changes in their environment. Intrinsic motivations have been little studied in psychology, such that their mechanisms are mostly unknown. On the other hand, most artificial agents and robots learn in a way very different from humans. The objective of this thesis is twofold: understanding the role of intrinsic motivations in the human development of speech and tool use through robotic modeling, and improving the abilities of artificial agents by drawing inspiration from the mechanisms of human exploration and learning. A first part of this work concerns the understanding and modeling of intrinsic motivations. We reanalyze a typical tool-use experiment, showing that intrinsically motivated exploration seems to play an important role in the observed behaviors and to interfere with the measured success rates. With a robotic model, we show that an intrinsic motivation based on learning progress to reach goals within a modular representation can self-organize phases of behavior in the development of tool-use precursors that share properties with child tool-use development. We present the first robotic model learning both speech and tool use from scratch, which predicts that the grounded exploration of objects in a social interaction scenario should accelerate infant vocal learning of accurate sounds for those objects' names, as a result of a goal-directed exploration of the objects. In the second part of this thesis, we extend, formalize and evaluate the algorithms designed to model child development, with the aim of obtaining an efficient learning robot. We formalize an approach called Intrinsically Motivated Goal Exploration Processes (IMGEP) that enables the discovery and acquisition of large repertoires of skills. We show, in several experimental setups including a real humanoid robot, that learning diverse spaces of goals with intrinsic motivations is more efficient for learning complex skills than directly trying to learn these complex skills.
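The IMGEP loop named in this abstract can be sketched in a few lines: sample a goal in an outcome space, replay and perturb the policy parameters whose past outcome was closest to that goal, and store the result. The sketch below is a simplified, hypothetical instance (the rollout function, dimensions and noise level are assumptions), not the implementation evaluated in the thesis.

```python
# Minimal sketch of an Intrinsically Motivated Goal Exploration Process:
# self-generated goals drive the reuse and perturbation of past policies.
import random

def nearest(memory, goal):
    """Return the stored (params, outcome) whose outcome is closest to goal."""
    return min(memory, key=lambda po: sum((a - b) ** 2 for a, b in zip(po[1], goal)))

def imgep(rollout, param_dim, outcome_dim, n_iterations=1000, sigma=0.05):
    """`rollout(params)` executes a policy and returns its outcome vector."""
    memory = []
    for _ in range(10):  # bootstrap with a few random policies
        params = [random.uniform(-1, 1) for _ in range(param_dim)]
        memory.append((params, rollout(params)))
    for _ in range(n_iterations):
        goal = [random.uniform(-1, 1) for _ in range(outcome_dim)]  # self-set goal
        params, _ = nearest(memory, goal)                           # reuse closest attempt
        candidate = [p + random.gauss(0, sigma) for p in params]    # local perturbation
        memory.append((candidate, rollout(candidate)))
    return memory  # repertoire of (parameters, outcome) pairs
```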