Índice
Literatura académica sobre el tema "Apprentissage par reinforcement"
Crea una cita precisa en los estilos APA, MLA, Chicago, Harvard y otros
Consulte las listas temáticas de artículos, libros, tesis, actas de conferencias y otras fuentes académicas sobre el tema "Apprentissage par reinforcement".
Junto a cada fuente en la lista de referencias hay un botón "Agregar a la bibliografía". Pulsa este botón, y generaremos automáticamente la referencia bibliográfica para la obra elegida en el estilo de cita que necesites: APA, MLA, Harvard, Vancouver, Chicago, etc.
También puede descargar el texto completo de la publicación académica en formato pdf y leer en línea su resumen siempre que esté disponible en los metadatos.
Artículos de revistas sobre el tema "Apprentissage par reinforcement"
Noulawe Tchamanbe, Landry Steve y Paulin MELATAGIA YONTA. "Algorithms to get out of Boring Area Trap in Reinforcement Learning". Revue Africaine de la Recherche en Informatique et Mathématiques Appliquées Volume 34 - 2020 - Special... (2 de julio de 2021). http://dx.doi.org/10.46298/arima.6748.
Texto completoAbraich, Ayoub. "Apprentissage par renforcement profond pour la réponse visuelle aux questions (Deep Reinforcement Learning for Visual Question Answering)". SSRN Electronic Journal, 2019. http://dx.doi.org/10.2139/ssrn.3530241.
Texto completoSabahi, Kamran, Mohsin Jamil, Yaser Shokri-Kalandaragh, Mehdi Tavan y Yogendra Arya. "Deep Deterministic Policy Gradient Reinforcement Learning Based Adaptive PID Load Frequency Control of an AC Micro-Grid Apprentissage par renforcement du gradient de la politique déterministe profonde basé sur le contrôle adaptatif de la fréquence de charge PID d’un micro-réseau de courant alternatif". IEEE Canadian Journal of Electrical and Computer Engineering, 2024, 1–7. http://dx.doi.org/10.1109/icjece.2024.3353670.
Texto completoTesis sobre el tema "Apprentissage par reinforcement"
Carrara, Nicolas. "Reinforcement learning for dialogue systems optimization with user adaptation". Thesis, Lille 1, 2019. http://www.theses.fr/2019LIL1I071/document.
Texto completoThe most powerful artificial intelligence systems are now based on learned statistical models. In order to build efficient models, these systems must collect a huge amount of data on their environment. Personal assistants, smart-homes, voice-servers and other dialogue applications are no exceptions to this statement. A specificity of those systems is that they are designed to interact with humans, and as a consequence, their training data has to be collected from interactions with these humans. As the number of interactions with a single person is often too scarce to train a proper model, the usual approach to maximise the amount of data consists in mixing data collected with different users into a single corpus. However, one limitation of this approach is that, by construction, the trained models are only efficient with an "average" human and do not include any sort of adaptation; this lack of adaptation makes the service unusable for some specific group of persons and leads to a restricted customers base and inclusiveness problems. This thesis proposes solutions to construct Dialogue Systems that are robust to this problem by combining Transfer Learning and Reinforcement Learning. It explores two main ideas: The first idea of this thesis consists in incorporating adaptation in the very first dialogues with a new user. To that extend, we use the knowledge gathered with previous users. But how to scale such systems with a growing database of user interactions? The first proposed approach involves clustering of Dialogue Systems (tailored for their respective user) based on their behaviours. We demonstrated through handcrafted and real user-models experiments how this method improves the dialogue quality for new and unknown users. The second approach extends the Deep Q-learning algorithm with a continuous transfer process.The second idea states that before using a dedicated Dialogue System, the first interactions with a user should be handled carefully by a safe Dialogue System common to all users. The underlying approach is divided in two steps. The first step consists in learning a safe strategy through Reinforcement Learning. To that extent, we introduced a budgeted Reinforcement Learning framework for continuous state space and the underlying extensions of classic Reinforcement Learning algorithms. In particular, the safe version of the Fitted-Q algorithm has been validated, in term of safety and efficiency, on a dialogue system tasks and an autonomous driving problem. The second step consists in using those safe strategies when facing new users; this method is an extension of the classic ε-greedy algorithm
Akrour, Riad. "Robust Preference Learning-based Reinforcement Learning". Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112236/document.
Texto completoThe thesis contributions resolves around sequential decision taking and more precisely Reinforcement Learning (RL). Taking its root in Machine Learning in the same way as supervised and unsupervised learning, RL quickly grow in popularity within the last two decades due to a handful of achievements on both the theoretical and applicative front. RL supposes that the learning agent and its environment follow a stochastic Markovian decision process over a state and action space. The process is said of decision as the agent is asked to choose at each time step an action to take. It is said stochastic as the effect of selecting a given action in a given state does not systematically yield the same state but rather defines a distribution over the state space. It is said to be Markovian as this distribution only depends on the current state-action pair. Consequently to the choice of an action, the agent receives a reward. The RL goal is then to solve the underlying optimization problem of finding the behaviour that maximizes the sum of rewards all along the interaction of the agent with its environment. From an applicative point of view, a large spectrum of problems can be cast onto an RL one, from Backgammon (TD-Gammon, was one of Machine Learning first success giving rise to a world class player of advanced level) to decision problems in the industrial and medical world. However, the optimization problem solved by RL depends on the prevous definition of a reward function that requires a certain level of domain expertise and also knowledge of the internal quirks of RL algorithms. As such, the first contribution of the thesis was to propose a learning framework that lightens the requirements made to the user. The latter does not need anymore to know the exact solution of the problem but to only be able to choose between two behaviours exhibited by the agent, the one that matches more closely the solution. Learning is interactive between the agent and the user and resolves around the three main following points: i) The agent demonstrates a behaviour ii) The user compares it w.r.t. to the current best one iii) The agent uses this feedback to update its preference model of the user and uses it to find the next behaviour to demonstrate. To reduce the number of required interactions before finding the optimal behaviour, the second contribution of the thesis was to define a theoretically sound criterion making the trade-off between the sometimes contradicting desires of complying with the user's preferences and demonstrating sufficiently different behaviours. The last contribution was to ensure the robustness of the algorithm w.r.t. the feedback errors that the user might make. Which happens more often than not in practice, especially at the initial phase of the interaction, when all the behaviours are far from the expected solution
Fournier, Pierre. "Intrinsically Motivated and Interactive Reinforcement Learning : a Developmental Approach". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS634.
Texto completoReinforcement learning (RL) is today more popular than ever, but certain basic skills are still out of reach of this paradigm: object manipulation, sensorimotor control, natural interaction with other agents. A possible approach to address these challenges consist in taking inspiration from human development, or even trying to reproduce it. In this thesis, we study the intersection of two crucial topics in developmental sciences and how to apply them to RL in order to tackle the aforementioned challenges: interactive learning and intrinsic motivation. Interactive learning and intrinsic motivation have already been studied, separately, in combination with RL, but in order to improve quantitatively existing agents performances, rather than to learn in a developmental fashion. We thus focus our efforts on the developmental aspect of these subjects. Our work touches the self-organisation of learning in developmental trajectories through an intrinsically motivated for learning progress, and the interaction of this organisation with goal-directed learning and imitation learning. We show that these mechanisms, when implemented in open-ended environments with no task predefined, can interact to produce learning behaviors that are sound from a developmental standpoint, and richer than those produced by each mechanism separately
Blier, Léonard. "Some Principled Methods for Deep Reinforcement Learning". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG040.
Texto completoThis thesis develops and studies some principled methods for Deep Learning (DL) and deep Reinforcement Learning (RL).In Part II, we study the efficiency of DL models from the context of the Minimum Description Length principle, which formalize Occam's razor, and holds that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. Surprisingly, we demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding, hence showing that DL approaches are well principled from this information theory viewpoint.In Part III, we tackle two limitations of standard approaches in DL and RL, and develop principled methods, improving robustness empirically.The first one concerns optimisation of deep learning models with SGD, and the cost of finding the optimal learning rate, which prevents using a new method out of the box without hyperparameter tuning. When design a principled optimisation method for DL, 'All Learning Rates At Once' : each unit or feature in the network gets its own learning rate sampled from a random distribution spanning several orders of magnitude. Perhaps surprisingly, Alrao performs close to SGD with an optimally tuned learning rate, for various architectures and problems.The second one tackles near continuous-time RL environments (such as robotics, control environment, …) : we show that time discretization (number of action per second) in as a critical factor, and that empirically, Q-learning-based approaches collapse with small time steps. Formally, we prove that Q-learning does not exist in continuous time. We detail a principled way to build an off-policy RL algorithm that yields similar performances over a wide range of time discretizations, and confirm this robustness empirically.The main part of this thesis, (Part IV), studies the Successor States Operator in RL, and how it can improve sample efficiency of policy evaluation. In an environment with a very sparse reward, learning the value function is a hard problem. At the beginning of training, no learning will occur until a reward is observed. This highlight the fact that not all the observed information is used. Leveraging this information might lead to better sample efficiency. The Successor State Operator is an object that expresses the value functions of all possible reward functions for a given, fixed policy. Learning the successor state operator can be done without reward signals, and can extract information from every observed transition, illustrating an unsupervised reinforcement learning approach.We offer a formal treatment of these objects in both finite and continuous spaces with function approximators. We present several learning algorithms and associated results. Similarly to the value function, the successor states operator satisfies a Bellman equation. Additionally, it also satisfies two other fixed point equations: a backward Bellman equation and a Bellman-Newton equation, expressing path compositionality in the Markov process. These new relation allow us to generalize from observed trajectories in several ways, potentially leading to more sample efficiency. Every of these equations lead to corresponding algorithms for any function approximators such as neural networks.Finally, (Part V) the study of the successor states operator and its algorithms allow us to derive unbiased methods in the setting of multi-goal RL, dealing with the issue of extremely sparse rewards. We additionally show that the popular Hindsight Experience Replay algorithm, known to be biased, is actually unbiased in the large class of deterministic environments
Chatzilygeroudis, Konstantinos. "Micro-Data Reinforcement Learning for Adaptive Robots". Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0276/document.
Texto completoRobots have to face the real world, in which trying something might take seconds, hours, or even days. Unfortunately, the current state-of-the-art reinforcement learning algorithms (e.g., deep reinforcement learning) require big interaction times to find effective policies. In this thesis, we explored approaches that tackle the challenge of learning by trial-and-error in a few minutes on physical robots. We call this challenge “micro-data reinforcement learning”. In our first contribution, we introduced a novel learning algorithm called “Reset-free Trial-and-Error” that allows complex robots to quickly recover from unknown circumstances (e.g., damages or different terrain) while completing their tasks and taking the environment into account; in particular, a physical damaged hexapod robot recovered most of its locomotion abilities in an environment with obstacles, and without any human intervention. In our second contribution, we introduced a novel model-based reinforcement learning algorithm, called Black-DROPS that: (1) does not impose any constraint on the reward function or the policy (they are treated as black-boxes), (2) is as data-efficient as the state-of-the-art algorithm for data-efficient RL in robotics, and (3) is as fast (or faster) than analytical approaches when several cores are available. We additionally proposed Multi-DEX, a model-based policy search approach, that takes inspiration from novelty-based ideas and effectively solved several sparse reward scenarios. In our third contribution, we introduced a new model learning procedure in Black-DROPS (we call it GP-MI) that leverages parameterized black-box priors to scale up to high-dimensional systems; for instance, it found high-performing walking policies for a physical damaged hexapod robot (48D state and 18D action space) in less than 1 minute of interaction time. Finally, in the last part of the thesis, we explored a few ideas on how to incorporate safety constraints, robustness and leverage multiple priors in Bayesian optimization in order to tackle the micro-data reinforcement learning challenge. Throughout this thesis, our goal was to design algorithms that work on physical robots, and not only in simulation. Consequently, all the proposed approaches have been evaluated on at least one physical robot. Overall, this thesis aimed at providing methods and algorithms that will allow physical robots to be more autonomous and be able to learn in a handful of trials
Achab, Mastane. "Ranking and risk-aware reinforcement learning". Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT020.
Texto completoThis thesis divides into two parts: the first part is on ranking and the second on risk-aware reinforcement learning. While binary classification is the flagship application of empirical risk minimization (ERM), the main paradigm of machine learning, more challenging problems such as bipartite ranking can also be expressed through that setup. In bipartite ranking, the goal is to order, by means of scoring methods, all the elements of some feature space based on a training dataset composed of feature vectors with their binary labels. This thesis extends this setting to the continuous ranking problem, a variant where the labels are taking continuous values instead of being simply binary. The analysis of ranking data, initiated in the 18th century in the context of elections, has led to another ranking problem using ERM, namely ranking aggregation and more precisely the Kemeny's consensus approach. From a training dataset made of ranking data, such as permutations or pairwise comparisons, the goal is to find the single "median permutation" that best corresponds to a consensus order. We present a less drastic dimensionality reduction approach where a distribution on rankings is approximated by a simpler distribution, which is not necessarily reduced to a Dirac mass as in ranking aggregation.For that purpose, we rely on mathematical tools from the theory of optimal transport such as Wasserstein metrics. The second part of this thesis focuses on risk-aware versions of the stochastic multi-armed bandit problem and of reinforcement learning (RL), where an agent is interacting with a dynamic environment by taking actions and receiving rewards, the objective being to maximize the total payoff. In particular, a novel atomic distributional RL approach is provided: the distribution of the total payoff is approximated by particles that correspond to trimmed means
Zimmer, Matthieu. "Apprentissage par renforcement développemental". Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0008/document.
Texto completoReinforcement learning allows an agent to learn a behavior that has never been previously defined by humans. The agent discovers the environment and the different consequences of its actions through its interaction: it learns from its own experience, without having pre-established knowledge of the goals or effects of its actions. This thesis tackles how deep learning can help reinforcement learning to handle continuous spaces and environments with many degrees of freedom in order to solve problems closer to reality. Indeed, neural networks have a good scalability and representativeness. They make possible to approximate functions on continuous spaces and allow a developmental approach, because they require little a priori knowledge on the domain. We seek to reduce the amount of necessary interaction of the agent to achieve acceptable behavior. To do so, we proposed the Neural Fitted Actor-Critic framework that defines several data efficient actor-critic algorithms. We examine how the agent can fully exploit the transitions generated by previous behaviors by integrating off-policy data into the proposed framework. Finally, we study how the agent can learn faster by taking advantage of the development of his body, in particular, by proceeding with a gradual increase in the dimensionality of its sensorimotor space
Tréca, Maxime. "Designing traffic signal control systems using reinforcement learning". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG043.
Texto completoThis thesis studies the problem of traffic optimization through traffic light signals on road networks. Traffic optimization is achieved in our case through the use of reinforcement learning, a branch of machine learning in which an agent solves a given task in an environment by maximizing its reward signals.First, we present the fields of traffic signal control (TSC) and reinforcement learning (RL) separately, before presenting how the latter is applied on the former (RL-TSC). Then, we define a mathematical model of traffic based on graph theory, before introducing the reinforcement learning model, traffic simulator and deep reinforcement learning library created for our research work.Finally, these definitions allow us to build an efficient traffic signal control method based on reinforcement learning.We first study multiple classical reinforcement learning techniques on an isolated traffic intersection. Multiple classes of RL algorithms are compared (e.g. Q-learning, LRP, actor-critic) to deterministic TSC methods used as a baseline. We then introduce function approximation methods using deep neural networks, allowing for significant performance improvement on isolated intersections. These experiments allow us to single out dueling deep Q-learning as the best isolated RL-TSC method for out model.On this basis, we introduce the concept of agent coordination in multi-agent reinforcement learning systems (MARL). We compare multiple modes of coordinaiton to the isolated baseline that we previously defined. These experiments allow us to define the DEC-DQN coordination method, which allows for multiple agents of a POMDP to communicate in order to better optimize traffic. DEC-DQN uses a deep neural network shared by all agents of the network, allowing them to learn a common communication protocol from scratch. In order to correctly reward communication actions, which are entirely distinct from traffic optimization actions taken by agents, DEC-DQN defines a special reward function allowing each agent to directly estimate the impact of its communications on neighboring agents of the network. Communicaiton action rewards are directly estimated on the traffic optimization neural networks of neighboring intersections.Finally, this novel cooridnation method is compared to other methods of the literature on a large-scale simulation. The DEC-DQN algorithm results in faster agent learning, as well as increased performance and stability thanks to agent coordination
Tarbouriech, Jean. "Goal-oriented exploration for reinforcement learning". Electronic Thesis or Diss., Université de Lille (2022-....), 2022. http://www.theses.fr/2022ULILB014.
Texto completoLearning to reach goals is a competence of high practical relevance to acquire for intelligent agents. For instance, this encompasses many navigation tasks ("go to target X"), robotic manipulation ("attain position Y of the robotic arm"), or game-playing scenarios ("win the game by fulfilling objective Z"). As a living being interacting with the world, I am constantly driven by goals to reach, varying in scope and difficulty.Reinforcement Learning (RL) holds the promise to frame and learn goal-oriented behavior. Goals can be modeled as specific configurations of the environment that must be attained via sequential interaction and exploration of the unknown environment. Although various deep RL algorithms have been proposed for goal-oriented RL, existing methods often lack principled understanding, sample efficiency and general-purpose effectiveness. In fact, very limited theoretical analysis of goal-oriented RL was available, even in the basic scenario of finitely many states and actions.We first focus on a supervised scenario of goal-oriented RL, where a goal state to be reached in minimum total expected cost is provided as part of the problem definition. After formalizing the online learning problem in this setting often known as Stochastic Shortest Path (SSP), we introduce two no-regret algorithms (one is the first available in the literature, the other attains nearly optimal guarantees).Beyond training our RL agent to solve only one task, we then aspire that it learns to autonomously solve a wide variety of tasks, in the absence of any reward supervision. In this challenging unsupervised RL scenario, we advocate to "Set Your Own Goals" (SYOG), which suggests the agent to learn the ability to intrinsically select and reach its own goal states. We derive finite-time guarantees of this popular heuristic in various settings, each with its specific learning objective and technical challenges. As an illustration, we propose a rigorous analysis of the algorithmic principle of targeting "uncertain" goals which we also anchor in deep RL.The main focus and contribution of this thesis are to instigate a principled analysis of goal-oriented exploration in RL, both in the supervised and unsupervised scenarios. We hope that it helps suggest promising research directions to improve the interpretability and sample efficiency of goal-oriented RL algorithms in practical applications
Théro, Héloïse. "Contrôle, agentivité et apprentissage par renforcement". Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE028/document.
Texto completoSense of agency or subjective control can be defined by the feeling that we control our actions, and through them effects in the outside world. This cluster of experiences depend on the ability to learn action-outcome contingencies and a more classical algorithm to model this originates in the field of human reinforcementlearning. In this PhD thesis, we used the cognitive modeling approach to investigate further the interaction between perceived control and reinforcement learning. First, we saw that participants undergoing a reinforcement-learning task experienced higher agency; this influence of reinforcement learning on agency comes as no surprise, because reinforcement learning relies on linking a voluntary action and its outcome. But our results also suggest that agency influences reinforcement learning in two ways. We found that people learn actionoutcome contingencies based on a default assumption: their actions make a difference to the world. Finally, we also found that the mere fact of choosing freely shapes the learning processes following that decision. Our general conclusion is that agency and reinforcement learning, two fundamental fields of human psychology, are deeply intertwined. Contrary to machines, humans do care about being in control, or about making the right choice, and this results in integrating information in a one-sided way
Libros sobre el tema "Apprentissage par reinforcement"
Sutton, Richard S. Reinforcement learning: An introduction. Cambridge, Mass: MIT Press, 1998.
Buscar texto completoDeep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more. Packt Publishing, 2018.
Buscar texto completo