Academic literature on the topic 'Causal reinforcement learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Causal reinforcement learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Causal reinforcement learning":

1

Madumal, Prashan, Tim Miller, Liz Sonenberg, and Frank Vetere. "Explainable Reinforcement Learning through a Causal Lens." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 03 (April 3, 2020): 2493–500. http://dx.doi.org/10.1609/aaai.v34i03.5631.

Abstract:
Prominent theories in cognitive science propose that humans understand and represent the knowledge of the world through causal relationships. In making sense of the world, we build causal models in our mind to encode cause-effect relations of events and use these to explain why new events happen by referring to counterfactuals — things that did not happen. In this paper, we use causal models to derive causal explanations of the behaviour of model-free reinforcement learning agents. We present an approach that learns a structural causal model during reinforcement learning and encodes causal relationships between variables of interest. This model is then used to generate explanations of behaviour based on counterfactual analysis of the causal model. We computationally evaluate the model in 6 domains and measure performance and task prediction accuracy. We report on a study with 120 participants who observe agents playing a real-time strategy game (Starcraft II) and then receive explanations of the agents' behaviour. We investigate: 1) participants' understanding gained by explanations through task prediction; 2) explanation satisfaction and 3) trust. Our results show that causal model explanations perform better on these measures compared to two other baseline explanation models.
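As a rough illustration of the kind of approach described in this abstract (not the authors' implementation), the sketch below fits one linear structural equation per action from logged transitions and contrasts the chosen action with a counterfactual alternative; the variable names (workers, supply) are hypothetical.

```python
# Illustrative sketch only (not the authors' code): learn a crude linear
# "action influence" model from logged agent transitions, then contrast the
# predicted outcome of the chosen action with a counterfactual alternative.
# Variable names (workers, supply) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Fake logged data: state features and the outcome reached under each action.
n = 500
workers = rng.integers(0, 20, n)
supply = rng.integers(0, 50, n)
action = rng.integers(0, 2, n)            # 0 = build_worker, 1 = attack
outcome = 0.4 * workers + 0.2 * supply + 3.0 * action + rng.normal(0, 1, n)

# Fit one linear structural equation per action (a minimal SCM surrogate).
models = {}
for a in (0, 1):
    m = action == a
    X = np.column_stack([workers[m], supply[m], np.ones(m.sum())])
    coef, *_ = np.linalg.lstsq(X, outcome[m], rcond=None)
    models[a] = coef

def explain(state, chosen, alternative):
    """Counterfactual contrast: predicted outcome under chosen vs. alternative."""
    x = np.append(state, 1.0)
    y_chosen = x @ models[chosen]
    y_alt = x @ models[alternative]
    return (f"Chose action {chosen}: predicted outcome {y_chosen:.2f} "
            f"vs. {y_alt:.2f} had action {alternative} been taken.")

print(explain(np.array([12, 30]), chosen=1, alternative=0))
```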
2

Li, Dezhi, Yunjun Lu, Jianping Wu, Wenlu Zhou, and Guangjun Zeng. "Causal Reinforcement Learning for Knowledge Graph Reasoning." Applied Sciences 14, no. 6 (March 15, 2024): 2498. http://dx.doi.org/10.3390/app14062498.

Abstract:
Knowledge graph reasoning can deduce new facts and relationships, which is an important research direction of knowledge graphs. Most existing methods are based on end-to-end reasoning that cannot effectively use the knowledge graph, and consequently their performance still needs to be improved. Therefore, we combine causal inference with reinforcement learning and propose a new framework for knowledge graph reasoning. By combining the counterfactual method in causal inference, our method can obtain more information as prior knowledge and integrate it into the control strategy of the reinforcement model. The proposed method mainly includes the steps of relationship importance identification, reinforcement learning framework design, policy network design, and the training and testing of the causal reinforcement learning model. Specifically, a prior knowledge table is first constructed to indicate which relationship is more important for the query at hand; secondly, the state space, action space, state transitions, reward, and optimization objective are designed; then, a standard value is set and compared with the weight of each candidate edge, and an action strategy is selected according to the comparison result, either through prior knowledge or through the neural network; finally, the parameters of the reinforcement learning model are determined through training and testing. We used four datasets to compare our method to the baseline method and conducted ablation experiments. On the NELL-995 and FB15k-237 datasets, the MAP scores of our method are 87.8 and 45.2, respectively, achieving the best performance.
3

Yang, Dezhi, Guoxian Yu, Jun Wang, Zhengtian Wu, and Maozu Guo. "Reinforcement Causal Structure Learning on Order Graph." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 10737–44. http://dx.doi.org/10.1609/aaai.v37i9.26274.

Abstract:
Learning a directed acyclic graph (DAG) that describes the causality of observed data is a very challenging but important task. Due to the limited quantity and quality of observed data, and the non-identifiability of the causal graph, it is almost impossible to infer a single precise DAG. Some methods approximate the posterior distribution of DAGs to explore the DAG space via Markov chain Monte Carlo (MCMC), but because the DAG space grows super-exponentially, accurately characterizing the whole distribution over DAGs is intractable. In this paper, we propose Reinforcement Causal Structure Learning on Order Graph (RCL-OG), which uses an order graph instead of MCMC to model different DAG topological orderings and to reduce the problem size. RCL-OG first defines reinforcement learning with a new reward mechanism to approximate the posterior distribution of orderings in an efficient way, and uses deep Q-learning to update and transfer rewards between nodes. Next, it obtains the probability transition model of nodes on the order graph and computes the posterior probability of different orderings. In this way, we can sample from this model to obtain orderings with high probability. Experiments on synthetic and benchmark datasets show that RCL-OG provides accurate posterior probability approximation and achieves better results than competitive causal discovery algorithms.
4

Madumal, Prashan. "Explainable Agency in Reinforcement Learning Agents." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (April 3, 2020): 13724–25. http://dx.doi.org/10.1609/aaai.v34i10.7134.

Abstract:
This thesis explores how reinforcement learning (RL) agents can provide explanations for their actions and behaviours. As humans, we build causal models to encode cause-effect relations of events and use these to explain why events happen. Taking inspiration from cognitive psychology and social science literature, I build causal explanation models and explanation dialogue models for RL agents. By mimicking human-like explanation models, these agents can provide explanations that are natural and intuitive to humans.
5

Herlau, Tue, and Rasmus Larsen. "Reinforcement Learning of Causal Variables Using Mediation Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (June 28, 2022): 6910–17. http://dx.doi.org/10.1609/aaai.v36i6.20648.

Abstract:
We consider the problem of acquiring causal representations and concepts in a reinforcement learning setting. Our approach defines a causal variable as being both manipulable by a policy, and able to predict the outcome. We thereby obtain a parsimonious causal graph in which interventions occur at the level of policies. The approach avoids defining a generative model of the data, prior pre-processing, or learning the transition kernel of the Markov decision process. Instead, causal variables and policies are determined by maximizing a new optimization target inspired by mediation analysis, which differs from the expected return. The maximization is accomplished using a generalization of Bellman's equation which is shown to converge, and the method finds meaningful causal representations in a simulated environment.
6

Duong, Tri Dung, Qian Li, and Guandong Xu. "Stochastic intervention for causal inference via reinforcement learning." Neurocomputing 482 (April 2022): 40–49. http://dx.doi.org/10.1016/j.neucom.2022.01.086.

7

Zhang, Wei, Xuesong Wang, Haoyu Wang, and Yuhu Cheng. "Causal Meta-Reinforcement Learning for Multimodal Remote Sensing Data Classification." Remote Sensing 16, no. 6 (March 16, 2024): 1055. http://dx.doi.org/10.3390/rs16061055.

Abstract:
Multimodal remote sensing data classification can enhance a model’s ability to distinguish land features through multimodal data fusion. In this context, how to help models understand the relationship between multimodal data and target tasks has become the focus of researchers. Inspired by the human feedback learning mechanism, causal reasoning mechanism, and knowledge induction mechanism, this paper integrates causal learning, reinforcement learning, and meta learning into a unified remote sensing data classification framework and proposes causal meta-reinforcement learning (CMRL). First, based on the feedback learning mechanism, we overcame the limitations of traditional implicit optimization of fusion features and customized a reinforcement learning environment for multimodal remote sensing data classification tasks. Through feedback interactive learning between agents and the environment, we helped the agents understand the complex relationships between multimodal data and labels, thereby achieving full mining of multimodal complementary information. Second, based on the causal inference mechanism, we designed causal distribution prediction actions, classification rewards, and causal intervention rewards, capturing pure causal factors in multimodal data and preventing false statistical associations between non-causal factors and class labels. Finally, based on the knowledge induction mechanism, we designed a bi-layer optimization mechanism based on meta-learning. By constructing a meta training task and meta validation task simulation model in the generalization scenario of unseen data, we helped the model induce cross-task shared knowledge, thereby improving its generalization ability for unseen multimodal data. The experimental results on multiple sets of multimodal datasets showed that the proposed method achieved state-of-the-art performance in multimodal remote sensing data classification tasks.
8

Veselic, Sebastijan, Gerhard Jocham, Christian Gausterer, Bernhard Wagner, Miriam Ernhoefer-Reßler, Rupert Lanzenberger, Christoph Eisenegger, Claus Lamm, and Annabel Losecaat Vermeer. "A causal role of estradiol in human reinforcement learning." Hormones and Behavior 134 (August 2021): 105022. http://dx.doi.org/10.1016/j.yhbeh.2021.105022.

9

Zhou, Zhengyuan, Michael Bloem, and Nicholas Bambos. "Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning." IEEE Transactions on Automatic Control 63, no. 9 (September 2018): 2787–802. http://dx.doi.org/10.1109/tac.2017.2775960.

10

Wang, Zizhao, Caroline Wang, Xuesu Xiao, Yuke Zhu, and Peter Stone. "Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (March 24, 2024): 15778–86. http://dx.doi.org/10.1609/aaai.v38i14.29507.

Abstract:
Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is to learn state abstractions, which only keep the necessary variables for learning the tasks at hand. This paper introduces Causal Bisimulation Modeling (CBM), a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction. CBM leverages and improves implicit modeling to train a high-fidelity causal dynamics model that can be reused for all tasks in the same environment. Empirical validation on two manipulation environments and four tasks reveals that CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones. Furthermore, the derived state abstractions allow a task learner to achieve near-oracle levels of sample efficiency and outperform baselines on all tasks.
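To make the idea of a causal state abstraction concrete, here is a small hedged sketch (not the CBM code): given a hypothetical Boolean mask of which state variables influence which others, and which variables the reward depends on, it keeps only the ancestors of the reward.

```python
# Illustrative sketch (not the CBM implementation): given a hypothetical causal
# mask saying which state variables influence which next-state variables and
# which ones the reward depends on, keep only the ancestors of the reward as
# the task-specific abstraction.
import numpy as np

state_vars = ["gripper_pos", "block_pos", "distractor_pos", "light_level"]

# influences[i, j] = True if variable i causally affects variable j's dynamics.
influences = np.array([
    [True,  True,  False, False],   # gripper_pos -> gripper_pos, block_pos
    [False, True,  False, False],   # block_pos   -> block_pos
    [False, False, True,  False],   # distractor_pos -> itself only
    [False, False, False, True ],   # light_level -> itself only
])
reward_parents = {"block_pos"}       # reward depends only on block_pos

def causal_abstraction(influences, reward_parents):
    """Return variables that are ancestors of the reward under the mask."""
    keep = {state_vars.index(v) for v in reward_parents}
    changed = True
    while changed:                   # transitive closure over the mask
        changed = False
        for i in range(len(state_vars)):
            if i not in keep and any(influences[i, j] for j in keep):
                keep.add(i)
                changed = True
    return sorted(state_vars[i] for i in keep)

print(causal_abstraction(influences, reward_parents))
# -> ['block_pos', 'gripper_pos']  (distractor and lighting are abstracted away)
```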

Dissertations / Theses on the topic "Causal reinforcement learning":

1

Tournaire, Thomas. "Model-based reinforcement learning for dynamic resource allocation in cloud environments." Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAS004.

Abstract:
The emergence of new technologies (Internet of Things, smart cities, autonomous vehicles, health, industrial automation, ...) requires efficient resource allocation to satisfy the demand. These new services are compatible with the new 5G network infrastructure, since it can provide low latency and reliability. However, these new needs require high computational power to fulfill the demand, implying more energy consumption, in particular in cloud infrastructures and especially in data centers. Therefore, it is critical to find new solutions that can satisfy these needs while reducing the power usage of resources in cloud environments. In this thesis we propose and compare new AI solutions (reinforcement learning) to orchestrate virtual resources in virtual network environments such that performance is guaranteed and operational costs are minimised. We consider queuing systems as a model for cloud IaaS infrastructures and bring learning methodologies to efficiently allocate the right number of resources for the users. Our objective is to minimise a cost function considering performance costs and operational costs. We go through different types of reinforcement learning algorithms (from model-free to relational model-based) to learn the best policy. Reinforcement learning is concerned with how a software agent ought to take actions in an environment to maximise some cumulative reward. We first develop a queuing model of a cloud system with one physical node hosting several virtual resources. In this first part we assume the agent perfectly knows the model (dynamics of the environment and the cost function), giving it the opportunity to apply dynamic programming methods for optimal policy computation. Since the model is known in this part, we also concentrate on the properties of the optimal policies, which are threshold-based and hysteresis-based rules. This allows us to integrate the structural property of the policies into MDP algorithms. After providing a concrete cloud model with exponential arrivals with real intensities and energy data for the cloud provider, we compare in this first approach the efficiency and computation time of MDP algorithms against heuristics built on top of the stationary distributions of the queuing Markov chain. In a second part we consider that the agent does not have access to the model of the environment and concentrate our work on reinforcement learning techniques, especially model-based reinforcement learning. We first develop model-based reinforcement learning methods where the agent can re-use its experience replay to update its value function. We also consider online MDP techniques where the autonomous agent approximates the environment model to perform dynamic programming. This part is evaluated in a larger network environment with two physical nodes in tandem, and we assess the convergence time and accuracy of different reinforcement learning methods, mainly model-based techniques versus state-of-the-art model-free methods (e.g. Q-learning). The last part focuses on model-based reinforcement learning techniques with a relational structure between environment variables. As these tandem networks have structural properties due to their infrastructure shape, we investigate factored and causal approaches built into reinforcement learning methods to integrate this information. We provide the autonomous agent with relational knowledge of the environment so that it can understand how variables are related to each other.
The main goal is to accelerate convergence: first by having a more compact representation through factorisation, where we devise an online factored MDP algorithm that we evaluate and compare with model-free and model-based reinforcement learning algorithms; second by integrating causal and counterfactual reasoning, which can tackle environments with partial observations and unobserved confounders.
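As a toy illustration of the threshold-type policies discussed in this thesis (a sketch under assumed arrival, service, and cost parameters, not the thesis' models), the snippet below simulates a discrete-time queue and evaluates a simple threshold rule for activating virtual resources.

```python
# Illustrative sketch under assumed parameters (not the thesis code): a simple
# threshold rule for activating virtual resources in a discrete-time queue,
# evaluated on its combined holding + energy cost.
import numpy as np

def run_threshold_policy(threshold, steps=50_000, arrival_p=0.6, service_p=0.25,
                         max_servers=4, energy_cost=2.0, holding_cost=1.0, seed=0):
    rng = np.random.default_rng(seed)
    queue, cost = 0, 0.0
    for _ in range(steps):
        queue += rng.random() < arrival_p                 # Bernoulli arrival
        # Threshold rule: activate one server per `threshold` waiting jobs.
        servers = min(max_servers, 1 + queue // threshold)
        departures = rng.binomial(servers, service_p)     # each active server may finish a job
        queue = max(0, queue - departures)
        cost += holding_cost * queue + energy_cost * servers
    return cost / steps

for threshold in (1, 2, 4, 8):
    print(f"threshold={threshold}: avg cost per step = {run_threshold_policy(threshold):.2f}")
```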
2

Bernigau, Holger. "Causal Models over Infinite Graphs and their Application to the Sensorimotor Loop." Doctoral thesis, Universitätsbibliothek Leipzig, 2015. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-164734.

Abstract:
Motivation and background The enormous amount of capabilities that every human learns throughout his life, is probably among the most remarkable and fascinating aspects of life. Learning has therefore drawn lots of interest from scientists working in very different fields like philosophy, biology, sociology, educational sciences, computer sciences and mathematics. This thesis focuses on the information theoretical and mathematical aspects of learning. We are interested in the learning process of an agent (which can be for example a human, an animal, a robot, an economical institution or a state) that interacts with its environment. Common models for this interaction are Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Learning is then considered to be the maximization of the expectation of a predefined reward function. In order to formulate general principles (like a formal definition of curiosity-driven learning or avoidance of unpleasant situation) in a rigorous way, it might be desirable to have a theoretical framework for the optimization of more complex functionals of the underlying process law. This might include the entropy of certain sensor values or their mutual information. An optimization of the latter quantity (also known as predictive information) has been investigated intensively both theoretically and experimentally using computer simulations by N. Ay, R. Der, K Zahedi and G. Martius. In this thesis, we develop a mathematical theory for learning in the sensorimotor loop beyond expected reward maximization. Approaches and results This thesis covers four different topics related to the theory of learning in the sensorimotor loop. First of all, we need to specify the model of an agent interacting with the environment, either with learning or without learning. This interaction naturally results in complex causal dependencies. Since we are interested in asymptotic properties of learning algorithms, it is necessary to consider infinite time horizons. It turns out that the well-understood theory of causal networks known from the machine learning literature is not powerful enough for our purpose. Therefore we extend important theorems on causal networks to infinite graphs and general state spaces using analytical methods from measure theoretic probability theory and the theory of discrete time stochastic processes. Furthermore, we prove a generalization of the strong Markov property from Markov processes to infinite causal networks. Secondly, we develop a new idea for a projected stochastic constraint optimization algorithm. Generally a discrete gradient ascent algorithm can be used to generate an iterative sequence that converges to the stationary points of a given optimization problem. Whenever the optimization takes place over a compact subset of a vector space, it is possible that the iterative sequence leaves the constraint set. One possibility to cope with this problem is to project all points to the constraint set using Euclidean best-approximation. The latter is sometimes difficult to calculate. A concrete example is an optimization over the unit ball in a matrix space equipped with operator norm. Our idea consists of a back-projection using quasi-projectors different from the Euclidean best-approximation. In the matrix example, there is another canonical way to force the iterative sequence to stay in the constraint set: Whenever a point leaves the unit ball, it is divided by its norm. 
For a given target function, this procedure might introduce spurious stationary points on the boundary. We show that this problem can be circumvented by using a gradient that is tailored to the quasi-projector used for back-projection. We state a general technical compatibility condition between a quasi-projector and a metric used for gradient ascent, prove convergence of stochastic iterative sequences and provide an appropriate metric for the unit-ball example. Thirdly, a class of learning problems in the sensorimotor loop is defined and motivated. This class of problems is more general than the usual expected reward maximization and is illustrated by numerous examples (like expected reward maximization, maximization of the predictive information, maximization of the entropy and minimization of the variance of a given reward function). We also provide stationarity conditions together with appropriate gradient formulas. Last but not least, we prove convergence of a stochastic optimization algorithm (as considered in the second topic) applied to a general learning problem (as considered in the third topic). It is shown that the learning algorithm converges to the set of stationary points. Among others, the proof covers the convergence of an improved version of an algorithm for the maximization of the predictive information as proposed by N. Ay, R. Der and K. Zahedi. We also investigate an application to a linear Gaussian dynamic, where the policies are encoded by the unit-ball in a space of matrices equipped with operator norm.
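The back-projection idea can be illustrated with a minimal sketch (a toy linear objective, not the thesis' sensorimotor setting): noisy gradient ascent over the unit ball, where any iterate that leaves the ball is divided by its norm.

```python
# Illustrative sketch of the back-projection idea discussed above (not the
# thesis' algorithm): noisy gradient ascent on a toy objective over the unit
# ball, re-projecting by dividing by the norm whenever an iterate leaves it.
import numpy as np

rng = np.random.default_rng(0)
target = np.array([0.8, 0.3, -0.4])          # assumed toy target direction

def objective(x):
    return target @ x                        # maximised on the unit sphere

x = np.zeros(3)
for t in range(1, 2001):
    grad = target + rng.normal(0, 0.5, 3)    # stochastic gradient estimate
    x = x + (0.5 / t) * grad                 # diminishing step size
    norm = np.linalg.norm(x)
    if norm > 1.0:                           # quasi-projection back onto the ball
        x = x / norm

print("final iterate:", np.round(x, 3))
print("optimum      :", np.round(target / np.linalg.norm(target), 3))
```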
3

Théro, Héloïse. "Contrôle, agentivité et apprentissage par renforcement." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE028/document.

Abstract:
Sense of agency, or subjective control, can be defined as the feeling that we control our actions and, through them, effects in the outside world. This cluster of experiences depends on the ability to learn action-outcome contingencies, and a classical algorithm to model this originates in the field of human reinforcement learning. In this PhD thesis, we used the cognitive modeling approach to investigate further the interaction between perceived control and reinforcement learning. First, we saw that participants undergoing a reinforcement-learning task experienced higher agency; this influence of reinforcement learning on agency comes as no surprise, because reinforcement learning relies on linking a voluntary action and its outcome. But our results also suggest that agency influences reinforcement learning in two ways. We found that people learn action-outcome contingencies based on a default assumption: their actions make a difference to the world. Finally, we also found that the mere fact of choosing freely shapes the learning processes following that decision. Our general conclusion is that agency and reinforcement learning, two fundamental fields of human psychology, are deeply intertwined. Contrary to machines, humans do care about being in control, or about making the right choice, and this results in integrating information in a one-sided way.
4

Jonsson, Anders. "A causal approach to hierarchical decomposition in reinforcement learning." 2006. https://scholarworks.umass.edu/dissertations/AAI3212735.

Abstract:
Reinforcement learning provides a means for autonomous agents to improve their action selection strategies without the need for explicit training information provided by an informed instructor. Theoretical and empirical results indicate that reinforcement learning algorithms can efficiently determine optimal or approximately optimal policies in tasks of limited size. However, as the size of a task grows, reinforcement learning algorithms become less consistent and less efficient at determining a useful policy. A key challenge in reinforcement learning is to develop methods that facilitate scaling reinforcement learning algorithms up to larger, more realistic tasks. We present a series of algorithms that take advantage of task structure to make reinforcement learning more efficient in realistic tasks that display such structure. In each algorithm, we assume that the state space of a task is factored; i.e., states are collections of values of a set of state variables. Our work combines hierarchical decomposition and state abstraction to reduce the size of a task prior to applying reinforcement learning. Hierarchical decomposition breaks a task into several subtasks that can be solved separately. For hierarchical decomposition to simplify learning, it is critical that each subtask is easier to solve than the overall task. To achieve the goal of simplifying the subtasks, we perform state abstraction separately for each subtask. We begin by presenting an algorithm that uses experience from the environment to dynamically perform state abstraction for each subtask in an existing hierarchy of subtasks. Since our goal is to automate hierarchical decomposition as well as state abstraction, a second algorithm uses a dynamic Bayesian network action representation to automatically decompose a task into a hierarchy of subtasks. In addition, the algorithm provides an efficient way to perform state abstraction for each resulting subtask. A third algorithm constructs compact representations of activities that represent solutions to the subtasks. These compact representations enable the use of planning to efficiently approximate solutions to higher-level subtasks without interacting with the environment. Our fourth and final algorithm provides a means to learn a dynamic Bayesian network representation of actions from experience in tasks for which the representation is not available prior to learning. The dissertation provides a detailed description of each algorithm as well as some theoretical results. We also present empirical results of each algorithm in a series of experiments. In tasks that display certain types of structure, the simplifications introduced by our algorithms significantly improve the performance of reinforcement learning. The results indicate that our algorithms provide a promising approach to make reinforcement learning better suited to solve realistic tasks in which these types of structure are present.
5

Lattimore, Finnian Rachel. "Learning how to act: making good decisions with machine learning." Phd thesis, 2017. http://hdl.handle.net/1885/144602.

Abstract:
This thesis is about machine learning and statistical approaches to decision making. How can we learn from data to anticipate the consequence of, and optimally select, interventions or actions? Problems such as deciding which medication to prescribe to patients, who should be released on bail, and how much to charge for insurance are ubiquitous, and have far-reaching impacts on our lives. There are two fundamental approaches to learning how to act: reinforcement learning, in which an agent directly intervenes in a system and learns from the outcome, and observational causal inference, whereby we seek to infer the outcome of an intervention from observing the system. The goal of this thesis is to connect and unify these key approaches. I introduce causal bandit problems: a synthesis that combines causal graphical models, which were developed for observational causal inference, with multi-armed bandit problems, which are a subset of reinforcement learning problems that are simple enough to admit formal analysis. I show that knowledge of the causal structure allows us to transfer information learned about the outcome of one action to predict the outcome of an alternate action, yielding a novel form of structure between bandit arms that cannot be exploited by existing algorithms. I propose an algorithm for causal bandit problems and prove bounds on the simple regret demonstrating that it is close to minimax optimal and better than algorithms that do not use the additional causal information.
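A minimal sketch of the intuition behind causal bandits (not the algorithm analysed in the thesis): when the causal graph X -> Y has no confounding, purely observational samples already estimate the reward of both interventions, so information is shared across arms.

```python
# Illustrative sketch (not the thesis' algorithm): in a causal bandit, when the
# graph X -> Y has no confounding, observational samples of (X, Y) inform the
# reward of *both* interventions do(X=0) and do(X=1) at once, whereas a
# standard bandit must pull each arm separately. Reward probabilities are assumed.
import numpy as np

rng = np.random.default_rng(1)
p_y_given_x = {0: 0.3, 1: 0.7}               # assumed true reward probabilities

# Observational phase: the environment sets X on its own; we just watch.
obs_x = rng.integers(0, 2, 200)
obs_y = rng.random(200) < np.vectorize(p_y_given_x.get)(obs_x)

est = {x: obs_y[obs_x == x].mean() for x in (0, 1)}
best_arm = max(est, key=est.get)
print("estimated reward per intervention:", {k: round(v, 2) for k, v in est.items()})
print("arm chosen for the interventional phase: do(X=%d)" % best_arm)
```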
6

Bernigau, Holger. "Causal Models over Infinite Graphs and their Application to the Sensorimotor Loop: Causal Models over Infinite Graphs and their Application to theSensorimotor Loop: General Stochastic Aspects and GradientMethods for Optimal Control." Doctoral thesis, 2014. https://ul.qucosa.de/id/qucosa%3A13254.

Abstract:
Motivation and background The enormous amount of capabilities that every human learns throughout his life, is probably among the most remarkable and fascinating aspects of life. Learning has therefore drawn lots of interest from scientists working in very different fields like philosophy, biology, sociology, educational sciences, computer sciences and mathematics. This thesis focuses on the information theoretical and mathematical aspects of learning. We are interested in the learning process of an agent (which can be for example a human, an animal, a robot, an economical institution or a state) that interacts with its environment. Common models for this interaction are Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Learning is then considered to be the maximization of the expectation of a predefined reward function. In order to formulate general principles (like a formal definition of curiosity-driven learning or avoidance of unpleasant situation) in a rigorous way, it might be desirable to have a theoretical framework for the optimization of more complex functionals of the underlying process law. This might include the entropy of certain sensor values or their mutual information. An optimization of the latter quantity (also known as predictive information) has been investigated intensively both theoretically and experimentally using computer simulations by N. Ay, R. Der, K Zahedi and G. Martius. In this thesis, we develop a mathematical theory for learning in the sensorimotor loop beyond expected reward maximization. Approaches and results This thesis covers four different topics related to the theory of learning in the sensorimotor loop. First of all, we need to specify the model of an agent interacting with the environment, either with learning or without learning. This interaction naturally results in complex causal dependencies. Since we are interested in asymptotic properties of learning algorithms, it is necessary to consider infinite time horizons. It turns out that the well-understood theory of causal networks known from the machine learning literature is not powerful enough for our purpose. Therefore we extend important theorems on causal networks to infinite graphs and general state spaces using analytical methods from measure theoretic probability theory and the theory of discrete time stochastic processes. Furthermore, we prove a generalization of the strong Markov property from Markov processes to infinite causal networks. Secondly, we develop a new idea for a projected stochastic constraint optimization algorithm. Generally a discrete gradient ascent algorithm can be used to generate an iterative sequence that converges to the stationary points of a given optimization problem. Whenever the optimization takes place over a compact subset of a vector space, it is possible that the iterative sequence leaves the constraint set. One possibility to cope with this problem is to project all points to the constraint set using Euclidean best-approximation. The latter is sometimes difficult to calculate. A concrete example is an optimization over the unit ball in a matrix space equipped with operator norm. Our idea consists of a back-projection using quasi-projectors different from the Euclidean best-approximation. In the matrix example, there is another canonical way to force the iterative sequence to stay in the constraint set: Whenever a point leaves the unit ball, it is divided by its norm. 
For a given target function, this procedure might introduce spurious stationary points on the boundary. We show that this problem can be circumvented by using a gradient that is tailored to the quasi-projector used for back-projection. We state a general technical compatibility condition between a quasi-projector and a metric used for gradient ascent, prove convergence of stochastic iterative sequences and provide an appropriate metric for the unit-ball example. Thirdly, a class of learning problems in the sensorimotor loop is defined and motivated. This class of problems is more general than the usual expected reward maximization and is illustrated by numerous examples (like expected reward maximization, maximization of the predictive information, maximization of the entropy and minimization of the variance of a given reward function). We also provide stationarity conditions together with appropriate gradient formulas. Last but not least, we prove convergence of a stochastic optimization algorithm (as considered in the second topic) applied to a general learning problem (as considered in the third topic). It is shown that the learning algorithm converges to the set of stationary points. Among others, the proof covers the convergence of an improved version of an algorithm for the maximization of the predictive information as proposed by N. Ay, R. Der and K. Zahedi. We also investigate an application to a linear Gaussian dynamic, where the policies are encoded by the unit-ball in a space of matrices equipped with operator norm.

Books on the topic "Causal reinforcement learning":

1

Chakraborty, Bibhas. Statistical methods for dynamic treatment regimes: Reinforcement learning, causal inference, and personalized medicine. New York, NY: Springer, 2013.

2

Gershman, Samuel J. Reinforcement Learning and Causal Models. Edited by Michael R. Waldmann. Oxford University Press, 2017. http://dx.doi.org/10.1093/oxfordhb/9780199399550.013.20.

Abstract:
This chapter reviews the diverse roles that causal knowledge plays in reinforcement learning. The first half of the chapter contrasts a “model-free” system that learns to repeat actions that lead to reward with a “model-based” system that learns a probabilistic causal model of the environment, which it then uses to plan action sequences. Evidence suggests that these two systems coexist in the brain, both competing and cooperating with each other. The interplay of two systems allows the brain to negotiate a balance between cognitively cheap but inaccurate model-free algorithms and accurate but expensive model-based algorithms. The second half of the chapter reviews research on hidden state inference in reinforcement learning. The problem of inferring hidden states can be construed in terms of inferring the latent causes that give rise to sensory data and rewards. Because hidden state inference affects both model-based and model-free reinforcement learning, causal knowledge impinges upon both systems.
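The model-free versus model-based distinction reviewed in this chapter can be illustrated with a toy two-state MDP (an illustrative sketch, not from the chapter): a model-based planner re-plans immediately after reward devaluation, whereas cached model-free values would have to be relearned from experience.

```python
# Toy two-state MDP illustrating the model-free / model-based contrast.
import numpy as np

n_states, n_actions, gamma = 2, 2, 0.9
P = np.zeros((n_states, n_actions, n_states))
P[0, 0, 1] = P[0, 1, 0] = P[1, 0, 0] = P[1, 1, 1] = 1.0   # deterministic moves
R = np.array([[0.0, 0.2],   # state 0: a0 moves to state 1, a1 stays (small snack)
              [0.0, 1.0]])  # state 1: a1 stays and earns the large reward

def plan(R):
    """Model-based: value iteration on the (here, known) transition model."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(100):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

print("greedy policy before devaluation:", plan(R).argmax(axis=1))         # -> [0 1]

# Devalue the large reward; the planner re-plans instantly from its model,
# whereas cached model-free values would have to be relearned by trial and error.
R_devalued = R.copy()
R_devalued[1, 1] = 0.0
print("greedy policy after devaluation: ", plan(R_devalued).argmax(axis=1))  # -> [1 0]
```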
3

Moodie, Erica E. M., and Bibhas Chakraborty. Statistical Methods for Dynamic Treatment Regimes: Reinforcement Learning, Causal Inference, and Personalized Medicine. Springer New York, 2015.

4

Butz, Martin V., and Esther F. Kutter. How the Mind Comes into Being. Oxford University Press, 2017. http://dx.doi.org/10.1093/acprof:oso/9780198739692.001.0001.

Abstract:
For more than 2000 years Greek philosophers have thought about the puzzling introspectively assessed dichotomy between our physical bodies and our seemingly non-physical minds. How is it that we can think highly abstract thoughts, seemingly fully detached from actual, physical reality? Despite the obvious interactions between mind and body (we get tired, we are hungry, we stay up late despite being tired, etc.), until today it remains puzzling how our mind controls our body, and vice versa, how our body shapes our mind. Despite a big movement towards embodied cognitive science over the last 20 years or so, introductory books with a functional and computational perspective on how human thought and language capabilities may actually have come about – and are coming about over and over again – are missing. This book fills that gap. Starting with a historical background on traditional cognitive science and resulting fundamental challenges that have not been resolved, embodied cognitive science is introduced and its implications for how human minds have come and continue to come into being are detailed. In particular, the book shows that evolution has produced biological bodies that provide “morphologically intelligent” structures, which foster the development of suitable behavioral and cognitive capabilities. While these capabilities can be modified and optimized given positive and negative reward as feedback, to reach abstract cognitive capabilities, evolution has furthermore produced particular anticipatory control-oriented mechanisms, which cause the development of particular types of predictive encodings, modularizations, and abstractions. Coupled with an embodied motivational system, versatile, goal-directed, self-motivated behavior, learning becomes possible. These lines of thought are introduced and detailed from interdisciplinary, evolutionary, ontogenetic, reinforcement learning, and anticipatory predictive encoding perspectives in the first part of the book. A short excursus then provides an introduction to neuroscience, including general knowledge about brain anatomy, and basic neural and brain functionality, as well as the main research methodologies. With reference to this knowledge, the subsequent chapters then focus on how the human brain manages to develop abstract thought and language. Sensory systems, motor systems, and their predictive, control-oriented interactions are detailed from a functional and computational perspective. Bayesian information processing is introduced along these lines as are generative models. Moreover, it is shown how particular modularizations can develop. When control and attention come into play, these structures develop also dependent on the available motor capabilities. Vice versa, the development of more versatile motor capabilities depends on structural development. Event-oriented abstractions enable conceptualizations and behavioral compositions, paving the path towards abstract thought and language. Also evolutionary drives towards social interactions play a crucial role. Based on the developing sensorimotor- and socially-grounded structures, the human mind becomes language ready. The development of language in each human child then further facilitates the self-motivated generation of abstract, compositional, highly flexible thought about the present, past, and future, as well as about others. 
In conclusion, the book gives an overview over how the human mind comes into being – sketching out a developmental pathway towards the mastery of abstract and reflective thought, while detailing the critical body and neural functionalities, and computational mechanisms, which enable this development.

Book chapters on the topic "Causal reinforcement learning":

1

Xiong, Momiao. "Reinforcement Learning and Causal Inference." In Artificial Intelligence and Causal Inference, 293–348. Boca Raton: Chapman and Hall/CRC, 2022. http://dx.doi.org/10.1201/9781003028543-8.

2

Yang, Dezhi, Guoxian Yu, Jun Wang, Zhongmin Yan, and Maozu Guo. "Causal Discovery by Graph Attention Reinforcement Learning." In Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), 28–36. Philadelphia, PA: Society for Industrial and Applied Mathematics, 2023. http://dx.doi.org/10.1137/1.9781611977653.ch4.

3

Weytjens, Hans, Wouter Verbeke, and Jochen De Weerdt. "Timed Process Interventions: Causal Inference vs. Reinforcement Learning." In Business Process Management Workshops, 245–58. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-50974-2_19.

4

Gajcin, Jasmina, and Ivana Dusparic. "ReCCoVER: Detecting Causal Confusion for Explainable Reinforcement Learning." In Explainable and Transparent AI and Multi-Agent Systems, 38–56. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-15565-9_3.

5

Feliciano-Avelino, Ivan, Arquímides Méndez-Molina, Eduardo F. Morales, and L. Enrique Sucar. "Causal Based Action Selection Policy for Reinforcement Learning." In Advances in Computational Intelligence, 213–27. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-89817-5_16.

6

Paliwal, Yash, Rajarshi Roy, Jean-Raphaël Gaglione, Nasim Baharisangari, Daniel Neider, Xiaoming Duan, Ufuk Topcu, and Zhe Xu. "Reinforcement Learning with Temporal-Logic-Based Causal Diagrams." In Lecture Notes in Computer Science, 123–40. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-40837-3_8.

7

Hao, Zhifeng, Haipeng Zhu, Wei Chen, and Ruichu Cai. "Latent Causal Dynamics Model for Model-Based Reinforcement Learning." In Neural Information Processing, 219–30. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-99-8082-6_17.

8

Bozorgi, Zahra Dasht, Marlon Dumas, Marcello La Rosa, Artem Polyvyanyy, Mahmoud Shoush, and Irene Teinemaa. "Learning When to Treat Business Processes: Prescriptive Process Monitoring with Causal Inference and Reinforcement Learning." In Advanced Information Systems Engineering, 364–80. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-34560-9_22.

Abstract:
Increasing the success rate of a process, i.e. the percentage of cases that end in a positive outcome, is a recurrent process improvement goal. At runtime, there are often certain actions (a.k.a. treatments) that workers may execute to lift the probability that a case ends in a positive outcome. For example, in a loan origination process, a possible treatment is to issue multiple loan offers to increase the probability that the customer takes a loan. Each treatment has a cost. Thus, when defining policies for prescribing treatments to cases, managers need to consider the net gain of the treatments. Also, the effect of a treatment varies over time: treating a case earlier may be more effective than later in a case. This paper presents a prescriptive monitoring method that automates this decision-making task. The method combines causal inference and reinforcement learning to learn treatment policies that maximize the net gain. The method leverages a conformal prediction technique to speed up the convergence of the reinforcement learning mechanism by separating cases that are likely to end up in a positive or negative outcome, from uncertain cases. An evaluation on two real-life datasets shows that the proposed method outperforms a state-of-the-art baseline.
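A heavily simplified sketch of the net-gain idea (toy synthetic data and a naive uplift estimate, not the paper's causal-inference or reinforcement-learning machinery): prescribe the treatment only when the estimated uplift times the value of a success exceeds the treatment cost.

```python
# Illustrative sketch of the net-gain rule (toy data, hypothetical "risk"
# feature, not the paper's method): estimate the uplift of a treatment from
# logged cases and prescribe it only when the expected gain exceeds its cost.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
risk = rng.random(n)                         # hypothetical case feature
treated = rng.integers(0, 2, n)
# Success is more likely for low-risk cases; the treatment helps high-risk cases more.
p_success = np.clip(0.8 - 0.6 * risk + 0.35 * risk * treated, 0, 1)
outcome = rng.random(n) < p_success

gain_per_success, treatment_cost = 100.0, 15.0

def uplift(risk_value, bandwidth=0.1):
    """Difference of treated vs. untreated success rates among similar cases."""
    near = np.abs(risk - risk_value) < bandwidth
    p1 = outcome[near & (treated == 1)].mean()
    p0 = outcome[near & (treated == 0)].mean()
    return p1 - p0

for r in (0.2, 0.5, 0.9):
    net_gain = gain_per_success * uplift(r) - treatment_cost
    decision = "treat" if net_gain > 0 else "do not treat"
    print(f"risk={r:.1f}: estimated net gain {net_gain:+.1f} -> {decision}")
```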
9

Sridharan, Mohan, and Sarah Rainge. "Integrating Reinforcement Learning and Declarative Programming to Learn Causal Laws in Dynamic Domains." In Social Robotics, 320–29. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-11973-1_33.

10

Swan, Jerry, Eric Nivel, Neel Kant, Jules Hedges, Timothy Atkinson, and Bas Steunebrink. "Background." In The Road to General Intelligence, 7–15. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-08020-3_2.

Abstract:
Recent years have seen an explosion in academic, industrial, and popular interest in AI, as exemplified by machine learning and primarily driven by the widely-reported successes of deep- and reinforcement learning (e.g. [314, 315, 351]). Deep learning is essentially predicated on the notion that, with a sufficiently large training set, the statistical correlations captured by training will actually be causal [310]. However, in the absence of convergence theorems to support this, it remains a hypothesis. Indeed, insofar as there is evidence, it increasingly indicates to the contrary, since the application of enormous volumes of computational effort has still failed to deliver models with the generalization capability of an infant. There is accordingly increasing discussion about what further conceptual or practical insights might be required [57]. At the time of writing, the very definition of deep learning is in flux, with one Turing Award laureate defining it as “a way to try to make machines intelligent by allowing computers to learn from examples” and another as “differentiable programming”. We argue in the following that deep learning is highly unlikely to yield intelligence, at the very least while it equates intelligence with “solving a regression problem”.

Conference papers on the topic "Causal reinforcement learning":

1

Blübaum, Lukas, and Stefan Heindorf. "Causal Question Answering with Reinforcement Learning." In WWW '24: The ACM Web Conference 2024. New York, NY, USA: ACM, 2024. http://dx.doi.org/10.1145/3589334.3645610.

2

Zhu, Wenxuan, Chao Yu, and Qiang Zhang. "Causal Deep Reinforcement Learning Using Observational Data." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/524.

Abstract:
Deep reinforcement learning (DRL) requires the collection of interventional data, which is sometimes expensive and even unethical in the real world, such as in autonomous driving and the medical field. Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world. However, observational data may mislead the learning agent to undesirable outcomes if the behavior policy that generates the data depends on unobserved random variables (i.e., confounders). In this paper, we propose two deconfounding methods in DRL to address this problem. The methods first calculate the importance degree of different samples based on causal inference techniques, and then adjust the impact of different samples on the loss function by reweighting or resampling the offline dataset to ensure its unbiasedness. These deconfounding methods can be flexibly combined with existing model-free DRL algorithms such as soft actor-critic and deep Q-learning, provided that a weak condition on the loss functions of these algorithms is satisfied. We prove the effectiveness of our deconfounding methods and validate them experimentally.
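As a hedged, highly simplified illustration of why reweighting offline data matters (the paper targets unobserved confounders; this toy uses an observed context and plain inverse-propensity weights purely to show the bias):

```python
# Highly simplified sketch of the reweighting idea (not the paper's
# deconfounding estimator): offline data were logged by a behaviour policy that
# prefers action 1 in "easy" contexts, so naive averaging overrates action 1;
# inverse-propensity weights restore an unbiased value estimate.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
context = rng.integers(0, 2, n)                    # 0 = hard case, 1 = easy case
propensity = np.where(context == 1, 0.9, 0.1)      # behaviour policy: P(a=1 | context)
action = rng.random(n) < propensity
reward = 0.5 * context + 0.2 * action + rng.normal(0, 0.1, n)   # true effect of a=1 is +0.2

naive = reward[action].mean() - reward[~action].mean()

w = np.where(action, 1.0 / propensity, 1.0 / (1.0 - propensity))
ipw_a1 = np.sum(w * reward * action) / np.sum(w * action)
ipw_a0 = np.sum(w * reward * ~action) / np.sum(w * ~action)

print(f"naive action-1 advantage: {naive:+.3f} (biased by the context)")
print(f"reweighted advantage:     {ipw_a1 - ipw_a0:+.3f} (close to the true +0.200)")
```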
3

Wang, Xiaoqiang, Yali Du, Shengyu Zhu, Liangjun Ke, Zhitang Chen, Jianye Hao, and Jun Wang. "Ordering-Based Causal Discovery with Reinforcement Learning." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/491.

Abstract:
It is a long-standing question to discover causal relations among a set of variables in many empirical sciences. Recently, Reinforcement Learning (RL) has achieved promising results in causal discovery from observational data. However, searching the space of directed graphs and enforcing acyclicity by implicit penalties tend to be inefficient and restrict the existing RL-based method to small-scale problems. In this work, we propose a novel RL-based approach for causal discovery, by incorporating RL into the ordering-based paradigm. Specifically, we formulate the ordering search problem as a multi-step Markov decision process, implement the ordering generating process with an encoder-decoder architecture, and finally use RL to optimize the proposed model based on the reward mechanisms designed for each ordering. A generated ordering would then be processed using variable selection to obtain the final causal graph. We analyze the consistency and computational complexity of the proposed method, and empirically show that a pretrained model can be exploited to accelerate training. Experimental results on both synthetic and real data sets show that the proposed method achieves a much improved performance over the existing RL-based method.
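To illustrate the ordering-based paradigm itself (brute-force search over three synthetic variables, not the paper's RL optimisation or encoder-decoder model), the sketch below scores every topological ordering by the total residual variance of regressing each variable on its predecessors; a variable-selection step, omitted here, would then prune spurious parents.

```python
# Illustrative sketch of ordering-based search (exhaustive here, not the RL
# optimisation of the paper): score each candidate topological ordering by how
# well earlier variables predict later ones.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)          # ground truth: x -> y -> z
z = -1.5 * y + rng.normal(size=n)
data = {"x": x, "y": y, "z": z}

def score(ordering):
    """Total residual variance when each variable is regressed on its predecessors."""
    total = 0.0
    for i, var in enumerate(ordering):
        preds = ordering[:i]
        target = data[var]
        if preds:
            X = np.column_stack([data[p] for p in preds])
            coef, *_ = np.linalg.lstsq(X, target, rcond=None)
            target = target - X @ coef
        total += target.var()
    return total

best = min(itertools.permutations(data), key=score)
print("best ordering:", " -> ".join(best))   # expected: x -> y -> z
```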
4

Ashton, Hal. "Causal Campbell-Goodhart’s Law and Reinforcement Learning." In 13th International Conference on Agents and Artificial Intelligence. SCITEPRESS - Science and Technology Publications, 2021. http://dx.doi.org/10.5220/0010197300670073.

5

Ma, Hao, Zhiqiang Pu, Yi Pan, Boyin Liu, Junlong Gao, and Zhenyu Guo. "Causal Mean Field Multi-Agent Reinforcement Learning." In 2023 International Joint Conference on Neural Networks (IJCNN). IEEE, 2023. http://dx.doi.org/10.1109/ijcnn54540.2023.10191654.

6

Yu, Zhongwei, Jingqing Ruan, and Dengpeng Xing. "Explainable Reinforcement Learning via a Causal World Model." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/505.

Abstract:
Generating explanations for reinforcement learning (RL) is challenging as actions may produce long-term effects on the future. In this paper, we develop a novel framework for explainable RL by learning a causal world model without prior knowledge of the causal structure of the environment. The model captures the influence of actions, allowing us to interpret the long-term effects of actions through causal chains, which present how actions influence environmental variables and finally lead to rewards. Different from most explanatory models which suffer from low accuracy, our model remains accurate while improving explainability, making it applicable in model-based learning. As a result, we demonstrate that our causal model can serve as the bridge between explainability and learning.
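A minimal sketch of the kind of explanation such a causal world model enables (a hand-specified toy graph with hypothetical node names, not the paper's learned model): trace the causal chain from an action to the reward.

```python
# Illustrative sketch (not the paper's model): given a hypothetical causal
# graph over environment variables, explain an action by tracing the causal
# chain from the action node to the reward node.
edges = {
    "action:toggle_switch": ["light_on"],
    "light_on": ["room_visible"],
    "room_visible": ["key_found"],
    "key_found": ["reward"],
    "action:wander": ["position"],
    "position": [],
}

def causal_chain(source, target="reward", path=None):
    """Depth-first search for a directed path from an action node to the reward."""
    path = (path or []) + [source]
    if source == target:
        return path
    for child in edges.get(source, []):
        found = causal_chain(child, target, path)
        if found:
            return found
    return None

chain = causal_chain("action:toggle_switch")
print(" -> ".join(chain))              # action:toggle_switch -> light_on -> ... -> reward
print(causal_chain("action:wander"))   # None: this action has no path to the reward
```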
7

Sankar, Namasi G., Ankit Khandelwal, and M. Girish Chandra. "Quantum-Enhanced Resilient Reinforcement Learning Using Causal Inference." In 2024 16th International Conference on COMmunication Systems & NETworkS (COMSNETS). IEEE, 2024. http://dx.doi.org/10.1109/comsnets59351.2024.10427302.

8

Méndez-Molina, Arquímides. "Combining Reinforcement Learning and Causal Models for Robotics Applications." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/684.

Abstract:
The relation between Reinforcement learning (RL) and Causal Modeling(CM) is an underexplored area with untapped potential for any learning task. In this extended abstract of our Ph.D. research proposal, we present a way to combine both areas to improve their respective learning processes, especially in the context of our application area (service robotics). The preliminary results obtained so far are a good starting point for thinking about the success of our research project.
9

Bloem, Michael, and Nicholas Bambos. "Infinite time horizon maximum causal entropy inverse reinforcement learning." In 2014 IEEE 53rd Annual Conference on Decision and Control (CDC). IEEE, 2014. http://dx.doi.org/10.1109/cdc.2014.7040156.

10

Wang, Siyu, Xiaocong Chen, Dietmar Jannach, and Lina Yao. "Causal Decision Transformer for Recommender Systems via Offline Reinforcement Learning." In SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: ACM, 2023. http://dx.doi.org/10.1145/3539618.3591648.

