Journal articles on the topic 'Causal reinforcement learning'

To see the other types of publications on this topic, follow the link: Causal reinforcement learning.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Causal reinforcement learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Madumal, Prashan, Tim Miller, Liz Sonenberg, and Frank Vetere. "Explainable Reinforcement Learning through a Causal Lens." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 03 (April 3, 2020): 2493–500. http://dx.doi.org/10.1609/aaai.v34i03.5631.

Abstract:
Prominent theories in cognitive science propose that humans understand and represent the knowledge of the world through causal relationships. In making sense of the world, we build causal models in our mind to encode cause-effect relations of events and use these to explain why new events happen by referring to counterfactuals — things that did not happen. In this paper, we use causal models to derive causal explanations of the behaviour of model-free reinforcement learning agents. We present an approach that learns a structural causal model during reinforcement learning and encodes causal relationships between variables of interest. This model is then used to generate explanations of behaviour based on counterfactual analysis of the causal model. We computationally evaluate the model in 6 domains and measure performance and task prediction accuracy. We report on a study with 120 participants who observe agents playing a real-time strategy game (Starcraft II) and then receive explanations of the agents' behaviour. We investigate: 1) participants' understanding gained by explanations through task prediction; 2) explanation satisfaction and 3) trust. Our results show that causal model explanations perform better on these measures compared to two other baseline explanation models.
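To make the idea of counterfactual, SCM-based explanations concrete, here is a minimal, hypothetical sketch (not the authors' implementation): a toy action-influence model with hand-written structural equations, from which a "why A rather than B?" explanation is produced by contrasting factual and counterfactual outcomes. The variable and action names are invented for illustration.

```python
# Hypothetical sketch of explaining an action choice with a tiny
# "action-influence" structural model; names and equations are made up.

def structural_model(action, workers):
    """Toy structural equations: each downstream variable is a function
    of its causal parents and the chosen action."""
    supply = workers * 8 + (50 if action == "build_supply" else 0)
    attack_strength = 0.5 * supply + (40 if action == "train_units" else 0)
    return {"supply": supply, "attack_strength": attack_strength}

def contrastive_explanation(taken, alternative, workers=10):
    """Contrast the factual outcome of the taken action with the
    counterfactual outcome of the alternative ('why A rather than B?')."""
    factual = structural_model(taken, workers)
    counterfactual = structural_model(alternative, workers)
    diffs = {k: factual[k] - counterfactual[k] for k in factual}
    return (f"Chose '{taken}' over '{alternative}' because it changes "
            + ", ".join(f"{k} by {v:+.1f}" for k, v in diffs.items()))

print(contrastive_explanation("train_units", "build_supply"))
```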
2

Li, Dezhi, Yunjun Lu, Jianping Wu, Wenlu Zhou, and Guangjun Zeng. "Causal Reinforcement Learning for Knowledge Graph Reasoning." Applied Sciences 14, no. 6 (March 15, 2024): 2498. http://dx.doi.org/10.3390/app14062498.

Abstract:
Knowledge graph reasoning can deduce new facts and relationships and is an important research direction for knowledge graphs. Most existing methods are based on end-to-end reasoning, which cannot effectively use the knowledge graph, so their performance still needs to be improved. Therefore, we combine causal inference with reinforcement learning and propose a new framework for knowledge graph reasoning. By incorporating the counterfactual method from causal inference, our method can obtain more information as prior knowledge and integrate it into the control strategy of the reinforcement model. The proposed method mainly includes relationship importance identification, reinforcement learning framework design, policy network design, and the training and testing of the causal reinforcement learning model. Specifically, a prior knowledge table is first constructed to indicate which relationships are more important for the query at hand; secondly, the state space, optimization, action space, state transitions, and rewards are designed; then, a standard value is set and compared with the weight of each candidate edge, and an action strategy is selected according to the comparison result, either through prior knowledge or through the neural network; finally, the parameters of the reinforcement learning model are determined through training and testing. We compared our method to baseline methods on four datasets and conducted ablation experiments. On the NELL-995 and FB15k-237 datasets, our method achieves MAP scores of 87.8 and 45.2, respectively, the best performance among the compared methods.
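The "compare an edge weight against a standard value, then act via prior knowledge or the policy network" step can be illustrated with a small hypothetical sketch. The prior-knowledge table, threshold, relation names, and scores below are placeholders, not values from the paper.

```python
import numpy as np

# Illustrative sketch only: pick the next relation either from a prior-knowledge
# table (when a candidate's weight clears a threshold) or by sampling from a
# learned policy's softmax. All numbers here are hypothetical.

prior_weight = {"worksFor": 0.9, "bornIn": 0.2, "locatedIn": 0.4}
THRESHOLD = 0.8

def select_edge(candidate_relations, policy_scores, rng=np.random.default_rng(0)):
    # Prefer a relation the prior table marks as important for this query.
    for rel in candidate_relations:
        if prior_weight.get(rel, 0.0) >= THRESHOLD:
            return rel
    # Otherwise fall back to sampling from the policy network's softmax.
    probs = np.exp(policy_scores) / np.exp(policy_scores).sum()
    return candidate_relations[rng.choice(len(candidate_relations), p=probs)]

print(select_edge(["bornIn", "locatedIn", "worksFor"], np.array([0.1, 0.5, 0.4])))
```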
3

Yang, Dezhi, Guoxian Yu, Jun Wang, Zhengtian Wu, and Maozu Guo. "Reinforcement Causal Structure Learning on Order Graph." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 10737–44. http://dx.doi.org/10.1609/aaai.v37i9.26274.

Abstract:
Learning a directed acyclic graph (DAG) that describes the causality of observed data is a challenging but important task. Due to the limited quantity and quality of observed data, and the non-identifiability of the causal graph, it is almost impossible to infer a single precise DAG. Some methods approximate the posterior distribution of DAGs to explore the DAG space via Markov chain Monte Carlo (MCMC), but because the DAG space grows super-exponentially, accurately characterizing the whole distribution over DAGs is intractable. In this paper, we propose Reinforcement Causal Structure Learning on Order Graph (RCL-OG), which uses an order graph instead of MCMC to model different DAG topological orderings and to reduce the problem size. RCL-OG first defines reinforcement learning with a new reward mechanism to approximate the posterior distribution of orderings in an efficient way, and uses deep Q-learning to update and transfer rewards between nodes. Next, it obtains the probability transition model of nodes on the order graph and computes the posterior probability of different orderings. In this way, we can sample from this model to obtain orderings with high probability. Experiments on synthetic and benchmark datasets show that RCL-OG provides accurate posterior probability approximation and achieves better results than competitive causal discovery algorithms.
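As a rough illustration of learning over an order graph (a simplified stand-in, not RCL-OG itself), the sketch below runs tabular Q-learning where each state is the set of variables already placed in a topological ordering and each action appends one more variable. The local_score function is a made-up placeholder for a real Bayesian scoring function.

```python
import numpy as np

rng = np.random.default_rng(0)
VARS = [0, 1, 2, 3]

def local_score(state, var):
    # Placeholder scoring function; a real implementation would use a
    # Bayesian score of `var` given the candidate parent set `state`.
    return 0.1 * len(state) + 0.05 * var + rng.normal(scale=0.01)

Q = {}  # Q[(frozenset of placed vars, next var)] -> value
def q(s, a):
    return Q.get((s, a), 0.0)

for episode in range(300):
    state = frozenset()
    while len(state) < len(VARS):
        actions = [v for v in VARS if v not in state]
        # epsilon-greedy choice of the next variable in the ordering
        if rng.random() < 0.1:
            a = int(rng.choice(actions))
        else:
            a = max(actions, key=lambda v: q(state, v))
        nxt = state | {a}
        r = local_score(state, a)
        remaining = [v for v in VARS if v not in nxt]
        target = r + (max(q(nxt, v) for v in remaining) if remaining else 0.0)
        Q[(state, a)] = q(state, a) + 0.1 * (target - q(state, a))
        state = nxt

# Read out the greedy ordering implied by the learned Q-values.
state, order = frozenset(), []
while len(order) < len(VARS):
    a = max((v for v in VARS if v not in state), key=lambda v: q(state, v))
    order.append(a)
    state = state | {a}
print("ordering:", order)
```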
4

Madumal, Prashan. "Explainable Agency in Reinforcement Learning Agents." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (April 3, 2020): 13724–25. http://dx.doi.org/10.1609/aaai.v34i10.7134.

Abstract:
This thesis explores how reinforcement learning (RL) agents can provide explanations for their actions and behaviours. As humans, we build causal models to encode cause-effect relations of events and use these to explain why events happen. Taking inspiration from cognitive psychology and social science literature, I build causal explanation models and explanation dialogue models for RL agents. By mimicking human-like explanation models, these agents can provide explanations that are natural and intuitive to humans.
5

Herlau, Tue, and Rasmus Larsen. "Reinforcement Learning of Causal Variables Using Mediation Analysis." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (June 28, 2022): 6910–17. http://dx.doi.org/10.1609/aaai.v36i6.20648.

Abstract:
We consider the problem of acquiring causal representations and concepts in a reinforcement learning setting. Our approach defines a causal variable as being both manipulable by a policy, and able to predict the outcome. We thereby obtain a parsimonious causal graph in which interventions occur at the level of policies. The approach avoids defining a generative model of the data, prior pre-processing, or learning the transition kernel of the Markov decision process. Instead, causal variables and policies are determined by maximizing a new optimization target inspired by mediation analysis, which differs from the expected return. The maximization is accomplished using a generalization of Bellman's equation which is shown to converge, and the method finds meaningful causal representations in a simulated environment.
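For readers unfamiliar with mediation analysis, the standard decomposition of a total effect into natural direct and indirect effects is shown below as background. This is textbook counterfactual notation for treatment T, mediator M, and outcome Y, not the paper's specific optimization target.

```latex
% Standard mediation-analysis decomposition (background only):
% Y_{t,m} denotes the potential outcome under treatment t and mediator value m,
% and M_t the mediator's potential value under treatment t.
\begin{align*}
\mathrm{TE}  &= \mathbb{E}\left[Y_{1,M_1}\right] - \mathbb{E}\left[Y_{0,M_0}\right],\\
\mathrm{NDE} &= \mathbb{E}\left[Y_{1,M_0}\right] - \mathbb{E}\left[Y_{0,M_0}\right],\\
\mathrm{NIE} &= \mathbb{E}\left[Y_{1,M_1}\right] - \mathbb{E}\left[Y_{1,M_0}\right],
\qquad \mathrm{TE} = \mathrm{NDE} + \mathrm{NIE}.
\end{align*}
```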
6

Duong, Tri Dung, Qian Li, and Guandong Xu. "Stochastic intervention for causal inference via reinforcement learning." Neurocomputing 482 (April 2022): 40–49. http://dx.doi.org/10.1016/j.neucom.2022.01.086.

7

Zhang, Wei, Xuesong Wang, Haoyu Wang, and Yuhu Cheng. "Causal Meta-Reinforcement Learning for Multimodal Remote Sensing Data Classification." Remote Sensing 16, no. 6 (March 16, 2024): 1055. http://dx.doi.org/10.3390/rs16061055.

Abstract:
Multimodal remote sensing data classification can enhance a model's ability to distinguish land features through multimodal data fusion. In this context, how to help models understand the relationship between multimodal data and target tasks has become the focus of researchers. Inspired by the human feedback learning mechanism, causal reasoning mechanism, and knowledge induction mechanism, this paper integrates causal learning, reinforcement learning, and meta-learning into a unified remote sensing data classification framework and proposes causal meta-reinforcement learning (CMRL). First, based on the feedback learning mechanism, we overcame the limitations of traditional implicit optimization of fusion features and customized a reinforcement learning environment for multimodal remote sensing data classification tasks. Through feedback-driven interactive learning between agents and the environment, we helped the agents understand the complex relationships between multimodal data and labels, thereby fully mining multimodal complementary information. Second, based on the causal inference mechanism, we designed causal distribution prediction actions, classification rewards, and causal intervention rewards, capturing pure causal factors in multimodal data and preventing false statistical associations between non-causal factors and class labels. Finally, based on the knowledge induction mechanism, we designed a bi-level optimization mechanism based on meta-learning. By constructing a meta-training task and a meta-validation task to simulate generalization scenarios with unseen data, we helped the model induce cross-task shared knowledge, thereby improving its generalization ability to unseen multimodal data. Experimental results on multiple multimodal datasets show that the proposed method achieves state-of-the-art performance in multimodal remote sensing data classification tasks.
8

Veselic, Sebastijan, Gerhard Jocham, Christian Gausterer, Bernhard Wagner, Miriam Ernhoefer-Reßler, Rupert Lanzenberger, Christoph Eisenegger, Claus Lamm, and Annabel Losecaat Vermeer. "A causal role of estradiol in human reinforcement learning." Hormones and Behavior 134 (August 2021): 105022. http://dx.doi.org/10.1016/j.yhbeh.2021.105022.

9

Zhou, Zhengyuan, Michael Bloem, and Nicholas Bambos. "Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning." IEEE Transactions on Automatic Control 63, no. 9 (September 2018): 2787–802. http://dx.doi.org/10.1109/tac.2017.2775960.

10

Wang, Zizhao, Caroline Wang, Xuesu Xiao, Yuke Zhu, and Peter Stone. "Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (March 24, 2024): 15778–86. http://dx.doi.org/10.1609/aaai.v38i14.29507.

Abstract:
Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is to learn state abstractions, which only keep the necessary variables for learning the tasks at hand. This paper introduces Causal Bisimulation Modeling (CBM), a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction. CBM leverages and improves implicit modeling to train a high-fidelity causal dynamics model that can be reused for all tasks in the same environment. Empirical validation on two manipulation environments and four tasks reveals that CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones. Furthermore, the derived state abstractions allow a task learner to achieve near-oracle levels of sample efficiency and outperform baselines on all tasks.
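A minimal sketch of the abstraction step, assuming a causal relevance mask has already been learned by some causal dynamics and reward model: project the full state onto the variables marked relevant for the task. The variable names and the mask are invented for illustration and are not CBM's actual outputs.

```python
import numpy as np

# Hypothetical sketch: keep only the state variables that an (assumed
# already-learned) causal model marks as relevant to the task's reward.

state_vars = ["gripper_x", "gripper_y", "block_x", "block_y", "distractor_x"]
# Boolean relevance vector, e.g. produced by a causal dynamics/reward model.
relevant = np.array([True, True, True, True, False])

def abstract_state(full_state):
    """Project the full state onto the causally relevant variables."""
    return full_state[relevant]

s = np.array([0.1, 0.2, 0.5, 0.4, 9.9])
print(dict(zip(np.array(state_vars)[relevant], abstract_state(s))))
```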
11

Du, Xiao, Yutong Ye, Pengyu Zhang, Yaning Yang, Mingsong Chen, and Ting Wang. "Situation-Dependent Causal Influence-Based Cooperative Multi-Agent Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (March 24, 2024): 17362–70. http://dx.doi.org/10.1609/aaai.v38i16.29684.

Abstract:
Learning to collaborate has witnessed significant progress in multi-agent reinforcement learning (MARL). However, promoting coordination among agents and enhancing exploration capabilities remain challenges. In multi-agent environments, interactions between agents are limited to specific situations. Effective collaboration between agents thus requires a nuanced understanding of when and how agents' actions influence others. To this end, in this paper we propose a novel MARL algorithm named Situation-Dependent Causal Influence-Based Cooperative Multi-agent Reinforcement Learning (SCIC), which incorporates a novel intrinsic reward mechanism based on a new cooperation criterion measured by situation-dependent causal influence among agents. Our approach aims to detect inter-agent causal influences in specific situations based on this criterion, using causal intervention and conditional mutual information. This effectively assists agents in exploring states that can positively impact other agents, thus promoting cooperation between agents. The resulting update links coordinated exploration and intrinsic reward distribution, which enhances overall collaboration and performance. Experimental results on various MARL benchmarks demonstrate the superiority of our method compared to state-of-the-art approaches.
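One common way to quantify inter-agent causal influence (a simplified stand-in for SCIC's situation-dependent criterion) is to compare another agent's predicted next-state distribution under the chosen action with the distribution obtained after intervening on, i.e. marginalising out, that action. The distributions, policy, and coefficient below are made up.

```python
import numpy as np

# Illustrative sketch (not the SCIC algorithm itself): measure agent i's causal
# influence on agent j as a KL divergence between action-conditioned and
# intervened next-state distributions, and use it as an intrinsic bonus.

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

# p_next[a_i] = model's distribution over agent j's next state given i's action a_i
p_next = {0: np.array([0.7, 0.2, 0.1]),
          1: np.array([0.2, 0.5, 0.3])}
policy_i = np.array([0.5, 0.5])          # i's action distribution in this situation

marginal = sum(policy_i[a] * p_next[a] for a in p_next)   # do(a_i ~ policy_i)

def intrinsic_reward(a_i, beta=0.5):
    """Bonus proportional to the causal influence of i's chosen action on j."""
    return beta * kl(p_next[a_i], marginal)

print(intrinsic_reward(0), intrinsic_reward(1))
```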
12

Skalse, Joar, and Alessandro Abate. "Misspecification in Inverse Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 15136–43. http://dx.doi.org/10.1609/aaai.v37i12.26766.

Abstract:
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function R from a policy pi. To do this, we need a model of how pi relates to R. In the current literature, the most common models are optimality, Boltzmann rationality, and causal entropy maximisation. One of the primary motivations behind IRL is to infer human preferences from human behaviour. However, the true relationship between human preferences and human behaviour is much more complex than any of the models currently used in IRL. This means that they are misspecified, which raises the worry that they might lead to unsound inferences if applied to real-world data. In this paper, we provide a mathematical analysis of how robust different IRL models are to misspecification, and answer precisely how the demonstrator policy may differ from each of the standard models before that model leads to faulty inferences about the reward function R. We also introduce a framework for reasoning about misspecification in IRL, together with formal tools that can be used to easily derive the misspecification robustness of new IRL models.
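As a reference point for one of the behavioural models named above, here is a minimal sketch of Boltzmann rationality, where the modelled expert samples actions with probability proportional to exp(beta * Q(s, a)). The Q-values and temperature are illustrative.

```python
import numpy as np

# Minimal sketch of the Boltzmann-rationality behavioural model: the assumed
# expert is near-greedy but stochastic, so its policy only partially reveals R.

def boltzmann_policy(q_values, beta=2.0):
    """Softmax over action values with inverse temperature beta."""
    z = beta * (q_values - q_values.max())   # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

q = np.array([1.0, 0.8, -0.5])               # hypothetical Q_R(s, .)
print(boltzmann_policy(q))
```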
13

Buehner, Marc J., and Jon May. "Abolishing the effect of reinforcement delay on human causal learning." Quarterly Journal of Experimental Psychology Section B 57, no. 2b (April 2004): 179–91. http://dx.doi.org/10.1080/02724990344000123.

14

Yang, Shantian, Bo Yang, Zheng Zeng, and Zhongfeng Kang. "Causal inference multi-agent reinforcement learning for traffic signal control." Information Fusion 94 (June 2023): 243–56. http://dx.doi.org/10.1016/j.inffus.2023.02.009.

15

Mutti, Mirco, Riccardo De Santi, Emanuele Rossi, Juan Felipe Calderon, Michael Bronstein, and Marcello Restelli. "Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9251–59. http://dx.doi.org/10.1609/aaai.v37i8.26109.

Abstract:
In the sequential decision making setting, an agent aims to achieve systematic generalization over a large, possibly infinite, set of environments. Such environments are modeled as discrete Markov decision processes with both states and actions represented through a feature vector. The underlying structure of the environments allows the transition dynamics to be factored into two components: one that is environment-specific and another that is shared. Consider a set of environments that share the laws of motion as an example. In this setting, the agent can take a finite amount of reward-free interactions from a subset of these environments. The agent then must be able to approximately solve any planning task defined over any environment in the original set, relying on the above interactions only. Can we design a provably efficient algorithm that achieves this ambitious goal of systematic generalization? In this paper, we give a partially positive answer to this question. First, we provide a tractable formulation of systematic generalization by employing a causal viewpoint. Then, under specific structural assumptions, we provide a simple learning algorithm that guarantees any desired planning error up to an unavoidable sub-optimality term, while showcasing a polynomial sample complexity.
16

Eka, Eka Madya, Yunyun Yudiana, and Komarudin. "Effect of reinceforcement on physical learning on motivation learning." Gladi : Jurnal Ilmu Keolahragaan 13, no. 1 (March 31, 2022): 41–46. http://dx.doi.org/10.21009/gjik.131.04.

Abstract:
In this study, the authors aim to determine the effect of reinforcement in physical education learning on student motivation at SMPN 5 Cirebon. The method used is causal-comparative with an ex post facto design. The population comprised all students of SMPN 5 Cirebon, sampled by cluster random sampling, with a total sample of 64 people consisting of class VIII 1 and class VIII 2 students. The instrument used to collect data was a learning-motivation questionnaire, which measured the effect of reinforcement on learning motivation. The descriptive results show that in class VIII 1 the effect of reinforcement on student motivation has an average value of 68.22 with a standard deviation of 8.315, a lowest value of 56, and a highest value of 100. In class VIII 2, the effect of reinforcement on student learning motivation has an average of 68.59 with a standard deviation of 6.997, a lowest value of 52, and a highest value of 80. The data processing yields a significance value of 0.506, which is greater than 0.05; the authors conclude that the use of reinforcement in physical education learning can increase students' learning motivation.
17

Mehta, Neville, Soumya Ray, Prasad Tadepalli, and Thomas Dietterich. "Automatic Discovery and Transfer of Task Hierarchies in Reinforcement Learning." AI Magazine 32, no. 1 (March 16, 2011): 35. http://dx.doi.org/10.1609/aimag.v32i1.2342.

Abstract:
Sequential decision tasks present many opportunities for the study of transfer learning. A principal one among them is the existence of multiple domains that share the same underlying causal structure for actions. We describe an approach that exploits this shared causal structure to discover a hierarchical task structure in a source domain, which in turn speeds up learning of task execution knowledge in a new target domain. Our approach is theoretically justified and compares favorably to manually designed task hierarchies in learning efficiency in the target domain. We demonstrate that causally motivated task hierarchies transfer more robustly than other kinds of detailed knowledge that depend on the idiosyncrasies of the source domain and are hence less transferable.
18

Valverde, Gabriel, David Quesada, Pedro Larrañaga, and Concha Bielza. "Causal reinforcement learning based on Bayesian networks applied to industrial settings." Engineering Applications of Artificial Intelligence 125 (October 2023): 106657. http://dx.doi.org/10.1016/j.engappai.2023.106657.

19

Sun, Yuewen, Erli Wang, Biwei Huang, Chaochao Lu, Lu Feng, Changyin Sun, and Kun Zhang. "ACAMDA: Improving Data Efficiency in Reinforcement Learning through Guided Counterfactual Data Augmentation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (March 24, 2024): 15193–201. http://dx.doi.org/10.1609/aaai.v38i14.29442.

Abstract:
Data augmentation plays a crucial role in improving the data efficiency of reinforcement learning (RL). However, the generation of high-quality augmented data remains a significant challenge. To overcome this, we introduce ACAMDA (Adversarial Causal Modeling for Data Augmentation), a novel framework that integrates two causality-based tasks: causal structure recovery and counterfactual estimation. The unique aspect of ACAMDA lies in its ability to recover temporal causal relationships from limited non-expert datasets. The identification of the sequential cause-and-effect allows the creation of realistic yet unobserved scenarios. We utilize this characteristic to generate guided counterfactual datasets, which, in turn, substantially reduces the need for extensive data collection. By simulating various state-action pairs under hypothetical actions, ACAMDA enriches the training dataset for diverse and heterogeneous conditions. Our experimental evaluation shows that ACAMDA outperforms existing methods, particularly when applied to novel and unseen domains.
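A highly simplified sketch of guided counterfactual data augmentation under stated assumptions: a least-squares linear dynamics model stands in for ACAMDA's adversarial causal model, and counterfactual transitions are generated by replaying logged states under alternative actions. All data below are synthetic.

```python
import numpy as np

# Simplified, hypothetical sketch of counterfactual data augmentation:
# fit a dynamics model on logged transitions, then ask "what would the next
# state have been under a different action?" and add the answers to the buffer.

rng = np.random.default_rng(0)
A_true, B_true = np.array([[0.9, 0.1], [0.0, 0.95]]), np.array([[0.2], [0.5]])

# Logged (s, a, s') transitions from some behaviour policy.
S = rng.normal(size=(200, 2))
A = rng.choice([-1.0, 1.0], size=(200, 1))
S_next = S @ A_true.T + A @ B_true.T + 0.01 * rng.normal(size=(200, 2))

# Fit s' ~ [s, a] by least squares (a stand-in for a learned causal model).
X = np.hstack([S, A])
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)

def counterfactual_transitions(alt_action):
    """Replay logged states under a hypothetical alternative action."""
    X_cf = np.hstack([S, np.full((len(S), 1), alt_action)])
    return X_cf @ W

augmented = np.vstack([S_next, counterfactual_transitions(+1.0),
                       counterfactual_transitions(-1.0)])
print(augmented.shape)   # original plus counterfactual next states
```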
20

Zhu, Liefeng, and Yongbiao Luo. "Application of Bayesian Networks and Reinforcement Learning in Intelligent Control Systems in Uncertain Environments." 電腦學刊 35, no. 2 (April 2024): 001–16. http://dx.doi.org/10.53106/199115992024043502001.

Abstract:
Reinforcement learning is a machine learning paradigm that focuses on how an agent can perform actions in an environment to achieve a certain goal. The agent learns through interaction with the environment, observing the state and making decisions to maximize its reward. Reinforcement learning has wide applications in intelligent control systems. However, one limitation of reinforcement learning is the uncertainty in handling the environment model. Usually, reinforcement learning is performed without a clear model, which requires estimating environmental uncertainty and state transitions. Bayesian Networks are effective in modeling uncertainty, which can aid in establishing a probabilistic model of environmental dynamics. This allows for the integration of uncertainty information into the environmental model, leading to a more accurate understanding of the dynamic characteristics of the environment. In this study, we propose a reinforcement learning algorithm based on Bayesian Networks. We utilize optimal generalized residual differentiation, parallel integration causal directional reasoning, and other modeling techniques to address reinforcement learning tasks. The main idea is to utilize the prior distribution to estimate the uncertainty of unknown parameters. Then, the obtained observation information is used to calculate the posterior distribution in order to acquire knowledge. Experiments demonstrate that this approach is feasible in intelligent control systems operating in uncertain environments.
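The prior-to-posterior idea the abstract describes can be illustrated in a generic way (not the authors' algorithm) with a Beta-Bernoulli posterior over each action's unknown success probability, combined with Thompson sampling. The environment parameters below are invented.

```python
import numpy as np

# Generic illustration of "prior -> observation -> posterior" in an RL loop:
# maintain a Beta posterior per action and act by Thompson sampling.

rng = np.random.default_rng(0)
true_p = np.array([0.3, 0.6, 0.5])          # unknown environment parameters
alpha = np.ones(3)                           # Beta prior (uniform)
beta = np.ones(3)

for t in range(2000):
    sampled = rng.beta(alpha, beta)          # draw a model from the posterior
    a = int(np.argmax(sampled))              # act greedily w.r.t. the sample
    reward = rng.random() < true_p[a]        # observe the environment
    alpha[a] += reward                       # Bayesian posterior update
    beta[a] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```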
21

Buehner, Marc J., and Jon May. "Rethinking Temporal Contiguity and the Judgement of Causality: Effects of Prior Knowledge, Experience, and Reinforcement Procedure." Quarterly Journal of Experimental Psychology Section A 56, no. 5 (July 2003): 865–90. http://dx.doi.org/10.1080/02724980244000675.

Abstract:
Time plays a pivotal role in causal inference. Nonetheless most contemporary theories of causal induction do not address the implications of temporal contiguity and delay, with the exception of associative learning theory. Shanks, Pearson, and Dickinson (1989) and several replications (Reed, 1992, 1999) have demonstrated that people fail to identify causal relations if cause and effect are separated by more than two seconds. In line with an associationist perspective, these findings have been interpreted to indicate that temporal lags universally impair causal induction. This interpretation clashes with the richness of everyday causal cognition where people apparently can reason about causal relations involving considerable delays. We look at the implications of cause-effect delays from a computational perspective and predict that delays should generally hinder reasoning performance, but that this hindrance should be alleviated if reasoners have knowledge of the delay. Two experiments demonstrated that (1) the impact of delay on causal judgement depends on participants’ expectations about the timeframe of the causal relation, and (2) the free-operant procedures used in previous studies are ill-suited to study the direct influences of delay on causal induction, because they confound delay with weaker evidence for the relation in question. Implications for contemporary causal learning theories are discussed.
22

Sanghvi, Navyata, Shinnosuke Usami, Mohit Sharma, Joachim Groeger, and Kris Kitani. "Inverse Reinforcement Learning with Explicit Policy Estimates." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (May 18, 2021): 9472–80. http://dx.doi.org/10.1609/aaai.v35i11.17141.

Abstract:
Various methods for solving the inverse reinforcement learning (IRL) problem have been developed independently in machine learning and economics. In particular, the method of Maximum Causal Entropy IRL is based on the perspective of entropy maximization, while related advances in the field of economics instead assume the existence of unobserved action shocks to explain expert behavior (Nested Fixed Point Algorithm, Conditional Choice Probability method, Nested Pseudo-Likelihood Algorithm). In this work, we make previously unknown connections between these related methods from both fields. We achieve this by showing that they all belong to a class of optimization problems, characterized by a common form of the objective, the associated policy and the objective gradient. We demonstrate key computational and algorithmic differences which arise between the methods due to an approximation of the optimal soft value function, and describe how this leads to more efficient algorithms. Using insights which emerge from our study of this class of optimization problems, we identify various problem scenarios and investigate each method's suitability for these problems.
23

Agarwal, Anish. "Causal Inference for Social and Engineering Systems." ACM SIGMETRICS Performance Evaluation Review 50, no. 3 (December 30, 2022): 7–11. http://dx.doi.org/10.1145/3579342.3579345.

Abstract:
What will happen to Y if we do A? A variety of meaningful social and engineering questions can be formulated this way: What will happen to a patient's health if they are given a new therapy? What will happen to a country's economy if policy-makers legislate a new tax? What will happen to a data center's latency if a new congestion control protocol is used? We explore how to answer such counterfactual questions using observational data, which is increasingly available due to digitization and pervasive sensors, and/or very limited experimental data. The two key challenges are: (i) counterfactual prediction in the presence of latent confounders; (ii) estimation with modern datasets which are high-dimensional, noisy, and sparse. The key framework we introduce is connecting causal inference with tensor completion. In particular, we represent the various potential outcomes (i.e., counterfactuals) of interest through an order-3 tensor. The key theoretical results presented are: (i) Formal identification results establishing under what missingness patterns, latent confounding, and structure on the tensor is recovery of unobserved potential outcomes possible. (ii) Introducing novel estimators to recover these unobserved potential outcomes and proving they are finite-sample consistent and asymptotically normal. Finally, we discuss connections between matrix/tensor completion and time series analysis and reinforcement learning; we believe this could serve as a basis to do counterfactual forecasting, and building data-driven simulators for reinforcement learning.
24

Gao, Haichuan, Tianren Zhang, Zhile Yang, Yuqing Guo, Jinsheng Ren, Shangqi Guo, and Feng Chen. "Fast Counterfactual Inference for History-Based Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 7613–23. http://dx.doi.org/10.1609/aaai.v37i6.25924.

Abstract:
Incorporating sequence-to-sequence models into history-based Reinforcement Learning (RL) provides a general way to extend RL to partially-observable tasks. This method compresses history spaces according to the correlations between historical observations and the rewards. However, they do not adjust for the confounding correlations caused by data sampling and assign high beliefs to uninformative historical observations, leading to limited compression of history spaces. Counterfactual Inference (CI), which estimates causal effects by single-variable intervention, is a promising way to adjust for confounding. However, it is computationally infeasible to directly apply the single-variable intervention to a huge number of historical observations. This paper proposes to perform CI on observation sub-spaces instead of single observations and develop a coarse-to-fine CI algorithm, called Tree-based History Counterfactual Inference (T-HCI), to reduce the number of interventions exponentially. We show that T-HCI is computationally feasible in practice and brings significant sample efficiency gains in various challenging partially-observable tasks, including Maze, BabyAI, and robot manipulation tasks.
25

Martinez-Gil, Francisco, Miguel Lozano, Ignacio García-Fernández, Pau Romero, Dolors Serra, and Rafael Sebastián. "Using Inverse Reinforcement Learning with Real Trajectories to Get More Trustworthy Pedestrian Simulations." Mathematics 8, no. 9 (September 2, 2020): 1479. http://dx.doi.org/10.3390/math8091479.

Abstract:
Reinforcement learning is one of the most promising machine learning techniques to get intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms adopts the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, sometimes the learned policy does not meet expectations, and the task of authoring is difficult and unsafe because the modification of one value or parameter in the learned value function has unpredictable consequences in the space of the policies it represents. This invalidates direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces in the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with the behaviors learned using a Reinforcement Learning classic algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors adjust significantly better to the real trajectories.
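The maximum-causal-entropy principle behind Soft Q-learning replaces the hard max in the Bellman backup with a log-sum-exp. The sketch below runs soft value iteration on a tiny random MDP, purely as an illustration of that backup rather than of MARL-Ped or the paper's pipeline; all quantities are made up.

```python
import numpy as np

# Sketch of the soft (maximum-causal-entropy) backup:
#   V(s) = alpha * log sum_a exp(Q(s,a)/alpha),  pi(a|s) ∝ exp(Q(s,a)/alpha).

n_states, n_actions, gamma, alpha = 3, 2, 0.9, 0.5
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.normal(size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
for _ in range(200):                          # soft value iteration
    V = alpha * np.log(np.exp(Q / alpha).sum(axis=1))
    Q = R + gamma * P @ V

pi = np.exp(Q / alpha)
pi /= pi.sum(axis=1, keepdims=True)
print("soft-optimal policy:\n", pi.round(3))
```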
26

Lee, Kyungjae, Sungjoon Choi, and Songhwai Oh. "Sparse Markov Decision Processes With Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning." IEEE Robotics and Automation Letters 3, no. 3 (July 2018): 1466–73. http://dx.doi.org/10.1109/lra.2018.2800085.

27

Ghorbel, N., S.-A. Addouche, and A. El Mhamedi. "Forward management of spare parts stock shortages via causal reasoning using reinforcement learning." IFAC-PapersOnLine 48, no. 3 (2015): 1061–66. http://dx.doi.org/10.1016/j.ifacol.2015.06.224.

28

Nadim, Karim, Mohamed-Salah Ouali, Hakim Ghezzaz, and Ahmed Ragab. "Learn-to-supervise: Causal reinforcement learning for high-level control in industrial processes." Engineering Applications of Artificial Intelligence 126 (November 2023): 106853. http://dx.doi.org/10.1016/j.engappai.2023.106853.

29

Zhu, Zheng-Mao, Shengyi Jiang, Yu-Ren Liu, Yang Yu, and Kun Zhang. "Invariant Action Effect Model for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 9260–68. http://dx.doi.org/10.1609/aaai.v36i8.20913.

Abstract:
Good representations can help RL agents perform concise modeling of their surroundings, and thus support effective decision-making in complex environments. Previous methods learn good representations by imposing extra constraints on dynamics. However, from the causal perspective, the causation between the action and its effect is not fully considered in those methods, which leads to ignorance of the underlying relations among the action effects on the transitions. Based on the intuition that the same action always causes similar effects among different states, we induce such causation by taking the invariance of action effects among states as the relation. In this paper, we show that by explicitly utilizing such invariance, a better representation can be learned, potentially improving the sample efficiency and the generalization ability of the learned policy. We propose the Invariant Action Effect Model (IAEM) to capture the invariance in action effects, where the effect of an action is represented as the residual of representations from neighboring states. IAEM is composed of two parts: (1) a new contrastive-based loss to capture the underlying invariance of action effects; (2) an individual action effect module that provides a self-adapted weighting strategy to tackle the corner cases where the invariance does not hold. Extensive experiments on two benchmarks, i.e., Grid-World and Atari, show that the representations learned by IAEM preserve the invariance of action effects. Moreover, with the invariant action effect, IAEM can accelerate the learning process by 1.6x, rapidly generalize to new environments by fine-tuning on a few components, and outperform other dynamics-based representation methods by 1.4x in limited steps.
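A rough sketch of the residual action-effect idea under assumptions: a stand-in encoder z, action effects represented as z(s') - z(s), and a simple InfoNCE-style contrastive loss over those residuals. The encoder, transitions, and loss details are illustrative and are not IAEM's actual architecture.

```python
import numpy as np

# Illustrative sketch: represent an action's effect as the residual of encoded
# states, then score a contrastive loss that pulls together residuals of the
# same action across states and pushes apart residuals of different actions.

rng = np.random.default_rng(0)

def z(s):                      # stand-in encoder
    return np.tanh(s)

def action_effect(s, s_next):
    return z(s_next) - z(s)

def info_nce(anchor, positive, negatives, tau=0.1):
    """Simple InfoNCE over cosine similarities."""
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([sim(anchor, positive)] + [sim(anchor, n) for n in negatives]) / tau
    return -logits[0] + np.log(np.exp(logits).sum())

s1, s2 = rng.normal(size=4), rng.normal(size=4)
same_a = [action_effect(s1, s1 + 0.5), action_effect(s2, s2 + 0.5)]   # same action
other_a = [action_effect(s1, s1 - 0.5), action_effect(s2, s2 * 0.1)]  # other actions
print("contrastive loss:", info_nce(same_a[0], same_a[1], other_a))
```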
30

Zhou, Haoran, Junliang Lu, Ziyu Li, and Xinyi Zhang. "Study on whether marriage affects depression based on causal inference." Applied and Computational Engineering 6, no. 1 (June 14, 2023): 1661–72. http://dx.doi.org/10.54254/2755-2721/6/20230827.

Abstract:
This paper applies causality-based machine learning algorithms to evaluate the causal effect of marriage on depression. The paper motivates the adoption of causal inference by examining the relationship between causality and correlation, as well as confounding bias and selection bias. We first implement meta-learners to estimate and analyse the causal effects. Considering the influence of confounding factors, we then use two-stage least squares estimation and deep IV estimation based on instrumental variables to fully evaluate the causal effects. The linear and nonlinear models yield different results, which is worth discussing in future studies. In conclusion, people in rural regions who get married are slightly less likely to become depressed in the future.
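For reference, a minimal two-stage least squares (2SLS) sketch on synthetic data illustrates the instrumental-variable estimation the abstract mentions. The data-generating process, the instrument, and the true effect of 2.0 are invented for the example; they are not the study's data.

```python
import numpy as np

# Minimal 2SLS sketch: stage 1 regresses the confounded treatment on the
# instrument; stage 2 regresses the outcome on the fitted treatment.

rng = np.random.default_rng(0)
n = 5000
u = rng.normal(size=n)                        # unobserved confounder
z = rng.normal(size=n)                        # instrument (affects treatment only)
t = 0.8 * z + u + rng.normal(size=n)          # treatment, confounded by u
y = 2.0 * t + 3.0 * u + rng.normal(size=n)    # outcome; true causal effect = 2.0

def ols(x, target):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, target, rcond=None)[0]

naive = ols(t, y)[1]                                      # biased by the confounder
t_hat = np.column_stack([np.ones(n), z]) @ ols(z, t)      # stage 1: fitted treatment
two_sls = ols(t_hat, y)[1]                                # stage 2: near 2.0
print(f"naive OLS: {naive:.2f}, 2SLS: {two_sls:.2f}")
```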
31

Djeumou, Franck, Murat Cubuktepe, Craig Lennon, and Ufuk Topcu. "Task-Guided Inverse Reinforcement Learning under Partial Information." Proceedings of the International Conference on Automated Planning and Scheduling 32 (June 13, 2022): 53–61. http://dx.doi.org/10.1609/icaps.v32i1.19785.

Abstract:
We study the problem of inverse reinforcement learning (IRL), where the learning agent recovers a reward function using expert demonstrations. Most of the existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). The algorithm addresses several limitations of existing techniques that do not take the information asymmetry between the expert and the learner into account. First, it adopts causal entropy as the measure of the likelihood of the expert demonstrations as opposed to entropy in most existing IRL techniques, and avoids a common source of algorithmic complexity. Second, it incorporates task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori in addition to the demonstrations and may reduce the information asymmetry. Nevertheless, the resulting formulation is still nonconvex due to the intrinsic nonconvexity of the so-called forward problem, i.e., computing an optimal policy given a reward function, in POMDPs. We address this nonconvexity through sequential convex programming and introduce several extensions to solve the forward problem in a scalable manner. This scalability allows computing policies that incorporate memory at the expense of added computational cost yet also outperform memoryless policies. We demonstrate that, even with severely limited data, the algorithm learns reward functions and policies that satisfy the task and induce a similar behavior to the expert by leveraging the side information and incorporating memory into the policy.
32

Edmonds, Mark, Xiaojian Ma, Siyuan Qi, Yixin Zhu, Hongjing Lu, and Song-Chun Zhu. "Theory-Based Causal Transfer: Integrating Instance-Level Induction and Abstract-Level Structure Learning." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 02 (April 3, 2020): 1283–91. http://dx.doi.org/10.1609/aaai.v34i02.5483.

Abstract:
Learning transferable knowledge across similar but different settings is a fundamental component of generalized intelligence. In this paper, we approach the transfer learning challenge from a causal theory perspective. Our agent is endowed with two basic yet general theories for transfer learning: (i) a task shares a common abstract structure that is invariant across domains, and (ii) the behavior of specific features of the environment remains constant across domains. We adopt a Bayesian perspective of causal theory induction and use these theories to transfer knowledge between environments. Given these general theories, the goal is to train an agent by interactively exploring the problem space to (i) discover, form, and transfer useful abstract and structural knowledge, and (ii) induce useful knowledge from the instance-level attributes observed in the environment. A hierarchy of Bayesian structures is used to model abstract-level structural causal knowledge, and an instance-level associative learning scheme learns which specific objects can be used to induce state changes through interaction. This model-learning scheme is then integrated with a model-based planner to achieve a task in the OpenLock environment, a virtual “escape room” with a complex hierarchy that requires agents to reason about an abstract, generalized causal structure. We compare performance against a set of predominant model-free reinforcement learning (RL) algorithms. RL agents showed poor ability to transfer learned knowledge across different trials, whereas the proposed model revealed similar performance trends as human learners and, more importantly, demonstrated transfer behavior across trials and learning situations.
33

Wang, Yuchen, Mitsuhiro Hayashibe, and Dai Owaki. "Data-Driven Policy Learning Methods from Biological Behavior: A Systematic Review." Applied Sciences 14, no. 10 (May 9, 2024): 4038. http://dx.doi.org/10.3390/app14104038.

Abstract:
Policy learning enables agents to learn how to map states to actions, thus enabling adaptive and flexible behavioral generation in complex environments. Policy learning methods are fundamental to reinforcement learning techniques. However, as problem complexity and the requirement for motion flexibility increase, traditional methods that rely on manual design have revealed their limitations. Conversely, data-driven policy learning focuses on extracting strategies from biological behavioral data and aims to replicate these behaviors in real-world environments. This approach enhances the adaptability of agents to dynamic substrates. Furthermore, this approach has been extensively applied in autonomous driving, robot control, and interpretation of biological behavior. In this review, we survey developments in data-driven policy-learning algorithms over the past decade. We categorized them into the following three types according to the purpose of the method: (1) imitation learning (IL), (2) inverse reinforcement learning (IRL), and (3) causal policy learning (CPL). We describe the classification principles, methodologies, progress, and applications of each category in detail. In addition, we discuss the distinct features and practical applications of these methods. Finally, we explore the challenges these methods face and prospective directions for future research.
34

Barnby, Joseph M., Mitul A. Mehta, and Michael Moutoussis. "The computational relationship between reinforcement learning, social inference, and paranoia." PLOS Computational Biology 18, no. 7 (July 25, 2022): e1010326. http://dx.doi.org/10.1371/journal.pcbi.1010326.

Abstract:
Theoretical accounts suggest heightened uncertainty about the state of the world underpin aberrant belief updates, which in turn increase the risk of developing a persecutory delusion. However, this raises the question as to how an agent’s uncertainty may relate to the precise phenomenology of paranoia, as opposed to other qualitatively different forms of belief. We tested whether the same population (n = 693) responded similarly to non-social and social contingency changes in a probabilistic reversal learning task and a modified repeated reversal Dictator game, and the impact of paranoia on both. We fitted computational models that included closely related parameters that quantified the rigidity across contingency reversals and the uncertainty about the environment/partner. Consistent with prior work we show that paranoia was associated with uncertainty around a partner’s behavioural policy and rigidity in harmful intent attributions in the social task. In the non-social task we found that pre-existing paranoia was associated with larger decision temperatures and commitment to suboptimal cards. We show relationships between decision temperature in the non-social task and priors over harmful intent attributions and uncertainty over beliefs about partners in the social task. Our results converge across both classes of model, suggesting paranoia is associated with a general uncertainty over the state of the world (and agents within it) that takes longer to resolve, although we demonstrate that this uncertainty is expressed asymmetrically in social contexts. Our model and data allow the representation of sociocognitive mechanisms that explain persecutory delusions and provide testable, phenomenologically relevant predictions for causal experiments.
35

Mokhtarian, Ehsan, Mohmmadsadegh Khorasani, Jalal Etesami, and Negar Kiyavash. "Novel Ordering-Based Approaches for Causal Structure Learning in the Presence of Unobserved Variables." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 10 (June 26, 2023): 12260–68. http://dx.doi.org/10.1609/aaai.v37i10.26445.

Abstract:
We propose ordering-based approaches for learning the maximal ancestral graph (MAG) of a structural equation model (SEM) up to its Markov equivalence class (MEC) in the presence of unobserved variables. Existing ordering-based methods in the literature recover a graph by learning a causal order (c-order). We advocate for a novel order called the removable order (r-order), as r-orders are advantageous over c-orders for structure learning. This is because r-orders are the minimizers of an appropriately defined optimization problem that can be solved either exactly (using a reinforcement learning approach) or approximately (using a hill-climbing search). Moreover, r-orders (unlike c-orders) are invariant among all the graphs in an MEC and include c-orders as a subset. Given that the set of r-orders is often significantly larger than the set of c-orders, it is easier for the optimization problem to find an r-order than a c-order. We evaluate the performance and scalability of our proposed approaches on both real-world and randomly generated networks.
36

Yang, Chao-Han Huck, I.-Te Danny Hung, Yi Ouyang, and Pin-Yu Chen. "Training a Resilient Q-network against Observational Interference." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8814–22. http://dx.doi.org/10.1609/aaai.v36i8.20862.

Abstract:
Deep reinforcement learning (DRL) has demonstrated impressive performance in various gaming simulators and real-world applications. In practice, however, a DRL agent may receive faulty observations caused by abrupt interferences such as black-outs, frozen screens, and adversarial perturbations. How to design a DRL algorithm that is resilient against these rare but mission-critical and safety-crucial scenarios is an essential yet challenging task. In this paper, we consider a deep Q-network (DQN) framework trained with an auxiliary task of observational interference such as artificial noise. Inspired by causal inference for observational interference, we propose a causal-inference-based DQN algorithm called causal inference Q-network (CIQ). We evaluate the performance of CIQ in several benchmark DQN environments with different types of interference as auxiliary labels. Our experimental results show that the proposed CIQ method achieves higher performance and more resilience against observational interference.
37

Hasanah, Uswatun, Luluk Salimah Oktavia, and Putri Silaturrahmi. "INCREASING STUDENTS’ LEARNING INTEREST THROUGH BLENDED LEARNING IN THE EDUCATIONAL PSYCHOLOGY COURSE." JURNAL PAJAR (Pendidikan dan Pengajaran) 7, no. 1 (January 31, 2023): 181. http://dx.doi.org/10.33578/pjr.v7i1.9069.

Abstract:
Education is an attempt to develop, reinforce, and improve people's abilities and potential through teaching, guidance, example, and other means needed by themselves, other people, and the nation and state. In fact, many students believe that learning is not fun, so their interest in learning decreases. Therefore, innovative learning is needed to increase students' interest in learning. One such attempt is to apply the blended learning model. This paper aims to determine whether blended learning has a significant effect on students' learning interest in educational psychology courses. This is quantitative research of a causal type, involving 43 third-semester students of the Islamic Education study program of IDIA Prenduan. Based on the findings of the data analysis, tested through simple linear regression, the t-count value of 0.847 is smaller than the t-table value of 2.019 at a significance level of 5%. It can be concluded that blended learning has an insignificant effect on students' learning interest in the Educational Psychology course of the Islamic Education study program of IDIA in the 2021-2022 academic year.
38

Weissengruber, Sebastian, Sang Wan Lee, John P. O’Doherty, and Christian C. Ruff. "Neurostimulation Reveals Context-Dependent Arbitration Between Model-Based and Model-Free Reinforcement Learning." Cerebral Cortex 29, no. 11 (March 19, 2019): 4850–62. http://dx.doi.org/10.1093/cercor/bhz019.

Abstract:
While it is established that humans use model-based (MB) and model-free (MF) reinforcement learning in a complementary fashion, much less is known about how the brain determines which of these systems should control behavior at any given moment. Here we provide causal evidence for a neural mechanism that acts as a context-dependent arbitrator between both systems. We applied excitatory and inhibitory transcranial direct current stimulation over a region of the left ventrolateral prefrontal cortex previously found to encode the reliability of both learning systems. The opposing neural interventions resulted in a bidirectional shift of control between MB and MF learning. Stimulation also affected the sensitivity of the arbitration mechanism itself, as it changed how often subjects switched between the dominant system over time. Both of these effects depended on varying task contexts that either favored MB or MF control, indicating that this arbitration mechanism is not context-invariant but flexibly incorporates information about current environmental demands.
39

Zhang, Yuzhu, and Hao Xu. "Reconfigurable-Intelligent-Surface-Enhanced Dynamic Resource Allocation for the Social Internet of Electric Vehicle Charging Networks with Causal-Structure-Based Reinforcement Learning." Future Internet 16, no. 5 (May 11, 2024): 165. http://dx.doi.org/10.3390/fi16050165.

Abstract:
Charging stations and electric vehicle (EV) charging networks signify a significant advancement in technology as a frontier application of the Social Internet of Things (SIoT), presenting both challenges and opportunities for current 6G wireless networks. One primary challenge in this integration is limited wireless network resources, particularly when serving a large number of users within distributed EV charging networks in the SIoT. Factors such as congestion during EV travel, varying EV user preferences, and uncertainties in decision-making regarding charging station resources significantly impact system operation and network resource allocation. To address these challenges, this paper develops a novel framework harnessing the potential of emerging technologies, specifically reconfigurable intelligent surfaces (RISs) and causal-structure-enhanced asynchronous advantage actor–critic (A3C) reinforcement learning techniques. This framework aims to optimize resource allocation, thereby enhancing communication support within EV charging networks. Through the integration of RIS technology, which enables control over electromagnetic waves, and the application of causal reinforcement learning algorithms, the framework dynamically adjusts resource allocation strategies to accommodate evolving conditions in EV charging networks. An essential aspect of this framework is its ability to simultaneously meet real-world social requirements, such as ensuring efficient utilization of network resources. Numerical simulation results validate the effectiveness and adaptability of this approach in improving wireless network efficiency and enhancing user experience within the SIoT context. Through these simulations, it becomes evident that the developed framework offers promising solutions to the challenges posed by integrating the SIoT with EV charging networks.
40

Elder, Jacob, Tyler Davis, and Brent L. Hughes. "Learning About the Self: Motives for Coherence and Positivity Constrain Learning From Self-Relevant Social Feedback." Psychological Science 33, no. 4 (March 28, 2022): 629–47. http://dx.doi.org/10.1177/09567976211045934.

Abstract:
People learn about themselves from social feedback, but desires for coherence and positivity constrain how feedback is incorporated into the self-concept. We developed a network-based model of the self-concept and embedded it in a reinforcement-learning framework to provide a computational account of how motivations shape self-learning from feedback. Participants ( N = 46 adult university students) received feedback while evaluating themselves on traits drawn from a causal network of trait semantics. Network-defined communities were assigned different likelihoods of positive feedback. Participants learned from positive feedback but dismissed negative feedback, as reflected by asymmetries in computational parameters that represent the incorporation of positive versus negative outcomes. Furthermore, participants were constrained in how they incorporated feedback: Self-evaluations changed less for traits that have more implications and are thus more important to the coherence of the network. We provide a computational explanation of how motives for coherence and positivity jointly constrain learning about the self from feedback, an explanation that makes testable predictions for future clinical research.
41

NISHINA, Kyosuke, and Shigeru FUJITA. "A World Model Reinforcement Learning Method That Is Not Distracted by Background Information by Using Representation Learning via Invariant Causal Mechanisms for Non-Contrastive Learning." Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 36, no. 1 (February 15, 2024): 571–81. http://dx.doi.org/10.3156/jsoft.36.1_571.

42

Kawato, Mitsuo, and Aurelio Cortese. "From internal models toward metacognitive AI." Biological Cybernetics 115, no. 5 (October 2021): 415–30. http://dx.doi.org/10.1007/s00422-021-00904-7.

Abstract:
In several papers published in Biological Cybernetics in the 1980s and 1990s, Kawato and colleagues proposed computational models explaining how internal models are acquired in the cerebellum. These models were later supported by neurophysiological experiments using monkeys and neuroimaging experiments involving humans. These early studies influenced neuroscience from basic sensory-motor control to higher cognitive functions. One of the most perplexing enigmas related to internal models is to understand the neural mechanisms that enable animals to learn large-dimensional problems with so few trials. Consciousness and metacognition, the ability to monitor one’s own thoughts, may be part of the solution to this enigma. Based on literature reviews of the past 20 years, here we propose a computational neuroscience model of metacognition. The model comprises a modular hierarchical reinforcement-learning architecture of parallel and layered, generative-inverse model pairs. In the prefrontal cortex, a distributed executive network called the “cognitive reality monitoring network” (CRMN) orchestrates conscious involvement of generative-inverse model pairs in perception and action. Based on mismatches between computations by generative and inverse models, as well as reward prediction errors, CRMN computes a “responsibility signal” that gates selection and learning of pairs in perception, action, and reinforcement learning. A high responsibility signal is given to the pairs that best capture the external world, that are competent in movements (small mismatch), and that are capable of reinforcement learning (small reward-prediction error). CRMN selects pairs with higher responsibility signals as objects of metacognition, and consciousness is determined by the entropy of responsibility signals across all pairs. This model could lead to new-generation AI, which exhibits metacognition, consciousness, dimension reduction, selection of modules and corresponding representations, and learning from small samples. It may also lead to the development of a new scientific paradigm that enables the causal study of consciousness by combining CRMN and decoded neurofeedback.
43

Liu, Xiuwen, Xinghua Lei, Xin Li, and Sirui Chen. "Self-Interested Coalitional Crowdsensing for Multi-Agent Interactive Environment Monitoring." Sensors 24, no. 2 (January 14, 2024): 509. http://dx.doi.org/10.3390/s24020509.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
As a promising paradigm, mobile crowdsensing (MCS) takes advantage of sensing abilities and cooperates with multi-agent reinforcement learning technologies to provide services for users in large sensing areas, such as smart transportation, environment monitoring, etc. In most cases, strategy training for multi-agent reinforcement learning requires substantial interaction with the sensing environment, which results in unaffordable costs. Thus, environment reconstruction via extraction of the causal effect model from past data is an effective way to smoothly accomplish environment monitoring. However, the sensing environment is often so complex that the observable and unobservable data collected are sparse and heterogeneous, affecting the accuracy of the reconstruction. In this paper, we focus on developing a robust multi-agent environment monitoring framework, called self-interested coalitional crowdsensing for multi-agent interactive environment monitoring (SCC-MIE), including environment reconstruction and worker selection. In SCC-MIE, we start from a multi-agent generative adversarial imitation learning framework to introduce a new self-interested coalitional learning strategy, which forges cooperation between a reconstructor and a discriminator to learn the sensing environment together with the hidden confounder while providing interpretability on the results of environment monitoring. Based on this, we utilize the secretary problem to select suitable workers to collect data for accurate environment monitoring in a real-time manner. It is shown that SCC-MIE realizes a significant performance improvement in environment monitoring compared to the existing models.
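The worker-selection step is described as an instance of the secretary problem; a minimal Python sketch of the classic 1/e stopping rule is below (observe the first n/e candidates without committing, then pick the first later candidate who beats all of them). The worker scores and fallback choice are illustrative assumptions, not the SCC-MIE procedure.

```python
import math
import random

def secretary_select(worker_scores):
    """Classic 1/e rule: skip the first n/e workers to set a benchmark,
    then select the first later worker who beats every observed score."""
    n = len(worker_scores)
    cutoff = max(1, int(n / math.e))
    best_seen = max(worker_scores[:cutoff])
    for i in range(cutoff, n):
        if worker_scores[i] > best_seen:
            return i
    return n - 1  # fall back to the last worker if none beats the benchmark

scores = [random.random() for _ in range(20)]
print(secretary_select(scores))
```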
44

Syarah, Evi, Asdar Asdar, and Mas'ud Muhamadiyah. "Pengaruh Pemberian Penguatan Terhadap Motivasi Belajar Siswa Pada Mata Pelajaran Bahasa Indonesia Kelas V SDN Se-Kecamatan Suppa Kabupaten Pinrang." Bosowa Journal of Education 2, no. 1 (December 24, 2021): 33–39. http://dx.doi.org/10.35965/bje.v2i1.1178.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
This study aims (1) to describe the types of reinforcement given by teachers in the Indonesian language subject in class V of primary schools throughout Suppa District, Pinrang Regency, (2) to describe students' learning motivation in the Indonesian language subject in class V of primary schools in Suppa District, Pinrang Regency, and (3) to describe the effect of providing reinforcement on students' learning motivation in the Indonesian language subject in class V of primary schools in Suppa District, Pinrang Regency. The research method used was quantitative research with an ex post facto (causal-comparative) design. Based on the results of the study, it was concluded that (1) the types of teacher reinforcement in the Indonesian language subject in class V of primary schools in Suppa District, Pinrang Regency were maximally achieved through verbal reinforcement, nonverbal reinforcement and rewards given to students who had completed the learning process set by the teachers; (2) students' learning motivation in the Indonesian language subject in class V of primary schools in Suppa District, Pinrang Regency took the form of intrinsic and extrinsic motivation, with intrinsic motivation consisting of self-motivation through growing self-confidence, and extrinsic motivation consisting of peer support and peer learning together with learning media that support the learning process; and (3) the calculated R of 0.488 exceeded the table R of 0.334, and the significance value of the effect of reinforcement on learning motivation was 0.003, which is smaller than 0.01, so it can be concluded that there is a relationship between the reinforcement given to students and their motivation to learn.
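For readers unfamiliar with the reported test, the kind of comparison described, a Pearson correlation checked against its significance level, can be reproduced in a few lines; the data below are invented purely for illustration and are not the study's data.

```python
from scipy import stats

# Illustrative re-check of the reported kind of test: a Pearson correlation
# between reinforcement scores and learning-motivation scores, with its p-value.
reinforcement = [3, 4, 2, 5, 4, 3, 5, 2, 4, 5]
motivation    = [3, 5, 2, 4, 4, 3, 5, 3, 4, 4]

r, p = stats.pearsonr(reinforcement, motivation)
print(f"r = {r:.3f}, p = {p:.4f}")  # reject the null of no association if p < 0.01
```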
45

Wang, Zhicheng, Biwei Huang, Shikui Tu, Kun Zhang, and Lei Xu. "DeepTrader: A Deep Reinforcement Learning Approach for Risk-Return Balanced Portfolio Management with Market Conditions Embedding." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 1 (May 18, 2021): 643–50. http://dx.doi.org/10.1609/aaai.v35i1.16144.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Most existing reinforcement learning (RL)-based portfolio management models do not take into account the market conditions, which limits their performance in risk-return balancing. In this paper, we propose DeepTrader, a deep RL method to optimize the investment policy. In particular, to tackle the risk-return balancing problem, our model embeds macro market conditions as an indicator to dynamically adjust the proportion between long and short funds, to lower the risk of market fluctuations, with the negative maximum drawdown as the reward function. Additionally, the model involves a unit to evaluate individual assets, which learns dynamic patterns from historical data with the price rising rate as the reward function. Both temporal and spatial dependencies between assets are captured hierarchically by a specific type of graph structure. Particularly, we find that the estimated causal structure best captures the interrelationships between assets, compared to industry classification and correlation. The two units are complementary and integrated to generate a suitable portfolio which fits the market trend well and strikes a balance between return and risk effectively. Experiments on three well-known stock indexes demonstrate the superiority of DeepTrader in terms of risk-gain criteria.
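A minimal sketch of the portfolio-level reward named in the abstract, the negative maximum drawdown of a portfolio value curve; this follows the standard definition and is not DeepTrader's implementation.

```python
import numpy as np

def negative_max_drawdown(portfolio_values):
    """Negative of the maximum peak-to-trough drawdown of a value curve."""
    values = np.asarray(portfolio_values, dtype=float)
    running_peak = np.maximum.accumulate(values)
    drawdowns = (running_peak - values) / running_peak
    return -float(drawdowns.max())

print(negative_max_drawdown([100, 110, 95, 120, 90, 130]))  # -0.25 (drop from 120 to 90)
```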
46

Zhang, Xianjie, Yu Liu, Wenjun Li, and Chen Gong. "Pruning the Communication Bandwidth between Reinforcement Learning Agents through Causal Inference: An Innovative Approach to Designing a Smart Grid Power System." Sensors 22, no. 20 (October 13, 2022): 7785. http://dx.doi.org/10.3390/s22207785.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Electricity demand is increasing significantly, and the traditional power grid is facing huge challenges. As the desired next-generation power grid, the smart grid can provide secure and reliable power generation and consumption, and can also realize coordinated and intelligent power distribution. Coordinating grid power distribution usually requires mutual communication between power distributors. However, the power network is complex, the network nodes are far apart, and communication bandwidth is often expensive. Reducing the communication bandwidth required for the cooperative power distribution task is therefore crucially important. One way to tackle this problem is to build mechanisms that send communications selectively, allowing distributors to transmit information only at certain moments and key states. The distributors in the power grid are modeled as reinforcement learning agents, and the communication bandwidth in the grid can be reduced by optimizing the communication frequency between agents. In this paper, we therefore propose a causal-inference-based model for deciding whether to communicate, the Causal Inference Communication Model (CICM). CICM regards whether to communicate as a binary intervention variable and determines which intervention is more effective by estimating the individual treatment effect (ITE). It yields a communication strategy that decides whether to send information while ensuring task completion. This method effectively reduces the communication frequency between grid distributors while maximizing the power distribution effect. In addition, we test the method in StarCraft II and in 3D habitation-environment experiments, which demonstrate its effectiveness.
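A minimal T-learner-style sketch of the ITE-based decision rule the abstract describes: fit separate outcome models for the "communicate" and "stay silent" interventions, estimate the individual treatment effect for the current state, and communicate only when the estimated benefit is positive. The regressor choice, the threshold, and the synthetic data are assumptions, not CICM's implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_ite_models(states, treatments, outcomes):
    """Fit one outcome model per intervention arm (communicate vs. silent)."""
    m1 = GradientBoostingRegressor().fit(states[treatments == 1], outcomes[treatments == 1])
    m0 = GradientBoostingRegressor().fit(states[treatments == 0], outcomes[treatments == 0])
    return m0, m1

def should_communicate(state, m0, m1, threshold=0.0):
    """Send a message only if the estimated individual treatment effect is positive."""
    ite = m1.predict(state.reshape(1, -1))[0] - m0.predict(state.reshape(1, -1))[0]
    return ite > threshold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                       # synthetic agent states
T = rng.integers(0, 2, size=200)                    # past communicate / silent decisions
y = X[:, 0] + T * (X[:, 1] > 0) + rng.normal(scale=0.1, size=200)  # synthetic task returns
m0, m1 = fit_ite_models(X, T, y)
print(should_communicate(X[0], m0, m1))
```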
47

McMilin, Emily. "Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 17 (March 24, 2024): 18778–88. http://dx.doi.org/10.1609/aaai.v38i17.29842.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Modern language modeling tasks are often underspecified: for a given token prediction, many words may satisfy the user’s intent of producing natural language at inference time; however, only one word will minimize the task’s loss function at training time. We introduce a simple causal mechanism to describe the role underspecification plays in the generation of spurious correlations. Despite its simplicity, our causal model directly informs the development of two lightweight black-box evaluation methods, which we apply to gendered pronoun resolution tasks on a wide range of LLMs to 1) aid in the detection of inference-time task underspecification by exploiting 2) previously unreported gender vs. time and gender vs. location spurious correlations on LLMs with a range of A) sizes: from BERT-base to GPT-3.5, B) pre-training objectives: from masked & autoregressive language modeling to a mixture of these objectives, and C) training stages: from pre-training only to reinforcement learning from human feedback (RLHF). Code and open-source demos available at https://github.com/2dot71mily/uspec.
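A hedged sketch of the kind of lightweight black-box probe the abstract describes, here targeting a gender vs. time correlation with a masked language model via the Hugging Face fill-mask pipeline; the template, the model choice, and the pronoun set are illustrative assumptions, not the authors' evaluation code.

```python
from transformers import pipeline

# Vary a task-irrelevant date in a template and track how the masked-pronoun
# probabilities shift; a year-dependent he/she gap would suggest a spurious
# gender-vs-time correlation.
fill = pipeline("fill-mask", model="bert-base-uncased")

template = "In {year}, the doctor finished the shift and then [MASK] went home."
for year in (1920, 1970, 2020):
    preds = fill(template.format(year=year), targets=["he", "she"])
    scores = {p["token_str"]: round(p["score"], 4) for p in preds}
    print(year, scores)
```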
48

Palacios Garay, Jessica Paola, Jorge Luis Escalante, Juan Carlos Chumacero Calle, Inocenta Marivel Cavarjal Bautista, Segundo Perez-Saavedra, and Jose Nieto-Gamboa. "Impact of Emotional Style on Academic Goals in Pandemic Times." International Journal of Higher Education 9, no. 9 (November 2, 2020): 21. http://dx.doi.org/10.5430/ijhe.v9n9p21.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The objective of the present study was to determine the effect of university students’ emotional style on the dimensions of academic goals (academic goals, learning goals, achievement goals and social reinforcement goals). For this study, 780 students from the fifth and sixth cycle of the Health Sciences School at a private university in Lima were chosen. In this quantitative, substantive study, with a causal-correlational, cross-sectional, non-experimental design, the Emotional Style Questionnaire (ESQ) was administered, and academic goals were measured with the questionnaire of the same name (CMA). The results showed a significant effect of emotional style on the academic goals of university students (72.1%), as the likelihood-ratio test of the logistic model (p < 0.05) indicated a good fit to the data (deviance, p < 0.05).
49

Shen, Lingdong, Chunlei Huo, Nuo Xu, Chaowei Han, and Zichen Wang. "Learn How to See: Collaborative Embodied Learning for Object Detection and Camera Adjusting." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 5 (March 24, 2024): 4793–801. http://dx.doi.org/10.1609/aaai.v38i5.28281.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Passive object detectors, trained on large-scale static datasets, often overlook the feedback from object detection to image acquisition. Embodied vision and active detection mitigate this issue by interacting with the environment. Nevertheless, realizing this activeness hinges on resource-intensive data collection and annotation. To tackle these challenges, we propose a collaborative student-teacher framework. Technically, a replay buffer is built from the trajectory data to encapsulate the relationship between state, action, and reward. In addition, the student network diverges from reinforcement learning by redefining sequential decision pathways using a GPT structure enriched with causal self-attention. Moreover, the teacher network establishes a subtle state-reward mapping based on adjacent benefit differences, providing reliable rewards so the student can adaptively self-tune on the vast unlabeled replay-buffer data. Additionally, an innovative yet straightforward benefit reference value is proposed within the teacher network, adding to its effectiveness and simplicity. Leveraging a flexible replay buffer and embodied collaboration between teacher and student, the framework learns to see before detection, with shallower features and shorter inference steps. Experiments highlight significant advantages of our algorithm over state-of-the-art detectors. The code is released at https://github.com/lydonShen/STF.
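A minimal sketch of the trajectory replay buffer the abstract mentions, storing (state, action, reward) tuples for later sampling; the capacity and uniform sampling scheme are assumptions rather than details of the STF implementation.

```python
import random
from collections import deque

class TrajectoryReplayBuffer:
    """Fixed-capacity buffer of (state, action, reward) tuples with uniform sampling."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward):
        self.buffer.append((state, action, reward))

    def sample(self, batch_size):
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        states, actions, rewards = zip(*batch)
        return list(states), list(actions), list(rewards)

buf = TrajectoryReplayBuffer()
for t in range(100):
    buf.add(state=[t, t + 1], action=t % 4, reward=float(t % 2))
states, actions, rewards = buf.sample(8)
print(len(states), actions[:3])
```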
50

van der Oord, Saskia, and Gail Tripp. "How to Improve Behavioral Parent and Teacher Training for Children with ADHD: Integrating Empirical Research on Learning and Motivation into Treatment." Clinical Child and Family Psychology Review 23, no. 4 (September 24, 2020): 577–604. http://dx.doi.org/10.1007/s10567-020-00327-z.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Attention deficit hyperactivity disorder [ADHD] is one of the most common psychiatric disorders of childhood, with a poor prognosis if not treated effectively. The recommended psychosocial evidence-based treatment for preschool and school-aged children is behavioral parent and teacher training [BPT]. The core elements of BPT are instrumental learning principles, i.e., reinforcement of adaptive behaviors and the ignoring or punishment of non-adaptive behaviors, together with stimulus control techniques. BPT is moderately effective in reducing oppositional behavior and improving parenting practices; however, it does not reduce blinded ratings of ADHD symptoms, and effects dissipate after training ends. This practitioner review proposes steps that can be taken to improve BPT outcomes for ADHD, based on purported causal processes underlying ADHD. The focus is on altered motivational processes (reward and punishment sensitivity), as they closely link to the instrumental processes used in BPT. Following a critical analysis of current behavioral treatments for ADHD, we selectively review motivational reinforcement-based theories of ADHD, including the empirical evidence for the behavioral predictions arising from these theories. This includes consideration of children’s emotional reactions to expected and unexpected outcomes. Next, we translate this evidence into potential ADHD-specific adjustments designed to enhance the immediate and long-term effectiveness of BPT programs in addressing the needs of children with ADHD. This includes the use of remediation strategies for proposed deficits in learning not commonly used in BPT programs and cautions regarding the use of punishment. Finally, we address how these recommendations can be effectively transferred to clinical practice.
