Journal articles on the topic "Factored reinforcement learning"

Follow this link to see other types of publications on the topic: Factored reinforcement learning.

Cite a source in APA, MLA, Chicago, Harvard, and many other styles.

See the top 45 journal articles for research on the topic "Factored reinforcement learning".

Next to each source in the reference list there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic citation of the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication as a .pdf file and read the abstract of the work online if it is included in the metadata.

Browse journal articles from many scientific fields and compile a correct bibliography.

1

Wu, Bo, Yan Peng Feng, and Hong Yan Zheng. "A Model-Based Factored Bayesian Reinforcement Learning Approach". Applied Mechanics and Materials 513-517 (February 2014): 1092–95. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.1092.

Abstract:
Bayesian reinforcement learning has turned out to be an effective solution to the optimal tradeoff between exploration and exploitation. However, in practical applications, the learning parameters with exponential growth are the main impediment for online planning and learning. To overcome this problem, we bring factored representations, model-based learning, and Bayesian reinforcement learning together in a new approach. Firstly, we exploit a factored representation to describe the states to reduce the size of learning parameters, and adopt Bayesian inference method to learn the unknown structure and parameters simultaneously. Then, we use an online point-based value iteration algorithm to plan and learn. The experimental results show that the proposed approach is an effective way for improving the learning efficiency in large-scale state spaces.
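To make the parameter-count argument above concrete, here is a minimal Python sketch of a factored (dynamic Bayesian network-style) transition model in which each state variable depends only on a few parent variables. It illustrates the general idea only, not the authors' implementation; the number of variables, the parent sets, and the toy probabilities are all assumptions.

    import itertools
    import numpy as np

    # Illustrative factored transition model: each binary state variable depends
    # only on a small, hypothetical set of parent variables.
    n_vars, n_actions = 8, 4
    parents = {i: [i, (i + 1) % n_vars] for i in range(n_vars)}  # assumed structure

    # Parameter counts: a flat table needs one distribution per (joint state, action),
    # while the factored model needs one small conditional table (CPT) per variable.
    flat_params = (2 ** n_vars) * n_actions * (2 ** n_vars - 1)
    factored_params = sum(n_actions * (2 ** len(p)) for p in parents.values())
    print(f"flat: {flat_params} parameters, factored: {factored_params} parameters")

    # Toy CPTs with arbitrary probabilities, just to make the sketch runnable.
    rng = np.random.default_rng(0)
    cpts = {
        i: {(vals, a): rng.random()
            for vals in itertools.product([0, 1], repeat=len(par))
            for a in range(n_actions)}
        for i, par in parents.items()
    }

    def sample_next_state(state, action):
        """Sample each next-state variable independently from its own CPT."""
        nxt = np.empty(n_vars, dtype=int)
        for i, par in parents.items():
            nxt[i] = rng.random() < cpts[i][(tuple(int(v) for v in state[par]), action)]
        return nxt

    state = rng.integers(0, 2, size=n_vars)
    print(sample_next_state(state, action=0))

With eight binary variables and four actions, the flat table above has hundreds of thousands of free parameters while the factored model has 128, which is the kind of reduction in learning parameters the abstract refers to.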
2

Li, Chao, Yupeng Zhang, Jianqi Wang, Yujing Hu, Shaokang Dong, Wenbin Li, Tangjie Lv, Changjie Fan, and Yang Gao. "Optimistic Value Instructors for Cooperative Multi-Agent Reinforcement Learning". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 16 (March 24, 2024): 17453–60. http://dx.doi.org/10.1609/aaai.v38i16.29694.

Abstract:
In cooperative multi-agent reinforcement learning, decentralized agents hold the promise of overcoming the combinatorial explosion of joint action space and enabling greater scalability. However, they are susceptible to a game-theoretic pathology called relative overgeneralization (RO) that shadows the optimal joint action. Although recent value-decomposition algorithms guide decentralized agents by learning a factored global action value function, the representational limitation and the inaccurate sampling of optimal joint actions during the learning process leave this problem unresolved. To address this limitation, this paper proposes a novel algorithm called Optimistic Value Instructors (OVI). The main idea behind OVI is to introduce multiple optimistic instructors into the value-decomposition paradigm, which are capable of suggesting potentially optimal joint actions and rectifying the factored global action value function to recover these optimal actions. Specifically, the instructors maintain optimistic value estimations of per-agent local actions and thus eliminate the negative effects caused by other agents' exploratory or sub-optimal non-cooperation, enabling accurate identification and suggestion of optimal joint actions. Based on the instructors' suggestions, the paper further presents two instructive constraints to rectify the factored global action value function to recover these optimal joint actions, thus overcoming the RO problem. Experimental evaluation of OVI on various cooperative multi-agent tasks demonstrates its superior performance against multiple baselines, highlighting its effectiveness.
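For readers unfamiliar with value decomposition, the sketch below illustrates the generic additive factoring of a global action value into per-agent utilities, plus a simple count-based optimistic bonus. It is a hedged illustration of the background idea, not the OVI algorithm; the sizes, the bonus, and the visit counts are made up.

    import numpy as np

    # VDN-style additive value decomposition: the joint action value is the sum of
    # per-agent local utilities, so each agent can act greedily on its own utility.
    n_agents, n_actions = 3, 4
    rng = np.random.default_rng(1)
    local_q = rng.normal(size=(n_agents, n_actions))  # assumed per-agent utilities

    def greedy_joint_action(q):
        """Per-agent argmax; the joint action search is linear in the number of
        agents instead of exponential in the joint action space."""
        return tuple(int(np.argmax(q[i])) for i in range(n_agents))

    # A simple optimistic variant: add a count-based bonus to each local action so
    # that actions undervalued because of teammates' exploration are not discarded.
    visit_counts = np.ones((n_agents, n_actions))  # assumed counts
    optimistic_q = local_q + 1.0 / np.sqrt(visit_counts)

    print("greedy joint action:    ", greedy_joint_action(local_q))
    print("optimistic joint action:", greedy_joint_action(optimistic_q))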
3

Kveton, Branislav, and Georgios Theocharous. "Structured Kernel-Based Reinforcement Learning". Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 30, 2013): 569–75. http://dx.doi.org/10.1609/aaai.v27i1.8669.

Abstract:
Kernel-based reinforcement learning (KBRL) is a popular approach to learning non-parametric value function approximations. In this paper, we present structured KBRL, a paradigm for kernel-based RL that allows for modeling independencies in the transition and reward models of problems. Real-world problems often exhibit this structure and can be solved more efficiently when it is modeled. We make three contributions. First, we motivate our work, define a structured backup operator, and prove that it is a contraction. Second, we show how to evaluate our operator efficiently. Our analysis reveals that the fixed point of the operator is the optimal value function in a special factored MDP. Finally, we evaluate our method on a synthetic problem and compare it to two KBRL baselines. In most experiments, we learn better policies than the baselines from an order of magnitude less training data.
4

Simão, Thiago D., and Matthijs T. J. Spaan. "Safe Policy Improvement with Baseline Bootstrapping in Factored Environments". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4967–74. http://dx.doi.org/10.1609/aaai.v33i01.33014967.

Abstract:
We present a novel safe reinforcement learning algorithm that exploits the factored dynamics of the environment to become less conservative. We focus on problem settings in which a policy is already running and the interaction with the environment is limited. In order to safely deploy an updated policy, it is necessary to provide a confidence level regarding its expected performance. However, algorithms for safe policy improvement might require a large number of past experiences to become confident enough to change the agent’s behavior. Factored reinforcement learning, on the other hand, is known to make good use of the data provided. It can achieve a better sample complexity by exploiting independence between features of the environment, but it lacks a confidence level. We study how to improve the sample efficiency of the safe policy improvement with baseline bootstrapping algorithm by exploiting the factored structure of the environment. Our main result is a theoretical bound that is linear in the number of parameters of the factored representation instead of the number of states. The empirical analysis shows that our method can improve the policy using a number of samples potentially one order of magnitude smaller than the flat algorithm.
5

Truong, Van Binh, and Long Bao Le. "Electric vehicle charging design: The factored action based reinforcement learning approach". Applied Energy 359 (April 2024): 122737. http://dx.doi.org/10.1016/j.apenergy.2024.122737.
6

SIMM, Jaak, Masashi SUGIYAMA, and Hirotaka HACHIYA. "Multi-Task Approach to Reinforcement Learning for Factored-State Markov Decision Problems". IEICE Transactions on Information and Systems E95.D, no. 10 (2012): 2426–37. http://dx.doi.org/10.1587/transinf.e95.d.2426.
7

Wang, Zizhao, Caroline Wang, Xuesu Xiao, Yuke Zhu, and Peter Stone. "Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (March 24, 2024): 15778–86. http://dx.doi.org/10.1609/aaai.v38i14.29507.

Abstract:
Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is to learn state abstractions, which only keep the necessary variables for learning the tasks at hand. This paper introduces Causal Bisimulation Modeling (CBM), a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction. CBM leverages and improves implicit modeling to train a high-fidelity causal dynamics model that can be reused for all tasks in the same environment. Empirical validation on two manipulation environments and four tasks reveals that CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones. Furthermore, the derived state abstractions allow a task learner to achieve near-oracle levels of sample efficiency and outperform baselines on all tasks.
8

Mohamad Hafiz Abu Bakar, Abu Ubaidah bin Shamsudin, Ruzairi Abdul Rahim, Zubair Adil Soomro, and Andi Adrianshah. "Comparison Method Q-Learning and SARSA for Simulation of Drone Controller using Reinforcement Learning". Journal of Advanced Research in Applied Sciences and Engineering Technology 30, no. 3 (May 15, 2023): 69–78. http://dx.doi.org/10.37934/araset.30.3.6978.

Abstract:
Nowadays, the advancement of drones is also factored into the development of a world surrounded by technology. One of the aspects emphasized here is the difficulty of controlling the drone, since the systems developed so far remain under the users' full control. Reinforcement learning is used to enable the system to operate automatically, so the drone learns its next movement from the interaction between the agent and the environment. In this study, Q-Learning and State-Action-Reward-State-Action (SARSA) are used, and the performance and effectiveness of the system under both methods are compared through simulation. A comparison of Q-Learning- and SARSA-based systems in an autonomous drone application was performed for this evaluation. The simulation results show that Q-Learning is more effective at training the system to achieve the desired behavior than the SARSA algorithm for the drone controller.
9

Kong, Minseok, and Jungmin So. "Empirical Analysis of Automated Stock Trading Using Deep Reinforcement Learning". Applied Sciences 13, no. 1 (January 3, 2023): 633. http://dx.doi.org/10.3390/app13010633.

Abstract:
There are several automated stock trading programs using reinforcement learning, one of which is an ensemble strategy. The main idea of the ensemble strategy is to train DRL agents and make an ensemble with three different actor–critic algorithms: Advantage Actor–Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO). This novel idea was the concept mainly used in this paper. However, we did not stop there, but we refined the automated stock trading in two areas. First, we made another DRL-based ensemble and employed it as a new trading agent. We named it Remake Ensemble, and it combines not only A2C, DDPG, and PPO but also Actor–Critic using Kronecker-Factored Trust Region (ACKTR), Soft Actor–Critic (SAC), Twin Delayed DDPG (TD3), and Trust Region Policy Optimization (TRPO). Furthermore, we expanded the application domain of automated stock trading. Although the existing stock trading method treats only 30 Dow Jones stocks, ours handles KOSPI stocks, JPX stocks, and Dow Jones stocks. We conducted experiments with our modified automated stock trading system to validate its robustness in terms of cumulative return. Finally, we suggested some methods to gain relatively stable profits following the experiments.
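The ensemble idea described above can be illustrated with a short, self-contained sketch: several trained policies each propose a portfolio allocation and the ensemble averages them. The `policies` list, their call signature, and the renormalisation step are assumptions made for this illustration only, not the paper's code.

    import numpy as np

    def ensemble_action(policies, observation, weights=None):
        """Average the portfolio allocations proposed by several trained policies,
        then renormalise so the weights are non-negative and sum to one."""
        actions = np.stack([policy(observation) for policy in policies])
        avg = np.clip(np.average(actions, axis=0, weights=weights), 0.0, None)
        total = avg.sum()
        return avg / total if total > 0 else np.full_like(avg, 1.0 / avg.size)

    # Toy stand-ins for trained agents: each maps an observation to fixed weights.
    rng = np.random.default_rng(2)
    dummy_policies = [lambda obs, w=rng.random(5): w / w.sum() for _ in range(3)]
    print(ensemble_action(dummy_policies, observation=None))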
10

Mutti, Mirco, Riccardo De Santi, Emanuele Rossi, Juan Felipe Calderon, Michael Bronstein, and Marcello Restelli. "Provably Efficient Causal Model-Based Reinforcement Learning for Systematic Generalization". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9251–59. http://dx.doi.org/10.1609/aaai.v37i8.26109.

Abstract:
In the sequential decision making setting, an agent aims to achieve systematic generalization over a large, possibly infinite, set of environments. Such environments are modeled as discrete Markov decision processes with both states and actions represented through a feature vector. The underlying structure of the environments allows the transition dynamics to be factored into two components: one that is environment-specific and another that is shared. Consider a set of environments that share the laws of motion as an example. In this setting, the agent can take a finite amount of reward-free interactions from a subset of these environments. The agent then must be able to approximately solve any planning task defined over any environment in the original set, relying on the above interactions only. Can we design a provably efficient algorithm that achieves this ambitious goal of systematic generalization? In this paper, we give a partially positive answer to this question. First, we provide a tractable formulation of systematic generalization by employing a causal viewpoint. Then, under specific structural assumptions, we provide a simple learning algorithm that guarantees any desired planning error up to an unavoidable sub-optimality term, while showcasing a polynomial sample complexity.
11

Sui, Dong, Chenyu Ma, and Chunjie Wei. "Tactical Conflict Solver Assisting Air Traffic Controllers Using Deep Reinforcement Learning". Aerospace 10, no. 2 (February 15, 2023): 182. http://dx.doi.org/10.3390/aerospace10020182.

Abstract:
To assist air traffic controllers (ATCOs) in resolving tactical conflicts, this paper proposes a conflict detection and resolution mechanism for handling continuous traffic flow by adopting finite discrete actions to resolve conflicts. The tactical conflict solver (TCS) was developed based on deep reinforcement learning (DRL) to train a TCS agent with the actor–critic using a Kronecker-factored trust region. The agent’s actions are determined by the ATCOs’ instructions, such as altitude, speed, and heading adjustments. The reward function is designed in accordance with air traffic control regulations. Considering the uncertainty in a real-life situation, this study characterised the deviation of the aircraft’s estimated position to improve the feasibility of conflict resolution schemes. A DRL environment was developed with the actual airspace structure and traffic density of the air traffic operation simulation system. Results show that for 1000 test samples, the trained TCS could resolve 87.1% of the samples. The conflict resolution rate decreased slightly to 81.2% when the airspace density was increased by a factor of 1.4. This research can be applied to intelligent decision-making systems for air traffic control.
12

Hao, Zheng, Haowei Zhang, and Yipu Zhang. "Stock Portfolio Management by Using Fuzzy Ensemble Deep Reinforcement Learning Algorithm". Journal of Risk and Financial Management 16, no. 3 (March 15, 2023): 201. http://dx.doi.org/10.3390/jrfm16030201.

Abstract:
The research objective of this article is to train a computer (agent) with market information data so it can learn trading strategies and beat the market index in stock trading without having to make any prediction on market moves. The approach assumes no trading knowledge, so the agent will only learn from conducting trading with historical data. In this work, we address this task by considering Reinforcement Learning (RL) algorithms for stock portfolio management. We first generate a three-dimensional fuzzy vector to describe the current trend for each stock. Then the fuzzy terms, along with other stock market features, such as prices, volumes, and technical indicators, were used as the input for five algorithms, including Advantage Actor-Critic, Trust Region Policy Optimization, Proximal Policy Optimization, Actor-Critic Using Kronecker Factored Trust Region, and Deep Deterministic Policy Gradient. An average ensemble method was applied to obtain trading actions. We set SP100 component stocks as the portfolio pool and used 11 years of daily data to train the model and simulate the trading. Our method demonstrated better performance than the two benchmark methods and each individual algorithm without fuzzy extension. In practice, real market traders could use the trained model to make inferences and conduct trading, then retrain the model once in a while since training such models is time-consuming but making inferences is nearly instantaneous.
13

Chu, Yunfei, Zhinong Wei, Guoqiang Sun, Haixiang Zang, Sheng Chen, and Yizhou Zhou. "Optimal home energy management strategy: A reinforcement learning method with actor-critic using Kronecker-factored trust region". Electric Power Systems Research 212 (November 2022): 108617. http://dx.doi.org/10.1016/j.epsr.2022.108617.
14

Abdulhai, Marwa, Dong-Ki Kim, Matthew Riemer, Miao Liu, Gerald Tesauro, and Jonathan P. How. "Context-Specific Representation Abstraction for Deep Option Learning". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (June 28, 2022): 5959–67. http://dx.doi.org/10.1609/aaai.v36i6.20541.

Abstract:
Hierarchical reinforcement learning has focused on discovering temporally extended actions, such as options, that can provide benefits in problems requiring extensive exploration. One promising approach that learns these options end-to-end is the option-critic (OC) framework. We examine and show in this paper that OC does not decompose a problem into simpler sub-problems, but instead increases the size of the search over policy space with each option considering the entire state space during learning. This issue can result in practical limitations of this method, including sample inefficient learning. To address this problem, we introduce Context-Specific Representation Abstraction for Deep Option Learning (CRADOL), a new framework that considers both temporal abstraction and context-specific representation abstraction to effectively reduce the size of the search over policy space. Specifically, our method learns a factored belief state representation that enables each option to learn a policy over only a subsection of the state space. We test our method against hierarchical, non-hierarchical, and modular recurrent neural network baselines, demonstrating significant sample efficiency improvements in challenging partially observable environments.
15

Li, Hengjie, Jianghao Zhu, Yun Zhou, Qi Feng, and Donghan Feng. "Charging Station Management Strategy for Returns Maximization via Improved TD3 Deep Reinforcement Learning". International Transactions on Electrical Energy Systems 2022 (December 15, 2022): 1–14. http://dx.doi.org/10.1155/2022/6854620.

Abstract:
Maximizing the return on electric vehicle charging station (EVCS) operation helps to expand the EVCS, thus expanding the EV (electric vehicle) stock and better addressing climate change. However, in the face of dynamic regulation scenarios with large data, multiple variables, and low time scales, the existing regulation strategies aiming at maximizing EVCS returns many times fail to meet the demand. To handle increasingly complex regulation scenarios, a deep reinforcement learning algorithm (DRL) based on the improved twin delayed deep deterministic policy gradient (TD3) is used to construct basic energy management strategies in this paper. To enable the strategy to be more suitable for the goal of real-time energy regulation strategy, we used Thompson sampling strategy to improve TD3’s exploration noise sampling strategy, which greatly accelerated the initial convergence of TD3 during training. Also, we use marginalised importance sampling to calculate the Q-return function for TD3, which ensures that the constructed strategies are more likely to learn high-value experiences while having higher robustness. It is shown in numerical experiments that the charging station management strategy (CSMS) based on the modified TD3 obtains the fastest convergence speed and the highest robustness and achieves the largest operational returns compared to the CSMS constructed using deep deterministic policy gradient (DDPG), actor-critic using Kronecker-factored trust region (ACKTR), trust region policy optimization (TRPO), proximal policy optimization (PPO), soft actor-critic (SAC), and the original TD3.
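One plausible reading of the Thompson-sampling improvement described above is to treat each candidate exploration-noise scale as an arm with a Beta posterior over "did this episode improve the return". The sketch below is a hedged illustration under that assumption; the candidate scales and the Bernoulli reward model are not taken from the paper.

    import numpy as np

    noise_scales = [0.05, 0.1, 0.2, 0.4]   # assumed candidate exploration-noise scales
    alpha = np.ones(len(noise_scales))     # Beta posterior: successes + 1
    beta = np.ones(len(noise_scales))      # Beta posterior: failures + 1
    rng = np.random.default_rng(3)

    def pick_noise_scale():
        """Thompson sampling: draw a success probability per scale, pick the best draw."""
        return int(np.argmax(rng.beta(alpha, beta)))

    def update(idx, improved):
        """Treat 'the episode return improved' as a Bernoulli reward for the chosen scale."""
        if improved:
            alpha[idx] += 1.0
        else:
            beta[idx] += 1.0

    # Toy loop in which the third scale is secretly the best.
    for _ in range(200):
        idx = pick_noise_scale()
        update(idx, improved=rng.random() < (0.7 if idx == 2 else 0.4))
    print("posterior means:", np.round(alpha / (alpha + beta), 2))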
16

Gavane, Vaibhav. "A Measure of Real-Time Intelligence". Journal of Artificial General Intelligence 4, no. 1 (March 1, 2013): 31–48. http://dx.doi.org/10.2478/jagi-2013-0003.

Abstract:
We propose a new measure of intelligence for general reinforcement learning agents, based on the notion that an agent’s environment can change at any step of execution of the agent. That is, an agent is considered to be interacting with its environment in real-time. In this sense, the resulting intelligence measure is more general than the universal intelligence measure (Legg and Hutter, 2007) and the anytime universal intelligence test (Hernández-Orallo and Dowe, 2010). A major advantage of the measure is that an agent’s computational complexity is factored into the measure in a natural manner. We show that there exist agents with intelligence arbitrarily close to the theoretical maximum, and that the intelligence of agents depends on their parallel processing capability. We thus believe that the measure can provide a better evaluation of agents and guidance for building practical agents with high intelligence.
17

Yedukondalu, Gangolu, Yasmeen Yasmeen, G. Vinoda Reddy, Ravindra Changala, Mahesh Kotha, Adapa Gopi, and Annapurna Gummadi. "Framework for Virtualized Network Functions (VNFs) in Cloud of Things Based on Network Traffic Services". International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 11s (October 7, 2023): 38–48. http://dx.doi.org/10.17762/ijritcc.v11i11s.8068.

Abstract:
The cloud of things (CoT), which combines the Internet of Things (IoT) and cloud computing, may offer Virtualized Network Functions (VNFs) for IoT devices on a dynamic basis based on service-specific requirements. Although the provisioning of VNFs in CoT is described as an online decision-making problem, most widely used techniques primarily focus on defining the environment using simple models in order to discover the optimum solution. This leads to inefficient and coarse-grained provisioning since the Quality of Service (QoS) requirements for different types of CoT services are not considered, and important historical experience on how to provide for the best long-term benefits is disregarded. This paper suggests a methodology for providing VNFs intelligently in order to schedule adaptive CoT resources in line with the detection of traffic from diverse network services. The system makes decisions based on Deep Reinforcement Learning (DRL) based models that take into account the complexity of network configurations and traffic changes. To obtain stable performance in this model, a special surrogate objective function and a policy gradient DRL method known as Policy Optimisation using Kronecker-Factored Trust Region (POKTR) are utilised. The assertion that our strategy improves CoT QoS through real-time VNF provisioning is supported by experimental results. The POKTR algorithm-based DRL-based model maximises throughput while minimising network congestion compared to earlier DRL algorithms.
18

Li, Guangliang, Randy Gomez, Keisuke Nakamura, and Bo He. "Human-Centered Reinforcement Learning: A Survey". IEEE Transactions on Human-Machine Systems 49, no. 4 (August 2019): 337–49. http://dx.doi.org/10.1109/thms.2019.2912447.
19

Li, Zhuoran, Chao Zeng, Zhen Deng, Qinling Xu, Bingwei He, and Jianwei Zhang. "Learning Variable Impedance Control for Robotic Massage With Deep Reinforcement Learning: A Novel Learning Framework". IEEE Systems, Man, and Cybernetics Magazine 10, no. 1 (January 2024): 17–27. http://dx.doi.org/10.1109/msmc.2022.3231416.
20

White, Jack, Tatiana Kameneva, and Chris McCarthy. "Vision Processing for Assistive Vision: A Deep Reinforcement Learning Approach". IEEE Transactions on Human-Machine Systems 52, no. 1 (February 2022): 123–33. http://dx.doi.org/10.1109/thms.2021.3121661.
21

Chihara, Takanori, and Jiro Sakamoto. "Generating deceleration behavior of automatic driving by reinforcement learning that reflects passenger discomfort". International Journal of Industrial Ergonomics 91 (September 2022): 103343. http://dx.doi.org/10.1016/j.ergon.2022.103343.
22

Wang, Zhe, Helai Huang, Jinjun Tang, Xianwei Meng, and Lipeng Hu. "Velocity control in car-following behavior with autonomous vehicles using reinforcement learning". Accident Analysis & Prevention 174 (September 2022): 106729. http://dx.doi.org/10.1016/j.aap.2022.106729.
23

Salehi, V., T. T. Tran, B. Veitch, and D. Smith. "A reinforcement learning development of the FRAM for functional reward-based assessments of complex systems performance". International Journal of Industrial Ergonomics 88 (March 2022): 103271. http://dx.doi.org/10.1016/j.ergon.2022.103271.
24

Matarese, Marco, Alessandra Sciutti, Francesco Rea, and Silvia Rossi. "Toward Robots’ Behavioral Transparency of Temporal Difference Reinforcement Learning With a Human Teacher". IEEE Transactions on Human-Machine Systems 51, no. 6 (December 2021): 578–89. http://dx.doi.org/10.1109/thms.2021.3116119.
25

Roy, Ananya, Moinul Hossain, and Yasunori Muromachi. "A deep reinforcement learning-based intelligent intervention framework for real-time proactive road safety management". Accident Analysis & Prevention 165 (February 2022): 106512. http://dx.doi.org/10.1016/j.aap.2021.106512.
26

Gong, Yaobang, Mohamed Abdel-Aty, Jinghui Yuan, and Qing Cai. "Multi-Objective reinforcement learning approach for improving safety at intersections with adaptive traffic signal control". Accident Analysis & Prevention 144 (September 2020): 105655. http://dx.doi.org/10.1016/j.aap.2020.105655.
27

Yang, Kui, Mohammed Quddus, and Constantinos Antoniou. "Developing a new real-time traffic safety management framework for urban expressways utilizing reinforcement learning tree". Accident Analysis & Prevention 178 (December 2022): 106848. http://dx.doi.org/10.1016/j.aap.2022.106848.
28

Qin, ShuJin, ZhiLiang Bi, Jiacun Wang, Shixin Liu, XiWang Guo, Ziyan Zhao, and Liang Qi. "Value-Based Reinforcement Learning for Selective Disassembly Sequence Optimization Problems: Demonstrating and Comparing a Proposed Model". IEEE Systems, Man, and Cybernetics Magazine 10, no. 2 (April 2024): 24–31. http://dx.doi.org/10.1109/msmc.2023.3303615.
29

Yan, Longhao, Ping Wang, Fan Qi, Zhuohang Xu, Ronghui Zhang, and Yu Han. "A task-level emergency experience reuse method for freeway accidents onsite disposal with policy distilled reinforcement learning". Accident Analysis & Prevention 190 (September 2023): 107179. http://dx.doi.org/10.1016/j.aap.2023.107179.
30

Nasernejad, Payam, Tarek Sayed, and Rushdi Alsaleh. "Modeling pedestrian behavior in pedestrian-vehicle near misses: A continuous Gaussian Process Inverse Reinforcement Learning (GP-IRL) approach". Accident Analysis & Prevention 161 (October 2021): 106355. http://dx.doi.org/10.1016/j.aap.2021.106355.
31

Guo, Hongyu, Kun Xie, and Mehdi Keyvan-Ekbatani. "Modeling driver’s evasive behavior during safety–critical lane changes: Two-dimensional time-to-collision and deep reinforcement learning". Accident Analysis & Prevention 186 (June 2023): 107063. http://dx.doi.org/10.1016/j.aap.2023.107063.
32

Jin, Jieling, Ye Li, Helai Huang, Yuxuan Dong, and Pan Liu. "A variable speed limit control approach for freeway tunnels based on the model-based reinforcement learning framework with safety perception". Accident Analysis & Prevention 201 (June 2024): 107570. http://dx.doi.org/10.1016/j.aap.2024.107570.
33

Vandaele, Mathilde, and Sanna Stålhammar. "“Hope dies, action begins?” The role of hope for proactive sustainability engagement among university students". International Journal of Sustainability in Higher Education 23, no. 8 (August 25, 2022): 272–89. http://dx.doi.org/10.1108/ijshe-11-2021-0463.

Abstract:
Purpose Education in sustainability science is largely ignorant of the implications of the environmental crisis on inner dimensions, including mindsets, beliefs, values and worldviews. Increased awareness of the acuteness and severity of the environmental and climate crisis has caused a contemporary spread of hopelessness among younger generations. This calls for a better understanding of potential generative forces of hope in the face of climate change. This paper aims to uncover strategies for fostering constructive hope among students. Design/methodology/approach This study examines, through qualitative interviews, the characteristics of constructive hope amongst proactive students enrolled in university programs related to global environmental challenges. Constructive hope describes a form of hope leading to sustained emotional stability and proactive engagement through both individual and collective actions. Findings The findings are presented according to four characteristics of constructive hope: goal, pathway thinking, agency thinking and emotional reinforcement. This shows how students perceive the importance of: collaboratively constructing and empowering locally grounded objectives; reinforcing trust in the collective potential and external actors; raising students’ perceived self-efficacy through practical applications; teaching different coping strategies related to the emotional consequences of education on students’ well-being. Originality/value We outline practical recommendations for educational environments to encourage and develop constructive hope at multiple levels of university education, including structures, programs, courses and among students’ interactions. We call for practitioners to connect theoretical learning and curriculum content with practice, provide space for emotional expressions, release the pressure from climate anxiety, and to foster a stronger sense of community among students.
34

Zhang, Gongquan, Fangrong Chang, Jieling Jin, Fan Yang, and Helai Huang. "Multi-objective deep reinforcement learning approach for adaptive traffic signal control system with concurrent optimization of safety, efficiency, and decarbonization at intersections". Accident Analysis & Prevention 199 (May 2024): 107451. http://dx.doi.org/10.1016/j.aap.2023.107451.
35

Hoffmann, Patrick, Kirill Gorelik, and Valentin Ivanov. "Comparison of Reinforcement Learning and Model Predictive Control for Automated Generation of Optimal Control for Dynamic Systems within a Design Space Exploration Framework". International Journal of Automotive Engineering 15, no. 1 (2024): 19–26. http://dx.doi.org/10.20485/jsaeijae.15.1_19.
36

Wu, Bo, Yanpeng Feng, and Hongyan Zheng. "Model-based Bayesian Reinforcement Learning in Factored Markov Decision Process". Journal of Computers 9, no. 4 (April 1, 2014). http://dx.doi.org/10.4304/jcp.9.4.845-850.
37

Xu, Jianyu, Bin Liu, Xiujie Zhao, and Xiao-Lin Wang. "Online reinforcement learning for condition-based group maintenance using factored Markov decision processes". European Journal of Operational Research, November 2023. http://dx.doi.org/10.1016/j.ejor.2023.11.039.
38

Amato, Christopher, and Frans Oliehoek. "Scalable Planning and Learning for Multiagent POMDPs". Proceedings of the AAAI Conference on Artificial Intelligence 29, no. 1 (February 18, 2015). http://dx.doi.org/10.1609/aaai.v29i1.9439.

Abstract:
Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation space grows exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems.
39

Street, Charlie, Masoumeh Mansouri, and Bruno Lacerda. "Formal Modelling for Multi-Robot Systems Under Uncertainty". Current Robotics Reports, August 15, 2023. http://dx.doi.org/10.1007/s43154-023-00104-0.

Abstract:
Purpose of Review To effectively synthesise and analyse multi-robot behaviour, we require formal task-level models which accurately capture multi-robot execution. In this paper, we review modelling formalisms for multi-robot systems under uncertainty and discuss how they can be used for planning, reinforcement learning, model checking, and simulation. Recent Findings Recent work has investigated models which more accurately capture multi-robot execution by considering different forms of uncertainty, such as temporal uncertainty and partial observability, and modelling the effects of robot interactions on action execution. Other strands of work have presented approaches for reducing the size of multi-robot models to admit more efficient solution methods. This can be achieved by decoupling the robots under independence assumptions or reasoning over higher-level macro actions. Summary Existing multi-robot models demonstrate a trade-off between accurately capturing robot dependencies and uncertainty, and being small enough to tractably solve real-world problems. Therefore, future research should exploit realistic assumptions over multi-robot behaviour to develop smaller models which retain accurate representations of uncertainty and robot interactions; and exploit the structure of multi-robot problems, such as factored state spaces, to develop scalable solution methods.
40

Xie, Ziyang, Lu Lu, Hanwen Wang, Bingyi Su, Yunan Liu, and Xu Xu. "Improving Workers’ Musculoskeletal Health During Human-Robot Collaboration Through Reinforcement Learning". Human Factors: The Journal of the Human Factors and Ergonomics Society, May 22, 2023, 001872082311775. http://dx.doi.org/10.1177/00187208231177574.

Abstract:
Objective This study aims to improve workers’ postures and thus reduce the risk of musculoskeletal disorders in human-robot collaboration by developing a novel model-free reinforcement learning method. Background Human-robot collaboration has been a flourishing work configuration in recent years. Yet, it could lead to work-related musculoskeletal disorders if the collaborative tasks result in awkward postures for workers. Methods The proposed approach follows two steps: first, a 3D human skeleton reconstruction method was adopted to calculate workers’ continuous awkward posture (CAP) score; second, an online gradient-based reinforcement learning algorithm was designed to dynamically improve workers’ CAP score by adjusting the positions and orientations of the robot end effector. Results In an empirical experiment, the proposed approach can significantly improve the CAP scores of the participants during a human-robot collaboration task when compared with the scenarios where robot and participants worked together at a fixed position or at the individual elbow height. The questionnaire outcomes also showed that the working posture resulted from the proposed approach was preferred by the participants. Conclusion The proposed model-free reinforcement learning method can learn the optimal worker postures without the need for specific biomechanical models. The data-driven nature of this method can make it adaptive to provide personalized optimal work posture. Application The proposed method can be applied to improve the occupational safety in robot-implemented factories. Specifically, the personalized robot working positions and orientations can proactively reduce exposure to awkward postures that increase the risk of musculoskeletal disorders. The algorithm can also reactively protect workers by reducing the workload in specific joints.
41

Rigoli, Lillian, Gaurav Patil, Patrick Nalepka, Rachel W. Kallen, Simon Hosking, Christopher Best, and Michael J. Richardson. "A Comparison of Dynamical Perceptual-Motor Primitives and Deep Reinforcement Learning for Human-Artificial Agent Training Systems". Journal of Cognitive Engineering and Decision Making, April 25, 2022, 155534342210929. http://dx.doi.org/10.1177/15553434221092930.

Abstract:
Effective team performance often requires that individuals engage in team training exercises. However, organizing team-training scenarios presents economic and logistical challenges and can be prone to trainer bias and fatigue. Accordingly, a growing body of research is investigating the effectiveness of employing artificial agents (AAs) as synthetic teammates in team training simulations, and, relatedly, how to best develop AAs capable of robust, human-like behavioral interaction. Motivated by these challenges, the current study examined whether task dynamical models of expert human herding behavior could be embedded in the control architecture of AAs to train novice actors to perform a complex multiagent herding task. Training outcomes were compared to human-expert trainers, novice baseline performance, and AAs developed using deep reinforcement learning (DRL). Participants’ subjective preferences for the AAs developed using DRL or dynamical models of human performance were also investigated. The results revealed that AAs controlled by dynamical models of human expert performance could train novice actors at levels equivalent to expert human trainers and were also preferred over AAs developed using DRL. The implications for the development of AAs for robust human-AA interaction and training are discussed, including the potential benefits of employing hybrid Dynamical-DRL techniques for AA development.
42

Fragkos, Georgios, Jay Johnson, and Eirini Eleni Tsiropoulou. "Dynamic Role-Based Access Control Policy for Smart Grid Applications: An Offline Deep Reinforcement Learning Approach". IEEE Transactions on Human-Machine Systems, 2022, 1–13. http://dx.doi.org/10.1109/thms.2022.3163185.
43

Sun, Yuxiang, Bo Yuan, Qi Xiang, Jiawei Zhou, Jiahui Yu, Di Dai, and Xianzhong Zhou. "Intelligent Decision-Making and Human Language Communication Based on Deep Reinforcement Learning in a Wargame Environment". IEEE Transactions on Human-Machine Systems, 2022, 1–14. http://dx.doi.org/10.1109/thms.2022.3225867.
44

Jokinen, Jussi P. P., Tuomo Kujala, and Antti Oulasvirta. "Multitasking in Driving as Optimal Adaptation Under Uncertainty". Human Factors: The Journal of the Human Factors and Ergonomics Society, July 30, 2020, 001872082092768. http://dx.doi.org/10.1177/0018720820927687.

Abstract:
Objective The objective was to better understand how people adapt multitasking behavior when circumstances in driving change and how safe versus unsafe behaviors emerge. Background Multitasking strategies in driving adapt to changes in the task environment, but the cognitive mechanisms of this adaptation are not well known. Missing is a unifying account to explain the joint contribution of task constraints, goals, cognitive capabilities, and beliefs about the driving environment. Method We model the driver’s decision to deploy visual attention as a stochastic sequential decision-making problem and propose hierarchical reinforcement learning as a computationally tractable solution to it. The supervisory level deploys attention based on per-task value estimates, which incorporate beliefs about risk. Model simulations are compared against human data collected in a driving simulator. Results Human data show adaptation to the attentional demands of ongoing tasks, as measured in lane deviation and in-car gaze deployment. The predictions of our model fit the human data on these metrics. Conclusion Multitasking strategies can be understood as optimal adaptation under uncertainty, wherein the driver adapts to cognitive constraints and the task environment’s uncertainties, aiming to maximize the expected long-term utility. Safe and unsafe behaviors emerge as the driver has to arbitrate between conflicting goals and manage uncertainty about them. Application Simulations can inform studies of conditions that are likely to give rise to unsafe driving behavior.
45

Ferrão, Maria Eugénia, and Cristiano Fernandes. "O efeito-escola e a mudança - dá para mudar? Evidências da investigação Brasileira". REICE. Revista Iberoamericana sobre Calidad, Eficacia y Cambio en Educación 1, no. 1 (July 2, 2016). http://dx.doi.org/10.15366/reice2003.1.1.005.

Abstract:
O objectivo principal do presente trabalho é apresentar uma resenha de investigações empíricas realizadas sobre o efeito-escola no Brasil ao longo dos últimos 7 anos, bem como enunciar os principais factores, escolares e familiares, associados aos resultados escolares e que são passíveis de mudança no curto-médio prazo, sobretudo no que se refere à educação dos alunos com déficit educacional sistemático. O texto aponta pistas de como a actuação da escola e da família pode ser corrigida e articulada de modo a produzir a melhoria efectiva nos resultados escolares. É enfatizada a importância do acompanhamento de resultados e recuperação atempada dos alunos em risco de repetência, da necessidade de reforço educativo nas turmas com maior proporção de alunos repetentes (para que os conteúdos programáticos possam ser integralmente cumpridos), a importância de que a implementação das políticas para a correcção do desfasamento idade-série seja acompanhada de medidas complementares e estruturantes que confiram estabilidade ao sistema, a importância de que as famílias usem o seu capital social a favor da educação.

Descritores: eficácia escolar - efeito-escola - modelos multinível - ensino básico - capital social

The school effect and change - is change possible? Evidence based on Brazilian research

The main objective of this paper is to present a review of empirical research conducted on the school effect in Brazil during the last 7 years, and to illustrate some of the factors, both school and family, related to the students performance at school. These factors can promote change in the short or mid term mainly in the education of students with systematic educational deficiencies. Repetition causes the discouragement of students, which is a risk factor associated with school evasion. The paper explores the action of the school and family towards producing improvement in student performance. It emphasizes four features: the monitoring of learning as an early way of diagnosing students at risk of repetition, the need for educational reinforcement in classes where the proportion of repeaters is high, the importance of implementing policies for age-grade correction together with complementary and structural measures which guarantee the stability of the educational system, and the importance that families use their social capital in favour of their children's education.

Keywords: school effectiveness - school effect - multilevel model - compulsory education - social capital
