Journal articles on the topic "Reinforcement Learning"

Cite a source in the APA, MLA, Chicago, Harvard, and many other styles.

Below are the top 50 journal articles for research on the topic "Reinforcement Learning".

Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the abstract of the work online, when it is available in the metadata.

Browse journal articles from many scientific areas and compile an accurate bibliography.

1. Deora, Merin, and Sumit Mathur. "Reinforcement Learning". IJARCCE 6, no. 4 (April 30, 2017): 178–81. http://dx.doi.org/10.17148/ijarcce.2017.6433.

2. Barto, Andrew G. "Reinforcement Learning". IFAC Proceedings Volumes 31, no. 29 (October 1998): 5. http://dx.doi.org/10.1016/s1474-6670(17)38315-5.

3. Woergoetter, Florentin, and Bernd Porr. "Reinforcement learning". Scholarpedia 3, no. 3 (2008): 1448. http://dx.doi.org/10.4249/scholarpedia.1448.

4. Moore, Brett L., Anthony G. Doufas, and Larry D. Pyeatt. "Reinforcement Learning". Anesthesia & Analgesia 112, no. 2 (February 2011): 360–67. http://dx.doi.org/10.1213/ane.0b013e31820334a7.

5. Likas, Aristidis. "A Reinforcement Learning Approach to Online Clustering". Neural Computation 11, no. 8 (November 1, 1999): 1915–32. http://dx.doi.org/10.1162/089976699300016025.

Abstract:
A general technique is proposed for embedding online clustering algorithms based on competitive learning in a reinforcement learning framework. The basic idea is that the clustering system can be viewed as a reinforcement learning system that learns through reinforcements to follow the clustering strategy we wish to implement. In this sense, the reinforcement guided competitive learning (RGCL) algorithm is proposed that constitutes a reinforcement-based adaptation of learning vector quantization (LVQ) with enhanced clustering capabilities. In addition, we suggest extensions of RGCL and LVQ that are characterized by the property of sustained exploration and significantly improve the performance of those algorithms, as indicated by experimental tests on well-known data sets.
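A minimal sketch of the idea in code, under assumptions of mine: the stochastic winner rule, the reward definition, and the learning rate below are illustrative placeholders rather than the paper's exact RGCL equations. The point is the LVQ-style prototype update whose direction is scaled by a reinforcement signal.

    import numpy as np

    def rgcl_step(prototypes, x, lr=0.05, rng=np.random):
        """One reinforcement-guided competitive learning update (illustrative).
        The winner moves toward the sample on positive reinforcement and away
        from it on negative reinforcement, an LVQ-style rule."""
        dists = np.linalg.norm(prototypes - x, axis=1)
        probs = np.exp(-dists) / np.exp(-dists).sum()   # closer units win more often
        winner = rng.choice(len(prototypes), p=probs)
        # Illustrative reward: +1 if the stochastic winner is also the nearest
        # prototype (the clustering strategy we want to reinforce), else -1.
        reward = 1.0 if winner == int(dists.argmin()) else -1.0
        prototypes[winner] += lr * reward * (x - prototypes[winner])
        return prototypes

    # Usage: two clusters of 2-D points.
    rng = np.random.default_rng(0)
    data = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
    protos = rng.normal(1.5, 1.0, (2, 2))
    for x in data:
        protos = rgcl_step(protos, x, rng=rng)
    print(protos)   # prototypes drift toward the two cluster centers
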
6. Liaq, Mudassar, and Yungcheol Byun. "Autonomous UAV Navigation Using Reinforcement Learning". International Journal of Machine Learning and Computing 9, no. 6 (December 2019): 756–61. http://dx.doi.org/10.18178/ijmlc.2019.9.6.869.

7. Alrammal, Muath, and Munir Naveed. "Monte-Carlo Based Reinforcement Learning (MCRL)". International Journal of Machine Learning and Computing 10, no. 2 (February 2020): 227–32. http://dx.doi.org/10.18178/ijmlc.2020.10.2.924.

8. Nurmuhammet, Abdullayev. "DEEP REINFORCEMENT LEARNING ON STOCK DATA". Alatoo Academic Studies 23, no. 2 (June 30, 2023): 505–18. http://dx.doi.org/10.17015/aas.2023.232.49.

Abstract:
This study proposes using Deep Reinforcement Learning (DRL) for stock trading decisions and prediction. DRL is a machine learning technique that enables agents to learn optimal strategies by interacting with their environment. The proposed model surpasses traditional models and can make informed trading decisions in real time. The study highlights the feasibility of applying DRL in financial markets and its advantages in strategic decision-making. The model's ability to learn from market dynamics makes it a promising approach for stock market forecasting. Overall, this paper provides valuable insights into the use of DRL for stock trading decisions and prediction, establishing a strong case for its adoption in financial markets.
Keywords: reinforcement learning, stock market, deep reinforcement learning.
9. Mardhatillah, Elsy. "Teacher’s Reinforcement in English Classroom in MTSS Darul Makmur Sungai Cubadak". Indonesian Research Journal On Education 3, no. 1 (January 2, 2022): 825–32. http://dx.doi.org/10.31004/irje.v3i1.202.

Abstract:
This research was prompted by problems found in MTsS Darul Makmur. First, some students were not motivated to learn. Second, the teacher sometimes still used Indonesian when giving reinforcement. Third, some students did not care about the teacher's reinforcement. This study aimed to find out the types of reinforcement used by the teacher; which types the teacher used often and which rarely; the reasons the teacher used certain reinforcements; and how the teacher understands reinforcement. The research used a qualitative approach with a descriptive design, because the researcher described the teacher's use of reinforcement in the English classroom, drawing on interviews and observation sheets. The researcher found that the teacher used both positive and negative reinforcement. First, the teacher used two types of positive reinforcement, namely verbal and non-verbal. The verbal reinforcement most often used took the form of words and phrases, while verbal reinforcement in the form of sentences was never given during the learning process. The non-verbal reinforcement most often used was gestural, activity, and proximity reinforcement. Second, the negative reinforcement often used consisted of warnings, gestures, and eye contact, while speech volume and punishment were rarely used. Third, the teacher's reasons for giving reinforcement were to motivate students and to make them feel appreciated and happy while learning.
10. Fan, ZiSheng. "An exploration of reinforcement learning and deep reinforcement learning". Applied and Computational Engineering 73, no. 1 (July 5, 2024): 154–59. http://dx.doi.org/10.54254/2755-2721/73/20240386.

Abstract:
Today, machine learning is evolving so quickly that new algorithms are always appearing. Deep neural networks in particular have shown positive outcomes in a variety of areas, including computer vision, natural language processing, and time series prediction. Its development moves at a very sluggish pace due to the high threshold. Therefore, a thorough examination of the reinforcement learning field should be required. This essay examines both the deep learning algorithm and the reinforcement learning operational procedure. The study identifies information retrieval, data mining, intelligent speech, natural language processing, and reinforcement learning as key technologies. The scientific study of reinforcement learning has advanced remarkably quickly, and it is now being used to tackle important decision optimization issues at academic conferences and journal research work in computer networks, computer graphics, etc. Brief introductions and reviews of both types of models are provided in this paper, along with an understanding of some of the most cutting-edge reinforcement learning applications and approaches.
11. Myers, Catherine. "LEARNING WITH DELAYED REINFORCEMENT THROUGH ATTENTION-DRIVEN BUFFERING". International Journal of Neural Systems 1, no. 4 (January 1991): 337–46. http://dx.doi.org/10.1142/s0129065791000376.

Abstract:
Learning with delayed reinforcement refers to situations where the reinforcement to a learning system occurs only at the end of a string of actions or outputs, and it must then be assigned back to the relevant actions. A method for accomplishing this is presented which buffers a small number of past actions based on the unpredictability of or attention to each as it occurs. This approach allows for the buffer size to be small, and yet learning can reach indefinitely far back into the past; it also allows the system to learn when reinforcement is not only delayed but also reinforcements from other unrelated actions may arrive during this delay. An example of a simulated food-finding creature is used to show the system at work in a predictive application where reinforcements show this interleaving behaviour.
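A sketch of the buffering mechanism in code, with the surprise measure, buffer size, and credit rule all invented for illustration (the paper's equations differ): keep only the most unpredictable recent actions, and when the delayed reinforcement finally arrives, credit whatever is still in the buffer.

    import heapq

    class AttentionBuffer:
        """Small buffer retaining the most surprising past actions so that a
        delayed reinforcement can be credited to them (illustrative)."""

        def __init__(self, capacity=5):
            self.capacity = capacity
            self.items = []   # min-heap of (surprise, step, state, action)

        def observe(self, step, state, action, surprise):
            heapq.heappush(self.items, (surprise, step, state, action))
            if len(self.items) > self.capacity:
                heapq.heappop(self.items)   # evict the least surprising entry

        def credit(self, reward, values, lr=0.1):
            # Assign the delayed reward to every buffered (state, action) pair.
            for _, _, state, action in self.items:
                key = (state, action)
                values[key] = values.get(key, 0.0) + lr * (reward - values.get(key, 0.0))

    buf, values = AttentionBuffer(capacity=3), {}
    for t, (s, a, surprise) in enumerate([("s0", "left", 0.9), ("s1", "right", 0.1),
                                          ("s2", "left", 0.7), ("s3", "right", 0.8)]):
        buf.observe(t, s, a, surprise)
    buf.credit(reward=1.0, values=values)   # reinforcement arrives only now
    print(values)   # the low-surprise step s1 was evicted and gets no credit
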
12. Horie, Naoto, Tohgoroh Matsui, Koichi Moriyama, Atsuko Mutoh, and Nobuhiro Inuzuka. "Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning". Artificial Life and Robotics 24, no. 3 (February 8, 2019): 352–59. http://dx.doi.org/10.1007/s10015-019-00523-3.

13. Lee, Dongsu, Chanin Eom, Sungwoo Choi, Sungkwan Kim, and Minhae Kwon. "Survey on Practical Reinforcement Learning: from Imitation Learning to Offline Reinforcement Learning". Journal of Korean Institute of Communications and Information Sciences 48, no. 11 (November 30, 2023): 1405–17. http://dx.doi.org/10.7840/kics.2023.48.11.1405.

14. Osogami, Takayuki, and Rudy Raymond. "Determinantal Reinforcement Learning". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4659–66. http://dx.doi.org/10.1609/aaai.v33i01.33014659.

Abstract:
We study reinforcement learning for controlling multiple agents in a collaborative manner. In some of these tasks, it is insufficient for the individual agents to simply take relevant actions; those actions should also be diverse. We propose the approach of using the determinant of a positive semidefinite matrix to approximate the action-value function in reinforcement learning, where we learn the matrix in a way that it represents the relevance and diversity of the actions. Experimental results show that the proposed approach allows the agents to learn a nearly optimal policy approximately ten times faster than baseline approaches in benchmark tasks of multi-agent reinforcement learning. The proposed approach is also shown to achieve performance that cannot be achieved with conventional approaches in a partially observable environment with an exponentially large action space.
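The central construction is compact enough to sketch. The kernel, features, and quality weights below are placeholders of my own, not the paper's learned quantities; what the sketch shows is scoring a joint action set by the log-determinant of a positive semidefinite matrix, so that relevant but mutually diverse actions score highly.

    import numpy as np

    def logdet_value(features, quality, chosen):
        """Score a set of actions by log det of an L-ensemble-style PSD matrix:
        quality enters on the rows, similarity through inner products."""
        B = features[chosen] * quality[chosen][:, None]   # quality-weighted rows
        L = B @ B.T + 1e-6 * np.eye(len(chosen))          # PSD by construction
        _, logdet = np.linalg.slogdet(L)
        return logdet

    rng = np.random.default_rng(1)
    features = rng.normal(size=(6, 4))    # one feature row per primitive action
    quality = rng.uniform(0.5, 2.0, 6)    # stand-in for learned action relevance
    print(logdet_value(features, quality, [0, 0]))   # duplicated action: low score
    print(logdet_value(features, quality, [0, 3]))   # diverse pair: higher score
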
15. Pateria, Shubham, Budhitama Subagdja, Ah-hwee Tan, and Chai Quek. "Hierarchical Reinforcement Learning". ACM Computing Surveys 54, no. 5 (June 2021): 1–35. http://dx.doi.org/10.1145/3453160.

Abstract:
Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of challenging long-horizon decision-making tasks into simpler subtasks. During the past years, the landscape of HRL research has grown profoundly, resulting in copious approaches. A comprehensive overview of this vast landscape is necessary to study HRL in an organized manner. We provide a survey of the diverse HRL approaches concerning the challenges of learning hierarchical policies, subtask discovery, transfer learning, and multi-agent learning using HRL. The survey is presented according to a novel taxonomy of the approaches. Based on the survey, a set of important open problems is proposed to motivate the future research in HRL. Furthermore, we outline a few suitable task domains for evaluating the HRL approaches and a few interesting examples of the practical applications of HRL in the Supplementary Material.
16. Matsui, Tohgoroh. "Compound Reinforcement Learning". Transactions of the Japanese Society for Artificial Intelligence 26 (2011): 330–34. http://dx.doi.org/10.1527/tjsai.26.330.

17. Dong, Daoyi, Chunlin Chen, Hanxiong Li, and Tzyh-Jong Tarn. "Quantum Reinforcement Learning". IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 38, no. 5 (October 2008): 1207–20. http://dx.doi.org/10.1109/tsmcb.2008.925743.

18. Farias, Vivek F., Ciamac C. Moallemi, Benjamin Van Roy, and Tsachy Weissman. "Universal Reinforcement Learning". IEEE Transactions on Information Theory 56, no. 5 (May 2010): 2441–54. http://dx.doi.org/10.1109/tit.2010.2043762.

19. Morimoto, Jun, and Kenji Doya. "Robust Reinforcement Learning". Neural Computation 17, no. 2 (February 1, 2005): 335–59. http://dx.doi.org/10.1162/0899766053011528.

Abstract:
This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular for both off-line learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a “disturbing” agent tries to make the worst possible disturbance while a “control” agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the amount of the reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by online algorithms coincided with those derived analytically by the linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance with changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
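The min-max construction described above fits in a single display. A schematic rendering in LaTeX with notation of my own choosing (state x, control u, disturbance w, penalty weight k); this illustrates the H∞-style game, not the paper's exact objective:

    % Schematic robust-RL objective: the control maximizes return while the
    % disturbance minimizes it, and the penalty -k*||w||^2 bounds the
    % adversary's power, as in the H-infinity game described in the abstract.
    \[
      V(x_0) \;=\; \max_{u(\cdot)} \; \min_{w(\cdot)} \;
      \int_0^{\infty} \Bigl( r\bigl(x(t), u(t)\bigr) \;-\; k \,\lVert w(t) \rVert^2 \Bigr)\, \mathrm{d}t .
    \]
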
20. Weiß, Gerhard. "Distributed reinforcement learning". Robotics and Autonomous Systems 15, no. 1–2 (July 1995): 135–42. http://dx.doi.org/10.1016/0921-8890(95)00018-b.

21. Servedio, Maria R., Stein A. Sæther, and Glenn-Peter Sætre. "Reinforcement and learning". Evolutionary Ecology 23, no. 1 (July 17, 2007): 109–23. http://dx.doi.org/10.1007/s10682-007-9188-2.

22. ANDRECUT, M., and M. K. ALI. "FUZZY REINFORCEMENT LEARNING". International Journal of Modern Physics C 13, no. 5 (June 2002): 659–74. http://dx.doi.org/10.1142/s0129183102003450.

Abstract:
Fuzzy logic represents an extension of classical logic, giving modes of approximate reasoning in an environment of uncertainty and imprecision. Fuzzy inference systems incorporate human knowledge into their knowledge base through the conclusions of the fuzzy rules, which are affected by subjective decisions. In this paper we show how the reinforcement learning technique can be used to tune the conclusion part of a fuzzy inference system. The fuzzy reinforcement learning technique is illustrated using two examples: the cart centering problem and the autonomous navigation problem.
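A toy version of that tuning loop, with every detail invented for illustration (Gaussian antecedents, weighted-average defuzzification, and a perturbation-based reinforcement update standing in for the paper's method). Only the conclusion values of the rules are adapted; the antecedents stay fixed, as the abstract describes.

    import numpy as np

    def fire_strengths(x, centers, width=1.0):
        """Gaussian membership of scalar input x in each rule's antecedent."""
        return np.exp(-((x - centers) ** 2) / (2 * width ** 2))

    def fuzzy_output(x, centers, consequents):
        w = fire_strengths(x, centers)
        return float(w @ consequents / w.sum())   # weighted-average defuzzification

    rng = np.random.default_rng(0)
    centers = np.linspace(-2, 2, 5)   # fixed antecedents, one per rule
    consequents = np.zeros(5)         # conclusion values to be tuned
    for _ in range(2000):
        x = rng.uniform(-2, 2)
        noise = rng.normal(0, 0.1, 5)
        # Reward: negative squared error against a toy target controller
        # (an invented stand-in for the paper's cart-centering example).
        perturbed = -(fuzzy_output(x, centers, consequents + noise) - np.tanh(x)) ** 2
        baseline = -(fuzzy_output(x, centers, consequents) - np.tanh(x)) ** 2
        consequents += 0.5 * (perturbed - baseline) * noise   # follow improving noise
    print(consequents.round(2))
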
23. Zhu, Ruoqing, Donglin Zeng, and Michael R. Kosorok. "Reinforcement Learning Trees". Journal of the American Statistical Association 110, no. 512 (October 2, 2015): 1770–84. http://dx.doi.org/10.1080/01621459.2015.1036994.

24. Oku, Makito, and Kazuyuki Aihara. "Networked reinforcement learning". Artificial Life and Robotics 13, no. 1 (December 2008): 112–15. http://dx.doi.org/10.1007/s10015-008-0565-x.

25. Barto, Andrew G. "Reinforcement learning control". Current Opinion in Neurobiology 4, no. 6 (December 1994): 888–93. http://dx.doi.org/10.1016/0959-4388(94)90138-4.

26. Hernandez-Orallo, Jose. "Constructive reinforcement learning". International Journal of Intelligent Systems 15, no. 3 (March 2000): 241–64. http://dx.doi.org/10.1002/(sici)1098-111x(200003)15:3<241::aid-int6>3.0.co;2-z.

27. Aydin, Mehmet Emin, Rafet Durgut, and Abdur Rakib. "Why Reinforcement Learning?" Algorithms 17, no. 6 (June 20, 2024): 269. http://dx.doi.org/10.3390/a17060269.

28. Chakraborty, Montosh, Shivakrishna Gouroju, Pinki Garg, and Karthikeyan P. "PBL: An Effective Method Of Reinforcement Learning". International Journal of Integrative Medical Sciences 2, no. 6 (June 30, 2015): 134–38. http://dx.doi.org/10.16965/ijims.2015.119.

29. De, Ashis, Barun Mazumdar, Aritra Dhabal, Saikat Bhattacharjee, Aridip Maity, and Sourav Bandopadhyay. "Design of PID Controller using Reinforcement Learning". International Journal of Research Publication and Reviews 4, no. 11 (November 6, 2023): 443–52. http://dx.doi.org/10.55248/gengpi.4.1123.113004.

30. Schweighofer, Nicolas, and Kenji Doya. "Meta-learning in Reinforcement Learning". Neural Networks 16, no. 1 (January 2003): 5–9. http://dx.doi.org/10.1016/s0893-6080(02)00228-9.

31. Cetin, Edoardo, and Oya Celiktutan. "Learning Pessimism for Reinforcement Learning". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 6971–79. http://dx.doi.org/10.1609/aaai.v37i6.25852.

Abstract:
Off-policy deep reinforcement learning algorithms commonly compensate for overestimation bias during temporal-difference learning by utilizing pessimistic estimates of the expected target returns. In this work, we propose Generalized Pessimism Learning (GPL), a strategy employing a novel learnable penalty to enact such pessimism. In particular, we propose to learn this penalty alongside the critic with dual TD-learning, a new procedure to estimate and minimize the magnitude of the target returns bias with trivial computational cost. GPL enables us to accurately counteract overestimation bias throughout training without incurring the downsides of overly pessimistic targets. By integrating GPL with popular off-policy algorithms, we achieve state-of-the-art results in both competitive proprioceptive and pixel-based benchmarks.
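A condensed sketch of the mechanism, with shapes and update rule invented for illustration; the paper's dual TD-learning procedure is reduced here to a single bias-tracking rule. The idea shown is subtracting a learnable penalty from the bootstrapped target and adapting the penalty toward whatever value cancels the measured overestimation.

    import numpy as np

    rng = np.random.default_rng(0)
    beta = 0.5                                     # learnable pessimism coefficient
    q1 = rng.normal(1.0, 0.2, 100)                 # twin critics' estimates
    q2 = rng.normal(1.0, 0.2, 100)
    true_return = np.full(100, 0.9)                # stand-in for observed returns

    for _ in range(200):
        ensemble_mean = (q1 + q2) / 2
        spread = np.abs(q1 - q2)                   # cheap epistemic-uncertainty proxy
        target = ensemble_mean - beta * spread     # pessimistic TD target
        bias = float(np.mean(target - true_return))
        beta += 0.1 * bias                         # more pessimism while overestimating
    print(round(beta, 3))                          # settles where the bias cancels
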
32. Vafashoar, Reza, and Mohammad Reza Meybodi. "Reinforcement learning in learning automata and cellular learning automata via multiple reinforcement signals". Knowledge-Based Systems 169 (April 2019): 1–27. http://dx.doi.org/10.1016/j.knosys.2019.01.021.

33. Pusparini, Desy. "Giving Reinforcement with 2.0 Framework by Teacher: A Photovoice of Undergraduate Students in the EFL Classroom". JSSH (Jurnal Sains Sosial dan Humaniora) 3, no. 1 (August 13, 2019): 21. http://dx.doi.org/10.30595/jssh.v3i1.3841.

Abstract:
Reinforcement has been used in many areas of educational institutions. In the learning activity, reinforcement is given by the teacher as feedback for what students have done. By using reinforcement in the learning activity, students are expected to feel comfortable expressing themselves by responding to questions, giving feedback, and voicing their opinions in class. This study aims to investigate the effect of the teacher's reinforcement on students' learning motivation. The research used the photovoice method and SHOWeD analysis. The participants were 27 fifth-semester students of the English Education Department in an online class, consisting of 7 males and 20 females aged around 19–21. The findings show that giving reinforcement encourages students' motivation in the learning activity. As an implication, teachers should apply reinforcement in order to keep students highly motivated in class.
34. Agrawal, Avinash J., Rashmi R. Welekar, Namita Parati, Pravin R. Satav, Uma Patel Thakur, and Archana V. Potnurwar. "Reinforcement Learning and Advanced Reinforcement Learning to Improve Autonomous Vehicle Planning". International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 7s (July 25, 2023): 652–60. http://dx.doi.org/10.17762/ijritcc.v11i7s.7526.

Abstract:
Planning for autonomous vehicles is a challenging process that involves navigating through dynamic and unpredictable surroundings while making judgments in real time. Traditional planning methods often rely on predetermined rules or customized heuristics, which may not generalize well to various driving conditions. In this article, we provide a unique framework to enhance autonomous vehicle planning by fusing conventional RL methods with cutting-edge reinforcement learning techniques. To handle many elements of planning problems, our system integrates cutting-edge algorithms including deep reinforcement learning, hierarchical reinforcement learning, and meta-learning. Our framework helps autonomous vehicles make decisions that are more reliable and effective by utilizing the advantages of these cutting-edge strategies. With the use of the RLTT technique, an autonomous vehicle can learn about the intentions and preferences of human drivers by inferring the underlying reward function from observed expert behaviour. The autonomous car can make safer and more human-like decisions by learning from expert demonstrations about the fundamental goals and limitations of driving. Large-scale simulations and practical experiments can be carried out to gauge the effectiveness of the suggested approach. The autonomous vehicle planning system's performance can be assessed on the basis of parameters such as safety, effectiveness, and human likeness. The outcomes of these assessments can inform future developments and offer insightful information about the strengths and weaknesses of the strategy.
35. Vamvoudakis, Kyriakos G., Yan Wan, and Frank L. Lewis. "Workshop on Distributed Reinforcement Learning and Reinforcement-Learning Games [Conference Reports]". IEEE Control Systems 39, no. 6 (December 2019): 122–24. http://dx.doi.org/10.1109/mcs.2019.2938053.

36. Yücesoy, Yiğit E., and M. Borahan Tümer. "Hierarchical Reinforcement Learning with Context Detection (HRL-CD)". International Journal of Machine Learning and Computing 5, no. 5 (October 2015): 353–58. http://dx.doi.org/10.7763/ijmlc.2015.v5.533.

37. G Soares Azhari, Teotino. "Semantic Reinforcement Learning Model for Education Question Answering". International Journal of Science and Research (IJSR) 12, no. 2 (February 5, 2023): 1648–53. http://dx.doi.org/10.21275/sr23213125341.

38. Dang, Ngoc Trung, and Phuong Nam Dao. "Data-Driven Reinforcement Learning Control for Quadrotor Systems". International Journal of Mechanical Engineering and Robotics Research 13, no. 5 (2024): 495–501. http://dx.doi.org/10.18178/ijmerr.13.5.495-501.

Abstract:
This paper aims to solve the tracking problem and optimality of an Unmanned Aerial Vehicle (UAV) by model-free, data-driven Reinforcement Learning (RL) algorithms in both the attitude and position sub-systems. First, a cascade UAV model structure is given to establish the control system diagram with two corresponding attitude and position control loops. Second, based on the computation of the time derivative of the Bellman function by two different methods, the combination of the Bellman function and the optimal control is adopted to maintain the control signal as time converges to infinity, with the addition of a discount factor. Third, according to the off-policy technique, the two proposed model-free RL algorithms are designed for the attitude and position sub-systems in the UAV control structure, each with a discount factor. In particular, the designed algorithms not only solve the trajectory tracking problem but also guarantee optimality. Finally, an illustrative system is used to verify the performance of the proposed model-free, data-driven RL algorithms in the UAV control system.
39. Bae, Jung Ho, Yun-Seong Kang, Sukmin Yoon, Yong-Duk Kim, and Sungho Kim. "Aircraft Reinforcement Learning using Curriculum Learning". Journal of KIISE 48, no. 6 (June 30, 2021): 707–12. http://dx.doi.org/10.5626/jok.2021.48.6.707.

40. Matsubara, Takamitsu. "Learning Control Policies by Reinforcement Learning". Journal of the Robotics Society of Japan 36, no. 9 (2018): 597–600. http://dx.doi.org/10.7210/jrsj.36.597.

41. Fachantidis, Anestis, Matthew Taylor, and Ioannis Vlahavas. "Learning to Teach Reinforcement Learning Agents". Machine Learning and Knowledge Extraction 1, no. 1 (December 6, 2017): 21–42. http://dx.doi.org/10.3390/make1010002.

Abstract:
In this article, we study the transfer learning model of action advice under a budget. We focus on reinforcement learning teachers providing action advice to heterogeneous students playing the game of Pac-Man under a limited advice budget. First, we examine several critical factors affecting advice quality in this setting, such as the average performance of the teacher, its variance and the importance of reward discounting in advising. The experiments show that the best performers are not always the best teachers and reveal the non-trivial importance of the coefficient of variation (CV) as a statistic for choosing policies that generate advice. The CV statistic relates variance to the corresponding mean. Second, the article studies policy learning for distributing advice under a budget. Whereas most methods in the relevant literature rely on heuristics for advice distribution, we formulate the problem as a learning one and propose a novel reinforcement learning algorithm capable of learning when to advise or not. The proposed algorithm is able to advise even when it does not have knowledge of the student’s intended action and needs significantly less training time compared to previous learning approaches. Finally, in this article, we argue that learning to advise under a budget is an instance of a more generic learning problem: Constrained Exploitation Reinforcement Learning.
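For orientation, the sketch below shows the simplest budgeted-advising baseline that work in this area builds on, with thresholds and values invented for illustration; the article's contribution goes further and learns when to advise, whereas this is only the heuristic starting point.

    def importance_advising(teacher_q, student_action, state, budget, threshold=0.5):
        """Importance-based advising heuristic (a baseline, not the paper's
        learned policy). Returns (advice_or_None, remaining_budget)."""
        if budget <= 0:
            return None, budget
        qs = teacher_q[state]                     # teacher's Q-values in this state
        importance = max(qs.values()) - min(qs.values())
        best = max(qs, key=qs.get)
        if importance > threshold and student_action != best:
            return best, budget - 1               # spend advice only when it matters
        return None, budget

    teacher_q = {"junction": {"left": 0.9, "right": 0.1},
                 "corridor": {"left": 0.5, "right": 0.45}}
    advice, budget = importance_advising(teacher_q, "right", "junction", budget=3)
    print(advice, budget)   # -> left 2  (high-importance state, advice given)
    advice, budget = importance_advising(teacher_q, "right", "corridor", budget=budget)
    print(advice, budget)   # -> None 2  (low-importance state, budget saved)
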
42. NISHIZAWA, Chieko, and Hirokazu MATSUI. "Reinforcement learning with multiplex learning spaces". Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2016 (2016): 1P1–04b3. http://dx.doi.org/10.1299/jsmermd.2016.1p1-04b3.

43. Liu, Shiyi. "Research of Multi-agent Deep Reinforcement Learning based on Value Factorization". Highlights in Science, Engineering and Technology 39 (April 1, 2023): 848–54. http://dx.doi.org/10.54097/hset.v39i.6655.

Abstract:
Multi-agent deep reinforcement learning based on value factorization is one of the numerous multi-agent deep reinforcement learning methods and a research hotspot in the field. In order to effectively address the issues of environmental instability and the exponential expansion of the action space in multi-agent systems, it uses constraints to decompose the joint action-value function of the multi-agent system into a specific combination of individual action-value functions. This paper first explains the reason for value-function factorization. The fundamentals of multi-agent deep reinforcement learning are then introduced. Multi-agent deep reinforcement learning algorithms based on value factorization can then be separated into simple factorization and attention-mechanism-based algorithms, depending on whether, and which, additional mechanisms are incorporated. Several typical algorithms are introduced and their advantages and disadvantages are compared and analyzed. Finally, the content of reinforcement learning elaborated in this paper is summarized.
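The simplest instance of the factorization constraint mentioned above is additive, as in value decomposition networks: the joint action value is the sum of per-agent values, so each agent can act greedily on its own term. A minimal sketch with made-up numbers:

    import numpy as np

    # Per-agent individual action values Q_i(observation_i, a_i); the joint
    # action value is constrained to their sum: Q_tot = sum_i Q_i.
    q_agent = [np.array([0.2, 0.8, 0.1]),   # agent 0, three actions
               np.array([0.5, 0.4, 0.9])]   # agent 1, three actions

    # Decentralized greedy choice per agent...
    joint_action = [int(q.argmax()) for q in q_agent]
    # ...is also the argmax of the factored joint value, because the sum is
    # monotone in each Q_i. That is what sidesteps the exponential joint space.
    q_tot = sum(q[a] for q, a in zip(q_agent, joint_action))
    print(joint_action, round(q_tot, 2))   # -> [1, 2] 1.7
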
44. White, Devin, Mingkang Wu, Ellen Novoseller, Vernon J. Lawhern, Nicholas Waytowich, and Yongcan Cao. "Rating-Based Reinforcement Learning". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 9 (March 24, 2024): 10207–15. http://dx.doi.org/10.1609/aaai.v38i9.28886.

Abstract:
This paper develops a novel rating-based reinforcement learning approach that uses human ratings to obtain human guidance in reinforcement learning. Different from the existing preference-based and ranking-based reinforcement learning paradigms, based on human relative preferences over sample pairs, the proposed rating-based reinforcement learning approach is based on human evaluation of individual trajectories without relative comparisons between sample pairs. The rating-based reinforcement learning approach builds on a new prediction model for human ratings and a novel multi-class loss function. We conduct several experimental studies based on synthetic ratings and real human ratings to evaluate the effectiveness and benefits of the new rating-based reinforcement learning approach.
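A sketch of the kind of multi-class objective the abstract mentions, with every detail invented for illustration (the paper's actual prediction model and loss differ): map a trajectory's predicted return onto rating classes and penalize it with cross-entropy against the human rating.

    import numpy as np

    def rating_loss(predicted_return, human_rating, n_classes=4, temp=1.0):
        """Cross-entropy between a soft class distribution induced by the
        predicted return and a human rating label (illustrative)."""
        anchors = np.linspace(0.0, 1.0, n_classes)        # one anchor per rating
        logits = -np.abs(predicted_return - anchors) / temp
        probs = np.exp(logits) / np.exp(logits).sum()
        return -np.log(probs[human_rating])

    # A trajectory whose predicted (normalized) return is 0.7, rated "good" (2):
    print(round(rating_loss(0.7, human_rating=2), 3))
    # The same prediction under a "bad" (0) rating incurs a larger loss:
    print(round(rating_loss(0.7, human_rating=0), 3))
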
45. Kim, Man-Je, Hyunsoo Park, and Chang Wook Ahn. "Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning". Electronics 11, no. 7 (March 28, 2022): 1069. http://dx.doi.org/10.3390/electronics11071069.

Abstract:
Control intelligence is a typical field where there is a trade-off between target objectives, and researchers in this field have long sought artificial intelligence that achieves those objectives. Multi-objective deep reinforcement learning is well suited to this need. In particular, multi-objective deep reinforcement learning methods based on policy optimization are leading the optimization of control intelligence. However, multi-objective reinforcement learning has difficulty finding the various Pareto optima of multiple objectives due to the greedy nature of reinforcement learning. We propose a method of policy assimilation to solve this problem. The method was applied to MO-V-MPO, one of the preference-based multi-objective reinforcement learning methods, to increase diversity. Its performance has been verified through experiments in a continuous control environment.
46. Datta, Shounak, Yanjun Li, Matthew M. Ruppert, Yuanfang Ren, Benjamin Shickel, Tezcan Ozrazgat-Baslanti, Parisa Rashidi, and Azra Bihorac. "Reinforcement learning in surgery". Surgery 170, no. 1 (July 2021): 329–32. http://dx.doi.org/10.1016/j.surg.2020.11.040.

47. Khan, Koffka, and Wayne Goodridge. "Reinforcement Learning In DASH". International Journal of Advanced Networking and Applications 11, no. 5 (2020): 4386–92. http://dx.doi.org/10.35444/ijana.2020.11052.

48. Kaelbling, L. P., M. L. Littman, and A. W. Moore. "Reinforcement Learning: A Survey". Journal of Artificial Intelligence Research 4 (May 1, 1996): 237–85. http://dx.doi.org/10.1613/jair.301.

Abstract:
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word “reinforcement.” The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
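Since the survey's central topics (trial-and-error learning, delayed reinforcement, the exploration/exploitation trade-off) are most concisely illustrated by tabular Q-learning, here is a minimal sketch on a toy chain environment of my own invention; the environment and constants are illustrative only.

    import random

    # Toy 5-state chain: stepping right from state 3 reaches the goal (reward 1).
    N_STATES = 5
    ACTIONS = [1, -1]                      # move right / move left
    GAMMA, ALPHA, EPS = 0.9, 0.1, 0.1
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    random.seed(0)
    for _ in range(500):                   # episodes
        s = 0
        for _ in range(20):                # step cap keeps episodes finite
            # epsilon-greedy: explore with probability EPS, otherwise exploit.
            if random.random() < EPS:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2 = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s2 == N_STATES - 1 else 0.0
            best_next = max(Q[(s2, act)] for act in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])   # TD update
            s = s2
            if s == N_STATES - 1:
                break
    # State values grow toward the goal; the terminal state is never updated.
    print([round(max(Q[(s, act)] for act in ACTIONS), 2) for s in range(N_STATES)])
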
49. Simpkins, Christopher, and Charles Isbell. "Composable Modular Reinforcement Learning". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 4975–82. http://dx.doi.org/10.1609/aaai.v33i01.33014975.

Abstract:
Modular reinforcement learning (MRL) decomposes a monolithic multiple-goal problem into modules that solve a portion of the original problem. The modules’ action preferences are arbitrated to determine the action taken by the agent. Truly modular reinforcement learning would support not only decomposition into modules, but composability of separately written modules in new modular reinforcement learning agents. However, the performance of MRL agents that arbitrate module preferences using additive reward schemes degrades when the modules have incomparable reward scales. This performance degradation means that separately written modules cannot be composed in new modular reinforcement learning agents as-is – they may need to be modified to align their reward scales. We solve this problem with a Q-learning-based command arbitration algorithm and demonstrate that it does not exhibit the same performance degradation as existing approaches to MRL, thereby supporting composability.
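The failure mode the abstract describes is easy to demonstrate with made-up module values: under additive (greatest-mass) arbitration, a module that merely reports rewards on a larger scale dominates the vote, which is the incomparability problem the authors' command-arbitration algorithm is designed to avoid.

    # Two modules vote over three actions with per-module action values.
    eat_q = {"a": 1.0, "b": 0.2, "c": 0.1}          # module on a 0-1 reward scale
    avoid_q = {"a": -90.0, "b": -10.0, "c": -20.0}  # module on a 0-100 scale

    def additive_arbitration(modules):
        """Greatest-mass arbitration: sum module preferences per action."""
        actions = modules[0].keys()
        return max(actions, key=lambda a: sum(m[a] for m in modules))

    # The large-scale module swamps the small-scale one: rescaling either
    # module would change the decision, which is why separately written
    # modules cannot be composed as-is under additive arbitration.
    print(additive_arbitration([eat_q, avoid_q]))   # -> b
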
50. Gallego, Victor, Roi Naveiro, and David Rios Insua. "Reinforcement Learning under Threats". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9939–40. http://dx.doi.org/10.1609/aaai.v33i01.33019939.

Abstract:
In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. However, when non-stationary environments as such are considered, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretical approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we shall face the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.
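A sketch of the kind of augmented update the abstract suggests, with all specifics invented: instead of a plain max over its own actions, the supported agent maximizes Q averaged over a learned model of the adversary's action distribution, a level-1 response to a level-0 opponent model.

    import random
    from collections import defaultdict

    random.seed(0)
    ACTIONS, OPP_ACTIONS = ["a0", "a1"], ["b0", "b1"]
    Q = defaultdict(float)                # Q[(state, my_action, opp_action)]
    opp_counts = defaultdict(lambda: 1)   # Dirichlet-style opponent model

    def opp_probs():
        total = sum(opp_counts[b] for b in OPP_ACTIONS)
        return {b: opp_counts[b] / total for b in OPP_ACTIONS}

    def best_response(state):
        """Level-1 choice: maximize Q averaged over the opponent model."""
        p = opp_probs()
        return max(ACTIONS,
                   key=lambda a: sum(p[b] * Q[(state, a, b)] for b in OPP_ACTIONS))

    # Learning in a one-state threatened MDP against a biased adversary:
    for _ in range(500):
        a = best_response("s") if random.random() > 0.1 else random.choice(ACTIONS)
        b = "b0" if random.random() < 0.8 else "b1"   # adversary's true policy
        opp_counts[b] += 1
        r = 1.0 if (a, b) == ("a1", "b0") else 0.0    # payoff against the threat
        Q[("s", a, b)] += 0.1 * (r - Q[("s", a, b)])
    print(best_response("s"))   # the learned response to the modeled threat (a1)
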