Ready-made bibliography on the topic "Reinforcement Learning"

Create an accurate reference in APA, MLA, Chicago, Harvard, and many other citation styles

Select a source type:

See the lists of current articles, books, dissertations, abstracts, and other scholarly sources on the topic "Reinforcement Learning".

Next to every work in the bibliography you will find an "Add to bibliography" button. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication as a ".pdf" file and read its abstract online, whenever the relevant details are available in the source's metadata.

Journal articles on the topic "Reinforcement Learning"

1

Deora, Merin, and Sumit Mathur. "Reinforcement Learning". IJARCCE 6, no. 4 (April 30, 2017): 178–81. http://dx.doi.org/10.17148/ijarcce.2017.6433.

2

Barto, Andrew G. "Reinforcement Learning". IFAC Proceedings Volumes 31, no. 29 (October 1998): 5. http://dx.doi.org/10.1016/s1474-6670(17)38315-5.

3

Woergoetter, Florentin, and Bernd Porr. "Reinforcement learning". Scholarpedia 3, no. 3 (2008): 1448. http://dx.doi.org/10.4249/scholarpedia.1448.

4

Moore, Brett L., Anthony G. Doufas, and Larry D. Pyeatt. "Reinforcement Learning". Anesthesia & Analgesia 112, no. 2 (February 2011): 360–67. http://dx.doi.org/10.1213/ane.0b013e31820334a7.

5

Liaq, Mudassar, and Yungcheol Byun. "Autonomous UAV Navigation Using Reinforcement Learning". International Journal of Machine Learning and Computing 9, no. 6 (December 2019): 756–61. http://dx.doi.org/10.18178/ijmlc.2019.9.6.869.

6

Alrammal, Muath, and Munir Naveed. "Monte-Carlo Based Reinforcement Learning (MCRL)". International Journal of Machine Learning and Computing 10, no. 2 (February 2020): 227–32. http://dx.doi.org/10.18178/ijmlc.2020.10.2.924.

7

Nurmuhammet, Abdullayev. "DEEP REINFORCEMENT LEARNING ON STOCK DATA". Alatoo Academic Studies 23, no. 2 (June 30, 2023): 505–18. http://dx.doi.org/10.17015/aas.2023.232.49.

Abstract:
This study proposes using Deep Reinforcement Learning (DRL) for stock trading decisions and prediction. DRL is a machine learning technique that enables agents to learn optimal strategies by interacting with their environment. The proposed model surpasses traditional models and can make informed trading decisions in real time. The study highlights the feasibility of applying DRL in financial markets and its advantages in strategic decision-making. The model's ability to learn from market dynamics makes it a promising approach for stock market forecasting. Overall, this paper provides valuable insights into the use of DRL for stock trading decisions and prediction, establishing a strong case for its adoption in financial markets. Keywords: reinforcement learning, stock market, deep reinforcement learning.
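
A purely illustrative sketch of the interaction loop described in the abstract above (not code from the cited paper): the state is a window of recent returns plus the current position, the actions are hold/buy/sell, and the reward is the next step's profit or loss. The ToyTradingEnv interface, the random-walk prices, and the random placeholder policy are all assumptions made for the example; a DRL agent would replace the random action choice.

```python
import numpy as np

class ToyTradingEnv:
    """Toy trading environment: state = last `window` returns + current position,
    actions: 0 = hold, 1 = buy (go long), 2 = sell (go flat)."""

    def __init__(self, prices, window=5):
        self.prices = np.asarray(prices, dtype=float)
        self.window = window

    def reset(self):
        self.t = self.window
        self.position = 0          # 0 = flat, 1 = long
        return self._state()

    def _state(self):
        recent = self.prices[self.t - self.window:self.t + 1]
        returns = np.diff(recent) / recent[:-1]
        return np.append(returns, self.position)

    def step(self, action):
        if action == 1:
            self.position = 1
        elif action == 2:
            self.position = 0
        self.t += 1
        # Reward: profit or loss of the held position over this step.
        reward = self.position * (self.prices[self.t] - self.prices[self.t - 1])
        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 500))   # assumed random-walk price series
env = ToyTradingEnv(prices)
state, done, total = env.reset(), False, 0.0
while not done:
    action = int(rng.integers(0, 3))              # stand-in for a learned DRL policy
    state, reward, done = env.step(action)
    total += reward
print(f"episode profit: {total:.2f}")
```
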
8

Likas, Aristidis. "A Reinforcement Learning Approach to Online Clustering". Neural Computation 11, no. 8 (November 1, 1999): 1915–32. http://dx.doi.org/10.1162/089976699300016025.

Abstract:
A general technique is proposed for embedding online clustering algorithms based on competitive learning in a reinforcement learning framework. The basic idea is that the clustering system can be viewed as a reinforcement learning system that learns through reinforcements to follow the clustering strategy we wish to implement. In this sense, the reinforcement guided competitive learning (RGCL) algorithm is proposed that constitutes a reinforcement-based adaptation of learning vector quantization (LVQ) with enhanced clustering capabilities. In addition, we suggest extensions of RGCL and LVQ that are characterized by the property of sustained exploration and significantly improve the performance of those algorithms, as indicated by experimental tests on well-known data sets.
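
The RGCL idea summarised above treats the winning prototype of a competitive (LVQ-like) clustering network as a unit that is rewarded or punished by a reinforcement signal. The snippet below is a minimal sketch of that general idea under assumed simplifications (a crude distance-threshold reward and plain prototype moves); it is not the published RGCL algorithm.

```python
import numpy as np

def reinforcement_guided_clustering(X, n_prototypes=3, lr=0.05, threshold=1.0,
                                    epochs=20, seed=0):
    """Reinforcement-guided competitive learning sketch: the winning prototype is
    moved toward the sample when the reinforcement is positive and away from it
    when negative. The reinforcement signal used here is an assumed stand-in."""
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), n_prototypes, replace=False)].copy()
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            dists = np.linalg.norm(prototypes - x, axis=1)
            winner = int(np.argmin(dists))                    # competitive step
            r = 1.0 if dists[winner] < threshold else -1.0    # assumed reinforcement
            prototypes[winner] += lr * r * (x - prototypes[winner])  # reinforced LVQ-style move
    return prototypes

# Toy usage on three Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in ([0, 0], [3, 3], [0, 3])])
print(reinforcement_guided_clustering(X))
```
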
9

Mardhatillah, Elsy. "Teacher’s Reinforcement in English Classroom in MTSS Darul Makmur Sungai Cubadak". Indonesian Research Journal On Education 3, no. 1 (January 2, 2022): 825–32. http://dx.doi.org/10.31004/irje.v3i1.202.

Abstract:
This research was due to some problems found in MTsS Darul Makmur. First, some students were not motivated in learning. Second, sometimes the teacher still uses Indonesian in giving reinforcements. Third, some students did not care about the teacher's reinforcement. This study aimed to find out the types of reinforcement used by the teacher; then, to find out the types of reinforcement often and rarely used by the teacher; then, to find out the reasons the teacher used certain reinforcements; and last, to find out how the teacher understands reinforcement. This research used a qualitative approach. The design of this research was descriptive because the researcher made a description of the use of reinforcement by the teacher in the English classroom. In this research, the interview and observation sheets were used by the researcher. The researcher found that the types of reinforcement used by the teacher are positive reinforcement and negative reinforcement. First, there were two types of positive reinforcement used by the teacher, namely verbal reinforcement and non-verbal reinforcement. The verbal reinforcement often used by the teacher was reinforcement in the form of words and reinforcement in the form of phrases; verbal reinforcement in the form of sentences was never used by the teacher in the learning process. The non-verbal reinforcement often used by the teacher was gestural reinforcement, activity reinforcement, and proximity reinforcement. Second, the negative reinforcement often used by the teacher was a warning, a gesture, and eye contact, while the negative reinforcement rarely used by the teacher was speech volume and punishment. Third, the reasons the teacher reinforces learning are to motivate students and to make students feel appreciated and happy while learning.
10

Fan, ZiSheng. "An exploration of reinforcement learning and deep reinforcement learning". Applied and Computational Engineering 73, no. 1 (July 5, 2024): 154–59. http://dx.doi.org/10.54254/2755-2721/73/20240386.

Abstract:
Today, machine learning is evolving so quickly that new algorithms are always appearing. Deep neural networks in particular have shown positive outcomes in a variety of areas, including computer vision, natural language processing, and time series prediction. Its development moves at a very sluggish pace due to the high threshold. Therefore, a thorough examination of the reinforcement learning field should be required. This essay examines both the deep learning algorithm and the reinforcement learning operational procedure. The study identifies information retrieval, data mining, intelligent speech, natural language processing, and reinforcement learning as key technologies. The scientific study of reinforcement learning has advanced remarkably quickly, and it is now being used to tackle important decision optimization issues at academic conferences and journal research work in computer networks, computer graphics, etc. Brief introductions and reviews of both types of models are provided in this paper, along with an understanding of some of the most cutting-edge reinforcement learning applications and approaches.

Doctoral dissertations on the topic "Reinforcement Learning"

1

Izquierdo, Ayala Pablo. "Learning comparison: Reinforcement Learning vs Inverse Reinforcement Learning : How well does inverse reinforcement learning perform in simple markov decision processes in comparison to reinforcement learning?" Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-259371.

Abstract:
This research project elaborates a qualitative comparison between two different learning approaches, Reinforcement Learning (RL) and Inverse Reinforcement Learning (IRL) over the Gridworld Markov Decision Process. The interest focus will be set on the second learning paradigm, IRL, as it is considered to be relatively new and little work has been developed in this field of study. As observed, RL outperforms IRL, obtaining a correct solution in all the different scenarios studied. However, the behaviour of the IRL algorithms can be improved and this will be shown and analyzed as part of the scope.
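
The RL half of the comparison above can be reproduced with tabular Q-learning on a small Gridworld MDP. The sketch below uses an assumed minimal setup (a 4x4 grid, deterministic moves, reward only at the goal cell), not the thesis's exact experimental configuration.

```python
import numpy as np

SIZE, GOAL = 4, (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)
Q = np.zeros((SIZE, SIZE, len(ACTIONS)))

def step(state, a):
    r, c = state
    dr, dc = ACTIONS[a]
    nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def eps_greedy(q_values):
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    best = np.flatnonzero(q_values == q_values.max())
    return int(rng.choice(best))                 # break ties randomly

for episode in range(500):
    s, done = (0, 0), False
    while not done:
        a = eps_greedy(Q[s])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s][a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s][a])
        s = s2

print(np.argmax(Q, axis=2))                      # greedy action index for each cell
```
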
2

Seymour, B. J. "Aversive reinforcement learning". Thesis, University College London (University of London), 2010. http://discovery.ucl.ac.uk/800107/.

Abstract:
We hypothesise that human aversive learning can be described algorithmically by Reinforcement Learning models. Our first experiment uses a second-order conditioning design to study sequential outcome prediction. We show that aversive prediction errors are expressed robustly in the ventral striatum, supporting the validity of temporal difference algorithms (as in reward learning), and suggesting a putative critical area for appetitive-aversive interactions. With this in mind, the second experiment explores the nature of pain relief, which, as expounded in theories of motivational opponency, is rewarding. In a Pavlovian conditioning task with phasic relief of tonic noxious thermal stimulation, we show that both appetitive and aversive prediction errors are co-expressed in anatomically dissociable regions (in a mirror opponent pattern) and that striatal activity appears to reflect integrated appetitive-aversive processing. Next we designed a Pavlovian task in which cues predicted either financial gains, losses, or both, thereby forcing integration of both motivational streams. This showed anatomical dissociation of aversive and appetitive predictions along a posterior-anterior gradient within the striatum, respectively. Lastly, we studied aversive instrumental control (avoidance). We designed a simultaneous pain avoidance and financial reward learning task, in which subjects had to independently learn about each, and trade off aversive and appetitive predictions. We show that predictions for both converge on the medial head of the caudate nucleus, suggesting that this is a critical site for appetitive-aversive integration in instrumental decision making. We also tested whether serotonin (5HT) modulates either phasic or tonic opponency using acute tryptophan depletion. Both behavioural and imaging data confirm the latter, in which it appears to mediate an average reward term, providing an aspiration level against which the benefits of exploration are judged. In summary, our data provide a basic computational and neuroanatomical framework for human aversive learning. We demonstrate the algorithmic and implementational validity of reinforcement learning models for both aversive prediction and control, illustrate the nature and neuroanatomy of appetitive-aversive integration, and discover the critical (and somewhat unexpected) central role for the striatum.
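
The temporal-difference account referred to above rests on updating a cue's value with the prediction error delta = r + gamma * V(s') - V(s). Below is a minimal TD(0) sketch of Pavlovian (prediction-only) learning in which the aversive outcome is coded as a negative reward; the two-state trial structure and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

# States: 0 = cue, 1 = outcome, 2 = end of trial. Entering the outcome state
# delivers an aversive (negative) reward, as in a simple conditioning trial.
gamma, alpha, n_trials = 1.0, 0.2, 60
V = np.zeros(3)
rewards = {1: -1.0}          # assumed value of the aversive outcome

for trial in range(n_trials):
    for s, s_next in [(0, 1), (1, 2)]:
        r = rewards.get(s_next, 0.0)
        delta = r + gamma * V[s_next] - V[s]   # TD prediction error
        V[s] += alpha * delta

print(V)   # the cue's value (V[0]) converges toward the negative outcome value
```
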
3

Akrour, Riad. "Robust Preference Learning-based Reinforcement Learning". Thesis, Paris 11, 2014. http://www.theses.fr/2014PA112236/document.

Abstract:
The thesis contributions revolve around sequential decision making and, more precisely, Reinforcement Learning (RL). Taking its root in Machine Learning in the same way as supervised and unsupervised learning, RL quickly grew in popularity within the last two decades due to a handful of achievements on both the theoretical and applicative fronts. RL supposes that the learning agent and its environment follow a stochastic Markovian decision process over a state and action space. The process is a decision process because the agent is asked to choose, at each time step, an action to take. It is stochastic because the effect of selecting a given action in a given state does not systematically yield the same state but rather defines a distribution over the state space. It is Markovian because this distribution only depends on the current state-action pair. As a consequence of the choice of an action, the agent receives a reward. The goal of RL is then to solve the underlying optimization problem of finding the behaviour that maximizes the sum of rewards over the whole interaction of the agent with its environment. From an applicative point of view, a large spectrum of problems can be cast as RL problems, from Backgammon (TD-Gammon, one of Machine Learning's first successes, gave rise to a world-class player) to decision problems in the industrial and medical worlds. However, the optimization problem solved by RL depends on the prior definition of a reward function, which requires a certain level of domain expertise as well as knowledge of the internal quirks of RL algorithms. As such, the first contribution of the thesis was to propose a learning framework that lightens the requirements placed on the user. The latter no longer needs to know the exact solution of the problem but only to be able to choose, between two behaviours exhibited by the agent, the one that matches the solution more closely. Learning is interactive between the agent and the user and revolves around the following three main points: i) the agent demonstrates a behaviour; ii) the user compares it with the current best one; iii) the agent uses this feedback to update its preference model of the user and uses it to find the next behaviour to demonstrate. To reduce the number of interactions required before finding the optimal behaviour, the second contribution of the thesis was to define a theoretically sound criterion making the trade-off between the sometimes contradictory desires of complying with the user's preferences and of demonstrating sufficiently different behaviours. The last contribution was to ensure the robustness of the algorithm with respect to the feedback errors that the user might make, which happens more often than not in practice, especially in the initial phase of the interaction, when all the behaviours are far from the expected solution.
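
The interaction loop described above (the agent exhibits a behaviour, the user compares it with the best one so far, the agent updates its preference model) can be sketched with a Bradley-Terry-style utility model over behaviour features. Everything below, from the linear utility to the logistic update and the simulated user, is an assumption chosen to illustrate the loop, not the thesis's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5
true_w = rng.normal(size=dim)      # hidden preferences of the (simulated) user
w = np.zeros(dim)                  # agent's model of those preferences
best = rng.normal(size=dim)        # features of the best behaviour found so far

def user_prefers(a, b):
    """Simulated user: prefers the behaviour with the higher true utility."""
    return float(true_w @ a) > float(true_w @ b)

for _ in range(200):
    candidate = best + rng.normal(scale=0.5, size=dim)   # i) exhibit a new behaviour
    preferred = user_prefers(candidate, best)            # ii) the user compares it with the best one
    # iii) Bradley-Terry (logistic) update of the preference model from the comparison.
    diff = candidate - best
    p = 1.0 / (1.0 + np.exp(-(w @ diff)))                # model's probability that the candidate wins
    w += 0.1 * ((1.0 if preferred else 0.0) - p) * diff
    if preferred:
        best = candidate

print("sign agreement with the user's preferences:", np.sign(w) == np.sign(true_w))
```
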
4

Tabell Johnsson, Marco, and Ala Jafar. "Efficiency Comparison Between Curriculum Reinforcement Learning & Reinforcement Learning Using ML-Agents". Thesis, Blekinge Tekniska Högskola, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-20218.

5

Yang, Zhaoyuan. "Adversarial Reinforcement Learning for Control System Design: A Deep Reinforcement Learning Approach". The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu152411491981452.

6

Cortesi, Daniele. "Reinforcement Learning in Rogue". Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16138/.

Abstract:
In this work we use Reinforcement Learning to play the famous Rogue, a dungeon-crawler videogame and father of the rogue-like genre. By employing different algorithms we substantially improve on the results obtained in previous work, addressing and solving the problems that had arisen. We then devise and perform new experiments to test the limits of our solution and encounter additional, unexpected issues in the process. In one of the investigated scenarios we clearly see that our approach is not yet enough to even perform better than a random agent, and we propose ideas for future work.
7

Girgin, Sertan. "Abstraction In Reinforcement Learning". PhD thesis, METU, 2007. http://etd.lib.metu.edu.tr/upload/12608257/index.pdf.

Abstract:
Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. Generally, the problem to be solved contains subtasks that repeat at different regions of the state space. Without any guidance, an agent has to learn the solutions of all subtask instances independently, which degrades the learning performance. In this thesis, we propose two approaches that build connections between different regions of the search space, leading to better utilization of gained experience and accelerated learning. In the first approach, we first extend the existing work of McGovern and propose the formalization of stochastic conditionally terminating sequences with higher representational power. Then, we describe how to efficiently discover and employ useful abstractions during learning based on such sequences. The method constructs a tree structure to keep track of frequently used action sequences together with visited states. This tree is then used to select actions to be executed at each step. In the second approach, we propose a novel method to identify states with similar sub-policies, and show how they can be integrated into the reinforcement learning framework to improve the learning performance. The method uses an efficient data structure to find common action sequences starting from observed states and defines a similarity function between states based on the number of such sequences. Using this similarity function, updates on the action-value function of a state are reflected to all similar states. This, consequently, allows experience acquired during learning to be applied in a broader context. The effectiveness of both approaches is demonstrated empirically by conducting extensive experiments on various domains.
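
The first approach summarised above stores frequently executed action sequences in a tree and reuses them during action selection, while the second derives a state-similarity measure from the number of shared sequences. The snippet below sketches only that bookkeeping, using a flat table of prefix counts as a stand-in for the thesis's tree structure and hand-written toy trajectories; the learning algorithm itself is omitted.

```python
from collections import defaultdict

class PrefixCounts:
    """Counts how often each action-sequence prefix has been executed from a state
    (a flat stand-in for a tree of conditionally terminating sequences)."""

    def __init__(self, max_len=3):
        self.max_len = max_len
        self.counts = defaultdict(int)   # (state, action_prefix) -> frequency

    def record(self, state, actions):
        for k in range(1, min(self.max_len, len(actions)) + 1):
            self.counts[(state, tuple(actions[:k]))] += 1

    def sequences_from(self, state):
        return {seq for (s, seq) in self.counts if s == state}

def similarity(table, s1, s2):
    """State similarity = number of action-sequence prefixes observed from both states."""
    return len(table.sequences_from(s1) & table.sequences_from(s2))

# Toy usage: states A and B share the prefixes ("up",) and ("up", "up"); C shares none with A.
table = PrefixCounts()
table.record("A", ["up", "up", "left"])
table.record("B", ["up", "up", "right"])
table.record("C", ["down", "left"])
print(similarity(table, "A", "B"), similarity(table, "A", "C"))   # 2 0
```
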
8

Suay, Halit Bener. "Reinforcement Learning from Demonstration". Digital WPI, 2016. https://digitalcommons.wpi.edu/etd-dissertations/173.

Abstract:
Off-the-shelf Reinforcement Learning (RL) algorithms suffer from slow learning performance, partly because they are expected to learn a task from scratch merely through an agent's own experience. In this thesis, we show that learning from scratch is a limiting factor for the learning performance, and that when prior knowledge is available RL agents can learn a task faster. We evaluate relevant previous work and our own algorithms in various experiments. Our first contribution is the first implementation and evaluation of an existing interactive RL algorithm in a real-world domain with a humanoid robot. Interactive RL was evaluated in a simulated domain which motivated us for evaluating its practicality on a robot. Our evaluation shows that guidance reduces learning time, and that its positive effects increase with state space size. A natural follow-up question after our first evaluation was, how do some other previous works compare to interactive RL. Our second contribution is an analysis of a user study, where naïve human teachers demonstrated a real-world object catching with a humanoid robot. We present the first comparison of several previous works in a common real-world domain with a user study. One conclusion of the user study was the high potential of RL despite poor usability due to slow learning rate. As an effort to improve the learning efficiency of RL learners, our third contribution is a novel human-agent knowledge transfer algorithm. Using demonstrations from three teachers with varying expertise in a simulated domain, we show that regardless of the skill level, human demonstrations can improve the asymptotic performance of an RL agent. As an alternative approach for encoding human knowledge in RL, we investigated the use of reward shaping. Our final contributions are Static Inverse Reinforcement Learning Shaping and Dynamic Inverse Reinforcement Learning Shaping algorithms that use human demonstrations for recovering a shaping reward function. Our experiments in simulated domains show that our approach outperforms the state-of-the-art in cumulative reward, learning rate and asymptotic performance. Overall we show that human demonstrators with varying skills can help RL agents to learn tasks more efficiently.
9

Gao, Yang. "Argumentation accelerated reinforcement learning". Thesis, Imperial College London, 2014. http://hdl.handle.net/10044/1/26603.

Abstract:
Reinforcement Learning (RL) is a popular statistical Artificial Intelligence (AI) technique for building autonomous agents, but it suffers from the curse of dimensionality: the computational requirement for obtaining the optimal policies grows exponentially with the size of the state space. Integrating heuristics into RL has proven to be an effective approach to combat this curse, but deriving high-quality heuristics from people's (typically conflicting) domain knowledge is challenging, yet it received little research attention. Argumentation theory is a logic-based AI technique well-known for its conflict resolution capability and intuitive appeal. In this thesis, we investigate the integration of argumentation frameworks into RL algorithms, so as to improve the convergence speed of RL algorithms. In particular, we propose a variant of Value-based Argumentation Framework (VAF) to represent domain knowledge and to derive heuristics from this knowledge. We prove that the heuristics derived from this framework can effectively instruct individual learning agents as well as multiple cooperative learning agents. In addition,we propose the Argumentation Accelerated RL (AARL) framework to integrate these heuristics into different RL algorithms via Potential Based Reward Shaping (PBRS) techniques: we use classical PBRS techniques for flat RL (e.g. SARSA(λ)) based AARL, and propose a novel PBRS technique for MAXQ-0, a hierarchical RL (HRL) algorithm, so as to implement HRL based AARL. We empirically test two AARL implementations - SARSA(λ)-based AARL and MAXQ-based AARL - in multiple application domains, including single-agent and multi-agent learning problems. Empirical results indicate that AARL can improve the convergence speed of RL, and can also be easily used by people that have little background in Argumentation and RL.
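
The heuristics mentioned above are injected through potential-based reward shaping: the environment reward r is replaced by r + gamma * Phi(s') - Phi(s), which leaves the optimal policy unchanged. The sketch below shows that shaping term inside a plain SARSA update on an assumed toy corridor task, with a hand-picked potential standing in for an argumentation-derived heuristic; it is not the AARL implementation from the thesis.

```python
import numpy as np

def sarsa_with_shaping(step, n_states, n_actions, potential, episodes=300,
                       alpha=0.1, gamma=0.95, eps=0.1, start=0, seed=0):
    """Plain SARSA in which the reward is augmented with the potential-based
    shaping term F(s, s') = gamma * potential(s') - potential(s)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))

    def eps_greedy(s):
        return int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))

    for _ in range(episodes):
        s, a, done = start, eps_greedy(start), False
        while not done:
            s2, r, done = step(s, a)
            r += gamma * potential(s2) - potential(s)   # heuristic knowledge enters here
            a2 = eps_greedy(s2)
            Q[s, a] += alpha * (r + gamma * Q[s2, a2] * (not done) - Q[s, a])
            s, a = s2, a2
    return Q

# Toy 1-D corridor (states 0..9, goal at 9); the assumed potential rewards progress to the right.
def corridor_step(s, a):               # a = 0: left, 1: right
    s2 = min(max(s + (1 if a == 1 else -1), 0), 9)
    return s2, (1.0 if s2 == 9 else 0.0), s2 == 9

Q = sarsa_with_shaping(corridor_step, 10, 2, potential=lambda s: s / 9.0)
print(np.argmax(Q, axis=1))            # mostly 1s: move right, toward the goal
```
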
10

Alexander, John W. "Transfer in reinforcement learning". Thesis, University of Aberdeen, 2015. http://digitool.abdn.ac.uk:80/webclient/DeliveryManager?pid=227908.

Abstract:
The problem of developing skill repertoires autonomously in robotics and artificial intelligence is becoming ever more pressing. Currently, the issues of how to apply prior knowledge to new situations and which knowledge to apply have not been sufficiently studied. We present a transfer setting where a reinforcement learning agent faces multiple problem solving tasks drawn from an unknown generative process, where each task has similar dynamics. The task dynamics are changed by varying in the transition function between states. The tasks are presented sequentially with the latest task presented considered as the target for transfer. We describe two approaches to solving this problem. Firstly we present an algorithm for transfer of the function encoding the stateaction value, defined as value function transfer. This algorithm uses the value function of a source policy to initialise the policy of a target task. We varied the type of basis the algorithm used to approximate the value function. Empirical results in several well known domains showed that the learners benefited from the transfer in the majority of cases. Results also showed that the Radial basis performed better in general than the Fourier. However contrary to expectation the Fourier basis benefited most from the transfer. Secondly, we present an algorithm for learning an informative prior which encodes beliefs about the underlying dynamics shared across all tasks. We call this agent the Informative Prior agent (IP). The prior is learnt though experience and captures the commonalities in the transition dynamics of the domain and allows for a quantification of the agent's uncertainty about these. By using a sparse distribution of the uncertainty in the dynamics as a prior, the IP agent can successfully learn a model of 1) the set of feasible transitions rather than the set of possible transitions, and 2) the likelihood of each of the feasible transitions. Analysis focusing on the accuracy of the learned model showed that IP had a very good accuracy bound, which is expressible in terms of only the permissible error and the diffusion, a factor that describes the concentration of the prior mass around the truth, and which decreases as the number of tasks experienced grows. The empirical evaluation of IP showed that an agent which uses the informative prior outperforms several existing Bayesian reinforcement learning algorithms on tasks with shared structure in a domain where multiple related tasks were presented only once to the learners. IP is a step towards the autonomous acquisition of behaviours in artificial intelligence. IP also provides a contribution towards the analysis of exploration and exploitation in the transfer paradigm.
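
Value-function transfer, as described above, amounts to initialising the target task's values with those learned on a source task instead of starting from zero. The tabular sketch below uses an assumed pair of chain tasks that differ only in transition noise; the thesis itself works with function approximation (Fourier and radial bases), which the sketch omits.

```python
import numpy as np

def q_learning(step, n_states, n_actions, Q0=None, episodes=200,
               alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning; Q0 lets the learner start from a transferred value function."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions)) if Q0 is None else Q0.copy()
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < eps:
                a = int(rng.integers(n_actions))
            else:
                a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))   # random tie-break
            s2, r, done = step(s, a)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) * (not done) - Q[s, a])
            s = s2
    return Q

def make_chain(slip, n=10, seed=42):
    """A family of n-state chain tasks that share structure but differ in transition noise."""
    rng = np.random.default_rng(seed)
    def step(s, a):
        move = 1 if a == 1 else -1
        if rng.random() < slip:              # task-specific dynamics
            move = -move
        s2 = min(max(s + move, 0), n - 1)
        return s2, (1.0 if s2 == n - 1 else 0.0), s2 == n - 1
    return step

Q_source = q_learning(make_chain(slip=0.0), 10, 2)                 # source task, learned from scratch
Q_target = q_learning(make_chain(slip=0.2), 10, 2, Q0=Q_source)    # target task, warm-started (transfer)
print(np.argmax(Q_target, axis=1))                                 # 1 = "move right", toward the goal
```
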

Books on the topic "Reinforcement Learning"

1

Sutton, Richard S., ed. Reinforcement learning. Boston: Kluwer Academic Publishers, 1992.

2

Sutton, Richard S. Reinforcement Learning. Boston, MA: Springer US, 1992.

3

Wiering, Marco, and Martijn van Otterlo, eds. Reinforcement Learning. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-27645-3.

4

Sutton, Richard S., ed. Reinforcement Learning. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5.

5

Lorenz, Uwe. Reinforcement Learning. Berlin, Heidelberg: Springer Berlin Heidelberg, 2020. http://dx.doi.org/10.1007/978-3-662-61651-2.

6

Nandy, Abhishek, and Manisha Biswas. Reinforcement Learning. Berkeley, CA: Apress, 2018. http://dx.doi.org/10.1007/978-1-4842-3285-9.

7

Li, Jinna, Frank L. Lewis, and Jialu Fan. Reinforcement Learning. Cham: Springer International Publishing, 2023. http://dx.doi.org/10.1007/978-3-031-28394-9.

8

Lorenz, Uwe. Reinforcement Learning. Berlin, Heidelberg: Springer Berlin Heidelberg, 2024. http://dx.doi.org/10.1007/978-3-662-68311-8.

9

Merrick, Kathryn, and Mary Lou Maher. Motivated Reinforcement Learning. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-540-89187-1.

10

Dong, Hao, Zihan Ding, and Shanghang Zhang, eds. Deep Reinforcement Learning. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-4095-0.


Book chapters on the topic "Reinforcement Learning"

1

Sutton, Richard S. "Introduction: The Challenge of Reinforcement Learning". In Reinforcement Learning, 1–3. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_1.

2

Williams, Ronald J. "Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning". In Reinforcement Learning, 5–32. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_2.

3

Tesauro, Gerald. "Practical Issues in Temporal Difference Learning". In Reinforcement Learning, 33–53. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_3.

4

Watkins, Christopher J. C. H., and Peter Dayan. "Technical Note". In Reinforcement Learning, 55–68. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_4.

5

Lin, Long-Ji. "Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching". In Reinforcement Learning, 69–97. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_5.

6

Singh, Satinder Pal. "Transfer of Learning by Composing Solutions of Elemental Sequential Tasks". In Reinforcement Learning, 99–115. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_6.

7

Dayan, Peter. "The Convergence of TD(λ) for General λ". In Reinforcement Learning, 117–38. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_7.

8

Millán, José R., and Carme Torras. "A Reinforcement Connectionist Approach to Robot Path Finding in Non-Maze-Like Environments". In Reinforcement Learning, 139–71. Boston, MA: Springer US, 1992. http://dx.doi.org/10.1007/978-1-4615-3618-5_8.

9

Lorenz, Uwe. "Bestärkendes Lernen als Teilgebiet des Maschinellen Lernens" [Reinforcement learning as a subfield of machine learning]. In Reinforcement Learning, 1–11. Berlin, Heidelberg: Springer Berlin Heidelberg, 2020. http://dx.doi.org/10.1007/978-3-662-61651-2_1.

10

Lorenz, Uwe. "Grundbegriffe des Bestärkenden Lernens" [Basic concepts of reinforcement learning]. In Reinforcement Learning, 13–20. Berlin, Heidelberg: Springer Berlin Heidelberg, 2020. http://dx.doi.org/10.1007/978-3-662-61651-2_2.


Conference papers on the topic "Reinforcement Learning"

1

Yang, Kun, Chengshuai Shi, and Cong Shen. "Teaching Reinforcement Learning Agents via Reinforcement Learning". In 2023 57th Annual Conference on Information Sciences and Systems (CISS). IEEE, 2023. http://dx.doi.org/10.1109/ciss56502.2023.10089695.

2

Doshi, Finale, Joelle Pineau, and Nicholas Roy. "Reinforcement learning with limited reinforcement". In the 25th international conference. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1390156.1390189.

3

Li, Zhiyi. "Reinforcement Learning". In SIGCSE '19: The 50th ACM Technical Symposium on Computer Science Education. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3287324.3293703.

4

Shen, Shitian, and Min Chi. "Reinforcement Learning". In UMAP '16: User Modeling, Adaptation and Personalization Conference. New York, NY, USA: ACM, 2016. http://dx.doi.org/10.1145/2930238.2930247.

5

Kuroe, Yasuaki, and Kenya Takeuchi. "Sophisticated Swarm Reinforcement Learning by Incorporating Inverse Reinforcement Learning". In 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2023. http://dx.doi.org/10.1109/smc53992.2023.10394525.

6

Lyu, Le, Yang Shen, and Sicheng Zhang. "The Advance of Reinforcement Learning and Deep Reinforcement Learning". In 2022 IEEE International Conference on Electrical Engineering, Big Data and Algorithms (EEBDA). IEEE, 2022. http://dx.doi.org/10.1109/eebda53927.2022.9744760.

7

Epshteyn, Arkady, Adam Vogel, and Gerald DeJong. "Active reinforcement learning". In the 25th international conference. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1390156.1390194.

8

Epshteyn, Arkady, and Gerald DeJong. "Qualitative reinforcement learning". In the 23rd international conference. New York, New York, USA: ACM Press, 2006. http://dx.doi.org/10.1145/1143844.1143883.

9

Vargas, Danilo Vasconcellos. "Evolutionary reinforcement learning". In GECCO '18: Genetic and Evolutionary Computation Conference. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3205651.3207865.

10

Langford, John. "Contextual reinforcement learning". In 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017. http://dx.doi.org/10.1109/bigdata.2017.8257902.


Organizational reports on the topic "Reinforcement Learning"

1

Singh, Satinder, Andrew G. Barto, and Nuttapong Chentanez. Intrinsically Motivated Reinforcement Learning. Fort Belvoir, VA: Defense Technical Information Center, January 2005. http://dx.doi.org/10.21236/ada440280.

2

Ghavamzadeh, Mohammad, and Sridhar Mahadevan. Hierarchical Multiagent Reinforcement Learning. Fort Belvoir, VA: Defense Technical Information Center, January 2004. http://dx.doi.org/10.21236/ada440418.

3

Harmon, Mance E., and Stephanie S. Harmon. Reinforcement Learning: A Tutorial. Fort Belvoir, VA: Defense Technical Information Center, January 1997. http://dx.doi.org/10.21236/ada323194.

4

Tadepalli, Prasad, and Alan Fern. Partial Planning Reinforcement Learning. Fort Belvoir, VA: Defense Technical Information Center, August 2012. http://dx.doi.org/10.21236/ada574717.

5

Ghavamzadeh, Mohammad, and Sridhar Mahadevan. Hierarchical Average Reward Reinforcement Learning. Fort Belvoir, VA: Defense Technical Information Center, June 2003. http://dx.doi.org/10.21236/ada445728.

6

Johnson, Daniel W. Drive-Reinforcement Learning System Applications. Fort Belvoir, VA: Defense Technical Information Center, July 1992. http://dx.doi.org/10.21236/ada264514.

7

Cleland, Andrew. Bounding Box Improvement With Reinforcement Learning. Portland State University Library, January 2000. http://dx.doi.org/10.15760/etd.6322.

8

Li, Jiajie. Learning Financial Investment Strategies using Reinforcement Learning and 'Chan theory'. Ames (Iowa): Iowa State University, August 2022. http://dx.doi.org/10.31274/cc-20240624-946.

9

Baird, Leemon C., III, and A. H. Klopf. Reinforcement Learning With High-Dimensional, Continuous Actions. Fort Belvoir, VA: Defense Technical Information Center, November 1993. http://dx.doi.org/10.21236/ada280844.

10

Obert, James, and Angie Shia. Optimizing Dynamic Timing Analysis with Reinforcement Learning. Office of Scientific and Technical Information (OSTI), November 2019. http://dx.doi.org/10.2172/1573933.