Academic literature on the topic 'Improper reinforcement learning'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Improper reinforcement learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever this information is available in the metadata.

Journal articles on the topic "Improper reinforcement learning"

1

Dass, Shuvalaxmi, and Akbar Siami Namin. "Reinforcement Learning for Generating Secure Configurations." Electronics 10, no. 19 (September 30, 2021): 2392. http://dx.doi.org/10.3390/electronics10192392.

Full text
Abstract:
Many security problems in software systems stem from vulnerabilities caused by improper configurations. A poorly configured software system leads to a multitude of vulnerabilities that can be exploited by adversaries. The problem becomes even more serious when the architecture of the underlying system is static and the misconfiguration persists for a long period of time, enabling adversaries to thoroughly inspect the software system under attack during the reconnaissance stage. Employing diversification techniques such as Moving Target Defense (MTD) can minimize the risk of exposing vulnerabilities. MTD is an evolving defense technique through which the attack surface of the underlying system is continuously changing. However, the effectiveness of such a dynamically changing platform depends not only on the quality of the next configuration setting with respect to minimizing the attack surface but also on the diversity of the set of configurations generated. To address the problem of generating a diverse and large set of secure software and system configurations, this paper introduces an approach based on Reinforcement Learning (RL) through which an agent is trained to generate the desirable set of configurations. The paper reports the performance of the RL-based generation of secure and diverse configurations through several case studies.
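
As a purely illustrative companion to this abstract, the sketch below shows one minimal way an RL agent could be rewarded for building configurations that are both secure and diverse: a tabular Q-learner assembles a configuration one parameter at a time, is penalized for a large attack surface, and receives a bonus for configurations it has not produced before. The parameter names, scoring function, and reward weights are assumptions for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict

PARAMS = ["remote_login", "debug_mode", "legacy_tls"]   # hypothetical binary settings

def attack_surface(config):
    # Assumed scoring: every risky setting that is enabled enlarges the attack surface.
    return sum(config)

def train(episodes=2000, alpha=0.1, gamma=0.9, eps=0.2, diversity_bonus=0.5):
    Q = defaultdict(float)   # Q[(state, action)], where state = values chosen so far
    seen = set()             # configurations generated so far, used for the diversity bonus
    for _ in range(episodes):
        state = ()
        for _ in PARAMS:
            if random.random() < eps:
                action = random.choice([0, 1])
            else:
                action = max([0, 1], key=lambda a: Q[(state, a)])
            nxt = state + (action,)
            if len(nxt) == len(PARAMS):                      # configuration complete
                reward = -attack_surface(nxt)
                if nxt not in seen:                          # reward novel configurations
                    reward += diversity_bonus
                    seen.add(nxt)
                target = reward
            else:
                target = gamma * max(Q[(nxt, a)] for a in (0, 1))
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = nxt
    return Q, seen

Q, configs = train()
print(len(configs), "distinct configurations generated during training")
```
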
APA, Harvard, Vancouver, ISO, and other styles
2

Zhai, Peng, Jie Luo, Zhiyan Dong, Lihua Zhang, Shunli Wang, and Dingkang Yang. "Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 5 (June 28, 2022): 5431–39. http://dx.doi.org/10.1609/aaai.v36i5.20481.

Full text
Abstract:
Robust adversarial reinforcement learning is an effective method to train agents to manage uncertain disturbances and modeling errors in real environments. However, for systems that are sensitive to disturbances or difficult to stabilize, it is easier to learn a powerful adversary than to establish a stable control policy. An improperly strong adversary can destabilize the system, introduce biases in the sampling process, make the learning process unstable, and even reduce the robustness of the policy. In this study, we consider the problem of ensuring system stability during training in the adversarial reinforcement learning architecture. The dissipative principle of robust H-infinity control is extended to the Markov Decision Process, and robust stability constraints are obtained based on L2 gain performance in the reinforcement learning system. Thus, we propose a dissipation-inequation-constraint-based adversarial reinforcement learning architecture. This architecture ensures the stability of the system during training by imposing constraints on the normal and adversarial agents. Theoretically, this architecture can be applied to a large family of deep reinforcement learning algorithms. Results of experiments in the MuJoCo and GymFc environments show that our architecture effectively improves the robustness of the controller against environmental changes and adapts to more powerful adversaries. Results of flight experiments on a real quadcopter indicate that our method can directly deploy the policy trained in simulation to the real environment, and our controller outperforms the PID controller in hardware-in-the-loop tests. Both our theoretical and empirical results provide new and critical outlooks on the adversarial reinforcement learning architecture from a rigorous robust control perspective.
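
The following toy sketch illustrates only the core idea of constraining the adversary so that it cannot destabilize the system during training: a protagonist gain and an adversarial disturbance gain are trained by alternating random search on a scalar linear plant, with the adversary's action clipped to an L2-gain-style bound relative to the state. The plant, the bound, and the update rule are illustrative assumptions, not the dissipation-inequation architecture proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A, GAIN_BOUND = 1.1, 0.5    # unstable scalar plant x' = A*x + u + w, with |w| <= GAIN_BOUND * |x|

def rollout(k_prot, k_adv, steps=50):
    x, ret = 1.0, 0.0
    for _ in range(steps):
        u = -k_prot * x                                                    # protagonist control
        w = np.clip(k_adv * x, -GAIN_BOUND * abs(x), GAIN_BOUND * abs(x))  # constrained adversary
        x = A * x + u + w
        ret += -(x ** 2 + 0.1 * u ** 2)   # protagonist return (the adversary receives its negative)
    return ret

k_prot, k_adv = 0.0, 0.0
for _ in range(200):                      # alternating random-search updates
    cand = k_prot + 0.1 * rng.standard_normal()
    if rollout(cand, k_adv) > rollout(k_prot, k_adv):   # protagonist maximizes the return
        k_prot = cand
    cand = k_adv + 0.1 * rng.standard_normal()
    if rollout(k_prot, cand) < rollout(k_prot, k_adv):  # adversary minimizes it
        k_adv = cand
print(f"protagonist gain {k_prot:.2f}, adversary gain {k_adv:.2f}")
```
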
APA, Harvard, Vancouver, ISO, and other styles
3

Chen, Ya-Ling, Yan-Rou Cai, and Ming-Yang Cheng. "Vision-Based Robotic Object Grasping—A Deep Reinforcement Learning Approach." Machines 11, no. 2 (February 12, 2023): 275. http://dx.doi.org/10.3390/machines11020275.

Full text
Abstract:
This paper focuses on developing a robotic object grasping approach that possesses the ability of self-learning, is suitable for small-volume large variety production, and has a high success rate in object grasping/pick-and-place tasks. The proposed approach consists of a computer vision-based object detection algorithm and a deep reinforcement learning algorithm with self-learning capability. In particular, the You Only Look Once (YOLO) algorithm is employed to detect and classify all objects of interest within the field of view of a camera. Based on the detection/localization and classification results provided by YOLO, the Soft Actor-Critic deep reinforcement learning algorithm is employed to provide a desired grasp pose for the robot manipulator (i.e., learning agent) to perform object grasping. In order to speed up the training process and reduce the cost of training data collection, this paper employs the Sim-to-Real technique so as to reduce the likelihood of damaging the robot manipulator due to improper actions during the training process. The V-REP platform is used to construct a simulation environment for training the deep reinforcement learning neural network. Several experiments have been conducted and experimental results indicate that the 6-DOF industrial manipulator successfully performs object grasping with the proposed approach, even for the case of previously unseen objects.
APA, Harvard, Vancouver, ISO, and other styles
4

Hurtado-Gómez, Julián, Juan David Romo, Ricardo Salazar-Cabrera, Álvaro Pachón de la Cruz, and Juan Manuel Madrid Molina. "Traffic Signal Control System Based on Intelligent Transportation System and Reinforcement Learning." Electronics 10, no. 19 (September 28, 2021): 2363. http://dx.doi.org/10.3390/electronics10192363.

Full text
Abstract:
Traffic congestion has several causes, including insufficient road capacity, unrestricted demand, and improper scheduling of traffic signal phases. A great variety of efforts have been made to properly program such phases. Some of them are based on traditional transportation assumptions, and others are adaptive, allowing the system to learn the control law (signal program) from data obtained from different sources. Reinforcement Learning (RL) is a technique commonly used in previous research. However, properly determining the states and the reward is key to obtaining good results and to having a real chance of implementation. This paper proposes and implements a traffic signal control system (TSCS), detailing its development stages: (a) Intelligent Transportation System (ITS) architecture design for the TSCS; (b) design and development of a system prototype, including an RL algorithm to minimize the vehicle queue at intersections, and detection and calculation of such queues by adapting a computer vision algorithm; and (c) design and development of system tests to validate the operation of the algorithms and the system prototype. Results include the development of tests for each module (vehicle queue measurement and RL algorithm) and real-time integration tests. Finally, the article presents a system simulation in the context of a medium-sized city in a developing country, showing that the proposed system reduced vehicle queues by 29%, waiting time by 50%, and lost time by 50% when compared to fixed phase times in traffic signals.
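
A minimal sketch of the queue-based RL component, assuming a toy intersection with two approaches, Poisson arrivals, and a fixed discharge rate during green; the real system measures queues with computer vision and uses a richer state. All rates and the discretization below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
Q = np.zeros((11, 11, 2))   # state: queue on each approach (capped at 10); action: which phase is green

def step(q_ns, q_ew, phase):
    # Assumed dynamics: Poisson arrivals on both approaches; the green approach discharges 3 vehicles.
    q_ns = max(0, min(10, q_ns + rng.poisson(0.4) - (3 if phase == 0 else 0)))
    q_ew = max(0, min(10, q_ew + rng.poisson(0.4) - (3 if phase == 1 else 0)))
    return q_ns, q_ew, -(q_ns + q_ew)       # reward: negative total queue length

alpha, gamma, eps = 0.1, 0.95, 0.1
for _ in range(500):                         # training episodes
    q_ns = q_ew = 0
    for _ in range(200):
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[q_ns, q_ew]))
        n_ns, n_ew, r = step(q_ns, q_ew, a)
        Q[q_ns, q_ew, a] += alpha * (r + gamma * Q[n_ns, n_ew].max() - Q[q_ns, q_ew, a])
        q_ns, q_ew = n_ns, n_ew
print(Q[5, 0], Q[0, 5])   # learned values should prefer giving green to the longer queue
```
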
APA, Harvard, Vancouver, ISO, and other styles
5

Pan, Ziwei. "Design of Interactive Cultural Brand Marketing System based on Cloud Service Platform." 網際網路技術學刊 23, no. 2 (March 2022): 321–34. http://dx.doi.org/10.53106/160792642022032302012.

Full text
Abstract:
Changes in the marketing environment and consumer behavior are the driving force for the development of online marketing. Although traditional marketing communication still exists, it has been unable to adapt to the marketing needs of modern cultural brands. On this basis, this paper combines a cloud service platform to design an interactive cultural brand marketing system. To address the problems of improper task scheduling and resource waste that arise in cloud platform resource scheduling in practice, a dynamic resource scheduling optimization model for the cloud platform environment is established, and fuzzy evaluation rules are designed. Moreover, through problem analysis and building on a reinforcement learning algorithm, this paper proposes a deep reinforcement learning resource scheduling algorithm based on tabu search and uses this algorithm to design the functional modules of the marketing system. On this basis, this paper designs an experiment to verify the performance of the interactive cultural brand marketing system. The research results show that the marketing system constructed in this paper has a certain degree of reliability.
APA, Harvard, Vancouver, ISO, and other styles
6

Kim, Byeongjun, Gunam Kwon, Chaneun Park, and Nam Kyu Kwon. "The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place." Biomimetics 8, no. 2 (June 6, 2023): 240. http://dx.doi.org/10.3390/biomimetics8020240.

Full text
Abstract:
This paper proposes a task decomposition and dedicated reward-system-based reinforcement learning algorithm for the Pick-and-Place task, which is one of the high-level tasks of robot manipulators. The proposed method decomposes the Pick-and-Place task into three subtasks: two reaching tasks and one grasping task. One of the two reaching tasks is approaching the object, and the other is reaching the place position. These two reaching tasks are carried out using the optimal policies of agents trained with Soft Actor-Critic (SAC). Unlike the two reaching tasks, grasping is implemented via simple logic, which is easy to design but may result in improper gripping. To properly assist the grasping task, a dedicated reward system for approaching the object is designed using individual axis-based weights. To verify the validity of the proposed method, we carry out various experiments in the MuJoCo physics engine with the Robosuite framework. According to the simulation results of four trials, the robot manipulator picked up and released the object in the goal position with an average success rate of 93.2%.
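
A minimal sketch of what an axis-weighted approach reward could look like; the axis weights and the emphasis on the vertical axis are assumptions for illustration, not the values used by the authors.

```python
import numpy as np

def approach_reward(gripper_pos, object_pos, weights=(1.0, 1.0, 2.0)):
    # Hypothetical dedicated reward: each Cartesian axis is weighted individually,
    # e.g. emphasizing vertical (z) alignment before the simple grasp logic closes
    # the gripper. The weight values are assumptions, not the paper's settings.
    diff = np.abs(np.asarray(gripper_pos) - np.asarray(object_pos))
    return -float(np.dot(weights, diff))

print(approach_reward([0.10, 0.02, 0.30], [0.10, 0.00, 0.05]))
```
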
APA, Harvard, Vancouver, ISO, and other styles
7

Ritonga, Mahyudin, and Fitria Sartika. "Muyûl al-Talâmidh fî Tadrîs al-Qirâ’ah." Jurnal Alfazuna : Jurnal Pembelajaran Bahasa Arab dan Kebahasaaraban 6, no. 1 (December 21, 2021): 36–52. http://dx.doi.org/10.15642/alfazuna.v6i1.1715.

Full text
Abstract:
Purpose- This study aims to reveal the spirit and motivation of learners in studying qiro'ah. Specifically, the study focuses on describing the forms of learners' motivation, the factors that affect learners' motivation in learning qiro'ah, and the steps taken by teachers to improve learners' enthusiasm for learning qiro'ah. Design/Methodology/Approach- The research was carried out with a qualitative approach; the data collection techniques were observation, interviews, and documentation studies. This approach was chosen considering that the research data found and analyzed are natural, without treatment. Findings- First, the learners' enthusiasm for learning qiro'ah is very low, as evidenced by the fact that many students do not complete the tasks assigned by their Arabic teachers. Second, the low motivation is influenced by internal and external factors: internal factors include a weak ability to apply qawa'id al-lughah in reading Arabic scripts and limited knowledge of the importance of reading ability, while external factors include little support from more able friends, improper media selection, and teaching methods that are sometimes out of sync with the learners. Third, teachers make efforts to increase reading motivation by providing reinforcement to learners about the importance of reading skills, finding and using appropriate qiro'ah learning media, updating qiro'ah learning methods, and increasing supervision of assigned tasks. Research Limitation/Implications- The researchers have not examined anything related to the ability to read Arabic itself; therefore, researchers and observers of Arabic learning can continue research on the various aspects relevant to this study, such as the correlation between motivation to study qiro'ah and comprehension of Arabic reading manuscripts.
APA, Harvard, Vancouver, ISO, and other styles
8

Likas, Aristidis. "A Reinforcement Learning Approach to Online Clustering." Neural Computation 11, no. 8 (November 1, 1999): 1915–32. http://dx.doi.org/10.1162/089976699300016025.

Full text
Abstract:
A general technique is proposed for embedding online clustering algorithms based on competitive learning in a reinforcement learning framework. The basic idea is that the clustering system can be viewed as a reinforcement learning system that learns through reinforcements to follow the clustering strategy we wish to implement. In this sense, the reinforcement-guided competitive learning (RGCL) algorithm is proposed, which constitutes a reinforcement-based adaptation of learning vector quantization (LVQ) with enhanced clustering capabilities. In addition, we suggest extensions of RGCL and LVQ that are characterized by the property of sustained exploration and significantly improve the performance of those algorithms, as indicated by experimental tests on well-known data sets.
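
The toy sketch below is a stand-in for the flavor of reward-modulated competitive learning described here: a stochastic winner is selected among prototypes, rewarded when it follows the intended nearest-prototype strategy and punished otherwise, and then attracted to or repelled from the input accordingly. The selection rule and reinforcement signal are simplifications, not the exact RGCL definitions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])  # two clusters
protos = rng.normal(1.5, 1.0, (2, 2))                                         # two prototypes
lr = 0.05

for _ in range(30):                                     # training epochs
    for x in rng.permutation(data):
        d = np.linalg.norm(protos - x, axis=1)
        p = np.exp(-d) / np.exp(-d).sum()               # stochastic winner selection
        k = rng.choice(len(protos), p=p)
        # Reinforce the unit when it follows the intended nearest-prototype strategy,
        # punish it otherwise; the winner is then attracted to or repelled from the input.
        r = 1.0 if k == int(np.argmin(d)) else -1.0
        protos[k] += lr * r * (x - protos[k])
print(protos)   # prototypes should end up near the two cluster centres
```
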
APA, Harvard, Vancouver, ISO, and other styles
9

Shi, Ying-Ming, and Zhiyuan Zhang. "Research on Path Planning Strategy of Rescue Robot Based on Reinforcement Learning." 電腦學刊 33, no. 3 (June 2022): 187–94. http://dx.doi.org/10.53106/199115992022063303015.

Full text
Abstract:
How rescue robots reach their destinations quickly and efficiently has become a hot research topic in recent years. Aiming at the complex unstructured environments faced by rescue robots, this paper proposes an artificial potential field algorithm based on reinforcement learning. First, the traditional artificial potential field method is used to perform basic path planning for the robot. Second, in order to solve the local minimum problem in planning and improve the robot's adaptive ability, the reinforcement learning algorithm is run with fixed preset parameters on the simulation platform. After intensive training, the robot continuously improves its decision-making ability for crossing typical concave obstacles. Finally, simulation experiments show that the rescue robot can combine the artificial potential field method and reinforcement learning to improve its ability to adapt to the environment and can reach the destination by the optimal route.
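
For context, a minimal sketch of the classical artificial potential field step that the paper starts from (attractive pull toward the goal, repulsive push away from nearby obstacles); the gains, influence distance, and scenario are assumptions. The RL component of the paper, which handles the local-minimum problem around concave obstacles, is not shown.

```python
import numpy as np

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=2.0, d0=1.0, step=0.05):
    # Attractive force pulls toward the goal; repulsive forces push away from
    # obstacles closer than the influence distance d0 (standard potential field).
    force = k_att * (goal - pos)
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if 1e-6 < d < d0:
            force += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (pos - obs) / d
    return pos + step * force / (np.linalg.norm(force) + 1e-9)

pos, goal = np.array([0.0, 0.0]), np.array([5.0, 5.0])
obstacles = [np.array([2.5, 2.0])]   # an obstacle sitting exactly on the path would
                                     # create the local minimum the paper tackles with RL
for _ in range(300):
    pos = apf_step(pos, goal, obstacles)
    if np.linalg.norm(goal - pos) < 0.1:
        break
print(pos)
```
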
APA, Harvard, Vancouver, ISO, and other styles
10

Santos, John Paul E., Joseph A. Villarama, Joseph P. Adsuara, Jordan F. Gundran, Aileen G. De Guzman, and Evelyn M. Ben. "Students’ Time Management, Academic Procrastination, and Performance during Online Science and Mathematics Classes." International Journal of Learning, Teaching and Educational Research 21, no. 12 (December 30, 2022): 142–61. http://dx.doi.org/10.26803/ijlter.21.12.8.

Full text
Abstract:
COVID-19 affected all sectors, including academia, which resulted in an increase in online learning. While education continued through online platforms, various student-related problems arose, including improper time management, procrastination, and fluctuating academic performance. It is in this context that this quantitative study was carried out to determine how time management and procrastination affected students' performance in science and mathematics during the pandemic. We surveyed 650 Filipino high school students using the Procrastination Assessment Scale-Students and Wayne State University's Time Management questionnaire with a 0.93 reliability coefficient. The findings revealed that in science and mathematics, female students outperformed males. The 11- to 12-year-olds had the highest mean grades in science and mathematics, while the 15- to 16-year-olds had the lowest. Younger respondents (11-14) were more likely to have better time management than older ones. Further, older respondents (15-18) procrastinate more than younger ones. Time management correlates positively with success in science and mathematics. Achievement in science and mathematics is highest among students with good time management. Procrastination negatively affects achievement. High school students who procrastinated less fared better in mathematics. With this, the study opens possibilities for teaching older learners time management to boost their performance. Students across ages should be urged to avoid procrastinating, as it negatively affects academic performance. As reinforcement, schools may educate learners on time management and procrastination avoidance through orientations and other platforms.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Improper reinforcement learning"

1

BRUCHON, NIKY. "Feasibility Investigation on Several Reinforcement Learning Techniques to Improve the Performance of the FERMI Free-Electron Laser." Doctoral thesis, Università degli Studi di Trieste, 2021. http://hdl.handle.net/11368/2982117.

Full text
Abstract:
The research carried out in particle accelerator facilities does not concern only particle and condensed matter physics, although these are the main topics covered in the field. Indeed, since a particle accelerator is composed of many different sub-systems, its proper functioning depends both on each of these parts and on their interconnection. It follows that the study, implementation, and improvement of the various sub-systems are fundamental points of investigation too. In particular, an interesting aspect for the automation engineering community is the control of such systems, which are usually complex, large, noise-affected, and non-linear. The doctoral project fits into this scope, investigating the introduction of new methods to automatically improve the performance of a specific type of particle accelerator: seeded free-electron lasers. The optimization of such systems is a challenging task, already tackled over the years by many different approaches in order to find and attain an optimal working point and keep it optimally tuned despite drift or disturbances. Despite the good results achieved, better ones are always sought. For this reason, several methods belonging to reinforcement learning, an area of machine learning that is attracting more and more attention in the scientific field, have been applied at FERMI, the free-electron laser facility at Elettra Sincrotrone Trieste. The research activity has been carried out by applying both model-free and model-based techniques belonging to reinforcement learning. Satisfactory preliminary results have been obtained, which represent the first step toward a new fully automatic procedure for the alignment of the seed laser to the electron beam. In the meantime, a similar investigation was ongoing at the Conseil Européen pour la Recherche Nucléaire (CERN). In the last year of the doctoral course, a collaboration to share knowledge on the topic took place. Some of the results collected at the largest particle physics laboratory in the world are presented in the doctoral dissertation.
APA, Harvard, Vancouver, ISO, and other styles
2

Kreutmayr, Fabian, and Markus Imlauer. "Application of machine learning to improve to performance of a pressure-controlled system." Technische Universität Dresden, 2020. https://tud.qucosa.de/id/qucosa%3A71076.

Full text
Abstract:
Due to the robustness and flexibility of hydraulic components, hydraulic control systems are used in a wide range of applications under various environmental conditions. However, the coverage of this broad field of applications often comes with a loss of performance. Especially when conditions and working points change often, hydraulic control systems cannot work at their optimum. Flexible electronic controllers in combination with techniques from the field of machine learning have the potential to overcome these issues. By applying a reinforcement learning algorithm, this paper examines whether learned controllers can compete with an expert-tuned solution. The method is thoroughly validated using both simulations and experiments.
APA, Harvard, Vancouver, ISO, and other styles
3

Zaki, Mohammadi. "Algorithms for Online Learning in Structured Environments." Thesis, 2022. https://etd.iisc.ac.in/handle/2005/6080.

Full text
Abstract:
Online learning deals with the study of making decisions sequentially using information gathered along the way. Typical goals of an online learning agent can be to maximize the reward gained during learning or to identify the best possible action to take with the maximum (expected) reward. We study this problem in the setting where the environment has some inbuilt structure. This structure can be exploited by the learning agent while making decisions to accelerate the process of learning from data. We study a number of such problems in this dissertation. We begin with regret minimization for multi-user online recommendation where the expected user-item reward matrix has low rank (much smaller than the number of users or items). We address the cold-start problem in recommendation systems where the agent initially has no information about the reward matrix and only gathers information gradually by interactions (i.e., recommending items) with arriving users. We use results from low-rank matrix estimation to design an efficient online algorithm to exploit the low rank of the underlying reward matrix. We analyze this algorithm and show that it enjoys better regret than algorithms that do not take the low-rank structure into account. We then study the problem of pure exploration (or best arm identification) in linear bandits. In this problem, each time the learner chooses an action vector (arm), she receives a noisy realization of a reward whose mean is linearly dependent on an unknown vector and the chosen action. The aim here is to identify the arm which yields the maximum reward (in expectation) as quickly as possible. We are specifically interested in the situation where the ambient dimension of the unknown parameter vector is very small as compared to the number of arms. We show that by using this inherent problem structure, one can design provably optimal and efficient algorithms to identify the best arm quickly. We study how exploiting the intrinsic geometry of the problem leads to the design of statistically and computationally efficient algorithms for the best arm identification problem. We finally formulate and study the problem of improper reinforcement learning, where for a given (unknown) Markov Decision Process (MDP), we are given a bag of (pre-designed) controllers or policies for the MDP. We study how the agent can leverage (combine) these pre-trained base controllers (instead of just the rudimentary actions of the MDP) to accelerate learning. This can be useful in tuning across ensembles of controllers, learning in mismatched or simulated environments, etc., to obtain a good controller for a given target environment with relatively few trials. This differs from the usual reinforcement learning setup where the learner observes the current state of the environment and chooses an action to play. In contrast, the improper learner chooses a given base controller and plays whichever action is recommended by the chosen controller. This indirect selection of actions via the base controllers helps to inherit desirable properties (e.g., interpretability, principled design, safety, etc) into the learned policy.
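
Since this thesis is the one source in the list that defines improper reinforcement learning directly, a highly simplified sketch of the setting may help: the learner never emits raw actions but instead selects among pre-designed base controllers and plays whatever they recommend. The sketch below reduces this to an epsilon-greedy bandit that picks one controller per episode on a hypothetical scalar plant, which is far simpler than the algorithms analyzed in the thesis.

```python
import random

def run_episode(controller, env_step, horizon=100):
    # The improper learner never emits raw actions itself: at every step it plays
    # whatever action the chosen base controller recommends for the current state.
    state, total = 1.0, 0.0
    for _ in range(horizon):
        action = controller(state)
        state, reward = env_step(state, action)
        total += reward
    return total

def improper_learner(controllers, env_step, episodes=500, eps=0.1):
    value = [0.0] * len(controllers)   # running average return of each base controller
    count = [0] * len(controllers)
    for _ in range(episodes):
        if random.random() < eps:
            k = random.randrange(len(controllers))
        else:
            k = max(range(len(controllers)), key=lambda i: value[i])
        ret = run_episode(controllers[k], env_step)
        count[k] += 1
        value[k] += (ret - value[k]) / count[k]
    # A richer improper learner would mix or switch controllers within an episode;
    # this sketch only learns which single pre-designed controller performs best.
    return max(range(len(controllers)), key=lambda i: value[i])

# Hypothetical plant: keep the state near zero; two pre-designed gains act as base controllers.
def env_step(x, u):
    x_next = 0.9 * x + u + random.gauss(0.0, 0.1)
    return x_next, -abs(x_next)

base_controllers = [lambda x: -0.5 * x, lambda x: -0.9 * x]
print("best base controller index:", improper_learner(base_controllers, env_step))
```
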
APA, Harvard, Vancouver, ISO, and other styles
4

Chi, Lu-cheng (紀律呈). "An Improved Deep Reinforcement Learning with Sparse Rewards." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/eq94pr.

Full text
Abstract:
Master's thesis (ROC academic year 107). National Sun Yat-sen University, Department of Electrical Engineering.
In reinforcement learning, how an agent explores in an environment with sparse rewards is a long-standing problem. The improved deep reinforcement learning method described in this thesis encourages an agent to explore unvisited environmental states in an environment with sparse rewards. In deep reinforcement learning, an agent directly uses an image observation from the environment as the input to the neural network. However, some neglected observations from the environment, such as depth, might provide valuable information. The method described in this thesis is based on the Actor-Critic algorithm and uses a convolutional neural network as a hetero-encoder between the image input and other observations from the environment. In the environment with sparse rewards, we use these neglected observations as the target output of supervised learning and provide the agent denser training signals through supervised learning to bootstrap reinforcement learning. In addition, we use the loss from supervised learning as feedback for the agent's exploration behavior in the environment, called the label reward, to encourage the agent to explore unvisited environmental states. Finally, we construct multiple neural networks with the Asynchronous Advantage Actor-Critic algorithm and learn the policy with multiple agents. The method described in this thesis is compared with other deep reinforcement learning methods in an environment with sparse rewards and achieves better performance.
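
A minimal sketch of the "label reward" idea as described in this abstract: the supervised loss of an auxiliary head that predicts a neglected observation (such as depth) is added to the environment reward, so poorly predicted, unfamiliar states yield extra exploration signal. The scaling factors and the mean-squared-error form are assumptions, not the thesis's exact formulation.

```python
import numpy as np

def label_reward(pred_depth, true_depth, scale=0.1):
    # "Label reward": the supervised loss of the auxiliary head that predicts a
    # neglected observation (e.g. depth) is fed back as an exploration signal,
    # so states the hetero-encoder predicts poorly -- unfamiliar states -- pay more.
    loss = float(np.mean((pred_depth - true_depth) ** 2))
    return scale * loss

def total_reward(env_reward, pred_depth, true_depth, beta=1.0):
    # Dense training signal = sparse environment reward + label reward.
    return env_reward + beta * label_reward(pred_depth, true_depth)

print(total_reward(0.0, np.zeros(4), np.array([0.2, 0.5, 0.1, 0.4])))
```
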
APA, Harvard, Vancouver, ISO, and other styles
5

Huang, Hsin-Jung (黃信榮). "Applying Reinforcement Learning to Improve NPC Game Character Intelligence." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/38802886766630465543.

Full text
Abstract:
Master's thesis (ROC academic year 95). Da-Yeh University, Department of Information Management.
Today, video games are the most popular entertainment for young people. With the rapid development of computer technology, the quality and complexity of the AI (Artificial Intelligence) used in computer games are gradually increasing, and AI has become a vital element of computer games. Intelligent NPCs (Non-Player Characters) that can act as playmates are becoming an essential element of most video games, so how to enhance the intelligence of game characters has become an important research topic. This study proposes a cooperative reinforcement learning structure in which NPC agents share common global states and an overall reward mechanism. Agents trained through our reinforcement learning mechanism are able to develop an action strategy to complete their missions in the virtual game environment. Our empirical results are promising: even when the NPC agents are tested in different game-level environments, all agents sharing the same goal learn to perform appropriate actions and achieve the common goal reasonably well.
APA, Harvard, Vancouver, ISO, and other styles
6

Chen, Chia-Hao (陳家豪). "Improve Top ASR Hypothesis with Re-correction by Reinforcement Learning." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/zde779.

Full text
Abstract:
Master's thesis (ROC academic year 107). National Central University, Department of Computer Science and Information Engineering.
In real situations, utterances are transcribed by ASR (Automatic Speech Recognition) systems, which usually propose multiple candidate transcriptions (hypotheses). Most of the time, the first hypothesis is the best and the one most commonly used. However, in a noisy environment the first ASR hypothesis often misses some words that are important for LU (Language Understanding), and these words can be found in the second hypothesis. On the whole, though, the first ASR hypothesis is significantly better than the second, so abandoning it just because it lacks some words is not the best choice. If we can refer to the second ASR hypothesis to correct the missing or redundant words of the first, we can obtain utterances closer to the user's true intentions. In this paper we propose a method to automatically correct the first ASR hypothesis with a reinforcement learning model, which corrects the first hypothesis word by word using the other hypotheses. Our method raises the BLEU score of the first ASR hypothesis from 70.18 to 76.74.
APA, Harvard, Vancouver, ISO, and other styles
7

Hsu, Yung-Chi (徐永吉). "Improved Safe Reinforcement Learning Based Self Adaptive Evolutionary Algorithms for Neuro-Fuzzy Controller Design." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/43659775487135397105.

Full text
Abstract:
Doctoral dissertation (ROC academic year 97). National Chiao Tung University, Department of Electrical and Control Engineering.
In this dissertation, improved safe reinforcement learning based self-adaptive evolutionary algorithms (ISRL-SAEAs) are proposed for TSK-type neuro-fuzzy controller design. The ISRL-SAEAs improve not only the design of the reinforcement signal but also traditional evolutionary algorithms. The proposed ISRL-SAEAs consist of two parts. In the first part, the SAEAs are proposed to solve the following problems: 1) all the fuzzy rules are encoded into one chromosome; 2) the number of fuzzy rules has to be assigned in advance; and 3) the population cannot evaluate each fuzzy rule locally. The second part of the ISRL-SAEAs is the ISRL, in which two different strategies (judgment and evaluation) are used to design the reinforcement signal; moreover, Lyapunov stability is considered. To demonstrate the performance of the proposed method, the inverted pendulum control system and the tandem pendulum control system are presented. As shown in simulation, the ISRL-SAEAs perform better than other reinforcement evolution methods.
APA, Harvard, Vancouver, ISO, and other styles
8

Lin, Ching-Pin (林敬斌). "Using Reinforcement Learning to Improve a Simple Intra-day Trading System of Taiwan Stock Index Future." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/34369847383488676186.

Full text
Abstract:
Master's thesis (ROC academic year 97). National Taiwan University, Graduate Institute of Computer Science and Information Engineering.
This thesis applies the Q-learning algorithm of reinforcement learning to improve a simple intra-day trading system for the Taiwan stock index future. We simulate the performance of the original strategy by back-testing it on historical data. Furthermore, we use historical information as training data for reinforcement learning and examine the resulting improvement. The training data are the tick data of every trading day from 2003 to 2007, and the testing period is from January 2008 to May 2009. The original strategy is a trend-following channel breakout system. We use the result of reinforcement learning to determine whether to do trend following or countertrend trading every time the system plans to open a position.
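
A toy sketch of the decision layer this abstract describes, assuming each historical channel breakout is summarized by a few features and that a countertrend trade earns the opposite profit of a trend-following one; the features, the one-step update, and the symmetric-PnL assumption are all illustrative and much simpler than the thesis's tick-data setup.

```python
import random
from collections import defaultdict

ACTIONS = ("trend_follow", "countertrend")

def discretize(features):
    # Hypothetical state: sign of recent momentum and whether volume is elevated.
    momentum, high_volume = features
    return (1 if momentum > 0 else -1, bool(high_volume))

def train(signals, alpha=0.1, eps=0.1):
    # `signals`: list of (features, pnl_if_trend_follow) built from historical breakouts;
    # a countertrend trade is assumed here to earn the opposite profit and loss.
    Q = defaultdict(float)
    for features, pnl_trend in signals:
        s = discretize(features)
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        reward = pnl_trend if a == "trend_follow" else -pnl_trend
        Q[(s, a)] += alpha * (reward - Q[(s, a)])   # one-step, bandit-style update
    return Q

# Synthetic stand-in for historical breakout events (momentum, volume flag, trade PnL).
history = [((random.uniform(-1, 1), random.random() > 0.5), random.uniform(-1, 1))
           for _ in range(1000)]
Q = train(history)
print({k: round(v, 3) for k, v in Q.items()})
```
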
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Improper reinforcement learning"

1

Urtāns, Ēvalds. Function shaping in deep learning. RTU Press, 2021. http://dx.doi.org/10.7250/9789934226854.

Full text
Abstract:
This work describes the importance of loss functions and related methods for deep reinforcement learning and deep metric learning. A novel MDQN loss function outperformed the DDQN loss function in PLE computer game environments, and a novel Exponential Triplet loss function outperformed the Triplet loss function in the face re-identification task on the VGGFace2 dataset, reaching 85.7% accuracy in a zero-shot setting. This work also presents a novel UNet-RNN-Skip model to improve the performance of the value function for path planning tasks.
APA, Harvard, Vancouver, ISO, and other styles
2

Rohsenow, Damaris J., and Megan M. Pinkston-Camp. Cognitive-Behavioral Approaches. Edited by Kenneth J. Sher. Oxford University Press, 2014. http://dx.doi.org/10.1093/oxfordhb/9780199381708.013.010.

Full text
Abstract:
Cognitive-behavioral approaches to treatment are derived from learning principles underlying behavioral and/or cognitive therapy. Only evidence-based approaches are recommended for practice. Support for different approaches varies across substance use disorders. For alcohol use disorders, cognitive-behavioral coping skills training and cue-exposure treatment are beneficial when added to an integrated treatment program. For cocaine dependence, contingency management combined with coping skills training or community reinforcement, and coping skills training added to a full treatment program, produce increased abstinence. For marijuana abuse, contingency management or coping skills training improve outcomes. For opiate dependence, contingency management decreases use of other drugs while on methadone. For smoking, aversive conditioning produces good results and key elements of coping skills training are supported, best when medication is also used. Recent advances include Web-based coping skills training, virtual reality to present cues during cue exposure, and text-messaging to remind clients to use coping skills in the natural environment.
APA, Harvard, Vancouver, ISO, and other styles
3

Carmo, Mafalda. Education Applications & Developments VI. inScience Press, 2021. http://dx.doi.org/10.36315/2021eadvi.

Full text
Abstract:
In this sixth volume, a dedicated set of authors explore the field of Education, contributing to the frontlines of knowledge. Success depends on the participation of those who wish to find creative solutions and believe in their potential to change the world, and on increasing public engagement and cooperation with communities. Part of our mission is to serve society with these initiatives and to promote knowledge; this requires the reinforcement of research efforts, education, and science, and cooperation between the most diverse studies and backgrounds. The contents of this 6th edition show how to navigate the broadest issues in contemporary education and research. In particular, this book explores four major topics within the broad theme of Education, corresponding to four sections: “Teachers and Students”, “Teachers and Learning”, “Projects and Trends” and “Organizational Issues”. Each section comprises chapters that have emerged from extended and peer-reviewed selected papers, originally published last year in the proceedings of the International Conference on Education and New Developments (END) conference series (http://end-educationconference.org/). This meeting occurs annually, always with successful outcomes. Original papers were selected, and the authors were invited to extend them and submit them to a new evaluation process. Afterwards, the authors of the accepted chapters were asked to make the necessary corrections and improve the final submitted chapters. This process has resulted in the final publication of 27 high-quality chapters organized into 4 sections.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Improper reinforcement learning"

1

Wang, Kunfu, Ruolin Xing, Wei Feng, and Baiqiao Huang. "A Method of UAV Formation Transformation Based on Reinforcement Learning Multi-agent." In Proceeding of 2021 International Conference on Wireless Communications, Networking and Applications, 187–95. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-2456-9_20.

Full text
Abstract:
In the face of increasingly complex combat tasks and unpredictable combat environments, a single UAV cannot meet operational requirements, so UAVs perform tasks cooperatively. In this paper, an improved heuristic reinforcement learning algorithm is proposed to solve the formation transformation problem of multiple UAVs, using a multi-agent reinforcement learning algorithm and a heuristic function. With the help of a heuristic back-propagation algorithm for formation transformation, the convergence efficiency of reinforcement learning is improved. Through this reinforcement learning algorithm, the problem of the low efficiency of formation transformation of multiple UAVs in a confrontation environment is solved.
APA, Harvard, Vancouver, ISO, and other styles
2

Singh, Moirangthem Tiken, Aninda Chakrabarty, Bhargab Sarma, and Sourav Dutta. "An Improved On-Policy Reinforcement Learning Algorithm." In Advances in Intelligent Systems and Computing, 321–30. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-7394-1_30.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Ma, Ping, and Hong-Li Zhang. "Improved Artificial Bee Colony Algorithm Based on Reinforcement Learning." In Intelligent Computing Theories and Application, 721–32. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-42294-7_64.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Dai, Zixiang, and Mingyan Jiang. "An Improved Lion Swarm Algorithm Based on Reinforcement Learning." In Advances in Intelligent Automation and Soft Computing, 76–86. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-81007-8_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Kim, Jongrae. "Improved Robustness Analysis of Reinforcement Learning Embedded Control Systems." In Robot Intelligence Technology and Applications 6, 104–15. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-97672-9_10.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Reid, Mark, and Malcolm Ryan. "Using ILP to Improve Planning in Hierarchical Reinforcement Learning." In Inductive Logic Programming, 174–90. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000. http://dx.doi.org/10.1007/3-540-44960-4_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Callegari, Daniel Antonio, and Flávio Moreira de Oliveira. "Applying Reinforcement Learning to Improve MCOE, an Intelligent Learning Environment for Ecology." In Lecture Notes in Computer Science, 284–93. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000. http://dx.doi.org/10.1007/10720076_26.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Fountain, Jake, Josiah Walker, David Budden, Alexandre Mendes, and Stephan K. Chalup. "Motivated Reinforcement Learning for Improved Head Actuation of Humanoid Robots." In RoboCup 2013: Robot World Cup XVII, 268–79. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-662-44468-9_24.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Liu, Jun, Yi Zhou, Yimin Qiu, and Zhongfeng Li. "An Improved Multi-objective Optimization Algorithm Based on Reinforcement Learning." In Lecture Notes in Computer Science, 501–13. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-09677-8_42.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Zhong, Chen, Chutong Ye, Chenyu Wu, and Ao Zhan. "An Improved Dynamic Spectrum Access Algorithm Based on Reinforcement Learning." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 13–25. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-30237-4_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Improper reinforcement learning"

1

Narvekar, Sanmit. "Curriculum Learning in Reinforcement Learning." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/757.

Full text
Abstract:
Transfer learning in reinforcement learning is an area of research that seeks to speed up or improve learning of a complex target task, by leveraging knowledge from one or more source tasks. This thesis will extend the concept of transfer learning to curriculum learning, where the goal is to design a sequence of source tasks for an agent to train on, such that final performance or learning speed is improved. We discuss completed work on this topic, including methods for semi-automatically generating source tasks tailored to an agent and the characteristics of a target domain, and automatically sequencing such tasks into a curriculum. Finally, we also present ideas for future work.
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Zhaodong, and Matthew E. Taylor. "Improving Reinforcement Learning with Confidence-Based Demonstrations." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/422.

Full text
Abstract:
Reinforcement learning has had many successes, but in practice it often requires significant amounts of data to learn high-performing policies. One common way to improve learning is to allow a trained (source) agent to assist a new (target) agent. The goals in this setting are to 1) improve the target agent's performance, relative to learning unaided, and 2) allow the target agent to outperform the source agent. Our approach leverages source agent demonstrations, removing any requirements on the source agent's learning algorithm or representation. The target agent then estimates the source agent's policy and improves upon it. The key contribution of this work is to show that leveraging the target agent's uncertainty in the source agent's policy can significantly improve learning in two complex simulated domains, Keepaway and Mario.
APA, Harvard, Vancouver, ISO, and other styles
3

Vuong, Tung-Long, Do-Van Nguyen, Tai-Long Nguyen, Cong-Minh Bui, Hai-Dang Kieu, Viet-Cuong Ta, Quoc-Long Tran, and Thanh-Ha Le. "Sharing Experience in Multitask Reinforcement Learning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/505.

Full text
Abstract:
In multitask reinforcement learning, tasks often have sub-tasks that share the same solution, even though the overall tasks are different. If the shared portions could be effectively identified, then the learning process could be improved, since all the samples between tasks in the shared space could be used. In this paper, we propose a Sharing Experience Framework (SEF) for simultaneously training multiple tasks. In SEF, a confidence-sharing agent uses task-specific rewards from the environment to identify similar parts that should be shared across tasks and defines those parts as shared regions between tasks. The shared regions are expected to guide task-policies in sharing their experience during the learning process. The experiments highlight that our framework improves the performance and stability of learning task-policies and can help task-policies avoid local optima.
APA, Harvard, Vancouver, ISO, and other styles
4

Gabel, Thomas, Christian Lutz, and Martin Riedmiller. "Improved neural fitted Q iteration applied to a novel computer gaming and learning benchmark." In 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning. IEEE, 2011. http://dx.doi.org/10.1109/adprl.2011.5967361.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Wu, Yuechen, Wei Zhang, and Ke Song. "Master-Slave Curriculum Design for Reinforcement Learning." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/211.

Full text
Abstract:
Curriculum learning is often introduced as leverage to improve agent training for complex tasks, where the goal is to generate a sequence of easier subtasks for an agent to train on, such that final performance or learning speed is improved. However, conventional curricula are mainly designed for one agent with a fixed action space and a sequential simple-to-hard training manner. Instead, we present a novel curriculum learning strategy by introducing the concept of master-slave agents and enabling flexible action settings for agent training. Multiple agents, referred to as the master agent for the target task and slave agents for the subtasks, are trained concurrently within different action spaces by sharing a perception network with an asynchronous strategy. Extensive evaluation on the VizDoom platform demonstrates that the master agent and slave agents mutually benefit each other through joint learning. Significant improvement is obtained over A3C in terms of learning speed and performance.
APA, Harvard, Vancouver, ISO, and other styles
6

Qin, Yunxiao, Weiguo Zhang, Jingping Shi, and Jinglong Liu. "Improve PID controller through reinforcement learning." In 2018 IEEE CSAA Guidance, Navigation and Control Conference (GNCC). IEEE, 2018. http://dx.doi.org/10.1109/gncc42960.2018.9019095.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

DESHPANDE, PRATHAMESH P., KAREN J. DEMILLE, AOWABIN RAHMAN, SUSANTA GHOSH, ASHLEY D. SPEAR, and GREGORY M. ODEGARD. "DESIGNING AN IMPROVED INTERFACE IN GRAPHENE/POLYMER COMPOSITES THROUGH MACHINE LEARNING." In Proceedings for the American Society for Composites-Thirty Seventh Technical Conference. Destech Publications, Inc., 2022. http://dx.doi.org/10.12783/asc37/36458.

Full text
Abstract:
The matrix-reinforcement interface has been studied extensively to enhance the performance of polymer matrix composites (PMCs). One commonly practiced approach is functionalization of the reinforcement, which significantly improves the interfacial interaction. A molecular dynamics (MD) and machine learning (ML) workflow is proposed to identify the optimal functionalization parameters that result in improved mechanical performance of a 3-layer graphene nanoplatelet (GNP)/ bismaleimide (BMI) nanocomposite. MD is used to generate the training set for a graph convolutional neural network (GCN). This article reports the MD methodology and an example mechanical response from a pull-out simulation. Upcoming work in the proposed MD-ML workflow for designing a nanocomposite with improved mechanical performance is also discussed.
APA, Harvard, Vancouver, ISO, and other styles
8

Eaglin, Gerald, and Joshua Vaughan. "Leveraging Conventional Control to Improve Performance of Systems Using Reinforcement Learning." In ASME 2020 Dynamic Systems and Control Conference. American Society of Mechanical Engineers, 2020. http://dx.doi.org/10.1115/dscc2020-3307.

Full text
Abstract:
While many model-based methods have been proposed for optimal control, it is often difficult to generate model-based optimal controllers for nonlinear systems. One model-free method to solve for optimal control policies is reinforcement learning. Reinforcement learning iteratively trains an agent to optimize a reward function. However, agents often perform poorly at the beginning of training and require a large number of trials to converge to a successful policy. A method is proposed to incorporate domain knowledge of dynamics and control into the controllers using reinforcement learning to reduce the training time needed. Simulations are presented to compare the performance of agents utilizing domain knowledge to those that do not use domain knowledge. The results show that the agents with domain knowledge can accomplish the desired task with less training time than those without domain knowledge.
APA, Harvard, Vancouver, ISO, and other styles
9

Song, Haolin, Mingxiao Feng, Wengang Zhou, and Houqiang Li. "MA2CL:Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/470.

Full text
Abstract:
Recent approaches have utilized self-supervised auxiliary tasks as representation learning to improve the performance and sample efficiency of vision-based reinforcement learning algorithms in single-agent settings. However, in multi-agent reinforcement learning (MARL), these techniques face challenges because each agent only receives partial observation from an environment influenced by others, resulting in correlated observations in the agent dimension. So it is necessary to consider agent-level information in representation learning for MARL. In this paper, we propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL), which encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space. Specifically, we use an attention reconstruction model for recovering and the model is trained via contrastive learning. MA2CL allows better utilization of contextual information at the agent level, facilitating the training of MARL agents for cooperation tasks. Extensive experiments demonstrate that our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
APA, Harvard, Vancouver, ISO, and other styles
10

Zhu, Hanhua. "Generalized Representation Learning Methods for Deep Reinforcement Learning." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/748.

Full text
Abstract:
Deep reinforcement learning (DRL) has increased the range of successful applications of reinforcement learning (RL) techniques, but it also brings challenges such as low sample efficiency. In this work, I propose generalized representation learning methods to obtain a compact state space suitable for RL from a raw observation state. I expect my new methods to increase the sample efficiency of RL through understandable representations of state and therefore improve the performance of RL.
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Improper reinforcement learning"

1

Miles, Gaines E., Yael Edan, F. Tom Turpin, Avshalom Grinstein, Thomas N. Jordan, Amots Hetzroni, Stephen C. Weller, Marvin M. Schreiber, and Okan K. Ersoy. Expert Sensor for Site Specification Application of Agricultural Chemicals. United States Department of Agriculture, August 1995. http://dx.doi.org/10.32747/1995.7570567.bard.

Full text
Abstract:
In this work, multispectral reflectance images are used in conjunction with a neural network classifier for the purpose of detecting and classifying weeds under real field conditions. Multispectral reflectance images containing different combinations of weeds and crops were taken under actual field conditions. This multispectral reflectance information was used to develop algorithms that could segment the plants from the background as well as classify them into weeds or crops. In order to segment the plants from the background, the multispectral reflectance of plants and background was studied and a relationship was derived. It was found that using a ratio of two wavelength reflectance images (750 nm and 670 nm) it was possible to segment the plants from the background. Once this was accomplished, it was possible to classify the segmented images into weed or crop by use of the neural network. The neural network developed for this work is a modification of the standard learning vector quantization algorithm. This neural network was modified by replacing the time-varying adaptation gain with a constant adaptation gain and a binary reinforcement function. This improved accuracy and training time as well as introducing several new properties such as hill climbing and momentum addition. The network was trained and tested with different wavelength combinations in order to find the best results. Finally, the results of the classifier were evaluated using a pixel-based method and a block-based method. In the pixel-based method, every single pixel is evaluated to test whether it was classified correctly or not; the best weed classification result was 81%, with an associated crop classification accuracy of 57%. In the block-based classification method, the image was divided into blocks and each block was evaluated to determine whether it contained weeds or not. Different block sizes and thresholds were tested. The best results for this method were 97% for a block size of 8 inches and a pixel threshold of 60. A simulation model was developed to 1) quantify the effectiveness of a site-specific sprayer and 2) evaluate the influence of different design parameters on the efficiency of the site-specific sprayer. In each iteration of this model, infected areas (weed patches) in the field were randomly generated and the amount of herbicide required for spraying these areas was calculated. The effectiveness of the sprayer was estimated for different stain sizes, nozzle types (conic and flat), nozzle sizes, and stain detection levels of the identification system. Simulation results indicated that the flat nozzle is much more effective than the conic nozzle and that its relative efficiency is greater for small nozzle sizes. By using a site-specific sprayer, the average ratio between the sprayed areas and the stain areas is about 1.1 to 1.8, which can save up to 92% of herbicides, especially when the proportion of stain areas is small.
APA, Harvard, Vancouver, ISO, and other styles
2

A Decision-Making Method for Connected Autonomous Driving Based on Reinforcement Learning. SAE International, December 2020. http://dx.doi.org/10.4271/2020-01-5154.

Full text
Abstract:
At present, with the development of Intelligent Vehicle Infrastructure Cooperative Systems (IVICS), decision-making for automated vehicles based on connected environment conditions has attracted more attention. Reliability, efficiency, and generalization performance are the basic requirements for a vehicle decision-making system. Therefore, this paper proposes a decision-making method for connected autonomous driving based on the Wasserstein Generative Adversarial Nets-Deep Deterministic Policy Gradient (WGAIL-DDPG) algorithm. In this method, the key component of the reinforcement learning (RL) model, the reward function, is designed from the perspective of vehicle serviceability, covering safety, ride comfort, and handling stability. To reduce the complexity of the proposed model, an imitation learning strategy is introduced to improve the RL training process. Meanwhile, a model training strategy based on cloud computing effectively solves the problem of insufficient computing resources in the vehicle-mounted system. Test results show that the proposed method can improve the efficiency of the RL training process while delivering reliable decision-making performance and excellent generalization capability.
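
A minimal sketch of a composite reward in the spirit described here, with one term per serviceability aspect (safety, ride comfort, handling stability); the specific penalty terms and weights are assumptions, not those of the report.

```python
def driving_reward(collision, jerk, yaw_rate, w_safe=10.0, w_comfort=1.0, w_stability=1.0):
    # Hypothetical composite reward reflecting the three serviceability aspects the
    # report names: safety, ride comfort, and handling stability. Weights are assumptions.
    safety = -w_safe if collision else 0.0
    comfort = -w_comfort * abs(jerk)          # penalize harsh changes in acceleration
    stability = -w_stability * abs(yaw_rate)  # penalize aggressive yaw motion
    return safety + comfort + stability

print(driving_reward(collision=False, jerk=0.8, yaw_rate=0.2))
```
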
APA, Harvard, Vancouver, ISO, and other styles
