Journal articles on the topic 'Improper reinforcement learning'

To see the other types of publications on this topic, follow the link: Improper reinforcement learning.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Improper reinforcement learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Dass, Shuvalaxmi, and Akbar Siami Namin. "Reinforcement Learning for Generating Secure Configurations." Electronics 10, no. 19 (September 30, 2021): 2392. http://dx.doi.org/10.3390/electronics10192392.

Full text
Abstract:
Many security problems in software systems arise from vulnerabilities caused by improper configurations. A poorly configured software system leads to a multitude of vulnerabilities that can be exploited by adversaries. The problem becomes even more serious when the architecture of the underlying system is static and the misconfiguration remains for a longer period of time, enabling adversaries to thoroughly inspect the software system under attack during the reconnaissance stage. Employing diversification techniques such as Moving Target Defense (MTD) can minimize the risk of exposing vulnerabilities. MTD is an evolving defense technique through which the attack surface of the underlying system is continuously changing. However, the effectiveness of such a dynamically changing platform depends not only on the goodness of the next configuration setting with respect to the minimization of attack surfaces but also on the diversity of the set of configurations generated. To address the problem of generating a diverse and large set of secure software and system configurations, this paper introduces an approach based on Reinforcement Learning (RL) through which an agent is trained to generate the desirable set of configurations. The paper reports the performance of the RL-based secure and diverse configurations through some case studies.
APA, Harvard, Vancouver, ISO, and other styles
2

Zhai, Peng, Jie Luo, Zhiyan Dong, Lihua Zhang, Shunli Wang, and Dingkang Yang. "Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 5 (June 28, 2022): 5431–39. http://dx.doi.org/10.1609/aaai.v36i5.20481.

Full text
Abstract:
Robust adversarial reinforcement learning is an effective method to train agents to manage uncertain disturbance and modeling errors in real environments. However, for systems that are sensitive to disturbances or those that are difficult to stabilize, it is easier to learn a powerful adversary than establish a stable control policy. An improper strong adversary can destabilize the system, introduce biases in the sampling process, make the learning process unstable, and even reduce the robustness of the policy. In this study, we consider the problem of ensuring system stability during training in the adversarial reinforcement learning architecture. The dissipative principle of robust H-infinity control is extended to the Markov Decision Process, and robust stability constraints are obtained based on L2 gain performance in the reinforcement learning system. Thus, we propose a dissipation-inequation-constraint-based adversarial reinforcement learning architecture. This architecture ensures the stability of the system during training by imposing constraints on the normal and adversarial agents. Theoretically, this architecture can be applied to a large family of deep reinforcement learning algorithms. Results of experiments in MuJoCo and GymFc environments show that our architecture effectively improves the robustness of the controller against environmental changes and adapts to more powerful adversaries. Results of the flight experiments on a real quadcopter indicate that our method can directly deploy the policy trained in the simulation environment to the real environment, and our controller outperforms the PID controller based on hardware-in-the-loop. Both our theoretical and empirical results provide new and critical outlooks on the adversarial reinforcement learning architecture from a rigorous robust control perspective.
APA, Harvard, Vancouver, ISO, and other styles
3

Chen, Ya-Ling, Yan-Rou Cai, and Ming-Yang Cheng. "Vision-Based Robotic Object Grasping—A Deep Reinforcement Learning Approach." Machines 11, no. 2 (February 12, 2023): 275. http://dx.doi.org/10.3390/machines11020275.

Full text
Abstract:
This paper focuses on developing a robotic object grasping approach that possesses the ability of self-learning, is suitable for small-volume large variety production, and has a high success rate in object grasping/pick-and-place tasks. The proposed approach consists of a computer vision-based object detection algorithm and a deep reinforcement learning algorithm with self-learning capability. In particular, the You Only Look Once (YOLO) algorithm is employed to detect and classify all objects of interest within the field of view of a camera. Based on the detection/localization and classification results provided by YOLO, the Soft Actor-Critic deep reinforcement learning algorithm is employed to provide a desired grasp pose for the robot manipulator (i.e., learning agent) to perform object grasping. In order to speed up the training process and reduce the cost of training data collection, this paper employs the Sim-to-Real technique so as to reduce the likelihood of damaging the robot manipulator due to improper actions during the training process. The V-REP platform is used to construct a simulation environment for training the deep reinforcement learning neural network. Several experiments have been conducted and experimental results indicate that the 6-DOF industrial manipulator successfully performs object grasping with the proposed approach, even for the case of previously unseen objects.
APA, Harvard, Vancouver, ISO, and other styles
4

Hurtado-Gómez, Julián, Juan David Romo, Ricardo Salazar-Cabrera, Álvaro Pachón de la Cruz, and Juan Manuel Madrid Molina. "Traffic Signal Control System Based on Intelligent Transportation System and Reinforcement Learning." Electronics 10, no. 19 (September 28, 2021): 2363. http://dx.doi.org/10.3390/electronics10192363.

Full text
Abstract:
Traffic congestion has several causes, including insufficient road capacity, unrestricted demand and improper scheduling of traffic signal phases. A great variety of efforts have been made to properly program such phases. Some of them are based on traditional transportation assumptions, and others are adaptive, allowing the system to learn the control law (signal program) from data obtained from different sources. Reinforcement Learning (RL) is a technique commonly used in previous research. However, properly determining the states and the reward is key to obtain good results and to have a real chance to implement it. This paper proposes and implements a traffic signal control system (TSCS), detailing its development stages: (a) Intelligent Transportation System (ITS) architecture design for the TSCS; (b) design and development of a system prototype, including an RL algorithm to minimize the vehicle queue at intersections, and detection and calculation of such queues by adapting a computer vision algorithm; and (c) design and development of system tests to validate operation of the algorithms and the system prototype. Results include the development of the tests for each module (vehicle queue measurement and RL algorithm) and real-time integration tests. Finally, the article presents a system simulation in the context of a medium-sized city in a developing country, showing that the proposed system allowed reduction of vehicle queues by 29%, of waiting time by 50%, and of lost time by 50%, when compared to fixed phase times in traffic signals.
APA, Harvard, Vancouver, ISO, and other styles
5

Pan, Ziwei. "Design of Interactive Cultural Brand Marketing System based on Cloud Service Platform." 網際網路技術學刊 23, no. 2 (March 2022): 321–34. http://dx.doi.org/10.53106/160792642022032302012.

Full text
Abstract:
Changes in the marketing environment and consumer behavior are the driving force for the development of online marketing. Although traditional marketing communication still exists, it has been unable to adapt to the marketing needs of modern cultural brands. On this basis, this paper combines the cloud service platform to design an interactive cultural brand marketing system. In view of the problems of improper task scheduling and resource waste in cloud platform resource scheduling in actual situations, a dynamic resource scheduling optimization model under the cloud platform environment is established, and fuzzy evaluation rules are designed. Moreover, through problem analysis, based on the reinforcement learning algorithm, this paper proposes a deep reinforcement learning resource scheduling algorithm based on tabu search, and combines the algorithm to design the functional module of the marketing system. On this basis, this paper designs an experiment to verify the performance of this interactive cultural brand marketing system. The research results prove that the marketing system constructed in this paper has certain reliability.
APA, Harvard, Vancouver, ISO, and other styles
6

Kim, Byeongjun, Gunam Kwon, Chaneun Park, and Nam Kyu Kwon. "The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place." Biomimetics 8, no. 2 (June 6, 2023): 240. http://dx.doi.org/10.3390/biomimetics8020240.

Full text
Abstract:
This paper proposes a task decomposition and dedicated reward-system-based reinforcement learning algorithm for the Pick-and-Place task, which is one of the high-level tasks of robot manipulators. The proposed method decomposes the Pick-and-Place task into three subtasks: two reaching tasks and one grasping task. One of the two reaching tasks is approaching the object, and the other is reaching the place position. These two reaching tasks are carried out using the optimal policies of agents trained with Soft Actor-Critic (SAC). Different from the two reaching tasks, the grasping is implemented via simple logic which is easily designable but may result in improper gripping. To assist the grasping task properly, a dedicated reward system for approaching the object is designed using individual axis-based weights. To verify the validity of the proposed method, we carry out various experiments in the MuJoCo physics engine with the Robosuite framework. According to the simulation results of four trials, the robot manipulator picked up and released the object in the goal position with an average success rate of 93.2%.
APA, Harvard, Vancouver, ISO, and other styles
7

Ritonga, Mahyudin, and Fitria Sartika. "Muyûl al-Talâmidh fî Tadrîs al-Qirâ’ah." Jurnal Alfazuna : Jurnal Pembelajaran Bahasa Arab dan Kebahasaaraban 6, no. 1 (December 21, 2021): 36–52. http://dx.doi.org/10.15642/alfazuna.v6i1.1715.

Full text
Abstract:
Purpose- This study aims to reveal the spirit and motivation of learners in studying Qiro'ah. Specifically, the study focuses on describing the forms of learner motivation, the factors that affect learners' motivation in learning qiro'ah, and the steps taken by teachers to improve learners' spirit in learning qiro'ah. Design/Methodology/Approach- The research was carried out with a qualitative approach; data collection techniques were observation, interviews, and documentation studies. This approach was chosen considering that the research data found and analyzed are natural, without treatment. Findings: First, the spirit of learners in learning qiro'ah is very low; this is evidenced by the fact that many students do not thoroughly complete the tasks given by Arabic teachers. Second, low motivation is influenced by internal and external factors: internal factors include a weak ability to apply qawa'id al-lughah in reading Arabic scripts and limited knowledge of the urgency of reading ability, while external factors include little support from more able friends, improper media selection, and teaching methods that are sometimes out of sync with learners. Third, teachers make efforts to increase reading motivation by providing reinforcement for learners related to the importance of reading skills, finding and using the right qiro'ah learning media, updating qiro'ah learning methods, and increasing supervision of assigned tasks. Research Limitation/Implications- The researchers have not revealed anything related to the ability to read Arabic; therefore, researchers and observers of Arabic learning can continue research on various aspects relevant to this study, such as studying the correlation of motivation to study qiro'ah with understanding of Arabic reading manuscripts.
APA, Harvard, Vancouver, ISO, and other styles
8

Likas, Aristidis. "A Reinforcement Learning Approach to Online Clustering." Neural Computation 11, no. 8 (November 1, 1999): 1915–32. http://dx.doi.org/10.1162/089976699300016025.

Full text
Abstract:
A general technique is proposed for embedding online clustering algorithms based on competitive learning in a reinforcement learning framework. The basic idea is that the clustering system can be viewed as a reinforcement learning system that learns through reinforcements to follow the clustering strategy we wish to implement. In this sense, the reinforcement guided competitive learning (RGCL) algorithm is proposed that constitutes a reinforcement-based adaptation of learning vector quantization (LVQ) with enhanced clustering capabilities. In addition, we suggest extensions of RGCL and LVQ that are characterized by the property of sustained exploration and significantly improve the performance of those algorithms, as indicated by experimental tests on well-known data sets.
APA, Harvard, Vancouver, ISO, and other styles
9

Shi, Ying-Ming, and Zhiyuan Zhang. "Research on Path Planning Strategy of Rescue Robot Based on Reinforcement Learning." 電腦學刊 33, no. 3 (June 2022): 187–94. http://dx.doi.org/10.53106/199115992022063303015.

Full text
Abstract:
How rescue robots reach their destinations quickly and efficiently has become a hot research topic in recent years. Aiming at the complex unstructured environment faced by rescue robots, this paper proposes an artificial potential field algorithm based on reinforcement learning. Firstly, the traditional artificial potential field method is used to perform basic path planning for the robot. Secondly, in order to solve the local minimum problem in planning and improve the robot’s adaptive ability, the reinforcement learning algorithm is run with fixed preset parameters on the simulation platform. After intensive training, the robot continuously improves its decision-making ability for crossing typical concave obstacles. Finally, simulation experiments show that the rescue robot can combine the artificial potential field method and reinforcement learning to improve its ability to adapt to the environment, and can reach the destination along the optimal route.
APA, Harvard, Vancouver, ISO, and other styles
10

Santos, John Paul E., Joseph A. Villarama, Joseph P. Adsuara, Jordan F. Gundran, Aileen G. De Guzman, and Evelyn M. Ben. "Students’ Time Management, Academic Procrastination, and Performance during Online Science and Mathematics Classes." International Journal of Learning, Teaching and Educational Research 21, no. 12 (December 30, 2022): 142–61. http://dx.doi.org/10.26803/ijlter.21.12.8.

Full text
Abstract:
COVID-19 affected all sectors, including academia, which resulted in an increase in online learning. While education continued through online platforms, various student-related problems arose, including improper time management, procrastination, and fluctuating academic performance. It is in this context that this quantitative study was carried out to determine how time management and procrastination affected students’ performance in science and mathematics during the pandemic. We surveyed 650 Filipino high school students using the Procrastination Assessment Scale-Students and Wayne State University’s Time Management questionnaire with a 0.93 reliability coefficient. The findings revealed that in science and mathematics, female students outperformed males. Students aged 11 to 12 had the highest mean grades in science and mathematics, while those aged 15 to 16 had the lowest. Younger respondents (11-14) were more likely to have better time management than older ones. Further, older respondents (15-18) procrastinated more than younger ones. Time management correlates positively with success in science and mathematics. Achievement in science and mathematics is highest among students with good time management. Procrastination negatively affects achievement. High school students who procrastinated less fared better in mathematics. With this, the study opens possibilities for teaching older learners time management to boost their performance. Students across ages should be urged to avoid procrastinating, as it negatively affects academic performance. As reinforcement, schools may educate learners on time management and procrastination avoidance through orientations and other platforms.
APA, Harvard, Vancouver, ISO, and other styles
11

Yuan, Minghai, Chenxi Zhang, Kaiwen Zhou, and Fengque Pei. "Real-time Allocation of Shared Parking Spaces Based on Deep Reinforcement Learning." 網際網路技術學刊 24, no. 1 (January 2023): 035–43. http://dx.doi.org/10.53106/160792642023012401004.

Full text
Abstract:
Aiming at the parking space heterogeneity problem in shared parking space management, a multi-objective optimization model for parking space allocation is constructed with the optimization objectives of reducing the average walking distance of users and improving the utilization rate of parking spaces. A real-time allocation method for shared parking spaces based on deep reinforcement learning is then proposed, which includes a state space for heterogeneous regions, an action space based on policy selection, and a reward function with variable coefficients. To accurately evaluate the model performance, dynamic programming is used to derive the theoretical optimal values. Simulation results show that the improved algorithm not only improves the training success rate but also increases the agent's performance by at least 12.63% and maintains this advantage for different sizes of parking demand, reducing the user walking distance by 53.58%, improving the parking utilization by 6.67% on average, and keeping the response time under 0.2 seconds.
APA, Harvard, Vancouver, ISO, and other styles
12

West, Joseph, Frederic Maire, Cameron Browne, and Simon Denman. "Improved reinforcement learning with curriculum." Expert Systems with Applications 158 (November 2020): 113515. http://dx.doi.org/10.1016/j.eswa.2020.113515.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Zini, Floriano, Fabio Le Piane, and Mauro Gaspari. "Adaptive Cognitive Training with Reinforcement Learning." ACM Transactions on Interactive Intelligent Systems 12, no. 1 (March 31, 2022): 1–29. http://dx.doi.org/10.1145/3476777.

Full text
Abstract:
Computer-assisted cognitive training can help patients affected by several illnesses alleviate their cognitive deficits or healthy people improve their mental performance. In most computer-based systems, training sessions consist of graded exercises, which should ideally be able to gradually improve the trainee’s cognitive functions. Indeed, adapting the difficulty of the exercises to how individuals perform in their execution is crucial to improve the effectiveness of cognitive training activities. In this article, we propose the use of reinforcement learning (RL) to learn how to automatically adapt the difficulty of computerized exercises for cognitive training. In our approach, trainees’ performance in performed exercises is used as a reward to learn a policy that changes over time the values of the parameters that determine exercise difficulty. We illustrate a method to be initially used to learn difficulty-variation policies tailored for specific categories of trainees, and then to refine these policies for single individuals. We present the results of two user studies that provide evidence for the effectiveness of our method: a first study, in which a student category policy obtained via RL was found to have better effects on the cognitive function than a standard baseline training that adopts a mechanism to vary the difficulty proposed by neuropsychologists, and a second study, demonstrating that adding an RL-based individual customization further improves the training process.
APA, Harvard, Vancouver, ISO, and other styles
14

Chen, Junyan, Yong Wang, Jiangtao Ou, Chengyuan Fan, Xiaoye Lu, Cenhuishan Liao, Xuefeng Huang, and Hongmei Zhang. "ALBRL: Automatic Load-Balancing Architecture Based on Reinforcement Learning in Software-Defined Networking." Wireless Communications and Mobile Computing 2022 (May 2, 2022): 1–17. http://dx.doi.org/10.1155/2022/3866143.

Full text
Abstract:
Due to the rapid development of network communication technology and the significant increase in network terminal equipment, the application of new network architecture software-defined networking (SDN) combined with reinforcement learning in network traffic scheduling has become an important focus of research. Because of network traffic transmission variability and complexity, the traditional reinforcement-learning algorithms in SDN face problems such as slow convergence rates and unbalanced loads. The problems seriously affect network performance, resulting in network link congestion and the low efficiency of inter-stream bandwidth allocation. This paper proposes an automatic load-balancing architecture based on reinforcement learning (ALBRL) in SDN. In this architecture, we design a load-balancing optimization model in high-load traffic scenarios and adapt the improved Deep Deterministic Policy Gradient (DDPG) algorithm to find a near-optimal path between network hosts. The proposed ALBRL uses the sampling method of updating the experience pool with the SumTree structure to improve the random extraction strategy of the empirical-playback mechanism in DDPG. It extracts a more meaningful experience for network updating with greater probability, which can effectively improve the convergence rate. The experiment results show that the proposed ALBRL has a faster training speed than existing reinforcement-learning algorithms and significantly improves network throughput.
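The SumTree-based sampling mentioned in this abstract is a standard ingredient of prioritized experience replay. The Python sketch below only illustrates that general idea and is not the authors' implementation; the class, method names, and stored payload format are assumptions.

```python
# Illustrative sketch (not ALBRL's code): a minimal SumTree for
# priority-proportional sampling of replay transitions, as used to bias
# DDPG-style updates toward more informative experiences.
import random

class SumTree:
    def __init__(self, capacity):
        self.capacity = capacity                   # max number of stored transitions
        self.tree = [0.0] * (2 * capacity - 1)     # internal nodes hold priority sums
        self.data = [None] * capacity              # leaf payloads (transitions)
        self.write = 0                             # next leaf slot to overwrite

    def total(self):
        return self.tree[0]                        # root = sum of all priorities

    def add(self, priority, transition):
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity

    def update(self, leaf, priority):
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:                           # propagate the change up to the root
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def sample(self):
        # assumes at least one transition has been added
        s = random.uniform(0, self.total())
        idx = 0
        while True:                                # descend toward the leaf covering s
            left, right = 2 * idx + 1, 2 * idx + 2
            if left >= len(self.tree):
                break
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = right
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]
```

Sampling a minibatch this way draws transitions with probability proportional to their stored priority, which is what allows the replay mechanism described above to favour "more meaningful" experiences.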
APA, Harvard, Vancouver, ISO, and other styles
15

Tessler, Chen, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, and Shie Mannor. "Reinforcement Learning for Datacenter Congestion Control." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 11 (June 28, 2022): 12615–21. http://dx.doi.org/10.1609/aaai.v36i11.21535.

Full text
Abstract:
We approach the task of network congestion control in datacenters using Reinforcement Learning (RL). Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. Evidently, the most popular recent deployments rely on rule-based heuristics that are tested on a predetermined set of benchmarks. Consequently, these heuristics do not generalize well to newly-seen scenarios. Contrarily, we devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We overcome challenges such as partial-observability, non-stationarity, and multi-objectiveness. We further propose a policy gradient algorithm that leverages the analytical structure of the reward function to approximate its derivative and improve stability. We show that these challenges prevent standard RL algorithms from operating within this domain. Our experiments, conducted on a realistic simulator that emulates communication networks' behavior, show that our method exhibits improved performance concurrently on the multiple considered metrics compared to the popular algorithms deployed today in real datacenters. Our algorithm is being productized to replace heuristics in some of the largest datacenters in the world.
APA, Harvard, Vancouver, ISO, and other styles
16

Tessler, Chen, Yuval Shpigelman, Gal Dalal, Amit Mandelbaum, Doron Haritan Kazakov, Benjamin Fuhrer, Gal Chechik, and Shie Mannor. "Reinforcement Learning for Datacenter Congestion Control." ACM SIGMETRICS Performance Evaluation Review 49, no. 2 (January 17, 2022): 43–46. http://dx.doi.org/10.1145/3512798.3512815.

Full text
Abstract:
We approach the task of network congestion control in datacenters using Reinforcement Learning (RL). Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. Evidently, the most popular recent deployments rely on rule-based heuristics that are tested on a predetermined set of benchmarks. Consequently, these heuristics do not generalize well to newly-seen scenarios. Contrarily, we devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We overcome challenges such as partial-observability, nonstationarity, and multi-objectiveness. We further propose a policy gradient algorithm that leverages the analytical structure of the reward function to approximate its derivative and improve stability. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training. Our experiments, conducted on a realistic simulator that emulates communication networks' behavior, exhibit improved performance concurrently on the multiple considered metrics compared to the popular algorithms deployed today in real datacenters. Our algorithm is being productized to replace heuristics in some of the largest datacenters in the world.
APA, Harvard, Vancouver, ISO, and other styles
17

Littman, Michael L. "Reinforcement learning improves behaviour from evaluative feedback." Nature 521, no. 7553 (May 2015): 445–51. http://dx.doi.org/10.1038/nature14540.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Chen, Yen-Wen, and Ji-Zheng You. "Effective Radio Resource Allocation for IoT Random Access by Using Reinforcement Learning." 網際網路技術學刊 23, no. 5 (September 2022): 1069–75. http://dx.doi.org/10.53106/160792642022092305015.

Full text
Abstract:
Emerging intelligent and highly interactive services result in the mass deployment of Internet of Things (IoT) devices. They are dominating wireless communication networks compared to human-held devices. Random access performance is one of the most critical issues in providing quick responses to various IoT services. In addition to the anchor carrier, the non-anchor carrier can be flexibly allocated to support the random access procedure in Release 14 of the 3rd Generation Partnership Project. However, arranging more non-anchor carriers for random access will squeeze the data transmission bandwidth in the narrowband physical uplink shared channel. In this paper, we propose the prediction-based random access resource allocation (PRARA) scheme to properly allocate the non-anchor carrier by applying reinforcement learning. The simulation results show that the proposed PRARA can improve random access performance and use the radio resource more effectively than the rule-based scheme.
APA, Harvard, Vancouver, ISO, and other styles
19

Zhao, Yongqi, Zhangdong Wei, and Jing Wen. "Prediction of Soil Heavy Metal Content Based on Deep Reinforcement Learning." Scientific Programming 2022 (April 15, 2022): 1–10. http://dx.doi.org/10.1155/2022/1476565.

Full text
Abstract:
Since the prediction accuracy of heavy metal content in soil obtained by common spatial prediction algorithms is not ideal, a prediction model based on an improved deep Q network is proposed. State value reuse is used to accelerate the learning speed of training samples for agents in the deep Q network, and the convergence speed of the model is improved. At the same time, an adaptive fuzzy membership factor is introduced to change the sensitivity of the agent to the environmental feedback value in different training periods and improve the stability of the model after convergence. Finally, an adaptive inverse distance interpolation method is adopted to predict the observed values of interpolation points, which improves the prediction accuracy of the model. The simulation results show that, compared with the random forest regression model (RFR) and the inverse distance weighted prediction model (IDW), the prediction accuracy of soil heavy metal content of the proposed model is higher by 13.03% and 7.47%, respectively.
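For readers unfamiliar with the IDW baseline mentioned above, the following is a minimal sketch of plain inverse-distance-weighted interpolation; the paper's adaptive variant and its deep Q-network components are not reproduced, and the function name and sample data are illustrative only.

```python
# Illustrative sketch only: plain inverse-distance-weighted (IDW) prediction
# of a value at a query point from nearby observations.
import math

def idw_predict(query, samples, power=2.0):
    """samples: list of ((x, y), value); query: (x, y) coordinates."""
    num, den = 0.0, 0.0
    for (x, y), value in samples:
        d = math.hypot(query[0] - x, query[1] - y)
        if d == 0.0:
            return value              # query coincides with an observation
        w = 1.0 / d ** power          # closer samples get larger weights
        num += w * value
        den += w
    return num / den

# Example: predict heavy-metal content at (2, 3) from three soil samples.
print(idw_predict((2, 3), [((0, 0), 1.2), ((5, 1), 0.8), ((2, 6), 1.5)]))
```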
APA, Harvard, Vancouver, ISO, and other styles
20

McLaverty, Brian, Robert S. Parker, and Gilles Clermont. "Reinforcement learning algorithm to improve intermittent hemodialysis." Journal of Critical Care 74 (April 2023): 154205. http://dx.doi.org/10.1016/j.jcrc.2022.154205.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Lin, Jin. "Path planning based on reinforcement learning." Applied and Computational Engineering 5, no. 1 (June 14, 2023): 853–58. http://dx.doi.org/10.54254/2755-2721/5/20230728.

Full text
Abstract:
With the wide application of mobile robots in industry, path planning has always been a difficult problem for mobile robots. Reinforcement learning algorithms such as Q-learning play a large role in path planning. The traditional Q-learning algorithm mainly uses an ε-greedy search policy. However, a fixed search factor ε leads to problems such as slow convergence, long running times, and many consecutive action changes (such as the number of turns during robot movement), which are not conducive to the stability requirements of mobile robots in industrial transportation. Especially for the transportation of dangerous chemicals, consecutive turns increase the risk of objects toppling. This paper proposes a new dynamic search strategy based on an improved ε-greedy policy to improve the stability of mobile robots in motion planning. The experiments show that the dynamic search strategy converges faster, consumes less time, requires fewer consecutive action changes, and achieves higher motion stability in the test environment.
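As a rough illustration of the kind of dynamic ε-greedy Q-learning the abstract describes (not the paper's algorithm), the sketch below decays ε over episodes instead of keeping it fixed; the environment interface, decay schedule, and hyperparameters are assumptions.

```python
# Minimal sketch: tabular Q-learning whose exploration rate epsilon decays
# over episodes. The env interface (reset/actions/step) is a placeholder.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95,
               eps_start=0.9, eps_end=0.05):
    Q = defaultdict(float)                               # Q[(state, action)]
    for ep in range(episodes):
        # dynamic epsilon: heavy exploration early, mostly exploitation late
        eps = eps_end + (eps_start - eps_end) * (1 - ep / episodes)
        state, done = env.reset(), False
        while not done:
            if random.random() < eps:
                action = random.choice(env.actions(state))
            else:
                action = max(env.actions(state), key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(state, action)
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in env.actions(next_state))
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q
```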
APA, Harvard, Vancouver, ISO, and other styles
22

Huang, Xu, Hong Zhang, and Xiaomeng Zhai. "A Novel Reinforcement Learning Approach for Spark Configuration Parameter Optimization." Sensors 22, no. 15 (August 8, 2022): 5930. http://dx.doi.org/10.3390/s22155930.

Full text
Abstract:
Apache Spark is a popular open-source distributed data processing framework that can efficiently process massive amounts of data. It provides more than 180 configuration parameters for users to manually select the appropriate parameter values according to their own experience. However, due to the large number of parameters and the inherent correlation between them, manual tuning is very tedious. To solve the problem of tuning through personal experience, we designed and implemented a reinforcement-learning-based Spark configuration parameter optimizer. First, we trained a Spark application performance prediction model with deep neural networks, and verified the accuracy and effectiveness of the model from multiple perspectives. Second, in order to improve the search efficiency of better configuration parameters, we improved the Q-learning algorithm, and automatically set start and end states in each iteration of training, which effectively improves the agent’s poor performance in exploring better configuration parameters. Lastly, comparing our proposed configuration with the default configuration as the baseline, experimental results show that the optimized configuration gained an average performance improvement of 47%, 43%, 31%, and 45% for four different types of Spark applications, which indicates that our Spark configuration parameter optimizer could efficiently find the better configuration parameters and improve the performance of various Spark applications.
APA, Harvard, Vancouver, ISO, and other styles
23

Issa, A., and A. Aldair. "Learning the Quadruped Robot by Reinforcement Learning (RL)." Iraqi Journal for Electrical and Electronic Engineering 18, no. 2 (October 6, 2022): 117–26. http://dx.doi.org/10.37917/ijeee.18.2.15.

Full text
Abstract:
In this paper, a simulation was used to create and test the suggested controllers and to investigate the capability of a quadruped robot built with the SimScape-Multibody toolbox, using PID controllers and the deep deterministic policy gradient (DDPG) reinforcement learning (RL) technique. The quadruped robot has been simulated using three different scenarios based on two methods to control its movement, namely PID and DDPG. Instead of using two links per leg, the quadruped robot was constructed with three links per leg to maximize movement versatility. The quadruped robot architecture uses twelve servomotors, three per leg, and twelve PID controllers in total, one for each servomotor. By utilizing the SimScape-Multibody toolbox, the quadruped robot can be built without needing a mathematical model. The robustness of the developed controller is investigated by varying the walking robot's carried load. Firstly, the walking robot is designed with an open-loop system, and the results show that the robot falls at the start of the simulation. Secondly, auto-tuning is used to find the optimal parameters (KP, KI, and KD) of the PID controllers, and the results show that the robot can walk in a straight line. Finally, DDPG reinforcement learning is proposed to generate and improve the walking motion of the quadruped robot, and the results show that the behaviour of the walking robot is improved compared with the previous cases; the results produced when RL is employed instead of PID controllers are better.
APA, Harvard, Vancouver, ISO, and other styles
24

Zhao, Yuxin, Yanlong Liu, and Xiong Deng. "Optimization of a Regional Marine Environment Mobile Observation Network Based on Deep Reinforcement Learning." Journal of Marine Science and Engineering 11, no. 1 (January 12, 2023): 208. http://dx.doi.org/10.3390/jmse11010208.

Full text
Abstract:
The observation path planning of an ocean mobile observation network is an important part of the ocean mobile observation system. Because solving for the observation path of the mobile observation network with a traditional algorithm requires constructing a complex objective function, an improved deep reinforcement learning algorithm is proposed. The improved deep reinforcement learning algorithm does not need to establish the objective function. The agent samples the marine environment information by exploring and receiving feedback from the environment. Focusing on the real-time dynamic variability of the marine environment, our experiment shows that adding bidirectional recurrence to the Deep Q-network allows the Q-network to better estimate the underlying system state. Compared with the results of existing algorithms, the improved deep reinforcement learning algorithm can effectively improve the sampling efficiency of the observation platform. To improve the prediction accuracy of the marine environment numerical prediction system, we conduct sampling path experiments with a single platform, two platforms, and five platforms. The experimental results show that increasing the number of observation platforms can effectively improve the prediction accuracy of the numerical prediction system, but when the number of observation platforms exceeds two, adding further platforms does not improve the prediction accuracy and even causes a certain degree of decline. In addition, in the multi-platform experiment, the improved deep reinforcement learning algorithm is compared with the unimproved algorithm, and the results show that the proposed algorithm is better than the existing algorithm.
APA, Harvard, Vancouver, ISO, and other styles
25

Lecarpentier, Erwan, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, and Michael L. Littman. "Lipschitz Lifelong Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 8270–78. http://dx.doi.org/10.1609/aaai.v35i9.17006.

Full text
Abstract:
We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These theoretical results lead us to a value-transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with improved convergence rate. Further, we show the method to experience no negative transfer with high probability. We illustrate the benefits of the method in Lifelong RL experiments.
APA, Harvard, Vancouver, ISO, and other styles
26

Liu, Yu, and Ning Zhou. "Jumping Action Recognition for Figure Skating Video in IoT Using Improved Deep Reinforcement Learning." Information Technology and Control 52, no. 2 (July 15, 2023): 309–21. http://dx.doi.org/10.5755/j01.itc.52.2.33300.

Full text
Abstract:
Jumping actions in figure skating videos are complex combined actions that are difficult to recognize, and recognizing jumping actions can correct athletes' technical errors, which is of great significance for improving athletes' performance. Because the recognition performance of existing figure skating video jumping action recognition algorithms is poor, we propose a figure skating video jumping action recognition algorithm using improved deep reinforcement learning in the Internet of Things (IoT). First, IoT technology is used to collect the figure skating video, the targets in the video are detected, human skeleton point features are obtained through the feature extraction network, and centralized processing is performed to complete the optimization of the extraction results. Second, the shallow STGCN network is improved to the DSTG dense connection network structure, on which an improved deep reinforcement learning action recognition model is constructed, and the action recognition results are output through the deep network structure. Finally, a confidence fusion scheme is established to determine the final jumping action recognition result. The results show that the proposed algorithm effectively improves the accuracy of figure skating video jumping action recognition and achieves higher recognition quality. It can be widely used in the field of figure skating action recognition to improve the training effect of athletes.
APA, Harvard, Vancouver, ISO, and other styles
27

González-Garduño, Ana V. "Reinforcement Learning for Improved Low Resource Dialogue Generation." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9884–85. http://dx.doi.org/10.1609/aaai.v33i01.33019884.

Full text
Abstract:
In this thesis, I focus on language independent methods of improving utterance understanding and response generation and attempt to tackle some of the issues surrounding current systems. The aim is to create a unified approach to dialogue generation inspired by developments in both goal oriented and open ended dialogue systems. The main contributions in this thesis are: 1) Introducing hybrid approaches to dialogue generation using retrieval and encoder-decoder architectures to produce fluent but precise utterances in dialogues, 2) Proposing supervised, semi-supervised and Reinforcement Learning methods for domain adaptation in goal oriented dialogue and 3) Introducing models that can adapt cross lingually.
APA, Harvard, Vancouver, ISO, and other styles
28

Kuremoto, Takashi, Tetsuya Tsurusaki, Kunikazu Kobayashi, Shingo Mabu, and Masanao Obayashi. "An Improved Reinforcement Learning System Using Affective Factors." Robotics 2, no. 3 (July 10, 2013): 149–64. http://dx.doi.org/10.3390/robotics2030149.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Luo, Teng. "Improved reinforcement learning algorithm for mobile robot path planning." ITM Web of Conferences 47 (2022): 02030. http://dx.doi.org/10.1051/itmconf/20224702030.

Full text
Abstract:
In order to solve the problem that the traditional Q-learning algorithm performs a large number of invalid iterations in the early convergence stage of robot path planning, an improved reinforcement learning algorithm is proposed. Firstly, the gravitational potential field from the improved artificial potential field algorithm is introduced when the Q table is initialized, to accelerate convergence. Secondly, the Tent chaotic mapping algorithm is added to the initial state determination process, which allows the algorithm to explore the environment more fully. In addition, an ε-greedy strategy in which the ε value changes with the number of iterations becomes the action selection strategy of the algorithm, which improves its performance. Finally, grid map simulation results based on MATLAB show that the improved Q-learning algorithm greatly reduces the path planning time and the number of non-convergence iterations compared with the traditional algorithm.
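The two initialization ideas named above (seeding the Q table with an attractive potential field and picking start states with a Tent chaotic map) can be illustrated roughly as follows; this is a sketch under assumed grid-world conventions, not the paper's code, and the constants are placeholders.

```python
# Illustrative sketch: bias a grid-world Q-table toward the goal using the
# attractive potential, and draw well-spread start states from a Tent map.
import math

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def init_q_with_potential(width, height, goal, k=1.0):
    """Each Q[s][a] starts at the attractive potential of the successor cell."""
    def potential(cell):
        return -k * math.hypot(cell[0] - goal[0], cell[1] - goal[1])
    Q = {}
    for x in range(width):
        for y in range(height):
            Q[(x, y)] = {}
            for dx, dy in ACTIONS:
                nx = min(max(x + dx, 0), width - 1)    # clamp to the grid
                ny = min(max(y + dy, 0), height - 1)
                Q[(x, y)][(dx, dy)] = potential((nx, ny))
    return Q

def tent_map_states(n, width, height, x0=0.37, mu=1.99):
    """Pick n start cells from a Tent chaotic sequence in [0, 1)."""
    seq, x = [], x0
    for _ in range(2 * n):
        x = mu * x if x < 0.5 else mu * (1 - x)
        seq.append(x)
    return [(int(a * width) % width, int(b * height) % height)
            for a, b in zip(seq[0::2], seq[1::2])]
```

Because actions that move toward the goal start with larger Q-values, early greedy choices already point roughly toward the target, which is the mechanism by which such initialization removes invalid iterations.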
APA, Harvard, Vancouver, ISO, and other styles
30

Wu, Yukun, Xuncheng Wu, Siyuan Qiu, and Wenbin Xiang. "A Method for High-Value Driving Demonstration Data Generation Based on One-Dimensional Deep Convolutional Generative Adversarial Networks." Electronics 11, no. 21 (October 31, 2022): 3553. http://dx.doi.org/10.3390/electronics11213553.

Full text
Abstract:
As a promising sequential decision-making algorithm, deep reinforcement learning (RL) has been applied in many fields. However, the related methods often demand a large amount of time before they can achieve acceptable performance. While learning from demonstration has greatly improved reinforcement learning efficiency, it poses some challenges. In the past, it has required collecting demonstration data from a controller (either a human expert or an automated controller). However, demonstration data are not always available in some sparse reward tasks. Most importantly, there exist unknown differences between agents and human experts in observing the environment. This means that not all of the human expert’s demonstration data conform to a Markov decision process (MDP). In this paper, a method of reinforcement learning from generated data (RLfGD) is presented, which consists of a generative model and a learning model. The generative model introduces a method to generate the demonstration data with a one-dimensional deep convolutional generative adversarial network. The learning model applies the demonstration data to the reinforcement learning process to greatly improve the effectiveness of training. Two complex traffic scenarios were tested to evaluate the proposed algorithm. The experimental results demonstrate that RLfGD obtains higher scores more quickly than DDQN in both complex traffic scenarios. The performance of reinforcement learning algorithms on sparse reward problems can be greatly improved with this approach.
APA, Harvard, Vancouver, ISO, and other styles
31

Maree, Charl, and Christian W. Omlin. "Can Interpretable Reinforcement Learning Manage Prosperity Your Way?" AI 3, no. 2 (June 13, 2022): 526–37. http://dx.doi.org/10.3390/ai3020030.

Full text
Abstract:
Personalisation of products and services is fast becoming the driver of success in banking and commerce. Machine learning holds the promise of gaining a deeper understanding of and tailoring to customers’ needs and preferences. Whereas traditional solutions to financial decision problems frequently rely on model assumptions, reinforcement learning is able to exploit large amounts of data to improve customer modelling and decision-making in complex financial environments with fewer assumptions. Model explainability and interpretability present challenges from a regulatory perspective which demands transparency for acceptance; they also offer the opportunity for improved insight into and understanding of customers. Post-hoc approaches are typically used for explaining pretrained reinforcement learning models. Based on our previous modeling of customer spending behaviour, we adapt our recent reinforcement learning algorithm that intrinsically characterizes desirable behaviours and we transition to the problem of prosperity management. We train inherently interpretable reinforcement learning agents to give investment advice that is aligned with prototype financial personality traits which are combined to make a final recommendation. We observe that the trained agents’ advice adheres to their intended characteristics, they learn the value of compound growth, and, without any explicit reference, the notion of risk as well as improved policy convergence.
APA, Harvard, Vancouver, ISO, and other styles
32

Fang, Qiang, Wenzhuo Zhang, and Xitong Wang. "Visual Navigation Using Inverse Reinforcement Learning and an Extreme Learning Machine." Electronics 10, no. 16 (August 18, 2021): 1997. http://dx.doi.org/10.3390/electronics10161997.

Full text
Abstract:
In this paper, we focus on the challenges of training efficiency, the design of reward functions, and generalization in reinforcement learning for visual navigation, and we propose a regularized extreme learning machine-based inverse reinforcement learning approach (RELM-IRL) to improve navigation performance. Our contributions are mainly three-fold: First, a framework combining an extreme learning machine with inverse reinforcement learning is presented. This framework can improve sample efficiency, obtain the reward function directly from the image information observed by the agent, and improve generalization to new targets and new environments. Second, the extreme learning machine is regularized by multi-response sparse regression and the leave-one-out method, which can further improve the generalization ability. Simulation experiments in the AI-THOR environment showed that the proposed approach outperformed previous end-to-end approaches, demonstrating the effectiveness and efficiency of our approach.
APA, Harvard, Vancouver, ISO, and other styles
33

Omidshafiei, Shayegan, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell, and Jonathan P. How. "Learning to Teach in Cooperative Multiagent Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6128–36. http://dx.doi.org/10.1609/aaai.v33i01.33016128.

Full text
Abstract:
Collective human knowledge has clearly benefited from the fact that innovations by individuals are taught to others through communication. Similar to human social groups, agents in distributed learning systems would likely benefit from communication to share knowledge and teach skills. The problem of teaching to improve agent learning has been investigated by prior works, but these approaches make assumptions that prevent application of teaching to general multiagent problems, or require domain expertise for problems they can apply to. This learning to teach problem has inherent complexities related to measuring long-term impacts of teaching that compound the standard multiagent coordination challenges. In contrast to existing works, this paper presents the first general framework and algorithm for intelligent agents to learn to teach in a multiagent environment. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), addresses peer-to-peer teaching in cooperative multiagent reinforcement learning. Each agent in our approach learns both when and what to advise, then uses the received advice to improve local learning. Importantly, these roles are not fixed; these agents learn to assume the role of student and/or teacher at the appropriate moments, requesting and providing advice in order to improve teamwide performance and learning. Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail.
APA, Harvard, Vancouver, ISO, and other styles
34

Ma, Guoqing, Zhifu Wang, Xianfeng Yuan, and Fengyu Zhou. "Improving Model-Based Deep Reinforcement Learning with Learning Degree Networks and Its Application in Robot Control." Journal of Robotics 2022 (March 4, 2022): 1–14. http://dx.doi.org/10.1155/2022/7169594.

Full text
Abstract:
Deep reinforcement learning applies artificial neural networks to the field of decision-making and control. The traditional model-free reinforcement learning algorithm requires a large amount of environment interaction data to iterate, and its performance also suffers due to low utilization of training data. While the model-based reinforcement learning (MBRL) algorithm improves data efficiency, it is locked into low prediction accuracy. Although MBRL can utilize the additional data generated by the dynamic model, a system dynamics model with low prediction accuracy will provide low-quality data and affect the algorithm’s final result. In this paper, based on the A3C (Asynchronous Advantage Actor-Critic) algorithm, an improved model-based deep reinforcement learning algorithm using a learning degree network (MBRL-LDN) is presented. By comparing the differences between the predicted states output by the proposed multi-dynamic model and the original predicted states, the learning degree of the system dynamics model is calculated. The learning degree represents the quality of the data generated by the dynamic model and is used to decide whether to continue to interact with the dynamic model during a particular episode. Thus, low-quality data will be discarded. The superiority of the proposed method is verified by conducting extensive contrast experiments.
APA, Harvard, Vancouver, ISO, and other styles
35

FRIEDRICH, JOHANNES, ROBERT URBANCZIK, and WALTER SENN. "CODE-SPECIFIC LEARNING RULES IMPROVE ACTION SELECTION BY POPULATIONS OF SPIKING NEURONS." International Journal of Neural Systems 24, no. 05 (May 30, 2014): 1450002. http://dx.doi.org/10.1142/s0129065714500026.

Full text
Abstract:
Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action selection. We show that code-specific learning rules speed up learning both for the discrete classification and the continuous regression tasks. The suggested learning rules also speed up with increasing population size as opposed to standard reinforcement learning rules. Continuous action selection is further shown to explain realistic learning speeds in the Morris water maze. Finally, we introduce the concept of action perturbation as opposed to the classical weight- or node-perturbation as an exploration mechanism underlying reinforcement learning. Exploration in the action space greatly increases the speed of learning as compared to exploration in the neuron or weight space.
APA, Harvard, Vancouver, ISO, and other styles
36

Ren, Jing, Xishi Huang, and Raymond N. Huang. "Efficient Deep Reinforcement Learning for Optimal Path Planning." Electronics 11, no. 21 (November 7, 2022): 3628. http://dx.doi.org/10.3390/electronics11213628.

Full text
Abstract:
In this paper, we propose a novel deep reinforcement learning (DRL) method for optimal path planning for mobile robots using dynamic programming (DP)-based data collection. The proposed method can overcome the slow learning process and improve training data quality inherently in DRL algorithms. The main idea of our approach is as follows. First, we mapped the dynamic programming method to typical optimal path planning problems for mobile robots, and created a new efficient DP-based method to find an exact, analytical, optimal solution for the path planning problem. Then, we used high-quality training data gathered using the DP method for DRL, which greatly improves training data quality and learning efficiency. Next, we established a two-stage reinforcement learning method where, prior to the DRL, we employed extreme learning machines (ELM) to initialize the parameters of actor and critic neural networks to a near-optimal solution in order to significantly improve the learning performance. Finally, we illustrated our method using some typical path planning tasks. The experimental results show that our DRL method can converge much easier and faster than other methods. The resulting action neural network is able to successfully guide robots from any start position in the environment to the goal position while following the optimal path and avoiding collision with obstacles.
APA, Harvard, Vancouver, ISO, and other styles
37

Bai, Fengshuo, Hongming Zhang, Tianyang Tao, Zhiheng Wu, Yanna Wang, and Bo Xu. "PiCor: Multi-Task Deep Reinforcement Learning with Policy Correction." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 6728–36. http://dx.doi.org/10.1609/aaai.v37i6.25825.

Full text
Abstract:
Multi-task deep reinforcement learning (DRL) ambitiously aims to train a general agent that masters multiple tasks simultaneously. However, the varying learning speeds of different tasks, compounded with negative gradient interference, make policy learning inefficient. In this work, we propose PiCor, an efficient multi-task DRL framework that splits learning into policy optimization and policy correction phases. The policy optimization phase improves the policy with any DRL algorithm on the sampled single task without considering other tasks. The policy correction phase first constructs an adaptively adjusted performance constraint set. Then the intermediate policy learned in the first phase is constrained to the set, which controls the negative interference and balances the learning speeds across tasks. Empirically, we demonstrate that PiCor outperforms previous methods and significantly improves sample efficiency on simulated robotic manipulation and continuous control tasks. We additionally show that adaptive weight adjusting can further improve data efficiency and performance.
APA, Harvard, Vancouver, ISO, and other styles
38

Zajdel, Roman. "Epoch-incremental reinforcement learning algorithms." International Journal of Applied Mathematics and Computer Science 23, no. 3 (September 1, 2013): 623–35. http://dx.doi.org/10.2478/amcs-2013-0047.

Full text
Abstract:
In this article, a new class of epoch-incremental reinforcement learning algorithms is proposed. In the incremental mode, the fundamental TD(0) or TD(λ) algorithm is performed and an environment model is created. In the epoch mode, on the basis of the environment model, the distances of past-active states to the terminal state are computed. These distances and the reinforcement terminal state signal are used to improve the agent policy.
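For reference, the incremental-mode TD(0) update mentioned above can be sketched as below; the epoch-mode distance computation over the learned environment model is not reproduced, and the environment interface and parameters are illustrative assumptions, not the article's code.

```python
# Minimal sketch of a tabular TD(0) value update for one episode.
def td0_episode(env, V, policy, alpha=0.1, gamma=0.95):
    """Run one episode and update the state-value table V in place."""
    state, done = env.reset(), False
    while not done:
        action = policy(state)
        next_state, reward, done = env.step(state, action)
        v_s = V.get(state, 0.0)
        # bootstrap from the successor state unless the episode terminated
        target = reward + (0.0 if done else gamma * V.get(next_state, 0.0))
        V[state] = v_s + alpha * (target - v_s)
        state = next_state
    return V
```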
APA, Harvard, Vancouver, ISO, and other styles
39

Yu, Ning, Lin Nan, and Tao Ku. "Multipolicy Robot-Following Model Based on Reinforcement Learning." Scientific Programming 2021 (November 8, 2021): 1–8. http://dx.doi.org/10.1155/2021/5692105.

Full text
Abstract:
In this paper, we propose a new approach to the decision problem of robot following. Unlike existing single-policy models, our multipolicy model can switch the following policy in time according to the scene. The contribution of this paper is a multipolicy robot-following model obtained through self-learning, which improves the safety, efficiency, and stability of robot following in complex environments. Empirical investigation on a number of datasets reveals that, overall, the proposed approach tends to have superior out-of-sample performance compared to alternative robot-following decision methods. The performance of the model improves by about 2 times in situations with few obstacles and about 6 times in situations with many obstacles.
APA, Harvard, Vancouver, ISO, and other styles
40

Zhou, Minghui. "Multithreshold Microbial Image Segmentation Using Improved Deep Reinforcement Learning." Mathematical Problems in Engineering 2022 (August 23, 2022): 1–11. http://dx.doi.org/10.1155/2022/5096298.

Full text
Abstract:
Image segmentation technology can effectively extract the foreground target in an image. However, microbial images are easily disturbed by noise, their greyscale is nonuniformly distributed, and several microorganisms with diverse forms can appear in the same image, so segmentation accuracy is often insufficient. Therefore, a multithreshold microbial image segmentation algorithm using improved deep reinforcement learning is proposed. The wavelet transform is used to remove noise from the microbial image, and the number of thresholds for the denoised image is determined by counting the peaks of its grey histogram. The foreground target is then enhanced with a mean iterative thresholding method to obtain a preliminary segmentation. A multithreshold segmentation model based on ResNet-Unet is constructed, and dilated convolution and a dual Q-network mechanism are introduced to improve it. The preliminarily segmented image is fed into the improved model to produce the final multithreshold segmentation. The results show that the proposed algorithm effectively removes noise from microbial images. As the number of thresholds increases, the peak signal-to-noise ratio, structural similarity, and feature similarity all show an upward trend, and the loss rate of the model is less than 0.05%. The minimum running time of the algorithm is 3.804 s. The method can segment multithreshold microbial images quickly and effectively and has clear application value in the field of microbial recognition.
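
Two of the preprocessing steps mentioned in the abstract, determining the threshold count from grey-histogram peaks and the mean (iterative) threshold selection, can be sketched as follows. This is an illustrative approximation with assumed parameters, not the paper's exact pipeline; the wavelet denoising and the ResNet-Unet model are omitted.

```python
import numpy as np

def count_histogram_peaks(image, smooth=5):
    """Estimate the threshold count from the number of peaks in the grey histogram."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    kernel = np.ones(smooth) / smooth
    hist = np.convolve(hist, kernel, mode="same")
    peaks = [i for i in range(1, 255) if hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]]
    return max(len(peaks) - 1, 1)          # k peaks suggest roughly k-1 thresholds

def iterative_mean_threshold(image, tol=0.5):
    """Classic mean-based iterative threshold selection for a single threshold."""
    t = image.mean()
    while True:
        lower, upper = image[image <= t], image[image > t]
        if lower.size == 0 or upper.size == 0:
            return t
        t_new = 0.5 * (lower.mean() + upper.mean())
        if abs(t_new - t) < tol:
            return t_new
        t = t_new

rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 15, 5000)]).clip(0, 255)
print(count_histogram_peaks(img), round(iterative_mean_threshold(img), 1))
```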
APA, Harvard, Vancouver, ISO, and other styles
41

Kaddour, N., P. Del Moral, and E. Ikonen. "Improved version of the McMurtry-Fu reinforcement learning scheme." International Journal of Systems Science 34, no. 1 (January 2003): 37–47. http://dx.doi.org/10.1080/0020772031000115560.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Shi, Zhen, Keyin Wang, and Jianhui Zhang. "Improved reinforcement learning path planning algorithm integrating prior knowledge." PLOS ONE 18, no. 5 (May 4, 2023): e0284942. http://dx.doi.org/10.1371/journal.pone.0284942.

Full text
Abstract:
To optimize the autonomous navigation of mobile robots when only partial environmental knowledge is available, an improved Q-learning reinforcement learning algorithm based on prior knowledge is proposed to address the slow convergence and low learning efficiency of mobile robot path planning. Prior knowledge is used to initialize the Q-values, guiding the agent to move toward the target with greater probability from the early stages of the algorithm and eliminating a large number of invalid iterations. The greedy factor ε is dynamically adjusted based on the number of times the agent successfully reaches the target position, which better balances exploration and exploitation and accelerates convergence. Simulation results show that the improved Q-learning algorithm converges faster and learns more efficiently than the traditional algorithm, which has practical significance for improving the efficiency of autonomous navigation of mobile robots.
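
To make the two improvements concrete, here is a minimal sketch of how prior knowledge might seed the Q-table and how the greedy factor could shrink with successful goal arrivals. The Manhattan-distance heuristic, the decay schedule, and all names are assumptions for illustration; the paper's exact initialization and ε schedule may differ.

```python
import numpy as np

def init_q_with_prior(n_rows, n_cols, goal, n_actions=4, scale=1.0):
    """Initialise Q from prior knowledge: cells closer to the goal (Manhattan
    distance heuristic) start with larger values, biasing early moves toward it."""
    Q = np.zeros((n_rows, n_cols, n_actions))
    for r in range(n_rows):
        for c in range(n_cols):
            Q[r, c, :] = scale / (1.0 + abs(r - goal[0]) + abs(c - goal[1]))
    return Q

def dynamic_epsilon(successes, eps_start=0.9, eps_min=0.05, decay=0.95):
    """Shrink the greedy factor as the agent reaches the goal more often."""
    return max(eps_min, eps_start * (decay ** successes))

Q = init_q_with_prior(10, 10, goal=(9, 9))
print(Q[0, 0, 0], Q[9, 8, 0])              # far cells start low, near cells high
print(dynamic_epsilon(successes=0), dynamic_epsilon(successes=20))
```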
APA, Harvard, Vancouver, ISO, and other styles
43

Béres, András, and Bálint Gyires-Tóth. "Enhancing Visual Domain Randomization with Real Images for Sim-to-Real Transfer." Infocommunications journal 15, no. 1 (2023): 15–25. http://dx.doi.org/10.36244/icj.2023.1.3.

Full text
Abstract:
In order to train reinforcement learning algorithms, a significant amount of experience is required, so it is common practice to train them in simulation, even when they are intended to be applied in the real world. To improve robustness, camera-based agents can be trained using visual domain randomization, which changes the visual characteristics of the simulator between training episodes to improve resilience to visual changes in the environment. In this work, we propose a method that includes real-world images alongside visual domain randomization in the reinforcement learning training procedure to further enhance performance after sim-to-real transfer. We train variational autoencoders using both real and simulated frames, and the representations produced by the encoders are then used to train reinforcement learning agents. The proposed method is evaluated against a variety of baselines, including direct and indirect visual domain randomization, end-to-end reinforcement learning, and supervised and unsupervised state representation learning. By controlling a differential drive vehicle using only camera images, the method is tested in the Duckietown self-driving car environment. Our experimental results demonstrate that the method improves the effectiveness and robustness of the learnt representations, achieving the best performance of all tested methods.
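
As a rough sketch of the representation-learning component (a variational autoencoder trained on a mixed batch of simulated and real frames, whose encoder output later serves as the agent's observation), the PyTorch snippet below shows one plausible shape of such a model. The architecture sizes, KL weight, and 64x64 input resolution are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallVAE(nn.Module):
    """Tiny convolutional VAE; the latent mean can serve as the RL observation."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.fc_logvar = nn.Linear(64 * 16 * 16, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, 64 * 16 * 16)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)     # reparameterization
        recon = self.dec(self.fc_dec(z).view(-1, 64, 16, 16))
        return recon, mu, logvar

def vae_loss(recon, x, mu, logvar, kl_weight=1e-3):
    rec = F.mse_loss(recon, x, reduction="mean")
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl_weight * kld

# Mixed batch: half simulated, half real frames (64x64 RGB), as in the abstract.
sim, real = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)
batch = torch.cat([sim, real])
model = SmallVAE()
recon, mu, logvar = model(batch)
vae_loss(recon, batch, mu, logvar).backward()
```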
APA, Harvard, Vancouver, ISO, and other styles
44

Szepesvári, Csaba, and Michael L. Littman. "A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms." Neural Computation 11, no. 8 (November 1, 1999): 2017–60. http://dx.doi.org/10.1162/089976699300016070.

Full text
Abstract:
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
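
For context, one of the algorithms covered by this style of analysis is asynchronous Q-learning, whose standard update (with step sizes $\alpha_t(s,a)$ that are nonzero only for the state-action pair visited at time $t$) is:

```latex
Q_{t+1}(s,a) = \bigl(1-\alpha_t(s,a)\bigr)\,Q_t(s,a)
             + \alpha_t(s,a)\Bigl[r(s,a) + \gamma \max_{a'} Q_t(s',a')\Bigr]
```

The theorem's usefulness, as the abstract notes, is that convergence of such asynchronous updates can be established by verifying convergence of a simpler synchronous counterpart.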
APA, Harvard, Vancouver, ISO, and other styles
45

Huang, Yong, Xin Xu, Yong Li, Xinglong Zhang, Yao Liu, and Xiaochuan Zhang. "Vehicle-Following Control Based on Deep Reinforcement Learning." Applied Sciences 12, no. 20 (October 21, 2022): 10648. http://dx.doi.org/10.3390/app122010648.

Full text
Abstract:
Intelligent vehicle-following control presents a great challenge in autonomous driving. On vehicle-dense city roads, the frequent starting and stopping of vehicles is an important cause of front-end collision accidents. Therefore, this paper proposes a subsection proximal policy optimization method (Subsection-PPO), which divides the vehicle-following process into start–stop and steady stages and controls the two stages with two different actor networks, improving safety in vehicle-following control based on the proximal policy optimization algorithm. To improve training efficiency and reduce the variance of the advantage function, weighted importance sampling is employed instead of ordinary importance sampling to estimate the data distribution. Finally, the advantages and robustness of the method in vehicle-following control are verified on the TORCS simulation engine. The results show that the Subsection-PPO algorithm achieves better efficiency and higher safety than PPO and DDPG in vehicle-following control.
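
The weighted importance sampling idea mentioned above can be illustrated in a few lines: compared with the ordinary importance-sampling estimator, the weighted version normalizes by the sum of the ratios, trading a small bias for much lower variance. The synthetic data below are purely illustrative.

```python
import numpy as np

def ordinary_is(returns, ratios):
    """Ordinary importance sampling: unbiased but high variance."""
    return np.mean(ratios * returns)

def weighted_is(returns, ratios):
    """Weighted importance sampling: normalised by the ratio sum, lower variance."""
    return np.sum(ratios * returns) / (np.sum(ratios) + 1e-8)

rng = np.random.default_rng(1)
returns = rng.normal(1.0, 1.0, size=1000)
ratios = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # pi(a|s) / mu(a|s)
print(ordinary_is(returns, ratios), weighted_is(returns, ratios))
```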
APA, Harvard, Vancouver, ISO, and other styles
46

Jiang, Huawei, Tao Guo, Zhen Yang, and Like Zhao. "Deep reinforcement learning algorithm for solving material emergency dispatching problem." Mathematical Biosciences and Engineering 19, no. 11 (2022): 10864–81. http://dx.doi.org/10.3934/mbe.2022508.

Full text
Abstract:
In order to solve the problem that the scheduling scheme cannot be updated in real time due to dynamic changes in node demand during material emergency dispatching, this article proposes a dynamic attention model based on an improved gated recurrent unit. A dynamic encoder-decoder framework tracks changes in node demand to update the node information, and the improved gated recurrent unit is embedded between the encoder and decoder to improve the representation ability of the model. A weighted combination of the node information from the previous, current, and initial time steps yields a more representative node embedding. The results show that, compared with the elitism-based immigrants ant colony optimization algorithm, the solution quality of the proposed model improves by 27.89, 27.94, 28.09, and 28.12% for problem sizes of 10, 20, 50, and 100, respectively. The model effectively handles the instability caused by changing node demand and thereby minimizes the cost of material distribution.
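
A very rough sketch of the "weighted combination of node information from the initial, previous, and current time steps" idea, with a GRU cell tracking demand changes, is given below in PyTorch. The module structure, dimensions, and the way the GRU is attached are assumptions for illustration and do not reproduce the paper's model.

```python
import torch
import torch.nn as nn

class NodeEmbeddingCombiner(nn.Module):
    """Mixes node embeddings from the initial, previous, and current time steps
    with learned weights, then advances a GRU state as demand information changes."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.ones(3) / 3)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h_init, h_prev, h_curr, decoder_state):
        weights = torch.softmax(self.w, dim=0)
        h = weights[0] * h_init + weights[1] * h_prev + weights[2] * h_curr
        # One GRU step folds the updated node summary into the decoder context.
        new_state = self.gru(h.mean(dim=0, keepdim=True), decoder_state)
        return h, new_state

dim, nodes = 16, 10
combiner = NodeEmbeddingCombiner(dim)
h, state = combiner(torch.rand(nodes, dim), torch.rand(nodes, dim),
                    torch.rand(nodes, dim), torch.zeros(1, dim))
print(h.shape, state.shape)
```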
APA, Harvard, Vancouver, ISO, and other styles
47

Koga, Marcelo L., Valdinei Freire, and Anna H. R. Costa. "Stochastic Abstract Policies: Generalizing Knowledge to Improve Reinforcement Learning." IEEE Transactions on Cybernetics 45, no. 1 (January 2015): 77–88. http://dx.doi.org/10.1109/tcyb.2014.2319733.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Li, Xiali, Zhengyu Lv, Licheng Wu, Yue Zhao, and Xiaona Xu. "Hybrid Online and Offline Reinforcement Learning for Tibetan Jiu Chess." Complexity 2020 (May 11, 2020): 1–11. http://dx.doi.org/10.1155/2020/4708075.

Full text
Abstract:
In this study, hybrid state-action-reward-state-action (SARSA(λ)) and Q-learning algorithms are applied to different stages of an upper confidence bounds applied to trees (UCT) search for Tibetan Jiu chess. Q-learning is also used to update all the nodes on the search path when each game ends. A learning strategy that combines the SARSA(λ) and Q-learning algorithms with domain knowledge to provide a feedback function for the layout and battle stages is proposed. An improved deep neural network based on ResNet18 is used for self-play training. Experimental results show that hybrid online and offline reinforcement learning with a deep neural network can improve the game program's learning efficiency and its understanding of Tibetan Jiu chess.
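
To give a flavour of the hybrid scheme, the snippet below sketches an on-policy SARSA(λ) update with eligibility traces alongside an end-of-game Q-learning-style backup applied to every node on the search path. Hyperparameters, state/action encodings, and the way these plug into the UCT search are illustrative assumptions, not the paper's implementation.

```python
import collections

ALPHA, GAMMA, LAM = 0.1, 0.95, 0.8
Q = collections.defaultdict(float)
E = collections.defaultdict(float)            # eligibility traces

def sarsa_lambda_step(s, a, r, s_next, a_next):
    """On-policy SARSA(lambda) update used during one stage of play."""
    delta = r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)]
    E[(s, a)] += 1.0
    for key in list(E):
        Q[key] += ALPHA * delta * E[key]
        E[key] *= GAMMA * LAM

def q_backup_path(path, final_reward):
    """Backup applied to every node on the search path once the game ends:
    each (state, action) is nudged toward the discounted final outcome."""
    g = final_reward
    for s, a in reversed(path):
        Q[(s, a)] += ALPHA * (g - Q[(s, a)])
        g *= GAMMA

sarsa_lambda_step("s0", "a0", 0.0, "s1", "a1")
q_backup_path([("s0", "a0"), ("s1", "a1")], final_reward=1.0)
print(dict(Q))
```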
APA, Harvard, Vancouver, ISO, and other styles
49

Tantu, Year Rezeki Patricia, and Kirey Eleison Oloi Marina. "Teachers' efforts to improve discipline of elementary school students using positive reinforcement methods in online learning." JURNAL PENDIDIKAN DASAR NUSANTARA 8, no. 2 (January 31, 2023): 288–98. http://dx.doi.org/10.29407/jpdn.v8i2.19118.

Full text
Abstract:
Discipline is an attitude that students need in learning in order to achieve their learning goals. Learning, in turn, aims to train discipline so that students can behave correctly and have good character in society. This study examines teachers' efforts to use positive reinforcement to increase student discipline in the learning process. The research method used is descriptive qualitative. The data show that the discipline rate of grade IV elementary school students was 63.2%. The teacher applied a positive reinforcement method aimed at improving student discipline, and its use increased student discipline to 73.3%. The teacher acts as a role model and guide in building student character. The conclusion is that positive reinforcement improved student discipline by 10.1 percentage points. It is recommended that more specific discipline indicators be defined and that research be carried out over a longer period so that improvements in discipline are more visible.
APA, Harvard, Vancouver, ISO, and other styles
50

Huang, Wenya, Youjin Liu, and Xizheng Zhang. "Hybrid Particle Swarm Optimization Algorithm Based on the Theory of Reinforcement Learning in Psychology." Systems 11, no. 2 (February 6, 2023): 83. http://dx.doi.org/10.3390/systems11020083.

Full text
Abstract:
To more effectively solve the complex optimization problems that arise in nonlinear, high-dimensional, large-sample, and complex systems, many intelligent optimization methods have been proposed. Among these algorithms, the particle swarm optimization (PSO) algorithm has attracted scholars' attention. However, traditional PSO can easily become trapped in a locally optimal solution, causing the optimization process to shift prematurely from global exploration to local exploitation. To solve this problem, we propose a Hybrid Reinforcement Learning Particle Swarm Algorithm (HRLPSO) based on the theory of reinforcement learning in psychology. First, a reinforcement learning strategy is used to optimize the initial population in the population initialization stage; then, chaotic adaptive weights and adaptive learning factors are used to balance global exploration and local exploitation, and the individual optimal solution and the global optimal solution are obtained using dimension learning. Finally, the improved reinforcement learning strategy and a mutation strategy are applied to the traditional PSO to improve the quality of the individual optimal solution and the global optimal solution. The HRLPSO algorithm was tested on 12 benchmark functions as well as the CEC2013 test suite, and the results show that it balances individual learning ability and social learning ability, verifying its effectiveness.
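
For reference, the core particle swarm update that HRLPSO builds on, together with a logistic-map stand-in for the chaotic adaptive inertia weight, can be sketched as follows. The reinforcement-learning-based initialization, adaptive learning factors, dimension learning, and mutation strategy from the abstract are not modelled here; all constants are illustrative.

```python
import numpy as np

def chaotic_inertia(w, w_min=0.4, w_max=0.9):
    """Logistic-map update used as a stand-in for a chaotic adaptive inertia weight."""
    z = (w - w_min) / (w_max - w_min)
    z = 4.0 * z * (1.0 - z)
    return w_min + z * (w_max - w_min)

def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0):
    """Standard PSO velocity/position update (the core step HRLPSO extends)."""
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v

# Minimise the sphere function with 20 particles in 5 dimensions.
rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, (20, 5))
v = np.zeros_like(x)
pbest, pbest_val = x.copy(), (x ** 2).sum(axis=1)
gbest = pbest[pbest_val.argmin()]
w = 0.7
for _ in range(100):
    w = chaotic_inertia(w)
    x, v = pso_step(x, v, pbest, gbest, w)
    vals = (x ** 2).sum(axis=1)
    better = vals < pbest_val
    pbest[better], pbest_val[better] = x[better], vals[better]
    gbest = pbest[pbest_val.argmin()]
print(round(pbest_val.min(), 6))
```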
APA, Harvard, Vancouver, ISO, and other styles
