Journal articles on the topic "Improper reinforcement learning"

Below are the top 50 scholarly journal articles on the topic "Improper reinforcement learning".


1

Dass, Shuvalaxmi, and Akbar Siami Namin. "Reinforcement Learning for Generating Secure Configurations." Electronics 10, no. 19 (2021): 2392. http://dx.doi.org/10.3390/electronics10192392.

Abstract:
Many security problems in software systems are because of vulnerabilities caused by improper configurations. A poorly configured software system leads to a multitude of vulnerabilities that can be exploited by adversaries. The problem becomes even more serious when the architecture of the underlying system is static and the misconfiguration remains for a longer period of time, enabling adversaries to thoroughly inspect the software system under attack during the reconnaissance stage. Employing diversification techniques such as Moving Target Defense (MTD) can minimize the risk of exposing vuln
2

Zhai, Peng, Jie Luo, Zhiyan Dong, Lihua Zhang, Shunli Wang, and Dingkang Yang. "Robust Adversarial Reinforcement Learning with Dissipation Inequation Constraint." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 5 (2022): 5431–39. http://dx.doi.org/10.1609/aaai.v36i5.20481.

Abstract:
Robust adversarial reinforcement learning is an effective method to train agents to manage uncertain disturbance and modeling errors in real environments. However, for systems that are sensitive to disturbances or those that are difficult to stabilize, it is easier to learn a powerful adversary than establish a stable control policy. An improper strong adversary can destabilize the system, introduce biases in the sampling process, make the learning process unstable, and even reduce the robustness of the policy. In this study, we consider the problem of ensuring system stability during training
3

Chen, Ya-Ling, Yan-Rou Cai, and Ming-Yang Cheng. "Vision-Based Robotic Object Grasping—A Deep Reinforcement Learning Approach." Machines 11, no. 2 (2023): 275. http://dx.doi.org/10.3390/machines11020275.

Abstract:
This paper focuses on developing a robotic object grasping approach that possesses the ability of self-learning, is suitable for small-volume large variety production, and has a high success rate in object grasping/pick-and-place tasks. The proposed approach consists of a computer vision-based object detection algorithm and a deep reinforcement learning algorithm with self-learning capability. In particular, the You Only Look Once (YOLO) algorithm is employed to detect and classify all objects of interest within the field of view of a camera. Based on the detection/localization and classificat
4

Bi, Yunrui, Qinglin Ding, Yijun Du, Di Liu, and Shuaihang Ren. "Intelligent Traffic Control Decision-Making Based on Type-2 Fuzzy and Reinforcement Learning." Electronics 13, no. 19 (2024): 3894. http://dx.doi.org/10.3390/electronics13193894.

Abstract:
Intelligent traffic control decision-making has long been a crucial issue for improving the efficiency and safety of the intelligent transportation system. The deficiencies of the Type-1 fuzzy traffic control system in dealing with uncertainty have led to a reduced ability to address traffic congestion. Therefore, this paper proposes a Type-2 fuzzy controller for a single intersection. Based on real-time traffic flow information, the green timing of each phase is dynamically determined to achieve the minimum average vehicle delay. Additionally, in traffic light control, various factors (such a
5

Hurtado-Gómez, Julián, Juan David Romo, Ricardo Salazar-Cabrera, Álvaro Pachón de la Cruz, and Juan Manuel Madrid Molina. "Traffic Signal Control System Based on Intelligent Transportation System and Reinforcement Learning." Electronics 10, no. 19 (2021): 2363. http://dx.doi.org/10.3390/electronics10192363.

Abstract:
Traffic congestion has several causes, including insufficient road capacity, unrestricted demand and improper scheduling of traffic signal phases. A great variety of efforts have been made to properly program such phases. Some of them are based on traditional transportation assumptions, and others are adaptive, allowing the system to learn the control law (signal program) from data obtained from different sources. Reinforcement Learning (RL) is a technique commonly used in previous research. However, properly determining the states and the reward is key to obtain good results and to have a rea
6

Pan, Ziwei. "Design of Interactive Cultural Brand Marketing System based on Cloud Service Platform." 網際網路技術學刊 23, no. 2 (2022): 321–34. http://dx.doi.org/10.53106/160792642022032302012.

Abstract:
Changes in the marketing environment and consumer behavior are the driving force for the development of online marketing. Although traditional marketing communication still exists, it has been unable to adapt to the marketing needs of modern cultural brands. On this basis, this paper combines the cloud service platform to design an interactive cultural brand marketing system. In view of the problems of improper task scheduling and resource waste in cloud platform resource scheduling in actual situations, a dynamic resource scheduling optimization model under the cloud platform environ
7

Kim, Byeongjun, Gunam Kwon, Chaneun Park, and Nam Kyu Kwon. "The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place." Biomimetics 8, no. 2 (2023): 240. http://dx.doi.org/10.3390/biomimetics8020240.

Abstract:
This paper proposes a task decomposition and dedicated reward-system-based reinforcement learning algorithm for the Pick-and-Place task, which is one of the high-level tasks of robot manipulators. The proposed method decomposes the Pick-and-Place task into three subtasks: two reaching tasks and one grasping task. One of the two reaching tasks is approaching the object, and the other is reaching the place position. These two reaching tasks are carried out using each optimal policy of the agents which are trained using Soft Actor-Critic (SAC). Different from the two reaching tasks, the grasping
8

Wang, Na. "Edge computing based English translation model using fuzzy semantic optimal control technique." PLOS ONE 20, no. 6 (2025): e0320481. https://doi.org/10.1371/journal.pone.0320481.

Abstract:
People’s need for English translation is gradually growing in the modern era of technological advancements, and a computer that can comprehend and interpret English is now more crucial than ever. Some issues, including ambiguity in English translation and improper word choice in translation techniques, must be addressed to enhance the quality of the English translation model and accuracy based on the corpus. Hence, an edge computing-based translation model (FSRL-P2O) is proposed to improve translation accuracy by using huge bilingual corpora, considering Fuzzy Semantic (FS) properties, and max
9

Zhu, Wangwang, Shuli Wen, Qiang Zhao, Bing Zhang, Yuqing Huang, and Miao Zhu. "Deep Reinforcement Learning Based Optimal Operation of Low-Carbon Island Microgrid with High Renewables and Hybrid Hydrogen–Energy Storage System." Journal of Marine Science and Engineering 13, no. 2 (2025): 225. https://doi.org/10.3390/jmse13020225.

Abstract:
Hybrid hydrogen–energy storage systems play a significant role in the operation of islands microgrid with high renewable energy penetration: maintaining balance between the power supply and load demand. However, improper operation leads to undesirable costs and increases risks to voltage stability. Here, multi-time-scale scheduling is developed to reduce power costs and improve the operation performance of an island microgrid by integrating deep reinforcement learning with discrete wavelet transform to decompose and mitigate power fluctuations. Specifically, in the day-ahead stage, hydrogen pr
10

Ritonga, Mahyudin, and Fitria Sartika. "Muyûl al-Talâmidh fî Tadrîs al-Qirâ’ah." Jurnal Alfazuna: Jurnal Pembelajaran Bahasa Arab dan Kebahasaaraban 6, no. 1 (2021): 36–52. http://dx.doi.org/10.15642/alfazuna.v6i1.1715.

Abstract:
Purpose- This study aims to reveal the spirit and motivation of learners in studying Qiro'ah, specifically the study is focused on the description of the forms of motivation of learners, factors that affect the motivation of learners in learning qiro'ah, as well as the steps taken by teachers in improving the spirit of learners in learning qiro'ah. Design/Methodology/Approach- Research is carried out with a qualitative approach, data collection techniques are observation, interview and documentation studies. This approach was chosen considering that the research data found and analyzed is natu
11

Wang, Ruohan. "Developing an optimization model for minimizing musculoskeletal stress in repetitive motion tasks." Molecular & Cellular Biomechanics 21, no. 3 (2024): 567. http://dx.doi.org/10.62617/mcb567.

Abstract:
Repetitive motion tasks are widely prevalent in various industries, including manufacturing and office environments, often leading to significant musculoskeletal stress and associated injuries. The continuous nature of these tasks, coupled with improper posture, excessive force exertion, and inadequate rest periods, exacerbates the risk of long-term damage to muscles, joints, and tendons. This paper presents a novel approach to minimizing musculoskeletal stress by developing a Reinforcement Learning (RL)—based optimization model. The model dynamically adjusts real-time task parameters, such as
12

Ma, Wenjing, Jianguang Zhao, and Guangquan Zhu. "Estimation on Human Motion Posture using Improved Deep Reinforcement Learning." 電腦學刊 34, no. 4 (2023): 97–110. http://dx.doi.org/10.53106/199115992023083404008.

Abstract:
Estimating human motion posture can provide important data for intelligent monitoring systems, human-computer interaction, motion capture, and other fields. However, the traditional human motion posture estimation algorithm is difficult to achieve the goal of fast estimation of human motion posture. To address the problems of traditional algorithms, in the paper, we propose an estimation algorithm for human motion posture using improved deep reinforcement learning. First, the double deep Q network is constructed to improve the deep reinforcement learning algorithm. The improved deep r
13

Krishnamurthy, Bhargavi, and Sajjan G. Shiva. "Large Language Model-Guided SARSA Algorithm for Dynamic Task Scheduling in Cloud Computing." Mathematics 13, no. 6 (2025): 926. https://doi.org/10.3390/math13060926.

Abstract:
Nowadays, more enterprises are rapidly transitioning to cloud computing as it has become an ideal platform to perform the development and deployment of software systems. Because of its growing popularity, around ninety percent of enterprise applications rely on cloud computing solutions. The inherent dynamic and uncertain nature of cloud computing makes it difficult to accurately measure the exact state of a system at any given point in time. Potential challenges arise with respect to task scheduling, load balancing, resource allocation, governance, compliance, migration, data loss, and lack o
14

Santos, John Paul E., Joseph A. Villarama, Joseph P. Adsuara, Jordan F. Gundran, Aileen G. De Guzman, and Evelyn M. Ben. "Students’ Time Management, Academic Procrastination, and Performance during Online Science and Mathematics Classes." International Journal of Learning, Teaching and Educational Research 21, no. 12 (2022): 142–61. http://dx.doi.org/10.26803/ijlter.21.12.8.

Abstract:
COVID-19 affected all sectors, including academia, which resulted in an increase in online learning. While education continued through online platforms, various students-related problems arose, including improper time management, procrastination, and fluctuating academic performance. It is in this context that this quantitative study was carried out to determine how time management and procrastination affected students’ performance in science and mathematics during the pandemic. We surveyed 650 Filipino high school students using the Procrastination Assessment Scale-Students and Wayne State Un
15

Laxmi, Gautam, and Kumar Rajneesh. "Trajectory Data to Improve Unsupervised Learning and Intrinsic." Applied Science and Biotechnology Journal for Advanced Research 3, no. 1 (2024): 16–20. https://doi.org/10.5281/zenodo.10656240.

Abstract:
The three primary components of machine learning (ML) are reinforcement learning, unstructured learning, and structured learning. The last level, reinforcement learning, will be the main topic of this study. We'll cover a few of the more well-liked reinforcement learning techniques, though there are many more. Reinforcement agents are software agents that make use of reinforcement learning to optimize their rewards within a specific context. The two primary categories of rewards are extrinsic and intrinsic. It's a certain result we obtain after abiding by a set of guidelines and achieving a pa
16

Jha, Ashutosh Chandra. "Automated Firewall Policy Generation with Reinforcement Learning." International Journal of IoT 5, no. 1 (2025): 190–211. https://doi.org/10.55640/ijiot-05-01-10.

Abstract:
Network security would be incomplete without firewalls that control traffic flow through rule-based policies. The manual way to configure and manage firewall rules, however, is prone to various pitfalls; rules tend to become overly complex, human error occurs, and cyber threats continue to evolve. This work investigates the reinforcement learning (RL) - driven method for firewall policy generation, utilizing RL as an automated means for policy generation to increase adaptability and reduce administrative overhead. The proposed system utilizes RL agents that learn an optimal policy from real-ti
17

Balkrishna, Rasiklal Yadav. "Machine Learning Algorithms: Optimizing Efficiency in AI Applications." International Journal of Engineering and Management Research 14, no. 5 (2024): 49–57. https://doi.org/10.5281/zenodo.14005017.

Abstract:
Machine learning (ML) is an AI technology that creates programs and data models that can perform tasks without being instructed. It has three major types: guided learning, uncontrolled learning, and reinforcement learning. ML is essential for big projects like real-time decision-making systems and self-driving cars, robots, and drones. It improves AI systems by making it easier to create models, work with data, and run algorithms. ML algorithms have different types of learning, require different amounts of data and training times, and can be improved by tuning hyperparameters. Techniques like
18

Yuan, Minghai, Chenxi Zhang, Kaiwen Zhou, and Fengque Pei. "Real-time Allocation of Shared Parking Spaces Based on Deep Reinforcement Learning." 網際網路技術學刊 24, no. 1 (2023): 35–43. http://dx.doi.org/10.53106/160792642023012401004.

Abstract:
Aiming at the parking space heterogeneity problem in shared parking space management, a multi-objective optimization model for parking space allocation is constructed with the optimization objectives of reducing the average walking distance of users and improving the utilization rate of parking spaces, a real-time allocation method for shared parking spaces based on deep reinforcement learning is proposed, which includes a state space for heterogeneous regions, an action space based on policy selection and a reward function with variable coefficients. To accurately evaluate the model
19

West, Joseph, Frederic Maire, Cameron Browne, and Simon Denman. "Improved reinforcement learning with curriculum." Expert Systems with Applications 158 (November 2020): 113515. http://dx.doi.org/10.1016/j.eswa.2020.113515.

20

Agrawal, Avinash J., Rashmi R. Welekar, Namita Parati, Pravin R. Satav, Uma Patel Thakur, and Archana V. Potnurwar. "Reinforcement Learning and Advanced Reinforcement Learning to Improve Autonomous Vehicle Planning." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 7s (2023): 652–60. http://dx.doi.org/10.17762/ijritcc.v11i7s.7526.

Abstract:
Planning for autonomous vehicles is a challenging process that involves navigating through dynamic and unpredictable surroundings while making judgments in real-time. Traditional planning methods sometimes rely on predetermined rules or customized heuristics, which could not generalize well to various driving conditions. In this article, we provide a unique framework to enhance autonomous vehicle planning by fusing conventional RL methods with cutting-edge reinforcement learning techniques. To handle many elements of planning issues, our system integrates cutting-edge algorithms including deep
21

Littman, Michael L. "Reinforcement learning improves behaviour from evaluative feedback." Nature 521, no. 7553 (2015): 445–51. http://dx.doi.org/10.1038/nature14540.

22

Luo, Teng. "Improved reinforcement learning algorithm for mobile robot path planning." ITM Web of Conferences 47 (2022): 02030. http://dx.doi.org/10.1051/itmconf/20224702030.

Abstract:
In order to solve the problem that traditional Q-learning algorithm has a large number of invalid iterations in the early convergence stage of robot path planning, an improved reinforcement learning algorithm is proposed. Firstly, the gravitational potential field in the improved artificial potential field algorithm is introduced when the Q table is initialized to accelerate the convergence. Secondly, the Tent Chaotic Mapping algorithm is added to the initial state determination process of the algorithm, which allows the algorithm to explore the environment more fully. In addition, an ε-greed
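The abstract above describes seeding a Q-table from an attractive (gravitational) potential field so that early episodes are biased toward the goal, cutting invalid iterations. A minimal sketch of that idea on a toy grid world (the grid size, rewards, and all names are my own illustrative assumptions, not taken from the cited paper):

```python
import random

ROWS, COLS = 5, 5
GOAL = (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def potential(cell, k=1.0):
    """Attractive potential: grows (toward zero) as the cell nears the goal."""
    d = abs(cell[0] - GOAL[0]) + abs(cell[1] - GOAL[1])
    return -k * d

def clip(r, c):
    """Keep a move inside the grid."""
    return (min(max(r, 0), ROWS - 1), min(max(c, 0), COLS - 1))

def init_q():
    """Seed each Q(s, a) with the potential of the successor state,
    instead of the usual all-zeros initialization."""
    q = {}
    for r in range(ROWS):
        for c in range(COLS):
            for i, (dr, dc) in enumerate(ACTIONS):
                q[((r, c), i)] = potential(clip(r + dr, c + dc))
    return q

def train(q, episodes=200, alpha=0.5, gamma=0.9, eps=0.2):
    """Standard epsilon-greedy Q-learning on top of the seeded table."""
    for _ in range(episodes):
        s = (0, 0)
        while s != GOAL:
            if random.random() < eps:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[(s, i)])
            dr, dc = ACTIONS[a]
            s2 = clip(s[0] + dr, s[1] + dc)
            reward = 10.0 if s2 == GOAL else -1.0
            best_next = max(q[(s2, i)] for i in range(len(ACTIONS)))
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
            s = s2
    return q
```

Because the seeded Q-values already rank goal-ward moves highest, the greedy policy is sensible from the very first episode; training then only has to refine it, which is the convergence speed-up the abstract claims.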
23

McLaverty, Brian, Robert S. Parker, and Gilles Clermont. "Reinforcement learning algorithm to improve intermittent hemodialysis." Journal of Critical Care 74 (April 2023): 154205. http://dx.doi.org/10.1016/j.jcrc.2022.154205.

24

Shi, Ying-Ming, and Zhiyuan Zhang. "Research on Path Planning Strategy of Rescue Robot Based on Reinforcement Learning." 電腦學刊 33, no. 3 (2022): 187–94. http://dx.doi.org/10.53106/199115992022063303015.

Abstract:
How rescue robots reach their destinations quickly and efficiently has become a hot research topic in recent years. Aiming at the complex unstructured environment faced by rescue robots, this paper proposes an artificial potential field algorithm based on reinforcement learning. Firstly, use the traditional artificial potential field method to perform basic path planning for the robot. Secondly, in order to solve the local minimum problem in planning and improve the robot’s adaptive ability, the reinforcement learning algorithm is run by fixing preset parameters on the simulation plat
25

Lecarpentier, Erwan, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, and Michael L. Littman. "Lipschitz Lifelong Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (2021): 8270–78. http://dx.doi.org/10.1609/aaai.v35i9.17006.

Abstract:
We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the tasks space. These theoretical results lead us to a value-transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with improved convergence rate. Further, we show the method to experience no negative transfer with high probability. We illustrate the ben
26

Zhang, Jingjing, Yanlong Liu, and Weidong Zhou. "Adaptive Sampling Path Planning for a 3D Marine Observation Platform Based on Evolutionary Deep Reinforcement Learning." Journal of Marine Science and Engineering 11, no. 12 (2023): 2313. http://dx.doi.org/10.3390/jmse11122313.

Abstract:
Adaptive sampling of the marine environment may improve the accuracy of marine numerical prediction models. This study considered adaptive sampling path optimization for a three-dimensional (3D) marine observation platform, leading to a path-planning strategy based on evolutionary deep reinforcement learning. The low sampling efficiency of the reinforcement learning algorithm is improved by evolutionary learning. The combination of these two components as a new algorithm has become a current research trend. We first combined the evolutionary algorithm with different reinforcement learning algo
27

Ma, Guoqing, Zhifu Wang, Xianfeng Yuan, and Fengyu Zhou. "Improving Model-Based Deep Reinforcement Learning with Learning Degree Networks and Its Application in Robot Control." Journal of Robotics 2022 (March 4, 2022): 1–14. http://dx.doi.org/10.1155/2022/7169594.

Abstract:
Deep reinforcement learning is the technology of artificial neural networks in the field of decision-making and control. The traditional model-free reinforcement learning algorithm requires a large amount of environment interactive data to iterate the algorithm. This model’s performance also suffers due to low utilization of training data, while the model-based reinforcement learning (MBRL) algorithm improves the efficiency of the data, MBRL locks into low prediction accuracy. Although MBRL can utilize the additional data generated by the dynamic model, a system dynamics model with low predict
28

González-Garduño, Ana V. "Reinforcement Learning for Improved Low Resource Dialogue Generation." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9884–85. http://dx.doi.org/10.1609/aaai.v33i01.33019884.

Abstract:
In this thesis, I focus on language independent methods of improving utterance understanding and response generation and attempt to tackle some of the issues surrounding current systems. The aim is to create a unified approach to dialogue generation inspired by developments in both goal oriented and open ended dialogue systems. The main contributions in this thesis are: 1) Introducing hybrid approaches to dialogue generation using retrieval and encoder-decoder architectures to produce fluent but precise utterances in dialogues, 2) Proposing supervised, semi-supervised and Reinforcement Learnin
29

Kuremoto, Takashi, Tetsuya Tsurusaki, Kunikazu Kobayashi, Shingo Mabu, and Masanao Obayashi. "An Improved Reinforcement Learning System Using Affective Factors." Robotics 2, no. 3 (2013): 149–64. http://dx.doi.org/10.3390/robotics2030149.

30

Yao, Guangyu, Nan Zhang, Zhenhua Duan, and Cong Tian. "Improved SARSA and DQN algorithms for reinforcement learning." Theoretical Computer Science 1027 (February 2025): 115025. https://doi.org/10.1016/j.tcs.2024.115025.

31

Friedrich, Johannes, Robert Urbanczik, and Walter Senn. "Code-Specific Learning Rules Improve Action Selection by Populations of Spiking Neurons." International Journal of Neural Systems 24, no. 05 (2014): 1450002. http://dx.doi.org/10.1142/s0129065714500026.

Abstract:
Population coding is widely regarded as a key mechanism for achieving reliable behavioral decisions. We previously introduced reinforcement learning for population-based decision making by spiking neurons. Here we generalize population reinforcement learning to spike-based plasticity rules that take account of the postsynaptic neural code. We consider spike/no-spike, spike count and spike latency codes. The multi-valued and continuous-valued features in the postsynaptic code allow for a generalization of binary decision making to multi-valued decision making and continuous-valued action select
32

Zhang, Gaohan. "Synergistic advantages of deep learning and reinforcement learning in economic forecasting." International Journal of Global Economics and Management 1, no. 1 (2023): 89–95. http://dx.doi.org/10.62051/ijgem.v1n1.13.

Abstract:
With the progress of science and technology, emerging technologies such as deep learning and reinforcement learning have emerged in economic forecasting, injecting synergy into this field. First of all, deep learning makes the economic model capture the dynamic changes of the economic system more comprehensively and accurately by dealing with nonlinear relations, learning complex characteristics and conducting accurate time series analysis. Its multi-level neural network structure and the ability to learn features automatically improve the adaptability of the model and avoid the tedious proces
33

Chen, Yinhe. "Enhancing stability and explainability in reinforcement learning with machine learning." Applied and Computational Engineering 101, no. 1 (2024): 25–34. http://dx.doi.org/10.54254/2755-2721/101/20240943.

Abstract:
In the field of reinforcement learning, training agents using machine learning algorithms to learn and perform tasks in complex environments has become a prevalent approach. However, reinforcement learning faces challenges such as training instability and decision opacity, which limit its feasibility in real-world applications. To solve the problems of stability and transparency in reinforcement learning, this project will use advanced algorithms like Proximal Policy Optimization (PPO), Q-DAGGER, and Gradient Boosting Decision Trees to set up reinforcement learning agents in the Open
34

Szepesvári, Csaba, and Michael L. Littman. "A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms." Neural Computation 11, no. 8 (1999): 2017–60. http://dx.doi.org/10.1162/089976699300016070.

Abstract:
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning
35

Zhang, Lige, and Zhen Tian. "Research on Music Emotional Expression Based on Reinforcement Learning and Multimodal Information." Mobile Information Systems 2022 (June 30, 2022): 1–8. http://dx.doi.org/10.1155/2022/2616220.

Abstract:
With the continuous development of the research in the field of emotion analysis, music, as a common multimodal information carrier in people’s daily life, often transmits emotion through lyrics and melody, so it has been gradually incorporated into the research category of emotion analysis. The fusion classification model based on CNN-LSTM proposed in this paper effectively improves the accuracy of emotional classification of audio and lyrics. At the same time, in view of the problem that the traditional decision-level fusion method ignores the correlation between modes and the limitations of
36

Singh, Anunay, Anveet Pal, and Ashish Baghel. "Resolving the Cold-Start Issue in Recommender Systems with Reinforcement Learning." International Scientific Journal of Engineering and Management 04, no. 03 (2025): 1–7. https://doi.org/10.55041/isjem02367.

Abstract:
The cold-start problem faced by recommender systems is a serious problem, mainly because of the lack of historical data for new users or items. Traditional recommendation techniques, such as collaborative filtering and content-based filtering, are prone to fail in making good recommendations under such conditions. This paper explores the use of reinforcement learning (RL) as a remedy for cold-start problems based on active learning methods and multi-armed bandit models. We propose a novel RL-based approach that learns user preferences incrementally from interaction and improves recommendations
37

Zhou, Minghui. "Multithreshold Microbial Image Segmentation Using Improved Deep Reinforcement Learning." Mathematical Problems in Engineering 2022 (August 23, 2022): 1–11. http://dx.doi.org/10.1155/2022/5096298.

Abstract:
Image segmentation technology can effectively extract the foreground target in the image. However, the microbial image is easily disturbed by noise, its greyscale has the characteristics of nonuniform distribution, and several microorganisms with diverse forms exist in the same image, resulting in insufficient accuracy of microbial image segmentation. Therefore, a multithreshold microbial image segmentation algorithm using improved deep reinforcement learning is proposed. The wavelet transform method is used to remove the noise of the microbial image, the threshold number of the microbial imag
38

Kaddour, N., P. Del Moral, and E. Ikonen. "Improved version of the McMurtry-Fu reinforcement learning scheme." International Journal of Systems Science 34, no. 1 (2003): 37–47. http://dx.doi.org/10.1080/0020772031000115560.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
39

Shi, Zhen, Keyin Wang, and Jianhui Zhang. "Improved reinforcement learning path planning algorithm integrating prior knowledge." PLOS ONE 18, no. 5 (2023): e0284942. http://dx.doi.org/10.1371/journal.pone.0284942.

Full text source
Abstract:
To optimize the autonomous navigation of a mobile robot when only partial environmental knowledge is available, an improved Q-learning reinforcement learning algorithm based on prior knowledge is proposed to address the slow convergence and low learning efficiency of mobile robot path planning. Prior knowledge is used to initialize the Q-values, so as to guide the agent to move toward the target direction with greater probability from the early stage of the algorithm, eliminating a large number of invalid iterations. The greedy factor ε is dynamically adjusted…
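The two ideas in this abstract, initializing Q-values from prior knowledge and dynamically decaying the greedy factor ε, can be sketched on a toy corridor environment. The environment and all constants here are invented for illustration, not taken from the paper:

```python
import random

N = 10             # corridor states 0..9, goal at state 9
ACTIONS = (-1, 1)  # 0 = step left, 1 = step right

# Prior knowledge: the goal lies to the right, so the Q-values for
# "right" start with a small positive bias instead of zero.
Q = [[0.0, 0.5] for _ in range(N)]
Q[N - 1] = [0.0, 0.0]  # terminal state keeps zero value

rng = random.Random(0)
alpha, gamma = 0.5, 0.95
for episode in range(200):
    # Dynamically decayed greedy factor: explore a lot early, little later.
    eps = max(0.05, 1.0 - episode / 100)
    s = 0
    while s != N - 1:
        if rng.random() < eps:
            a = rng.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2 = min(N - 1, max(0, s + ACTIONS[a]))
        r = 1.0 if s2 == N - 1 else -0.01
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# The learned greedy policy should move right from every interior state.
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N - 1)]
```

The positive bias on the "right" action plays the role the abstract describes: it steers early episodes toward the goal and saves the invalid iterations a zero-initialized table would spend wandering.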
APA, Harvard, Vancouver, ISO, and other styles
40

Li, Lihong, Vadim Bulitko, and Russell Greiner. "Focus of Attention in Reinforcement Learning." JUCS - Journal of Universal Computer Science 13, no. (9) (2007): 1246–69. https://doi.org/10.3217/jucs-013-09-1246.

Full text source
Abstract:
Classification-based reinforcement learning (RL) methods have recently been proposed as an alternative to the traditional value-function-based methods. These methods use a classifier to represent a policy, where the input (features) to the classifier is the state and the output (class label) for that state is the desired action. The reinforcement-learning community knows that focusing on more important states can lead to improved performance. In this paper, we investigate the idea of focused learning in the context of classification-based RL. Specifically, we define a useful notion of state i…
APA, Harvard, Vancouver, ISO, and other styles
41

Yang, Yana, Meng Xi, Huiao Dai, Jiabao Wen, and Jiachen Yang. "Z-Score Experience Replay in Off-Policy Deep Reinforcement Learning." Sensors 24, no. 23 (2024): 7746. https://doi.org/10.3390/s24237746.

Full text source
Abstract:
Reinforcement learning, as a machine learning method that does not require pre-training data, seeks the optimal policy through continuous interaction between an agent and its environment. It is an important approach to solving sequential decision-making problems. By combining it with deep learning, deep reinforcement learning gains powerful perception and decision-making capabilities and has been widely applied to various domains to tackle complex decision problems. Off-policy reinforcement learning separates exploration and exploitation by storing and replaying interaction experiences…
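The abstract cuts off before detailing the Z-score scheme, but a plausible sketch is a replay buffer that z-score-normalizes stored TD errors and softmaxes them into sampling probabilities. This is an assumption-laden illustration, not the paper's algorithm; all names are invented:

```python
import math
import random

class ZScoreReplayBuffer:
    """Replay buffer that z-score-normalizes stored TD errors and turns
    them into sampling probabilities via a softmax. A sketch only: the
    paper's exact scheme is not described in the truncated abstract."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []  # list of (transition, td_error)
        self.rng = random.Random(seed)

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:
            self.data.pop(0)  # drop the oldest experience
        self.data.append((transition, td_error))

    def sample(self, batch_size):
        errs = [abs(e) for _, e in self.data]
        mean = sum(errs) / len(errs)
        std = math.sqrt(sum((e - mean) ** 2 for e in errs) / len(errs)) or 1.0
        # Z-score each error, then softmax into sampling weights, so
        # high-surprise transitions are replayed more often.
        weights = [math.exp((e - mean) / std) for e in errs]
        return self.rng.choices([t for t, _ in self.data],
                                weights=weights, k=batch_size)

buf = ZScoreReplayBuffer(capacity=100, seed=1)
for i in range(50):
    buf.add(("transition", i), td_error=float(i))  # later items: larger error
batch = buf.sample(32)
```

Normalizing by mean and standard deviation keeps the sampling distribution stable as the scale of TD errors drifts during training, which is the usual motivation for a z-score step over raw-error prioritization.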
APA, Harvard, Vancouver, ISO, and other styles
42

Ni, Jianjun, Yu Gu, Guangyi Tang, Chunyan Ke, and Yang Gu. "Cooperative Coverage Path Planning for Multi-Mobile Robots Based on Improved K-Means Clustering and Deep Reinforcement Learning." Electronics 13, no. 5 (2024): 944. http://dx.doi.org/10.3390/electronics13050944.

Full text source
Abstract:
With the increasing complexity of patrol tasks, the use of deep reinforcement learning for collaborative coverage path planning (CPP) with multiple mobile robots has become a new research hotspot. To complete the CPP task better while taking into account complex environmental factors and operational limitations, such as terrain obstacles and the scope of the task area, this paper proposes an improved K-Means clustering algorithm to divide the multi-robot task area. The improved algorithm refines the selection of the first initial clustering point, which makes the clustering…
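Task-area division via K-Means can be sketched with plain k-means over coverage cells. The paper's improved initial-centre selection is not reproduced here; random initialization is used instead, and the grid and robot count are invented:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means over 2-D coverage cells. Note: the paper improves the
    selection of the first initial centre; here we simply pick random ones."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each cell to its nearest centre (squared distance).
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            groups[j].append(p)
        # Recompute each centre as the mean of its assigned cells.
        centers = [(sum(x for x, _ in g) / len(g),
                    sum(y for _, y in g) / len(g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return groups

# A 10x10 grid of patrol cells divided among 3 robots (invented setup).
cells = [(x, y) for x in range(10) for y in range(10)]
regions = kmeans(cells, k=3)
```

Each resulting cluster becomes one robot's coverage region, after which a per-robot planner (deep RL in the paper) handles the path inside that region.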
APA, Harvard, Vancouver, ISO, and other styles
43

Zhao, Yongqi, Zhangdong Wei, and Jing Wen. "Prediction of Soil Heavy Metal Content Based on Deep Reinforcement Learning." Scientific Programming 2022 (April 15, 2022): 1–10. http://dx.doi.org/10.1155/2022/1476565.

Full text source
Abstract:
Since the prediction accuracy of common spatial prediction algorithms for heavy metal content in soil is not ideal, a prediction model based on an improved deep Q network is proposed. State value reuse is used to accelerate the learning speed of training samples for agents in the deep Q network, improving the convergence speed of the model. At the same time, an adaptive fuzzy membership factor is introduced to change the sensitivity of the agent to the environmental feedback value in different training periods and improve the stability of the model after convergence. Finally, an adaptive inverse distance…
APA, Harvard, Vancouver, ISO, and other styles
44

Bekele, Yared Zerihun, and Young-June Choi. "Random Access Using Deep Reinforcement Learning in Dense Mobile Networks." Sensors 21, no. 9 (2021): 3210. http://dx.doi.org/10.3390/s21093210.

Full text source
Abstract:
5G and Beyond-5G mobile networks use several high-frequency spectrum bands, such as the millimeter-wave (mmWave) bands, to alleviate the problem of bandwidth scarcity. However, high-frequency bands do not cover large distances. The coverage problem is addressed by using a heterogeneous network, which comprises numerous small cells and macrocells defined by transmission and reception points (TRxPs). For such a network, random access is considered a challenging function, in which users attempt to select an efficient TRxP by random access within a given time. Ideally, an efficient TRxP is less congested…
APA, Harvard, Vancouver, ISO, and other styles
45

Koga, Marcelo L., Valdinei Freire, and Anna H. R. Costa. "Stochastic Abstract Policies: Generalizing Knowledge to Improve Reinforcement Learning." IEEE Transactions on Cybernetics 45, no. 1 (2015): 77–88. http://dx.doi.org/10.1109/tcyb.2014.2319733.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
46

Zhao, Tinglong, Ming Wang, Qianchuan Zhao, Xuehan Zheng, and He Gao. "A Path-Planning Method Based on Improved Soft Actor-Critic Algorithm for Mobile Robots." Biomimetics 8, no. 6 (2023): 481. http://dx.doi.org/10.3390/biomimetics8060481.

Full text source
Abstract:
The path planning problem has gained more attention due to the gradual popularization of mobile robots. Reinforcement learning techniques enable mobile robots to successfully navigate through an environment containing obstacles and effectively plan their path. This is achieved through the robots' interaction with the environment, even when the environment is unfamiliar. Consequently, we provide a refined deep reinforcement learning algorithm that builds upon the soft actor-critic (SAC) algorithm, incorporating the concept of maximum entropy for…
APA, Harvard, Vancouver, ISO, and other styles
47

Fawzi, Alhussein, Matej Balog, Aja Huang, et al. "Discovering faster matrix multiplication algorithms with reinforcement learning." Nature 610, no. 7930 (2022): 47–53. http://dx.doi.org/10.1038/s41586-022-05172-4.

Full text source
Abstract:
Improving the efficiency of algorithms for fundamental computations can have a widespread impact, as it can affect the overall speed of a large amount of computation. Matrix multiplication is one such primitive task, occurring in many systems, from neural networks to scientific computing routines. The automatic discovery of algorithms using machine learning offers the prospect of reaching beyond human intuition and outperforming the current best human-designed algorithms. However, automating the algorithm discovery procedure is intricate, as the space of possible algorithms is enormous…
APA, Harvard, Vancouver, ISO, and other styles
48

Tantu, Year Rezeki Patricia, and Kirey Eleison Oloi Marina. "Teachers' efforts to improve discipline of elementary school students using positive reinforcement methods in online learning." JURNAL PENDIDIKAN DASAR NUSANTARA 8, no. 2 (2023): 288–98. http://dx.doi.org/10.29407/jpdn.v8i2.19118.

Full text source
Abstract:
Discipline is an attitude that students need to have in learning in order to help them achieve learning goals. Conversely, learning aims to train discipline so that students can behave correctly and have good character in society. This study examines teachers' efforts to use positive reinforcement to increase student discipline in the learning process. The research method used is descriptive qualitative. The data found by the authors show that the percentage of discipline among grade IV elementary school students is 63.2%. The teacher applies a positive reinforcement method…
APA, Harvard, Vancouver, ISO, and other styles
49

Zheng, Shujian, Chudi Zhang, Jun Hu, and Shiyou Xu. "Radar-Jamming Decision-Making Based on Improved Q-Learning and FPGA Hardware Implementation." Remote Sensing 16, no. 7 (2024): 1190. http://dx.doi.org/10.3390/rs16071190.

Full text source
Abstract:
In contemporary warfare, radar countermeasures have become multifunctional and intelligent, rendering the conventional jamming methods and platforms unsuitable for the modern radar countermeasures battlefield due to their limited efficiency. Reinforcement learning has proven to be a practical solution for cognitive jamming decision-making in cognitive electronic warfare. In this paper, we propose a radar-jamming decision-making algorithm based on an improved Q-Learning algorithm. This improved Q-Learning algorithm ameliorates the problem of overestimating the Q-value that exists in the…
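The abstract does not specify how the improved Q-Learning curbs overestimation; Double Q-learning is the standard remedy, so the sketch below uses it purely as an illustrative stand-in, on an invented toy problem:

```python
import random

def double_q_update(Qa, Qb, s, a, r, s2, alpha=0.5, gamma=0.9, rng=random):
    """One Double Q-learning step: pick the best next action with one table
    but evaluate it with the other, which curbs Q-value overestimation."""
    if rng.random() < 0.5:
        a_star = max(range(len(Qa[s2])), key=lambda i: Qa[s2][i])
        Qa[s][a] += alpha * (r + gamma * Qb[s2][a_star] - Qa[s][a])
    else:
        a_star = max(range(len(Qb[s2])), key=lambda i: Qb[s2][i])
        Qb[s][a] += alpha * (r + gamma * Qa[s2][a_star] - Qb[s][a])

# Toy chain: action 1 in state 0 pays reward 1 and reaches terminal state 1,
# whose Q-values are never updated and stay at zero.
rng = random.Random(3)
Qa = [[0.0, 0.0], [0.0, 0.0]]
Qb = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(100):
    double_q_update(Qa, Qb, s=0, a=1, r=1.0, s2=1, rng=rng)
estimate = (Qa[0][1] + Qb[0][1]) / 2  # converges to r + gamma * 0 = 1.0
```

Decoupling action selection from action evaluation removes the positive bias that the single-table max operator introduces, which is the overestimation problem the abstract refers to.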
APA, Harvard, Vancouver, ISO, and other styles
50

Wang, Zhijian, Jianpeng Yang, Qiang Zhang, and Li Wang. "Risk-Aware Travel Path Planning Algorithm Based on Reinforcement Learning during COVID-19." Sustainability 14, no. 20 (2022): 13364. http://dx.doi.org/10.3390/su142013364.

Full text source
Abstract:
The outbreak of COVID-19 brought great inconvenience to people's daily travel. In order to provide people with a path planning scheme that takes into account both safety and travel distance, a risk-averse path planning model for urban traffic scenarios was established through reinforcement learning. We designed the state and action spaces of the agents in a "point-to-point" way. Moreover, we extracted the road network model and impedance matrix through SUMO simulation, and designed a Restricted Reinforcement Learning-Artificial Potential Field (RRL-APF) algorithm, which can optimize…
APA, Harvard, Vancouver, ISO, and other styles