Journal articles on the topic 'Multi-Objective Reinforcement Learning'


Consult the top 50 journal articles for your research on the topic 'Multi-Objective Reinforcement Learning.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of each academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Horie, Naoto, Tohgoroh Matsui, Koichi Moriyama, Atsuko Mutoh, and Nobuhiro Inuzuka. "Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning." Artificial Life and Robotics 24, no. 3 (February 8, 2019): 352–59. http://dx.doi.org/10.1007/s10015-019-00523-3.

2

Kim, Man-Je, Hyunsoo Park, and Chang Wook Ahn. "Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning." Electronics 11, no. 7 (March 28, 2022): 1069. http://dx.doi.org/10.3390/electronics11071069.

Abstract:
Control intelligence is a typical field involving trade-offs between target objectives, and researchers in this field have long sought artificial intelligence that achieves those objectives. Multi-objective deep reinforcement learning has emerged to satisfy this need; in particular, multi-objective deep reinforcement learning methods based on policy optimization are leading the optimization of control intelligence. However, multi-objective reinforcement learning has difficulty finding diverse Pareto-optimal solutions because of the greedy nature of reinforcement learning. We propose a policy assimilation method to solve this problem. The method was applied to MO-V-MPO, a preference-based multi-objective reinforcement learning algorithm, to increase diversity, and its performance was verified through experiments in a continuous control environment.
3

Drugan, Madalina, Marco Wiering, Peter Vamplew, and Madhu Chetty. "Special issue on multi-objective reinforcement learning." Neurocomputing 263 (November 2017): 1–2. http://dx.doi.org/10.1016/j.neucom.2017.06.020.

4

Perez, Julien, Cécile Germain-Renaud, Balazs Kégl, and Charles Loomis. "Multi-objective Reinforcement Learning for Responsive Grids." Journal of Grid Computing 8, no. 3 (June 8, 2010): 473–92. http://dx.doi.org/10.1007/s10723-010-9161-0.

5

Nguyen, Thanh Thi, Ngoc Duy Nguyen, Peter Vamplew, Saeid Nahavandi, Richard Dazeley, and Chee Peng Lim. "A multi-objective deep reinforcement learning framework." Engineering Applications of Artificial Intelligence 96 (November 2020): 103915. http://dx.doi.org/10.1016/j.engappai.2020.103915.

6

García, Javier, Rubén Majadas, and Fernando Fernández. "Learning adversarial attack policies through multi-objective reinforcement learning." Engineering Applications of Artificial Intelligence 96 (November 2020): 104021. http://dx.doi.org/10.1016/j.engappai.2020.104021.

7

Yamamoto, Hiroyuki, Tomohiro Hayashida, Ichiro Nishizaki, and Shinya Sekizaki. "Hypervolume-Based Multi-Objective Reinforcement Learning: Interactive Approach." Advances in Science, Technology and Engineering Systems Journal 4, no. 1 (2019): 93–100. http://dx.doi.org/10.25046/aj040110.

8

García, Javier, Roberto Iglesias, Miguel A. Rodríguez, and Carlos V. Regueiro. "Incremental reinforcement learning for multi-objective robotic tasks." Knowledge and Information Systems 51, no. 3 (September 22, 2016): 911–40. http://dx.doi.org/10.1007/s10115-016-0992-2.

9

Schneider, Stefan, Ramin Khalili, Adnan Manzoor, Haydar Qarawlus, Rafael Schellenberg, Holger Karl, and Artur Hecker. "Self-Learning Multi-Objective Service Coordination Using Deep Reinforcement Learning." IEEE Transactions on Network and Service Management 18, no. 3 (September 2021): 3829–42. http://dx.doi.org/10.1109/tnsm.2021.3076503.

10

Ferreira, Leonardo Anjoletto, Carlos Henrique Costa Ribeiro, and Reinaldo Augusto da Costa Bianchi. "Heuristically accelerated reinforcement learning modularization for multi-agent multi-objective problems." Applied Intelligence 41, no. 2 (May 1, 2014): 551–62. http://dx.doi.org/10.1007/s10489-014-0534-0.

11

Parisi, Simone, Matteo Pirotta, and Marcello Restelli. "Multi-objective Reinforcement Learning through Continuous Pareto Manifold Approximation." Journal of Artificial Intelligence Research 57 (October 21, 2016): 187–227. http://dx.doi.org/10.1613/jair.4961.

Abstract:
Many real-world control applications, from economics to robotics, are characterized by the presence of multiple conflicting objectives. In these problems, the standard concept of optimality is replaced by Pareto-optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an important challenge. In this paper, we propose a reinforcement learning policy gradient approach to learn a continuous approximation of the Pareto frontier in multi-objective Markov Decision Problems (MOMDPs). Differently from previous policy gradient algorithms, where n optimization routines are executed to have n solutions, our approach performs a single gradient ascent run, generating at each step an improved continuous approximation of the Pareto frontier. The idea is to optimize the parameters of a function defining a manifold in the policy parameters space, so that the corresponding image in the objectives space gets as close as possible to the true Pareto frontier. Besides deriving how to compute and estimate such gradient, we will also discuss the non-trivial issue of defining a metric to assess the quality of the candidate Pareto frontiers. Finally, the properties of the proposed approach are empirically evaluated on two problems, a linear-quadratic Gaussian regulator and a water reservoir control task.
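The notion of Pareto dominance that this abstract builds on recurs throughout the list. As a point of reference, here is a minimal, generic sketch of a Pareto-frontier filter over a finite set of candidate return vectors; it is illustrative only, is not code from the cited paper, and the dominates/pareto_front names are placeholders of my own.

import numpy as np

def dominates(a, b):
    # True if return vector a Pareto-dominates b (all objectives are maximized).
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a >= b) and np.any(a > b))

def pareto_front(returns):
    # Keep only the non-dominated return vectors from a list of candidates.
    front = []
    for i, r in enumerate(returns):
        if not any(dominates(other, r) for j, other in enumerate(returns) if j != i):
            front.append(r)
    return front

# Example: three candidate policies evaluated on two objectives (higher is better).
print(pareto_front([(1.0, 5.0), (2.0, 4.0), (1.5, 3.0)]))  # (1.5, 3.0) is dominated by (2.0, 4.0)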
12

Tao, Haicheng, Zhan Bu, and Jie Cao. "A multi-objective reinforcement learning framework for community deception." SCIENTIA SINICA Informationis 51, no. 7 (July 1, 2021): 1131. http://dx.doi.org/10.1360/ssi-2020-0229.

13

KOBAYASHI, Taisuke. "Multi-Objective Switchable Reinforcement Learning by using Reservoir Computing." Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2017 (2017): 2P1-H03. http://dx.doi.org/10.1299/jsmermd.2017.2p1-h03.

14

Westbrink, Fabian, Alexander Elbel, Andreas Schwung, and Steven X. Ding. "Optimization of DEM parameters using multi-objective reinforcement learning." Powder Technology 379 (February 2021): 602–16. http://dx.doi.org/10.1016/j.powtec.2020.10.067.

15

Ruiz-Montiel, Manuela, Lawrence Mandow, and José-Luis Pérez-de-la-Cruz. "A temporal difference method for multi-objective reinforcement learning." Neurocomputing 263 (November 2017): 15–25. http://dx.doi.org/10.1016/j.neucom.2016.10.100.

16

Zou, Fei, Gary G. Yen, Lixin Tang, and Chunfeng Wang. "A reinforcement learning approach for dynamic multi-objective optimization." Information Sciences 546 (February 2021): 815–34. http://dx.doi.org/10.1016/j.ins.2020.08.101.

17

Qin, Yao, Hua Wang, Shanwen Yi, Xiaole Li, and Linbo Zhai. "Virtual machine placement based on multi-objective reinforcement learning." Applied Intelligence 50, no. 8 (March 6, 2020): 2370–83. http://dx.doi.org/10.1007/s10489-020-01633-3.

18

Comsa, Ioan Sorin, Mehmet Aydin, Sijing Zhang, Pierre Kuonen, and Jean-Frédéric Wagen. "Multi Objective Resource Scheduling in LTE Networks Using Reinforcement Learning." International Journal of Distributed Systems and Technologies 3, no. 2 (April 2012): 39–57. http://dx.doi.org/10.4018/jdst.2012040103.

Abstract:
Intelligent packet scheduling is necessary to make radio resource usage more efficient in recent high-bit-rate radio access technologies such as Long Term Evolution (LTE). The packet scheduling procedure can work with various dispatching rules that exhibit different behaviors. In the literature, a single scheduling discipline is applied for an entire transmission session, and scheduler performance strongly depends on the chosen discipline. The method proposed in this paper instead provides a schedule within each transmission time interval (TTI) sub-frame by using a mixture of dispatching disciplines per TTI rather than a single rule adopted across the whole transmission. The aim is to maximize system throughput while assuring the best user fairness. This requires a policy for how to mix the rules and a refinement procedure to select the best rule each time. Two scheduling policies for mixing the rules are proposed, and a Q-learning algorithm is used to refine them. Simulation results indicate that the proposed methods outperform existing scheduling techniques by maximizing system throughput without harming user fairness.
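The mechanism described above, letting a learner pick one of several dispatching rules for each transmission time interval (TTI), can be sketched with ordinary tabular Q-learning. The snippet below is a generic illustration only: the rule names, the discretized state, and the scalar reward mixing throughput and fairness are assumptions of mine, not details of the cited scheduler.

import random
from collections import defaultdict

RULES = ["proportional_fair", "max_throughput", "round_robin"]  # candidate dispatching rules
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(lambda: [0.0] * len(RULES))  # Q[state][rule_index]

def choose_rule(state):
    # Epsilon-greedy selection of a dispatching rule for the current TTI.
    if random.random() < EPSILON:
        return random.randrange(len(RULES))
    return max(range(len(RULES)), key=lambda a: Q[state][a])

def update(state, action, reward, next_state):
    # One-step Q-learning update after observing the mixed throughput/fairness reward.
    best_next = max(Q[next_state])
    Q[state][action] += ALPHA * (reward + GAMMA * best_next - Q[state][action])

# One learning step for a single TTI transition (hypothetical values):
s, s_next = "high_load", "high_load"
a = choose_rule(s)
update(s, a, reward=0.7, next_state=s_next)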
19

Li, Dazi, Fuqiang Zhu, Xiao Wang, and Qibing Jin. "Multi-objective reinforcement learning for fed-batch fermentation process control." Journal of Process Control 115 (July 2022): 89–99. http://dx.doi.org/10.1016/j.jprocont.2022.05.003.

20

Shresthamali, Shaswot, Masaaki Kondo, and Hiroshi Nakamura. "Multi-Objective Resource Scheduling for IoT Systems Using Reinforcement Learning." Journal of Low Power Electronics and Applications 12, no. 4 (October 8, 2022): 53. http://dx.doi.org/10.3390/jlpea12040053.

Abstract:
IoT embedded systems have multiple objectives that need to be maximized simultaneously. These objectives conflict with each other due to limited resources and tradeoffs that need to be made. This requires multi-objective optimization (MOO), and multiple Pareto-optimal solutions are possible. In such a case, tradeoffs are made w.r.t. a user-defined preference. This work presents a general Multi-objective Reinforcement Learning (MORL) framework for MOO of IoT embedded systems. The framework comprises a general Multi-objective Markov Decision Process (MOMDP) formulation and two novel low-compute MORL algorithms. The algorithms learn policies that trade off between multiple objectives using a single preference parameter. We take the energy scheduling problem in general Energy Harvesting Wireless Sensor Nodes (EHWSNs) as a case example, in which a sensor node is required to maximize its sensing rate and transmission performance while ensuring long-term uninterrupted operation within a very tight energy budget. We simulate single-task and dual-task EHWSN systems to evaluate our framework. The results demonstrate that our MORL algorithms can learn better policies at lower learning costs and successfully trade off between multiple objectives at runtime.
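Trading off objectives with a single preference parameter, as described above, is commonly realized through linear scalarization of a vector reward. The sketch below shows that generic idea for two objectives; the objective names and the weighting scheme are assumptions for illustration, not the cited framework's algorithm.

def scalarize(reward_vec, w):
    # Collapse a two-objective reward (sensing, transmission) into a scalar with preference w in [0, 1].
    sensing, transmission = reward_vec
    return w * sensing + (1.0 - w) * transmission

# A standard single-objective RL agent can then be trained for any chosen preference.
for w in (0.0, 0.5, 1.0):
    print(w, scalarize((0.8, 0.3), w))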
21

Chen, SenPeng, Jia Wu, and XiYuan Liu. "EMORL: Effective multi-objective reinforcement learning method for hyperparameter optimization." Engineering Applications of Artificial Intelligence 104 (September 2021): 104315. http://dx.doi.org/10.1016/j.engappai.2021.104315.

22

Bi, Yu, Carlos Colman Meixner, Monchai Bunyakitanon, Xenofon Vasilakos, Reza Nejabati, and Dimitra Simeonidou. "Multi-Objective Deep Reinforcement Learning Assisted Service Function Chains Placement." IEEE Transactions on Network and Service Management 18, no. 4 (December 2021): 4134–50. http://dx.doi.org/10.1109/tnsm.2021.3127685.

23

Lepenioti, Katerina, Alexandros Bousdekis, Dimitris Apostolou, and Gregoris Mentzas. "Human-Augmented Prescriptive Analytics With Interactive Multi-Objective Reinforcement Learning." IEEE Access 9 (2021): 100677–93. http://dx.doi.org/10.1109/access.2021.3096662.

24

Li, Qinyu, Longyu Yang, Pengjie Tang, and Hanli Wang. "Enhancing semantics with multi‐objective reinforcement learning for video description." Electronics Letters 57, no. 25 (October 8, 2021): 977–79. http://dx.doi.org/10.1049/ell2.12334.

25

Mannion, Patrick, Sam Devlin, Karl Mason, Jim Duggan, and Enda Howley. "Policy invariance under reward transformations for multi-objective reinforcement learning." Neurocomputing 263 (November 2017): 60–73. http://dx.doi.org/10.1016/j.neucom.2017.05.090.

26

Huo, Lin, and Yuepeng Tang. "Multi-Objective Deep Reinforcement Learning for Personalized Dose Optimization Based on Multi-Indicator Experience Replay." Applied Sciences 13, no. 1 (December 27, 2022): 325. http://dx.doi.org/10.3390/app13010325.

Abstract:
Chemotherapy is now widely used as an effective method to treat various types of malignant tumors. With advances in medicine and drug dosimetry, the precise dose adjustment of chemotherapy drugs has become a significant challenge. Several academics have investigated this problem in depth. However, these studies have concentrated on the efficiency of cancer treatment while ignoring other significant bodily indicators in the patient, which could cause other complications. Therefore, to handle the above problem, this research proposes a multi-objective deep reinforcement learning approach. First, in order to balance the competing indicators within the optimization process and give each indicator a better outcome, we propose a multi-criteria decision-making strategy based on the integration concept. In addition, we provide a novel multi-indicator experience replay for multi-objective deep reinforcement learning, which significantly speeds up learning compared with conventional approaches. By modeling various indicators in the patient's body, our approach is used to simulate the treatment of tumors. The experimental results demonstrate that the treatment plan generated by our method balances the contradiction between the tumor's treatment effect and other biochemical indicators better than other treatment plans, and its treatment time is only one-third that of the multi-objective deep reinforcement learning method currently in use.
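A replay buffer that stores one reward per indicator is the basic data structure behind ideas like the multi-indicator experience replay mentioned above. The class below is a simplified, generic sketch of such a buffer (field names and capacity are assumptions), not the authors' implementation.

import random
from collections import deque

class VectorRewardReplay:
    """Replay buffer whose transitions carry one reward per monitored indicator."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward_vec, next_state, done):
        self.buffer.append((state, action, tuple(reward_vec), next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

buf = VectorRewardReplay()
buf.push(state=[0.4, 1.2], action=2, reward_vec=[-0.1, 0.5, 0.0], next_state=[0.5, 1.1], done=False)
batch = buf.sample(32)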
27

Zhang, Kai, Sterling McLeod, Minwoo Lee, and Jing Xiao. "Continuous reinforcement learning to adapt multi-objective optimization online for robot motion." International Journal of Advanced Robotic Systems 17, no. 2 (March 1, 2020): 172988142091149. http://dx.doi.org/10.1177/1729881420911491.

Abstract:
This article introduces a continuous reinforcement learning framework to enable online adaptation of multi-objective optimization functions for guiding a mobile robot to move in changing dynamic environments. The robot with this framework can continuously learn from multiple or changing environments where it encounters different numbers of obstacles moving in unknown ways at different times. Using both planned trajectories from a real-time motion planner and already executed trajectories as feedback observations, our reinforcement learning agent enables the robot to adapt motion behaviors to environmental changes. The agent contains a Q network connected to a long short-term memory network. The proposed framework is tested in both simulations and real robot experiments over various, dynamically varied task environments. The results show the efficacy of online continuous reinforcement learning for quick adaption to different, unknown, and dynamic environments.
28

Wang, Yuandou, Hang Liu, Wanbo Zheng, Yunni Xia, Yawen Li, Peng Chen, Kunyin Guo, and Hong Xie. "Multi-Objective Workflow Scheduling With Deep-Q-Network-Based Multi-Agent Reinforcement Learning." IEEE Access 7 (2019): 39974–82. http://dx.doi.org/10.1109/access.2019.2902846.

29

YAMADA, Kazuaki. "Acquiring Conflict Avoidance Behaviors with Multi-Objective Reinforcement Learning in Multi-Agent Systems." Proceedings of JSME annual Conference on Robotics and Mechatronics (Robomec) 2018 (2018): 2P2-F15. http://dx.doi.org/10.1299/jsmermd.2018.2p2-f15.

30

Abdelfattah, Sherif, Kathryn Kasmarik, and Jiankun Hu. "A robust policy bootstrapping algorithm for multi-objective reinforcement learning in non-stationary environments." Adaptive Behavior 28, no. 4 (August 15, 2019): 273–92. http://dx.doi.org/10.1177/1059712319869313.

Abstract:
Multi-objective Markov decision processes are a special kind of multi-objective optimization problem that involves sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address this kind of problem by fusing the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment. This is because they adopt optimization procedures that assume stationarity in order to evolve a coverage set of policies that can solve the problem. This article introduces a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. Results showed that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.
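A convex coverage set, as used above, keeps only those policy value vectors that are optimal for at least one linear weighting of the objectives. The sketch below approximates that idea by sampling preference weights from the simplex; it is a generic illustration, not the robust policy bootstrapping algorithm itself.

import numpy as np

def approx_ccs(value_vecs, n_weights=100, seed=0):
    # Keep value vectors that maximize w . V for at least one sampled preference w.
    rng = np.random.default_rng(seed)
    V = np.asarray(value_vecs, dtype=float)        # shape: (num_policies, num_objectives)
    keep = set()
    for _ in range(n_weights):
        w = rng.dirichlet(np.ones(V.shape[1]))     # random weight vector on the simplex
        keep.add(int(np.argmax(V @ w)))
    return V[sorted(keep)]

# (0.2, 0.2) is never optimal for any weighting, so it is pruned from the set.
print(approx_ccs([[1.0, 0.0], [0.0, 1.0], [0.6, 0.6], [0.2, 0.2]]))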
31

García, Javier, Roberto Iglesias, Miguel A. Rodríguez, and Carlos V. Regueiro. "Directed Exploration in Black-Box Optimization for Multi-Objective Reinforcement Learning." International Journal of Information Technology & Decision Making 18, no. 03 (May 2019): 1045–82. http://dx.doi.org/10.1142/s0219622019500093.

Abstract:
Usually, real-world problems involve the optimization of multiple, possibly conflicting, objectives. These problems may be addressed by Multi-objective Reinforcement learning (MORL) techniques. MORL is a generalization of standard Reinforcement Learning (RL) where the single reward signal is extended to multiple signals, in particular, one for each objective. MORL is the process of learning policies that optimize multiple objectives simultaneously. In these problems, the use of directional/gradient information can be useful to guide the exploration to better and better behaviors. However, traditional policy-gradient approaches have two main drawbacks: they require the use of a batch of episodes to properly estimate the gradient information (reducing in this way the learning speed), and they use stochastic policies which could have a disastrous impact on the safety of the learning system. In this paper, we present a novel population-based MORL algorithm for problems in which the underlying objectives are reasonably smooth. It presents two main characteristics: fast computation of the gradient information for each objective through the use of neighboring solutions, and the use of this information to carry out a geometric partition of the search space and thus direct the exploration to promising areas. Finally, the algorithm is evaluated and compared to policy gradient MORL algorithms on different multi-objective problems: the water reservoir and the biped walking problem (the latter both on simulation and on a real robot).
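Estimating per-objective gradient information from neighboring solutions, as described above, can be illustrated with a least-squares finite-difference estimate. The function below is a generic sketch under that assumption (the black-box objective is a placeholder), not the authors' geometric-partition algorithm.

import numpy as np

def neighbor_gradient(objective, theta, n_neighbors=8, radius=0.05, seed=0):
    # Least-squares gradient estimate of a black-box objective from nearby perturbed samples.
    rng = np.random.default_rng(seed)
    deltas = rng.normal(scale=radius, size=(n_neighbors, theta.size))
    diffs = np.array([objective(theta + d) - objective(theta) for d in deltas])
    grad, *_ = np.linalg.lstsq(deltas, diffs, rcond=None)
    return grad

# Toy check: for sum(x**2) the true gradient at theta is 2 * theta.
theta = np.array([1.0, -2.0])
print(neighbor_gradient(lambda x: float(np.sum(x ** 2)), theta))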
32

Wang, Hao, Zhongli Wang, and Xin Cui. "Multi-objective Optimization Based Deep Reinforcement Learning for Autonomous Driving Policy." Journal of Physics: Conference Series 1861, no. 1 (March 1, 2021): 012097. http://dx.doi.org/10.1088/1742-6596/1861/1/012097.

33

Beeks, Martijn, Reza Refaei Afshar, Yingqian Zhang, Remco Dijkman, Claudy Van Dorst, and Stijn De Looijer. "Deep Reinforcement Learning for a Multi-Objective Online Order Batching Problem." Proceedings of the International Conference on Automated Planning and Scheduling 32 (June 13, 2022): 435–43. http://dx.doi.org/10.1609/icaps.v32i1.19829.

Abstract:
On-time delivery and low service costs are two important performance metrics in warehousing operations. This paper proposes a Deep Reinforcement Learning (DRL) based approach to solve the online Order Batching and Sequence Problem (OBSP) and optimize these two objectives. To learn how to balance the trade-off between the two objectives, we introduce a Bayesian optimization framework to shape the reward function of the DRL agent, such that the influence of each objective on learning can be adjusted to different environments. We compare our approach with several heuristics using problem instances of real-world size, where thousands of orders arrive dynamically per hour. We show that the Proximal Policy Optimization (PPO) algorithm with Bayesian optimization outperforms the heuristics in all tested scenarios on both objectives. In addition, it finds different weights for the components of the reward function in different scenarios, indicating its capability to learn how to set the importance of the two objectives under different environments. We also provide a policy analysis of the learned DRL agent, in which a decision tree is used to infer decision rules and enable interpretability of the DRL approach.
34

Qin, Sheng, Shuyue Wang, Liyue Wang, Cong Wang, Gang Sun, and Yongjian Zhong. "Multi-Objective Optimization of Cascade Blade Profile Based on Reinforcement Learning." Applied Sciences 11, no. 1 (December 24, 2020): 106. http://dx.doi.org/10.3390/app11010106.

Abstract:
The multi-objective optimization of compressor cascade rotor blades is important for aero engine design. Many conventional approaches have been proposed; however, they lack a methodology for utilizing existing design data and experience to guide the actual design. As a result, conventional methods require and consume large computational resources because they need large numbers of stochastic cases to determine the optimization direction in the design space of the problem. This paper proposes a Reinforcement Learning method as a new approach to compressor blade multi-objective optimization. Using Deep Deterministic Policy Gradient (DDPG), the approach modifies the blade profile as an intelligent designer according to a design policy: it learns the design experience of the cascade blade as accumulated knowledge from interaction with a computation-based environment, and the design policy is updated accordingly. The accumulated computational data is thereby transformed into design experience and policies, which are applied directly to cascade optimization, so that well-performing profiles can be approached. In a case study provided in this paper, the proposed approach is applied to a blade profile, which is optimized in terms of total pressure loss and laminar flow area. Compared with the initial profile, the total pressure loss coefficient is reduced by 3.59%, and the relative laminar flow area at the suction surface is improved by 25.4%.
35

Studley, Matthew, and Larry Bull. "Using the XCS Classifier System for Multi-objective Reinforcement Learning Problems." Artificial Life 13, no. 1 (January 2007): 69–86. http://dx.doi.org/10.1162/artl.2007.13.1.69.

Abstract:
We investigate the performance of a learning classifier system in some simple multi-objective, multi-step maze problems, using both random and biased action-selection policies for exploration. Results show that the choice of action-selection policy can significantly affect the performance of the system in such environments. Further, this effect is directly related to population size, and we relate this finding to recent theoretical studies of learning classifier systems in single-step problems.
36

Ma, Lianbo, Shi Cheng, Xingwei Wang, Min Huang, Hai Shen, Xiaoxian He, and Yuhui Shi. "Cooperative two-engine multi-objective bee foraging algorithm with reinforcement learning." Knowledge-Based Systems 133 (October 2017): 278–93. http://dx.doi.org/10.1016/j.knosys.2017.07.024.

37

Yliniemi, Logan, and Kagan Tumer. "Multi-objective multiagent credit assignment in reinforcement learning and NSGA-II." Soft Computing 20, no. 10 (March 28, 2016): 3869–87. http://dx.doi.org/10.1007/s00500-016-2124-z.

38

Lim, Cheolsun, and Myungsun Kim. "NAS based on Reinforcement Learning with Improved Multi-objective Reward Function." Journal of the Institute of Electronics and Information Engineers 59, no. 11 (November 30, 2022): 39–45. http://dx.doi.org/10.5573/ieie.2022.59.11.39.

39

Song, Fuhong, Huanlai Xing, Xinhan Wang, Shouxi Luo, Penglin Dai, and Ke Li. "Offloading dependent tasks in multi-access edge computing: A multi-objective reinforcement learning approach." Future Generation Computer Systems 128 (March 2022): 333–48. http://dx.doi.org/10.1016/j.future.2021.10.013.

40

Jyothi, Rangappa, and Gorappa Ningappa Krishnamurthy. "Deep-Reinforcement Learning-Based Architecture for Multi-Objective Optimization of Stock Prediction." European Journal of Electrical Engineering and Computer Science 6, no. 4 (July 31, 2022): 9–16. http://dx.doi.org/10.24018/ejece.2022.6.4.436.

Abstract:
Artificial intelligence is now established for predicting future trading performance across statistics, computer science, and economics, especially in the stock market. By analyzing big data, making sound financial decisions is important for investors in the stock market. Several techniques, such as the Back Propagation Neural Network (BPNN) and the Recurrent Neural Network (RNN), have been used to predict stock prices, but their computational complexity is a major challenge for time-series financial data in the decision-making process. In this paper, we propose Recurrent Q Network Learning (RQNL) to make the right decision according to the movements of stock prices. To overcome the computational-complexity shortcoming, a forgetting memory is developed in the neural network; in this way, the resulting error is pruned significantly. In addition, reinforcement learning is shown to enhance prediction ability and to reduce correlations in the time-series data. An evaluation based on the NSE dataset compares three different learning approaches with the proposed method for stock prediction in financial markets. The experimental results show that the proposed Recurrent Q Network Learning (RQNL) achieves better prediction than the other existing learning methods.
41

Hu, Can, Zhengwei Zhu, Lijia Wang, Chenyang Zhu, and Yanfei Yang. "An Improved Multi-Objective Deep Reinforcement Learning Algorithm Based on Envelope Update." Electronics 11, no. 16 (August 9, 2022): 2479. http://dx.doi.org/10.3390/electronics11162479.

Abstract:
Multi-objective reinforcement learning (MORL) aims to uniformly approximate the Pareto frontier in multi-objective decision-making problems, which suffers from insufficient exploration and unstable convergence. We propose a multi-objective deep reinforcement learning algorithm (envelope with dueling structure, Noisynet, and soft update (EDNs)) to improve the ability of the agent to learn optimal multi-objective strategies. Firstly, the EDNs algorithm uses neural networks to approximate the value function and update the parameters based on the convex envelope of the solution boundary. Then, the DQN structure is replaced with the dueling structure, and the state value function is split into the dominance function and value function to make it converge faster. Secondly, the Noisynet method is used to add exploration noise to the neural network parameters to make the agent have a more efficient exploration ability. Finally, the soft update method updates the target network parameters to stabilize the training procedure. We use the DST environment as a case study, and the experimental results show that the EDNs algorithm has better stability and exploration capability than the EMODRL algorithm. In 1000 episodes, the EDNs algorithm improved the coverage by 5.39% and reduced the adaptation error by 36.87%.
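The soft update that the EDNs algorithm uses to stabilize training is a standard Polyak-averaging step: target parameters slowly track the online parameters. Below is a minimal, framework-agnostic sketch with plain NumPy arrays standing in for network weights; the parameter names and the tau value are assumptions, not the authors' code.

import numpy as np

def soft_update(online_params, target_params, tau=0.005):
    # Polyak averaging: target <- tau * online + (1 - tau) * target, per parameter array.
    for name, w in online_params.items():
        target_params[name] = tau * w + (1.0 - tau) * target_params[name]
    return target_params

online = {"w1": np.ones((2, 2)), "b1": np.zeros(2)}
target = {"w1": np.zeros((2, 2)), "b1": np.zeros(2)}
soft_update(online, target)
print(target["w1"])  # every entry has moved a small step (0.005) toward the online value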
42

Tao, Lue, Gongshu Wang, Yang Yang, Yun Dong, and Lijie Su. "Reinforcement Learning for Dynamic Mutation Process Control in Multi-Objective Differential Evolution." IFAC-PapersOnLine 55, no. 15 (2022): 117–22. http://dx.doi.org/10.1016/j.ifacol.2022.07.618.

43

Luo, Shu, Linxuan Zhang, and Yushun Fan. "Dynamic multi-objective scheduling for flexible job shop by deep reinforcement learning." Computers & Industrial Engineering 159 (September 2021): 107489. http://dx.doi.org/10.1016/j.cie.2021.107489.

44

Altaf, Meteb M., Ahmed Samir Roshdy, and Hatoon S. AlSagri. "Deep Reinforcement Learning Model for Blood Bank Vehicle Routing Multi-Objective Optimization." Computers, Materials & Continua 70, no. 2 (2022): 3955–67. http://dx.doi.org/10.32604/cmc.2022.019448.

45

Kozjek, Dominik, Andreja Malus, and Rok Vrabič. "Multi-objective adjustment of remaining useful life predictions based on reinforcement learning." Procedia CIRP 93 (2020): 425–30. http://dx.doi.org/10.1016/j.procir.2020.03.051.

46

Wang, Zheng, Tiansheng Zeng, Xuening Chu, and Deyi Xue. "Multi-objective deep reinforcement learning for optimal design of wind turbine blade." Renewable Energy 203 (February 2023): 854–69. http://dx.doi.org/10.1016/j.renene.2023.01.003.

47

Ding, Li, and Lee Spector. "Multi-Objective Evolutionary Architecture Search for Parameterized Quantum Circuits." Entropy 25, no. 1 (January 3, 2023): 93. http://dx.doi.org/10.3390/e25010093.

Abstract:
Recent work on hybrid quantum-classical machine learning systems has demonstrated success in utilizing parameterized quantum circuits (PQCs) to solve the challenging reinforcement learning (RL) tasks, with provable learning advantages over classical systems, e.g., deep neural networks. While existing work demonstrates and exploits the strength of PQC-based models, the design choices of PQC architectures and the interactions between different quantum circuits on learning tasks are generally underexplored. In this work, we introduce a Multi-objective Evolutionary Architecture Search framework for parameterized quantum circuits (MEAS-PQC), which uses a multi-objective genetic algorithm with quantum-specific configurations to perform efficient searching of optimal PQC architectures. Experimental results show that our method can find architectures that have superior learning performance on three benchmark RL tasks, and are also optimized for additional objectives including reductions in quantum noise and model size. Further analysis of patterns and probability distributions of quantum operations helps identify performance-critical design choices of hybrid quantum-classical learning systems.
48

He, Yuanzhi, Biao Sheng, Hao Yin, Di Yan, and Yingchao Zhang. "Multi-objective deep reinforcement learning based time-frequency resource allocation for multi-beam satellite communications." China Communications 19, no. 1 (January 2022): 77–91. http://dx.doi.org/10.23919/jcc.2022.01.007.

49

Park, Bumjin, Cheongwoong Kang, and Jaesik Choi. "Cooperative Multi-Robot Task Allocation with Reinforcement Learning." Applied Sciences 12, no. 1 (December 28, 2021): 272. http://dx.doi.org/10.3390/app12010272.

Abstract:
This paper deals with the concept of multi-robot task allocation, referring to the assignment of multiple robots to tasks such that an objective function is maximized. The performance of existing meta-heuristic methods worsens as the number of robots or tasks increases. To tackle this problem, a novel Markov decision process formulation for multi-robot task allocation is presented for reinforcement learning. The proposed formulation sequentially allocates robots to tasks to minimize the total time taken to complete them. Additionally, we propose a deep reinforcement learning method to find the best allocation schedule for each problem. Our method adopts the cross-attention mechanism to compute the preference of robots to tasks. The experimental results show that the proposed method finds better solutions than meta-heuristic methods, especially when solving large-scale allocation problems.
50

Mutti, Mirco, Mattia Mancassola, and Marcello Restelli. "Unsupervised Reinforcement Learning in Multiple Environments." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7850–58. http://dx.doi.org/10.1609/aaai.v36i7.20754.

Abstract:
Several recent works have been dedicated to unsupervised reinforcement learning in a single environment, in which a policy is first pre-trained with unsupervised interactions, and then fine-tuned towards the optimal policy for several downstream supervised tasks defined over the same environment. Along this line, we address the problem of unsupervised reinforcement learning in a class of multiple environments, in which the policy is pre-trained with interactions from the whole class, and then fine-tuned for several tasks in any environment of the class. Notably, the problem is inherently multi-objective as we can trade off the pre-training objective between environments in many ways. In this work, we foster an exploration strategy that is sensitive to the most adverse cases within the class. Hence, we cast the exploration problem as the maximization of the mean of a critical percentile of the state visitation entropy induced by the exploration strategy over the class of environments. Then, we present a policy gradient algorithm, alphaMEPOL, to optimize the introduced objective through mediated interactions with the class. Finally, we empirically demonstrate the ability of the algorithm in learning to explore challenging classes of continuous environments and we show that reinforcement learning greatly benefits from the pre-trained exploration strategy w.r.t. learning from scratch.
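The critical-percentile objective described above, which focuses exploration on the most adverse environments in the class, can be written down in a few lines. The function below computes the mean of the lowest alpha-fraction of per-environment entropy estimates; it is a simplified stand-in for the estimator inside alphaMEPOL, not the authors' implementation.

import numpy as np

def critical_percentile_mean(entropies, alpha=0.2):
    # Mean state-visitation entropy over the worst alpha-fraction of environments.
    e = np.sort(np.asarray(entropies, dtype=float))   # ascending: most adverse environments first
    k = max(1, int(np.ceil(alpha * e.size)))
    return float(e[:k].mean())

print(critical_percentile_mean([2.1, 3.4, 1.2, 2.8, 0.9], alpha=0.4))  # mean of the two lowest entropies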