Journal articles on the topic 'Off-Policy learning'
Consult the top 50 journal articles for your research on the topic 'Off-Policy learning.'
Meng, Wenjia, Qian Zheng, Gang Pan, and Yilong Yin. "Off-Policy Proximal Policy Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9162–70. http://dx.doi.org/10.1609/aaai.v37i8.26099.
Schmitt, Simon, John Shawe-Taylor, and Hado van Hasselt. "Chaining Value Functions for Off-Policy Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8187–95. http://dx.doi.org/10.1609/aaai.v36i8.20792.
Xu, Da, Yuting Ye, Chuanwei Ruan, and Bo Yang. "Towards Robust Off-Policy Learning for Runtime Uncertainty." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (June 28, 2022): 10101–9. http://dx.doi.org/10.1609/aaai.v36i9.21249.
Peters, James F., and Christopher Henry. "Approximation spaces in off-policy Monte Carlo learning." Engineering Applications of Artificial Intelligence 20, no. 5 (August 2007): 667–75. http://dx.doi.org/10.1016/j.engappai.2006.11.005.
Yu, Jiayu, Jingyao Li, Shuai Lü, and Shuai Han. "Mixed experience sampling for off-policy reinforcement learning." Expert Systems with Applications 251 (October 2024): 124017. http://dx.doi.org/10.1016/j.eswa.2024.124017.
Cetin, Edoardo, and Oya Celiktutan. "Learning Pessimism for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 6971–79. http://dx.doi.org/10.1609/aaai.v37i6.25852.
Kong, Seung-Hyun, I. Made Aswin Nahrendra, and Dong-Hee Paek. "Enhanced Off-Policy Reinforcement Learning With Focused Experience Replay." IEEE Access 9 (2021): 93152–64. http://dx.doi.org/10.1109/access.2021.3085142.
Li, Lihong. "A perspective on off-policy evaluation in reinforcement learning." Frontiers of Computer Science 13, no. 5 (June 17, 2019): 911–12. http://dx.doi.org/10.1007/s11704-019-9901-7.
Luo, Biao, Huai-Ning Wu, and Tingwen Huang. "Off-Policy Reinforcement Learning for $H_\infty$ Control Design." IEEE Transactions on Cybernetics 45, no. 1 (January 2015): 65–76. http://dx.doi.org/10.1109/tcyb.2014.2319577.
Sun, Mingfei, Sam Devlin, Katja Hofmann, and Shimon Whiteson. "Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8378–85. http://dx.doi.org/10.1609/aaai.v36i8.20813.
Jain, Arushi, Gandharv Patil, Ayush Jain, Khimya Khetarpal, and Doina Precup. "Variance Penalized On-Policy and Off-Policy Actor-Critic." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 7899–907. http://dx.doi.org/10.1609/aaai.v35i9.16964.
Hao, Longyan, Chaoli Wang, and Yibo Shi. "Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method." Mathematics 12, no. 10 (May 14, 2024): 1533. http://dx.doi.org/10.3390/math12101533.
Gelada, Carles, and Marc G. Bellemare. "Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3647–55. http://dx.doi.org/10.1609/aaai.v33i01.33013647.
Xiao, Teng, and Suhang Wang. "Towards Off-Policy Learning for Ranking Policies with Logged Feedback." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8700–8707. http://dx.doi.org/10.1609/aaai.v36i8.20849.
Li, Jinna, Hamidreza Modares, Tianyou Chai, Frank L. Lewis, and Lihua Xie. "Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games." IEEE Transactions on Neural Networks and Learning Systems 28, no. 10 (October 2017): 2434–45. http://dx.doi.org/10.1109/tnnls.2016.2609500.
Zhang, Hengrui, Youfang Lin, Shuo Shen, Sheng Han, and Kai Lv. "Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21770–78. http://dx.doi.org/10.1609/aaai.v38i19.30177.
Zhang, Shangtong, Bo Liu, and Shimon Whiteson. "Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10905–13. http://dx.doi.org/10.1609/aaai.v35i12.17302.
Ali, Raja Farrukh, Kevin Duong, Nasik Muhammad Nafi, and William Hsu. "Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16150–51. http://dx.doi.org/10.1609/aaai.v37i13.26935.
Tennenholtz, Guy, Uri Shalit, and Shie Mannor. "Off-Policy Evaluation in Partially Observable Environments." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10276–83. http://dx.doi.org/10.1609/aaai.v34i06.6590.
Nakamura, Yutaka, Takeshi Mori, Yoichi Tokita, Tomohiro Shibata, and Shin Ishii. "Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller." Journal of Robotics and Mechatronics 17, no. 6 (December 20, 2005): 636–44. http://dx.doi.org/10.20965/jrm.2005.p0636.
Wang, Mingyang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Huang Kai, Hang Su, Chenguang Yang, and Alois Knoll. "Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10157–65. http://dx.doi.org/10.1609/aaai.v37i8.26210.
Cao, Jiaqing, Quan Liu, Fei Zhu, Qiming Fu, and Shan Zhong. "Gradient temporal-difference learning for off-policy evaluation using emphatic weightings." Information Sciences 580 (November 2021): 311–30. http://dx.doi.org/10.1016/j.ins.2021.08.082.
Tian, Chang, An Liu, Guan Huang, and Wu Luo. "Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning." IEEE Transactions on Signal Processing 70 (2022): 1609–24. http://dx.doi.org/10.1109/tsp.2022.3158737.
Karimpanal, Thommen George, and Erik Wilhelm. "Identification and off-policy learning of multiple objectives using adaptive clustering." Neurocomputing 263 (November 2017): 39–47. http://dx.doi.org/10.1016/j.neucom.2017.04.074.
Kiumarsi, Bahare, Frank L. Lewis, and Zhong-Ping Jiang. "H∞ control of linear discrete-time systems: Off-policy reinforcement learning." Automatica 78 (April 2017): 144–52. http://dx.doi.org/10.1016/j.automatica.2016.12.009.
Li, Jinna, Zhenfei Xiao, and Ping Li. "Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning." IEEE Access 7 (2019): 134647–59. http://dx.doi.org/10.1109/access.2019.2939384.
Kiumarsi, Bahare, Wei Kang, and Frank L. Lewis. "H∞ Control of Nonaffine Aerial Systems Using Off-policy Reinforcement Learning." Unmanned Systems 04, no. 01 (January 2016): 51–60. http://dx.doi.org/10.1142/s2301385016400069.
Lian, Bosen, Wenqian Xue, Yijing Xie, Frank L. Lewis, and Ali Davoudi. "Off-policy inverse Q-learning for discrete-time antagonistic unknown systems." Automatica 155 (September 2023): 111171. http://dx.doi.org/10.1016/j.automatica.2023.111171.
Kim, Man-Je, Hyunsoo Park, and Chang Wook Ahn. "Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning." Electronics 11, no. 7 (March 28, 2022): 1069. http://dx.doi.org/10.3390/electronics11071069.
Chaudhari, Shreyas, David Arbour, Georgios Theocharous, and Nikos Vlassis. "Distributional Off-Policy Evaluation for Slate Recommendations." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8265–73. http://dx.doi.org/10.1609/aaai.v38i8.28667.
Zhang, Ruiyi, Tong Yu, Yilin Shen, and Hongxia Jin. "Text-Based Interactive Recommendation via Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11694–702. http://dx.doi.org/10.1609/aaai.v36i10.21424.
Xu, Z., L. Cao, and X. Chen. "Deep Reinforcement Learning with Adaptive Update Target Combination." Computer Journal 63, no. 7 (August 15, 2019): 995–1003. http://dx.doi.org/10.1093/comjnl/bxz066.
Shahid, Asad Ali, Dario Piga, Francesco Braghin, and Loris Roveda. "Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning." Autonomous Robots 46, no. 3 (February 9, 2022): 483–98. http://dx.doi.org/10.1007/s10514-022-10034-z.
Hollenstein, Jakob, Georg Martius, and Justus Piater. "Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 11 (March 24, 2024): 12466–72. http://dx.doi.org/10.1609/aaai.v38i11.29139.
Ren, He, Jing Dai, Huaguang Zhang, and Kun Zhang. "Off-policy integral reinforcement learning algorithm in dealing with nonzero sum game for nonlinear distributed parameter systems." Transactions of the Institute of Measurement and Control 42, no. 15 (July 6, 2020): 2919–28. http://dx.doi.org/10.1177/0142331220932634.
Levine, Alexander, and Soheil Feizi. "Goal-Conditioned Q-learning as Knowledge Distillation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8500–8509. http://dx.doi.org/10.1609/aaai.v37i7.26024.
Yang, Hyunjun, Hyeonjun Park, and Kyungjae Lee. "A Selective Portfolio Management Algorithm with Off-Policy Reinforcement Learning Using Dirichlet Distribution." Axioms 11, no. 12 (November 23, 2022): 664. http://dx.doi.org/10.3390/axioms11120664.
Suttle, Wesley, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Başar, and Ji Liu. "A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning." IFAC-PapersOnLine 53, no. 2 (2020): 1549–54. http://dx.doi.org/10.1016/j.ifacol.2020.12.2021.
Stanković, Miloš S., Marko Beko, and Srdjan S. Stanković. "Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence." IFAC-PapersOnLine 53, no. 2 (2020): 1563–68. http://dx.doi.org/10.1016/j.ifacol.2020.12.2184.
Li, Jinna, Zhenfei Xiao, Tianyou Chai, Frank L. Lewis, and Sarangapani Jagannathan. "Off-Policy Q-Learning for Anti-Interference Control of Multi-Player Systems." IFAC-PapersOnLine 53, no. 2 (2020): 9189–94. http://dx.doi.org/10.1016/j.ifacol.2020.12.2180.
Kim and Park. "Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning." Symmetry 11, no. 11 (November 1, 2019): 1352. http://dx.doi.org/10.3390/sym11111352.
Chen, Ning, Shuhan Luo, Jiayang Dai, Biao Luo, and Weihua Gui. "Optimal Control of Iron-Removal Systems Based on Off-Policy Reinforcement Learning." IEEE Access 8 (2020): 149730–40. http://dx.doi.org/10.1109/access.2020.3015801.
Hachiya, Hirotaka, Takayuki Akiyama, Masashi Sugiyama, and Jan Peters. "Adaptive importance sampling for value function approximation in off-policy reinforcement learning." Neural Networks 22, no. 10 (December 2009): 1399–410. http://dx.doi.org/10.1016/j.neunet.2009.01.002.
Zuo, Guoyu, Qishen Zhao, Kexin Chen, Jiangeng Li, and Daoxiong Gong. "Off-policy adversarial imitation learning for robotic tasks with low-quality demonstrations." Applied Soft Computing 97 (December 2020): 106795. http://dx.doi.org/10.1016/j.asoc.2020.106795.
Givchi, Arash, and Maziar Palhang. "Off-policy temporal difference learning with distribution adaptation in fast mixing chains." Soft Computing 22, no. 3 (January 30, 2017): 737–50. http://dx.doi.org/10.1007/s00500-017-2490-1.
Liu, Mushuang, Yan Wan, Frank L. Lewis, and Victor G. Lopez. "Adaptive Optimal Control for Stochastic Multiplayer Differential Games Using On-Policy and Off-Policy Reinforcement Learning." IEEE Transactions on Neural Networks and Learning Systems 31, no. 12 (December 2020): 5522–33. http://dx.doi.org/10.1109/tnnls.2020.2969215.
Pritchett, Lant, and Justin Sandefur. "Learning from Experiments when Context Matters." American Economic Review 105, no. 5 (May 1, 2015): 471–75. http://dx.doi.org/10.1257/aer.p20151016.
Chen, Zaiwei. "A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms." ACM SIGMETRICS Performance Evaluation Review 50, no. 3 (December 30, 2022): 12–15. http://dx.doi.org/10.1145/3579342.3579346.
Narita, Yusuke, Kyohei Okumura, Akihiro Shimizu, and Kohei Yata. "Counterfactual Learning with General Data-Generating Policies." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9286–93. http://dx.doi.org/10.1609/aaai.v37i8.26113.
Kim, MyeongSeop, Jung-Su Kim, Myoung-Su Choi, and Jae-Han Park. "Adaptive Discount Factor for Deep Reinforcement Learning in Continuing Tasks with Uncertainty." Sensors 22, no. 19 (September 25, 2022): 7266. http://dx.doi.org/10.3390/s22197266.