Journal articles on the topic "Off-Policy learning"
Explore the top 50 journal articles for research on the topic "Off-Policy learning".
Meng, Wenjia, Qian Zheng, Gang Pan, and Yilong Yin. "Off-Policy Proximal Policy Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9162–70. http://dx.doi.org/10.1609/aaai.v37i8.26099.
Schmitt, Simon, John Shawe-Taylor, and Hado van Hasselt. "Chaining Value Functions for Off-Policy Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8187–95. http://dx.doi.org/10.1609/aaai.v36i8.20792.
Xu, Da, Yuting Ye, Chuanwei Ruan, and Bo Yang. "Towards Robust Off-Policy Learning for Runtime Uncertainty." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (June 28, 2022): 10101–9. http://dx.doi.org/10.1609/aaai.v36i9.21249.
Peters, James F., and Christopher Henry. "Approximation spaces in off-policy Monte Carlo learning." Engineering Applications of Artificial Intelligence 20, no. 5 (August 2007): 667–75. http://dx.doi.org/10.1016/j.engappai.2006.11.005.
Yu, Jiayu, Jingyao Li, Shuai Lü, and Shuai Han. "Mixed experience sampling for off-policy reinforcement learning." Expert Systems with Applications 251 (October 2024): 124017. http://dx.doi.org/10.1016/j.eswa.2024.124017.
Cetin, Edoardo, and Oya Celiktutan. "Learning Pessimism for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 6971–79. http://dx.doi.org/10.1609/aaai.v37i6.25852.
Kong, Seung-Hyun, I. Made Aswin Nahrendra, and Dong-Hee Paek. "Enhanced Off-Policy Reinforcement Learning With Focused Experience Replay." IEEE Access 9 (2021): 93152–64. http://dx.doi.org/10.1109/access.2021.3085142.
Li, Lihong. "A perspective on off-policy evaluation in reinforcement learning." Frontiers of Computer Science 13, no. 5 (June 17, 2019): 911–12. http://dx.doi.org/10.1007/s11704-019-9901-7.
Luo, Biao, Huai-Ning Wu, and Tingwen Huang. "Off-Policy Reinforcement Learning for $ H_\infty $ Control Design." IEEE Transactions on Cybernetics 45, no. 1 (January 2015): 65–76. http://dx.doi.org/10.1109/tcyb.2014.2319577.
Sun, Mingfei, Sam Devlin, Katja Hofmann, and Shimon Whiteson. "Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8378–85. http://dx.doi.org/10.1609/aaai.v36i8.20813.
Jain, Arushi, Gandharv Patil, Ayush Jain, Khimya Khetarpal, and Doina Precup. "Variance Penalized On-Policy and Off-Policy Actor-Critic." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 7899–907. http://dx.doi.org/10.1609/aaai.v35i9.16964.
Hao, Longyan, Chaoli Wang, and Yibo Shi. "Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method." Mathematics 12, no. 10 (May 14, 2024): 1533. http://dx.doi.org/10.3390/math12101533.
Gelada, Carles, and Marc G. Bellemare. "Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3647–55. http://dx.doi.org/10.1609/aaai.v33i01.33013647.
Xiao, Teng, and Suhang Wang. "Towards Off-Policy Learning for Ranking Policies with Logged Feedback." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8700–8707. http://dx.doi.org/10.1609/aaai.v36i8.20849.
Li, Jinna, Hamidreza Modares, Tianyou Chai, Frank L. Lewis, and Lihua Xie. "Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games." IEEE Transactions on Neural Networks and Learning Systems 28, no. 10 (October 2017): 2434–45. http://dx.doi.org/10.1109/tnnls.2016.2609500.
Zhang, Hengrui, Youfang Lin, Shuo Shen, Sheng Han, and Kai Lv. "Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21770–78. http://dx.doi.org/10.1609/aaai.v38i19.30177.
Zhang, Shangtong, Bo Liu, and Shimon Whiteson. "Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10905–13. http://dx.doi.org/10.1609/aaai.v35i12.17302.
Ali, Raja Farrukh, Kevin Duong, Nasik Muhammad Nafi, and William Hsu. "Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16150–51. http://dx.doi.org/10.1609/aaai.v37i13.26935.
Tennenholtz, Guy, Uri Shalit, and Shie Mannor. "Off-Policy Evaluation in Partially Observable Environments." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10276–83. http://dx.doi.org/10.1609/aaai.v34i06.6590.
Nakamura, Yutaka, Takeshi Mori, Yoichi Tokita, Tomohiro Shibata, and Shin Ishii. "Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller." Journal of Robotics and Mechatronics 17, no. 6 (December 20, 2005): 636–44. http://dx.doi.org/10.20965/jrm.2005.p0636.
Wang, Mingyang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Huang Kai, Hang Su, Chenguang Yang, and Alois Knoll. "Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10157–65. http://dx.doi.org/10.1609/aaai.v37i8.26210.
Cao, Jiaqing, Quan Liu, Fei Zhu, Qiming Fu, and Shan Zhong. "Gradient temporal-difference learning for off-policy evaluation using emphatic weightings." Information Sciences 580 (November 2021): 311–30. http://dx.doi.org/10.1016/j.ins.2021.08.082.
Tian, Chang, An Liu, Guan Huang, and Wu Luo. "Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning." IEEE Transactions on Signal Processing 70 (2022): 1609–24. http://dx.doi.org/10.1109/tsp.2022.3158737.
Karimpanal, Thommen George, and Erik Wilhelm. "Identification and off-policy learning of multiple objectives using adaptive clustering." Neurocomputing 263 (November 2017): 39–47. http://dx.doi.org/10.1016/j.neucom.2017.04.074.
Kiumarsi, Bahare, Frank L. Lewis, and Zhong-Ping Jiang. "H∞ control of linear discrete-time systems: Off-policy reinforcement learning." Automatica 78 (April 2017): 144–52. http://dx.doi.org/10.1016/j.automatica.2016.12.009.
Li, Jinna, Zhenfei Xiao, and Ping Li. "Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning." IEEE Access 7 (2019): 134647–59. http://dx.doi.org/10.1109/access.2019.2939384.
Kiumarsi, Bahare, Wei Kang, and Frank L. Lewis. "H∞ Control of Nonaffine Aerial Systems Using Off-policy Reinforcement Learning." Unmanned Systems 04, no. 01 (January 2016): 51–60. http://dx.doi.org/10.1142/s2301385016400069.
Lian, Bosen, Wenqian Xue, Yijing Xie, Frank L. Lewis, and Ali Davoudi. "Off-policy inverse Q-learning for discrete-time antagonistic unknown systems." Automatica 155 (September 2023): 111171. http://dx.doi.org/10.1016/j.automatica.2023.111171.
Kim, Man-Je, Hyunsoo Park, and Chang Wook Ahn. "Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning." Electronics 11, no. 7 (March 28, 2022): 1069. http://dx.doi.org/10.3390/electronics11071069.
Chaudhari, Shreyas, David Arbour, Georgios Theocharous, and Nikos Vlassis. "Distributional Off-Policy Evaluation for Slate Recommendations." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8265–73. http://dx.doi.org/10.1609/aaai.v38i8.28667.
Zhang, Ruiyi, Tong Yu, Yilin Shen, and Hongxia Jin. "Text-Based Interactive Recommendation via Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11694–702. http://dx.doi.org/10.1609/aaai.v36i10.21424.
Xu, Z., L. Cao, and X. Chen. "Deep Reinforcement Learning with Adaptive Update Target Combination." Computer Journal 63, no. 7 (August 15, 2019): 995–1003. http://dx.doi.org/10.1093/comjnl/bxz066.
Shahid, Asad Ali, Dario Piga, Francesco Braghin, and Loris Roveda. "Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning." Autonomous Robots 46, no. 3 (February 9, 2022): 483–98. http://dx.doi.org/10.1007/s10514-022-10034-z.
Hollenstein, Jakob, Georg Martius, and Justus Piater. "Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 11 (March 24, 2024): 12466–72. http://dx.doi.org/10.1609/aaai.v38i11.29139.
Ren, He, Jing Dai, Huaguang Zhang, and Kun Zhang. "Off-policy integral reinforcement learning algorithm in dealing with nonzero sum game for nonlinear distributed parameter systems." Transactions of the Institute of Measurement and Control 42, no. 15 (July 6, 2020): 2919–28. http://dx.doi.org/10.1177/0142331220932634.
Levine, Alexander, and Soheil Feizi. "Goal-Conditioned Q-learning as Knowledge Distillation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8500–8509. http://dx.doi.org/10.1609/aaai.v37i7.26024.
Yang, Hyunjun, Hyeonjun Park, and Kyungjae Lee. "A Selective Portfolio Management Algorithm with Off-Policy Reinforcement Learning Using Dirichlet Distribution." Axioms 11, no. 12 (November 23, 2022): 664. http://dx.doi.org/10.3390/axioms11120664.
Suttle, Wesley, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Başar, and Ji Liu. "A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning." IFAC-PapersOnLine 53, no. 2 (2020): 1549–54. http://dx.doi.org/10.1016/j.ifacol.2020.12.2021.
Stanković, Miloš S., Marko Beko, and Srdjan S. Stanković. "Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence." IFAC-PapersOnLine 53, no. 2 (2020): 1563–68. http://dx.doi.org/10.1016/j.ifacol.2020.12.2184.
Li, Jinna, Zhenfei Xiao, Tianyou Chai, Frank L. Lewis, and Sarangapani Jagannathan. "Off-Policy Q-Learning for Anti-Interference Control of Multi-Player Systems." IFAC-PapersOnLine 53, no. 2 (2020): 9189–94. http://dx.doi.org/10.1016/j.ifacol.2020.12.2180.
Kim and Park. "Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning." Symmetry 11, no. 11 (November 1, 2019): 1352. http://dx.doi.org/10.3390/sym11111352.
Chen, Ning, Shuhan Luo, Jiayang Dai, Biao Luo, and Weihua Gui. "Optimal Control of Iron-Removal Systems Based on Off-Policy Reinforcement Learning." IEEE Access 8 (2020): 149730–40. http://dx.doi.org/10.1109/access.2020.3015801.
Hachiya, Hirotaka, Takayuki Akiyama, Masashi Sugiyama, and Jan Peters. "Adaptive importance sampling for value function approximation in off-policy reinforcement learning." Neural Networks 22, no. 10 (December 2009): 1399–410. http://dx.doi.org/10.1016/j.neunet.2009.01.002.
Zuo, Guoyu, Qishen Zhao, Kexin Chen, Jiangeng Li, and Daoxiong Gong. "Off-policy adversarial imitation learning for robotic tasks with low-quality demonstrations." Applied Soft Computing 97 (December 2020): 106795. http://dx.doi.org/10.1016/j.asoc.2020.106795.
Givchi, Arash, and Maziar Palhang. "Off-policy temporal difference learning with distribution adaptation in fast mixing chains." Soft Computing 22, no. 3 (January 30, 2017): 737–50. http://dx.doi.org/10.1007/s00500-017-2490-1.
Liu, Mushuang, Yan Wan, Frank L. Lewis, and Victor G. Lopez. "Adaptive Optimal Control for Stochastic Multiplayer Differential Games Using On-Policy and Off-Policy Reinforcement Learning." IEEE Transactions on Neural Networks and Learning Systems 31, no. 12 (December 2020): 5522–33. http://dx.doi.org/10.1109/tnnls.2020.2969215.
Pritchett, Lant, and Justin Sandefur. "Learning from Experiments when Context Matters." American Economic Review 105, no. 5 (May 1, 2015): 471–75. http://dx.doi.org/10.1257/aer.p20151016.
Chen, Zaiwei. "A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms." ACM SIGMETRICS Performance Evaluation Review 50, no. 3 (December 30, 2022): 12–15. http://dx.doi.org/10.1145/3579342.3579346.
Narita, Yusuke, Kyohei Okumura, Akihiro Shimizu, and Kohei Yata. "Counterfactual Learning with General Data-Generating Policies." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9286–93. http://dx.doi.org/10.1609/aaai.v37i8.26113.
Kim, MyeongSeop, Jung-Su Kim, Myoung-Su Choi, and Jae-Han Park. "Adaptive Discount Factor for Deep Reinforcement Learning in Continuing Tasks with Uncertainty." Sensors 22, no. 19 (September 25, 2022): 7266. http://dx.doi.org/10.3390/s22197266.