Journal articles on the topic "Off-Policy learning"
Create a correct reference in APA, MLA, Chicago, Harvard, and many other styles
Consult the top 50 journal articles on the topic "Off-Policy learning".
An "Add to bibliography" button is available next to every work in the list. Use it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the publication as a ".pdf" file and read its abstract online, whenever these are available in the metadata.
Browse journal articles from a wide variety of disciplines and compile your bibliography correctly.
Meng, Wenjia, Qian Zheng, Gang Pan, and Yilong Yin. "Off-Policy Proximal Policy Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9162–70. http://dx.doi.org/10.1609/aaai.v37i8.26099.
Schmitt, Simon, John Shawe-Taylor, and Hado van Hasselt. "Chaining Value Functions for Off-Policy Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8187–95. http://dx.doi.org/10.1609/aaai.v36i8.20792.
Xu, Da, Yuting Ye, Chuanwei Ruan, and Bo Yang. "Towards Robust Off-Policy Learning for Runtime Uncertainty." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (June 28, 2022): 10101–9. http://dx.doi.org/10.1609/aaai.v36i9.21249.
Peters, James F., and Christopher Henry. "Approximation spaces in off-policy Monte Carlo learning." Engineering Applications of Artificial Intelligence 20, no. 5 (August 2007): 667–75. http://dx.doi.org/10.1016/j.engappai.2006.11.005.
Yu, Jiayu, Jingyao Li, Shuai Lü, and Shuai Han. "Mixed experience sampling for off-policy reinforcement learning." Expert Systems with Applications 251 (October 2024): 124017. http://dx.doi.org/10.1016/j.eswa.2024.124017.
Cetin, Edoardo, and Oya Celiktutan. "Learning Pessimism for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 6971–79. http://dx.doi.org/10.1609/aaai.v37i6.25852.
Kong, Seung-Hyun, I. Made Aswin Nahrendra, and Dong-Hee Paek. "Enhanced Off-Policy Reinforcement Learning With Focused Experience Replay." IEEE Access 9 (2021): 93152–64. http://dx.doi.org/10.1109/access.2021.3085142.
Li, Lihong. "A perspective on off-policy evaluation in reinforcement learning." Frontiers of Computer Science 13, no. 5 (June 17, 2019): 911–12. http://dx.doi.org/10.1007/s11704-019-9901-7.
Luo, Biao, Huai-Ning Wu, and Tingwen Huang. "Off-Policy Reinforcement Learning for $H_\infty$ Control Design." IEEE Transactions on Cybernetics 45, no. 1 (January 2015): 65–76. http://dx.doi.org/10.1109/tcyb.2014.2319577.
Sun, Mingfei, Sam Devlin, Katja Hofmann, and Shimon Whiteson. "Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8378–85. http://dx.doi.org/10.1609/aaai.v36i8.20813.
Jain, Arushi, Gandharv Patil, Ayush Jain, Khimya Khetarpal, and Doina Precup. "Variance Penalized On-Policy and Off-Policy Actor-Critic." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 7899–907. http://dx.doi.org/10.1609/aaai.v35i9.16964.
Hao, Longyan, Chaoli Wang, and Yibo Shi. "Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method." Mathematics 12, no. 10 (May 14, 2024): 1533. http://dx.doi.org/10.3390/math12101533.
Gelada, Carles, and Marc G. Bellemare. "Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3647–55. http://dx.doi.org/10.1609/aaai.v33i01.33013647.
Xiao, Teng, and Suhang Wang. "Towards Off-Policy Learning for Ranking Policies with Logged Feedback." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8700–8707. http://dx.doi.org/10.1609/aaai.v36i8.20849.
Li, Jinna, Hamidreza Modares, Tianyou Chai, Frank L. Lewis, and Lihua Xie. "Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games." IEEE Transactions on Neural Networks and Learning Systems 28, no. 10 (October 2017): 2434–45. http://dx.doi.org/10.1109/tnnls.2016.2609500.
Zhang, Hengrui, Youfang Lin, Shuo Shen, Sheng Han, and Kai Lv. "Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21770–78. http://dx.doi.org/10.1609/aaai.v38i19.30177.
Zhang, Shangtong, Bo Liu, and Shimon Whiteson. "Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10905–13. http://dx.doi.org/10.1609/aaai.v35i12.17302.
Ali, Raja Farrukh, Kevin Duong, Nasik Muhammad Nafi, and William Hsu. "Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16150–51. http://dx.doi.org/10.1609/aaai.v37i13.26935.
Tennenholtz, Guy, Uri Shalit, and Shie Mannor. "Off-Policy Evaluation in Partially Observable Environments." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10276–83. http://dx.doi.org/10.1609/aaai.v34i06.6590.
Nakamura, Yutaka, Takeshi Mori, Yoichi Tokita, Tomohiro Shibata, and Shin Ishii. "Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller." Journal of Robotics and Mechatronics 17, no. 6 (December 20, 2005): 636–44. http://dx.doi.org/10.20965/jrm.2005.p0636.
Wang, Mingyang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Huang Kai, Hang Su, Chenguang Yang, and Alois Knoll. "Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10157–65. http://dx.doi.org/10.1609/aaai.v37i8.26210.
Cao, Jiaqing, Quan Liu, Fei Zhu, Qiming Fu, and Shan Zhong. "Gradient temporal-difference learning for off-policy evaluation using emphatic weightings." Information Sciences 580 (November 2021): 311–30. http://dx.doi.org/10.1016/j.ins.2021.08.082.
Tian, Chang, An Liu, Guan Huang, and Wu Luo. "Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning." IEEE Transactions on Signal Processing 70 (2022): 1609–24. http://dx.doi.org/10.1109/tsp.2022.3158737.
Karimpanal, Thommen George, and Erik Wilhelm. "Identification and off-policy learning of multiple objectives using adaptive clustering." Neurocomputing 263 (November 2017): 39–47. http://dx.doi.org/10.1016/j.neucom.2017.04.074.
Kiumarsi, Bahare, Frank L. Lewis, and Zhong-Ping Jiang. "H∞ control of linear discrete-time systems: Off-policy reinforcement learning." Automatica 78 (April 2017): 144–52. http://dx.doi.org/10.1016/j.automatica.2016.12.009.
Li, Jinna, Zhenfei Xiao, and Ping Li. "Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning." IEEE Access 7 (2019): 134647–59. http://dx.doi.org/10.1109/access.2019.2939384.
Kiumarsi, Bahare, Wei Kang, and Frank L. Lewis. "H∞ Control of Nonaffine Aerial Systems Using Off-policy Reinforcement Learning." Unmanned Systems 04, no. 01 (January 2016): 51–60. http://dx.doi.org/10.1142/s2301385016400069.
Lian, Bosen, Wenqian Xue, Yijing Xie, Frank L. Lewis, and Ali Davoudi. "Off-policy inverse Q-learning for discrete-time antagonistic unknown systems." Automatica 155 (September 2023): 111171. http://dx.doi.org/10.1016/j.automatica.2023.111171.
Kim, Man-Je, Hyunsoo Park, and Chang Wook Ahn. "Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning." Electronics 11, no. 7 (March 28, 2022): 1069. http://dx.doi.org/10.3390/electronics11071069.
Chaudhari, Shreyas, David Arbour, Georgios Theocharous, and Nikos Vlassis. "Distributional Off-Policy Evaluation for Slate Recommendations." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8265–73. http://dx.doi.org/10.1609/aaai.v38i8.28667.
Zhang, Ruiyi, Tong Yu, Yilin Shen, and Hongxia Jin. "Text-Based Interactive Recommendation via Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11694–702. http://dx.doi.org/10.1609/aaai.v36i10.21424.
Xu, Z., L. Cao, and X. Chen. "Deep Reinforcement Learning with Adaptive Update Target Combination." Computer Journal 63, no. 7 (August 15, 2019): 995–1003. http://dx.doi.org/10.1093/comjnl/bxz066.
Shahid, Asad Ali, Dario Piga, Francesco Braghin, and Loris Roveda. "Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning." Autonomous Robots 46, no. 3 (February 9, 2022): 483–98. http://dx.doi.org/10.1007/s10514-022-10034-z.
Hollenstein, Jakob, Georg Martius, and Justus Piater. "Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 11 (March 24, 2024): 12466–72. http://dx.doi.org/10.1609/aaai.v38i11.29139.
Ren, He, Jing Dai, Huaguang Zhang, and Kun Zhang. "Off-policy integral reinforcement learning algorithm in dealing with nonzero sum game for nonlinear distributed parameter systems." Transactions of the Institute of Measurement and Control 42, no. 15 (July 6, 2020): 2919–28. http://dx.doi.org/10.1177/0142331220932634.
Levine, Alexander, and Soheil Feizi. "Goal-Conditioned Q-learning as Knowledge Distillation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8500–8509. http://dx.doi.org/10.1609/aaai.v37i7.26024.
Yang, Hyunjun, Hyeonjun Park, and Kyungjae Lee. "A Selective Portfolio Management Algorithm with Off-Policy Reinforcement Learning Using Dirichlet Distribution." Axioms 11, no. 12 (November 23, 2022): 664. http://dx.doi.org/10.3390/axioms11120664.
Suttle, Wesley, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Başar, and Ji Liu. "A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning." IFAC-PapersOnLine 53, no. 2 (2020): 1549–54. http://dx.doi.org/10.1016/j.ifacol.2020.12.2021.
Stanković, Miloš S., Marko Beko, and Srdjan S. Stanković. "Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence." IFAC-PapersOnLine 53, no. 2 (2020): 1563–68. http://dx.doi.org/10.1016/j.ifacol.2020.12.2184.
Li, Jinna, Zhenfei Xiao, Tianyou Chai, Frank L. Lewis, and Sarangapani Jagannathan. "Off-Policy Q-Learning for Anti-Interference Control of Multi-Player Systems." IFAC-PapersOnLine 53, no. 2 (2020): 9189–94. http://dx.doi.org/10.1016/j.ifacol.2020.12.2180.
Kim, and Park. "Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning." Symmetry 11, no. 11 (November 1, 2019): 1352. http://dx.doi.org/10.3390/sym11111352.
Chen, Ning, Shuhan Luo, Jiayang Dai, Biao Luo, and Weihua Gui. "Optimal Control of Iron-Removal Systems Based on Off-Policy Reinforcement Learning." IEEE Access 8 (2020): 149730–40. http://dx.doi.org/10.1109/access.2020.3015801.
Hachiya, Hirotaka, Takayuki Akiyama, Masashi Sugiyama, and Jan Peters. "Adaptive importance sampling for value function approximation in off-policy reinforcement learning." Neural Networks 22, no. 10 (December 2009): 1399–410. http://dx.doi.org/10.1016/j.neunet.2009.01.002.
Zuo, Guoyu, Qishen Zhao, Kexin Chen, Jiangeng Li, and Daoxiong Gong. "Off-policy adversarial imitation learning for robotic tasks with low-quality demonstrations." Applied Soft Computing 97 (December 2020): 106795. http://dx.doi.org/10.1016/j.asoc.2020.106795.
Givchi, Arash, and Maziar Palhang. "Off-policy temporal difference learning with distribution adaptation in fast mixing chains." Soft Computing 22, no. 3 (January 30, 2017): 737–50. http://dx.doi.org/10.1007/s00500-017-2490-1.
Liu, Mushuang, Yan Wan, Frank L. Lewis, and Victor G. Lopez. "Adaptive Optimal Control for Stochastic Multiplayer Differential Games Using On-Policy and Off-Policy Reinforcement Learning." IEEE Transactions on Neural Networks and Learning Systems 31, no. 12 (December 2020): 5522–33. http://dx.doi.org/10.1109/tnnls.2020.2969215.
Pritchett, Lant, and Justin Sandefur. "Learning from Experiments when Context Matters." American Economic Review 105, no. 5 (May 1, 2015): 471–75. http://dx.doi.org/10.1257/aer.p20151016.
Chen, Zaiwei. "A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms." ACM SIGMETRICS Performance Evaluation Review 50, no. 3 (December 30, 2022): 12–15. http://dx.doi.org/10.1145/3579342.3579346.
Narita, Yusuke, Kyohei Okumura, Akihiro Shimizu, and Kohei Yata. "Counterfactual Learning with General Data-Generating Policies." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9286–93. http://dx.doi.org/10.1609/aaai.v37i8.26113.
Kim, MyeongSeop, Jung-Su Kim, Myoung-Su Choi, and Jae-Han Park. "Adaptive Discount Factor for Deep Reinforcement Learning in Continuing Tasks with Uncertainty." Sensors 22, no. 19 (September 25, 2022): 7266. http://dx.doi.org/10.3390/s22197266.