Journal articles on the topic "Off-Policy learning"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Consult the top 50 journal articles for your research on the topic "Off-Policy learning."
Next to every work in the bibliography, an "Add to bibliography" option is available. Use it, and your bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scholarly publication as a PDF and read its online annotation, if the relevant parameters are present in the metadata.
Browse journal articles from a wide range of disciplines and compile your bibliography correctly.
Meng, Wenjia, Qian Zheng, Gang Pan, and Yilong Yin. "Off-Policy Proximal Policy Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9162–70. http://dx.doi.org/10.1609/aaai.v37i8.26099.
Schmitt, Simon, John Shawe-Taylor, and Hado van Hasselt. "Chaining Value Functions for Off-Policy Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8187–95. http://dx.doi.org/10.1609/aaai.v36i8.20792.
Xu, Da, Yuting Ye, Chuanwei Ruan, and Bo Yang. "Towards Robust Off-Policy Learning for Runtime Uncertainty." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (June 28, 2022): 10101–9. http://dx.doi.org/10.1609/aaai.v36i9.21249.
Peters, James F., and Christopher Henry. "Approximation spaces in off-policy Monte Carlo learning." Engineering Applications of Artificial Intelligence 20, no. 5 (August 2007): 667–75. http://dx.doi.org/10.1016/j.engappai.2006.11.005.
Yu, Jiayu, Jingyao Li, Shuai Lü, and Shuai Han. "Mixed experience sampling for off-policy reinforcement learning." Expert Systems with Applications 251 (October 2024): 124017. http://dx.doi.org/10.1016/j.eswa.2024.124017.
Cetin, Edoardo, and Oya Celiktutan. "Learning Pessimism for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 6971–79. http://dx.doi.org/10.1609/aaai.v37i6.25852.
Kong, Seung-Hyun, I. Made Aswin Nahrendra, and Dong-Hee Paek. "Enhanced Off-Policy Reinforcement Learning With Focused Experience Replay." IEEE Access 9 (2021): 93152–64. http://dx.doi.org/10.1109/access.2021.3085142.
Li, Lihong. "A perspective on off-policy evaluation in reinforcement learning." Frontiers of Computer Science 13, no. 5 (June 17, 2019): 911–12. http://dx.doi.org/10.1007/s11704-019-9901-7.
Luo, Biao, Huai-Ning Wu, and Tingwen Huang. "Off-Policy Reinforcement Learning for $H_\infty$ Control Design." IEEE Transactions on Cybernetics 45, no. 1 (January 2015): 65–76. http://dx.doi.org/10.1109/tcyb.2014.2319577.
Sun, Mingfei, Sam Devlin, Katja Hofmann, and Shimon Whiteson. "Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8378–85. http://dx.doi.org/10.1609/aaai.v36i8.20813.
Jain, Arushi, Gandharv Patil, Ayush Jain, Khimya Khetarpal, and Doina Precup. "Variance Penalized On-Policy and Off-Policy Actor-Critic." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 7899–907. http://dx.doi.org/10.1609/aaai.v35i9.16964.
Hao, Longyan, Chaoli Wang, and Yibo Shi. "Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method." Mathematics 12, no. 10 (May 14, 2024): 1533. http://dx.doi.org/10.3390/math12101533.
Gelada, Carles, and Marc G. Bellemare. "Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3647–55. http://dx.doi.org/10.1609/aaai.v33i01.33013647.
Xiao, Teng, and Suhang Wang. "Towards Off-Policy Learning for Ranking Policies with Logged Feedback." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8700–8707. http://dx.doi.org/10.1609/aaai.v36i8.20849.
Li, Jinna, Hamidreza Modares, Tianyou Chai, Frank L. Lewis, and Lihua Xie. "Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games." IEEE Transactions on Neural Networks and Learning Systems 28, no. 10 (October 2017): 2434–45. http://dx.doi.org/10.1109/tnnls.2016.2609500.
Zhang, Hengrui, Youfang Lin, Shuo Shen, Sheng Han, and Kai Lv. "Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21770–78. http://dx.doi.org/10.1609/aaai.v38i19.30177.
Zhang, Shangtong, Bo Liu, and Shimon Whiteson. "Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10905–13. http://dx.doi.org/10.1609/aaai.v35i12.17302.
Ali, Raja Farrukh, Kevin Duong, Nasik Muhammad Nafi, and William Hsu. "Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16150–51. http://dx.doi.org/10.1609/aaai.v37i13.26935.
Tennenholtz, Guy, Uri Shalit, and Shie Mannor. "Off-Policy Evaluation in Partially Observable Environments." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10276–83. http://dx.doi.org/10.1609/aaai.v34i06.6590.
Nakamura, Yutaka, Takeshi Mori, Yoichi Tokita, Tomohiro Shibata, and Shin Ishii. "Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller." Journal of Robotics and Mechatronics 17, no. 6 (December 20, 2005): 636–44. http://dx.doi.org/10.20965/jrm.2005.p0636.
Wang, Mingyang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Huang Kai, Hang Su, Chenguang Yang, and Alois Knoll. "Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10157–65. http://dx.doi.org/10.1609/aaai.v37i8.26210.
Cao, Jiaqing, Quan Liu, Fei Zhu, Qiming Fu, and Shan Zhong. "Gradient temporal-difference learning for off-policy evaluation using emphatic weightings." Information Sciences 580 (November 2021): 311–30. http://dx.doi.org/10.1016/j.ins.2021.08.082.
Tian, Chang, An Liu, Guan Huang, and Wu Luo. "Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning." IEEE Transactions on Signal Processing 70 (2022): 1609–24. http://dx.doi.org/10.1109/tsp.2022.3158737.
Karimpanal, Thommen George, and Erik Wilhelm. "Identification and off-policy learning of multiple objectives using adaptive clustering." Neurocomputing 263 (November 2017): 39–47. http://dx.doi.org/10.1016/j.neucom.2017.04.074.
Kiumarsi, Bahare, Frank L. Lewis, and Zhong-Ping Jiang. "H∞ control of linear discrete-time systems: Off-policy reinforcement learning." Automatica 78 (April 2017): 144–52. http://dx.doi.org/10.1016/j.automatica.2016.12.009.
Li, Jinna, Zhenfei Xiao, and Ping Li. "Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning." IEEE Access 7 (2019): 134647–59. http://dx.doi.org/10.1109/access.2019.2939384.
Kiumarsi, Bahare, Wei Kang, and Frank L. Lewis. "H∞ Control of Nonaffine Aerial Systems Using Off-policy Reinforcement Learning." Unmanned Systems 04, no. 01 (January 2016): 51–60. http://dx.doi.org/10.1142/s2301385016400069.
Lian, Bosen, Wenqian Xue, Yijing Xie, Frank L. Lewis, and Ali Davoudi. "Off-policy inverse Q-learning for discrete-time antagonistic unknown systems." Automatica 155 (September 2023): 111171. http://dx.doi.org/10.1016/j.automatica.2023.111171.
Kim, Man-Je, Hyunsoo Park, and Chang Wook Ahn. "Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning." Electronics 11, no. 7 (March 28, 2022): 1069. http://dx.doi.org/10.3390/electronics11071069.
Chaudhari, Shreyas, David Arbour, Georgios Theocharous, and Nikos Vlassis. "Distributional Off-Policy Evaluation for Slate Recommendations." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8265–73. http://dx.doi.org/10.1609/aaai.v38i8.28667.
Zhang, Ruiyi, Tong Yu, Yilin Shen, and Hongxia Jin. "Text-Based Interactive Recommendation via Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11694–702. http://dx.doi.org/10.1609/aaai.v36i10.21424.
Xu, Z., L. Cao, and X. Chen. "Deep Reinforcement Learning with Adaptive Update Target Combination." Computer Journal 63, no. 7 (August 15, 2019): 995–1003. http://dx.doi.org/10.1093/comjnl/bxz066.
Shahid, Asad Ali, Dario Piga, Francesco Braghin, and Loris Roveda. "Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning." Autonomous Robots 46, no. 3 (February 9, 2022): 483–98. http://dx.doi.org/10.1007/s10514-022-10034-z.
Hollenstein, Jakob, Georg Martius, and Justus Piater. "Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 11 (March 24, 2024): 12466–72. http://dx.doi.org/10.1609/aaai.v38i11.29139.
Ren, He, Jing Dai, Huaguang Zhang, and Kun Zhang. "Off-policy integral reinforcement learning algorithm in dealing with nonzero sum game for nonlinear distributed parameter systems." Transactions of the Institute of Measurement and Control 42, no. 15 (July 6, 2020): 2919–28. http://dx.doi.org/10.1177/0142331220932634.
Levine, Alexander, and Soheil Feizi. "Goal-Conditioned Q-learning as Knowledge Distillation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8500–8509. http://dx.doi.org/10.1609/aaai.v37i7.26024.
Yang, Hyunjun, Hyeonjun Park, and Kyungjae Lee. "A Selective Portfolio Management Algorithm with Off-Policy Reinforcement Learning Using Dirichlet Distribution." Axioms 11, no. 12 (November 23, 2022): 664. http://dx.doi.org/10.3390/axioms11120664.
Suttle, Wesley, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Başar, and Ji Liu. "A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning." IFAC-PapersOnLine 53, no. 2 (2020): 1549–54. http://dx.doi.org/10.1016/j.ifacol.2020.12.2021.
Stanković, Miloš S., Marko Beko, and Srdjan S. Stanković. "Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence." IFAC-PapersOnLine 53, no. 2 (2020): 1563–68. http://dx.doi.org/10.1016/j.ifacol.2020.12.2184.
Li, Jinna, Zhenfei Xiao, Tianyou Chai, Frank L. Lewis, and Sarangapani Jagannathan. "Off-Policy Q-Learning for Anti-Interference Control of Multi-Player Systems." IFAC-PapersOnLine 53, no. 2 (2020): 9189–94. http://dx.doi.org/10.1016/j.ifacol.2020.12.2180.
Kim, and Park. "Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning." Symmetry 11, no. 11 (November 1, 2019): 1352. http://dx.doi.org/10.3390/sym11111352.
Chen, Ning, Shuhan Luo, Jiayang Dai, Biao Luo, and Weihua Gui. "Optimal Control of Iron-Removal Systems Based on Off-Policy Reinforcement Learning." IEEE Access 8 (2020): 149730–40. http://dx.doi.org/10.1109/access.2020.3015801.
Hachiya, Hirotaka, Takayuki Akiyama, Masashi Sugiyama, and Jan Peters. "Adaptive importance sampling for value function approximation in off-policy reinforcement learning." Neural Networks 22, no. 10 (December 2009): 1399–410. http://dx.doi.org/10.1016/j.neunet.2009.01.002.
Zuo, Guoyu, Qishen Zhao, Kexin Chen, Jiangeng Li, and Daoxiong Gong. "Off-policy adversarial imitation learning for robotic tasks with low-quality demonstrations." Applied Soft Computing 97 (December 2020): 106795. http://dx.doi.org/10.1016/j.asoc.2020.106795.
Givchi, Arash, and Maziar Palhang. "Off-policy temporal difference learning with distribution adaptation in fast mixing chains." Soft Computing 22, no. 3 (January 30, 2017): 737–50. http://dx.doi.org/10.1007/s00500-017-2490-1.
Liu, Mushuang, Yan Wan, Frank L. Lewis, and Victor G. Lopez. "Adaptive Optimal Control for Stochastic Multiplayer Differential Games Using On-Policy and Off-Policy Reinforcement Learning." IEEE Transactions on Neural Networks and Learning Systems 31, no. 12 (December 2020): 5522–33. http://dx.doi.org/10.1109/tnnls.2020.2969215.
Pritchett, Lant, and Justin Sandefur. "Learning from Experiments when Context Matters." American Economic Review 105, no. 5 (May 1, 2015): 471–75. http://dx.doi.org/10.1257/aer.p20151016.
Chen, Zaiwei. "A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms." ACM SIGMETRICS Performance Evaluation Review 50, no. 3 (December 30, 2022): 12–15. http://dx.doi.org/10.1145/3579342.3579346.
Narita, Yusuke, Kyohei Okumura, Akihiro Shimizu, and Kohei Yata. "Counterfactual Learning with General Data-Generating Policies." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9286–93. http://dx.doi.org/10.1609/aaai.v37i8.26113.
Kim, MyeongSeop, Jung-Su Kim, Myoung-Su Choi, and Jae-Han Park. "Adaptive Discount Factor for Deep Reinforcement Learning in Continuing Tasks with Uncertainty." Sensors 22, no. 19 (September 25, 2022): 7266. http://dx.doi.org/10.3390/s22197266.