Journal articles on the topic "Off-Policy learning"
Create a precise reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 journal articles for your research on the topic "Off-Policy learning."
Next to every source in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scientific publication in .pdf format and read the abstract of the work online, if it is available in the metadata.
Browse journal articles from a wide variety of scientific disciplines and compile an accurate bibliography.
Meng, Wenjia, Qian Zheng, Gang Pan, and Yilong Yin. "Off-Policy Proximal Policy Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9162–70. http://dx.doi.org/10.1609/aaai.v37i8.26099.
Schmitt, Simon, John Shawe-Taylor, and Hado van Hasselt. "Chaining Value Functions for Off-Policy Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8187–95. http://dx.doi.org/10.1609/aaai.v36i8.20792.
Xu, Da, Yuting Ye, Chuanwei Ruan, and Bo Yang. "Towards Robust Off-Policy Learning for Runtime Uncertainty." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (June 28, 2022): 10101–9. http://dx.doi.org/10.1609/aaai.v36i9.21249.
Peters, James F., and Christopher Henry. "Approximation spaces in off-policy Monte Carlo learning." Engineering Applications of Artificial Intelligence 20, no. 5 (August 2007): 667–75. http://dx.doi.org/10.1016/j.engappai.2006.11.005.
Yu, Jiayu, Jingyao Li, Shuai Lü, and Shuai Han. "Mixed experience sampling for off-policy reinforcement learning." Expert Systems with Applications 251 (October 2024): 124017. http://dx.doi.org/10.1016/j.eswa.2024.124017.
Cetin, Edoardo, and Oya Celiktutan. "Learning Pessimism for Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 6971–79. http://dx.doi.org/10.1609/aaai.v37i6.25852.
Kong, Seung-Hyun, I. Made Aswin Nahrendra, and Dong-Hee Paek. "Enhanced Off-Policy Reinforcement Learning With Focused Experience Replay." IEEE Access 9 (2021): 93152–64. http://dx.doi.org/10.1109/access.2021.3085142.
Li, Lihong. "A perspective on off-policy evaluation in reinforcement learning." Frontiers of Computer Science 13, no. 5 (June 17, 2019): 911–12. http://dx.doi.org/10.1007/s11704-019-9901-7.
Luo, Biao, Huai-Ning Wu, and Tingwen Huang. "Off-Policy Reinforcement Learning for $H_\infty$ Control Design." IEEE Transactions on Cybernetics 45, no. 1 (January 2015): 65–76. http://dx.doi.org/10.1109/tcyb.2014.2319577.
Sun, Mingfei, Sam Devlin, Katja Hofmann, and Shimon Whiteson. "Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8378–85. http://dx.doi.org/10.1609/aaai.v36i8.20813.
Jain, Arushi, Gandharv Patil, Ayush Jain, Khimya Khetarpal, and Doina Precup. "Variance Penalized On-Policy and Off-Policy Actor-Critic." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 7899–907. http://dx.doi.org/10.1609/aaai.v35i9.16964.
Hao, Longyan, Chaoli Wang, and Yibo Shi. "Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method." Mathematics 12, no. 10 (May 14, 2024): 1533. http://dx.doi.org/10.3390/math12101533.
Gelada, Carles, and Marc G. Bellemare. "Off-Policy Deep Reinforcement Learning by Bootstrapping the Covariate Shift." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3647–55. http://dx.doi.org/10.1609/aaai.v33i01.33013647.
Xiao, Teng, and Suhang Wang. "Towards Off-Policy Learning for Ranking Policies with Logged Feedback." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8700–8707. http://dx.doi.org/10.1609/aaai.v36i8.20849.
Li, Jinna, Hamidreza Modares, Tianyou Chai, Frank L. Lewis, and Lihua Xie. "Off-Policy Reinforcement Learning for Synchronization in Multiagent Graphical Games." IEEE Transactions on Neural Networks and Learning Systems 28, no. 10 (October 2017): 2434–45. http://dx.doi.org/10.1109/tnnls.2016.2609500.
Zhang, Hengrui, Youfang Lin, Shuo Shen, Sheng Han, and Kai Lv. "Enhancing Off-Policy Constrained Reinforcement Learning through Adaptive Ensemble C Estimation." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (March 24, 2024): 21770–78. http://dx.doi.org/10.1609/aaai.v38i19.30177.
Zhang, Shangtong, Bo Liu, and Shimon Whiteson. "Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10905–13. http://dx.doi.org/10.1609/aaai.v35i12.17302.
Ali, Raja Farrukh, Kevin Duong, Nasik Muhammad Nafi, and William Hsu. "Multi-Horizon Learning in Procedurally-Generated Environments for Off-Policy Reinforcement Learning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 13 (June 26, 2023): 16150–51. http://dx.doi.org/10.1609/aaai.v37i13.26935.
Tennenholtz, Guy, Uri Shalit, and Shie Mannor. "Off-Policy Evaluation in Partially Observable Environments." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10276–83. http://dx.doi.org/10.1609/aaai.v34i06.6590.
Nakamura, Yutaka, Takeshi Mori, Yoichi Tokita, Tomohiro Shibata, and Shin Ishii. "Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller." Journal of Robotics and Mechatronics 17, no. 6 (December 20, 2005): 636–44. http://dx.doi.org/10.20965/jrm.2005.p0636.
Wang, Mingyang, Zhenshan Bing, Xiangtong Yao, Shuai Wang, Huang Kai, Hang Su, Chenguang Yang, and Alois Knoll. "Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10157–65. http://dx.doi.org/10.1609/aaai.v37i8.26210.
Cao, Jiaqing, Quan Liu, Fei Zhu, Qiming Fu, and Shan Zhong. "Gradient temporal-difference learning for off-policy evaluation using emphatic weightings." Information Sciences 580 (November 2021): 311–30. http://dx.doi.org/10.1016/j.ins.2021.08.082.
Tian, Chang, An Liu, Guan Huang, and Wu Luo. "Successive Convex Approximation Based Off-Policy Optimization for Constrained Reinforcement Learning." IEEE Transactions on Signal Processing 70 (2022): 1609–24. http://dx.doi.org/10.1109/tsp.2022.3158737.
Karimpanal, Thommen George, and Erik Wilhelm. "Identification and off-policy learning of multiple objectives using adaptive clustering." Neurocomputing 263 (November 2017): 39–47. http://dx.doi.org/10.1016/j.neucom.2017.04.074.
Kiumarsi, Bahare, Frank L. Lewis, and Zhong-Ping Jiang. "H∞ control of linear discrete-time systems: Off-policy reinforcement learning." Automatica 78 (April 2017): 144–52. http://dx.doi.org/10.1016/j.automatica.2016.12.009.
Li, Jinna, Zhenfei Xiao, and Ping Li. "Discrete-Time Multi-Player Games Based on Off-Policy Q-Learning." IEEE Access 7 (2019): 134647–59. http://dx.doi.org/10.1109/access.2019.2939384.
Kiumarsi, Bahare, Wei Kang, and Frank L. Lewis. "H∞ Control of Nonaffine Aerial Systems Using Off-policy Reinforcement Learning." Unmanned Systems 04, no. 01 (January 2016): 51–60. http://dx.doi.org/10.1142/s2301385016400069.
Lian, Bosen, Wenqian Xue, Yijing Xie, Frank L. Lewis, and Ali Davoudi. "Off-policy inverse Q-learning for discrete-time antagonistic unknown systems." Automatica 155 (September 2023): 111171. http://dx.doi.org/10.1016/j.automatica.2023.111171.
Kim, Man-Je, Hyunsoo Park, and Chang Wook Ahn. "Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning." Electronics 11, no. 7 (March 28, 2022): 1069. http://dx.doi.org/10.3390/electronics11071069.
Chaudhari, Shreyas, David Arbour, Georgios Theocharous, and Nikos Vlassis. "Distributional Off-Policy Evaluation for Slate Recommendations." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 8 (March 24, 2024): 8265–73. http://dx.doi.org/10.1609/aaai.v38i8.28667.
Zhang, Ruiyi, Tong Yu, Yilin Shen, and Hongxia Jin. "Text-Based Interactive Recommendation via Offline Reinforcement Learning." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 10 (June 28, 2022): 11694–702. http://dx.doi.org/10.1609/aaai.v36i10.21424.
Xu, Z., L. Cao, and X. Chen. "Deep Reinforcement Learning with Adaptive Update Target Combination." Computer Journal 63, no. 7 (August 15, 2019): 995–1003. http://dx.doi.org/10.1093/comjnl/bxz066.
Shahid, Asad Ali, Dario Piga, Francesco Braghin, and Loris Roveda. "Continuous control actions learning and adaptation for robotic manipulation through reinforcement learning." Autonomous Robots 46, no. 3 (February 9, 2022): 483–98. http://dx.doi.org/10.1007/s10514-022-10034-z.
Hollenstein, Jakob, Georg Martius, and Justus Piater. "Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 11 (March 24, 2024): 12466–72. http://dx.doi.org/10.1609/aaai.v38i11.29139.
Ren, He, Jing Dai, Huaguang Zhang, and Kun Zhang. "Off-policy integral reinforcement learning algorithm in dealing with nonzero sum game for nonlinear distributed parameter systems." Transactions of the Institute of Measurement and Control 42, no. 15 (July 6, 2020): 2919–28. http://dx.doi.org/10.1177/0142331220932634.
Levine, Alexander, and Soheil Feizi. "Goal-Conditioned Q-learning as Knowledge Distillation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 7 (June 26, 2023): 8500–8509. http://dx.doi.org/10.1609/aaai.v37i7.26024.
Yang, Hyunjun, Hyeonjun Park, and Kyungjae Lee. "A Selective Portfolio Management Algorithm with Off-Policy Reinforcement Learning Using Dirichlet Distribution." Axioms 11, no. 12 (November 23, 2022): 664. http://dx.doi.org/10.3390/axioms11120664.
Suttle, Wesley, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Başar, and Ji Liu. "A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning." IFAC-PapersOnLine 53, no. 2 (2020): 1549–54. http://dx.doi.org/10.1016/j.ifacol.2020.12.2021.
Stanković, Miloš S., Marko Beko, and Srdjan S. Stanković. "Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence." IFAC-PapersOnLine 53, no. 2 (2020): 1563–68. http://dx.doi.org/10.1016/j.ifacol.2020.12.2184.
Li, Jinna, Zhenfei Xiao, Tianyou Chai, Frank L. Lewis, and Sarangapani Jagannathan. "Off-Policy Q-Learning for Anti-Interference Control of Multi-Player Systems." IFAC-PapersOnLine 53, no. 2 (2020): 9189–94. http://dx.doi.org/10.1016/j.ifacol.2020.12.2180.
Kim and Park. "Exploration with Multiple Random ε-Buffers in Off-Policy Deep Reinforcement Learning." Symmetry 11, no. 11 (November 1, 2019): 1352. http://dx.doi.org/10.3390/sym11111352.
Chen, Ning, Shuhan Luo, Jiayang Dai, Biao Luo, and Weihua Gui. "Optimal Control of Iron-Removal Systems Based on Off-Policy Reinforcement Learning." IEEE Access 8 (2020): 149730–40. http://dx.doi.org/10.1109/access.2020.3015801.
Hachiya, Hirotaka, Takayuki Akiyama, Masashi Sugiyama, and Jan Peters. "Adaptive importance sampling for value function approximation in off-policy reinforcement learning." Neural Networks 22, no. 10 (December 2009): 1399–410. http://dx.doi.org/10.1016/j.neunet.2009.01.002.
Zuo, Guoyu, Qishen Zhao, Kexin Chen, Jiangeng Li, and Daoxiong Gong. "Off-policy adversarial imitation learning for robotic tasks with low-quality demonstrations." Applied Soft Computing 97 (December 2020): 106795. http://dx.doi.org/10.1016/j.asoc.2020.106795.
Givchi, Arash, and Maziar Palhang. "Off-policy temporal difference learning with distribution adaptation in fast mixing chains." Soft Computing 22, no. 3 (January 30, 2017): 737–50. http://dx.doi.org/10.1007/s00500-017-2490-1.
Liu, Mushuang, Yan Wan, Frank L. Lewis, and Victor G. Lopez. "Adaptive Optimal Control for Stochastic Multiplayer Differential Games Using On-Policy and Off-Policy Reinforcement Learning." IEEE Transactions on Neural Networks and Learning Systems 31, no. 12 (December 2020): 5522–33. http://dx.doi.org/10.1109/tnnls.2020.2969215.
Pritchett, Lant, and Justin Sandefur. "Learning from Experiments when Context Matters." American Economic Review 105, no. 5 (May 1, 2015): 471–75. http://dx.doi.org/10.1257/aer.p20151016.
Chen, Zaiwei. "A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms." ACM SIGMETRICS Performance Evaluation Review 50, no. 3 (December 30, 2022): 12–15. http://dx.doi.org/10.1145/3579342.3579346.
Narita, Yusuke, Kyohei Okumura, Akihiro Shimizu, and Kohei Yata. "Counterfactual Learning with General Data-Generating Policies." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 9286–93. http://dx.doi.org/10.1609/aaai.v37i8.26113.
Kim, MyeongSeop, Jung-Su Kim, Myoung-Su Choi, and Jae-Han Park. "Adaptive Discount Factor for Deep Reinforcement Learning in Continuing Tasks with Uncertainty." Sensors 22, no. 19 (September 25, 2022): 7266. http://dx.doi.org/10.3390/s22197266.