Ready-made bibliography on the topic "Actor-critic methods"
Create a correct reference in APA, MLA, Chicago, Harvard, and many other styles
Browse lists of up-to-date articles, books, dissertations, conference abstracts, and other scholarly sources on the topic "Actor-critic methods".
An "Add to bibliography" button is available next to every work in the bibliography. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a ".pdf" file and read the abstract of the work online, provided the relevant data are available in the metadata.
Journal articles on the topic "Actor-critic methods"
Parisi, Simone, Voot Tangkaratt, Jan Peters, and Mohammad Emtiyaz Khan. "TD-regularized actor-critic methods". Machine Learning 108, no. 8-9 (February 21, 2019): 1467–501. http://dx.doi.org/10.1007/s10994-019-05788-0.
Wang, Jing, Xuchu Ding, Morteza Lahijanian, Ioannis Ch Paschalidis, and Calin A. Belta. "Temporal logic motion control using actor–critic methods". International Journal of Robotics Research 34, no. 10 (May 26, 2015): 1329–44. http://dx.doi.org/10.1177/0278364915581505.
Grondman, I., M. Vaandrager, L. Busoniu, R. Babuska, and E. Schuitema. "Efficient Model Learning Methods for Actor–Critic Control". IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42, no. 3 (June 2012): 591–602. http://dx.doi.org/10.1109/tsmcb.2011.2170565.
Wang, Mingyi, Jianhao Tang, Haoli Zhao, Zhenni Li, and Shengli Xie. "Automatic Compression of Neural Network with Deep Reinforcement Learning Based on Proximal Gradient Method". Mathematics 11, no. 2 (January 9, 2023): 338. http://dx.doi.org/10.3390/math11020338.
Su, Jianyu, Stephen Adams, and Peter Beling. "Value-Decomposition Multi-Agent Actor-Critics". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (May 18, 2021): 11352–60. http://dx.doi.org/10.1609/aaai.v35i13.17353.
Saglam, Baturay, Furkan B. Mutlu, Dogan C. Cicek, and Suleyman S. Kozat. "Actor Prioritized Experience Replay". Journal of Artificial Intelligence Research 78 (November 16, 2023): 639–72. http://dx.doi.org/10.1613/jair.1.14819.
Seo, Kanghyeon, and Jihoon Yang. "Differentially Private Actor and Its Eligibility Trace". Electronics 9, no. 9 (September 10, 2020): 1486. http://dx.doi.org/10.3390/electronics9091486.
Saglam, Baturay, Furkan Mutlu, Dogan Cicek, and Suleyman Kozat. "Actor Prioritized Experience Replay (Abstract Reprint)". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 20 (March 24, 2024): 22710. http://dx.doi.org/10.1609/aaai.v38i20.30610.
Hafez, Muhammad Burhan, Cornelius Weber, Matthias Kerzel, and Stefan Wermter. "Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning". Paladyn, Journal of Behavioral Robotics 10, no. 1 (January 1, 2019): 14–29. http://dx.doi.org/10.1515/pjbr-2019-0005.
Kong, Minseok, and Jungmin So. "Empirical Analysis of Automated Stock Trading Using Deep Reinforcement Learning". Applied Sciences 13, no. 1 (January 3, 2023): 633. http://dx.doi.org/10.3390/app13010633.
Pełny tekst źródłaRozprawy doktorskie na temat "Actor-critic methods"
Barakat, Anas. "Contributions to non-convex stochastic optimization and reinforcement learning". Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAT030.
This thesis is focused on the convergence analysis of some popular stochastic approximation methods in use in the machine learning community, with applications to optimization and reinforcement learning.

The first part of the thesis is devoted to a popular algorithm in deep learning called ADAM, used for training neural networks. This variant of stochastic gradient descent is more generally useful for finding a local minimizer of a function. Assuming that the objective function is differentiable and non-convex, we establish the convergence of the iterates in the long run to the set of critical points under a stability condition in the constant stepsize regime. Then, we introduce a novel decreasing stepsize version of ADAM. Under mild assumptions, it is shown that the iterates are almost surely bounded and converge almost surely to critical points of the objective function. Finally, we analyze the fluctuations of the algorithm by means of a conditional central limit theorem.

In the second part of the thesis, in the vanishing stepsize regime, we generalize our convergence and fluctuation results to a stochastic optimization procedure unifying several variants of stochastic gradient descent, such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm, and the widely used ADAM algorithm. We conclude this second part with an avoidance-of-traps result establishing the non-convergence of the general algorithm to undesired critical points, such as local maxima or saddle points. Here, the main ingredient is a new avoidance-of-traps result for non-autonomous settings, which is of independent interest.

Finally, the last part of this thesis, which is independent of the two previous parts, is concerned with the analysis of a stochastic approximation algorithm for reinforcement learning. In this last part, we propose an analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single-timescale temporal difference (TD) learning algorithm as a critic, we use a two-timescale target-based version of TD learning, closely inspired by practical actor-critic algorithms implementing target networks. First, we establish asymptotic convergence results for both the critic and the actor under Markovian sampling. Then, we provide a finite-time analysis showing the impact of incorporating a target network into actor-critic methods.
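The target-based critic described in this abstract can be pictured with a short sketch. The code below is a minimal, hypothetical illustration, not the thesis's exact algorithm: TD(0) with linear function approximation in which the fast online critic bootstraps from a slowly tracking set of target weights, in the spirit of target networks. The toy dynamics, feature map, and step sizes are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                # feature dimension
w = np.zeros(d)                      # online critic weights (fast timescale)
w_target = np.zeros(d)               # target critic weights (slow timescale)
alpha, tau, gamma = 0.1, 0.01, 0.99  # critic step, target tracking rate, discount

def features(s):
    # hypothetical feature map; a real task would supply its own
    return np.tanh(s)

s = rng.normal(size=d)
for t in range(1000):
    s_next = 0.9 * s + 0.1 * rng.normal(size=d)  # toy linear-Gaussian transition
    r = -float(np.sum(s ** 2))                   # toy reward: stay near the origin
    phi, phi_next = features(s), features(s_next)
    # the TD error bootstraps from the *target* weights, not the online ones
    delta = r + gamma * phi_next @ w_target - phi @ w
    w += alpha * delta * phi                     # fast online-critic update
    w_target += tau * (w - w_target)             # slow target tracking (Polyak averaging)
    s = s_next
```

Keeping the bootstrap target on a slower timescale than the online weights is what distinguishes this scheme from standard single-timescale TD learning.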
Pereira, Bruno Alexandre Barbosa. "Deep reinforcement learning for robotic manipulation tasks". Master's thesis, 2021. http://hdl.handle.net/10773/33654.
Recent advances in Artificial Intelligence (AI) open up a range of new opportunities for robotics. Deep Reinforcement Learning (DRL) is a subfield of AI that combines Deep Learning (DL) with Reinforcement Learning (RL). It defines machine learning algorithms that learn directly from experience and offers a comprehensive approach to studying the interplay between learning, representation, and decision-making. These algorithms have already been used successfully in several domains. A notable example is DRL agents that learned to play Atari 2600 video games directly from pixels and reached human-comparable performance on 49 of those games. More recently, DRL combined with other techniques produced agents capable of playing the board game Go at a professional level, something previously regarded as too complex to solve because of its enormous search space. In robotics, DRL has been applied to planning, navigation, optimal control, and other problems. In these applications, the excellent function-approximation and representation-learning capabilities of Deep Neural Networks allow RL to scale to problems with high-dimensional state and action spaces. In addition, properties inherent to DRL make transfer learning useful when moving from simulation to the real world. This dissertation investigates the applicability and effectiveness of DRL techniques for learning successful policies in the domain of robotic manipulation tasks. First, a set of three classic RL problems was solved using RL and DRL algorithms, in order to explore their practical implementation and arrive at a class of algorithms suited to these robotics tasks. Next, a simulated task was defined in which an agent must control a 6-degree-of-freedom manipulator so that its end effector reaches a target. This task is used to evaluate the effect on performance of different state representations, hyperparameters, and state-of-the-art DRL algorithms, which resulted in agents with high success rates. The focus then shifts to the speed and time constraints of end-effector positioning. To that end, different reward systems were tested so that an agent could learn a modified version of the previous task at higher joint velocities; in this scenario, several improvements over the original reward system were observed. Finally, an application of the best agent obtained in the previous experiments is demonstrated in a ball-catching scenario.
Master's in Computer and Telematics Engineering
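The reward-system experiments mentioned in the abstract above can be illustrated with a small sketch. What follows is a hedged example only: the function name, weights, and thresholds are hypothetical and are not the reward systems actually tested in the thesis. It shows one common way to shape a reaching reward under velocity constraints, combining a dense distance penalty, a sparse success bonus, and a penalty on joint speeds.

```python
import numpy as np

def reach_reward(ee_pos, target_pos, joint_vel,
                 success_radius=0.05, vel_weight=0.01, bonus=10.0):
    """Hypothetical shaped reward for an end-effector reaching task."""
    dist = np.linalg.norm(ee_pos - target_pos)
    reward = -dist                                 # dense shaping toward the target
    reward -= vel_weight * np.sum(joint_vel ** 2)  # discourage high joint velocities
    if dist < success_radius:
        reward += bonus                            # sparse bonus on reaching the target
    return reward

# example call with made-up values for a 6-DOF arm
r = reach_reward(np.array([0.30, 0.10, 0.50]), np.array([0.30, 0.10, 0.52]),
                 joint_vel=np.zeros(6))
```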
Duarte, Ana Filipa de Sampaio Calçada. "Using Reinforcement Learning in the tuning of Central Pattern Generators". Master's thesis, 2012. http://hdl.handle.net/1822/28037.
In this work, it is intended to apply Reinforcement Learning techniques to tasks involving learning and robot locomotion. Reinforcement Learning is a very useful learning technique for legged robot locomotion because of the direct interaction it provides between the agent and the environment, and because it requires neither supervision nor complete models, in contrast with other classic approaches. Its aim is to decide which actions to take so as to maximize a cumulative reward or reinforcement signal, taking into account that decisions may affect not only the immediate reward but also future ones. This work studies and presents the Reinforcement Learning framework and its application to the tuning of Central Pattern Generators, with the aim of generating optimized robot locomotion. To investigate the strengths and abilities of Reinforcement Learning, and to demonstrate the learning process of such algorithms in a simple way, two case studies were implemented based on the state of the art. With regard to the main purpose of the thesis, two different solutions are addressed: the first based on Natural Actor-Critic methods, and the second on the Cross-Entropy Method. The latter algorithm proved very capable of handling the integration of the two proposed approaches. The integration solutions were tested and validated using the Webots simulator and the DARwIN-OP robot model.
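Of the two solutions named in this abstract, the Cross-Entropy Method is simple enough to sketch in a few lines. The snippet below is a generic CEM loop over a black-box episodic return, assuming a parameter vector such as CPG amplitudes and frequencies; the placeholder objective stands in for a locomotion episode and is not the thesis's actual Webots/DARwIN-OP setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    # placeholder objective; a real setup would run a locomotion episode
    # with CPG parameters theta and return the measured performance
    return -np.sum((theta - 0.5) ** 2)

dim, pop, elites = 6, 50, 10
mu, sigma = np.zeros(dim), np.ones(dim)   # initial sampling distribution
for gen in range(30):
    samples = rng.normal(mu, sigma, size=(pop, dim))       # sample candidates
    returns = np.array([episode_return(th) for th in samples])
    best = samples[np.argsort(returns)[-elites:]]          # keep the elite set
    mu = best.mean(axis=0)                                 # refit the Gaussian
    sigma = best.std(axis=0) + 1e-3                        # small floor avoids collapse
print("tuned parameters:", mu)
```

The appeal of CEM in this setting is that it treats the simulator as a black box: no gradients of the return are needed, only repeated episode rollouts.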
Book chapters on the topic "Actor-critic methods"
Shang, Wenling, Douwe van der Wal, Herke van Hoof, and Max Welling. "Stochastic Activation Actor Critic Methods". In Machine Learning and Knowledge Discovery in Databases, 103–17. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-46133-1_7.
Girgin, Sertan, and Philippe Preux. "Basis Expansion in Natural Actor Critic Methods". In Lecture Notes in Computer Science, 110–23. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-89722-4_9.
Holzleitner, Markus, Lukas Gruber, José Arjona-Medina, Johannes Brandstetter, and Sepp Hochreiter. "Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER". In Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVIII, 105–30. Berlin, Heidelberg: Springer Berlin Heidelberg, 2021. http://dx.doi.org/10.1007/978-3-662-63519-3_5.
Fernandez-Gauna, Borja, Igor Ansoategui, Ismael Etxeberria-Agiriano, and Manuel Graña. "An Empirical Study of Actor-Critic Methods for Feedback Controllers of Ball-Screw Drivers". In Natural and Artificial Computation in Engineering and Medical Applications, 441–50. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-38622-0_46.
Zai, Alexander, and Brandon Brown. "Bewältigung komplexerer Probleme mit Actor-Critic-Methoden". In Einstieg in Deep Reinforcement Learning, 121–49. München: Carl Hanser Verlag GmbH & Co. KG, 2020. http://dx.doi.org/10.3139/9783446466081.005.
Iima, Hitoshi, and Yasuaki Kuroe. "Swarm Reinforcement Learning Method Based on an Actor-Critic Method". In Lecture Notes in Computer Science, 279–88. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-17298-4_29.
Guo, Ziyue, Hongxu Hou, Nier Wu, and Shuo Sun. "Neural Machine Translation Based on Improved Actor-Critic Method". In Artificial Neural Networks and Machine Learning – ICANN 2020, 346–57. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-61616-8_28.
Cai, Jiarun. "WD3-MPER: A Method to Alleviate Approximation Bias in Actor-Critic". In Neural Information Processing, 713–24. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63833-7_60.
Wei, Bo, Hang Song, Quang Ngoc Nguyen, and Jiro Katto. "DASH Live Video Streaming Control Using Actor-Critic Reinforcement Learning Method". In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 17–24. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-94763-7_2.
Xin, Guo-jing, Kai Zhang, Zhong-zheng Wang, Zi-feng Sun, Li-ming Zhang, Pi-yang Liu, Yong-fei Yang, Hai Sun, and Jun Yao. "Soft Actor-Critic Based Deep Reinforcement Learning Method for Production Optimization". In Springer Series in Geomechanics and Geoengineering, 353–66. Singapore: Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-0272-5_31.
Pełny tekst źródłaStreszczenia konferencji na temat "Actor-critic methods"
Miranda, Thiago S., and Heder S. Bernardino. "Distributional Safety Critic for Stochastic Latent Actor-Critic". In Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2023. http://dx.doi.org/10.5753/eniac.2023.234620.
Li, Jinke, Ruonan Rao, and Jun Shi. "Learning to Trade with Deep Actor Critic Methods". In 2018 11th International Symposium on Computational Intelligence and Design (ISCID). IEEE, 2018. http://dx.doi.org/10.1109/iscid.2018.10116.
Ding, Xu Chu, Jing Wang, Morteza Lahijanian, Ioannis Ch Paschalidis, and Calin A. Belta. "Temporal logic motion control using actor-critic methods". In 2012 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2012. http://dx.doi.org/10.1109/icra.2012.6225290.
Li, Xiaomu, and Quan Liu. "Master-Slave Policy Collaboration for Actor-Critic Methods". In 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022. http://dx.doi.org/10.1109/ijcnn55064.2022.9892603.
Fan, Zhou, Rui Su, Weinan Zhang, and Yong Yu. "Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space". In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/316.
Chun-Gui Li, Meng Wang, and Qing-Neng Yuan. "A Multi-agent Reinforcement Learning using Actor-Critic methods". In 2008 International Conference on Machine Learning and Cybernetics (ICMLC). IEEE, 2008. http://dx.doi.org/10.1109/icmlc.2008.4620528.
N, Sandeep Varma, Pradyumna Rahul K, and Vaishnavi Sinha. "Data augmented Approach to Optimizing Asynchronous Actor-Critic Methods". In 2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE). IEEE, 2022. http://dx.doi.org/10.1109/icdcece53908.2022.9792764.
Wan, Tianjiao, Haibo Mi, Zijian Gao, Yuanzhao Zhai, Bo Ding, and Dawei Feng. "Bi-level Multi-Agent Actor-Critic Methods with Transformers". In 2023 IEEE International Conference on Joint Cloud Computing (JCC). IEEE, 2023. http://dx.doi.org/10.1109/jcc59055.2023.00007.
Khemlichi, Firdaous, Houda Elyousfi Elfilali, Hiba Chougrad, Safae Elhaj Ben Ali, and Youness Idrissi Khamlichi. "Actor-Critic Methods in Stock Trading: A Comparative Study". In 2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME). IEEE, 2023. http://dx.doi.org/10.1109/iceccme57830.2023.10253277.
Peng, Peixi, Junliang Xing, and Lili Cao. "Hybrid Learning for Multi-agent Cooperation with Sub-optimal Demonstrations". In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/420.