Academic literature on the topic "Actor-critic methods"
Create an accurate reference in APA, MLA, Chicago, Harvard, and various other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic "Actor-critic methods".
Next to each source in the list of references there is an "Add to bibliography" button. Click on it, and we will automatically generate the bibliographic reference to the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the scholarly publication as a PDF and read its abstract online whenever these details are included in the metadata.
Journal articles on the topic "Actor-critic methods"
Parisi, Simone, Voot Tangkaratt, Jan Peters, and Mohammad Emtiyaz Khan. "TD-regularized actor-critic methods." Machine Learning 108, no. 8-9 (February 21, 2019): 1467–501. http://dx.doi.org/10.1007/s10994-019-05788-0.
Wang, Jing, Xuchu Ding, Morteza Lahijanian, Ioannis Ch Paschalidis, and Calin A. Belta. "Temporal logic motion control using actor–critic methods." International Journal of Robotics Research 34, no. 10 (May 26, 2015): 1329–44. http://dx.doi.org/10.1177/0278364915581505.
Grondman, I., M. Vaandrager, L. Busoniu, R. Babuska, and E. Schuitema. "Efficient Model Learning Methods for Actor–Critic Control." IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42, no. 3 (June 2012): 591–602. http://dx.doi.org/10.1109/tsmcb.2011.2170565.
Wang, Mingyi, Jianhao Tang, Haoli Zhao, Zhenni Li, and Shengli Xie. "Automatic Compression of Neural Network with Deep Reinforcement Learning Based on Proximal Gradient Method." Mathematics 11, no. 2 (January 9, 2023): 338. http://dx.doi.org/10.3390/math11020338.
Su, Jianyu, Stephen Adams, and Peter Beling. "Value-Decomposition Multi-Agent Actor-Critics." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (May 18, 2021): 11352–60. http://dx.doi.org/10.1609/aaai.v35i13.17353.
Saglam, Baturay, Furkan B. Mutlu, Dogan C. Cicek, and Suleyman S. Kozat. "Actor Prioritized Experience Replay." Journal of Artificial Intelligence Research 78 (November 16, 2023): 639–72. http://dx.doi.org/10.1613/jair.1.14819.
Seo, Kanghyeon, and Jihoon Yang. "Differentially Private Actor and Its Eligibility Trace." Electronics 9, no. 9 (September 10, 2020): 1486. http://dx.doi.org/10.3390/electronics9091486.
Saglam, Baturay, Furkan Mutlu, Dogan Cicek, and Suleyman Kozat. "Actor Prioritized Experience Replay (Abstract Reprint)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 20 (March 24, 2024): 22710. http://dx.doi.org/10.1609/aaai.v38i20.30610.
Hafez, Muhammad Burhan, Cornelius Weber, Matthias Kerzel, and Stefan Wermter. "Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning." Paladyn, Journal of Behavioral Robotics 10, no. 1 (January 1, 2019): 14–29. http://dx.doi.org/10.1515/pjbr-2019-0005.
Kong, Minseok, and Jungmin So. "Empirical Analysis of Automated Stock Trading Using Deep Reinforcement Learning." Applied Sciences 13, no. 1 (January 3, 2023): 633. http://dx.doi.org/10.3390/app13010633.
Texte intégralThèses sur le sujet "Actor-critic methods"
Barakat, Anas. "Contributions to non-convex stochastic optimization and reinforcement learning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAT030.
This thesis is focused on the convergence analysis of some popular stochastic approximation methods in use in the machine learning community, with applications to optimization and reinforcement learning. The first part of the thesis is devoted to a popular algorithm in deep learning called ADAM, used for training neural networks. This variant of stochastic gradient descent is more generally useful for finding a local minimizer of a function. Assuming that the objective function is differentiable and non-convex, we establish the convergence of the iterates in the long run to the set of critical points under a stability condition in the constant stepsize regime. Then, we introduce a novel decreasing stepsize version of ADAM. Under mild assumptions, it is shown that the iterates are almost surely bounded and converge almost surely to critical points of the objective function. Finally, we analyze the fluctuations of the algorithm by means of a conditional central limit theorem. In the second part of the thesis, in the vanishing stepsize regime, we generalize our convergence and fluctuation results to a stochastic optimization procedure unifying several variants of stochastic gradient descent, such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm, and the widely used ADAM algorithm. We conclude this second part with an avoidance-of-traps result establishing the non-convergence of the general algorithm to undesired critical points, such as local maxima or saddle points. Here, the main ingredient is a new avoidance-of-traps result for non-autonomous settings, which is of independent interest. Finally, the last part of this thesis, which is independent from the two previous parts, is concerned with the analysis of a stochastic approximation algorithm for reinforcement learning.
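The decreasing-stepsize ADAM variant discussed above can be sketched in a few lines. The following is a generic, illustrative reconstruction of the standard ADAM update with an optional decreasing stepsize, not the thesis's exact scheme; in particular the schedule `alpha0 / sqrt(t)` is an assumption made here for illustration:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha0=0.05, beta1=0.9, beta2=0.999,
              eps=1e-8, decreasing=True):
    """One ADAM update; optionally uses a decreasing stepsize alpha0/sqrt(t)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections
    v_hat = v / (1 - beta2 ** t)
    alpha = alpha0 / np.sqrt(t) if decreasing else alpha0
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize the non-convex-analysis toy case f(x) = x^2, starting from x = 3
theta, m, v = np.array([3.0]), np.zeros(1), np.zeros(1)
for t in range(1, 5001):
    grad = 2.0 * theta                        # gradient of f
    theta, m, v = adam_step(theta, grad, m, v, t)
```

With the decreasing stepsize, the iterate drifts toward the critical point at 0 and its oscillations shrink with `alpha0 / sqrt(t)`, which mirrors the almost-sure convergence behaviour the thesis establishes.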
In this last part, we propose an analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single-timescale temporal difference (TD) learning algorithm as a critic, we use a two-timescale target-based version of TD learning, closely inspired by practical actor-critic algorithms implementing target networks. First, we establish asymptotic convergence results for both the critic and the actor under Markovian sampling. Then, we provide a finite-time analysis showing the impact of incorporating a target network into actor-critic methods.
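The critic side of such a target-based scheme can be sketched concretely. This is a minimal illustration only, assuming one-hot linear features and a toy uniform-transition chain; the stepsizes `alpha_w` and `tau` are illustrative, and the actor (the third, slowest timescale) is omitted:

```python
import numpy as np

n_features = 4
gamma = 0.9          # discount factor
alpha_w = 0.1        # fast critic stepsize
tau = 0.01           # slow target-tracking rate (second critic timescale)

def td_target_update(phi, r, phi_next, w, w_target):
    """One target-based TD(0) step: bootstrap from the target weights,
    then let the target weights slowly track the critic (Polyak averaging)."""
    td_error = r + gamma * (phi_next @ w_target) - phi @ w
    w = w + alpha_w * td_error * phi               # fast timescale
    w_target = w_target + tau * (w - w_target)     # slow timescale
    return w, w_target

# Toy chain: uniform random transitions, reward 1 everywhere,
# so the true value is 1 / (1 - gamma) = 10 in every state.
rng = np.random.default_rng(0)
w, w_target = np.zeros(n_features), np.zeros(n_features)
for _ in range(20000):
    s, s_next = rng.integers(n_features), rng.integers(n_features)
    phi, phi_next = np.eye(n_features)[s], np.eye(n_features)[s_next]
    w, w_target = td_target_update(phi, 1.0, phi_next, w, w_target)
```

Bootstrapping from the slowly moving `w_target` rather than `w` itself is the target-network idea the thesis analyzes; the critic still converges to the TD fixed point (here, a value of 10 per state) despite the lag.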
Pereira, Bruno Alexandre Barbosa. "Deep reinforcement learning for robotic manipulation tasks." Master's thesis, 2021. http://hdl.handle.net/10773/33654.
Recent advances in Artificial Intelligence (AI) open up a range of new opportunities for robotics. Deep Reinforcement Learning (DRL) is a subfield of AI that results from the combination of Deep Learning (DL) with Reinforcement Learning (RL). This subfield defines machine learning algorithms that learn directly from experience and offers a comprehensive approach to studying the interaction between learning, representation, and decision-making. These algorithms have already been used successfully in several domains. Notably, DRL agents have learned to play Atari 2600 video games directly from pixels and achieved human-comparable performance on 49 of those games. More recently, DRL combined with other techniques has produced agents capable of playing the board game Go at a professional level, something that until then was regarded as a problem too complex to solve because of its enormous search space. In robotics, DRL has been applied to planning, navigation, optimal control, and other problems. In these applications, the excellent function-approximation and representation-learning capabilities of Deep Neural Networks allow RL to scale to problems with high-dimensional state and action spaces. Additionally, properties inherent to DRL make transfer learning useful when moving from simulation to the real world. This dissertation investigates the applicability and effectiveness of DRL techniques for learning successful policies in the domain of robotic manipulation tasks. Initially, a set of three classic RL problems was solved using RL and DRL algorithms, in order to explore their practical implementation and identify a class of algorithms suitable for these robotics tasks.
Subsequently, a simulated task was defined in which an agent must control a manipulator with 6 degrees of freedom so that its end effector reaches a target. This task is used to evaluate the effect on performance of different state representations, hyperparameters, and state-of-the-art DRL algorithms, which resulted in agents with high success rates. The focus then shifts to the speed and time constraints of end-effector positioning. To this end, different reward systems were tested so that an agent could learn a modified version of the previous task at higher joint velocities. In this scenario, several improvements over the original reward system were observed. Finally, an application of the best agent obtained in the previous experiments is demonstrated in a ball-catching scenario.
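The reward-system experiments described above revolve around trading positioning accuracy against joint speed. As an illustration only (the thesis's actual reward functions are not reproduced here), a shaped reaching reward might combine a distance term, a joint-velocity penalty, and a sparse success bonus; every name, weight, and tolerance below is hypothetical:

```python
import numpy as np

def reaching_reward(ee_pos, target_pos, joint_vel, reached_tol=0.05,
                    w_dist=1.0, w_vel=0.01, success_bonus=10.0):
    """Illustrative shaped reward for an end-effector reaching task:
    a dense negative-distance term, a small joint-velocity penalty, and
    a sparse bonus once the end effector is within tolerance of the target."""
    dist = np.linalg.norm(ee_pos - target_pos)
    reward = -w_dist * dist - w_vel * np.sum(np.square(joint_vel))
    if dist < reached_tol:
        reward += success_bonus
    return reward

# At the target with the arm at rest, the agent collects the full bonus;
# one metre away it only receives the dense distance penalty.
r_hit = reaching_reward(np.zeros(3), np.zeros(3), np.zeros(6))
r_far = reaching_reward(np.array([1.0, 0.0, 0.0]), np.zeros(3), np.zeros(6))
```

Tuning `w_vel` is one way to trade off fast motion against smoothness, which is the kind of adjustment the higher-joint-velocity experiments would require.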
Master's in Computer and Telematics Engineering
Duarte, Ana Filipa de Sampaio Calçada. "Using Reinforcement Learning in the tuning of Central Pattern Generators." Master's thesis, 2012. http://hdl.handle.net/1822/28037.
In this work, we apply Reinforcement Learning techniques to tasks involving learning and robot locomotion. Reinforcement Learning is a very useful learning technique for legged robot locomotion, due to its emphasis on direct interaction between the agent and the environment, and because it requires neither supervision nor complete models, in contrast with other classic approaches. Its aim is to decide which actions to take so as to maximize a cumulative reward or reinforcement signal, taking into account the fact that decisions may affect not only the immediate reward, but also future ones. This work studies and presents the Reinforcement Learning framework and its application to the tuning of Central Pattern Generators, with the aim of generating optimized robot locomotion. In order to investigate the strengths and abilities of Reinforcement Learning, and to demonstrate the learning process of such algorithms in a simple way, two case studies were implemented based on the state of the art. With regard to the main purpose of the thesis, two different solutions are addressed: a first one based on Natural Actor-Critic methods, and a second based on the Cross-Entropy Method. The latter algorithm proved capable of handling the integration of the two proposed approaches. The integration solutions were tested and validated using the Webots simulator and the DARwIN-OP robot model.
Book chapters on the topic "Actor-critic methods"
Shang, Wenling, Douwe van der Wal, Herke van Hoof, and Max Welling. "Stochastic Activation Actor Critic Methods." In Machine Learning and Knowledge Discovery in Databases, 103–17. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-46133-1_7.
Girgin, Sertan, and Philippe Preux. "Basis Expansion in Natural Actor Critic Methods." In Lecture Notes in Computer Science, 110–23. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-89722-4_9.
Holzleitner, Markus, Lukas Gruber, José Arjona-Medina, Johannes Brandstetter, and Sepp Hochreiter. "Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER." In Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVIII, 105–30. Berlin, Heidelberg: Springer Berlin Heidelberg, 2021. http://dx.doi.org/10.1007/978-3-662-63519-3_5.
Fernandez-Gauna, Borja, Igor Ansoategui, Ismael Etxeberria-Agiriano, and Manuel Graña. "An Empirical Study of Actor-Critic Methods for Feedback Controllers of Ball-Screw Drivers." In Natural and Artificial Computation in Engineering and Medical Applications, 441–50. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-38622-0_46.
Zai, Alexander, and Brandon Brown. "Bewältigung komplexerer Probleme mit Actor-Critic-Methoden." In Einstieg in Deep Reinforcement Learning, 121–49. München: Carl Hanser Verlag GmbH & Co. KG, 2020. http://dx.doi.org/10.3139/9783446466081.005.
Iima, Hitoshi, and Yasuaki Kuroe. "Swarm Reinforcement Learning Method Based on an Actor-Critic Method." In Lecture Notes in Computer Science, 279–88. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-17298-4_29.
Guo, Ziyue, Hongxu Hou, Nier Wu, and Shuo Sun. "Neural Machine Translation Based on Improved Actor-Critic Method." In Artificial Neural Networks and Machine Learning – ICANN 2020, 346–57. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-61616-8_28.
Cai, Jiarun. "WD3-MPER: A Method to Alleviate Approximation Bias in Actor-Critic." In Neural Information Processing, 713–24. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63833-7_60.
Wei, Bo, Hang Song, Quang Ngoc Nguyen, and Jiro Katto. "DASH Live Video Streaming Control Using Actor-Critic Reinforcement Learning Method." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 17–24. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-94763-7_2.
Xin, Guo-jing, Kai Zhang, Zhong-zheng Wang, Zi-feng Sun, Li-ming Zhang, Pi-yang Liu, Yong-fei Yang, Hai Sun, and Jun Yao. "Soft Actor-Critic Based Deep Reinforcement Learning Method for Production Optimization." In Springer Series in Geomechanics and Geoengineering, 353–66. Singapore: Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-0272-5_31.
Texte intégralActes de conférences sur le sujet "Actor-critic methods"
Miranda, Thiago S., and Heder S. Bernardino. "Distributional Safety Critic for Stochastic Latent Actor-Critic." In Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2023. http://dx.doi.org/10.5753/eniac.2023.234620.
Li, Jinke, Ruonan Rao, and Jun Shi. "Learning to Trade with Deep Actor Critic Methods." In 2018 11th International Symposium on Computational Intelligence and Design (ISCID). IEEE, 2018. http://dx.doi.org/10.1109/iscid.2018.10116.
Ding, Xu Chu, Jing Wang, Morteza Lahijanian, Ioannis Ch Paschalidis, and Calin A. Belta. "Temporal logic motion control using actor-critic methods." In 2012 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2012. http://dx.doi.org/10.1109/icra.2012.6225290.
Li, Xiaomu, and Quan Liu. "Master-Slave Policy Collaboration for Actor-Critic Methods." In 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022. http://dx.doi.org/10.1109/ijcnn55064.2022.9892603.
Fan, Zhou, Rui Su, Weinan Zhang, and Yong Yu. "Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/316.
Chun-Gui Li, Meng Wang, and Qing-Neng Yuan. "A Multi-agent Reinforcement Learning using Actor-Critic methods." In 2008 International Conference on Machine Learning and Cybernetics (ICMLC). IEEE, 2008. http://dx.doi.org/10.1109/icmlc.2008.4620528.
N, Sandeep Varma, Pradyumna Rahul K, and Vaishnavi Sinha. "Data augmented Approach to Optimizing Asynchronous Actor-Critic Methods." In 2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE). IEEE, 2022. http://dx.doi.org/10.1109/icdcece53908.2022.9792764.
Wan, Tianjiao, Haibo Mi, Zijian Gao, Yuanzhao Zhai, Bo Ding, and Dawei Feng. "Bi-level Multi-Agent Actor-Critic Methods with Transformers." In 2023 IEEE International Conference on Joint Cloud Computing (JCC). IEEE, 2023. http://dx.doi.org/10.1109/jcc59055.2023.00007.
Khemlichi, Firdaous, Houda Elyousfi Elfilali, Hiba Chougrad, Safae Elhaj Ben Ali, and Youness Idrissi Khamlichi. "Actor-Critic Methods in Stock Trading: A Comparative Study." In 2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME). IEEE, 2023. http://dx.doi.org/10.1109/iceccme57830.2023.10253277.
Peng, Peixi, Junliang Xing, and Lili Cao. "Hybrid Learning for Multi-agent Cooperation with Sub-optimal Demonstrations." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/420.