Academic literature on the topic 'Actor-critic methods'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Actor-critic methods.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Actor-critic methods":

1

Parisi, Simone, Voot Tangkaratt, Jan Peters, and Mohammad Emtiyaz Khan. "TD-regularized actor-critic methods." Machine Learning 108, no. 8-9 (February 21, 2019): 1467–501. http://dx.doi.org/10.1007/s10994-019-05788-0.

2

Wang, Jing, Xuchu Ding, Morteza Lahijanian, Ioannis Ch. Paschalidis, and Calin A. Belta. "Temporal logic motion control using actor–critic methods." International Journal of Robotics Research 34, no. 10 (May 26, 2015): 1329–44. http://dx.doi.org/10.1177/0278364915581505.

3

Grondman, I., M. Vaandrager, L. Busoniu, R. Babuska, and E. Schuitema. "Efficient Model Learning Methods for Actor–Critic Control." IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42, no. 3 (June 2012): 591–602. http://dx.doi.org/10.1109/tsmcb.2011.2170565.

4

Wang, Mingyi, Jianhao Tang, Haoli Zhao, Zhenni Li, and Shengli Xie. "Automatic Compression of Neural Network with Deep Reinforcement Learning Based on Proximal Gradient Method." Mathematics 11, no. 2 (January 9, 2023): 338. http://dx.doi.org/10.3390/math11020338.

Abstract:
In recent years, the model compression technique is very effective for deep neural network compression. However, many existing model compression methods rely heavily on human experience to explore a compression strategy between network structure, speed, and accuracy, which is usually suboptimal and time-consuming. In this paper, we propose a framework for automatically compressing models through the actor–critic structured deep reinforcement learning (DRL) which interacts with each layer in the neural network, where the actor network determines the compression strategy and the critic network ensures the decision accuracy of the actor network through predicted values, thus improving the compression quality of the network. To enhance the prediction performance of the critic network, we impose the L1 norm regularizer on the weights of the critic network to obtain a distinct activation output feature on the representation, thus enhancing the prediction accuracy of the critic network. Moreover, to improve the decision performance of the actor network, we impose the L1 norm regularizer on the weights of the actor network to improve the decision accuracy of the actor network by removing the redundant weights in the actor network. Furthermore, to improve the training efficiency, we use the proximal gradient method to optimize the weights of the actor network and the critic network, which can obtain an effective weight solution and thus improve the compression performance. In the experiment, in MNIST datasets, the proposed method has only a 0.2% loss of accuracy when compressing more than 70% of neurons. Similarly, in CIFAR-10 datasets, the proposed method compresses more than 60% of neurons, with only 7.1% accuracy loss, which is superior to other existing methods. In terms of efficiency, the proposed method also cost the lowest time among the existing methods.
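As background for the optimization step described in the abstract above: for an L1 penalty, the proximal operator reduces to soft-thresholding of the weights after an ordinary gradient step. The following is a minimal NumPy sketch of such a proximal gradient update, not the authors' implementation; the array shapes and the `lr`/`lam` values are illustrative.

```python
import numpy as np

def soft_threshold(w, thresh):
    """Proximal operator of the L1 norm: shrink weights toward zero."""
    return np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)

def proximal_gradient_step(w, grad, lr=1e-3, lam=1e-4):
    """One proximal gradient update for a loss with an L1 penalty lam * ||w||_1."""
    w = w - lr * grad                    # gradient step on the smooth part of the loss
    return soft_threshold(w, lr * lam)   # prox step handles the non-smooth L1 term

# Illustrative use on a single weight matrix of an actor (or critic) network.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))
grad = rng.normal(size=(64, 32))         # stands in for an actor/critic gradient
w = proximal_gradient_step(w, grad)
print("fraction of exactly-zero weights:", np.mean(w == 0.0))
```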
5

Su, Jianyu, Stephen Adams, and Peter Beling. "Value-Decomposition Multi-Agent Actor-Critics." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (May 18, 2021): 11352–60. http://dx.doi.org/10.1609/aaai.v35i13.17353.

Abstract:
The exploitation of extra state information has been an active research area in multi-agent reinforcement learning (MARL). QMIX represents the joint action-value using a non-negative function approximator and achieves the best performance on the StarCraft II micromanagement testbed, a common MARL benchmark. However, our experiments demonstrate that, in some cases, QMIX performs sub-optimally with the A2C framework, a training paradigm that promotes algorithm training efficiency. To obtain a reasonable trade-off between training efficiency and algorithm performance, we extend value-decomposition to actor-critic methods that are compatible with A2C and propose a novel actor-critic framework, value-decomposition actor-critic (VDAC). We evaluate VDAC on the StarCraft II micromanagement task and demonstrate that the proposed framework improves median performance over other actor-critic methods. Furthermore, we use a set of ablation experiments to identify the key factors that contribute to the performance of VDAC.
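For orientation, the value-decomposition idea described in the abstract above can be reduced to a tiny sketch: each agent keeps a local state-value estimate, and the centralized critic is their sum, which then serves as a shared A2C-style baseline. The linear critics and dimensions below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, obs_dim = 3, 8

# One linear local critic per agent: V_i(o_i) = w_i . o_i
local_critics = [rng.normal(size=obs_dim) for _ in range(n_agents)]

def local_values(observations):
    return np.array([w @ o for w, o in zip(local_critics, observations)])

def joint_value(observations):
    # Value decomposition: the joint state value is the sum of local values,
    # keeping the centralized critic consistent with decentralized actors.
    return local_values(observations).sum()

obs = [rng.normal(size=obs_dim) for _ in range(n_agents)]
v_tot = joint_value(obs)
reward_to_go = 1.7                        # illustrative n-step return
advantage = reward_to_go - v_tot          # A2C-style advantage shared across agents
print(v_tot, advantage)
```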
6

Saglam, Baturay, Furkan B. Mutlu, Dogan C. Cicek, and Suleyman S. Kozat. "Actor Prioritized Experience Replay." Journal of Artificial Intelligence Research 78 (November 16, 2023): 639–72. http://dx.doi.org/10.1613/jair.1.14819.

Abstract:
A widely-studied deep reinforcement learning (RL) technique known as Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error. Although it has been shown that PER is one of the most crucial components for the overall performance of deep RL methods in discrete action domains, many empirical studies indicate that it considerably underperforms off-policy actor-critic algorithms. We theoretically show that actor networks cannot be effectively trained with transitions that have large TD errors. As a result, the approximate policy gradient computed under the Q-network diverges from the actual gradient computed under the optimal Q-function. Motivated by this, we introduce a novel experience replay sampling framework for actor-critic methods, which also regards issues with stability and recent findings behind the poor empirical performance of PER. The introduced algorithm suggests a new branch of improvements to PER and schedules effective and efficient training for both actor and critic networks. An extensive set of experiments verifies our theoretical findings, showing that our method outperforms competing approaches and achieves state-of-the-art results over the standard off-policy actor-critic algorithms.
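For context, the baseline the abstract refers to, proportional Prioritized Experience Replay, samples transitions with probability proportional to |TD error|^alpha and corrects the induced bias with importance-sampling weights. The sketch below shows only that standard scheme, not the corrected sampling rule proposed in the paper; `alpha`, `beta`, and the buffer size are illustrative.

```python
import numpy as np

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6):
    """Proportional prioritized sampling with importance-sampling weights."""
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    idx = np.random.choice(len(td_errors), size=batch_size, p=probs)
    # Importance weights correct the bias introduced by non-uniform sampling.
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights

td_errors = np.random.randn(10_000)       # stand-in for stored TD errors
idx, w = per_sample(td_errors, batch_size=256)
```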
7

Seo, Kanghyeon, and Jihoon Yang. "Differentially Private Actor and Its Eligibility Trace." Electronics 9, no. 9 (September 10, 2020): 1486. http://dx.doi.org/10.3390/electronics9091486.

Abstract:
We present a differentially private actor and its eligibility trace in an actor-critic approach, wherein an actor takes actions directly interacting with an environment; however, the critic estimates only the state values that are obtained through bootstrapping. In other words, the actor reflects the more detailed information about the sequence of taken actions on its parameter than the critic. Moreover, their corresponding eligibility traces have the same properties. Therefore, it is necessary to preserve the privacy of an actor and its eligibility trace while training on private or sensitive data. In this paper, we confirm the applicability of differential privacy methods to the actors updated using the policy gradient algorithm and discuss the advantages of such an approach with regard to differentially private critic learning. In addition, we measured the cosine similarity between the differentially private applied eligibility trace and the non-differentially private eligibility trace to analyze whether their anonymity is appropriately protected in the differentially private actor or the critic. We conducted the experiments considering two synthetic examples imitating real-world problems in medical and autonomous navigation domains, and the results confirmed the feasibility of the proposed method.
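A common mechanism for making a policy-gradient update differentially private, in the spirit of DP-SGD, is to clip each per-sample gradient and add calibrated Gaussian noise. The sketch below illustrates that generic mechanism applied to an actor's gradient estimate; it is not the authors' exact procedure, and the clip norm and noise multiplier are illustrative.

```python
import numpy as np

def dp_actor_gradient(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1,
                      rng=np.random.default_rng(0)):
    """Clip per-sample policy gradients and add Gaussian noise before averaging."""
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm / len(per_sample_grads),
                       size=mean_grad.shape)
    return mean_grad + noise

grads = [np.random.randn(32) for _ in range(64)]   # per-trajectory actor gradients
private_grad = dp_actor_gradient(grads)            # used for the actor parameter update
```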
8

Saglam, Baturay, Furkan Mutlu, Dogan Cicek, and Suleyman Kozat. "Actor Prioritized Experience Replay (Abstract Reprint)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 20 (March 24, 2024): 22710. http://dx.doi.org/10.1609/aaai.v38i20.30610.

Abstract:
A widely-studied deep reinforcement learning (RL) technique known as Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error. Although it has been shown that PER is one of the most crucial components for the overall performance of deep RL methods in discrete action domains, many empirical studies indicate that it considerably underperforms off-policy actor-critic algorithms. We theoretically show that actor networks cannot be effectively trained with transitions that have large TD errors. As a result, the approximate policy gradient computed under the Q-network diverges from the actual gradient computed under the optimal Q-function. Motivated by this, we introduce a novel experience replay sampling framework for actor-critic methods, which also regards issues with stability and recent findings behind the poor empirical performance of PER. The introduced algorithm suggests a new branch of improvements to PER and schedules effective and efficient training for both actor and critic networks. An extensive set of experiments verifies our theoretical findings, showing that our method outperforms competing approaches and achieves state-of-the-art results over the standard off-policy actor-critic algorithms.
9

Hafez, Muhammad Burhan, Cornelius Weber, Matthias Kerzel, and Stefan Wermter. "Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning." Paladyn, Journal of Behavioral Robotics 10, no. 1 (January 1, 2019): 14–29. http://dx.doi.org/10.1515/pjbr-2019-0005.

Abstract:
In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than the existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm for the task of learning robotic reaching and grasping skills on a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach can achieve better performance than the compared state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings.
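The reward shaping described above can be summarized generically: an intrinsic signal derived from an ensemble of predictive world models is added to the extrinsic reward that the actor-critic learner optimizes. In the sketch below the intrinsic term is a simple ensemble prediction-error proxy rather than the learning-progress measure used in the paper, and `beta` is an illustrative weighting.

```python
import numpy as np

def intrinsic_reward(next_state, predictions):
    """Ensemble prediction error of the world models as an intrinsic signal."""
    errors = [np.linalg.norm(p - next_state) for p in predictions]
    return float(np.mean(errors))

def shaped_reward(extrinsic, next_state, predictions, beta=0.5):
    # The actor-critic learner is trained on the combined signal.
    return extrinsic + beta * intrinsic_reward(next_state, predictions)

next_state = np.random.randn(16)
ensemble_predictions = [next_state + 0.1 * np.random.randn(16) for _ in range(5)]
r = shaped_reward(extrinsic=0.0, next_state=next_state,
                  predictions=ensemble_predictions)
```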
10

Kong, Minseok, and Jungmin So. "Empirical Analysis of Automated Stock Trading Using Deep Reinforcement Learning." Applied Sciences 13, no. 1 (January 3, 2023): 633. http://dx.doi.org/10.3390/app13010633.

Abstract:
There are several automated stock trading programs using reinforcement learning, one of which is an ensemble strategy. The main idea of the ensemble strategy is to train DRL agents and make an ensemble with three different actor–critic algorithms: Advantage Actor–Critic (A2C), Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO). This novel idea was the concept mainly used in this paper. However, we did not stop there, but we refined the automated stock trading in two areas. First, we made another DRL-based ensemble and employed it as a new trading agent. We named it Remake Ensemble, and it combines not only A2C, DDPG, and PPO but also Actor–Critic using Kronecker-Factored Trust Region (ACKTR), Soft Actor–Critic (SAC), Twin Delayed DDPG (TD3), and Trust Region Policy Optimization (TRPO). Furthermore, we expanded the application domain of automated stock trading. Although the existing stock trading method treats only 30 Dow Jones stocks, ours handles KOSPI stocks, JPX stocks, and Dow Jones stocks. We conducted experiments with our modified automated stock trading system to validate its robustness in terms of cumulative return. Finally, we suggested some methods to gain relatively stable profits following the experiments.

Dissertations / Theses on the topic "Actor-critic methods":

1

Barakat, Anas. "Contributions to non-convex stochastic optimization and reinforcement learning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAT030.

Abstract:
This thesis is focused on the convergence analysis of some popular stochastic approximation methods in use in the machine learning community with applications to optimization and reinforcement learning. The first part of the thesis is devoted to a popular algorithm in deep learning called ADAM used for training neural networks. This variant of stochastic gradient descent is more generally useful for finding a local minimizer of a function. Assuming that the objective function is differentiable and non-convex, we establish the convergence of the iterates in the long run to the set of critical points under a stability condition in the constant stepsize regime. Then, we introduce a novel decreasing stepsize version of ADAM. Under mild assumptions, it is shown that the iterates are almost surely bounded and converge almost surely to critical points of the objective function. Finally, we analyze the fluctuations of the algorithm by means of a conditional central limit theorem. In the second part of the thesis, in the vanishing stepsizes regime, we generalize our convergence and fluctuations results to a stochastic optimization procedure unifying several variants of the stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm, and the widely used ADAM algorithm. We conclude this second part by an avoidance of traps result establishing the non-convergence of the general algorithm to undesired critical points, such as local maxima or saddle points. Here, the main ingredient is a new avoidance of traps result for non-autonomous settings, which is of independent interest. Finally, the last part of this thesis, which is independent from the two previous parts, is concerned with the analysis of a stochastic approximation algorithm for reinforcement learning. In this last part, we propose an analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single timescale temporal difference (TD) learning algorithm as a critic, we use a two timescales target-based version of TD learning closely inspired from practical actor-critic algorithms implementing target networks. First, we establish asymptotic convergence results for both the critic and the actor under Markovian sampling. Then, we provide a finite-time analysis showing the impact of incorporating a target network into actor-critic methods.
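The three-timescale structure analyzed in the last part of the thesis (a fast online critic, a slowly tracking target critic, and a still slower actor) can be sketched with linear function approximation. The step sizes and Polyak factor below are illustrative, not the ones required by the convergence analysis.

```python
import numpy as np

def td_update_with_target(w_online, w_target, phi_s, phi_s_next, reward,
                          gamma=0.99, lr_critic=1e-2, tau=1e-3):
    """One target-based TD(0) step with linear value estimates V(s) = w . phi(s)."""
    td_error = reward + gamma * (w_target @ phi_s_next) - (w_online @ phi_s)
    w_online = w_online + lr_critic * td_error * phi_s     # fast timescale
    w_target = (1 - tau) * w_target + tau * w_online       # slow (target) timescale
    return w_online, w_target, td_error

# The actor would move on an even slower timescale, using td_error as an
# advantage estimate, e.g. theta += lr_actor * td_error * grad_log_pi.
rng = np.random.default_rng(2)
w_online = rng.normal(size=10)
w_target = w_online.copy()
phi_s, phi_s_next = rng.normal(size=10), rng.normal(size=10)
w_online, w_target, delta = td_update_with_target(w_online, w_target,
                                                  phi_s, phi_s_next, reward=1.0)
```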
2

Pereira, Bruno Alexandre Barbosa. "Deep reinforcement learning for robotic manipulation tasks." Master's thesis, 2021. http://hdl.handle.net/10773/33654.

Abstract:
The recent advances in Artificial Intelligence (AI) present new opportunities for robotics on many fronts. Deep Reinforcement Learning (DRL) is a sub-field of AI which results from the combination of Deep Learning (DL) and Reinforcement Learning (RL). It categorizes machine learning algorithms which learn directly from experience and offers a comprehensive framework for studying the interplay among learning, representation and decision-making. It has already been successfully used to solve tasks in many domains. Most notably, DRL agents learned to play Atari 2600 video games directly from pixels and achieved human-comparable performance in 49 of those games. Additionally, recent efforts using DRL in conjunction with other techniques produced agents capable of playing the board game of Go at a professional level, which has long been viewed as an intractable problem due to its enormous search space. In the context of robotics, DRL is often applied to planning, navigation, optimal control and others. Here, the powerful function approximation and representation learning properties of Deep Neural Networks enable RL to scale up to problems with high-dimensional state and action spaces. Additionally, inherent properties of DRL make transfer learning useful when moving from simulation to the real world. This dissertation aims to investigate the applicability and effectiveness of DRL to learn successful policies on the domain of robot manipulator tasks. Initially, a set of three classic RL problems were solved using RL and DRL algorithms in order to explore their practical implementation and arrive at a class of algorithms appropriate for these robotic tasks. Afterwards, a task in simulation is defined such that an agent is set to control a 6 DoF manipulator to reach a target with its end effector. This is used to evaluate the effects on performance of different state representations, hyperparameters and state-of-the-art DRL algorithms, resulting in agents with high success rates. The emphasis is then placed on the speed and time restrictions of the end effector's positioning. To this end, different reward systems were tested for an agent learning a modified version of the previous reaching task with faster joint speeds. In this setting, a number of improvements were verified in relation to the original reward system. Finally, an application of the best reaching agent obtained from the previous experiments is demonstrated on a simplified ball catching scenario.
Master's degree in Computer and Telematics Engineering
3

Duarte, Ana Filipa de Sampaio Calçada. "Using Reinforcement Learning in the tuning of Central Pattern Generators." Master's thesis, 2012. http://hdl.handle.net/1822/28037.

Abstract:
Master's dissertation in Informatics Engineering
This work applies Reinforcement Learning techniques to tasks involving learning and robot locomotion. Reinforcement Learning is a very useful learning technique with regard to legged robot locomotion, due to its ability to provide direct interaction between the agent and the environment and the fact that it does not require supervision or complete models, in contrast with other classic approaches. Its aim consists in making decisions about which actions to take so as to maximize a cumulative reward or reinforcement signal, taking into account the fact that decisions may affect not only the immediate reward but also future ones. This work studies and presents the Reinforcement Learning framework and its application in the tuning of Central Pattern Generators, with the aim of generating optimized robot locomotion. In order to investigate the strengths and abilities of Reinforcement Learning, and to demonstrate in a simple way the learning process of such algorithms, two case studies were implemented based on the state of the art. With regard to the main purpose of the thesis, two different solutions are addressed: a first one based on Natural Actor-Critic methods, and a second based on the Cross-Entropy Method. This last algorithm was found to be capable of handling the integration of the two proposed approaches. The integration solutions were tested and validated using the Webots simulator and the DARwIN-OP robot model.
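The Cross-Entropy Method used in the second solution has a compact generic form: sample candidate parameter vectors from a Gaussian, evaluate each with a locomotion rollout, keep the elite fraction, and refit the Gaussian. In the sketch below, `evaluate_gait` is a hypothetical stand-in for a Webots/DARwIN-OP rollout, and the population sizes are illustrative.

```python
import numpy as np

def evaluate_gait(params):
    """Placeholder for a simulator rollout returning, e.g., forward velocity."""
    return -np.sum((params - 0.3) ** 2)   # toy objective standing in for a rollout

def cross_entropy_method(dim, iterations=50, pop_size=64, elite_frac=0.2):
    mean, std = np.zeros(dim), np.ones(dim)
    n_elite = int(pop_size * elite_frac)
    for _ in range(iterations):
        samples = mean + std * np.random.randn(pop_size, dim)    # candidate CPG params
        returns = np.array([evaluate_gait(s) for s in samples])
        elite = samples[np.argsort(returns)[-n_elite:]]          # best candidates
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6 # refit the Gaussian
    return mean

best_params = cross_entropy_method(dim=8)
```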

Book chapters on the topic "Actor-critic methods":

1

Shang, Wenling, Douwe van der Wal, Herke van Hoof, and Max Welling. "Stochastic Activation Actor Critic Methods." In Machine Learning and Knowledge Discovery in Databases, 103–17. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-46133-1_7.

2

Girgin, Sertan, and Philippe Preux. "Basis Expansion in Natural Actor Critic Methods." In Lecture Notes in Computer Science, 110–23. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-89722-4_9.

3

Holzleitner, Markus, Lukas Gruber, José Arjona-Medina, Johannes Brandstetter, and Sepp Hochreiter. "Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER." In Transactions on Large-Scale Data- and Knowledge-Centered Systems XLVIII, 105–30. Berlin, Heidelberg: Springer Berlin Heidelberg, 2021. http://dx.doi.org/10.1007/978-3-662-63519-3_5.

4

Fernandez-Gauna, Borja, Igor Ansoategui, Ismael Etxeberria-Agiriano, and Manuel Graña. "An Empirical Study of Actor-Critic Methods for Feedback Controllers of Ball-Screw Drivers." In Natural and Artificial Computation in Engineering and Medical Applications, 441–50. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-38622-0_46.

5

Zai, Alexander, and Brandon Brown. "Bewältigung komplexerer Probleme mit Actor-Critic-Methoden" [Tackling more complex problems with actor-critic methods]. In Einstieg in Deep Reinforcement Learning, 121–49. München: Carl Hanser Verlag GmbH & Co. KG, 2020. http://dx.doi.org/10.3139/9783446466081.005.

6

Iima, Hitoshi, and Yasuaki Kuroe. "Swarm Reinforcement Learning Method Based on an Actor-Critic Method." In Lecture Notes in Computer Science, 279–88. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-17298-4_29.

7

Guo, Ziyue, Hongxu Hou, Nier Wu, and Shuo Sun. "Neural Machine Translation Based on Improved Actor-Critic Method." In Artificial Neural Networks and Machine Learning – ICANN 2020, 346–57. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-61616-8_28.

8

Cai, Jiarun. "WD3-MPER: A Method to Alleviate Approximation Bias in Actor-Critic." In Neural Information Processing, 713–24. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-63833-7_60.

9

Wei, Bo, Hang Song, Quang Ngoc Nguyen, and Jiro Katto. "DASH Live Video Streaming Control Using Actor-Critic Reinforcement Learning Method." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 17–24. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-94763-7_2.

10

Xin, Guo-jing, Kai Zhang, Zhong-zheng Wang, Zi-feng Sun, Li-ming Zhang, Pi-yang Liu, Yong-fei Yang, Hai Sun, and Jun Yao. "Soft Actor-Critic Based Deep Reinforcement Learning Method for Production Optimization." In Springer Series in Geomechanics and Geoengineering, 353–66. Singapore: Springer Nature Singapore, 2024. http://dx.doi.org/10.1007/978-981-97-0272-5_31.


Conference papers on the topic "Actor-critic methods":

1

Miranda, Thiago S., and Heder S. Bernardino. "Distributional Safety Critic for Stochastic Latent Actor-Critic." In Encontro Nacional de Inteligência Artificial e Computacional. Sociedade Brasileira de Computação - SBC, 2023. http://dx.doi.org/10.5753/eniac.2023.234620.

Abstract:
When employing reinforcement learning techniques in real-world applications, one may desire to constrain the agent by limiting actions that lead to potential damage, harm, or unwanted scenarios. Particularly, recent approaches focus on developing safe behavior under partial observability conditions. In this vein, we develop a method that combines distributional reinforcement learning techniques with methods used to facilitate learning in partially observable environments, called distributional safe stochastic latent actor-critic (DS-SLAC). We evaluate the DS-SLAC performance on four Safety-Gym tasks and DS-SLAC obtained results better than those reached by state-of-the-art algorithms in two of the evaluated environments while being able to develop a safe policy in three of them. Lastly, we also identify the main challenges of performing distributional reinforcement learning in the safety-constrained partially observable setting.
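As background for the distributional safety critic mentioned above: representing the cost return by a set of quantiles lets a risk measure such as CVaR be read off directly. The sketch below shows only that generic representation, not the DS-SLAC training procedure; the quantile count and `alpha` are illustrative.

```python
import numpy as np

def cvar_from_quantiles(quantiles, alpha=0.1):
    """Conditional value-at-risk of the cost distribution: mean of the worst alpha tail."""
    q = np.sort(quantiles)
    n_tail = max(1, int(np.ceil(alpha * len(q))))
    return q[-n_tail:].mean()              # the highest costs form the risky tail

# 32 quantile estimates of discounted cost, as a distributional critic might output.
cost_quantiles = np.random.gamma(shape=2.0, scale=1.0, size=32)
risk = cvar_from_quantiles(cost_quantiles, alpha=0.1)
# A safe policy update would then constrain or penalize `risk` rather than the mean cost.
```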
2

Li, Jinke, Ruonan Rao, and Jun Shi. "Learning to Trade with Deep Actor Critic Methods." In 2018 11th International Symposium on Computational Intelligence and Design (ISCID). IEEE, 2018. http://dx.doi.org/10.1109/iscid.2018.10116.

3

Ding, Xu Chu, Jing Wang, Morteza Lahijanian, Ioannis Ch. Paschalidis, and Calin A. Belta. "Temporal logic motion control using actor-critic methods." In 2012 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2012. http://dx.doi.org/10.1109/icra.2012.6225290.

4

Li, Xiaomu, and Quan Liu. "Master-Slave Policy Collaboration for Actor-Critic Methods." In 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022. http://dx.doi.org/10.1109/ijcnn55064.2022.9892603.

5

Fan, Zhou, Rui Su, Weinan Zhang, and Yong Yu. "Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/316.

Abstract:
In this paper we propose a hybrid architecture of actor-critic algorithms for reinforcement learning in parameterized action space, which consists of multiple parallel sub-actor networks to decompose the structured action space into simpler action spaces along with a critic network to guide the training of all sub-actor networks. While this paper is mainly focused on parameterized action space, the proposed architecture, which we call hybrid actor-critic, can be extended for more general action spaces which has a hierarchical structure. We present an instance of the hybrid actor-critic architecture based on proximal policy optimization (PPO), which we refer to as hybrid proximal policy optimization (H-PPO). Our experiments test H-PPO on a collection of tasks with parameterized action space, where H-PPO demonstrates superior performance over previous methods of parameterized action reinforcement learning.
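The decomposition described in the abstract, a discrete head that picks the action type and parallel continuous heads that produce its parameters, can be illustrated with a small stochastic-policy sketch. The linear heads, dimensions, and Gaussian parameter noise below are assumptions for illustration rather than the H-PPO architecture.

```python
import numpy as np

rng = np.random.default_rng(3)
state_dim, n_discrete, param_dims = 6, 3, [2, 1, 4]   # e.g. (kick, turn, dash)

# Linear heads standing in for the parallel sub-actor networks.
W_discrete = rng.normal(size=(n_discrete, state_dim))
W_params = [rng.normal(size=(d, state_dim)) for d in param_dims]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def hybrid_policy(state):
    # Discrete sub-actor: pick which parameterized action to execute.
    probs = softmax(W_discrete @ state)
    k = rng.choice(n_discrete, p=probs)
    # Continuous sub-actor for that action: Gaussian over its parameters.
    mu = W_params[k] @ state
    params = mu + 0.1 * rng.normal(size=mu.shape)
    return k, params

action_type, action_params = hybrid_policy(rng.normal(size=state_dim))
```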
6

Li, Chun-Gui, Meng Wang, and Qing-Neng Yuan. "A Multi-agent Reinforcement Learning using Actor-Critic methods." In 2008 International Conference on Machine Learning and Cybernetics (ICMLC). IEEE, 2008. http://dx.doi.org/10.1109/icmlc.2008.4620528.

7

N, Sandeep Varma, Pradyumna Rahul K, and Vaishnavi Sinha. "Data augmented Approach to Optimizing Asynchronous Actor-Critic Methods." In 2022 IEEE International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE). IEEE, 2022. http://dx.doi.org/10.1109/icdcece53908.2022.9792764.

8

Wan, Tianjiao, Haibo Mi, Zijian Gao, Yuanzhao Zhai, Bo Ding, and Dawei Feng. "Bi-level Multi-Agent Actor-Critic Methods with Transformers." In 2023 IEEE International Conference on Joint Cloud Computing (JCC). IEEE, 2023. http://dx.doi.org/10.1109/jcc59055.2023.00007.

9

Khemlichi, Firdaous, Houda Elyousfi Elfilali, Hiba Chougrad, Safae Elhaj Ben Ali, and Youness Idrissi Khamlichi. "Actor-Critic Methods in Stock Trading: A Comparative Study." In 2023 3rd International Conference on Electrical, Computer, Communications and Mechatronics Engineering (ICECCME). IEEE, 2023. http://dx.doi.org/10.1109/iceccme57830.2023.10253277.

10

Peng, Peixi, Junliang Xing, and Lili Cao. "Hybrid Learning for Multi-agent Cooperation with Sub-optimal Demonstrations." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/420.

Abstract:
This paper aims to learn multi-agent cooperation where each agent performs its actions in a decentralized way. In this case, it is very challenging to learn decentralized policies when the rewards are global and sparse. Recently, learning from demonstrations (LfD) provides a promising way to handle this challenge. However, in many practical tasks, the available demonstrations are often sub-optimal. To learn better policies from these sub-optimal demonstrations, this paper follows a centralized learning and decentralized execution framework and proposes a novel hybrid learning method based on multi-agent actor-critic. At first, the expert trajectory returns generated from demonstration actions are used to pre-train the centralized critic network. Then, multi-agent decisions are made by best response dynamics based on the critic and used to train the decentralized actor networks. Finally, the demonstrations are updated by the actor networks, and the critic and actor networks are learned jointly by running the above two steps alternately. We evaluate the proposed approach on a real-time strategy combat game. Experimental results show that the approach outperforms many competing demonstration-based methods.
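The first step described above, pre-training the centralized critic on returns computed from demonstration trajectories, can be sketched as a simple regression with a linear critic; the features, reward data, and least-squares fit below are illustrative assumptions, not the paper's network or training loop.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Trajectory returns computed backwards from demonstration rewards."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return np.array(out[::-1])

# Pre-train a linear centralized critic V(s) = w . phi(s) on demonstration returns.
rng = np.random.default_rng(4)
phi = rng.normal(size=(200, 12))                  # joint-state features from demos
targets = discounted_returns(rng.normal(size=200))
w, *_ = np.linalg.lstsq(phi, targets, rcond=None)
# Decentralized actors would then be trained against this critic, and the
# demonstrations refreshed from the improved actors, alternating the two steps.
```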
