Ready-made bibliography on the topic "Actor-critic algorithm"

Create a correct reference in APA, MLA, Chicago, Harvard, and many other styles


See the lists of current articles, books, dissertations, abstracts, and other scholarly sources on the topic "Actor-critic algorithm".

An "Add to bibliography" button is available next to every work in the bibliography. Use it, and we will automatically create a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a scholarly publication as a ".pdf" file and read the annotation of the work online, if the relevant parameters are available in the metadata.

Journal articles on the topic "Actor-critic algorithm"

1

Wang, Jing, and Ioannis Ch. Paschalidis. "An Actor-Critic Algorithm With Second-Order Actor and Critic." IEEE Transactions on Automatic Control 62, no. 6 (2017): 2689–703. http://dx.doi.org/10.1109/tac.2016.2616384.

2

Zheng, Liyuan, Tanner Fiez, Zane Alumbaugh, Benjamin Chasnov, and Lillian J. Ratliff. "Stackelberg Actor-Critic: Game-Theoretic Reinforcement Learning Algorithms." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (2022): 9217–24. http://dx.doi.org/10.1609/aaai.v36i8.20908.

Abstract:
The hierarchical interaction between the actor and critic in actor-critic based reinforcement learning algorithms naturally lends itself to a game-theoretic interpretation. We adopt this viewpoint and model the actor and critic interaction as a two-player general-sum game with a leader-follower structure known as a Stackelberg game. Given this abstraction, we propose a meta-framework for Stackelberg actor-critic algorithms where the leader player follows the total derivative of its objective instead of the usual individual gradient. From a theoretical standpoint, we develop a policy gradient t
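For orientation, a hedged LaTeX sketch (our notation, not necessarily the authors') of the leader update hinted at in the abstract: the actor, as Stackelberg leader with parameters \theta, differentiates through the critic's best response w^*(\theta) instead of following only its individual gradient.

% Critic (follower) best response: w^*(\theta) = \arg\min_w f(\theta, w).
% The leader follows the total derivative of its objective L, which by the
% implicit function theorem can be written as
\frac{\mathrm{d}}{\mathrm{d}\theta} L\bigl(\theta, w^*(\theta)\bigr)
  = \nabla_\theta L(\theta, w)
  - \nabla^2_{\theta w} f(\theta, w)\,\bigl(\nabla^2_{w} f(\theta, w)\bigr)^{-1}\,\nabla_w L(\theta, w)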
3

Iwaki, Ryo, and Minoru Asada. "Implicit incremental natural actor critic algorithm." Neural Networks 109 (January 2019): 103–12. http://dx.doi.org/10.1016/j.neunet.2018.10.007.

4

Kim, Gi-Soo, Jane P. Kim, and Hyun-Joon Yang. "Robust Tests in Online Decision-Making." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (2022): 10016–24. http://dx.doi.org/10.1609/aaai.v36i9.21240.

Abstract:
Bandit algorithms are widely used in sequential decision problems to maximize the cumulative reward. One potential application is mobile health, where the goal is to promote the user's health through personalized interventions based on user specific information acquired through wearable devices. Important considerations include the type of, and frequency with which data is collected (e.g. GPS, or continuous monitoring), as such factors can severely impact app performance and users’ adherence. In order to balance the need to collect data that is useful with the constraint of impacting app perfo
5

Denisov, Sergey, and Jee-Hyong Lee. "Actor-Critic Algorithm with Transition Cost Estimation." International Journal of Fuzzy Logic and Intelligent Systems 16, no. 4 (2016): 270–75. http://dx.doi.org/10.5391/ijfis.2016.16.4.270.

6

Ahmed, Ayman Elshabrawy M. "Controller parameter tuning using actor-critic algorithm." IOP Conference Series: Materials Science and Engineering 610 (October 11, 2019): 012054. http://dx.doi.org/10.1088/1757-899x/610/1/012054.

7

Ding, Siyuan, Shengxiang Li, Guangyi Liu, et al. "Decentralized Multiagent Actor-Critic Algorithm Based on Message Diffusion." Journal of Sensors 2021 (December 8, 2021): 1–14. http://dx.doi.org/10.1155/2021/8739206.

Abstract:
The exponential explosion of joint actions and massive data collection are two main challenges in multiagent reinforcement learning algorithms with centralized training. To overcome these problems, in this paper, we propose a model-free and fully decentralized actor-critic multiagent reinforcement learning algorithm based on message diffusion. To this end, the agents are assumed to be placed in a time-varying communication network. Each agent makes limited observations regarding the global state and joint actions; therefore, it needs to obtain and share information with others over the network
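As a rough illustration only (the function and variable names below are hypothetical, not taken from the paper), a decentralized critic update of this kind typically interleaves local TD learning with a consensus/diffusion step over each agent's current neighbors in the time-varying communication network:

def diffusion_step(critic_params, neighbors, mix_weights):
    """One generic consensus/diffusion step: every agent mixes its local critic
    parameter vector with those of its current neighbors. `neighbors[i]` lists
    agent i's neighbors at this time step, and `mix_weights[i][j]` are
    nonnegative coefficients that sum to one over {i} and neighbors[i]."""
    mixed = []
    for i, theta in enumerate(critic_params):
        acc = mix_weights[i][i] * theta
        for j in neighbors[i]:
            acc = acc + mix_weights[i][j] * critic_params[j]
        mixed.append(acc)
    return mixed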
8

Hafez, Muhammad Burhan, Cornelius Weber, Matthias Kerzel, and Stefan Wermter. "Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning." Paladyn, Journal of Behavioral Robotics 10, no. 1 (2019): 14–29. http://dx.doi.org/10.1515/pjbr-2019-0005.

Abstract:
In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the
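A minimal sketch of the kind of reward combination the abstract describes (learning progress of an ensemble of world models used as intrinsic reward); the names and the weighting scheme are assumptions, not the paper's exact design:

def combined_reward(extrinsic_reward, errors_before, errors_after, beta=0.5):
    """Intrinsic reward as learning progress: the drop in the ensemble's mean
    prediction error after a model update, floored at zero, then mixed with
    the extrinsic task reward. `beta` trades off the two signals."""
    n = len(errors_before)
    learning_progress = (sum(errors_before) - sum(errors_after)) / n
    intrinsic_reward = max(0.0, learning_progress)
    return extrinsic_reward + beta * intrinsic_reward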
9

Zhang, Haifei, Jian Xu, Jian Zhang, and Quan Liu. "Network Architecture for Optimizing Deep Deterministic Policy Gradient Algorithms." Computational Intelligence and Neuroscience 2022 (November 18, 2022): 1–10. http://dx.doi.org/10.1155/2022/1117781.

Abstract:
The traditional Deep Deterministic Policy Gradient (DDPG) algorithm has been widely used in continuous action spaces, but it still suffers from the problems of easily falling into local optima and large error fluctuations. Aiming at these deficiencies, this paper proposes a dual-actor-dual-critic DDPG algorithm (DN-DDPG). First, on the basis of the original actor-critic network architecture of the algorithm, a critic network is added to assist the training, and the smallest Q value of the two critic networks is taken as the estimated value of the action in each update. Reduce the probability o
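For reference, a small sketch (our own names, not the paper's code) of the clipped double-critic TD target the abstract refers to, where the smaller of the two critics' estimates bounds the update:

def dual_critic_target(reward, done, next_q1, next_q2, gamma=0.99):
    """TD target built from the smaller of two critic estimates of the next
    state-action value, which damps overestimation during training."""
    next_q = min(next_q1, next_q2)
    return reward + gamma * (0.0 if done else 1.0) * next_q

In the usual double-critic pattern, both critics are then regressed toward this shared target, while the actor is updated against just one of them.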
10

Jain, Arushi, Gandharv Patil, Ayush Jain, Khimya Khetarpal, and Doina Precup. "Variance Penalized On-Policy and Off-Policy Actor-Critic." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (2021): 7899–907. http://dx.doi.org/10.1609/aaai.v35i9.16964.

Abstract:
Reinforcement learning algorithms are typically geared towards optimizing the expected return of an agent. However, in many practical applications, low variance in the return is desired to ensure the reliability of an algorithm. In this paper, we propose on-policy and off-policy actor-critic algorithms that optimize a performance criterion involving both mean and variance in the return. Previous work uses the second moment of return to estimate the variance indirectly. Instead, we use a much simpler recently proposed direct variance estimator which updates the estimates incrementally using tem
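In LaTeX, the mean-variance criterion described above can be sketched as follows (our notation; the paper's exact estimator and penalty placement may differ):

J_\lambda(\pi) \;=\; \mathbb{E}_\pi[G] \;-\; \lambda\,\mathrm{Var}_\pi[G],
% with \mathrm{Var}_\pi[G] tracked by a direct, incrementally (TD-style) updated
% estimator rather than via the second moment \mathbb{E}_\pi[G^2]-\mathbb{E}_\pi[G]^2.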

Doctoral dissertations on the topic "Actor-critic algorithm"

1

Konda, Vijaymohan (Vijaymohan Gao), 1973. "Actor-critic algorithms." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/8120.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (leaves 143-147). Many complex decision making problems like scheduling in manufacturing systems, portfolio management in finance, admission control in communication networks etc., with clear and precise objectives, can be formulated as stochastic dynamic programming problems in which the objective of decision making is to maximize a single "overall" reward. In these formulations, finding an optimal decision policy involves computing a ce
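Since most of the works collected here build on this basic scheme, here is a minimal, textbook-style one-step actor-critic update (a generic sketch with linear value features, not the thesis's specific algorithms):

import numpy as np

def actor_critic_step(theta, w, phi_s, phi_s_next, grad_log_pi, reward,
                      gamma=0.99, alpha_actor=1e-3, alpha_critic=1e-2):
    """One generic actor-critic update. `w` are critic weights for linear value
    features `phi_*`, `theta` are actor parameters, and `grad_log_pi` is the
    score function grad_theta log pi(a|s) at the sampled action."""
    td_error = reward + gamma * np.dot(w, phi_s_next) - np.dot(w, phi_s)
    w = w + alpha_critic * td_error * phi_s                 # critic: TD(0) step
    theta = theta + alpha_actor * td_error * grad_log_pi    # actor: policy-gradient step
    return theta, w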
2

Saxena, Naman. "Average Reward Actor-Critic with Deterministic Policy Search." Thesis, 2023. https://etd.iisc.ac.in/handle/2005/6175.

Abstract:
The average reward criterion is relatively less studied as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. There are few recent works that present on-policy average reward actor-critic algorithms, but average reward off-policy actor-critic is relatively less explored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we also present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) Algorithm. We first sho
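As a hedged LaTeX sketch of the setting described above (our notation), the average-reward criterion for a deterministic policy \mu_\theta and a deterministic policy-gradient form for it read:

\rho(\mu_\theta) \;=\; \lim_{T\to\infty}\frac{1}{T}\,
  \mathbb{E}\!\left[\sum_{t=0}^{T-1} r\bigl(s_t,\mu_\theta(s_t)\bigr)\right],
\qquad
\nabla_\theta \rho(\mu_\theta) \;=\;
  \mathbb{E}_{s\sim d_{\mu_\theta}}\!\left[
    \nabla_\theta \mu_\theta(s)\,
    \nabla_a Q_{\mu_\theta}(s,a)\big|_{a=\mu_\theta(s)}\right]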
3

Diddigi, Raghuram Bharadwaj. "Reinforcement Learning Algorithms for Off-Policy, Multi-Agent Learning and Applications to Smart Grids." Thesis, 2022. https://etd.iisc.ac.in/handle/2005/5673.

Abstract:
Reinforcement Learning (RL) algorithms are a popular class of algorithms for training an agent to learn desired behavior through interaction with an environment whose dynamics is unknown to the agent. RL algorithms combined with neural network architectures have enjoyed much success in various disciplines like games, medicine, energy management, economics and supply chain management. In our thesis, we study interesting extensions of standard single-agent RL settings, like off-policy and multi-agent settings. We discuss the motivations and importance of these settings and propose convergen
4

Lakshmanan, K. "Online Learning and Simulation Based Algorithms for Stochastic Optimization." Thesis, 2012. http://etd.iisc.ac.in/handle/2005/3245.

Abstract:
In many optimization problems, the relationship between the objective and parameters is not known. The objective function itself may be stochastic such as a long-run average over some random cost samples. In such cases finding the gradient of the objective is not possible. It is in this setting that stochastic approximation algorithms are used. These algorithms use some estimates of the gradient and are stochastic in nature. Amongst gradient estimation techniques, Simultaneous Perturbation Stochastic Approximation (SPSA) and Smoothed Functional(SF) scheme are widely used. In this thesis we hav
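For context, the standard SPSA gradient estimate the abstract refers to, in LaTeX (standard textbook form, not a result specific to this thesis):

\widehat{\nabla}_i J(\theta) \;=\;
  \frac{J(\theta + c\,\Delta) - J(\theta - c\,\Delta)}{2\,c\,\Delta_i},
\qquad
\Delta_i \in \{-1,+1\} \ \text{i.i.d. symmetric Bernoulli},

so that a single pair of (noisy) function evaluations simultaneously perturbs all coordinates.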

Book chapters on the topic "Actor-critic algorithm"

1

Kim, Chayoung, Jung-min Park, and Hye-young Kim. "An Actor-Critic Algorithm for SVM Hyperparameters." In Information Science and Applications 2018. Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-1056-0_64.

2

Zha, ZhongYi, XueSong Tang, and Bo Wang. "An Advanced Actor-Critic Algorithm for Training Video Game AI." In Neural Computing for Advanced Applications. Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-7670-6_31.

3

Melo, Francisco S., and Manuel Lopes. "Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs." In Machine Learning and Knowledge Discovery in Databases. Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-87481-2_5.

4

Sun, Qifeng, Hui Ren, Youxiang Duan, and Yanan Yan. "The Adaptive PID Controlling Algorithm Using Asynchronous Advantage Actor-Critic Learning Method." In Simulation Tools and Techniques. Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-32216-8_48.

5

Liu, Guiliang, Xu Li, Mingming Sun, and Ping Li. "An Advantage Actor-Critic Algorithm with Confidence Exploration for Open Information Extraction." In Proceedings of the 2020 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 2020. http://dx.doi.org/10.1137/1.9781611976236.25.

6

Cheng, Yuhu, Huanting Feng, and Xuesong Wang. "Actor-Critic Algorithm Based on Incremental Least-Squares Temporal Difference with Eligibility Trace." In Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-25944-9_24.

7

Jiang, Haobo, Jianjun Qian, Jin Xie, and Jian Yang. "Episode-Experience Replay Based Tree-Backup Method for Off-Policy Actor-Critic Algorithm." In Pattern Recognition and Computer Vision. Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-03398-9_48.

8

Chuyen, T. D., Dao Huy Du, N. D. Dien, R. V. Hoa, and N. V. Toan. "Building Intelligent Navigation System for Mobile Robots Based on the Actor – Critic Algorithm." In Advances in Engineering Research and Application. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-92574-1_24.

9

Zhang, Huaqing, Hongbin Ma, and Ying Jin. "An Improved Off-Policy Actor-Critic Algorithm with Historical Behaviors Reusing for Robotic Control." In Intelligent Robotics and Applications. Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-13841-6_41.

10

Park, Jooyoung, Jongho Kim, and Daesung Kang. "An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm." In Computational Intelligence and Security. Springer Berlin Heidelberg, 2005. http://dx.doi.org/10.1007/11596448_9.


Conference papers on the topic "Actor-critic algorithm"

1

Wang, Jing, and Ioannis Ch. Paschalidis. "A Hessian actor-critic algorithm." In 2014 IEEE 53rd Annual Conference on Decision and Control (CDC). IEEE, 2014. http://dx.doi.org/10.1109/cdc.2014.7039533.

2

Yaputra, Jordi, and Suyanto Suyanto. "The Effect of Discounting Actor-loss in Actor-Critic Algorithm." In 2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI). IEEE, 2021. http://dx.doi.org/10.1109/isriti54043.2021.9702883.

3

Aleixo, Everton, Juan Colonna, and Raimundo Barreto. "SVC-A2C - Actor Critic Algorithm to Improve Smart Vacuum Cleaner." In IX Simpósio Brasileiro de Engenharia de Sistemas Computacionais. Sociedade Brasileira de Computação - SBC, 2019. http://dx.doi.org/10.5753/sbesc_estendido.2019.8637.

Abstract:
This work presents a new approach to developing a vacuum cleaner that uses an actor-critic algorithm. We run tests against three other algorithms for comparison. In addition, we develop a new Gym-based simulator in which to run the tests.
4

Prabuchandran K.J., Shalabh Bhatnagar, and Vivek S. Borkar. "An actor critic algorithm based on Grassmanian search." In 2014 IEEE 53rd Annual Conference on Decision and Control (CDC). IEEE, 2014. http://dx.doi.org/10.1109/cdc.2014.7039948.

5

Yang, Zhuoran, Kaiqing Zhang, Mingyi Hong, and Tamer Basar. "A Finite Sample Analysis of the Actor-Critic Algorithm." In 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018. http://dx.doi.org/10.1109/cdc.2018.8619440.

6

Vrushabh, D., K. Shalini, and K. Sonam. "Actor-Critic Algorithm for Optimal Synchronization of Kuramoto Oscillator." In 2020 7th International Conference on Control, Decision and Information Technologies (CoDIT). IEEE, 2020. http://dx.doi.org/10.1109/codit49905.2020.9263785.

7

Paschalidis, Ioannis Ch., and Yingwei Lin. "Mobile agent coordination via a distributed actor-critic algorithm." In Automation (MED 2011). IEEE, 2011. http://dx.doi.org/10.1109/med.2011.5983038.

8

Diddigi, Raghuram Bharadwaj, Prateek Jain, Prabuchandran K. J., and Shalabh Bhatnagar. "Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm." In 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022. http://dx.doi.org/10.1109/ijcnn55064.2022.9892303.

9

Liu, Bo, Yue Zhang, Shupo Fu, and Xuan Liu. "Reduce UAV Coverage Energy Consumption through Actor-Critic Algorithm." In 2019 15th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN). IEEE, 2019. http://dx.doi.org/10.1109/msn48538.2019.00069.

10

Zhong, Shan, Quan Liu, Shengrong Gong, Qiming Fu, and Jin Xu. "Efficient actor-critic algorithm with dual piecewise model learning." In 2017 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2017. http://dx.doi.org/10.1109/ssci.2017.8280911.
