Academic literature on the topic 'Policy gradient'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Policy gradient.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Policy gradient"

1

Cai, Qingpeng, Ling Pan, and Pingzhong Tang. "Deterministic Value-Policy Gradients." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 3316–23. http://dx.doi.org/10.1609/aaai.v34i04.5732.

Full text
Abstract:
Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with a finite horizon, which is too myopic compared with the infinite-horizon setting. We first give a theoretical guarantee of the existence of the value gradients in this infinite-horizon setting. Based on this guarantee, we propose a class of deterministic value gradient algorithms (DVG) with infinite horizon, in which different rollout steps of the analytical gradients computed by the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. Finally, we conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms the other baselines.
APA, Harvard, Vancouver, ISO, and other styles
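For orientation, the model-free estimator that DVPG combines with its model-based value gradients is the deterministic policy gradient used in DDPG-style methods. The following is only a minimal illustrative sketch, not code from the paper: it assumes an invented linear policy a = W s and a hand-made quadratic critic so that the update direction E[∇_W μ_W(s) ∇_a Q(s, a)] can be shown concretely.

```python
import numpy as np

# Hypothetical setup (not from the DVPG paper): linear deterministic policy
# a = W s, and a known quadratic critic Q(s, a) = -||a - A s||^2.
# These choices only make the deterministic policy gradient update concrete.
rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2
W = rng.normal(scale=0.1, size=(action_dim, state_dim))   # policy parameters
A = rng.normal(size=(action_dim, state_dim))              # "true" action map

def grad_a_Q(s, a):
    """Gradient of Q(s, a) = -||a - A s||^2 with respect to the action."""
    return -2.0 * (a - A @ s)

lr = 0.05
for step in range(2000):
    s = rng.normal(size=state_dim)      # sample a state
    a = W @ s                           # deterministic action mu_W(s)
    # Deterministic policy gradient: chain rule through the action.
    # For a = W s, dJ/dW for this sample is outer(grad_a_Q(s, a), s).
    W += lr * np.outer(grad_a_Q(s, a), s)

print("policy error:", np.linalg.norm(W - A))  # should shrink toward 0
```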
2

Peters, Jan. "Policy gradient methods." Scholarpedia 5, no. 11 (2010): 3698. http://dx.doi.org/10.4249/scholarpedia.3698.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhao, Tingting, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, and Masashi Sugiyama. "Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration." Neural Computation 25, no. 6 (June 2013): 1512–47. http://dx.doi.org/10.1162/neco_a_00452.

Full text
Abstract:
The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates with their unbiasedness being maintained. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
APA, Harvard, Vancouver, ISO, and other styles
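The "optimal baseline" idea in entry 3 can be illustrated with a toy likelihood-ratio (REINFORCE-style) estimator. The sketch below is not the authors' PGPE method; it uses an invented one-step Gaussian policy and the standard optimal constant baseline b* = E[||∇ log π||² R] / E[||∇ log π||²] to show the variance reduction while the estimate stays (essentially) unbiased.

```python
import numpy as np

# Toy illustration (not the authors' estimator): a one-step Gaussian "policy"
# over actions with mean theta and fixed std, reward r(a) = -(a - 3)^2.
rng = np.random.default_rng(1)
theta, sigma, n = 0.0, 1.0, 10_000

a = rng.normal(theta, sigma, size=n)           # sampled actions
r = -(a - 3.0) ** 2                            # returns
score = (a - theta) / sigma**2                 # d/dtheta log N(a; theta, sigma)

# Plain likelihood-ratio (REINFORCE) estimate.
g_plain = score * r

# Optimal constant baseline b* = E[score^2 * R] / E[score^2]; in practice it
# would be estimated from separate or past data to keep strict unbiasedness.
b_star = np.mean(score**2 * r) / np.mean(score**2)
g_base = score * (r - b_star)

print("no baseline:      mean %.3f  var %.1f" % (g_plain.mean(), g_plain.var()))
print("optimal baseline: mean %.3f  var %.1f" % (g_base.mean(), g_base.var()))
```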
4

Baxter, J., P. L. Bartlett, and L. Weaver. "Experiments with Infinite-Horizon, Policy-Gradient Estimation." Journal of Artificial Intelligence Research 15 (November 1, 2001): 351–81. http://dx.doi.org/10.1613/jair.807.

Full text
Abstract:
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter beta, which has a natural interpretation in terms of the bias-variance trade-off, that it requires no knowledge of the underlying state, and that it can be applied to infinite state, control, and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of Baxter and Bartlett (this volume) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.
APA, Harvard, Vancouver, ISO, and other styles
5

Le, Hung, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, and Svetha Venkatesh. "Episodic Policy Gradient Training." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7317–25. http://dx.doi.org/10.1609/aaai.v36i7.20694.

Full text
Abstract:
We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on the fly. Unlike other hyperparameter searches, we formulate hyperparameter scheduling as a standard Markov Decision Process and use episodic memory to store the outcomes of used hyperparameters and their training contexts. At any policy update step, the policy learner refers to the stored experiences and adaptively reconfigures its learning algorithm with the new hyperparameters determined by the memory. This mechanism, dubbed Episodic Policy Gradient Training (EPGT), enables an episodic learning process and jointly learns the policy and the learning algorithm's hyperparameters within a single run. Experimental results on both continuous and discrete environments demonstrate the advantage of the proposed method in boosting the performance of various policy gradient algorithms.
APA, Harvard, Vancouver, ISO, and other styles
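As a loose illustration of the episodic-memory idea in entry 5 (not the EPGT architecture itself), one can cache how candidate hyperparameters performed in coarse training contexts and reuse the best-performing one, with occasional exploration. Everything below, including the candidate learning rates and the context bucketing, is an assumption made for the example.

```python
import random
from collections import defaultdict

# Loose sketch of episodic hyperparameter scheduling: remember how each
# candidate learning rate performed in a coarse "training context" and
# reuse the best one, exploring with probability eps.
CANDIDATE_LRS = [1e-4, 3e-4, 1e-3]
memory = defaultdict(list)   # (context, lr) -> list of observed improvements

def context_key(mean_return):
    """Coarse context: bucket the recent mean return."""
    return int(mean_return // 10)

def choose_lr(mean_return, eps=0.2):
    ctx = context_key(mean_return)
    if random.random() < eps:
        return random.choice(CANDIDATE_LRS)          # exploration
    scored = [(sum(memory[(ctx, lr)]) / len(memory[(ctx, lr)]), lr)
              for lr in CANDIDATE_LRS if memory[(ctx, lr)]]
    return max(scored)[1] if scored else random.choice(CANDIDATE_LRS)

def record(mean_return, lr, improvement):
    """Store the outcome of using `lr` in this context."""
    memory[(context_key(mean_return), lr)].append(improvement)
```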
6

Baxter, J., and P. L. Bartlett. "Infinite-Horizon Policy-Gradient Estimation." Journal of Artificial Intelligence Research 15 (November 1, 2001): 319–50. http://dx.doi.org/10.1613/jair.806.

Full text
Abstract:
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura et al. (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter beta (which has a natural interpretation in terms of the bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter beta is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter et al., this volume) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward.
APA, Harvard, Vancouver, ISO, and other styles
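Entries 4 and 6 both build on the GPOMDP estimator, which maintains an eligibility trace of score functions discounted by the free parameter beta and correlates it with the reward stream. The sketch below follows that general form; the two-observation toy environment and softmax policy are invented for illustration and are not from the papers.

```python
import numpy as np

# GPOMDP-style estimator sketch: z is an eligibility trace of score
# functions discounted by beta, and delta is the running average of
# reward-weighted traces. The toy dynamics below are placeholders.
rng = np.random.default_rng(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

n_obs, n_actions = 2, 2
theta = np.zeros((n_obs, n_actions))      # tabular softmax policy parameters

def gpomdp_estimate(theta, beta=0.9, T=20_000):
    z = np.zeros_like(theta)              # eligibility trace
    delta = np.zeros_like(theta)          # running gradient estimate
    obs = 0
    for t in range(1, T + 1):
        probs = softmax(theta[obs])
        a = rng.choice(n_actions, p=probs)
        r = 1.0 if a == obs else 0.0      # toy reward: match action to observation
        grad_log = -probs
        grad_log[a] += 1.0                # d log pi(a | obs) / d theta[obs]
        z *= beta
        z[obs] += grad_log
        delta += (r * z - delta) / t      # running average of r_t * z_t
        obs = rng.integers(n_obs)         # next observation (iid in this toy)
    return delta

print(gpomdp_estimate(theta))
```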
7

Pajarinen, Joni, Hong Linh Thai, Riad Akrour, Jan Peters, and Gerhard Neumann. "Compatible natural gradient policy search." Machine Learning 108, no. 8-9 (May 20, 2019): 1443–66. http://dx.doi.org/10.1007/s10994-019-05807-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Buffet, Olivier, and Douglas Aberdeen. "The factored policy-gradient planner." Artificial Intelligence 173, no. 5-6 (April 2009): 722–47. http://dx.doi.org/10.1016/j.artint.2008.11.008.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Wang, Lin, Xingang Xu, Xuhui Zhao, Baozhu Li, Ruijuan Zheng, and Qingtao Wu. "A randomized block policy gradient algorithm with differential privacy in Content Centric Networks." International Journal of Distributed Sensor Networks 17, no. 12 (December 2021): 155014772110599. http://dx.doi.org/10.1177/15501477211059934.

Full text
Abstract:
Policy gradient methods are an effective means to solve the problems of mobile multimedia data transmission in Content Centric Networks. Current policy gradient algorithms impose a high computational cost in processing high-dimensional data. Meanwhile, the issue of privacy disclosure has not been taken into account, although privacy protection is important in data training. Therefore, we propose a randomized block policy gradient algorithm with differential privacy. In order to reduce computational complexity when processing high-dimensional data, we randomly select a block coordinate to update the gradients at each round. To solve the privacy protection problem, we add a differential privacy mechanism to the algorithm and prove that it preserves the [Formula: see text]-privacy level. We conduct extensive simulations in four environments: CartPole, Walker, HalfCheetah, and Hopper. Compared with methods such as importance-sampling momentum-based policy gradient, Hessian-aided momentum-based policy gradient, and REINFORCE, our algorithm shows a faster convergence rate in the same environments.
APA, Harvard, Vancouver, ISO, and other styles
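A generic sketch of the two ingredients named in the abstract above, random block-coordinate updates plus Gaussian noise of the kind used in differentially private gradient methods, is given below. It is not the authors' algorithm, and the block count, clipping norm, and noise scale are placeholders rather than a calibrated privacy guarantee.

```python
import numpy as np

# Generic sketch: update only a randomly chosen block of the policy
# parameters each round, clip that block's gradient, and add Gaussian noise
# (Gaussian-mechanism style). All constants are illustrative placeholders.
rng = np.random.default_rng(3)

def noisy_block_update(theta, grad, lr=0.01, n_blocks=4,
                       clip_norm=1.0, noise_std=0.5):
    blocks = np.array_split(np.arange(theta.size), n_blocks)
    block = blocks[rng.integers(n_blocks)]           # random block coordinate
    g = grad[block]
    g = g / max(1.0, np.linalg.norm(g) / clip_norm)  # clip the block gradient
    g = g + rng.normal(scale=noise_std * clip_norm, size=g.shape)
    theta = theta.copy()
    theta[block] += lr * g                           # gradient ascent on the block
    return theta

theta = np.zeros(8)
grad = rng.normal(size=8)     # stand-in for a policy gradient estimate
print(noisy_block_update(theta, grad))
```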
10

Akella, Ravi Tej, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Animashree Anandkumar, and Yisong Yue. "Deep Bayesian Quadrature Policy Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 6600–6608. http://dx.doi.org/10.1609/aaai.v35i8.16817.

Full text
Abstract:
We study the problem of obtaining accurate policy gradient estimates using a finite number of samples. Monte-Carlo methods have been the default choice for policy gradient estimation, despite suffering from high variance in the gradient estimates. On the other hand, more sample-efficient alternatives like Bayesian quadrature methods have received little attention due to their high computational complexity. In this work, we propose deep Bayesian quadrature policy gradient (DBQPG), a computationally efficient high-dimensional generalization of Bayesian quadrature, for policy gradient estimation. We show that DBQPG can substitute Monte-Carlo estimation in policy gradient methods, and demonstrate its effectiveness on a set of continuous control benchmarks. In comparison to Monte-Carlo estimation, DBQPG provides (i) more accurate gradient estimates with a significantly lower variance, (ii) a consistent improvement in sample complexity and average return for several deep policy gradient algorithms, and (iii) uncertainty estimates for the gradient that can be incorporated to further improve performance.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Policy gradient"

1

Jacobzon, Gustaf, and Martin Larsson. "Generalizing Deep Deterministic Policy Gradient." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239365.

Full text
Abstract:
We extend Deep Deterministic Policy Gradient, a state-of-the-art algorithm for continuous control, in order to achieve a high generalization capability. To achieve better generalization capabilities for the agent we introduce drop-out, one of the most successful regularization techniques for generalization in machine learning, to the algorithm. We use the recently published exploration technique parameter space noise to achieve higher stability and a lower likelihood of converging to a poor local minimum. We also replace the Rectified Linear Unit (ReLU) nonlinearity with the Exponential Linear Unit (ELU) for greater stability and faster learning for the agent. Our results show that an agent trained with drop-out has generalization capabilities that far exceed those of one trained with L2-regularization, when evaluated in the racing simulator TORCS. Further, we found ELU to produce a more stable and faster learning process than ReLU when evaluated in the physics simulator MuJoCo.
APA, Harvard, Vancouver, ISO, and other styles
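The two architectural changes evaluated in this thesis, the ELU nonlinearity and dropout during training, can be shown in a few lines. The sketch below is a generic actor forward pass with invented layer sizes and keep probability, not the authors' TORCS agent.

```python
import numpy as np

# Tiny actor forward pass illustrating an ELU nonlinearity instead of ReLU,
# and inverted dropout applied at training time. Sizes are arbitrary.
rng = np.random.default_rng(4)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def actor_forward(state, W1, W2, train=True, keep_prob=0.8):
    h = elu(W1 @ state)
    if train:                                   # inverted dropout on the hidden layer
        mask = rng.random(h.shape) < keep_prob
        h = h * mask / keep_prob
    return np.tanh(W2 @ h)                      # bounded continuous action

state = rng.normal(size=6)
W1 = rng.normal(scale=0.3, size=(32, 6))
W2 = rng.normal(scale=0.3, size=(2, 32))
print(actor_forward(state, W1, W2))
```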
2

Greensmith, Evan. "Policy Gradient Methods: Variance Reduction and Stochastic Convergence." The Australian National University. Research School of Information Sciences and Engineering, 2005. http://thesis.anu.edu.au./public/adt-ANU20060106.193712.

Full text
Abstract:
In a reinforcement learning task an agent must learn a policy for performing actions so as to perform well in a given environment. Policy gradient methods consider a parameterized class of policies, and using a policy from the class, and a trajectory through the environment taken by the agent using this policy, estimate the performance of the policy with respect to the parameters. Policy gradient methods avoid some of the problems of value function methods, such as policy degradation, where inaccuracy in the value function leads to the choice of a poor policy. However, the estimates produced by policy gradient methods can have high variance.

In Part I of this thesis we study the estimation variance of policy gradient algorithms, in particular, when augmenting the estimate with a baseline, a common method for reducing estimation variance, and when using actor-critic methods. A baseline adjusts the reward signal supplied by the environment, and can be used to reduce the variance of a policy gradient estimate without adding any bias. We find the baseline that minimizes the variance. We also consider the class of constant baselines, and find the constant baseline that minimizes the variance. We compare this to the common technique of adjusting the rewards by an estimate of the performance measure. Actor-critic methods usually attempt to learn a value function accurate enough to be used in a gradient estimate without adding much bias. In this thesis we propose that in learning the value function we should also consider the variance. We show how considering the variance of the gradient estimate when learning a value function can be beneficial, and we introduce a new optimization criterion for selecting a value function.

In Part II of this thesis we consider online versions of policy gradient algorithms, where we update our policy for selecting actions at each step in time, and study the convergence of these online algorithms. For such online gradient-based algorithms, convergence results aim to show that the gradient of the performance measure approaches zero. Such a result has been shown for an algorithm which is based on observing trajectories between visits to a special state of the environment. However, the algorithm is not suitable in a partially observable setting, where we are unable to access the full state of the environment, and its variance depends on the time between visits to the special state, which may be large even when only a few samples are needed to estimate the gradient. To date, convergence results for algorithms that do not rely on a special state are weaker. We show that, for a certain algorithm that does not rely on a special state, the gradient of the performance measure approaches zero. We show that this continues to hold when using certain baseline algorithms suggested by the results of Part I.
APA, Harvard, Vancouver, ISO, and other styles
3

Greensmith, Evan. "Policy gradient methods : variance reduction and stochastic convergence /." View thesis entry in Australian Digital Theses Program, 2005. http://thesis.anu.edu.au/public/adt-ANU20060106.193712/index.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Aberdeen, Douglas Alexander. "Policy-Gradient Algorithms for Partially Observable Markov Decision Processes." The Australian National University. Research School of Information Sciences and Engineering, 2003. http://thesis.anu.edu.au./public/adt-ANU20030410.111006.

Full text
Abstract:
Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches. One such class of algorithms are the so-called policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a scalable approach for controlling partially observable Markov decision processes (POMDPs).

In the most general case POMDP policies require some form of internal state, or memory, in order to act optimally. Policy-gradient methods have shown promise for problems admitting memory-less policies but have been less successful when memory is required. This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting: directly, when the dynamics of the world are known, and via Monte-Carlo methods otherwise. The algorithms simultaneously learn how to act and what to remember.

Monte-Carlo policy-gradient approaches tend to produce gradient estimates with high variance. Two novel methods for reducing variance are introduced. The first uses high-order filters to replace the eligibility trace of the gradient estimator. The second uses a low-variance value-function method to learn a subset of the parameters and a policy-gradient method to learn the remainder.

The algorithms are applied to large domains including a simulated robot navigation scenario, a multi-agent scenario with 21,000 states, and the complex real-world task of large vocabulary continuous speech recognition. To the best of the author's knowledge, no other policy-gradient algorithms have performed well at such tasks.

The high variance of Monte-Carlo methods requires lengthy simulation and hence a super-computer to train agents within a reasonable time. The ANU "Bunyip" Linux cluster was built with such tasks in mind. It was used for several of the experimental results presented here. One chapter of this thesis describes an application written for the Bunyip cluster that won the international Gordon Bell prize for price/performance in 2001.
APA, Harvard, Vancouver, ISO, and other styles
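The policies with internal state discussed in this thesis can be pictured as stochastic finite-state controllers: conditioned on the current observation and memory state, the agent samples both an action and a next memory state from parameterized distributions. The sketch below shows only that sampling structure with invented sizes and random parameters; the gradient estimation itself is omitted.

```python
import numpy as np

# Minimal sketch of a stochastic finite-state-controller policy: the agent
# conditions on (observation, internal memory), samples an action, and
# samples a next memory state. Sizes and parameters are placeholders.
rng = np.random.default_rng(5)
n_obs, n_mem, n_actions = 3, 4, 2

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

theta_action = rng.normal(scale=0.1, size=(n_obs, n_mem, n_actions))
theta_memory = rng.normal(scale=0.1, size=(n_obs, n_mem, n_mem))

def step_policy(obs, mem):
    a = rng.choice(n_actions, p=softmax(theta_action[obs, mem]))
    next_mem = rng.choice(n_mem, p=softmax(theta_memory[obs, mem]))
    return a, next_mem

mem = 0
for obs in [0, 2, 1, 1]:          # a made-up observation sequence
    a, mem = step_policy(obs, mem)
    print(f"obs={obs} action={a} next_mem={mem}")
```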
5

Aberdeen, Douglas Alexander. "Policy-gradient algorithms for partially observable Markov decision processes /." View thesis entry in Australian Digital Theses Program, 2003. http://thesis.anu.edu.au/public/adt-ANU20030410.111006/index.html.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Lidström, Christian, and Hannes Leskelä. "Learning for RoboCup Soccer: Policy Gradient Reinforcement Learning in multi-agent systems." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-157469.

Full text
Abstract:
RoboCup Soccer is a long-running yearly worldwide robotics competition, in which teams of autonomous robot agents play soccer against each other. This report focuses on the 2D simulator variant, where no actual robots are needed and the agents instead communicate with a server which keeps track of the game state. RoboCup Soccer 2D simulation has become a major topic of research for artificial intelligence, cooperative behaviour in multi-agent systems, and the learning thereof. Some form of machine learning is mandatory if you want to compete at the highest level, as the problem is too complex for manual configuration of a team's decision making. This report finds that PGRL is a common method for machine learning in RoboCup teams; it is utilized in some of the best teams in RoboCup. The report also finds that PGRL is an effective form of machine learning in terms of learning speed, but there are many factors which affect this. Most often a compromise has to be made between speed of learning and precision.
APA, Harvard, Vancouver, ISO, and other styles
7

Gavelli, Viktor, and Alexander Gomez. "Multi-agent system with Policy Gradient Reinforcement Learning for RoboCup Soccer Simulator." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-157418.

Full text
Abstract:
The RoboCup Soccer Simulator is a multi-agent soccer simulator used in competitions to simulate soccer-playing robots. These competitions are mainly held to promote robotics and AI research by providing a cheap and accessible way to program robot-like agents. In this report a learning multi-agent soccer team is implemented, described and tested. Policy Gradient Reinforcement Learning (PGRL) is used to train and alter the strategical decision making of the agents. The results show that PGRL improves the performance of the learning team, but when the gap in performance between the learning team and the opponent is big the results were inconclusive.
APA, Harvard, Vancouver, ISO, and other styles
8

Pianazzi, Enrico. "A deep reinforcement learning approach based on policy gradient for mobile robot navigation." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Find full text
Abstract:
Reinforcement learning is a model-free technique to solve decision-making problems by learning the best behavior to solve a specific task in a given environment. This thesis work focuses on state-of-the-art reinforcement learning methods and their application to mobile robotics navigation and control. Our work is inspired by the recent developments in deep reinforcement learning and by the ever-growing need for complex control and navigation capabilities from autonomous mobile robots. We propose a reinforcement learning controller based on an actor-critic approach to navigate a mobile robot in an initially unknown environment. The task is to navigate the robot from a random initial point on the map to a fixed goal point, while trying to stay within the environment limits and to avoid obstacles on the path. The agent has no initial knowledge of the environment's characteristics, including the goal and obstacle positions. The adopted algorithm is the so-called Deep Deterministic Policy Gradient (DDPG), which is able to deal with continuous states and inputs thanks to the use of neural networks in the actor-critic architecture and of the policy gradient to update the neural network representing the control policy. The learned controller directly outputs velocity commands to the robot, basing its decisions on the robot's position, without the need for additional sensory data. The robot is simulated as a unicycle kinematic model, and we present an implementation of the learning algorithm and robot simulation developed in Python that is able to solve the goal-reaching task while avoiding obstacles with a success rate above 95%.
APA, Harvard, Vancouver, ISO, and other styles
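The thesis abstract above does not spell out its reward function, so the sketch below is only a common shaping for this kind of goal-reaching task with obstacle avoidance: a terminal bonus for reaching the goal, terminal penalties for collisions or leaving the workspace, and a dense distance term. All coefficients and thresholds are invented.

```python
import numpy as np

# Illustrative reward shaping for goal-reaching with obstacle avoidance.
# Coefficients and thresholds are placeholders, not from the thesis.
def navigation_reward(pos, goal, obstacles, bounds,
                      goal_radius=0.2, obstacle_radius=0.3):
    if np.any(pos < bounds[0]) or np.any(pos > bounds[1]):
        return -100.0, True                     # left the workspace: fail
    if any(np.linalg.norm(pos - o) < obstacle_radius for o in obstacles):
        return -100.0, True                     # collision: fail
    dist = np.linalg.norm(pos - goal)
    if dist < goal_radius:
        return 100.0, True                      # reached the goal: success
    return -dist, False                         # dense shaping toward the goal

pos = np.array([0.5, 0.5])
print(navigation_reward(pos, goal=np.array([2.0, 2.0]),
                        obstacles=[np.array([1.0, 1.0])],
                        bounds=(np.array([0.0, 0.0]), np.array([3.0, 3.0]))))
```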
9

Poulin, Nolan. "Proactive Planning through Active Policy Inference in Stochastic Environments." Digital WPI, 2018. https://digitalcommons.wpi.edu/etd-theses/1267.

Full text
Abstract:
In multi-agent Markov Decision Processes, a controllable agent must perform optimal planning in a dynamic and uncertain environment that includes another unknown and uncontrollable agent. Given a task specification for the controllable agent, its ability to complete the task can be impeded by an inaccurate model of the intent and behaviors of other agents. In this work, we introduce an active policy inference algorithm that allows a controllable agent to infer a policy of the environmental agent through interaction. Active policy inference is data-efficient and is particularly useful when data are time-consuming or costly to obtain. The controllable agent synthesizes an exploration-exploitation policy that incorporates the knowledge learned about the environment's behavior. Whenever possible, the agent also tries to elicit behavior from the other agent to improve the accuracy of the environmental model. This is done by mapping the uncertainty in the environmental model to a bonus reward, which helps elicit the most informative exploration, and allows the controllable agent to return to its main task as fast as possible. Experiments demonstrate the improved sample efficiency of active learning and the convergence of the policy for the controllable agents.
APA, Harvard, Vancouver, ISO, and other styles
10

Fleming, Brian James. "The social gradient in health : trends in C20th ideas, Australian Health Policy 1970-1998, and a health equity policy evaluation of Australian aged care planning /." Title page, abstract and table of contents only, 2003. http://web4.library.adelaide.edu.au/theses/09PH/09phf5971.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Policy gradient"

1

Deyette, Jeff. Plugging in renewable energy: Grading the states. Cambridge, MA: Union of Concerned Scientists, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Olmstead, Alan L. Hog round marketing, seed quality, and government policy: Institutional change in U.S. cotton production, 1920-1960. Cambridge, Mass: National Bureau of Economic Research, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Harris, Ann. School-based assessment in GCE and CSE boards: A report on policy and practice. London: Secondary Examinations Council, 1986.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Diez, Lara. The use of call grading: How calls to the police are graded and resourced. London: Home Office Police Research Group, 1995.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Torres, Justin. Grading the systems: The guide to state standards, tests, and accountability policies. Washington, D.C: Thomas B. Fordham Foudation, 2004.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Grading the 44th president: A report card on Barack Obama's first term as a progressive leader. Santa Barbara, Calif: Praeger, 2012.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Delahanty, Julie. From social movements to social clauses: Grading strategies for improving conditions for women garment workers. Ottawa: North-South Institute, 1999.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Is Al-Qaeda winning?: Grading the Administration's counterterrorism policy : hearing before the Subcommittee on Terrorism, Nonproliferation, and Trade of the Committee on Foreign Affairs, House of Representatives, One Hundred Thirteenth Congress, second session, April 8, 2014. Washington: U.S. Government Printing Office, 2014.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

A, Prashanth L., and Michael C. Fu. Risk-Sensitive Reinforcement Learning Via Policy Gradient Search. Now Publishers, 2022.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Gorard, Stephen. Education Policy. Policy Press, 2018. http://dx.doi.org/10.1332/policypress/9781447342144.001.0001.

Full text
Abstract:
What has been done to achieve fairer and more efficient education systems, and what more can be done in the future? This book provides a comprehensive examination of crucial policy areas for education, such as differential outcomes, the poverty gradient, and the allocation of resources to education, to identify likely causes of educational disadvantage among students and lifelong learners. This analysis is supported by 20 years of extensive research, based in the home countries of the UK and on work in all EU 28 countries, USA, Pakistan, and Japan. The book brings invaluable insights into the underlying problems within education policy, and proposes practical solutions for a brighter future.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Policy gradient"

1

Huang, Ruitong, Tianyang Yu, Zihan Ding, and Shanghang Zhang. "Policy Gradient." In Deep Reinforcement Learning, 161–212. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-4095-0_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Buffet, Olivier. "Policy-Gradient Algorithms." In Markov Decision Processes in Artificial Intelligence, 127–52. Hoboken, NJ USA: John Wiley & Sons, Inc., 2013. http://dx.doi.org/10.1002/9781118557426.ch5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zeugmann, Thomas, Pascal Poupart, James Kennedy, Xin Jin, Jiawei Han, Lorenza Saitta, Michele Sebag, et al. "Policy Gradient Methods." In Encyclopedia of Machine Learning, 774–76. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_640.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Sanghi, Nimish. "Policy Gradient Algorithms." In Deep Reinforcement Learning with Python, 207–49. Berkeley, CA: Apress, 2021. http://dx.doi.org/10.1007/978-1-4842-6809-4_7.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Peters, Jan, and J. Andrew Bagnell. "Policy Gradient Methods." In Encyclopedia of Machine Learning and Data Mining, 1–4. Boston, MA: Springer US, 2016. http://dx.doi.org/10.1007/978-1-4899-7502-7_646-1.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Peters, Jan, and J. Andrew Bagnell. "Policy Gradient Methods." In Encyclopedia of Machine Learning and Data Mining, 982–85. Boston, MA: Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_646.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Rao, Ashwin, and Tikhon Jelvis. "Policy Gradient Algorithms." In Foundations of Reinforcement Learning with Applications in Finance, 381–408. Boca Raton: Chapman and Hall/CRC, 2022. http://dx.doi.org/10.1201/9781003229193-14.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Bono, Guillaume, Jilles Steeve Dibangoye, Laëtitia Matignon, Florian Pereyron, and Olivier Simonin. "Cooperative Multi-agent Policy Gradient." In Machine Learning and Knowledge Discovery in Databases, 459–76. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-10925-7_28.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Yan, Yan, and Quan Liu. "Policy Space Noise in Deep Deterministic Policy Gradient." In Neural Information Processing, 624–34. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-04179-3_55.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Wang, Yixiang, and Feng Wu. "Policy Adaptive Multi-agent Deep Deterministic Policy Gradient." In PRIMA 2020: Principles and Practice of Multi-Agent Systems, 165–81. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-69322-0_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Policy gradient"

1

Maggipinto, Marco, Gian Antonio Susto, and Pratik Chaudhari. "Proximal Deterministic Policy Gradient." In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2020. http://dx.doi.org/10.1109/iros45743.2020.9341559.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Nilsson, Olle, and Antoine Cully. "Policy gradient assisted MAP-Elites." In GECCO '21: Genetic and Evolutionary Computation Conference. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3449639.3459304.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Peters, Jan, and Stefan Schaal. "Policy Gradient Methods for Robotics." In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2006. http://dx.doi.org/10.1109/iros.2006.282564.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Tsourdos, Antonios, Ir Adhi Dharma Permana, Dewi H. Budiarti, Hyo-Sang Shin, and Chang-Hun Lee. "Developing Flight Control Policy Using Deep Deterministic Policy Gradient." In 2019 IEEE International Conference on Aerospace Electronics and Remote Sensing Technology (ICARES). IEEE, 2019. http://dx.doi.org/10.1109/icares.2019.8914343.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Bose, Sourabh, and Manfred Huber. "Training neural networks with policy gradient." In 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017. http://dx.doi.org/10.1109/ijcnn.2017.7966360.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Awate, Yogesh P. "Policy-Gradient Based Actor-Critic Algorithms." In 2009 WRI Global Congress on Intelligent Systems. IEEE, 2009. http://dx.doi.org/10.1109/gcis.2009.372.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Vien, Ngo Anh, and TaeChoong Chung. "Policy Gradient Semi-markov Decision Process." In 2008 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, 2008. http://dx.doi.org/10.1109/ictai.2008.51.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Banerjee, Bikramjit, and Jing Peng. "Adaptive policy gradient in multiagent learning." In the second international joint conference. New York, New York, USA: ACM Press, 2003. http://dx.doi.org/10.1145/860575.860686.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Xiao, Bo, Wuguannan Yao, and Xiang Zhou. "Optimal Option Hedging with Policy Gradient." In 2021 International Conference on Data Mining Workshops (ICDMW). IEEE, 2021. http://dx.doi.org/10.1109/icdmw53433.2021.00145.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Sun, Zhou. "Mutual Deep Deterministic Policy Gradient Learning." In 2022 International Conference on Big Data, Information and Computer Network (BDICN). IEEE, 2022. http://dx.doi.org/10.1109/bdicn55575.2022.00099.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Policy gradient"

1

Lleras-Muney, Adriana. Education and Income Gradients in Longevity: The Role of Policy. Cambridge, MA: National Bureau of Economic Research, January 2022. http://dx.doi.org/10.3386/w29694.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Umberger, Pierce. Experimental Evaluation of Dynamic Crack Branching in Poly(methyl methacrylate) (PMMA) Using the Method of Coherent Gradient Sensing. Fort Belvoir, VA: Defense Technical Information Center, February 2010. http://dx.doi.org/10.21236/ada518614.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

A Decision-Making Method for Connected Autonomous Driving Based on Reinforcement Learning. SAE International, December 2020. http://dx.doi.org/10.4271/2020-01-5154.

Full text
Abstract:
At present, with the development of Intelligent Vehicle Infrastructure Cooperative Systems (IVICS), decision-making for automated vehicles based on connected environment conditions has attracted more attention. Reliability, efficiency and generalization performance are the basic requirements for a vehicle decision-making system. Therefore, this paper proposes a decision-making method for connected autonomous driving based on the Wasserstein Generative Adversarial Nets-Deep Deterministic Policy Gradient (WGAIL-DDPG) algorithm, in which the key component of the reinforcement learning (RL) model, the reward function, is designed from the aspect of vehicle serviceability, such as safety, ride comfort and handling stability. To reduce the complexity of the proposed model, an imitation learning strategy is introduced to improve the RL training process. Meanwhile, a model training strategy based on cloud computing effectively solves the problem of insufficient computing resources of the vehicle-mounted system. Test results show that the proposed method can improve the efficiency of the RL training process with reliable decision-making performance and reveals excellent generalization capability.
APA, Harvard, Vancouver, ISO, and other styles
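The report above designs its reward around vehicle serviceability terms such as safety, ride comfort, and handling stability. The sketch below shows a generic weighted-sum reward of that kind; the features, weights, and thresholds are placeholders and do not reproduce the paper's reward function.

```python
# Generic weighted-sum driving reward combining safety, ride-comfort, and
# handling-stability terms. All features, weights, and thresholds are
# illustrative assumptions, not the WGAIL-DDPG paper's design.
def driving_reward(headway_s, accel_ms2, jerk_ms3, yaw_rate_rads,
                   w_safety=1.0, w_comfort=0.5, w_stability=0.5):
    safety = -1.0 if headway_s < 1.5 else 0.0             # penalize short time headway
    comfort = -(abs(accel_ms2) / 3.0 + abs(jerk_ms3) / 2.0)
    stability = -abs(yaw_rate_rads) / 0.5
    return w_safety * safety + w_comfort * comfort + w_stability * stability

print(driving_reward(headway_s=1.2, accel_ms2=1.0, jerk_ms3=0.5, yaw_rate_rads=0.1))
```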