Journal articles on the topic 'Policy gradient'


Consult the top 50 journal articles for your research on the topic 'Policy gradient.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Cai, Qingpeng, Ling Pan, and Pingzhong Tang. "Deterministic Value-Policy Gradients." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 3316–23. http://dx.doi.org/10.1609/aaai.v34i04.5732.

Abstract:
Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with a finite horizon, which is too myopic compared with the infinite-horizon setting. We first give a theoretical guarantee of the existence of the value gradients in this infinite-horizon setting. Based on this guarantee, we propose a class of deterministic value gradient algorithms (DVG) with infinite horizon, in which different rollout steps of the analytical gradients through the learned model trade off the variance of the value gradients against the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms the other baselines.
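To make the model-free side of this combination concrete, here is a minimal sketch of the deterministic policy gradient actor update that DDPG-style methods (and hence DVPG's model-free estimator) rely on. It assumes PyTorch actor and critic modules; the function and variable names are illustrative, not taken from the paper.

```python
import torch

def deterministic_policy_gradient_loss(actor, critic, states):
    """Actor surrogate loss for the deterministic policy gradient:
    ascend Q(s, mu(s)) by backpropagating through the critic into the actor."""
    actions = actor(states)                 # a = mu_theta(s), deterministic
    return -critic(states, actions).mean()  # minimize -Q to maximize Q

# Usage sketch: loss = deterministic_policy_gradient_loss(actor, critic, batch)
# loss.backward(); actor_optimizer.step()
```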
2

Peters, Jan. "Policy gradient methods." Scholarpedia 5, no. 11 (2010): 3698. http://dx.doi.org/10.4249/scholarpedia.3698.

3

Zhao, Tingting, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, and Masashi Sugiyama. "Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration." Neural Computation 25, no. 6 (June 2013): 1512–47. http://dx.doi.org/10.1162/neco_a_00452.

Abstract:
The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates with their unbiasedness being maintained. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
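As a rough illustration of idea (3), the variance-minimizing constant baseline for a score-function gradient estimator can be computed in closed form from sampled trajectories. This is a generic, simplified sketch (a single scalar baseline, no importance weights), not the exact estimator derived in the letter; the array shapes and names are assumptions.

```python
import numpy as np

def baseline_corrected_gradient(grad_logp, returns):
    """grad_logp: (N, d) score vectors grad log p(h_n | theta); returns: (N,).
    The baseline b* = E[||g||^2 R] / E[||g||^2] minimizes estimator variance
    while leaving the gradient estimate unbiased."""
    g_sq = np.sum(grad_logp ** 2, axis=1)            # ||g_n||^2 per trajectory
    b_star = np.sum(g_sq * returns) / np.sum(g_sq)   # optimal constant baseline
    return np.mean((returns - b_star)[:, None] * grad_logp, axis=0)
```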
4

Baxter, J., P. L. Bartlett, and L. Weaver. "Experiments with Infinite-Horizon, Policy-Gradient Estimation." Journal of Artificial Intelligence Research 15 (November 1, 2001): 351–81. http://dx.doi.org/10.1613/jair.807.

Abstract:
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter beta, which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of (Baxter & Bartlett, this volume) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.
5

Le, Hung, Majid Abdolshah, Thommen K. George, Kien Do, Dung Nguyen, and Svetha Venkatesh. "Episodic Policy Gradient Training." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7317–25. http://dx.doi.org/10.1609/aaai.v36i7.20694.

Abstract:
We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on the fly. Unlike other hyperparameter searches, we formulate hyperparameter scheduling as a standard Markov Decision Process and use episodic memory to store the outcomes of the hyperparameters used and their training contexts. At any policy update step, the policy learner refers to the stored experiences and adaptively reconfigures its learning algorithm with the new hyperparameters determined by the memory. This mechanism, dubbed Episodic Policy Gradient Training (EPGT), enables an episodic learning process and jointly learns the policy and the learning algorithm's hyperparameters within a single run. Experimental results on both continuous and discrete environments demonstrate the advantage of the proposed method in boosting the performance of various policy gradient algorithms.
6

Baxter, J., and P. L. Bartlett. "Infinite-Horizon Policy-Gradient Estimation." Journal of Artificial Intelligence Research 15 (November 1, 2001): 319–50. http://dx.doi.org/10.1613/jair.806.

Abstract:
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura et al. (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter beta (which has a natural interpretation in terms of the bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter beta is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter et al., this volume) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward.
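A minimal sketch of the GPOMDP estimator described above. The environment interface (reset()/step(a) returning an observation and a reward) and the helper functions for sampling actions and computing the score grad_theta log pi(a | obs) are assumptions for illustration; note that, as the abstract states, only two parameter-sized vectors need to be stored.

```python
import numpy as np

def gpomdp_estimate(env, sample_action, score, theta, beta, T):
    """Biased estimate of the average-reward gradient (GPOMDP sketch).
    beta in [0, 1) trades bias against variance via the eligibility trace z."""
    z = np.zeros_like(theta)       # eligibility trace
    delta = np.zeros_like(theta)   # running gradient estimate
    obs = env.reset()
    for t in range(T):
        a = sample_action(obs, theta)
        z = beta * z + score(obs, a, theta)   # z <- beta * z + grad log pi(a | obs)
        obs, r = env.step(a)                  # assumed interface: (next_obs, reward)
        delta += (r * z - delta) / (t + 1)    # running average of r * z
    return delta
```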
7

Pajarinen, Joni, Hong Linh Thai, Riad Akrour, Jan Peters, and Gerhard Neumann. "Compatible natural gradient policy search." Machine Learning 108, no. 8-9 (May 20, 2019): 1443–66. http://dx.doi.org/10.1007/s10994-019-05807-0.

8

Buffet, Olivier, and Douglas Aberdeen. "The factored policy-gradient planner." Artificial Intelligence 173, no. 5-6 (April 2009): 722–47. http://dx.doi.org/10.1016/j.artint.2008.11.008.

9

Wang, Lin, Xingang Xu, Xuhui Zhao, Baozhu Li, Ruijuan Zheng, and Qingtao Wu. "A randomized block policy gradient algorithm with differential privacy in Content Centric Networks." International Journal of Distributed Sensor Networks 17, no. 12 (December 2021): 155014772110599. http://dx.doi.org/10.1177/15501477211059934.

Abstract:
Policy gradient methods are an effective means of solving the problems of mobile multimedia data transmission in Content Centric Networks. Current policy gradient algorithms impose a high computational cost when processing high-dimensional data. Meanwhile, the issue of privacy disclosure has not been taken into account, even though privacy protection is important in data training. Therefore, we propose a randomized block policy gradient algorithm with differential privacy. In order to reduce computational complexity when processing high-dimensional data, we randomly select a coordinate block to update the gradients at each round. To solve the privacy protection problem, we add a differential privacy protection mechanism to the algorithm, and we prove that it preserves the [Formula: see text]-privacy level. We conduct extensive simulations in four environments: CartPole, Walker, HalfCheetah, and Hopper. Compared with methods such as importance-sampling momentum-based policy gradient, Hessian-aided momentum-based policy gradient, and REINFORCE, our algorithm shows a faster convergence rate in the same environments.
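A rough sketch of the two ingredients named in the abstract: updating only a randomly selected coordinate block per round and adding Gaussian noise to that block for differential privacy. The noise scale sigma, the block size, and all names are illustrative assumptions, not the paper's actual privacy calibration.

```python
import numpy as np

def private_block_step(theta, grad, lr, block_size, sigma, rng):
    """One randomized-block policy gradient ascent step with Gaussian noise
    added to the selected block for differential privacy."""
    idx = rng.choice(theta.size, size=block_size, replace=False)
    noisy_block = grad[idx] + rng.normal(0.0, sigma, size=block_size)
    new_theta = theta.copy()
    new_theta[idx] += lr * noisy_block      # only the chosen block is updated
    return new_theta

# Usage sketch: theta = private_block_step(theta, grad_estimate, 1e-3, 64, 0.1,
#                                          np.random.default_rng(0))
```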
10

Akella, Ravi Tej, Kamyar Azizzadenesheli, Mohammad Ghavamzadeh, Animashree Anandkumar, and Yisong Yue. "Deep Bayesian Quadrature Policy Optimization." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 6600–6608. http://dx.doi.org/10.1609/aaai.v35i8.16817.

Abstract:
We study the problem of obtaining accurate policy gradient estimates using a finite number of samples. Monte-Carlo methods have been the default choice for policy gradient estimation, despite suffering from high variance in the gradient estimates. On the other hand, more sample efficient alternatives like Bayesian quadrature methods have received little attention due to their high computational complexity. In this work, we propose deep Bayesian quadrature policy gradient (DBQPG), a computationally efficient high-dimensional generalization of Bayesian quadrature, for policy gradient estimation. We show that DBQPG can substitute Monte-Carlo estimation in policy gradient methods, and demonstrate its effectiveness on a set of continuous control benchmarks. In comparison to Monte-Carlo estimation, DBQPG provides (i) more accurate gradient estimates with a significantly lower variance, (ii) a consistent improvement in the sample complexity and average return for several deep policy gradient algorithms, and, (iii) the uncertainty in gradient estimation that can be incorporated to further improve the performance.
11

Peters, Jan, Katharina Mulling, and Yasemin Altun. "Relative Entropy Policy Search." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 5, 2010): 1607–12. http://dx.doi.org/10.1609/aaai.v24i1.7727.

Abstract:
Policy search is a successful approach to reinforcement learning. However, policy improvements often result in the loss of information. Hence, it has been marred by premature convergence and implausible solutions. As first suggested in the context of covariant policy gradients, many of these problems may be addressed by constraining the information loss. In this paper, we continue this path of reasoning and suggest the Relative Entropy Policy Search (REPS) method. The resulting method differs significantly from previous policy gradient approaches and yields an exact update step. It can be shown to work well on typical reinforcement learning benchmark problems.
12

Han, Shuai, Wenbo Zhou, Shuai Lü, and Jiayu Yu. "Regularly updated deterministic policy gradient algorithm." Knowledge-Based Systems 214 (February 2021): 106736. http://dx.doi.org/10.1016/j.knosys.2020.106736.

13

D'Oro, Pierluca, Alberto Maria Metelli, Andrea Tirinzoni, Matteo Papini, and Marcello Restelli. "Gradient-Aware Model-Based Policy Search." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 3801–8. http://dx.doi.org/10.1609/aaai.v34i04.5791.

Abstract:
Traditional model-based reinforcement learning approaches learn a model of the environment dynamics without explicitly considering how it will be used by the agent. In the presence of misspecified model classes, this can lead to poor estimates, as some relevant available information is ignored. In this paper, we introduce a novel model-based policy search approach that exploits the knowledge of the current agent policy to learn an approximate transition model, focusing on the portions of the environment that are most relevant for policy improvement. We leverage a weighting scheme, derived from the minimization of the error on the model-based policy gradient estimator, in order to define a suitable objective function that is optimized for learning the approximate transition model. Then, we integrate this procedure into a batch policy improvement algorithm, named Gradient-Aware Model-based Policy Search (GAMPS), which iteratively learns a transition model and uses it, together with the collected trajectories, to compute the new policy parameters. Finally, we empirically validate GAMPS on benchmark domains analyzing and discussing its properties.
14

Li, Luntong, Dazi Li, and Tianheng Song. "Feature selection in deterministic policy gradient." Journal of Engineering 2020, no. 13 (July 1, 2020): 403–6. http://dx.doi.org/10.1049/joe.2019.1193.

15

Zhang, Chuheng, Yuanqi Li, and Jian Li. "Policy Search by Target Distribution Learning for Continuous Control." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 6770–77. http://dx.doi.org/10.1609/aaai.v34i04.6156.

Abstract:
It is known that existing policy gradient methods (such as vanilla policy gradient, PPO, A2C) may suffer from overly large gradients when the current policy is close to deterministic, leading to an unstable training process. We show that such instability can happen even in a very simple environment. To address this issue, we propose a new method, called target distribution learning (TDL), for policy improvement in reinforcement learning. TDL alternates between proposing a target distribution and training the policy network to approach the target distribution. TDL is more effective in constraining the KL divergence between updated policies, and hence leads to more stable policy improvements over iterations. Our experiments show that TDL algorithms perform comparably to (or better than) state-of-the-art algorithms for most continuous control tasks in the MuJoCo environment while being more stable in training.
16

Zhang, Junzi, Jongho Kim, Brendan O'Donoghue, and Stephen Boyd. "Sample Efficient Reinforcement Learning with REINFORCE." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10887–95. http://dx.doi.org/10.1609/aaai.v35i12.17300.

Abstract:
Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundation of their global convergence theory. However, prior works have either required exact gradients or state-action visitation measure based mini-batch stochastic gradients with a diverging batch size, which limit their applicability in practical scenarios. In this paper, we consider classical policy gradient methods that compute an approximate gradient with a single trajectory or a fixed size mini-batch of trajectories under soft-max parametrization and log-barrier regularization, along with the widely-used REINFORCE gradient estimation procedure. By controlling the number of "bad" episodes and resorting to the classical doubling trick, we establish an anytime sub-linear high probability regret bound as well as almost sure global convergence of the average regret with an asymptotically sub-linear rate. These provide the first set of global convergence and sample efficiency results for the well-known REINFORCE algorithm and contribute to a better understanding of its performance in practice.
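For reference, the single-trajectory REINFORCE estimator analyzed here can be written as a small surrogate loss. This is a generic sketch assuming a PyTorch policy that maps a state to a torch.distributions object; it omits the soft-max parametrization and log-barrier regularization that the paper's analysis uses.

```python
import torch

def reinforce_loss(policy, states, actions, rewards, gamma=0.99):
    """Surrogate loss whose gradient is the REINFORCE estimate
    sum_t G_t * grad log pi(a_t | s_t), with G_t the discounted return-to-go."""
    returns, g = [], 0.0
    for r in reversed(rewards):               # returns-to-go, computed backwards
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    log_probs = torch.stack([policy(s).log_prob(a)
                             for s, a in zip(states, actions)])
    return -(log_probs * returns).sum()       # negate so that descent ascends J
```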
17

Jiang, Zhanhong, Xian Yeow Lee, Sin Yong Tan, Kai Liang Tan, Aditya Balu, Young M. Lee, Chinmay Hegde, and Soumik Sarkar. "MDPGT: Momentum-Based Decentralized Policy Gradient Tracking." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 9 (June 28, 2022): 9377–85. http://dx.doi.org/10.1609/aaai.v36i9.21169.

Abstract:
We propose a novel policy gradient method for multi-agent reinforcement learning, which leverages two different variance-reduction techniques and does not require large batches over iterations. Specifically, we propose a momentum-based decentralized policy gradient tracking (MDPGT) method where a new momentum-based variance reduction technique is used to approximate the local policy gradient surrogate with importance sampling, and an intermediate parameter is adopted to track two consecutive policy gradient surrogates. MDPGT provably achieves the best available sample complexity of O(N⁻¹ε⁻³) for converging to an ε-stationary point of the global average of N local performance functions (possibly nonconcave). This outperforms the state-of-the-art sample complexity in decentralized model-free reinforcement learning, and when initialized with a single trajectory, the sample complexity matches those obtained by the existing decentralized policy gradient methods. We further validate the theoretical claim for the Gaussian policy function. When the required error tolerance ε is small enough, MDPGT leads to a linear speed-up, which has previously been established in decentralized stochastic optimization, but not for reinforcement learning. Lastly, we provide empirical results on a multi-agent reinforcement learning benchmark environment to support our theoretical findings.
18

Gomoluch, Paweł, Dalal Alrajeh, and Alessandra Russo. "Learning Classical Planning Strategies with Policy Gradient." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 637–45. http://dx.doi.org/10.1609/icaps.v29i1.3531.

Abstract:
A common paradigm in classical planning is heuristic forward search. Forward search planners often rely on simple best-first search which remains fixed throughout the search process. In this paper, we introduce a novel search framework capable of alternating between several forward search approaches while solving a particular planning problem. Selection of the approach is performed using a trainable stochastic policy, mapping the state of the search to a probability distribution over the approaches. This enables using policy gradient to learn search strategies tailored to a specific distribution of planning problems and a selected performance metric, e.g. the IPC score. We instantiate the framework by constructing a policy space consisting of five search approaches and a two-dimensional representation of the planner’s state. Then, we train the system on randomly generated problems from five IPC domains using three different performance metrics. Our experimental results show that the learner is able to discover domain-specific search strategies, improving the planner’s performance relative to the baselines of plain best-first search and a uniform policy.
19

El-Laham, Yousef, and Monica F. Bugallo. "Policy Gradient Importance Sampling for Bayesian Inference." IEEE Transactions on Signal Processing 69 (2021): 4245–56. http://dx.doi.org/10.1109/tsp.2021.3093792.

20

Yu, Hai-Tao, Degen Huang, Fuji Ren, and Lishuang Li. "Diagnostic Evaluation of Policy-Gradient-Based Ranking." Electronics 11, no. 1 (December 23, 2021): 37. http://dx.doi.org/10.3390/electronics11010037.

Abstract:
Learning-to-rank has been intensively studied and has shown significantly increasing values in a wide range of domains, such as web search, recommender systems, dialogue systems, machine translation, and even computational biology, to name a few. In light of recent advances in neural networks, there has been a strong and continuing interest in exploring how to deploy popular techniques, such as reinforcement learning and adversarial learning, to solve ranking problems. However, armed with the aforesaid popular techniques, most studies tend to show how effective a new method is. A comprehensive comparison between techniques and an in-depth analysis of their deficiencies are somehow overlooked. This paper is motivated by the observation that recent ranking methods based on either reinforcement learning or adversarial learning boil down to policy-gradient-based optimization. Based on the widely used benchmark collections with complete information (where relevance labels are known for all items), such as MSLRWEB30K and Yahoo-Set1, we thoroughly investigate the extent to which policy-gradient-based ranking methods are effective. On one hand, we analytically identify the pitfalls of policy-gradient-based ranking. On the other hand, we experimentally compare a wide range of representative methods. The experimental results echo our analysis and show that policy-gradient-based ranking methods are, by a large margin, inferior to many conventional ranking methods. Regardless of whether we use reinforcement learning or adversarial learning, the failures are largely attributable to the gradient estimation based on sampled rankings, which significantly diverge from ideal rankings. In particular, the larger the number of documents per query and the more fine-grained the ground-truth labels, the greater the impact policy-gradient-based ranking suffers. Careful examination of this weakness is highly recommended for developing enhanced methods based on policy gradient.
21

Zhao, Tingting, Hirotaka Hachiya, Gang Niu, and Masashi Sugiyama. "Analysis and improvement of policy gradient estimation." Neural Networks 26 (February 2012): 118–29. http://dx.doi.org/10.1016/j.neunet.2011.09.005.

22

Kamiński, Bogumił. "Refined knowledge-gradient policy for learning probabilities." Operations Research Letters 43, no. 2 (March 2015): 143–47. http://dx.doi.org/10.1016/j.orl.2015.01.001.

23

Cherubini, A., F. Giannone, L. Iocchi, D. Nardi, and P. F. Palamara. "Policy gradient learning for quadruped soccer robots." Robotics and Autonomous Systems 58, no. 7 (July 2010): 872–78. http://dx.doi.org/10.1016/j.robot.2010.03.008.

24

Pirotta, Matteo, Marcello Restelli, and Luca Bascetta. "Policy gradient in Lipschitz Markov Decision Processes." Machine Learning 100, no. 2-3 (March 3, 2015): 255–83. http://dx.doi.org/10.1007/s10994-015-5484-1.

25

Liu, Jian, and Liming Feng. "Diversity Evolutionary Policy Deep Reinforcement Learning." Computational Intelligence and Neuroscience 2021 (August 3, 2021): 1–11. http://dx.doi.org/10.1155/2021/5300189.

Abstract:
Reinforcement learning algorithms based on the policy gradient may fall into local optima due to vanishing gradients during the update process, which in turn affects the exploration ability of the reinforcement learning agent. In order to solve this problem, in this paper the cross-entropy method (CEM) from evolution policies, the maximum mean discrepancy (MMD), and the twin delayed deep deterministic policy gradient algorithm (TD3) are combined to propose a diversity evolutionary policy deep reinforcement learning (DEPRL) algorithm. By using the maximum mean discrepancy as a measure of the distance between different policies, some of the policies in the population maximize the distance between themselves and the previous generation of policies while maximizing the cumulative return during the gradient update. Furthermore, combining the cumulative returns and the distance between policies as the fitness of the population encourages more diversity in the offspring policies, which in turn can reduce the risk of falling into local optima due to vanishing gradients. The results in the MuJoCo test environment show that DEPRL achieves excellent performance on continuous control tasks; in particular, in the Ant-v2 environment, the return of DEPRL ultimately achieved nearly a 20% improvement compared to TD3.
26

Yang, Long, Yu Zhang, Gang Zheng, Qian Zheng, Pengfei Li, Jianhang Huang, and Gang Pan. "Policy Optimization with Stochastic Mirror Descent." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8823–31. http://dx.doi.org/10.1609/aaai.v36i8.20863.

Abstract:
Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes the VRMPO algorithm: a sample-efficient policy gradient method with stochastic mirror descent. In VRMPO, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency. We prove that the proposed VRMPO needs only O(ε⁻³) sample trajectories to achieve an ε-approximate first-order stationary point, which matches the best sample complexity for policy optimization. Extensive empirical results demonstrate that VRMPO outperforms the state-of-the-art policy gradient methods in various settings.
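For background, a single policy mirror descent step under the negative-entropy mirror map reduces to a multiplicative (exponentiated-gradient) update of a tabular policy. The sketch below shows only this generic step, not VRMPO's variance-reduced estimator; the names are illustrative.

```python
import numpy as np

def mirror_descent_policy_step(pi, q_values, eta):
    """One mirror descent update with the KL (negative-entropy) Bregman
    divergence: pi'(a) is proportional to pi(a) * exp(eta * Q(s, a))."""
    logits = np.log(pi) + eta * q_values    # assumes pi has full support
    logits -= logits.max()                  # shift for numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum()

# Example: mirror_descent_policy_step(np.array([0.25, 0.25, 0.5]),
#                                     np.array([1.0, 0.0, 0.5]), eta=0.1)
```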
27

Zhang, Chongjie, and Victor Lesser. "Multi-Agent Learning with Policy Prediction." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 927–34. http://dx.doi.org/10.1609/aaai.v24i1.7639.

Abstract:
Due to the non-stationary environment, learning in multi-agent systems is a challenging problem. This paper first introduces a new gradient-based learning algorithm, augmenting the basic gradient ascent approach with policy prediction. We prove that this augmentation results in a stronger notion of convergence than the basic gradient ascent, that is, strategies converge to a Nash equilibrium within a restricted class of iterated games. Motivated by this augmentation, we then propose a new practical multi-agent reinforcement learning (MARL) algorithm exploiting approximate policy prediction. Empirical results show that it converges faster and in a wider variety of situations than state-of-the-art MARL algorithms.
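A minimal two-player sketch of the policy-prediction idea: each player ascends its payoff gradient evaluated at the opponent's predicted policy, i.e. the opponent's current parameters moved one short gradient step ahead. The prediction step size gamma and the function names are illustrative assumptions.

```python
import numpy as np

def gradient_ascent_with_policy_prediction(theta1, theta2, grad1, grad2,
                                           eta=0.01, gamma=0.01):
    """One simultaneous update; grad_i(theta1, theta2) returns player i's
    payoff gradient with respect to its own parameters theta_i."""
    pred1 = theta1 + gamma * grad1(theta1, theta2)    # player 1's predicted policy
    pred2 = theta2 + gamma * grad2(theta1, theta2)    # player 2's predicted policy
    new_theta1 = theta1 + eta * grad1(theta1, pred2)  # ascend against the prediction
    new_theta2 = theta2 + eta * grad2(pred1, theta2)
    return new_theta1, new_theta2
```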
28

Catling, PC, and RJ Burt. "Studies of the Ground-Dwelling Mammals of Eucalypt Forests in South-Eastern New South Wales: the Effect of Environmental Variables on Distribution and Abundance." Wildlife Research 22, no. 6 (1995): 669. http://dx.doi.org/10.1071/wr9950669.

Abstract:
The distribution and abundance of ground-dwelling mammals was examined in 13 areas within 500 000 ha of eucalypt (Eucalyptus) forest in SE New South Wales. Data are presented on the distribution and abundance of species in relation to 3 environmental gradient types involving 9 variables: 2 direct gradients (temperature, rainfall); 6 indirect gradients (aspect, steepness of slope, position on slope, landform profile around the site, altitude, season) and a resource gradient (lithology). Many species of ground-dwelling mammal of the forests of SE New South Wales were present along all gradients examined, although wide variation in abundance occurred for some species. Eight species were correlated with direct gradients and all species were correlated with at least one indirect gradient. There was wide variation and species diversity with lithology, but the variation was not related to nutrient status. Although variations in abundance occurred along environmental gradients, the composition of the ground-dwelling mammal fauna in SE New South Wales forests changed little. A fourth gradient type, the substrate gradient (biomass of plants), had the greatest effect, because in the short-term disturbances such as logging and fire play an important role. Disturbance can have a profound influence on the substrate gradient, but no influence on environmental gradients. The results are discussed in relation to the arboreal mammals and avifauna in the region and Environmental and Fauna Impact studies and forest management.
29

Lee, Seunghyeon, Seongho Jin, Seonghyeon Hwang, and Inho Lee. "Learning Optimal Trajectory Generation for Low-Cost Redundant Manipulator using Deep Deterministic Policy Gradient (DDPG)." Journal of Korea Robotics Society 17, no. 1 (March 1, 2022): 58–67. http://dx.doi.org/10.7746/jkros.2022.17.1.058.

30

Zhang, Matthew S., Murat A. Erdogdu, and Animesh Garg. "Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 9066–73. http://dx.doi.org/10.1609/aaai.v36i8.20891.

Abstract:
Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analysis still relies on non-intuitive, impractical and often opaque conditions. In particular, existing rates are achieved in limited settings, under strict regularity conditions. In this work, we establish explicit convergence rates of policy gradient methods, extending the convergence regime to weakly smooth policy classes with L2 integrable gradient. We provide intuitive examples to illustrate the insight behind these new conditions. Notably, our analysis also shows that convergence rates are achievable for both the standard policy gradient and the natural policy gradient algorithms under these assumptions. Lastly we provide performance guarantees for the converged policies.
31

Matsubara, Takamitsu, Jun Morimoto, Jun Nakanishi, Masa-Aki Sato, and Kenji Doya. "Learning a dynamic policy by using policy gradient: application to biped walking." Systems and Computers in Japan 38, no. 4 (2007): 25–38. http://dx.doi.org/10.1002/scj.20441.

32

Dharmavaram, Akshay, Matthew Riemer, and Shalabh Bhatnagar. "Hierarchical Average Reward Policy Gradient Algorithms (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 10 (April 3, 2020): 13777–78. http://dx.doi.org/10.1609/aaai.v34i10.7160.

Abstract:
Option-critic learning is a general-purpose reinforcement learning (RL) framework that aims to address the issue of long term credit assignment by leveraging temporal abstractions. However, when dealing with extended timescales, discounting future rewards can lead to incorrect credit assignments. In this work, we address this issue by extending the hierarchical option-critic policy gradient theorem for the average reward criterion. Our proposed framework aims to maximize the long-term reward obtained in the steady-state of the Markov chain defined by the agent's policy. Furthermore, we use an ordinary differential equation based approach for our convergence analysis and prove that the parameters of the intra-option policies, termination functions, and value functions, converge to their corresponding optimal values, with probability one. Finally, we illustrate the competitive advantage of learning options, in the average reward setting, on a grid-world environment with sparse rewards.
33

L. A., Prashanth, and Michael C. Fu. "Risk-Sensitive Reinforcement Learning via Policy Gradient Search." Foundations and Trends® in Machine Learning 15, no. 5 (2022): 537–693. http://dx.doi.org/10.1561/2200000091.

34

Petrović, Andrija, Mladen Nikolić, Miloš Jovanović, Miloš Bijanić, and Boris Delibašić. "Fair classification via Monte Carlo policy gradient method." Engineering Applications of Artificial Intelligence 104 (September 2021): 104398. http://dx.doi.org/10.1016/j.engappai.2021.104398.

35

Wang, Yingfei, and Warren B. Powell. "Finite-Time Analysis for the Knowledge-Gradient Policy." SIAM Journal on Control and Optimization 56, no. 2 (January 2018): 1105–29. http://dx.doi.org/10.1137/16m1073388.

36

Cao, Xi-Ren. "A basic formula for online policy gradient algorithms." IEEE Transactions on Automatic Control 50, no. 5 (May 2005): 696–99. http://dx.doi.org/10.1109/tac.2005.847037.

37

Shi, Haibo, Yaoru Sun, Guangyuan Li, Fang Wang, Daming Wang, and Jie Li. "Hierarchical Intermittent Motor Control With Deterministic Policy Gradient." IEEE Access 7 (2019): 41799–810. http://dx.doi.org/10.1109/access.2019.2904910.

38

Li, Xiaoguang, Xin Zhang, Lixin Wang, and Ge Yu. "Offline Multi-Policy Gradient for Latent Mixture Environments." IEEE Access 9 (2021): 801–12. http://dx.doi.org/10.1109/access.2020.3045300.

39

Frazier, Peter I., Warren B. Powell, and Savas Dayanik. "A Knowledge-Gradient Policy for Sequential Information Collection." SIAM Journal on Control and Optimization 47, no. 5 (January 2008): 2410–39. http://dx.doi.org/10.1137/070693424.

40

You, Shixun, Ming Diao, Lipeng Gao, Fulong Zhang, and Huan Wang. "Target tracking strategy using deep deterministic policy gradient." Applied Soft Computing 95 (October 2020): 106490. http://dx.doi.org/10.1016/j.asoc.2020.106490.

41

Frazier, Peter, Warren Powell, and Savas Dayanik. "The Knowledge-Gradient Policy for Correlated Normal Beliefs." INFORMS Journal on Computing 21, no. 4 (November 2009): 599–613. http://dx.doi.org/10.1287/ijoc.1080.0314.

42

Cherubini, A., F. Giannone, L. Iocchi, M. Lombardo, and G. Oriolo. "Policy gradient learning for a humanoid soccer robot." Robotics and Autonomous Systems 57, no. 8 (July 2009): 808–18. http://dx.doi.org/10.1016/j.robot.2009.03.006.

43

Zhang, Huaxiang, and Ying Fan. "An adaptive policy gradient in learning Nash equilibria." Neurocomputing 72, no. 1-3 (December 2008): 533–38. http://dx.doi.org/10.1016/j.neucom.2007.12.007.

44

Zhou, Chengmin, Bingding Huang, and Pasi Fränti. "A review of motion planning algorithms for intelligent robots." Journal of Intelligent Manufacturing 33, no. 2 (November 25, 2021): 387–424. http://dx.doi.org/10.1007/s10845-021-01867-z.

Abstract:
Principles of typical motion planning algorithms are investigated and analyzed in this paper. These algorithms include traditional planning algorithms, classical machine learning algorithms, optimal value reinforcement learning, and policy gradient reinforcement learning. Traditional planning algorithms investigated include graph search algorithms, sampling-based algorithms, interpolating curve algorithms, and reaction-based algorithms. Classical machine learning algorithms include the multiclass support vector machine, long short-term memory, Monte-Carlo tree search, and convolutional neural network. Optimal value reinforcement learning algorithms include Q-learning, the deep Q-learning network, the double deep Q-learning network, and the dueling deep Q-learning network. Policy gradient algorithms include the policy gradient method, the actor-critic algorithm, asynchronous advantage actor-critic, advantage actor-critic, deterministic policy gradient, deep deterministic policy gradient, trust region policy optimization, and proximal policy optimization. New general criteria are also introduced to evaluate the performance and application of motion planning algorithms by analytical comparisons. The convergence speed and stability of optimal value and policy gradient algorithms are specially analyzed. Future directions are presented analytically according to the principles and analytical comparisons of motion planning algorithms. This paper provides researchers with a clear and comprehensive understanding of the advantages, disadvantages, relationships, and future of motion planning algorithms in robots, and paves the way for better motion planning algorithms in academia, engineering, and manufacturing.
45

Yang, Lei, James Dankert, and Jennie Si. "A performance gradient perspective on gradient‐based policy iteration and a modified value iteration." International Journal of Intelligent Computing and Cybernetics 1, no. 4 (October 17, 2008): 509–20. http://dx.doi.org/10.1108/17563780810919096.

46

Gao, Binpin, Yingmei Wu, Chen Li, Kejun Zheng, Yan Wu, Mengjiao Wang, Xin Fan, and Shengya Ou. "Multi-Scenario Prediction of Landscape Ecological Risk in the Sichuan-Yunnan Ecological Barrier Based on Terrain Gradients." Land 11, no. 11 (November 18, 2022): 2079. http://dx.doi.org/10.3390/land11112079.

Abstract:
Land use changes induced by human activities change landscape patterns and ecological processes, threatening regional and global ecosystems. Terrain gradient and anthropogenic multi-policy regulation can have a pronounced effect on landscape components. Forecasting the changing trend of landscape ecological risk (LER) is important for national ecological security and regional sustainability. The present study assessed changes in LER in the Sichuan-Yunnan Ecological Barrier over a 20-year period using land use data from 2000, 2010, and 2020. The enhanced Markov-PLUS (patch-generating land use simulation) model was used to predict and analyze the spatial distribution pattern of LER under the following three scenarios. These were business-as-usual (BAU), urban development and construction (UDC), and ecological development priority (EDP) in 2030. The influence of terrain conditions on LER was also explored. The results showed that over the past 20 years, the LER index increased and then decreased and was dominated by medium and low risk, accounting for more than 70% of the total risk-rated area. The highest and higher risk areas for the three future scenarios have increased in spatial extent. The UDC scenario showed the largest increase of 3341.13 km2 and 2684.85 km2, respectively. The highest-risk level has a strong selectivity for low gradients, with high-level risks more likely to occur at low gradients. The response of ecological risk to gradient changes shows a positive correlation distribution for high-gradient areas and a negative correlation distribution for low-gradient areas. The influence of future topographic gradient changes on LER remains significant. The value of multiscale geographically weighted regression (MGWR) for identifying the spatial heterogeneity of terrain gradient and LER is highlighted. It can play an important role in the formulation of scientific solutions for LER prevention and of an ecological conservation policy for mountainous areas with complex terrain.
47

Chen, Qiulin, Karen Eggleston, Wei Zhang, Jiaying Zhao, and Sen Zhou. "The Educational Gradient in Health in China." China Quarterly 230 (May 15, 2017): 289–322. http://dx.doi.org/10.1017/s0305741017000613.

Abstract:
It has been well established that better educated individuals enjoy better health and longevity. In theory, the educational gradients in health could be flattening if diminishing returns to improved average education levels and the influence of earlier population health interventions outweigh the gradient-steepening effects of new medical and health technologies. This paper documents how the gradients are evolving in China, a rapidly developing country, about which little is known on this topic. Based on recent mortality data and nationally representative health surveys, we find large and, in some cases, steepening educational gradients. We also find that the gradients vary by cohort, gender and region. Further, we find that the gradients can only partially be accounted for by economic factors. These patterns highlight the double disadvantage of those with low education, and suggest the importance of policy interventions that foster both aspects of human capital for them.
48

Persson, Bertil R. R., and Freddy Ståhlberg. "Safety Aspects of Magnetic Resonance Examinations." International Journal of Technology Assessment in Health Care 1, no. 3 (July 1985): 647–65. http://dx.doi.org/10.1017/s0266462300001549.

Abstract:
In a standard whole-body NMR-scanning machine, the static magnetic field is generated by an electric current driven through large solenoid coils. Dynamic magnetic gradient fields are generated by electric current pulses in coils located at various orientations, thus producing magnetic gradients in the x, y, and z directions. The Rf (radiofrequency) radiation is transmitted through a specially shaped coil which also serves as an antenna receiving the NMR signals.
49

Cohen, Andrew, Xingye Qiao, Lei Yu, Elliot Way, and Xiangrong Tong. "Diverse Exploration via Conjugate Policies for Policy Gradient Methods." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3404–11. http://dx.doi.org/10.1609/aaai.v33i01.33013404.

Abstract:
We address the challenge of effective exploration while maintaining good performance in policy gradient methods. As a solution, we propose diverse exploration (DE) via conjugate policies. DE learns and deploys a set of conjugate policies which can be conveniently generated as a byproduct of conjugate gradient descent. We provide both theoretical and empirical results showing the effectiveness of DE at achieving exploration, improving policy performance, and the advantage of DE over exploration by random policy perturbations.
50

de Jesus, Junior Costa, Jair Augusto Bottega, Marco Antonio de Souza Leite Cuadros, and Daniel Fernando Tello Gamarra. "Deep Deterministic Policy Gradient for Navigation of Mobile Robots." Journal of Intelligent & Fuzzy Systems 40, no. 1 (January 4, 2021): 349–61. http://dx.doi.org/10.3233/jifs-191711.

Abstract:
This article describes the use of the Deep Deterministic Policy Gradient network, a deep reinforcement learning algorithm, for mobile robot navigation. The neural network takes as inputs laser range readings, the angular and linear velocities of the robot, and the position and orientation of the mobile robot with respect to a goal position. The outputs of the network are the angular and linear velocities used as control signals for the robot. The experiments demonstrate that deep reinforcement learning techniques that use continuous actions are efficient for decision-making in a mobile robot. Nevertheless, the design of the reward function is an important issue for the performance of deep reinforcement learning algorithms. To show the performance of the deep reinforcement learning algorithm, we have successfully applied the proposed architecture in simulated environments and in experiments with a real robot.
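The input/output interface described in the abstract can be made concrete with a small actor-network sketch. The layer sizes, the number of laser readings, and the class name are illustrative assumptions, not the architecture reported in the article.

```python
import torch.nn as nn

class NavigationActor(nn.Module):
    """Maps [laser ranges, linear & angular velocity, distance & angle to goal]
    to bounded [linear velocity, angular velocity] commands."""
    def __init__(self, n_laser=10, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_laser + 2 + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2), nn.Tanh(),   # scaled later to velocity limits
        )

    def forward(self, obs):
        return self.net(obs)
```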