A collection of scholarly literature on the topic "Policy gradients"

Format a source in APA, MLA, Chicago, Harvard, and other citation styles

Browse lists of current articles, books, dissertations, conference papers, and other scholarly sources on the topic "Policy gradients".

Next to every entry in the bibliography you will find an "Add to bibliography" button. Use it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a publication as a .pdf file and read its abstract online, provided these are available in the item's metadata.

Journal articles on the topic "Policy gradients"

1

Cai, Qingpeng, Ling Pan, and Pingzhong Tang. "Deterministic Value-Policy Gradients." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 3316–23. http://dx.doi.org/10.1609/aaai.v34i04.5732.

Abstract:
Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with a finite horizon, which is too myopic compared with the infinite-horizon setting. We first give a theoretical guarantee of the existence of the value gradients in this infinite-horizon setting. Based on this guarantee, we propose a class of deterministic value gradient (DVG) algorithms with infinite horizon, in which different rollout steps of the analytical gradients through the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.
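For readers unfamiliar with the family of methods this abstract refers to, the following is a minimal sketch of the deterministic policy gradient actor update that DDPG-style algorithms (and, by extension, DVPG) build on. The network sizes, optimizer settings, and the `actor`/`critic` modules are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the deterministic policy gradient actor update underlying
# DDPG-style methods. Dimensions and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())           # mu_theta(s)
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))                               # Q_w(s, a)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def actor_update(states: torch.Tensor) -> float:
    """One deterministic policy gradient step: ascend E[Q(s, mu(s))] by
    backpropagating grad_a Q(s, a) through the actor."""
    actions = actor(states)
    actor_loss = -critic(torch.cat([states, actions], dim=-1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return actor_loss.item()

# usage: actor_update(torch.randn(32, state_dim))
```

DVPG additionally mixes in model-based value-gradient terms obtained by differentiating through a learned dynamics model over several rollout steps, which this sketch omits.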
2

Wierstra, D., A. Forster, J. Peters, and J. Schmidhuber. "Recurrent policy gradients." Logic Journal of IGPL 18, no. 5 (September 9, 2009): 620–34. http://dx.doi.org/10.1093/jigpal/jzp049.

3

Sehnke, Frank, Christian Osendorfer, Thomas Rückstieß, Alex Graves, Jan Peters, and Jürgen Schmidhuber. "Parameter-exploring policy gradients." Neural Networks 23, no. 4 (May 2010): 551–59. http://dx.doi.org/10.1016/j.neunet.2009.12.004.

4

Zhao, Tingting, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, and Masashi Sugiyama. "Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration." Neural Computation 25, no. 6 (June 2013): 1512–47. http://dx.doi.org/10.1162/neco_a_00452.

Abstract:
The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates with their unbiasedness being maintained. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
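As a point of reference for the method described above, here is a minimal NumPy sketch of a PGPE (parameter-based exploration) update with a baseline. The rollout interface, population size, learning rates, and the simple running-average baseline are assumptions; the paper instead derives an optimal baseline that minimizes the variance of the gradient estimate while preserving unbiasedness, and reuses past samples via importance sampling.

```python
# A minimal sketch of a PGPE update with a baseline; not the authors' code.
import numpy as np

def pgpe_step(mu, sigma, evaluate, pop_size=20, lr=0.1, baseline=0.0):
    """One PGPE update of the Gaussian search distribution N(mu, diag(sigma^2))
    over policy parameters. `evaluate(theta)` returns the episodic return."""
    thetas = mu + sigma * np.random.randn(pop_size, mu.size)
    returns = np.array([evaluate(t) for t in thetas])
    adv = returns - baseline                          # variance reduction via baseline
    # log-derivatives of the Gaussian w.r.t. its mean and standard deviation
    d_mu = (thetas - mu) / sigma**2
    d_sigma = ((thetas - mu) ** 2 - sigma**2) / sigma**3
    mu = mu + lr * (adv[:, None] * d_mu).mean(axis=0)
    sigma = np.maximum(sigma + lr * (adv[:, None] * d_sigma).mean(axis=0), 1e-3)
    baseline = 0.9 * baseline + 0.1 * returns.mean()  # simple running baseline
    return mu, sigma, baseline

# usage with a toy quadratic "return":
# mu, sigma, b = np.zeros(5), np.ones(5), 0.0
# for _ in range(100):
#     mu, sigma, b = pgpe_step(mu, sigma, lambda th: -np.sum(th**2), baseline=b)
```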
5

Seno, Takuma, and Michita Imai. "Policy Gradients with Memory-Augmented Critic." Transactions of the Japanese Society for Artificial Intelligence 36, no. 1 (January 1, 2021): B—K71_1–8. http://dx.doi.org/10.1527/tjsai.36-1_b-k71.

6

Millidge, Beren. "Deep active inference as variational policy gradients." Journal of Mathematical Psychology 96 (June 2020): 102348. http://dx.doi.org/10.1016/j.jmp.2020.102348.

7

Catling, PC, and RJ Burt. "Studies of the Ground-Dwelling Mammals of Eucalypt Forests in South-Eastern New South Wales: the Effect of Environmental Variables on Distribution and Abundance." Wildlife Research 22, no. 6 (1995): 669. http://dx.doi.org/10.1071/wr9950669.

Abstract:
The distribution and abundance of ground-dwelling mammals was examined in 13 areas within 500 000 ha of eucalypt (Eucalyptus) forest in SE New South Wales. Data are presented on the distribution and abundance of species in relation to 3 environmental gradient types involving 9 variables: 2 direct gradients (temperature, rainfall); 6 indirect gradients (aspect, steepness of slope, position on slope, landform profile around the site, altitude, season) and a resource gradient (lithology). Many species of ground-dwelling mammal of the forests of SE New South Wales were present along all gradients examined, although wide variation in abundance occurred for some species. Eight species were correlated with direct gradients and all species were correlated with at least one indirect gradient. There was wide variation and species diversity with lithology, but the variation was not related to nutrient status. Although variations in abundance occurred along environmental gradients, the composition of the ground-dwelling mammal fauna in SE New South Wales forests changed little. A fourth gradient type, the substrate gradient (biomass of plants), had the greatest effect, because in the short-term disturbances such as logging and fire play an important role. Disturbance can have a profound influence on the substrate gradient, but no influence on environmental gradients. The results are discussed in relation to the arboreal mammals and avifauna in the region and Environmental and Fauna Impact studies and forest management.
8

Baxter, J., P. L. Bartlett, and L. Weaver. "Experiments with Infinite-Horizon, Policy-Gradient Estimation." Journal of Artificial Intelligence Research 15 (November 1, 2001): 351–81. http://dx.doi.org/10.1613/jair.807.

Abstract:
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter beta, which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of (Baxter & Bartlett, this volume) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.
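For context, the GPOMDP estimator that these experiments build on can be summarized in a few lines: a single free parameter beta controls a discounted eligibility trace of score vectors and thereby the bias-variance trade-off. The `sample_action` and `env_step` interfaces below are hypothetical placeholders, not the authors' code.

```python
# A minimal sketch of the GPOMDP gradient estimator (single trajectory, average reward).
import numpy as np

def gpomdp_estimate(sample_action, env_step, obs, theta, T=10_000, beta=0.95):
    """sample_action(theta, obs) -> (action, grad of log pi_theta(action | obs))
    env_step(action)             -> (next_obs, reward)
    beta in [0, 1) trades bias against variance of the estimate."""
    z = np.zeros_like(theta)       # discounted eligibility trace of score vectors
    delta = np.zeros_like(theta)   # running average of reward-weighted traces
    for t in range(T):
        action, grad_log_prob = sample_action(theta, obs)
        obs, reward = env_step(action)
        z = beta * z + grad_log_prob
        delta += (reward * z - delta) / (t + 1)
    return delta                   # (beta-biased) estimate of the performance gradient
```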
9

Chen, Qiulin, Karen Eggleston, Wei Zhang, Jiaying Zhao, and Sen Zhou. "The Educational Gradient in Health in China." China Quarterly 230 (May 15, 2017): 289–322. http://dx.doi.org/10.1017/s0305741017000613.

Abstract:
It has been well established that better educated individuals enjoy better health and longevity. In theory, the educational gradients in health could be flattening if diminishing returns to improved average education levels and the influence of earlier population health interventions outweigh the gradient-steepening effects of new medical and health technologies. This paper documents how the gradients are evolving in China, a rapidly developing country, about which little is known on this topic. Based on recent mortality data and nationally representative health surveys, we find large and, in some cases, steepening educational gradients. We also find that the gradients vary by cohort, gender and region. Further, we find that the gradients can only partially be accounted for by economic factors. These patterns highlight the double disadvantage of those with low education, and suggest the importance of policy interventions that foster both aspects of human capital for them.
10

Peters, Jan, and Stefan Schaal. "Reinforcement learning of motor skills with policy gradients." Neural Networks 21, no. 4 (May 2008): 682–97. http://dx.doi.org/10.1016/j.neunet.2008.02.003.


Dissertations on the topic "Policy gradients"

1

Crowley, Mark. "Equilibrium policy gradients for spatiotemporal planning." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/38971.

Abstract:
In spatiotemporal planning, agents choose actions at multiple locations in space over some planning horizon to maximize their utility and satisfy various constraints. In forestry planning, for example, the problem is to choose actions for thousands of locations in the forest each year. The actions at each location could include harvesting trees, treating trees against disease and pests, or doing nothing. A utility model could place value on sale of forest products, ecosystem sustainability or employment levels, and could incorporate legal and logistical constraints such as avoiding large contiguous areas of clearcutting and managing road access. Planning requires a model of the dynamics. Existing simulators developed by forestry researchers can provide detailed models of the dynamics of a forest over time, but these simulators are often not designed for use in automated planning. This thesis presents spatiotemporal planning in terms of factored Markov decision processes. A policy gradient planning algorithm optimizes a stochastic spatial policy using existing simulators for dynamics. When a planning problem includes spatial interaction between locations, deciding on an action to carry out at one location requires considering the actions performed at other locations. This spatial interdependence is common in forestry and other environmental planning problems and makes policy representation and planning challenging. We define a spatial policy in terms of local policies defined as distributions over actions at one location conditioned upon actions at other locations. A policy gradient planning algorithm using this spatial policy is presented which uses Markov Chain Monte Carlo simulation to sample the landscape policy, estimate its gradient and use this gradient to guide policy improvement. Evaluation is carried out on a forestry planning problem with 1880 locations using a variety of value models and constraints. The distribution over joint actions at all locations can be seen as the equilibrium of a cyclic causal model. This equilibrium semantics is compared to Structural Equation Models. We also define an algorithm for approximating the equilibrium distribution for cyclic causal networks which exploits graphical structure, and analyse when the algorithm is exact.
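To illustrate the kind of spatial policy the abstract describes, here is a toy sketch in which each grid location has a local stochastic policy over a binary action conditioned on its neighbours' current actions, and Gibbs sweeps (a simple MCMC scheme) sample a joint action from the resulting equilibrium distribution. The grid, the logistic form, and the parameter names are hypothetical illustrations, not the thesis's forestry model.

```python
# Toy sketch: local conditional policies on a grid, sampled jointly with Gibbs sweeps.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample_joint_action(weights, n=20, sweeps=50, rng=None):
    """weights = (bias, neighbour_coupling); returns an n x n array of 0/1 actions."""
    rng = np.random.default_rng(rng)
    a = rng.integers(0, 2, size=(n, n))
    bias, coupling = weights
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                # local conditional policy: P(a_ij = 1 | actions of the 4 neighbours)
                nb = (a[(i - 1) % n, j] + a[(i + 1) % n, j]
                      + a[i, (j - 1) % n] + a[i, (j + 1) % n])
                a[i, j] = rng.random() < sigmoid(bias + coupling * nb)
    return a

# usage: actions = gibbs_sample_joint_action(weights=(-0.5, 0.4))
```

In the thesis, joint-action samples of this kind are used to estimate the gradient of expected value with respect to the policy parameters (here, the bias and coupling weights would play that role) and thereby guide policy improvement.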
2

Sehnke, Frank [Verfasser], Patrick van der [Akademischer Betreuer] Smagt, and Jürgen [Akademischer Betreuer] Schmidhuber. "Parameter Exploring Policy Gradients and their Implications / Frank Sehnke. Gutachter: Jürgen Schmidhuber. Betreuer: Patrick van der Smagt." München : Universitätsbibliothek der TU München, 2012. http://d-nb.info/1030099820/34.

3

Tolman, Deborah A. "Environmental Gradients, Community Boundaries, and Disturbance the Darlingtonia Fens of Southwestern Oregon." PDXScholar, 2004. https://pdxscholar.library.pdx.edu/open_access_etds/3013.

Abstract:
The Darlingtonia fens, found on serpentine soils in southern Oregon, are distinct communities that frequently undergo dramatic changes in size and shape in response to a wide array of environmental factors. Since few systems demonstrate a balance among high water tables, shallow soils, the presence of heavy metals, and limited nutrients, conservative efforts have been made to preserve them. This dissertation investigates the role of fire on nutrient cycling and succession in three separate fens, each a different time since fire. I specifically analyze the spatial distributions of soil properties, the physical and ecological characteristics of ecotones between Jeffrey pine savanna and Darlingtonia fens, and the vegetation structure of fire-disturbed systems. Soil, water, and vegetation sampling were conducted along an array of transects, oriented perpendicular to community boundaries and main environmental gradients, at each of the three fens. Abrupt changes in vegetation, across communities, were consistently identified at each of the three sites, although statistical analysis did not always identify distinct mid-canopy communities. Below-ground variables were likewise distinguished at the fen and savanna boundary for two of the three sites. At the third site, discontinuities did not align with the fen boundaries, but followed fluctuations in soil NH4. My results suggest that below-ground discontinuities may be more important than fire at preserving these uniquely-adapted systems, while vegetation undergoes postfire succession from fen to mid-canopy to savanna after approximately 100 years since fire. Although restoration of ecosystem structure and processes was not the primary focus of this study, my data suggest that time since fire may drive ecosystem processes in a trajectory away from the normal succession cycle. Moreover, time since fire may decrease overall vigor of Darlingtonia populations.
4

Masoudi, Mohammad Amin. "Robust Deep Reinforcement Learning for Portfolio Management." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42743.

Abstract:
In finance, the use of Automated Trading Systems (ATS) on markets is growing every year, and the trades generated by algorithms now account for most of the orders that arrive at stock exchanges (Kissell, 2020). Historically, these systems were based on advanced statistical methods and signal processing designed to extract trading signals from financial data. The recent success of Machine Learning has attracted the interest of the financial community. Reinforcement Learning is a subcategory of machine learning and has been broadly applied by investors and researchers in building trading systems (Kissell, 2020). In this thesis, we address the issue that deep reinforcement learning may be susceptible to sampling errors and over-fitting, and propose a robust deep reinforcement learning method that integrates techniques from reinforcement learning and robust optimization. We back-test and compare the performance of the developed algorithm, Robust DDPG, with the UBAH (Uniform Buy and Hold) benchmark and other RL algorithms, and show that the robust algorithm of this research can significantly reduce the downside risk of an investment strategy and can ensure a safer path for the investor's portfolio value.
5

Jacobzon, Gustaf, and Martin Larsson. "Generalizing Deep Deterministic Policy Gradient." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239365.

Abstract:
We extend Deep Deterministic Policy Gradient, a state-of-the-art algorithm for continuous control, in order to achieve a high generalization capability. To achieve better generalization for the agent, we introduce drop-out, one of the most successful regularization techniques for generalization in machine learning, into the algorithm. We use the recently published exploration technique, parameter space noise, to achieve higher stability and a lower likelihood of converging to a poor local minimum. We also replace the nonlinearity Rectified Linear Unit (ReLU) with the Exponential Linear Unit (ELU) for greater stability and faster learning for the agent. Our results show that an agent trained with drop-out has generalization capabilities that far exceed those of one trained with L2-regularization, when evaluated in the racing simulator TORCS. Further, we found ELU to produce a more stable and faster learning process than ReLU when evaluated in the physics simulator MuJoCo.
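A minimal PyTorch sketch of the kind of actor network the abstract describes, with ELU nonlinearities and drop-out. The layer widths, drop-out rate, and Tanh output squashing are illustrative assumptions rather than the thesis's exact architecture.

```python
# Sketch of a drop-out-regularized DDPG actor with ELU activations (assumed sizes).
import torch.nn as nn

class DropoutActor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, p_drop: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ELU(), nn.Dropout(p_drop),
            nn.Linear(256, 256), nn.ELU(), nn.Dropout(p_drop),
            nn.Linear(256, action_dim), nn.Tanh(),   # actions squashed to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

# Drop-out is active during training (actor.train()) and disabled at evaluation
# time (actor.eval()), as usual for drop-out-regularized networks.
```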
6

Ковальов, Костянтин Миколайович. "Комп'ютерна система управління промисловим роботом". Bachelor's thesis, КПІ ім. Ігоря Сікорського, 2019. https://ela.kpi.ua/handle/123456789/28610.

Abstract:
The qualifying work includes an explanatory note (56 pages, 2 appendices). The object of the study is reinforcement learning algorithms for the task of controlling an industrial robotic arm. Continuous control of an industrial robotic arm for non-trivial tasks is too complicated, or even unsolvable, for classical methods of robotics. Reinforcement learning methods can be used in this case: they are quite simple to implement, generalize to unseen cases, and learn from high-dimensional data. We implement the deep deterministic policy gradient algorithm, which is suitable for complex continuous control tasks. During the study:
• an analysis of existing classical methods for industrial robot control was conducted;
• an analysis of existing reinforcement learning algorithms and their use in robotics was conducted;
• the deep deterministic policy gradient algorithm was implemented;
• the implemented algorithm was tested in a simplified environment;
• a neural network architecture was proposed for solving the problem;
• the algorithm was tested on the training set of objects;
• the algorithm was tested for its generalization ability on the test set.
It was shown that the deep deterministic policy gradient algorithm, with a neural network as the policy approximator, is able to solve the problem with an image as input and to generalize to objects not seen before.
7

Greensmith, Evan. "Policy Gradient Methods: Variance Reduction and Stochastic Convergence." The Australian National University, Research School of Information Sciences and Engineering, 2005. http://thesis.anu.edu.au./public/adt-ANU20060106.193712.

Abstract:
In a reinforcement learning task an agent must learn a policy for performing actions so as to perform well in a given environment. Policy gradient methods consider a parameterized class of policies, and using a policy from the class, and a trajectory through the environment taken by the agent using this policy, estimate the performance of the policy with respect to the parameters. Policy gradient methods avoid some of the problems of value function methods, such as policy degradation, where inaccuracy in the value function leads to the choice of a poor policy. However, the estimates produced by policy gradient methods can have high variance.

In Part I of this thesis we study the estimation variance of policy gradient algorithms, in particular, when augmenting the estimate with a baseline, a common method for reducing estimation variance, and when using actor-critic methods. A baseline adjusts the reward signal supplied by the environment, and can be used to reduce the variance of a policy gradient estimate without adding any bias. We find the baseline that minimizes the variance. We also consider the class of constant baselines, and find the constant baseline that minimizes the variance. We compare this to the common technique of adjusting the rewards by an estimate of the performance measure. Actor-critic methods usually attempt to learn a value function accurate enough to be used in a gradient estimate without adding much bias. In this thesis we propose that in learning the value function we should also consider the variance. We show how considering the variance of the gradient estimate when learning a value function can be beneficial, and we introduce a new optimization criterion for selecting a value function.

In Part II of this thesis we consider online versions of policy gradient algorithms, where we update our policy for selecting actions at each step in time, and study the convergence of these online algorithms. For such online gradient-based algorithms, convergence results aim to show that the gradient of the performance measure approaches zero. Such a result has been shown for an algorithm which is based on observing trajectories between visits to a special state of the environment. However, the algorithm is not suitable in a partially observable setting, where we are unable to access the full state of the environment, and its variance depends on the time between visits to the special state, which may be large even when only a few samples are needed to estimate the gradient. To date, convergence results for algorithms that do not rely on a special state are weaker. We show that, for a certain algorithm that does not rely on a special state, the gradient of the performance measure approaches zero. We show that this continues to hold when using certain baseline algorithms suggested by the results of Part I.
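For reference, the variance-minimizing constant baseline mentioned in the abstract has a simple closed form in the standard episodic likelihood-ratio setting; the expression below is a sketch in that simplified setting and omits the per-parameter and actor-critic refinements the thesis develops.

```latex
% Likelihood-ratio gradient estimate with a constant baseline b; unbiased for any b,
% since E[\nabla_\theta \log p_\theta(\tau)] = 0.
\hat{g} \;=\; \nabla_\theta \log p_\theta(\tau)\,\bigl(R(\tau) - b\bigr),
\qquad
b^{*} \;=\;
\frac{\mathbb{E}\bigl[\lVert \nabla_\theta \log p_\theta(\tau) \rVert^{2}\, R(\tau)\bigr]}
     {\mathbb{E}\bigl[\lVert \nabla_\theta \log p_\theta(\tau) \rVert^{2}\bigr]}
% b^* minimizes the trace of the covariance of \hat{g} over all constant baselines,
% whereas simply using the average return E[R(\tau)] as the baseline is suboptimal.
```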
8

Greensmith, Evan. "Policy gradient methods : variance reduction and stochastic convergence /." View thesis entry in Australian Digital Theses Program, 2005. http://thesis.anu.edu.au/public/adt-ANU20060106.193712/index.html.

9

Aberdeen, Douglas Alexander. "Policy-Gradient Algorithms for Partially Observable Markov Decision Processes." The Australian National University, Research School of Information Sciences and Engineering, 2003. http://thesis.anu.edu.au./public/adt-ANU20030410.111006.

Abstract:
Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches. One such class of algorithms is the so-called policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a scalable approach for controlling partially observable Markov decision processes (POMDPs).

In the most general case POMDP policies require some form of internal state, or memory, in order to act optimally. Policy-gradient methods have shown promise for problems admitting memory-less policies but have been less successful when memory is required. This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting: directly, when the dynamics of the world are known, and via Monte-Carlo methods otherwise. The algorithms simultaneously learn how to act and what to remember.

Monte-Carlo policy-gradient approaches tend to produce gradient estimates with high variance. Two novel methods for reducing variance are introduced. The first uses high-order filters to replace the eligibility trace of the gradient estimator. The second uses a low-variance value-function method to learn a subset of the parameters and a policy-gradient method to learn the remainder.

The algorithms are applied to large domains including a simulated robot navigation scenario, a multi-agent scenario with 21,000 states, and the complex real-world task of large vocabulary continuous speech recognition. To the best of the author's knowledge, no other policy-gradient algorithms have performed well at such tasks.

The high variance of Monte-Carlo methods requires lengthy simulation and hence a super-computer to train agents within a reasonable time. The ANU "Bunyip" Linux cluster was built with such tasks in mind. It was used for several of the experimental results presented here. One chapter of this thesis describes an application written for the Bunyip cluster that won the international Gordon-Bell prize for price/performance in 2001.
10

Aberdeen, Douglas Alexander. "Policy-gradient algorithms for partially observable Markov decision processes /." View thesis entry in Australian Digital Theses Program, 2003. http://thesis.anu.edu.au/public/adt-ANU20030410.111006/index.html.


Books on the topic "Policy gradients"

1

Deep Reinforcement Learning Hands-On: Apply modern RL methods, with deep Q-networks, value iteration, policy gradients, TRPO, AlphaGo Zero and more. Packt Publishing, 2018.

2

Gorard, Stephen. Education Policy. Policy Press, 2018. http://dx.doi.org/10.1332/policypress/9781447342144.001.0001.

Abstract:
What has been done to achieve fairer and more efficient education systems, and what more can be done in the future? This book provides a comprehensive examination of crucial policy areas for education, such as differential outcomes, the poverty gradient, and the allocation of resources to education, to identify likely causes of educational disadvantage among students and lifelong learners. This analysis is supported by 20 years of extensive research, based in the home countries of the UK and on work in all EU 28 countries, USA, Pakistan, and Japan. The book brings invaluable insights into the underlying problems within education policy, and proposes practical solutions for a brighter future.
3

Olsen, Jan Abel. The social environment and health. Oxford University Press, 2017. http://dx.doi.org/10.1093/oso/9780198794837.003.0007.

Abstract:
This chapter explores three main issues related to the analyses of the social gradient in health: correlations, causations, and interventions. Observed correlations between indicators of socioeconomic position and health do not imply that there are causations. The usefulness of various indicators is discussed, such as education, income, occupation categories, and social class. A causal pathway is presented that suggests a chain from early life circumstances, via education, occupation, income, and perceived status onto health. The chapter ends with a discussion of various policy options to reduce inequalities in health that are caused by social determinants.
4

Olsen, Jan Abel. Exogenous determinants of health. Oxford University Press, 2017. http://dx.doi.org/10.1093/oso/9780198794837.003.0006.

Abstract:
This chapter considers some determinants that lie completely outside of people's own control. For such exogenous causes of ill health, the unlucky ones cannot be held responsible for their misfortune. Still, some of these causes are avoidable, in the sense that effective policy interventions exist. Biological variations are in general unavoidable. The chapter investigates two types of determinants associated with early life circumstances. The most systematic health difference that an individual is affected by is whether one is born a boy or a girl: women live 5–6% longer than men. Childhood differences in health follow a strong social gradient, and some figures are included to prove this sad fact. One additional exogenous determinant is the physical environment that affects people's health. A simple model is presented to show how unhealthy externalities can be reduced by imposing pollution taxes: the polluter pays principle.
5

Egger, Eva-Maria, Aslihan Arslan, and Emanuele Zucchini. Does connectivity reduce gender gaps in off-farm employment? Evidence from 12 low- and middle-income countries. 3rd ed. UNU-WIDER, 2021. http://dx.doi.org/10.35188/unu-wider/2021/937-2.

Abstract:
Gender gaps in labour force participation in developing countries persist despite income growth or structural change. We assess this persistence across economic geographies within countries, focusing on youth employment in off-farm wage jobs. We combine household survey data from 12 low- and middle-income countries in Asia, Latin America, and sub-Saharan Africa with geospatial data on population density, and estimate simultaneous probit models of different activity choices across the rural-urban gradient. The gender gap increases with connectivity from rural to peri-urban areas, and disappears in high-density urban areas. In non-rural areas, child dependency does not constrain young women, and secondary education improves their access to off-farm employment. The gender gap persists for married young women independent of connectivity improvements, indicating social norm constraints. Marital status and child dependency are associated positively with male participation, and negatively with female participation; other factors, such as education, show a positive association for both sexes. These results indicate entry points for policy.

Book chapters on the topic "Policy gradients"

1

Sehnke, Frank, Christian Osendorfer, Jan Sölter, Jürgen Schmidhuber, and Ulrich Rührmair. "Policy Gradients for Cryptanalysis." In Artificial Neural Networks – ICANN 2010, 168–77. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-15825-4_22.

2

McClarren, Ryan G. "Reinforcement Learning with Policy Gradients." In Machine Learning for Engineers, 219–37. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-70388-2_9.

3

Prashanth, L. A. "Policy Gradients for CVaR-Constrained MDPs." In Lecture Notes in Computer Science, 155–69. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-11662-4_12.

4

Tummon, Evan, Muhammad Adil Raja, and Conor Ryan. "Trading Cryptocurrency with Deep Deterministic Policy Gradients." In Lecture Notes in Computer Science, 245–56. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-62362-3_22.

5

Wierstra, Daan, Alexander Foerster, Jan Peters, and Jürgen Schmidhuber. "Solving Deep Memory POMDPs with Recurrent Policy Gradients." In Lecture Notes in Computer Science, 697–706. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007. http://dx.doi.org/10.1007/978-3-540-74690-4_71.

6

Lach, Luca, Timo Korthals, Francesco Ferro, Helge Ritter, and Malte Schilling. "Guiding Representation Learning in Deep Generative Models with Policy Gradients." In Communications in Computer and Information Science, 115–31. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-85672-4_9.

7

Staroverov, Alexey, Vladislav Vetlin, Stepan Makarenko, Anton Naumov, and Aleksandr I. Panov. "Learning Embodied Agents with Policy Gradients to Navigate in Realistic Environments." In Advances in Neural Computation, Machine Learning, and Cognitive Research IV, 212–21. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-60577-3_24.

8

Liu, Chujun, Andrew G. Lonsberry, Mark J. Nandor, Musa L. Audu, and Roger D. Quinn. "Implementation of Deep Deterministic Policy Gradients for Controlling Dynamic Bipedal Walking." In Biomimetic and Biohybrid Systems, 276–87. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-95972-6_29.

9

Sehnke, Frank, and Tingting Zhao. "Baseline-Free Sampling in Parameter Exploring Policy Gradients: Super Symmetric PGPE." In Springer Series in Bio-/Neuroinformatics, 271–93. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-09903-3_13.

10

Roy, Kaushik, Qi Zhang, Manas Gaur, and Amit Sheth. "Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits." In Machine Learning and Knowledge Discovery in Databases. Research Track, 35–50. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-86486-6_3.


Conference papers on the topic "Policy gradients"

1

Kersting, Kristian, and Kurt Driessens. "Non-parametric policy gradients." In the 25th international conference. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1390156.1390214.

2

Sehnke, Frank, Alex Graves, Christian Osendorfer, and Jurgen Schmidhuber. "Multimodal Parameter-exploring Policy Gradients." In 2010 International Conference on Machine Learning and Applications (ICMLA). IEEE, 2010. http://dx.doi.org/10.1109/icmla.2010.24.

3

Pan, Feiyang, Qingpeng Cai, Pingzhong Tang, Fuzhen Zhuang, and Qing He. "Policy Gradients for Contextual Recommendations." In The World Wide Web Conference. New York, New York, USA: ACM Press, 2019. http://dx.doi.org/10.1145/3308558.3313616.

4

Theodorou, Evangelos A., Jiri Najemnik, and Emo Todorov. "Free energy based policy gradients." In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE, 2013. http://dx.doi.org/10.1109/adprl.2013.6614998.

5

Theodorou, Evangelos A., Krishnamurthy Dvijotham, and Emo Todorov. "Time varying nonlinear Policy Gradients." In 2013 IEEE 52nd Annual Conference on Decision and Control (CDC). IEEE, 2013. http://dx.doi.org/10.1109/cdc.2013.6761122.

6

Do, Chau, Camilo Gordillo, and Wolfram Burgard. "Learning to Pour using Deep Deterministic Policy Gradients." In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018. http://dx.doi.org/10.1109/iros.2018.8593654.

7

Nguyen, Hung The, Tung Nguyen, Do-Van Nguyen, and Thanh-Ha Le. "A Hierarchical Deep Deterministic Policy Gradients for Swarm Navigation." In 2019 11th International Conference on Knowledge and Systems Engineering (KSE). IEEE, 2019. http://dx.doi.org/10.1109/kse.2019.8919269.

8

Mani, Kaustubh, Meha Kaushik, Nirvan Singhania, and K. Madhava Krishna. "Learning Adaptive Driving Behavior Using Recurrent Deterministic Policy Gradients." In 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 2019. http://dx.doi.org/10.1109/robio49542.2019.8961480.

9

Hegde, Shashank, Vishal Kumar, and Atul Singh. "Risk aware portfolio construction using deep deterministic policy gradients." In 2018 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2018. http://dx.doi.org/10.1109/ssci.2018.8628791.

10

Tahboub, Karim A. "Human-Machine Coadaptation Based on Reinforcement Learning with Policy Gradients." In 2019 8th International Conference on Systems and Control (ICSC). IEEE, 2019. http://dx.doi.org/10.1109/icsc47195.2019.8950660.


Reports of organizations on the topic "Policy gradients"

1

Lleras-Muney, Adriana. Education and Income Gradients in Longevity: The Role of Policy. Cambridge, MA: National Bureau of Economic Research, January 2022. http://dx.doi.org/10.3386/w29694.

2

Umberger, Pierce. Experimental Evaluation of Dynamic Crack Branching in Poly(methyl methacrylate) (PMMA) Using the Method of Coherent Gradient Sensing. Fort Belvoir, VA: Defense Technical Information Center, February 2010. http://dx.doi.org/10.21236/ada518614.

3

A Decision-Making Method for Connected Autonomous Driving Based on Reinforcement Learning. SAE International, December 2020. http://dx.doi.org/10.4271/2020-01-5154.

Abstract:
At present, with the development of Intelligent Vehicle Infrastructure Cooperative Systems (IVICS), decision-making for automated vehicles under connected environment conditions has attracted more attention. Reliability, efficiency, and generalization performance are the basic requirements for a vehicle decision-making system. Therefore, this paper proposes a decision-making method for connected autonomous driving based on the Wasserstein Generative Adversarial Nets-Deep Deterministic Policy Gradient (WGAIL-DDPG) algorithm. The key component of the reinforcement learning (RL) model, the reward function, is designed from the perspective of vehicle serviceability, covering safety, ride comfort, and handling stability. To reduce the complexity of the proposed model, an imitation learning strategy is introduced to improve the RL training process. Meanwhile, a model training strategy based on cloud computing effectively solves the problem of insufficient computing resources in the vehicle-mounted system. Test results show that the proposed method can improve the efficiency of the RL training process, delivers reliable decision-making performance, and exhibits excellent generalization capability.
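To make the reward-design idea concrete, here is a hedged sketch of a weighted multi-objective reward over safety, ride comfort, and handling stability. The specific terms, thresholds, and weights are hypothetical illustrations, not the paper's actual reward function.

```python
# Hypothetical multi-objective driving reward combining safety, comfort, and stability.
def driving_reward(ttc, jerk, yaw_rate_error,
                   w_safety=1.0, w_comfort=0.3, w_stability=0.3):
    """ttc: time-to-collision [s]; jerk: longitudinal jerk [m/s^3];
    yaw_rate_error: deviation from a reference yaw rate [rad/s]."""
    r_safety = -1.0 if ttc < 2.0 else 0.0       # penalize unsafe headway
    r_comfort = -abs(jerk)                       # smoother control is more comfortable
    r_stability = -abs(yaw_rate_error)           # track the reference yaw rate
    return w_safety * r_safety + w_comfort * r_comfort + w_stability * r_stability
```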