Policy Gradient Methods

Synonyms

Policy search

Definition

A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized control policy by a variant of gradient descent. These methods belong to the class of policy search techniques, which maximize the expected return of a policy within a fixed policy class, in contrast with traditional value function approximation approaches that derive policies from a value function. Policy gradient approaches have several advantages: they enable the straightforward incorporation of domain knowledge through the policy parametrization, and an optimal policy is often represented more compactly than the corresponding value function; many such methods are guaranteed to converge to at least a locally optimal policy; and they naturally handle continuous states and actions, and often even imperfect state information. The countervailing drawbacks include difficulties in off-policy settings and the potential for very slow convergence and high sample complexity, as...
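
As a concrete illustration of the gradient update described above, the following minimal sketch applies the likelihood-ratio (REINFORCE) estimator of Williams (1992), listed under Recommended Reading, to a toy three-armed bandit with a softmax policy. The arm payoffs, step size, and episode count are illustrative assumptions, not part of this entry.

    import numpy as np

    # Minimal REINFORCE-style sketch: stochastic gradient ascent on the
    # expected return of a softmax policy over a toy 3-armed bandit.
    # Payoffs, step size, and episode count are illustrative choices.
    rng = np.random.default_rng(0)
    true_means = np.array([0.2, 0.5, 0.8])  # hypothetical mean reward per arm
    theta = np.zeros(3)                     # policy parameters (softmax logits)
    alpha = 0.1                             # learning rate

    def softmax(logits):
        z = np.exp(logits - logits.max())   # subtract max for numerical stability
        return z / z.sum()

    for episode in range(2000):
        pi = softmax(theta)
        a = rng.choice(3, p=pi)             # sample an action from pi_theta
        r = true_means[a] + 0.1 * rng.standard_normal()  # noisy one-step return
        grad_log_pi = -pi                   # grad of log pi(a|theta) for a softmax
        grad_log_pi[a] += 1.0               #   policy equals one_hot(a) - pi
        theta += alpha * r * grad_log_pi    # ascend the expected return

    print("learned action probabilities:", softmax(theta))

After training, the probability mass concentrates on the highest-paying arm. Subtracting a baseline from the return r would reduce the variance of this estimator, a point developed in the Williams (1992) and Peters and Schaal (2008) references below.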

Recommended Reading

  • Bagnell, J. A. (2004). Learning decisions: Robustness, uncertainty, and approximation. Doctoral dissertation, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.

  • Fu, M. C. (2006). Stochastic gradient estimation (Chapter 19). In Handbook on operations research and management science: Simulation (Vol. 13, pp. 575–616). Elsevier.

  • Glynn, P. (1990). Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33(10), 75–84.

  • Hasdorff, L. (1976). Gradient optimization and nonlinear control. John Wiley & Sons.

  • Jacobson, D. H., & Mayne, D. Q. (1970). Differential dynamic programming. New York: American Elsevier Publishing Company.

  • Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proceedings of the international conference on uncertainty in artificial intelligence (UAI), Acapulco, Mexico.

  • Peters, J., & Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), 682–697.

  • Spall, J. C. (2003). Introduction to stochastic search and optimization: Estimation, simulation, and control. Hoboken: Wiley.

  • Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In S. A. Solla, T. K. Leen, & K.-R. Mueller (Eds.), Advances in neural information processing systems (NIPS), Denver, CO. Cambridge: MIT Press.

  • Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Peters, J., Bagnell, J.A. (2011). Policy Gradient Methods. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_640
