Definition
A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized control policy by a variant of gradient descent. These methods belong to the class of policy search techniques that maximize the expected return of a policy within a fixed policy class, in contrast with traditional value function approximation approaches that derive policies from a value function. Policy gradient approaches have several advantages: they enable the straightforward incorporation of domain knowledge in the policy parametrization, and an optimal policy is often more compactly represented than the corresponding value function; many such methods guarantee convergence to at least a locally optimal policy; and the methods naturally handle continuous states and actions, and often even imperfect state information. The countervailing drawbacks include difficulties in off-policy settings, the potential for very slow convergence and high sample complexity, as...
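The core idea above can be illustrated with the likelihood-ratio (REINFORCE-style) gradient estimator on a toy problem. The following sketch is illustrative only: the two-armed bandit environment, softmax parametrization, step size, and iteration count are assumptions chosen for the example, not part of this entry.

```python
# A minimal sketch of a policy gradient method (likelihood-ratio /
# REINFORCE estimator) on a hypothetical two-armed bandit where arm 0
# pays reward 1 and arm 1 pays reward 0. All constants are illustrative.
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
theta = np.zeros(2)              # policy parameters: logits over two actions
alpha = 0.1                      # step size
rewards = np.array([1.0, 0.0])   # arm 0 pays 1, arm 1 pays 0

for _ in range(500):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)   # sample an action from the current policy
    r = rewards[a]
    # likelihood-ratio gradient: grad_theta log pi(a|theta) = one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # stochastic gradient ascent on the expected return E[R]
    theta += alpha * r * grad_log_pi

print(softmax(theta))  # probability mass concentrates on the rewarding arm
```

Because the update follows an unbiased sample of the gradient of expected return, the policy's probability of choosing the rewarding arm increases over training; convergence is to a local optimum in general, as noted above.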
Recommended Reading
Bagnell, J. A. (2004). Learning decisions: Robustness, uncertainty, and approximation. Doctoral dissertation, Robotics Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213.
Fu, M. C. (2006). Stochastic gradient estimation (Chapter 19). In Handbooks in operations research and management science: Simulation (Vol. 13, pp. 575–616). Elsevier. ISBN 0-444-51428-7.
Glynn, P. (1990). Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33(10), 75–84.
Hasdorff, L. (1976). Gradient optimization and nonlinear control. John Wiley & Sons.
Jacobson, D. H., & Mayne, D. Q. (1970). Differential dynamic programming. New York: American Elsevier Publishing Company, Inc.
Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proceedings of the international conference on uncertainty in artificial intelligence (UAI), Acapulco, Mexico.
Peters, J., & Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), 682–697.
Spall, J. C. (2003). Introduction to stochastic search and optimization: Estimation, simulation, and control. Hoboken: Wiley.
Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In S. A. Solla, T. K. Leen, & K.-R. Mueller, (Eds.), Advances in neural information processing systems (NIPS), Denver, CO. Cambridge: MIT Press.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
© 2011 Springer Science+Business Media, LLC
Peters, J., Bagnell, J.A. (2011). Policy Gradient Methods. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_640
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-30768-8
Online ISBN: 978-0-387-30164-8
eBook Packages: Computer Science, Reference Module Computer Science and Engineering