Policy Gradient Methods
A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized control policy by a variant of gradient ascent. These methods belong to the class of policy search techniques, which maximize the expected return of a policy within a fixed policy class, in contrast with traditional value function approximation approaches, which derive a policy from a learned value function. Policy gradient approaches have several advantages: they enable the straightforward incorporation of domain knowledge in the policy parametrization, and an optimal policy is often more compactly represented than the corresponding value function; many such methods are guaranteed to converge to at least a locally optimal policy; and the methods naturally handle continuous states and actions, and often even imperfect state information. The countervailing drawbacks include difficulties in off-policy settings, the potential for very slow convergence, and high sample complexity.
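As a minimal sketch of the idea, the following Python/NumPy snippet applies a likelihood-ratio (score function) policy gradient to a hypothetical two-armed bandit with a softmax policy; the problem setup, step sizes, and running-average baseline are illustrative assumptions, not part of the entry above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed bandit: arm 1 pays more on average (assumed setup).
TRUE_MEANS = np.array([0.2, 0.8])

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def sample_return(action):
    # Stochastic reward for the chosen arm.
    return TRUE_MEANS[action] + 0.1 * rng.standard_normal()

theta = np.zeros(2)   # policy parameters
alpha = 0.1           # learning rate (illustrative choice)
baseline = 0.0        # running-average baseline to reduce gradient variance

for t in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    G = sample_return(a)
    # Score function estimate: grad log pi(a) * (G - baseline).
    # For a softmax policy, grad log pi(a) = one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # Gradient ascent on the expected return.
    theta += alpha * grad_log_pi * (G - baseline)
    baseline += 0.05 * (G - baseline)

print(softmax(theta))  # probability mass concentrates on the better arm
```

Because the policy is parametrized directly, the same update applies unchanged to richer policy classes; only the sampling of actions and the computation of the score function change.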