Policy Gradient Methods

Synonyms

Policy search

Definition

A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized control policy by a variant of gradient descent. These methods belong to the class of policy search techniques, which maximize the expected return of a policy within a fixed policy class, in contrast with traditional value function approximation approaches that derive policies from a value function. Policy gradient approaches have several advantages: they enable the straightforward incorporation of domain knowledge through the policy parametrization, and an optimal policy is often represented more compactly than the corresponding value function; many such methods are guaranteed to converge to at least a locally optimal policy; and they naturally handle continuous states and actions, and often even imperfect state information. The countervailing drawbacks include difficulties in off-policy settings and the potential for very slow convergence and high sample complexity, as...
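
As a concrete illustration of the gradient update described above, the following minimal sketch applies the likelihood-ratio (REINFORCE) estimator of Williams (1992), listed under Recommended Reading, to a toy three-armed bandit with a softmax policy. The arm payoffs, step size, and episode count are illustrative assumptions, not part of this entry.

    import numpy as np

    # Minimal REINFORCE-style sketch: stochastic gradient ascent on the
    # expected return of a softmax policy over a toy 3-armed bandit.
    # Payoffs, step size, and episode count are illustrative choices.
    rng = np.random.default_rng(0)
    true_means = np.array([0.2, 0.5, 0.8])  # hypothetical mean reward per arm
    theta = np.zeros(3)                     # policy parameters (softmax logits)
    alpha = 0.1                             # learning rate

    def softmax(logits):
        z = np.exp(logits - logits.max())   # subtract max for numerical stability
        return z / z.sum()

    for episode in range(2000):
        pi = softmax(theta)
        a = rng.choice(3, p=pi)             # sample an action from pi_theta
        r = true_means[a] + 0.1 * rng.standard_normal()  # noisy one-step return
        grad_log_pi = -pi                   # grad of log pi(a|theta) for a softmax
        grad_log_pi[a] += 1.0               #   policy equals one_hot(a) - pi
        theta += alpha * r * grad_log_pi    # ascend the expected return

    print("learned action probabilities:", softmax(theta))

After training, the probability mass concentrates on the highest-paying arm. Subtracting a baseline from the return r would reduce the variance of this estimator, a point developed in the Williams (1992) and Peters and Schaal (2008) references below.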

Recommended Reading

  • Bagnell, J. A. (2004). Learning decisions: Robustness, uncertainty, and approximation. Doctoral dissertation, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.

  • Fu, M. C. (2006). Stochastic gradient estimation (Chapter 19). In Handbook on operations research and management science: Simulation (Vol. 13, pp. 575–616). Elsevier.

  • Glynn, P. (1990). Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33(10), 75–84.

  • Hasdorff, L. (1976). Gradient optimization and nonlinear control. John Wiley & Sons.

  • Jacobson, D. H., & Mayne, D. Q. (1970). Differential dynamic programming. New York: American Elsevier Publishing Company.

  • Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proceedings of the international conference on uncertainty in artificial intelligence (UAI), Acapulco, Mexico.

  • Peters, J., & Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), 682–697.

  • Spall, J. C. (2003). Introduction to stochastic search and optimization: Estimation, simulation, and control. Hoboken: Wiley.

  • Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In S. A. Solla, T. K. Leen, & K.-R. Mueller (Eds.), Advances in neural information processing systems (NIPS), Denver, CO. Cambridge: MIT Press.

  • Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.

Copyright information

© 2011 Springer Science+Business Media, LLC

Cite this entry

Peters, J., Bagnell, J.A. (2011). Policy Gradient Methods. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-30164-8_640
