Policy Gradient Methods

Reference work entry in: Encyclopedia of Machine Learning and Data Mining

Abstract

Richard Bellman already suggested that searching in policy space is fundamentally different from value-function-based reinforcement learning, and frequently advantageous, especially in robotics and other systems with continuous actions. Policy gradient methods optimize directly in policy space by maximizing the expected reward through gradient ascent. We discuss their basics and the most prominent approaches to policy gradient estimation.
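
Concretely, for a parametrized stochastic policy pi_theta, gradient ascent updates theta <- theta + alpha * grad_theta J(theta), where J(theta) is the expected return. The likelihood-ratio (REINFORCE) estimator of Williams (1992), listed in the reading below, is the classical way to obtain this gradient from sampled trajectories alone, using grad_theta J(theta) = E[ sum_t grad_theta log pi_theta(a_t | s_t) G_t ], with G_t the return that follows time t. The following is a minimal sketch, assuming a linear-softmax policy over discrete actions; the per-step feature matrices and the trajectory format are illustrative choices, not part of this entry.

```python
import numpy as np

def softmax_policy(theta, phi):
    """Action probabilities of a linear-softmax policy pi(a | s; theta).

    phi is a (d, A) matrix whose column a holds the feature vector of the
    state-action pair (s, a); theta is a d-dimensional parameter vector.
    """
    logits = theta @ phi                # one logit per action
    logits = logits - logits.max()      # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def reinforce_gradient(theta, episodes):
    """Likelihood-ratio (REINFORCE) estimate of the policy gradient.

    episodes is a list of trajectories, each a list of (phi, action, reward)
    tuples collected by executing the current stochastic policy.
    """
    grad = np.zeros_like(theta)
    for trajectory in episodes:
        rewards = np.array([r for _, _, r in trajectory], dtype=float)
        # Return-to-go G_t = sum_{k >= t} r_k (undiscounted for brevity).
        returns = np.cumsum(rewards[::-1])[::-1]
        for (phi, a, _), G in zip(trajectory, returns):
            pi = softmax_policy(theta, phi)
            # For a linear-softmax policy:
            #   grad log pi(a | s) = phi_a - sum_b pi(b | s) * phi_b
            grad += (phi[:, a] - phi @ pi) * G
    return grad / len(episodes)

# Plain gradient ascent on the expected return:
#   theta = theta + alpha * reinforce_gradient(theta, episodes)
```

In practice, subtracting a baseline (e.g., the average return) from G_t reduces the variance of this estimator without biasing it; the Peters and Schaal (2008) reference below surveys such refinements.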


Recommended Reading

  • Bagnell JA (2004) Learning decisions: robustness, uncertainty, and approximation. Doctoral dissertation, Robotics Institute, Carnegie Mellon University, Pittsburgh

  • Fu MC (2006) Stochastic gradient estimation. In: Henderson SG, Nelson BL (eds) Handbook on operations research and management science: simulation, vol 19. Elsevier, Burlington, pp 575–616

  • Glynn P (1990) Likelihood ratio gradient estimation for stochastic systems. Commun ACM 33(10):75–84

  • Lawrence G, Cowan N, Russell S (2003) Efficient gradient estimation for motor control learning. In: Proceedings of the international conference on uncertainty in artificial intelligence (UAI), Acapulco

  • Peters J, Schaal S (2008) Reinforcement learning of motor skills with policy gradients. Neural Netw 21(4):682–697

  • Pontryagin LS, Boltyanskii VG, Gamkrelidze RV, Mishchenko E (1962) The mathematical theory of optimal processes. International series of monographs in pure and applied mathematics. Interscience Publishers, New York

  • Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M (2014) Deterministic policy gradient algorithms. In: Proceedings of the 31st international conference on machine learning (ICML), Beijing

  • Spall JC (2003) Introduction to stochastic search and optimization: estimation, simulation, and control. Wiley, Hoboken

  • Sutton RS, McAllester D, Singh S, Mansour Y (2000) Policy gradient methods for reinforcement learning with function approximation. In: Solla SA, Leen TK, Mueller KR (eds) Advances in neural information processing systems (NIPS). MIT Press, Denver

  • Williams RJ (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach Learn 8:229–256

Author information

Correspondence to Jan Peters.

Copyright information

© 2017 Springer Science+Business Media New York

Cite this entry

Peters, J., Bagnell, J.A. (2017). Policy Gradient Methods. In: Sammut, C., Webb, G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7687-1_646
