Hybrid least-squares algorithms for approximate policy evaluation

  • Published: 23 July 2009
  • Volume 76, pages 243–256 (2009)
  • Jeff Johns1,
  • Marek Petrik1 &
  • Sridhar Mahadevan1 

Abstract

The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Different algorithms offer different choices of the optimization criterion. Two popular least-squares algorithms for performing this task are the Bellman residual method, which minimizes the Bellman residual, and the fixed point method, which minimizes the projection of the Bellman residual. When used within policy iteration, the fixed point algorithm tends to ultimately find better performing policies, whereas the Bellman residual algorithm exhibits more stable behavior between rounds of policy iteration. We propose two hybrid least-squares algorithms that attempt to combine the advantages of both approaches. We provide an analytical and geometric interpretation of the hybrid algorithms and demonstrate their utility on a simple problem. Experimental results on both small and large domains suggest that hybrid algorithms may find solutions that lead to better policies when performing policy iteration.
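
The two optimization criteria mentioned above, and one way of blending them, can be made concrete for linear function approximation. The sketch below (Python/NumPy, with hypothetical names such as policy_evaluation_solutions, Phi, and Phi_next) computes the Bellman residual solution, the fixed point (LSTD-style) solution, and a simple convex combination of their normal equations; it is an illustration of the general idea under these assumptions, not the specific hybrid algorithms developed in the paper.

```python
import numpy as np

def policy_evaluation_solutions(Phi, Phi_next, R, gamma, xi):
    """Compare least-squares policy evaluation criteria (illustrative sketch).

    Phi      : (n, k) feature matrix for sampled states s
    Phi_next : (n, k) feature matrix for next states s' under the policy
    R        : (n,)   observed rewards
    gamma    : discount factor in [0, 1)
    xi       : interpolation weight in [0, 1]
               (xi = 1 -> Bellman residual, xi = 0 -> fixed point)
    """
    # Bellman residual (BR) method: ordinary least squares on
    # (Phi - gamma * Phi_next) w ~= R, minimizing the Bellman residual itself.
    A_br = Phi - gamma * Phi_next
    w_br, *_ = np.linalg.lstsq(A_br, R, rcond=None)

    # Fixed point (FP / LSTD) method: drive the projection of the Bellman
    # residual onto the span of the features to zero.
    A_fp = Phi.T @ (Phi - gamma * Phi_next)
    b_fp = Phi.T @ R
    w_fp = np.linalg.solve(A_fp, b_fp)

    # One plausible hybrid: solve a convex combination of the two linear
    # systems above, trading off the criteria via xi (hypothetical
    # illustration; the paper's hybrid formulations may differ).
    A_hy = xi * (A_br.T @ A_br) + (1.0 - xi) * A_fp
    b_hy = xi * (A_br.T @ R) + (1.0 - xi) * b_fp
    w_hybrid = np.linalg.solve(A_hy, b_hy)

    return w_br, w_fp, w_hybrid
```

Setting xi to 1 or 0 recovers the two standard least-squares solutions; intermediate values interpolate between them, which is the kind of trade-off the abstract describes at a high level.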

Author information

Authors and Affiliations

  1. Department of Computer Science, University of Massachusetts Amherst, Amherst, MA, 01003, USA

    Jeff Johns, Marek Petrik & Sridhar Mahadevan

Corresponding author

Correspondence to Jeff Johns.

Additional information

Editors: Aleksander Kołcz, Dunja Mladenić, Wray Buntine, Marko Grobelnik, and John Shawe-Taylor.

About this article

Cite this article

Johns, J., Petrik, M. & Mahadevan, S. Hybrid least-squares algorithms for approximate policy evaluation. Mach Learn 76, 243–256 (2009). https://doi.org/10.1007/s10994-009-5128-4

  • Received: 12 June 2009

  • Revised: 12 June 2009

  • Accepted: 16 June 2009

  • Published: 23 July 2009

  • Issue Date: September 2009

  • DOI: https://doi.org/10.1007/s10994-009-5128-4

Keywords

  • Reinforcement learning
  • Markov decision processes