Abstract
The goal of approximate policy evaluation is to "best" represent a target value function according to a specific criterion. Different algorithms offer different choices of the optimization criterion. Two popular least-squares algorithms for performing this task are the Bellman residual method, which minimizes the Bellman residual, and the fixed point method, which minimizes the projection of the Bellman residual. When used within policy iteration, the fixed point algorithm tends to ultimately find better-performing policies, whereas the Bellman residual algorithm exhibits more stable behavior between rounds of policy iteration. We propose two hybrid least-squares algorithms that aim to combine the advantages of both methods. We provide an analytical and geometric interpretation of the hybrid algorithms and demonstrate their utility on a simple problem. Experimental results on both small and large domains suggest that hybrid algorithms may find solutions that lead to better policies when performing policy iteration.
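For concreteness, the two least-squares criteria mentioned in the abstract, and a convex combination of them as one plausible hybrid objective, can be sketched as follows. This is a hedged illustration using standard notation (feature matrix \(\Phi\), weight vector \(w\), rewards \(R\), transition matrix \(P^{\pi}\), discount \(\gamma\), projection \(\Pi\) onto the span of \(\Phi\), and a mixing weight \(\xi\)); the paper's exact hybrid formulations are not reproduced here.

\[
\text{Bellman residual:}\quad \min_{w}\; \bigl\lVert \Phi w - T^{\pi}(\Phi w) \bigr\rVert^{2},
\qquad T^{\pi}(\Phi w) = R + \gamma P^{\pi}\Phi w,
\]
\[
\text{Fixed point:}\quad \min_{w}\; \bigl\lVert \Phi w - \Pi\, T^{\pi}(\Phi w) \bigr\rVert^{2},
\qquad \Pi = \Phi\,(\Phi^{\top}\Phi)^{-1}\Phi^{\top},
\]
\[
\text{Hybrid (one plausible form):}\quad \min_{w}\; \xi\,\bigl\lVert \Phi w - T^{\pi}(\Phi w) \bigr\rVert^{2} \;+\; (1-\xi)\,\bigl\lVert \Phi w - \Pi\, T^{\pi}(\Phi w) \bigr\rVert^{2},
\qquad \xi \in [0,1].
\]

Under this assumed form, setting \(\xi = 1\) recovers the Bellman residual criterion and \(\xi = 0\) the fixed point criterion, which is the sense in which a hybrid interpolates between the two.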
Additional information
Editors: Aleksander Kołcz, Dunja Mladenić, Wray Buntine, Marko Grobelnik, and John Shawe-Taylor.
Cite this article
Johns, J., Petrik, M. & Mahadevan, S. Hybrid least-squares algorithms for approximate policy evaluation. Mach Learn 76, 243–256 (2009). https://doi.org/10.1007/s10994-009-5128-4