
Machine Learning, Volume 76, Issue 2–3, pp 243–256

Hybrid least-squares algorithms for approximate policy evaluation


Abstract

The goal of approximate policy evaluation is to “best” represent a target value function according to a specific criterion. Different algorithms offer different choices of optimization criterion. Two popular least-squares algorithms for this task are the Bellman residual method, which minimizes the Bellman residual, and the fixed point method, which minimizes the projection of the Bellman residual onto the space spanned by the features. When used within policy iteration, the fixed point algorithm tends to ultimately find better-performing policies, whereas the Bellman residual algorithm exhibits more stable behavior between rounds of policy iteration. We propose two hybrid least-squares algorithms that aim to combine the advantages of both. We provide an analytical and geometric interpretation of the hybrid algorithms and demonstrate their utility on a simple problem. Experimental results on both small and large domains suggest that hybrid algorithms may find solutions leading to better policies when performing policy iteration.
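Since the abstract stops short of the formulas, the following is a minimal sketch of the two least-squares systems it contrasts and one natural way to blend them. It assumes a tabular MDP with known transition matrix P, reward vector R, and linear features Phi; the function name solve_policy_evaluation and the convex blend controlled by the mixing weight xi are illustrative assumptions, not necessarily the paper's exact hybrid formulation.

```python
import numpy as np

def solve_policy_evaluation(Phi, P, R, gamma, xi):
    """Least-squares policy evaluation with linear features, V ~ Phi @ w.

    Phi   : (n_states, k) feature matrix
    P     : (n_states, n_states) transition matrix under the fixed policy
    R     : (n_states,) expected one-step reward under the policy
    gamma : discount factor in [0, 1)
    xi    : mixing weight; xi=1 gives the Bellman residual (BR) solution,
            xi=0 gives the fixed point (FP / LSTD) solution
    """
    # Phi - gamma * P * Phi appears in both systems
    D = Phi - gamma * (P @ Phi)

    # Bellman residual system: minimize ||Phi w - (R + gamma P Phi w)||^2
    A_br, b_br = D.T @ D, D.T @ R

    # Fixed point system: Phi^T (Phi w - (R + gamma P Phi w)) = 0
    A_fp, b_fp = Phi.T @ D, Phi.T @ R

    # One possible hybrid: a convex blend of the two linear systems
    A = xi * A_br + (1.0 - xi) * A_fp
    b = xi * b_br + (1.0 - xi) * b_fp
    return np.linalg.solve(A, b)

# Example: a two-state chain with one feature per state (hypothetical numbers)
Phi = np.array([[1.0], [2.0]])
P = np.array([[0.9, 0.1], [0.1, 0.9]])
R = np.array([0.0, 1.0])
w = solve_policy_evaluation(Phi, P, R, gamma=0.9, xi=0.5)
```

In this sketch, setting xi = 1 recovers the Bellman residual solution and xi = 0 the fixed point solution; intermediate values interpolate between the two systems, trading the stability associated with the Bellman residual criterion against the final policy quality associated with the fixed point criterion.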

Keywords

Reinforcement learning · Markov decision processes


Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

1. Department of Computer Science, University of Massachusetts Amherst, Amherst, USA
