Compositional Models for Reinforcement Learning

  • Nicholas K. Jong
  • Peter Stone
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

Innovations such as optimistic exploration, function approximation, and hierarchical decomposition have helped scale reinforcement learning to more complex environments, but these three ideas have rarely been studied together. This paper develops a unified framework that formalizes these algorithmic contributions as operators on learned models of the environment. Our formalism reveals some synergies among these innovations, and it suggests a straightforward way to compose them. The resulting algorithm, Fitted R-MAXQ, is the first to combine the function approximation of fitted algorithms, the efficient model-based exploration of R-MAX, and the hierarchical decomposition of MAXQ.
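The abstract frames exploration, function approximation, and hierarchy as operators applied to a learned model of the environment. The paper's own construction is not reproduced here, so the sketch below is only a hypothetical, minimal illustration of that operator view for a tabular MDP: an empirical model is wrapped by an R-MAX-style optimism operator and then handed to a planner. The fitted (function-approximation) and MAXQ (hierarchical) operators are omitted, and all names (EmpiricalModel, rmax_operator, value_iteration) are illustrative rather than taken from Fitted R-MAXQ.

```python
# Hypothetical sketch, not the paper's implementation: exploration as an
# operator that rewrites a learned model before planning.
from collections import defaultdict

class EmpiricalModel:
    """Tabular maximum-likelihood model built from observed transitions."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> s' -> count
        self.reward_sums = defaultdict(float)                # (s, a) -> total reward
    def update(self, s, a, r, s_next):
        self.counts[(s, a)][s_next] += 1
        self.reward_sums[(s, a)] += r
    def visits(self, s, a):
        return sum(self.counts[(s, a)].values())
    def reward(self, s, a):
        n = self.visits(s, a)
        return self.reward_sums[(s, a)] / n if n else 0.0
    def transition(self, s, a):
        n = self.visits(s, a)
        return {s2: c / n for s2, c in self.counts[(s, a)].items()} if n else {}

def rmax_operator(model, m=5, r_max=1.0):
    """R-MAX-style operator: under-sampled (s, a) pairs are replaced by an
    optimistic self-loop earning the maximum reward, which drives exploration."""
    def reward(s, a):
        return r_max if model.visits(s, a) < m else model.reward(s, a)
    def transition(s, a):
        return {s: 1.0} if model.visits(s, a) < m else model.transition(s, a)
    return reward, transition

def value_iteration(states, actions, reward, transition, gamma=0.95, iters=200):
    """Plan greedily against whatever (possibly optimistic) model it is given."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: max(reward(s, a) + gamma * sum(p * V[s2]
                    for s2, p in transition(s, a).items())
                    for a in actions)
             for s in states}
    return V

# Example: unvisited state-action pairs look maximally rewarding, so the
# planner's greedy policy is steered toward them.
if __name__ == "__main__":
    states, actions = [0, 1], ["left", "right"]
    model = EmpiricalModel()
    model.update(0, "left", 0.0, 1)  # a single observed transition
    reward, transition = rmax_operator(model)
    print(value_iteration(states, actions, reward, transition))
```

In this operator view, swapping the planner's inputs (for instance, replacing the tabular averages with a kernel-based fitted model, or planning over a subtask's abstract model) leaves the rest of the pipeline unchanged, which is the composability the abstract refers to.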

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Nicholas K. Jong, The University of Texas at Austin, Austin, United States
  • Peter Stone, The University of Texas at Austin, Austin, United States
