Compositional Models for Reinforcement Learning

Jong, Nicholas K.; Stone, Peter

doi:10.1007/978-3-642-04180-8_59

Nicholas K. Jong²² &
Peter Stone²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5781))

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

2618 Accesses
3 Citations

Abstract

Innovations such as optimistic exploration, function approximation, and hierarchical decomposition have helped scale reinforcement learning to more complex environments, but these three ideas have rarely been studied together. This paper develops a unified framework that formalizes these algorithmic contributions as operators on learned models of the environment. Our formalism reveals some synergies among these innovations, and it suggests a straightforward way to compose them. The resulting algorithm, Fitted R-MAXQ, is the first to combine the function approximation of fitted algorithms, the efficient model-based exploration of R-MAX, and the hierarchical decompostion of MAXQ.

Download to read the full chapter text

Chapter PDF

Reinforcement Learning

Behavioral Hierarchy: Exploration and Representation

Bayesian Reinforcement Learning with Exploration

References

Moore, A.W., Atkeson, C.G.: Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning 13, 103–130 (1993)
Google Scholar
Kearns, M., Singh, S.: Near-optimal reinforcement learning in polynomial time. In: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 260–268 (1998)
Google Scholar
Brafman, R.I., Tennenholtz, M.: R-max – a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3, 213–231 (2002)
MathSciNet MATH Google Scholar
Lagoudakis, M.G., Parr, R.: Least-squares policy iteration. Journal of Machine Learning Research 4, 1107–1149 (2003)
MathSciNet MATH Google Scholar
Riedmiller, M.: Neural fitted Q iteration – first experiences with a data efficient neural reinforcement learning method. In: Proceedings of the European Conference on Machine Learning (2005)
Google Scholar
Barto, A.G., Mahadevan, S.: Recent advances in hierarchical reinforcement learning. Discrete-Event Systems 13, 41–77 (2003); Special Issue on Reinforcement Learning
Article MathSciNet MATH Google Scholar
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., Chichester (1994)
Book MATH Google Scholar
Kakade, S.M.: On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London (2003)
Google Scholar
Gordon, G.J.: Stable function approximation in dynamic programming. In: Proceedings of the Twelfth International Conference on Machine Learning (1995)
Google Scholar
Ormoneit, D., Sen, Ś.: Kernel-based reinforcement learning. Machine Learning 49(2), 161–178 (2002)
Article MATH Google Scholar
Dietterich, T.G.: Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13, 227–303 (2000)
MathSciNet MATH Google Scholar
Sutton, R.S., Precup, D., Singh, S.: Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1–2), 181–211 (1999)
Article MathSciNet MATH Google Scholar
Jong, N.K., Stone, P.: Model-based exploration in continuous state spaces. In: Proceedings of the Seventh Symposium on Abstraction, Reformulation and Approximation (2007)
Google Scholar
Jong, N.K., Stone, P.: Hierarchical model-based reinforcement learning: R-max + MAXQ. In: Proceedings of the Twenty-Fifth International Conference on Machine Learning (2008)
Google Scholar
Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: Proceedings of the Twenty-Third International Conference on Machine Learning (2006)
Google Scholar
Duff, M.: Design for an optimal probe. In: Proceedings of the Twentieth International Conference on Machine Learning, pp. 131–138 (2003)
Google Scholar
Ravindran, B., Barto, A.G.: SMDP homomorphisms: An algebraic approach to abstraction in semi-Markov decision processes. In: Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (2003)
Google Scholar

Download references

Author information

Authors and Affiliations

The University of Texas at Austin, 1 University Station C0500, Austin, Texas, 78712, United States
Nicholas K. Jong & Peter Stone

Authors

Nicholas K. Jong
View author publications
You can also search for this author in PubMed Google Scholar
Peter Stone
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

NICTA, Locked Bag 8001, Canberra, 2601, Australia and Helsinki Institute of IT,, Finland
Wray Buntine
Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Marko Grobelnik & Dunja Mladenić &
University College London, The Centre for Computational Statistics and Machine Learning Department of Computer Science, Gower St., WC1E 6BT, London, UK
John Shawe-Taylor

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Jong, N.K., Stone, P. (2009). Compositional Models for Reinforcement Learning. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science(), vol 5781. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04180-8_59

Download citation

DOI: https://doi.org/10.1007/978-3-642-04180-8_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04179-2
Online ISBN: 978-3-642-04180-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Compositional Models for Reinforcement Learning

Abstract

Chapter PDF

Similar content being viewed by others

Reinforcement Learning

Behavioral Hierarchy: Exploration and Representation

Bayesian Reinforcement Learning with Exploration

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Compositional Models for Reinforcement Learning

Abstract

Chapter PDF

Similar content being viewed by others

Reinforcement Learning

Behavioral Hierarchy: Exploration and Representation

Bayesian Reinforcement Learning with Exploration

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation