
Model Minimization in Hierarchical Reinforcement Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2371)

Abstract

When applied to real-world problems, Markov Decision Processes (MDPs) often exhibit considerable implicit redundancy, especially when there are symmetries in the problem. In this article we present an MDP minimization framework based on homomorphisms. The framework exploits redundancy and symmetry to derive smaller equivalent models of the problem. We then apply our minimization ideas to the options framework to derive relativized options: options defined without an absolute frame of reference. We demonstrate their utility empirically even in cases where the minimization criteria are not met exactly.
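The abstract refers to MDP homomorphisms without reproducing the formal conditions, so the following is a brief sketch of the kind of condition such a map imposes. The notation (the maps f and g_s and the primed MDP components) follows the usual formulation in the MDP-homomorphism literature and is not quoted from this paper. A homomorphism from an MDP M = ⟨S, A, P, R⟩ to a smaller MDP M' = ⟨S', A', P', R'⟩ can be written as h(s, a) = (f(s), g_s(a)), where f aggregates states and g_s recodes actions per state, subject to commutativity conditions roughly of the form

\[
P'\bigl(f(s),\, g_s(a),\, f(s')\bigr) \;=\; \sum_{s'' \in f^{-1}(f(s'))} P(s, a, s''),
\qquad
R'\bigl(f(s),\, g_s(a)\bigr) \;=\; R(s, a).
\]

Intuitively, transition probabilities into each block of equivalent states and the immediate rewards must be preserved under the map, which is why symmetric or redundant regions of the original problem can be collapsed into a single region of the reduced model; a relativized option then carries one such reduced model that is reused across all the symmetric instances.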

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ravindran, B., Barto, A.G. (2002). Model Minimization in Hierarchical Reinforcement Learning. In: Koenig, S., Holte, R.C. (eds) Abstraction, Reformulation, and Approximation. SARA 2002. Lecture Notes in Computer Science, vol. 2371. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45622-8_15

  • DOI: https://doi.org/10.1007/3-540-45622-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43941-7

  • Online ISBN: 978-3-540-45622-3
