Model Minimization in Hierarchical Reinforcement Learning

  • Balaraman Ravindran
  • Andrew G. Barto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2371)


Abstract

When applied to real-world problems, Markov Decision Processes (MDPs) often exhibit considerable implicit redundancy, especially when there are symmetries in the problem. In this article we present an MDP minimization framework based on homomorphisms. The framework exploits redundancy and symmetry to derive smaller, equivalent models of the problem. We then apply our minimization ideas to the options framework to derive relativized options, that is, options defined without an absolute frame of reference. We demonstrate their utility empirically, even in cases where the minimization criteria are not met exactly.
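For readers who want the formal core behind the abstract, the following is a minimal sketch of the commutativity conditions such a homomorphism-based minimization framework rests on. The notation (the MDPs M and M', the state map f, and the state-dependent action maps g_s) is standard usage assumed here for illustration, not quoted from this page. An MDP homomorphism from M = (S, A, Psi, P, R) to M' = (S', A', Psi', P', R') is a pair h = (f, {g_s}) of surjections f : S -> S' and g_s : A_s -> A'_{f(s)} that commute with the transition dynamics and rewards:

\[
P'\bigl(f(s),\, g_s(a),\, f(s')\bigr) \;=\; \sum_{s'' \in f^{-1}(f(s'))} P(s, a, s''),
\qquad
R'\bigl(f(s),\, g_s(a)\bigr) \;=\; R(s, a),
\]

for all admissible state-action pairs (s, a) and all s' in S. When these conditions hold, an optimal policy for the smaller image MDP M' lifts back to an optimal policy for M, which is what licenses solving the reduced model in place of the original. A relativized option can then be read as an option whose policy is defined over the image model, so a single option definition is reused across all symmetric instances of a subproblem.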




Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Balaraman Ravindran (1)
  • Andrew G. Barto (1)
  1. Department of Computer Science, University of Massachusetts, Amherst, USA
