Generating Hierarchical Structure in Reinforcement Learning from State Variables

  • Conference paper
PRICAI 2000 Topics in Artificial Intelligence (PRICAI 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1886)

Abstract

This paper presents the CQ algorithm, which decomposes and solves a Markov Decision Process (MDP) by automatically generating a hierarchy of smaller MDPs using state variables. The CQ algorithm uses a heuristic applicable to problems that can be modelled by a set of state variables conforming to a special ordering, defined in this paper as a “nested Markov ordering”. The benefits of this approach are: (1) the automatic generation of actions and termination conditions at all levels of the hierarchy, and (2) linear scaling with the number of variables under certain conditions. The approach draws heavily on Dietterich’s MAXQ value function decomposition and on the region-based decomposition of MDPs by Hauskrecht, Meuleau, Kaelbling, Dean, Boutilier and others. The CQ algorithm is described and its operation illustrated on a four-room example, and solutions with different numbers of hierarchical levels are generated for Dietterich’s taxi tasks.
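
To make the hierarchy induced by ordered state variables concrete, the sketch below uses a hypothetical four-room grid world: tabular Q-learning solves the within-room sub-MDPs that terminate at a doorway, and the learned sub-policies then serve as the abstract actions of an upper-level MDP whose states are just the room labels. This is a minimal illustration in the spirit of MAXQ and region-based decomposition, not the paper's CQ algorithm; the room size, doorway cells, step reward of -1 and the learning loop are all assumptions made for the example.

```python
# Illustrative sketch (not the paper's CQ algorithm): a two-level decomposition
# of a four-room grid world in the spirit of MAXQ / region-based hierarchies.
# The ordered state variables are (room, position-within-room); the room
# variable only changes when a within-room sub-task reaches a doorway, so each
# room induces a small sub-MDP and the doorways define its termination states.
import random

ROOM_SIZE = 5                                   # hypothetical 5x5 rooms
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]      # primitive actions: E, W, S, N

def learn_doorway_policy(doorway, episodes=2000, alpha=0.5, gamma=0.95, eps=0.1):
    """Tabular Q-learning on the within-room sub-MDP that terminates at `doorway`.

    States are (row, col) positions inside one room; reward is -1 per step until
    the doorway is reached. Returns a greedy policy mapping position -> move index.
    """
    q = {}
    for _ in range(episodes):
        pos = (random.randrange(ROOM_SIZE), random.randrange(ROOM_SIZE))
        while pos != doorway:
            if random.random() < eps:                       # epsilon-greedy exploration
                a = random.randrange(len(MOVES))
            else:
                a = max(range(len(MOVES)), key=lambda i: q.get((pos, i), 0.0))
            dr, dc = MOVES[a]
            nxt = (min(max(pos[0] + dr, 0), ROOM_SIZE - 1),  # walls clip movement
                   min(max(pos[1] + dc, 0), ROOM_SIZE - 1))
            reward = 0.0 if nxt == doorway else -1.0
            best_next = max(q.get((nxt, i), 0.0) for i in range(len(MOVES)))
            old = q.get((pos, a), 0.0)
            q[(pos, a)] = old + alpha * (reward + gamma * best_next - old)
            pos = nxt
    return {s: max(range(len(MOVES)), key=lambda i: q.get((s, i), 0.0))
            for s in [(r, c) for r in range(ROOM_SIZE) for c in range(ROOM_SIZE)]}

# Each room exposes its doorway sub-policies as the abstract actions available to
# the level above; the upper-level MDP then has only the four room labels as
# states, so it is solved over a drastically smaller state set.
doorways = {"north": (0, 2), "east": (2, ROOM_SIZE - 1)}    # assumed doorway cells
abstract_actions = {name: learn_doorway_policy(cell) for name, cell in doorways.items()}
print({name: MOVES[policy[(2, 2)]] for name, policy in abstract_actions.items()})
```

In the abstract's terms, the doorway cells play the role of termination conditions and the within-room sub-policies the role of generated actions; the CQ algorithm obtains both automatically from the nested Markov ordering of the state variables rather than from a hand-specified layout as in this sketch.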

References

  1. Dean, T., Lin, S.-H.: Decomposition Techniques for Planning in Stochastic Domains. Technical Report CS-95-10, Department of Computer Science, Brown University, Providence, RI (1995)

  2. Dietterich, T. G.: Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Department of Computer Science, Oregon State University, Corvallis, OR (1999)

  3. Digney, B. L.: Emergent Hierarchical Control Structures: Learning Reactive/Hierarchical Relationships in Reinforcement Environments. In: Maes, P., et al. (eds.): From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behaviour. MIT Press, Cambridge, MA (1996) 363–372

  4. Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T., Boutilier, C.: Hierarchical Solution of Markov Decision Processes using Macro-actions. Technical Report, Department of Computer Science, Brown University, Providence, RI (1998)

  5. Parr, R. E.: Hierarchical Control and Learning for Markov Decision Processes. Doctoral dissertation, Computer Science, University of California, Berkeley (1998)

  6. Parr, R., Russell, S.: Reinforcement Learning with Hierarchies of Machines. In: Advances in Neural Information Processing Systems 10. MIT Press (1998)

  7. Singh, S.: Reinforcement Learning with a Hierarchy of Abstract Models. In: Proceedings of the Tenth National Conference on Artificial Intelligence. AAAI Press, Menlo Park (1992)

  8. Sutton, R. S., Barto, A. G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  9. Sutton, R. S., Singh, S., Precup, D., Ravindran, B.: Improved Switching among Temporally Abstract Actions. In: Advances in Neural Information Processing Systems 11 (Proceedings of the 1998 Conference). MIT Press (1999) 1066–1072

  10. Sutton, R. S., Precup, D., Singh, S.: Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales. Technical Report, Department of Computer and Information Sciences, University of Massachusetts, Amherst, MA (1998)

  11. Thrun, S., O’Sullivan, J.: Discovering Structure in Multiple Learning Tasks: The TC Algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, San Mateo (1996)

  12. Thrun, S., Schwartz, A.: Finding Structure in Reinforcement Learning. In: Advances in Neural Information Processing Systems 7. Morgan Kaufmann, San Mateo (1995)

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hengst, B. (2000). Generating Hierarchical Structure in Reinforcement Learning from State Variables. In: Mizoguchi, R., Slaney, J. (eds) PRICAI 2000 Topics in Artificial Intelligence. PRICAI 2000. Lecture Notes in Computer Science, vol 1886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44533-1_54

  • DOI: https://doi.org/10.1007/3-540-44533-1_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67925-7

  • Online ISBN: 978-3-540-44533-3

  • eBook Packages: Springer Book Archive
