Generating Hierarchical Structure in Reinforcement Learning from State Variables

  • Conference paper
PRICAI 2000 Topics in Artificial Intelligence (PRICAI 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1886)

Abstract

This paper presents the CQ algorithm, which decomposes and solves a Markov Decision Process (MDP) by automatically generating a hierarchy of smaller MDPs using state variables. The CQ algorithm uses a heuristic applicable to problems that can be modelled by a set of state variables conforming to a special ordering, defined in this paper as a “nested Markov ordering”. The benefits of this approach are: (1) the automatic generation of actions and termination conditions at all levels of the hierarchy, and (2) linear scaling with the number of variables under certain conditions. The approach draws heavily on Dietterich’s MAXQ value function decomposition and on the region-based decomposition of MDPs by Hauskrecht, Meuleau, Kaelbling, Dean, Boutilier and others. The CQ algorithm is described and its operation illustrated on a four-room example, and solutions with different numbers of hierarchical levels are generated for Dietterich’s taxi tasks.
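
To make the hierarchy induced by ordered state variables concrete, the sketch below uses a hypothetical four-room grid world: tabular Q-learning solves the within-room sub-MDPs that terminate at a doorway, and the learned sub-policies then serve as the abstract actions of an upper-level MDP whose states are just the room labels. This is a minimal illustration in the spirit of MAXQ and region-based decomposition, not the paper's CQ algorithm; the room size, doorway cells, step reward of -1 and the learning loop are all assumptions made for the example.

```python
# Illustrative sketch (not the paper's CQ algorithm): a two-level decomposition
# of a four-room grid world in the spirit of MAXQ / region-based hierarchies.
# The ordered state variables are (room, position-within-room); the room
# variable only changes when a within-room sub-task reaches a doorway, so each
# room induces a small sub-MDP and the doorways define its termination states.
import random

ROOM_SIZE = 5                                   # hypothetical 5x5 rooms
MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]      # primitive actions: E, W, S, N

def learn_doorway_policy(doorway, episodes=2000, alpha=0.5, gamma=0.95, eps=0.1):
    """Tabular Q-learning on the within-room sub-MDP that terminates at `doorway`.

    States are (row, col) positions inside one room; reward is -1 per step until
    the doorway is reached. Returns a greedy policy mapping position -> move index.
    """
    q = {}
    for _ in range(episodes):
        pos = (random.randrange(ROOM_SIZE), random.randrange(ROOM_SIZE))
        while pos != doorway:
            if random.random() < eps:                       # epsilon-greedy exploration
                a = random.randrange(len(MOVES))
            else:
                a = max(range(len(MOVES)), key=lambda i: q.get((pos, i), 0.0))
            dr, dc = MOVES[a]
            nxt = (min(max(pos[0] + dr, 0), ROOM_SIZE - 1),  # walls clip movement
                   min(max(pos[1] + dc, 0), ROOM_SIZE - 1))
            reward = 0.0 if nxt == doorway else -1.0
            best_next = max(q.get((nxt, i), 0.0) for i in range(len(MOVES)))
            old = q.get((pos, a), 0.0)
            q[(pos, a)] = old + alpha * (reward + gamma * best_next - old)
            pos = nxt
    return {s: max(range(len(MOVES)), key=lambda i: q.get((s, i), 0.0))
            for s in [(r, c) for r in range(ROOM_SIZE) for c in range(ROOM_SIZE)]}

# Each room exposes its doorway sub-policies as the abstract actions available to
# the level above; the upper-level MDP then has only the four room labels as
# states, so it is solved over a drastically smaller state set.
doorways = {"north": (0, 2), "east": (2, ROOM_SIZE - 1)}    # assumed doorway cells
abstract_actions = {name: learn_doorway_policy(cell) for name, cell in doorways.items()}
print({name: MOVES[policy[(2, 2)]] for name, policy in abstract_actions.items()})
```

In the abstract's terms, the doorway cells play the role of termination conditions and the within-room sub-policies the role of generated actions; the CQ algorithm obtains both automatically from the nested Markov ordering of the state variables rather than from a hand-specified layout as in this sketch.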

References

  1. Dean, T., Lin, S.-H.: Decomposition Techniques for Planning in Stochastic Domains. Technical Report CS-95-10, Department of Computer Science, Brown University, Providence, RI (1995)

  2. Dietterich, T. G.: Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition. Department of Computer Science, Oregon State University, Corvallis, OR (1999)

  3. Digney, B. L.: Emergent Hierarchical Control Structures: Learning Reactive/Hierarchical Relationships in Reinforcement Environments. In: Maes, P., et al. (eds.): From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behaviour. MIT Press, Cambridge, MA (1996) 363–372

  4. Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T., Boutilier, C.: Hierarchical Solution of Markov Decision Processes using Macro-actions. Technical Report, Department of Computer Science, Brown University, Providence, RI (1998)

  5. Parr, R. E.: Hierarchical Control and Learning for Markov Decision Processes. Doctoral dissertation, Computer Science, University of California, Berkeley (1998)

  6. Parr, R., Russell, S.: Reinforcement Learning with Hierarchies of Machines. In: Advances in Neural Information Processing Systems 10. MIT Press (1998)

  7. Singh, S.: Reinforcement Learning with a Hierarchy of Abstract Models. In: Proceedings of the Tenth National Conference on Artificial Intelligence. AAAI Press, Menlo Park (1992)

  8. Sutton, R. S., Barto, A. G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)

  9. Sutton, R. S., Singh, S., Precup, D., Ravindran, B.: Improved Switching among Temporally Abstract Actions. In: Advances in Neural Information Processing Systems 11 (Proceedings of the 1998 Conference). MIT Press (1999) 1066–1072

  10. Sutton, R. S., Precup, D., Singh, S.: Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales. Technical Report, Department of Computer and Information Sciences, University of Massachusetts, Amherst, MA (1998)

  11. Thrun, S., O’Sullivan, J.: Discovering Structure in Multiple Learning Tasks: The TC Algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning. Morgan Kaufmann, San Mateo (1996)

  12. Thrun, S., Schwartz, A.: Finding Structure in Reinforcement Learning. In: Advances in Neural Information Processing Systems 7. Morgan Kaufmann, San Mateo (1995)

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hengst, B. (2000). Generating Hierarchical Structure in Reinforcement Learning from State Variables. In: Mizoguchi, R., Slaney, J. (eds) PRICAI 2000 Topics in Artificial Intelligence. PRICAI 2000. Lecture Notes in Computer Science, vol 1886. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44533-1_54

  • DOI: https://doi.org/10.1007/3-540-44533-1_54

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67925-7

  • Online ISBN: 978-3-540-44533-3

  • eBook Packages: Springer Book Archive
