Encyclopedia of Machine Learning

2010 Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Hierarchical Reinforcement Learning

  • Bernhard Hengst
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_363


Hierarchical reinforcement learning (HRL) decomposes a reinforcement learning problem into a hierarchy of subproblems, or subtasks, such that higher-level parent tasks invoke lower-level child tasks as if they were primitive actions. A decomposition may have multiple levels of hierarchy, and some or all of the subproblems can themselves be reinforcement learning problems. When a parent task is formulated as a reinforcement learning problem, it is commonly formalized as a semi-Markov decision problem (SMDP), because its actions are child tasks that persist for an extended period of time.

The advantage of hierarchical decomposition is a reduction in computational complexity when the overall problem can be represented more compactly and reusable subtasks can be learned or provided independently. While the solution to an HRL problem is optimal given the constraints of the hierarchy, there is in general no guarantee that the decomposed solution is an optimal solution to the original reinforcement learning problem.
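The SMDP formulation above can be sketched in a few lines of code. The following is a minimal, hypothetical illustration (the corridor task, the "sprint" options, and all names are invented for this sketch, not from the entry): a parent task learns over temporally extended child tasks ("options") with SMDP Q-learning, where a child task that runs for tau primitive steps returns its internally discounted reward, and the successor state's value is discounted by gamma**tau.

```python
import random

# Toy 1-D corridor: states 0..9, goal at 9, reward -1 per primitive step.
GOAL = 9
# Two primitive moves plus two hand-built "sprint" options (child tasks
# lasting up to 3 primitive steps). All four look like single actions
# to the parent task.
ACTIONS = ["left", "right", "sprint_left", "sprint_right"]

def step(state, direction):
    """One primitive move; reward 0 on reaching the goal, else -1."""
    nxt = max(0, min(GOAL, state + (1 if direction == "right" else -1)))
    return nxt, (0.0 if nxt == GOAL else -1.0)

def run_option(state, option, gamma):
    """Execute an option to termination.

    Returns (next_state, discounted_cumulative_reward, duration tau),
    which is exactly the information an SMDP update needs."""
    direction = "right" if "right" in option else "left"
    max_steps = 3 if option.startswith("sprint") else 1
    total, tau, disc = 0.0, 0, 1.0
    for _ in range(max_steps):
        state, r = step(state, direction)
        total += disc * r          # discount rewards *within* the option
        disc *= gamma
        tau += 1
        if state == GOAL:
            break
    return state, total, tau

def smdp_q_learning(episodes=500, alpha=0.2, gamma=0.95, eps=0.1, seed=0):
    """SMDP Q-learning update:
    Q(s,o) += alpha * (R + gamma**tau * max_o' Q(s',o') - Q(s,o))."""
    rng = random.Random(seed)
    q = {(s, o): 0.0 for s in range(GOAL + 1) for o in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            o = (rng.choice(ACTIONS) if rng.random() < eps
                 else max(ACTIONS, key=lambda a: q[(s, a)]))
            s2, r, tau = run_option(s, o, gamma)
            target = r + (gamma ** tau) * max(q[(s2, a)] for a in ACTIONS)
            q[(s, o)] += alpha * (target - q[(s, o)])
            s = s2
    return q

q = smdp_q_learning()
```

After training, the greedy action at the start state is one of the rightward actions; because the within-option rewards are discounted consistently, the sprint option and repeated primitive moves have the same underlying value, which is why temporal abstraction changes the cost of learning and planning rather than the optimal return itself.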


Recommended Reading

  1. Ashby, R. (1956). Introduction to cybernetics. London: Chapman & Hall.
  2. Barto, A., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Special Issue on Reinforcement Learning, Discrete Event Dynamic Systems, 13, 41–77.
  3. Dayan, P., & Hinton, G. E. (1992). Feudal reinforcement learning. In Advances in neural information processing systems 5, NIPS Conference, Denver, CO. San Francisco: Morgan Kaufmann.
  4. Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13, 227–303.
  5. Digney, B. L. (1998). Learning hierarchical control structures for multiple tasks and changing environments. In From animals to animats 5: Proceedings of the fifth international conference on simulation of adaptive behaviour, SAB 98, Zurich, Switzerland, August 17–21, 1998. Cambridge: MIT Press.
  6. Ghavamzadeh, M., & Mahadevan, S. (2002). Hierarchically optimal average reward reinforcement learning. In C. Sammut & A. Hoffmann (Eds.), Proceedings of the nineteenth international conference on machine learning, Sydney, Australia (pp. 195–202). San Francisco: Morgan Kaufmann.
  7. Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T., & Boutilier, C. (1998). Hierarchical solution of Markov decision processes using macro-actions. In Fourteenth annual conference on uncertainty in artificial intelligence, Madison, WI (pp. 220–229).
  8. Hengst, B. (2008). Partial order hierarchical reinforcement learning. In Australasian conference on artificial intelligence, Auckland, New Zealand, December 2008 (pp. 138–149). Berlin: Springer.
  9. Jonsson, A., & Barto, A. (2006). Causal graph based decomposition of factored MDPs. Journal of Machine Learning Research, 7, 2259–2301.
  10. Kaelbling, L. P. (1993). Hierarchical learning in stochastic domains: Preliminary results. In Machine learning: Proceedings of the tenth international conference (pp. 167–173). San Mateo: Morgan Kaufmann.
  11. Konidaris, G., & Barto, A. (2009). Skill discovery in continuous reinforcement learning domains using skill chaining. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in neural information processing systems 22 (pp. 1015–1023).
  12. McGovern, A. (2002). Autonomous discovery of abstractions through interaction with an environment. In SARA (pp. 338–339). London: Springer.
  13. Moore, A., Baird, L., & Kaelbling, L. P. (1999). Multi-value functions: Efficient automatic action hierarchies for multiple goal MDPs. In Proceedings of the international joint conference on artificial intelligence, Stockholm (pp. 1316–1323). San Francisco: Morgan Kaufmann.
  14. Parr, R., & Russell, S. J. (1997). Reinforcement learning with hierarchies of machines. In NIPS, Denver, CO, 1997.
  15. Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
  16. Ryan, M. R. K., & Reid, M. D. (2000). Using ILP to improve planning in hierarchical reinforcement learning. In Proceedings of the tenth international conference on inductive logic programming, ILP 2000, London. London: Springer.
  17. Singh, S. (1992). Reinforcement learning with a hierarchy of abstract models. In Proceedings of the tenth national conference on artificial intelligence.
  18. Sutton, R. S., Precup, D., & Singh, S. P. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1–2), 181–211.
  19. Watkins, C. J. C. H. (1989). Learning from delayed rewards. PhD thesis, King's College.

Copyright information

© Springer Science+Business Media, LLC 2011
