Hierarchical Reinforcement Learning

  • Chapter in: Deep Reinforcement Learning

Abstract

The goal of artificial intelligence is to understand and create intelligent behavior; the goal of deep reinforcement learning is to find a behavior policy for ever larger sequential decision problems.
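To make "a behavior policy for ever larger sequential decision problems" concrete, the sketch below shows in a few lines of Python how a hierarchical agent can split a long-horizon task into subgoals. It is purely illustrative and not taken from the chapter: the corridor environment, the high_level subgoal chooser, and the greedy low_level policy are hypothetical stand-ins for the learned components that hierarchical reinforcement learning studies.

```python
# Illustrative sketch only (not the chapter's algorithm): a two-level
# hierarchy on a toy 1-D corridor. A high-level policy proposes nearby
# subgoal positions; a goal-conditioned low-level policy steps toward
# the current subgoal. All names here are hypothetical.

class Corridor:
    """Agent starts at position 0 and must reach position `goal`."""
    def __init__(self, goal=20):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):            # action is -1 or +1
        self.pos += action
        done = (self.pos == self.goal)
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def high_level(state, final_goal, horizon=5):
    """Pick a subgoal at most `horizon` steps closer to the final goal."""
    return state + min(horizon, final_goal - state)


def low_level(state, subgoal):
    """Greedy goal-conditioned primitive policy."""
    return 1 if subgoal > state else -1


env = Corridor()
state, done = env.reset(), False
while not done:
    subgoal = high_level(state, env.goal)    # temporal abstraction
    while state != subgoal and not done:     # act until the subgoal is reached
        state, _, done = env.step(low_level(state, subgoal))
print("reached the goal at position", state)
```

In a learned system both levels would be trained, for example with goal-conditioned value functions, rather than hand-coded as in this toy sketch.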

Notes

  1. http://bigai.cs.brown.edu/2019/09/03/hac.html
  2. https://www.youtube.com/watch?v=DYcVTveeNK0
  3. https://github.com/andrew-j-levy/Hierarchical-Actor-Critc-HAC-
  4. https://www.youtube.com/watch?v=R86Vs9Vb6Bc
  5. http://sneezingtiger.com/sokoban/levels.html
  6. http://www.sokobano.de/wiki/index.php?title=Level_format
  7. https://www.sourcecode.se/sokoban/levels

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Plaat, A. (2022). Hierarchical Reinforcement Learning. In: Deep Reinforcement Learning. Springer, Singapore. https://doi.org/10.1007/978-981-19-0638-1_8

  • DOI: https://doi.org/10.1007/978-981-19-0638-1_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-0637-4

  • Online ISBN: 978-981-19-0638-1

  • eBook Packages: Computer Science, Computer Science (R0)
