Hierarchical Reinforcement Learning

  • Chapter in: Deep Reinforcement Learning

Abstract

The goal of artificial intelligence is to understand and create intelligent behavior; the goal of deep reinforcement learning is to find a behavior policy for ever larger sequential decision problems.
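To make "a behavior policy for ever larger sequential decision problems" concrete, the sketch below shows in a few lines of Python how a hierarchical agent can split a long-horizon task into subgoals. It is purely illustrative and not taken from the chapter: the corridor environment, the high_level subgoal chooser, and the greedy low_level policy are hypothetical stand-ins for the learned components that hierarchical reinforcement learning studies.

```python
# Illustrative sketch only (not the chapter's algorithm): a two-level
# hierarchy on a toy 1-D corridor. A high-level policy proposes nearby
# subgoal positions; a goal-conditioned low-level policy steps toward
# the current subgoal. All names here are hypothetical.

class Corridor:
    """Agent starts at position 0 and must reach position `goal`."""
    def __init__(self, goal=20):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):            # action is -1 or +1
        self.pos += action
        done = (self.pos == self.goal)
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


def high_level(state, final_goal, horizon=5):
    """Pick a subgoal at most `horizon` steps closer to the final goal."""
    return state + min(horizon, final_goal - state)


def low_level(state, subgoal):
    """Greedy goal-conditioned primitive policy."""
    return 1 if subgoal > state else -1


env = Corridor()
state, done = env.reset(), False
while not done:
    subgoal = high_level(state, env.goal)    # temporal abstraction
    while state != subgoal and not done:     # act until the subgoal is reached
        state, _, done = env.step(low_level(state, subgoal))
print("reached the goal at position", state)
```

In a learned system both levels would be trained, for example with goal-conditioned value functions, rather than hand-coded as in this toy sketch.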

Notes

  1. http://bigai.cs.brown.edu/2019/09/03/hac.html
  2. https://www.youtube.com/watch?v=DYcVTveeNK0
  3. https://github.com/andrew-j-levy/Hierarchical-Actor-Critc-HAC-
  4. https://www.youtube.com/watch?v=R86Vs9Vb6Bc
  5. http://sneezingtiger.com/sokoban/levels.html
  6. http://www.sokobano.de/wiki/index.php?title=Level_format
  7. https://www.sourcecode.se/sokoban/levels

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Plaat, A. (2022). Hierarchical Reinforcement Learning. In: Deep Reinforcement Learning. Springer, Singapore. https://doi.org/10.1007/978-981-19-0638-1_8

  • DOI: https://doi.org/10.1007/978-981-19-0638-1_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-0637-4

  • Online ISBN: 978-981-19-0638-1

  • eBook Packages: Computer Science, Computer Science (R0)
