Abstract
In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. A key issue in multi-agent systems research is therefore predicting the behaviours of others and responding promptly to changes in those behaviours. One obvious possibility is for each agent to broadcast its current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach makes agents inflexible when options have an extended duration and circumstances are dynamic. While adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent's actual behaviour and its broadcast intention. To balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows agents to flexibly terminate their options. We evaluate our models empirically on a set of multi-agent pursuit and taxi tasks, and show that our agents learn to adapt flexibly across scenarios that require different termination behaviours.
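The core trade-off the abstract describes can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the random option-value table `Q`, and the switching cost `delta` are all illustrative assumptions. The idea sketched here is that an agent persists with its current (broadcast) option unless some other option is better by more than a margin, so behaviour stays predictable but can still be interrupted when the gain is large enough.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_options = 5, 3
# Hypothetical option-value estimates Q[state, option]; in practice these
# would be learned, e.g. by intra-option Q-learning.
Q = rng.normal(size=(n_states, n_options))

def choose_option(state, current_option, delta=0.1):
    """Continue `current_option` unless another option is better by > delta.

    `delta` is an assumed switching margin: larger values make the agent
    more predictable (fewer terminations), smaller values more flexible.
    """
    best = int(np.argmax(Q[state]))
    if Q[state, best] - Q[state, current_option] > delta:
        return best          # terminate the current option and switch
    return current_option    # persist with the broadcast intention

state, option = 2, 0
next_option = choose_option(state, option)
assert 0 <= next_option < n_options
```

Setting `delta` very large recovers the rigid "run every option to completion" behaviour; setting it to zero recovers per-step re-selection, which maximises single-agent flexibility at the cost of predictability for teammates.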
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Han, D., Böhmer, W., Wooldridge, M., Rogers, A. (2019). Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. Lecture Notes in Computer Science, vol. 11671. Springer, Cham. https://doi.org/10.1007/978-3-030-29911-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29910-1
Online ISBN: 978-3-030-29911-8