Abstract
In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. A key issue in multi-agent systems research is therefore predicting the behaviours of others and responding promptly to changes in those behaviours. One obvious possibility is for each agent to broadcast its current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach makes agents inflexible when options have an extended duration and circumstances are dynamic. While adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes in options can induce inconsistency between an agent's actual behaviour and its broadcast intention. To balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows agents to flexibly terminate their options. We evaluate our models empirically on a set of multi-agent pursuit and taxi tasks, and show that our agents learn to adapt flexibly across scenarios that require different termination behaviours.
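The core trade-off the abstract describes can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the random option-value table `Q`, and the switching cost `delta` are all illustrative assumptions. The idea sketched here is that an agent persists with its current (broadcast) option unless some other option is better by more than a margin, so behaviour stays predictable but can still be interrupted when the gain is large enough.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_options = 5, 3
# Hypothetical option-value estimates Q[state, option]; in practice these
# would be learned, e.g. by intra-option Q-learning.
Q = rng.normal(size=(n_states, n_options))

def choose_option(state, current_option, delta=0.1):
    """Continue `current_option` unless another option is better by > delta.

    `delta` is an assumed switching margin: larger values make the agent
    more predictable (fewer terminations), smaller values more flexible.
    """
    best = int(np.argmax(Q[state]))
    if Q[state, best] - Q[state, current_option] > delta:
        return best          # terminate the current option and switch
    return current_option    # persist with the broadcast intention

state, option = 2, 0
next_option = choose_option(state, option)
assert 0 <= next_option < n_options
```

Setting `delta` very large recovers the rigid "run every option to completion" behaviour; setting it to zero recovers per-step re-selection, which maximises single-agent flexibility at the cost of predictability for teammates.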
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Han, D., Böhmer, W., Wooldridge, M., Rogers, A. (2019). Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. Lecture Notes in Computer Science, vol. 11671. Springer, Cham. https://doi.org/10.1007/978-3-030-29911-8_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29910-1
Online ISBN: 978-3-030-29911-8