
Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination

Conference paper in PRICAI 2019: Trends in Artificial Intelligence (PRICAI 2019).

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11671).

Abstract

In a multi-agent system, an agent's optimal policy will typically depend on the policies chosen by others. A key issue in multi-agent systems research is therefore predicting the behaviours of others, and responding promptly when those behaviours change. One obvious possibility is for each agent to broadcast its current intention, for example, the currently executed option in a hierarchical reinforcement learning framework. However, this approach makes agents inflexible when options have an extended duration and the environment is dynamic, because an agent remains committed to a broadcast intention that may no longer be appropriate. Conversely, while adjusting the executed option at each step improves flexibility from a single-agent perspective, frequent changes in options can make an agent's actual behaviour inconsistent with its broadcast intention. To balance flexibility and predictability, we propose a dynamic termination Bellman equation that allows agents to flexibly terminate their options. We evaluate our models empirically on a set of multi-agent pursuit and taxi tasks, and show that our agents learn to adapt flexibly across scenarios that require different termination behaviours.
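The keep-or-switch trade-off described above can be sketched in a few lines of code: at every step the agent compares the value of continuing its current option against the value of terminating and switching, where switching carries a small cost so that broadcast intentions stay reliable. The sketch below is illustrative only; the switching cost delta, the tabular Q-values, and all class and method names are assumptions for exposition, not the paper's exact formulation.

    import random
    from collections import defaultdict

    class DynamicTerminationSketch:
        """Tabular sketch: at each step, either continue the current option
        or pay a small cost `delta` to terminate it and switch."""

        def __init__(self, options, alpha=0.1, gamma=0.99, delta=0.05, eps=0.1):
            self.q = defaultdict(float)   # Q[(state, option)], defaults to 0.0
            self.options = list(options)  # available option identifiers
            self.alpha = alpha            # learning rate
            self.gamma = gamma            # discount factor
            self.delta = delta            # assumed termination (switching) cost
            self.eps = eps                # exploration rate

        def value(self, state, current):
            # Value of `state` while executing `current`: the better of
            # keeping the option, or switching to the best alternative
            # minus the switching cost.
            keep = self.q[(state, current)]
            switch = max(self.q[(state, o)] for o in self.options) - self.delta
            return max(keep, switch)

        def choose(self, state, current):
            # Epsilon-greedy keep-or-switch decision; switching only happens
            # when it is worth the cost, which keeps behaviour predictable.
            if random.random() < self.eps:
                return random.choice(self.options)
            best = max(self.options, key=lambda o: self.q[(state, o)])
            if self.q[(state, best)] - self.delta > self.q[(state, current)]:
                return best
            return current

        def update(self, state, option, reward, next_state):
            # One-step update toward the keep-or-switch target value.
            target = reward + self.gamma * self.value(next_state, option)
            self.q[(state, option)] += self.alpha * (target - self.q[(state, option)])

In this sketch, a larger delta makes agents more committed to their announced options, while delta = 0 recovers per-step re-selection; this mirrors the flexibility-versus-predictability trade-off the abstract describes.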



Author information


Correspondence to Dongge Han.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Han, D., Böhmer, W., Wooldridge, M., Rogers, A. (2019). Multi-agent Hierarchical Reinforcement Learning with Dynamic Termination. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science (LNAI), vol 11671. Springer, Cham. https://doi.org/10.1007/978-3-030-29911-8_7


  • DOI: https://doi.org/10.1007/978-3-030-29911-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29910-1

  • Online ISBN: 978-3-030-29911-8

  • eBook Packages: Computer Science; Computer Science (R0)
