
Imbalanced Equilibrium: Emergence of Social Asymmetric Coordinated Behavior in Multi-agent Games

  • Conference paper
Neural Information Processing (ICONIP 2022)

Abstract

Multi-agent deep reinforcement learning (MADRL) has made remarkable progress but usually requires delicate and fragile reward engineering. Modeling other agents (MOA) is an effective way to compensate for the absence of efficient reward signals. However, existing MOA methods often assume that only one agent models the others, which do not learn. In this study, we propose continuous mutual modeling (CMM), in which every agent constantly models the other agents, which are themselves learning appropriate behaviors from their own viewpoints, to facilitate coordination among agents in complex MADRL environments. We then propose a CMM framework, referred to as predictor-actor-critic (PAC), in which every agent determines its actions by estimating those of the other agents through mutual modeling. We experimentally show that the proposed method enables agents to recognize other agents' activities and promotes the emergence of better-coordinated behaviors in agent society.
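The abstract describes the PAC idea at a high level: each agent first predicts the other agents' actions, then conditions its own policy (actor) and value estimate (critic) on those predictions. The paper's actual architecture is not given here, so the following is only a minimal, hypothetical sketch; the linear parameterization, layer sizes, and the way predictions are concatenated into the actor/critic input are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class PACAgent:
    """Hypothetical predictor-actor-critic sketch (not the paper's exact model)."""

    def __init__(self, obs_dim, n_actions, n_others, seed=0):
        rng = np.random.default_rng(seed)
        self.n_others, self.n_actions = n_others, n_actions
        # Predictor head: maps the observation to an estimated action
        # distribution for each of the other agents (mutual modeling).
        self.W_pred = rng.normal(0.0, 0.1, (obs_dim, n_others * n_actions))
        joint_dim = obs_dim + n_others * n_actions
        # Actor and critic condition on the observation together with
        # the predicted behavior of the other agents.
        self.W_actor = rng.normal(0.0, 0.1, (joint_dim, n_actions))
        self.W_critic = rng.normal(0.0, 0.1, (joint_dim, 1))

    def step(self, obs):
        # Predict each other agent's next-action distribution.
        logits = (obs @ self.W_pred).reshape(self.n_others, self.n_actions)
        pred = softmax(logits, axis=-1)
        # Own policy and value are computed from obs + predictions.
        joint = np.concatenate([obs, pred.ravel()])
        policy = softmax(joint @ self.W_actor)
        value = float((joint @ self.W_critic)[0])
        return policy, value, pred

agent = PACAgent(obs_dim=8, n_actions=4, n_others=2)
policy, value, pred = agent.step(np.ones(8))
# policy is a distribution over this agent's 4 actions;
# pred[i] is the estimated action distribution of other agent i.
```

In an actual MADRL setting, every agent would hold such a model of the others and update the predictor from observed behavior while the actor-critic is trained on environment rewards; the key point the abstract makes is that the modeled agents are themselves learning, so the prediction targets are non-stationary.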



Acknowledgements

This work was partly supported by JSPS KAKENHI Grant Number 20H04245 and JST SPRING Grant Number JPMJSP2128.

Author information


Corresponding author

Correspondence to Yidong Bai.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bai, Y., Sugawara, T. (2023). Imbalanced Equilibrium: Emergence of Social Asymmetric Coordinated Behavior in Multi-agent Games. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13624. Springer, Cham. https://doi.org/10.1007/978-3-031-30108-7_26


  • DOI: https://doi.org/10.1007/978-3-031-30108-7_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30107-0

  • Online ISBN: 978-3-031-30108-7

  • eBook Packages: Computer Science, Computer Science (R0)
