Multi-Agent Hyper-Attention Policy Optimization

  • Conference paper
Neural Information Processing (ICONIP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13623)

Abstract

Policy-based methods such as MAPPO have achieved impressive results across diverse multi-agent reinforcement learning benchmarks. Nevertheless, current actor-critic algorithms do not fully exploit the centralized training with decentralized execution (CTDE) paradigm, nor do they use global information effectively when training the centralized critic, as evidenced by IPPO outperforming MAPPO in some scenarios. To address this problem, we propose a game abstraction technique based on a state-conditioned hyper-attention network, which helps agents integrate important information and distill complex game interactions for efficient policy optimization. In addition, to improve the stability of trust-region methods, we introduce a point probability distance penalty alongside the clipping operation in PPO. Experimental results demonstrate the advantages of our method in a variety of cooperative environments.
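The following is a minimal sketch of the two ideas the abstract names, assuming a PyTorch setup. The module shapes, the names `HyperAttentionCritic` and `actor_loss`, the penalty coefficient `beta`, and the exact penalty form are illustrative assumptions rather than the authors' implementation; the squared difference of sampled-action probabilities is one simple instantiation of the point probability distance, in the spirit of Chu's preprint [1].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperAttentionCritic(nn.Module):
    """Sketch of a centralized critic whose attention projections are
    generated by a hypernetwork conditioned on the global state
    ("state-conditioned hyper-attention"). Layer sizes are assumptions."""

    def __init__(self, obs_dim: int, state_dim: int, embed_dim: int = 64):
        super().__init__()
        self.embed_dim = embed_dim
        self.obs_embed = nn.Linear(obs_dim, embed_dim)
        # Hypernetwork: maps the global state to query/key projection weights.
        self.hyper_qk = nn.Linear(state_dim, 2 * embed_dim * embed_dim)
        self.value_head = nn.Linear(embed_dim, 1)

    def forward(self, obs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim); state: (batch, state_dim)
        e = self.obs_embed(obs)                                   # (B, N, D)
        w = self.hyper_qk(state).view(-1, 2, self.embed_dim, self.embed_dim)
        q = torch.einsum('bnd,bde->bne', e, w[:, 0])              # state-conditioned queries
        k = torch.einsum('bnd,bde->bne', e, w[:, 1])              # state-conditioned keys
        attn = F.softmax(q @ k.transpose(-2, -1) / self.embed_dim ** 0.5, dim=-1)
        return self.value_head(attn @ e).squeeze(-1)              # per-agent values, (B, N)


def actor_loss(logp, logp_old, adv, probs, probs_old, clip=0.2, beta=5.0):
    """PPO clipped surrogate combined with a point probability distance
    penalty on the sampled actions' probabilities (clipping plus penalty,
    as the abstract describes). beta is an assumed coefficient."""
    ratio = torch.exp(logp - logp_old)
    surrogate = torch.min(ratio * adv,
                          torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * adv)
    # Point probability distance: squared difference of the probabilities
    # assigned to the sampled action by the new and old policies.
    ppd = (probs - probs_old).pow(2)
    return -(surrogate - beta * ppd).mean()
```

Conditioning the attention weights on the global state, rather than merely feeding the state in as another attention input, lets the critic reshape which agent interactions it attends to on a per-state basis; the penalty term keeps policy updates bounded even when the probability ratio leaves the clipping region.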

This project was supported by the National Defence Foundation Reinforcement Fund.

B. Zhang, Z. Xu, Y. Chen, and L. Li contributed equally to this work.

References

  1. Chu, X.: Policy optimization with penalized point probability distance: an alternative to proximal policy optimization. arXiv preprint arXiv:1807.00442 (2018)

  2. Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: Advances in Neural Information Processing Systems (2016)

  3. Engstrom, L., Ilyas, A., Santurkar, S., Tsipras, D., Janoos, F., Rudolph, L., Madry, A.: Implementation matters in deep policy gradients: a case study on PPO and TRPO. arXiv preprint arXiv:2005.12729 (2020)

  4. Feng, Y., You, H., Zhang, Z., Ji, R., Gao, Y.: Hypergraph neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence (2019)

  5. Foerster, J., Farquhar, G., Afouras, T., et al.: Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI Conference on Artificial Intelligence (2018)

  6. Ha, D., Dai, A., Le, Q.V.: Hypernetworks. arXiv preprint arXiv:1609.09106 (2016)

  7. Hu, J., Jiang, S., Harding, S.A., Wu, H., Liao, S.: Rethinking the implementation tricks and monotonicity constraint in cooperative multi-agent reinforcement learning. arXiv preprint arXiv:2102.03479 (2021)

  8. Huang, Y., Xie, K., Bharadhwaj, H., Shkurti, F.: Continual model-based reinforcement learning with hypernetworks. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 799–805. IEEE (2021)

  9. Iqbal, S., Sha, F.: Actor-attention-critic for multi-agent reinforcement learning. In: International Conference on Machine Learning, pp. 2961–2970. PMLR (2019)

  10. Kuba, J.G., Chen, R., Wen, M., Wen, Y., et al.: Trust region policy optimisation in multi-agent reinforcement learning. arXiv preprint arXiv:2109.11251 (2021)

  11. Liu, Y., Wang, W., Hu, Y., Hao, J., Chen, X., Gao, Y.: Multi-agent game abstraction via graph attention neural network. In: Proceedings of the AAAI Conference on Artificial Intelligence (2020)

  12. Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I.: Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in Neural Information Processing Systems (2017)

  13. Peng, Z., Li, Q., Hui, K.M., Liu, C., Zhou, B.: Learning to simulate self-driven particles system with coordinated policy optimization. In: Advances in Neural Information Processing Systems, vol. 34, pp. 10784–10797 (2021)

  14. Rashid, T., Samvelyan, M., Witt, C.S., Farquhar, G., Foerster, J.N., Whiteson, S.: QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. arXiv preprint arXiv:1803.11485 (2018)

  15. Samvelyan, M., et al.: The StarCraft multi-agent challenge. arXiv preprint arXiv:1902.04043 (2019)

  16. Sarafian, E., Keynan, S., Kraus, S.: Recomposing the reinforcement learning building blocks with hypernetworks. In: International Conference on Machine Learning, pp. 9301–9312. PMLR (2021)

  17. Schulman, J., Levine, S., Abbeel, P., et al.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897. PMLR (2015)

  18. Schulman, J., Moritz, P., Levine, S., et al.: High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)

  19. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

  20. Son, K., Kim, D., et al.: QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. arXiv preprint arXiv:1905.05408 (2019)

  21. Su, J., Adams, S., Beling, P.A.: Value-decomposition multi-agent actor-critics. In: Proceedings of the AAAI Conference on Artificial Intelligence (2021)

  22. Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 (2018)

  23. Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., et al.: Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE (2017)

  24. Tao, N., Baxter, J., Weaver, L.: A multi-agent, policy-gradient approach to network routing. In: Proceedings of the 18th International Conference on Machine Learning. Citeseer (2001)

  25. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems (2017)

  26. Vinyals, O., Babuschkin, I., Czarnecki, W.M., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782), 350–354 (2019)

  27. Wang, J., Ren, Z., Liu, T., et al.: QPLEX: duplex dueling multi-agent Q-learning. In: International Conference on Learning Representations, ICLR (2021)

  28. Wang, Y., He, H., Tan, X.: Truly proximal policy optimization. In: Uncertainty in Artificial Intelligence, pp. 113–122. PMLR (2020)

  29. de Witt, C.S., Gupta, T., Makoviichuk, D., Makoviychuk, V., Torr, P.H., Sun, M., Whiteson, S.: Is independent learning all you need in the StarCraft multi-agent challenge? arXiv preprint arXiv:2011.09533 (2020)

  30. Yang, Y., Luo, R., Li, M., et al.: Mean field multi-agent reinforcement learning. In: International Conference on Machine Learning. PMLR (2018)

  31. Yu, C., Velu, A., Vinitsky, E., Wang, Y., et al.: The surprising effectiveness of PPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955 (2021)

  32. Zhang, B., Bai, Y., Xu, Z., Li, D., Fan, G.: Efficient cooperation strategy generation in multi-agent video games via hypergraph neural network. arXiv preprint arXiv:2203.03265 (2022)

Author information

Corresponding author

Correspondence to Lijuan Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, B. et al. (2023). Multi-Agent Hyper-Attention Policy Optimization. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13623. Springer, Cham. https://doi.org/10.1007/978-3-031-30105-6_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-30105-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30104-9

  • Online ISBN: 978-3-031-30105-6

  • eBook Packages: Computer Science, Computer Science (R0)
