Abstract
Policy-based methods such as MAPPO have achieved impressive results across diverse multi-agent reinforcement learning scenarios. However, current actor-critic algorithms do not fully exploit the centralized-training-with-decentralized-execution paradigm and do not use global information effectively when training the centralized critic, as evidenced by IPPO outperforming MAPPO in some scenarios. To address this problem, we propose a game abstraction technique based on a state-conditioned hyper-attention network, which helps agents integrate important information and distill complex game interactions to achieve efficient policy optimization. In addition, to improve the stability of trust-region methods, we introduce a point probability distance penalty alongside the clipping operation in PPO. Experimental results demonstrate the advantages of our method in various cooperative environments.
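As a rough sketch of the penalized trust-region idea described above (not the authors' exact formulation), the objective can be illustrated as PPO's clipped surrogate loss augmented with a squared point-probability-distance term, in the spirit of Chu's POP3D; the coefficient `beta` and the function name are illustrative assumptions:

```python
import numpy as np

def ppo_ppd_loss(new_probs, old_probs, advantages, eps=0.2, beta=5.0):
    """Clipped PPO surrogate plus a point probability distance penalty.

    new_probs / old_probs: pi_new(a|s) and pi_old(a|s) for the sampled
    actions; advantages: estimated advantages A(s, a).
    eps is PPO's clip range; beta is a hypothetical penalty coefficient.
    """
    ratio = new_probs / old_probs
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Standard clipped surrogate: pessimistic minimum of the two terms
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    # Squared distance between the new and old point probabilities,
    # discouraging large policy jumps even inside the clip range
    ppd_penalty = (new_probs - old_probs) ** 2
    # We maximize (surrogate - beta * penalty), so return its negative
    return -(surrogate - beta * ppd_penalty).mean()
```

When the new and old policies coincide, the penalty vanishes and the loss reduces to the plain surrogate; as the policies diverge, the penalty grows even where clipping alone would leave the gradient unchanged.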
This project was supported by the National Defence Foundation Reinforcement Fund.
B. Zhang, Z. Xu, Y. Chen and L. Li contributed equally.
References
Chu, X.: Policy optimization with penalized point probability distance: an alternative to proximal policy optimization. arXiv preprint arXiv:1807.00442 (2018)
Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: NIPS (2016)
Engstrom, L., Ilyas, A., Santurkar, S., Tsipras, D., Janoos, F., Rudolph, L., Madry, A.: Implementation matters in deep policy gradients: a case study on PPO and TRPO. arXiv preprint arXiv:2005.12729 (2020)
Feng, Y., You, H., Zhang, Z., Ji, R., Gao, Y.: Hypergraph neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence (2019)
Foerster, J., Farquhar, G., Afouras, T., et al.: Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI Conference on Artificial Intelligence (2018)
Ha, D., Dai, A., Le, Q.V.: Hypernetworks. arXiv preprint arXiv:1609.09106 (2016)
Hu, J., Jiang, S., Harding, S.A., Wu, H., Liao, S.: Rethinking the implementation tricks and monotonicity constraint in cooperative multi-agent reinforcement learning. arXiv preprint arXiv:2102.03479 (2021)
Huang, Y., Xie, K., Bharadhwaj, H., Shkurti, F.: Continual model-based reinforcement learning with hypernetworks. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 799–805. IEEE (2021)
Iqbal, S., Sha, F.: Actor-attention-critic for multi-agent reinforcement learning. In: International Conference on Machine Learning, pp. 2961–2970. PMLR (2019)
Kuba, J.G., Chen, R., Wen, M., Wen, Y., et al.: Trust region policy optimisation in multi-agent reinforcement learning. arXiv preprint arXiv:2109.11251 (2021)
Liu, Y., Wang, W., Hu, Y., Hao, J., Chen, X., Gao, Y.: Multi-agent game abstraction via graph attention neural network. In: Proceedings of the AAAI Conference on Artificial Intelligence (2020)
Lowe, R., Wu, Y.I., Tamar, A., Harb, J., Pieter Abbeel, O., Mordatch, I.: Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in Neural Information Processing Systems (2017)
Peng, Z., Li, Q., Hui, K.M., Liu, C., Zhou, B.: Learning to simulate self-driven particles system with coordinated policy optimization. In: Advances in Neural Information Processing Systems, vol. 34, pp. 10784–10797 (2021)
Rashid, T., Samvelyan, M., Witt, C.S., Farquhar, G., Foerster, J.N., Whiteson, S.: QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. arXiv preprint arXiv:1803.11485 (2018)
Samvelyan, M., et al.: The StarCraft multi-agent challenge. arXiv preprint arXiv:1902.04043 (2019)
Sarafian, E., Keynan, S., Kraus, S.: Recomposing the reinforcement learning building blocks with hypernetworks. In: International Conference on Machine Learning, pp. 9301–9312. PMLR (2021)
Schulman, J., Levine, S., Abbeel, P., et al.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897. PMLR (2015)
Schulman, J., Moritz, P., Levine, S., et al.: High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
Son, K., Kim, D., et al.: QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. arXiv preprint arXiv:1905.05408 (2019)
Su, J., Adams, S., Beling, P.A.: Value-decomposition multi-agent actor-critics. In: Proceedings of the AAAI Conference on Artificial Intelligence (2021)
Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 (2018)
Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., et al.: Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE (2017)
Tao, N., Baxter, J., Weaver, L.: A multi-agent, policy-gradient approach to network routing. In: Proceedings of the 18th International Conference on Machine Learning. Citeseer (2001)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems (2017)
Vinyals, O., Babuschkin, I., Czarnecki, W.M., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782), 350–354 (2019)
Wang, J., Ren, Z., Liu, T., et al.: QPLEX: duplex dueling multi-agent q-learning. In: International Conference on Learning Representations, ICLR (2021)
Wang, Y., He, H., Tan, X.: Truly proximal policy optimization. In: Uncertainty in Artificial Intelligence, pp. 113–122. PMLR (2020)
de Witt, C.S., Gupta, T., Makoviichuk, D., Makoviychuk, V., Torr, P.H., Sun, M., Whiteson, S.: Is independent learning all you need in the StarCraft multi-agent challenge? arXiv preprint arXiv:2011.09533 (2020)
Yang, Y., Luo, R., Li, M., et al.: Mean field multi-agent reinforcement learning. In: International Conference on Machine Learning. PMLR (2018)
Yu, C., Velu, A., Vinitsky, E., Wang, Y., et al.: The surprising effectiveness of PPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955 (2021)
Zhang, B., Bai, Y., Xu, Z., Li, D., Fan, G.: Efficient cooperation strategy generation in multi-agent video games via hypergraph neural network. arXiv preprint arXiv:2203.03265 (2022)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, B. et al. (2023). Multi-Agent Hyper-Attention Policy Optimization. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13623. Springer, Cham. https://doi.org/10.1007/978-3-031-30105-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30104-9
Online ISBN: 978-3-031-30105-6
eBook Packages: Computer Science, Computer Science (R0)