Abstract
Policy-based methods such as MAPPO have achieved impressive results across diverse multi-agent reinforcement learning scenarios. However, current actor-critic algorithms do not fully exploit the centralized-training-with-decentralized-execution paradigm and do not use global information effectively when training the centralized critic, as evidenced by IPPO outperforming MAPPO in some scenarios. To address this problem, we propose a game abstraction technique based on a state-conditioned hyper-attention network, which helps agents integrate important information and distill complex game interactions to achieve efficient policy optimization. In addition, to improve the stability of trust-region methods, we introduce a point probability distance penalty alongside the clipping operation in PPO. Experimental results demonstrate the advantages of our method in various cooperative environments.
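As a rough sketch of the penalized trust-region idea described above (not the authors' exact formulation), the objective can be illustrated as PPO's clipped surrogate loss augmented with a squared point-probability-distance term, in the spirit of Chu's POP3D; the coefficient `beta` and the function name are illustrative assumptions:

```python
import numpy as np

def ppo_ppd_loss(new_probs, old_probs, advantages, eps=0.2, beta=5.0):
    """Clipped PPO surrogate plus a point probability distance penalty.

    new_probs / old_probs: pi_new(a|s) and pi_old(a|s) for the sampled
    actions; advantages: estimated advantages A(s, a).
    eps is PPO's clip range; beta is a hypothetical penalty coefficient.
    """
    ratio = new_probs / old_probs
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Standard clipped surrogate: pessimistic minimum of the two terms
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    # Squared distance between the new and old point probabilities,
    # discouraging large policy jumps even inside the clip range
    ppd_penalty = (new_probs - old_probs) ** 2
    # We maximize (surrogate - beta * penalty), so return its negative
    return -(surrogate - beta * ppd_penalty).mean()
```

When the new and old policies coincide, the penalty vanishes and the loss reduces to the plain surrogate; as the policies diverge, the penalty grows even where clipping alone would leave the gradient unchanged.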
This project was supported by the National Defence Foundation Reinforcement Fund.
B. Zhang, Z. Xu, Y. Chen and L. Li contributed equally.
References
Chu, X.: Policy optimization with penalized point probability distance: an alternative to proximal policy optimization. arXiv preprint arXiv:1807.00442 (2018)
Defferrard, M., Bresson, X., Vandergheynst, P.: Convolutional neural networks on graphs with fast localized spectral filtering. In: NIPS (2016)
Engstrom, L., Ilyas, A., Santurkar, S., Tsipras, D., Janoos, F., Rudolph, L., Madry, A.: Implementation matters in deep policy gradients: a case study on PPO and TRPO. arXiv preprint arXiv:2005.12729 (2020)
Feng, Y., You, H., Zhang, Z., Ji, R., Gao, Y.: Hypergraph neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence (2019)
Foerster, J., Farquhar, G., Afouras, T., et al.: Counterfactual multi-agent policy gradients. In: Proceedings of the AAAI Conference on Artificial Intelligence (2018)
Ha, D., Dai, A., Le, Q.V.: Hypernetworks. arXiv preprint arXiv:1609.09106 (2016)
Hu, J., Jiang, S., Harding, S.A., Wu, H., Liao, S.: Rethinking the implementation tricks and monotonicity constraint in cooperative multi-agent reinforcement learning. arXiv preprint arXiv:2102.03479 (2021)
Huang, Y., Xie, K., Bharadhwaj, H., Shkurti, F.: Continual model-based reinforcement learning with hypernetworks. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 799–805. IEEE (2021)
Iqbal, S., Sha, F.: Actor-attention-critic for multi-agent reinforcement learning. In: International Conference on Machine Learning, pp. 2961–2970. PMLR (2019)
Kuba, J.G., Chen, R., Wen, M., Wen, Y., et al.: Trust region policy optimisation in multi-agent reinforcement learning. arXiv preprint arXiv:2109.11251 (2021)
Liu, Y., Wang, W., Hu, Y., Hao, J., Chen, X., Gao, Y.: Multi-agent game abstraction via graph attention neural network. In: Proceedings of the AAAI Conference on Artificial Intelligence (2020)
Lowe, R., Wu, Y.I., Tamar, A., Harb, J., Pieter Abbeel, O., Mordatch, I.: Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in Neural Information Processing Systems (2017)
Peng, Z., Li, Q., Hui, K.M., Liu, C., Zhou, B.: Learning to simulate self-driven particles system with coordinated policy optimization. In: Advances in Neural Information Processing Systems, vol. 34, pp. 10784–10797 (2021)
Rashid, T., Samvelyan, M., Witt, C.S., Farquhar, G., Foerster, J.N., Whiteson, S.: QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. arXiv preprint arXiv:1803.11485 (2018)
Samvelyan, M., et al.: The StarCraft multi-agent challenge. arXiv preprint arXiv:1902.04043 (2019)
Sarafian, E., Keynan, S., Kraus, S.: Recomposing the reinforcement learning building blocks with hypernetworks. In: International Conference on Machine Learning, pp. 9301–9312. PMLR (2021)
Schulman, J., Levine, S., Abbeel, P., et al.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897. PMLR (2015)
Schulman, J., Moritz, P., Levine, S., et al.: High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438 (2015)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
Son, K., Kim, D., et al.: QTRAN: learning to factorize with transformation for cooperative multi-agent reinforcement learning. arXiv preprint arXiv:1905.05408 (2019)
Su, J., Adams, S., Beling, P.A.: Value-decomposition multi-agent actor-critics. In: Proceedings of the AAAI Conference on Artificial Intelligence (2021)
Sunehag, P., et al.: Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 (2018)
Tampuu, A., Matiisen, T., Kodelja, D., Kuzovkin, I., et al.: Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE (2017)
Tao, N., Baxter, J., Weaver, L.: A multi-agent, policy-gradient approach to network routing. In: Proceedings of the 18th International Conference on Machine Learning. Citeseer (2001)
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems (2017)
Vinyals, O., Babuschkin, I., Czarnecki, W.M., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782), 350–354 (2019)
Wang, J., Ren, Z., Liu, T., et al.: QPLEX: duplex dueling multi-agent q-learning. In: International Conference on Learning Representations, ICLR (2021)
Wang, Y., He, H., Tan, X.: Truly proximal policy optimization. In: Uncertainty in Artificial Intelligence, pp. 113–122. PMLR (2020)
de Witt, C.S., Gupta, T., Makoviichuk, D., Makoviychuk, V., Torr, P.H., Sun, M., Whiteson, S.: Is independent learning all you need in the StarCraft multi-agent challenge? arXiv preprint arXiv:2011.09533 (2020)
Yang, Y., Luo, R., Li, M., et al.: Mean field multi-agent reinforcement learning. In: International Conference on Machine Learning. PMLR (2018)
Yu, C., Velu, A., Vinitsky, E., Wang, Y., et al.: The surprising effectiveness of PPO in cooperative, multi-agent games. arXiv preprint arXiv:2103.01955 (2021)
Zhang, B., Bai, Y., Xu, Z., Li, D., Fan, G.: Efficient cooperation strategy generation in multi-agent video games via hypergraph neural network. arXiv preprint arXiv:2203.03265 (2022)
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zhang, B. et al. (2023). Multi-Agent Hyper-Attention Policy Optimization. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13623. Springer, Cham. https://doi.org/10.1007/978-3-031-30105-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30104-9
Online ISBN: 978-3-031-30105-6
eBook Packages: Computer Science, Computer Science (R0)