Abstract
An air-to-air combat system is a complex multi-agent system (MAS) in which a large number of unmanned combat aerial vehicles learn to engage their opponents in a highly dynamic and uncertain environment. Because each agent observes the environment only partially, classical multi-agent learning methods struggle to obtain effective cooperative strategies. Recently, communication mechanisms have been proposed to address the partial observability of MAS. However, existing methods based on predefined rules easily cause an exponential increase in state–action pairs, leading to high communication costs. Motivated by this, this paper designs a graph neural network with a two-stage graph-attention mechanism to capture the key interaction relationships and communication connections between agents in complex air-to-air combat scenarios. Built on Multi-Agent Proximal Policy Optimization, a strong backbone multi-agent reinforcement learning method, the proposed hard- and soft-attention scheme dynamically adjusts the communication relationships and the ad hoc network of multiple agents: hard attention cuts off unrelated interaction connections, while soft attention weighs the correlation importance between pairs of agents. Finally, an experimental study in a simulation environment validates the effectiveness of the proposed method on large-scale air-to-air combat problems.
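The two-stage scheme described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name, the threshold `tau`, and the single-vector score projections `w_hard`/`w_soft` are simplifying assumptions, standing in for the learned hard- and soft-attention networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def two_stage_attention(h, w_hard, w_soft, tau=0.0):
    """Two-stage graph attention over agent embeddings h of shape (n, d).

    Stage 1 (hard): a binary gate prunes unrelated agent pairs.
    Stage 2 (soft): softmax weights over the surviving edges.
    Returns the aggregated features (n, d) and the weights (n, n).
    """
    n, d = h.shape
    # pairwise interaction features from concatenated embeddings
    pair = np.concatenate(
        [np.repeat(h, n, axis=0), np.tile(h, (n, 1))], axis=1)  # (n*n, 2d)
    scores = (pair @ w_hard).reshape(n, n)     # stage-1 logits
    gate = (scores > tau).astype(float)        # hard cut of weak edges
    np.fill_diagonal(gate, 0.0)                # no self-communication
    logits = (pair @ w_soft).reshape(n, n)
    logits = np.where(gate > 0, logits, -1e9)  # mask pruned edges
    # zero out rows whose every edge was pruned
    alpha = softmax(logits, axis=1) * (gate.sum(1, keepdims=True) > 0)
    return alpha @ h, alpha                    # weighted neighbor mix

# toy rollout: 4 agents with 8-dimensional observation embeddings
h = rng.standard_normal((4, 8))
w_hard = rng.standard_normal(16)  # 2 * d
w_soft = rng.standard_normal(16)
out, alpha = two_stage_attention(h, w_hard, w_soft)
```

In the paper's setting the gate would be produced by a trained hard-attention network (e.g., with a Gumbel-style discrete relaxation) rather than a fixed threshold, but the structure is the same: pruning first sparsifies the communication graph, then soft attention distributes weight only over the retained links.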
Data availability statement
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China under Grant 2021ZD0112400, the National Natural Science Foundation of China under Grants 61906032 and 62206041, the Young Elite Scientists Sponsorship Program by CAST under Grant 2022QNRC001, the NSFC-Liaoning Province United Foundation under Grant U1908214, the 111 Project under Grant D23006, the Fundamental Research Funds for the Central Universities under Grants DUT21TD107 and DUT22ZD214, and the LiaoNing Revitalization Talents Program under Grant XLYC2008017.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Zhixiao Sun and Huahua Wu are joint first authors.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, Z., Wu, H., Shi, Y. et al. Multi-agent air combat with two-stage graph-attention communication. Neural Comput & Applic 35, 19765–19781 (2023). https://doi.org/10.1007/s00521-023-08784-7