
Multi-intent autonomous decision-making for air combat with deep reinforcement learning

  • Published in: Applied Intelligence

Abstract

Autonomous decision-making for unmanned aerial vehicles (UAVs) in aerial confrontations presents challenges in formulating optimal strategies, and deep reinforcement learning (DRL) has been adopted to address them. However, existing DRL decision-making models suffer from poor situational awareness and an inability to distinguish between different intentions. Therefore, a multi-intent autonomous decision-making method is proposed in this paper. First, three typical intentions are designed, comprising head-on attacking, pursuing, and fleeing, to derive decision models representing different intentions. A reinforcement-learning-based air combat game model is constructed over these intentions, including intention-specific reward functions that address the problem of sparse rewards. Then, we propose the Temporal Proximal Policy Optimization (T-PPO) algorithm, which extends Proximal Policy Optimization by integrating a long short-term memory network with a feedforward neural network; the algorithm extracts historical temporal information to enhance situational awareness. In addition, a basic-confrontation progressive training method is proposed to provide intention guidance and increase training diversity, which improves learning efficiency and intelligent decision-making capability. Finally, experiments in our constructed UAV confrontation environment demonstrate that the proposed intentional decision models perform well in stability and learning efficiency, achieving high rewards, high win rates, and low step counts. Specifically, our autonomous decision-making increases the win rate by 26% when head-on attacking and learning efficiency by 50% when pursuing, further demonstrating the potential and value of multi-intent autonomous decision-making applications.
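The intention-specific reward shaping mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's actual reward design: the intent labels, the distance normalization, and all coefficients below are assumptions chosen only to show how a sparse terminal reward can be combined with dense, per-intention shaping terms.

```python
import math

def shaped_reward(intent, distance, bearing, terminal=None):
    """Illustrative per-step reward for one UAV agent (all values assumed).

    intent   -- one of "attack", "pursue", "flee" (hypothetical labels)
    distance -- range to the opponent, in metres
    bearing  -- angle between own nose and the opponent, radians (0 = nose-on)
    terminal -- None while the episode runs, "win" or "lose" when it ends
    """
    # Sparse component: non-zero only at episode termination.
    if terminal == "win":
        return 1.0
    if terminal == "lose":
        return -1.0

    # Dense component: small per-step terms guiding each intention,
    # mitigating the sparse-reward problem the abstract describes.
    if intent == "attack":   # reward pointing the nose at the target
        return 0.01 * math.cos(bearing)
    if intent == "pursue":   # reward closing the distance
        return 0.01 * (1.0 - min(distance / 10_000.0, 1.0))
    if intent == "flee":     # reward opening the distance
        return 0.01 * min(distance / 10_000.0, 1.0)
    raise ValueError(f"unknown intent: {intent}")
```

For example, a pursuing agent at 2 km receives a small positive shaping reward each step, while the same state under the fleeing intention is rewarded in the opposite direction; the 0.01 scale keeps shaping terms small relative to the terminal win/lose signal.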


Data availability and access

Data sharing is not applicable to this article, as no datasets were generated or analysed during the current study.


Acknowledgements

This work is supported by a grant from Key Laboratory of Avionics System Integrated Technology, Fundamental Research Funds for the Central Universities in China, Grant No. 3072022JC0601, and the National Natural Science Foundation of China under Grant No. 52171332.

Author information

Contributions

Formal analysis and investigation: Zhengkun Ding, Junzheng Xu, Jiaqi Liu; Writing - original draft preparation: Luyu Jia; Writing - review and editing: Xingmei Wang, Chengtao Cai, Kejun Wu. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Chengtao Cai or Xingmei Wang.

Ethics declarations

Competing interests

The authors have no competing interests relevant to the content of this article to declare.

Ethical and informed consent for data used

Ethical and informed consent for data use are not applicable to this article, as no datasets were used during the current study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jia, L., Cai, C., Wang, X. et al. Multi-intent autonomous decision-making for air combat with deep reinforcement learning. Appl Intell 53, 29076–29093 (2023). https://doi.org/10.1007/s10489-023-05058-6

