Deep reinforcement learning-based air combat maneuver decision-making: literature review, implementation tutorial and future direction

Published in: Artificial Intelligence Review

Abstract

Nowadays, various innovative air combat paradigms that rely on unmanned aerial vehicles (UAVs), e.g., UAV swarms and UAV-manned aircraft cooperation, have received great attention worldwide. During operation, UAVs are expected to perform agile and safe maneuvers according to dynamic mission requirements and a complicated battlefield environment. Deep reinforcement learning (DRL), which is well suited to sequential decision-making, provides a powerful tool for air combat maneuver decision-making (ACMD), and hundreds of related research papers have been published in the last five years. However, as an emerging topic, the field still lacks a systematic review and tutorial. For this reason, this paper first provides a comprehensive literature review to help readers grasp a whole picture of the field. It starts from DRL itself and then extends to its application in ACMD, with special attention given to the design of the reward function, which is the core of DRL-based ACMD. Then, a maneuver decision-making method for one-to-one dogfight scenarios is proposed to enable a UAV to win short-range air combat. The model establishment, program design, training methods and performance evaluation are described in detail, and the associated Python code is available at gitee.com/wangyyhhh, enabling researchers to quickly build their own ACMD applications with slight modifications. Finally, limitations of the considered model, as well as possible future research directions for intelligent air combat, are discussed.
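Since the abstract identifies reward design as the core of DRL-based ACMD, the minimal Python sketch below illustrates one common way to shape a dogfight reward from relative geometry. It is an illustrative assumption, not the formulation used in the paper (the authors' actual implementation is in the repository at gitee.com/wangyyhhh); all function names, weights, and thresholds here are hypothetical.

```python
import numpy as np

# Hypothetical shaped reward for a one-to-one dogfight.
# All weights and thresholds below are illustrative assumptions,
# NOT the reward formulation used in the paper.

def dogfight_reward(own_pos, own_vel, enemy_pos, enemy_vel,
                    d_min=200.0, d_max=3000.0,
                    w_angle=1.0, w_dist=0.5):
    """Combine an angular-advantage term with a range-keeping term."""
    los = enemy_pos - own_pos              # line-of-sight vector (own -> enemy)
    dist = np.linalg.norm(los)

    # Antenna train angle (ATA): angle between own velocity and the LOS.
    ata = np.arccos(np.clip(np.dot(own_vel, los) /
                            (np.linalg.norm(own_vel) * dist + 1e-8), -1.0, 1.0))
    # Aspect angle (AA): angle between enemy velocity and the LOS.
    aa = np.arccos(np.clip(np.dot(enemy_vel, los) /
                           (np.linalg.norm(enemy_vel) * dist + 1e-8), -1.0, 1.0))

    # Angular advantage in [-1, 1]: +1 when locked on the enemy's tail,
    # -1 when the geometry is fully reversed.
    r_angle = 1.0 - (ata + aa) / np.pi

    # Range term: encourage staying inside an effective engagement envelope.
    if dist < d_min:
        r_dist = dist / d_min - 1.0                      # too close is dangerous
    elif dist > d_max:
        r_dist = np.exp(-(dist - d_max) / d_max) - 1.0   # too far loses the fight
    else:
        r_dist = 0.0

    return w_angle * r_angle + w_dist * r_dist
```

A dense shaping term like this is typically combined with sparse terminal rewards (e.g., a large bonus for a kill, a penalty for being shot down or leaving the arena); the relative weighting of the dense and sparse terms materially affects training stability.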





Acknowledgements

The authors are thankful for the financial support of the National Key Research and Development Plan (2021YFB3302501); the National Natural Science Foundation of China (12102077, 12161076, U2241263); and the Fundamental Research Funds for the Central Universities (DUT22RC(3)010, DUT22LAB305, DUT22QN223, DUT22ZD211).

Author information


Contributions

XW: Conceptualization, Funding acquisition, Writing – original draft, Writing – review & editing, Supervision; YW: Writing – original draft, Visualization, Validation; XS: Project administration, Supervision; LW: Funding acquisition, Writing – review & editing; CL: Project administration; HP: Funding acquisition, Supervision; JL: Writing – review & editing.

Corresponding authors

Correspondence to Xinwei Wang or Xichao Su.

Ethics declarations

Competing interests

The authors declare no competing interests.

Code availability

The code is available at gitee.com/wangyyhhh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, X., Wang, Y., Su, X. et al. Deep reinforcement learning-based air combat maneuver decision-making: literature review, implementation tutorial and future direction. Artif Intell Rev 57, 1 (2024). https://doi.org/10.1007/s10462-023-10620-2

