Evasion guidance for air-breathing hypersonic vehicles against unknown pursuer dynamics

The rapid development of hypersonic vehicles has greatly stimulated related research, and evasion has become one of the most challenging problems for these vehicles. Unlike existing work, which assumes that the pursuer's information is fully known, this paper studies evasion guidance for air-breathing hypersonic vehicles (AHVs) against unknown pursuer dynamics. Gradient descent is employed to estimate the parameters of the pursuer's unknown dynamics. An energy-optimized evasion guidance algorithm is then developed that takes both the acceleration constraint and energy optimization into account. With the proposed algorithm, the system handles the unknown pursuer dynamics effectively and provides more practical guidance for the evasion process. Simulation results show that the proposed method enables the AHV to evade successfully.


Introduction
The rapid development of hypersonic vehicles has greatly stimulated related research [1,2]. Over the past decades, advances in hypersonic weapons have driven a revolution in the capabilities and strategies of pursuers. Hypersonic vehicles will therefore face severe pursuit threats, which makes research on their evasion strategies important.
As an emerging research area, evasion strategies for hypersonic vehicles have received little attention. The few existing studies [3–5] all address known waypoints and no-fly zones and achieve evasion through trajectory planning, while research on pursuit-evasion (PE) games involving hypersonic vehicles is almost nonexistent. By contrast, evasion guidance laws in PE games for other offensive weapons (such as ballistic missiles and cruise missiles) have been studied extensively, mainly on the basis of optimal control theory and the differential game method.
Optimal control theory has been widely applied to PE games [6–12]. In Ref. [6], the optimal evasion strategy against proportionally guided missiles is derived under the assumption of two-dimensional linearized kinematics. An analytical expression for the maneuver timing is provided, showing that the maneuver switching times depend on the pursuer's proportional navigation coefficient. The three-dimensional optimal evasion strategy for a linear kinematics model is investigated in Ref. [7]. Reference [8] proposes an optimal evasion strategy with a path-angle constraint against two pursuers. The optimal acceleration command is a non-singular bang-bang command governed by a switching function, and the number and locations of the switching points depend on the dynamics of the evader and the pursuers and on the constraints of the problem. Considering the physical constraints on angular velocity and angle of attack, Ref. [9] studies an optimal three-dimensional avoidance trajectory. Similarly, optimal control theory is used to construct guidance laws in [10–12]. Other evasion strategies have been derived with the differential game method [13–20]. Linear quadratic PE games with terminal velocity constraints are discussed with the differential game method in Ref. [13], where a large family of feedback solutions is obtained from a simple-looking performance index containing a weighted combination of the pursuer's control effort, the evader's control effort, the miss distance, and the terminal lateral velocity. In Ref. [14], the problem of capturing a maneuvering target is formulated as a non-cooperative zero-sum differential game. The state-dependent Riccati equation (SDRE) technique is used to obtain the guidance law directly from the nonlinear state equations, without linearizing them or making restrictive assumptions, and implementing the resulting guidance law does not require a time-to-go estimate. Reference [15] poses the problem of intercepting a maneuvering target at a prespecified impact angle in a nonlinear zero-sum differential game framework, and a feedback-form solution is obtained by extending the SDRE method to nonlinear zero-sum differential games. In addition, Refs. [16–18] study multiplayer PE games from other perspectives based on differential games.
Tian Yan (yantian0706@stu.xjtu.edu.cn); 1: Northwestern Polytechnical University, Xi'an 710072, China
In the aforementioned studies [3–18], evasion is achieved by maximizing the miss distance or the line-of-sight angular rate. However, with a view to energy optimization and the combat missions that follow evasion, maximizing the miss distance may not be the best choice, because it can cause unnecessary energy consumption and deviation from the original course. In recent years, by introducing the concept of evasion with a specified miss distance (SMD), several PE evasion strategies have attempted to evade with minimum energy consumption [19–21]. In Refs. [19–21], the miss distance constraint is treated as an equality constraint in order to obtain an analytical expression for the evasion guidance command. As a result, the miss distance can be controlled to a precise value, but at the cost of excessive demands on the evader's maneuverability. The existing SMD-based evasion strategies therefore have a limited scope of application.
In view of the above discussion and the flight characteristics of the AHVs studied in this paper, the main problems in evasion guidance are as follows: (i) the unknown pursuer dynamics affect the outcome of the PE game; (ii) the acceleration constraint must be satisfied throughout the evasion; (iii) energy must be optimized during evasion so that the subsequent combat missions can be performed better.
Unlike existing studies, which assume that the enemy's dynamics are known, this paper applies the gradient descent method to estimate the unknown dynamic parameters. At the same time, the flight capabilities of both vehicles are taken into account when designing the guidance law.

Pursuit-Evasion game model
In this paper, it is assumed that the AHV and the pursuer are within each other's detection range. The pursuit-evasion geometry and the definitions of the relative angles in the horizontal plane are shown in Fig. 1.
The AHV and the pursuer are regarded as point masses, and the subscripts E and P denote the states of the AHV and the pursuer, respectively. Both flight speeds $V_E$ and $V_P$ are assumed constant, and the flight paths vary with the lateral accelerations $a_E$ and $a_P$, which are normal to the velocity directions. $a_{E,c}$ and $a_{P,c}$ are the guidance commands, and $\theta_E$ and $\theta_P$ are the trajectory deflection angles, respectively. $r$ denotes the relative distance between the AHV and the pursuer.
The combat scenario is one in which the AHV is in a head-on pursuit situation (HOPS). It is therefore assumed that the AHV and the pursuer are both on the X axis at the initial time $t_0$, moving in almost opposite directions.
It is assumed [19,20,32,33] that the dynamic equations of both sides can be linearized about the initial line of sight and written in the first-order form of Eq. (1), whose coefficients can be regarded as constants. According to Fig. 1, the state vector is chosen as $x = [\,y,\ \dot y,\ x_E^T,\ x_P^T\,]^T$, where $y = y_E - y_P$ is the position deviation normal to the initial line of sight, and $x_E$ and $x_P$ denote the state vectors of the AHV and the pursuer, respectively. The relative motion equations are then given by Eq. (2), with initial state $x(t_0)$. To facilitate the analysis, the time to go is defined as $t_{go} = t_f - t$, where $t_f$ is the terminal engagement time, which is determined by the closing speed $V_r$ between the two sides.
To simplify the expressions, Eq. (2) can be written compactly as $\dot x(t) = A\,x(t) + B_u u(t) + B_v v(t)$ (6), where u and v are the guidance commands $a_{E,c}$ and $a_{P,c}$ of the two sides, respectively. The position deviation is expressed by the scalar output $y(t) = C\,x(t)$ (7), where $C = [1, 0, \ldots, 0]$ picks out y. The absolute value $|y(t_f)|$ is regarded as the miss distance.
In this paper, it is assumed that the guidance law of the pursuer is known and has the linear feedback form $v(t) = F\,x(t)$ (8). On the basis of Eq. (1), and to simplify the derivation, the dynamic equation of the pursuer is expressed as $\dot a_P = A_P a_P + b_P a_{Pc}$ (9), where the coefficients $A_P$ and $b_P$ are unknown to the evader.
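To make the structure of Eqs. (6)–(9) concrete, the following is a minimal hypothetical instance. It assumes that each vehicle's acceleration follows its command as a first-order lag with time constant $\tau_E$ or $\tau_P$, and that $x_E = a_E$ and $x_P = a_P$ are scalars. The matrices actually used in Eq. (2) may differ.

```latex
x = \begin{bmatrix} y & \dot y & a_E & a_P \end{bmatrix}^T, \qquad
A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & -1/\tau_E & 0 \\ 0 & 0 & 0 & -1/\tau_P \end{bmatrix}, \qquad
B_u = \begin{bmatrix} 0 \\ 0 \\ 1/\tau_E \\ 0 \end{bmatrix}, \qquad
B_v = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1/\tau_P \end{bmatrix}, \qquad
C = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}
```

In this instance, the last row of Eq. (6) reads $\dot a_P = -a_P/\tau_P + a_{Pc}/\tau_P$. This is Eq. (9) with $A_P = -1/\tau_P$ and $b_P = 1/\tau_P$, so estimating $A_P$ and $b_P$ amounts to identifying the pursuer's response time constant.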

Remark 1
The AHV is powered by a scramjet. If the evasion maneuver occurred in the longitudinal plane, the sharp changes in attitude angle and angular velocity would affect the operating state of the scramjet [34,35]. A lateral maneuver can be performed at a fixed altitude and a fixed speed, which avoids the effects on attitude angle and angular velocity that changes in speed and altitude during evasion would otherwise cause. In some existing studies on the evasion of hypersonic vehicles, the modelling and simulations are conducted in two dimensions [32,33].

Remark 2
In the PE game model, the dynamic equations of the vehicles can be expressed as first-order equations with constant parameters to simplify the derivation. This form of expression is used in many studies of PE games [19,20,32,33,36,37].

Remark 3
Since the values of $A_P$ and $b_P$ in Eq. (9) depend on the acceleration response speed of the pursuer, which the evader cannot observe directly in a real engagement, the pursuer's dynamic information is unknown to the evader.

Design goals
M is chosen as the lower bound of the miss distance for successful evasion, so the condition for successful evasion is $|y(t_f)| > M$ (10). The acceleration command of the AHV is bounded as $|u(t)| \le u_{max}$ (11). The energy consumption of the AHV during evasion is measured by the integral of the squared acceleration command over the engagement, as given by Eq. (12). In summary, the main problem of this paper can be stated as Problem 1.

Problem 1
Consider the PE game model given by Eq. (6) with the unknown pursuer dynamics given by Eq. (9). The evasion strategy should be derived to minimize the energy consumption given by Eq. (12), while the miss distance satisfies Eq. (10) and the control satisfies the constraint in Eq. (11).
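The three requirements of Problem 1 can be checked on any sampled command history. The sketch below is hypothetical: the names `assess_evasion` and `EvasionReport` are illustrative only. It assumes a uniform time grid and takes the energy measure of Eq. (12) to be the integral of $u^2$, approximated by the trapezoidal rule.

```python
from dataclasses import dataclass


@dataclass
class EvasionReport:
    miss_distance: float  # |y(t_f)|
    evaded: bool          # Eq. (10): |y(t_f)| > M
    within_limit: bool    # Eq. (11): |u(t)| <= u_max at every sample
    energy: float         # assumed measure for Eq. (12): integral of u^2 dt


def assess_evasion(u_samples, dt, y_tf, miss_bound, u_max):
    """Check a sampled AHV command history against the goals of Problem 1.

    u_samples: commands u(t_0), u(t_0 + dt), ..., u(t_f) on a uniform grid.
    The energy is the trapezoidal integral of u^2, an assumed form of Eq. (12).
    """
    if len(u_samples) < 2:
        raise ValueError("need at least two samples to integrate")
    energy = sum(0.5 * (u0 * u0 + u1 * u1) * dt
                 for u0, u1 in zip(u_samples, u_samples[1:]))
    return EvasionReport(
        miss_distance=abs(y_tf),
        evaded=abs(y_tf) > miss_bound,
        within_limit=all(abs(u) <= u_max for u in u_samples),
        energy=energy,
    )


report = assess_evasion([10.0] * 5, dt=0.5, y_tf=-1.5, miss_bound=1.0, u_max=20.0)
print(report)
```

The check keeps the three goals separate rather than folding them into one score. This makes it visible whether a command such as $u_1$ fails because of the miss distance, the acceleration limit, or both.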

The structure of evasion strategy
The evasion strategy in this article is designed in two steps. First, flight data of the pursuer are collected, and the gradient descent method is used to train an estimation model of the unknown pursuer dynamics. Then, on the basis of the estimated parameters, an energy-optimized evasion guidance algorithm that accounts for the system constraints is derived.
The framework of the evasion strategy design is shown in Fig. 2.

The estimation of unknown pursuer dynamics
According to Eq. (9), the pursuer's dynamic coefficients $A_P$ and $b_P$ must be estimated. The key to estimating these unknown coefficients is determining how two or more related inputs influence the corresponding output.
In this paper, the influence of the two inputs $a_P$ and $a_{Pc}$ in Eq. (9) on the output $\dot a_P$ is determined through many iterations over a large amount of data. That is, machine learning is used to obtain the coefficients $A_P$ and $b_P$.
As one of the basic algorithms of machine learning, the gradient descent method has the following advantages [38,39]: a low standard deviation, few iterations required for convergence, and no loss of learning efficiency. The basic idea of gradient descent is to move the parameters iteratively along the negative gradient of the loss function until a minimum of the loss is reached. The gradient descent algorithm is therefore employed to learn the pursuer dynamics in this work.
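In this paper, to estimate the unknown pursuer dynamics in Eq. (9), the coefficients are collected in the parameters $\theta_0$ (for $A_P$) and $\theta_1$ (for $b_P$). The model prediction of $\dot a_P$ is $\theta_0 a_P + \theta_1 a_{Pc}$, and the loss function $J(\theta_0, \theta_1)$ is defined in Eq. (13). The partial derivative of the current loss function with respect to each parameter, $\partial J / \partial \theta_i$, is then calculated. Define α as the update step size, i.e., the learning rate. Once the learning rate is fixed, each $\theta_i$ is updated as $\theta_i := \theta_i - \alpha\,\partial J(\theta_0, \theta_1)/\partial \theta_i$.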
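The estimation step can be sketched as follows. The sketch assumes that Eq. (13) is the usual mean-squared-error loss $J = \frac{1}{2m}\sum(\theta_0 a_P + \theta_1 a_{Pc} - \dot a_P)^2$. It also uses synthetic, noise-free data from a hypothetical first-order-lag pursuer with $\tau_P = 2$ s, i.e. $A_P = -0.5$ and $b_P = 0.5$. The helper names `simulate_pursuer` and `estimate_dynamics` and the command profile are illustrative. The initial values $\theta_0 = 7$ and $\theta_1 = 2$ and the learning rate α = 0.001 are those quoted in the simulation section. Accelerations are expressed in g, an assumption that keeps the gradient scale compatible with α = 0.001; with data in m/s², the same step size could diverge.

```python
import math


def simulate_pursuer(commands, hold=2.0, dt=0.05, tau=2.0):
    """Generate hypothetical training data from a first-order-lag pursuer.

    Assumed model: da_P/dt = (a_Pc - a_P) / tau, i.e. A_P = -1/tau, b_P = 1/tau.
    Each command (in g) is held for `hold` seconds. Returns a list of
    (a_P, a_Pc, da_P/dt) samples; the derivative is noise-free here.
    """
    decay = math.exp(-dt / tau)  # exact step response of the lag over dt
    steps = int(round(hold / dt))
    a_p, samples = 0.0, []
    for a_pc in commands:
        for _ in range(steps):
            samples.append((a_p, a_pc, (a_pc - a_p) / tau))
            a_p = a_pc + (a_p - a_pc) * decay
    return samples


def estimate_dynamics(samples, theta0=7.0, theta1=2.0, alpha=0.001, iterations=20000):
    """Batch gradient descent for theta0 ~ A_P and theta1 ~ b_P.

    Assumed loss: J = 1/(2m) * sum((theta0*a_P + theta1*a_Pc - da_P/dt)^2).
    """
    m = len(samples)
    # For this linear model the full-batch gradient depends only on these
    # sample moments, so they are computed once instead of every iteration.
    s_aa = sum(a * a for a, _, _ in samples) / m
    s_ac = sum(a * c for a, c, _ in samples) / m
    s_cc = sum(c * c for _, c, _ in samples) / m
    s_ay = sum(a * y for a, _, y in samples) / m
    s_cy = sum(c * y for _, c, y in samples) / m
    for _ in range(iterations):
        grad0 = theta0 * s_aa + theta1 * s_ac - s_ay  # dJ/dtheta0
        grad1 = theta0 * s_ac + theta1 * s_cc - s_cy  # dJ/dtheta1
        # Both parameters are updated from the same (old) gradient.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


commands = [6, -4, 8, -7, 2, -8, 5, 0, -3, 7]  # hypothetical a_Pc profile, in g
A_hat, b_hat = estimate_dynamics(simulate_pursuer(commands))
print(f"A_P ~ {A_hat:.4f}, b_P ~ {b_hat:.4f}")
```

With this synthetic data, the iteration should settle close to the assumed values −0.5 and 0.5. The commands are held for about one time constant so that $a_P$ and $a_{Pc}$ are not too strongly correlated. Strong correlation would make the loss ill-conditioned and slow down convergence at α = 0.001.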

The evasion guidance command
Because the maximum acceleration of the AHV is lower than that of the pursuer, evasion is a great challenge for the AHV. Effective evasion algorithms are therefore required that achieve successful evasion while accounting for the acceleration constraint and energy optimization simultaneously. For energy optimization, the evasion command $u_1$ is designed by introducing the SMD concept. To account for the acceleration constraint, the evasion command $u_2$ is designed on the basis of the norm differential game (NDG). When the pursuer is capable enough to saturate the AHV's evasion command $u_1$, the command is switched from $u_1$ to $u_2$. The structure of the composite evasion command is shown in Fig. 3.
To make the proposed framework clearer, note the following. For the goal of evasion with minimum energy consumption, the guidance command $u_1$ is given priority. However, when the pursuer is so capable that the AHV cannot evade successfully with $u_1$, the evasion command $u_2$ is employed.
Accordingly, the composite evasion command is given by
$u_c(t) = \begin{cases} u_1(t), & t_0 \le t < t_1 \\ u_2(t), & t_1 \le t \le t_f \end{cases}$ (17)
where $t_1$ is the moment at which $u_1$ first reaches saturation, i.e. $|u_1(t_1)| \ge u_{max}$. The rest of this section has two parts. The derivation of the SMD-based evasion command $u_1$ is presented first, followed by the derivation of the NDG-based evasion command $u_2$. Combining Eq. (17) with the expressions for $u_1$ and $u_2$ yields the composite guidance command $u_c$.
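The switching rule of Eq. (17) can be sketched as a small stateful wrapper. It assumes that $u_1$ and $u_2$ are available as functions of time, which are hypothetical stand-ins for Eqs. (31) and (45). The class name `CompositeEvasionCommand` is illustrative.

```python
from typing import Callable, Optional


class CompositeEvasionCommand:
    """Switching logic of Eq. (17): use u1 until it first saturates, then u2."""

    def __init__(self, u1: Callable[[float], float],
                 u2: Callable[[float], float], u_max: float):
        self.u1, self.u2, self.u_max = u1, u2, u_max
        self.t_switch: Optional[float] = None  # t_1, unknown until saturation

    def __call__(self, t: float) -> float:
        if self.t_switch is None:
            command = self.u1(t)
            if abs(command) < self.u_max:
                return command
            self.t_switch = t  # first time |u1(t)| >= u_max: latch the switch
        return self.u2(t)


# Hypothetical commands: u1 grows linearly, u2 is a constant full-limit command.
u_c = CompositeEvasionCommand(lambda t: 5.0 * t, lambda t: -20.0, u_max=20.0)
print([u_c(t) for t in (0.0, 2.0, 4.0, 6.0)], u_c.t_switch)
```

The switch is latched: once $t_1$ is recorded, the wrapper never returns to $u_1$. This reflects the one-way switch of Eq. (17) and assumes that the command is queried at increasing times, as in a guidance loop.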

Derivation of SMD-based evasion command in evasion strategy
In this subsection, the derivation of $u_1$ is given. Substituting Eq. (8) into Eq. (6) yields
$\dot x(t) = (A + B_v F)\,x(t) + B_u u(t)$ (18)
To facilitate the derivation, the zero-effort miss distance $Z_1(t)$ is introduced [40]. According to Eq. (18), $Z_1(t)$ and its derivative are
$Z_1(t) = C\,\phi_1(t_f, t)\,x(t)$ (19)
$\dot Z_1(t) = b(t)\,u(t)$ (20)
$b(t) = C\,\phi_1(t_f, t)\,B_u$ (21)
where $\phi_1(\cdot,\cdot)$ is the state transition matrix corresponding to the state matrix $A + B_v F$. Since $\phi_1(t_f, t_f) = I$, the miss distance can be expressed through $y(t_f) = Z_1(t_f)$ (22), and the condition for successful evasion of the AHV becomes $|Z_1(t_f)| > M$ (23). On this basis, Problem 1 in Sect. 2.2 can be converted into Problem 2.

Problem 2
Consider the PE game model given by Eq. (20). The evasion strategy should be derived to minimize the energy consumption given by Eq. (12), while the miss distance satisfies Eq. (23) and the control satisfies the constraint in Eq. (11).
To solve Problem 2 under the bounded acceleration of the AHV, the performance index is given in Eq. (24), subject to the dynamics of Eq. (20), where Q is a weighting matrix to be designed. By tuning Q, the miss distance is kept larger than the boundary value M while $u_1$ satisfies the acceleration constraint.
As mentioned in Sect. 2.1, the AHV and the pursuer are both on the X axis at the initial time $t_0$, moving in opposite directions, so that $y(t_0) = 0$. To simplify the computation of $Z_1(t)$ and $b(t)$, the vector
$Y(t_f - t) = \phi_1^T(t_f, t)\,C^T$ (27)
is introduced. It is easy to show that Y satisfies
$\dfrac{dY(\tau)}{d\tau} = (A + B_v F)^T\,Y(\tau), \qquad Y(0) = C^T, \qquad \tau = t_f - t$ (28)
Equation (28) is the adjoint equation of the model in Eq. (18). Using Eq. (27), Eqs. (19) and (21) can be expressed as
$Z_1(t) = Y^T(t_f - t)\,x(t)$ (29)
$b(t) = Y^T(t_f - t)\,B_u$ (30)
In summary, the energy-optimized evasion command $u_1$ is given by Eq. (31).
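The adjoint shortcut can be checked numerically. A single backward integration of Y gives the zero-effort miss for any state x, as well as $b(t)$, without forward-simulating the engagement. The sketch below assumes a constant state matrix, whereas in the paper the matrix is $A + B_v F$ and may be time-varying under APN. It also uses a hypothetical first-order-lag model with $x = [y, \dot y, a_E, a_P]$. The helper names are illustrative. Classical RK4 is used for both the adjoint and the forward reference.

```python
import math


def mat_vec(m, v):
    """Matrix-vector product for small dense lists."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def rk4_step(f, x, h):
    """One classical Runge-Kutta step for the autonomous system dx/ds = f(x)."""
    k1 = f(x)
    k2 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def adjoint_Y(M, C, t_go, steps=2000):
    """Integrate dY/dtau = M^T Y from Y(0) = C^T up to tau = t_go (constant M)."""
    Mt = transpose(M)
    Y, h = [float(c) for c in C], t_go / steps
    for _ in range(steps):
        Y = rk4_step(lambda v: mat_vec(Mt, v), Y, h)
    return Y


def zem_forward(M, C, x, t_go, steps=2000):
    """Reference value: propagate dx/dt = M x with zero command, return C x(t_f)."""
    h = t_go / steps
    for _ in range(steps):
        x = rk4_step(lambda v: mat_vec(M, v), x, h)
    return sum(c * xi for c, xi in zip(C, x))


# Hypothetical first-order-lag model, x = [y, y_dot, a_E, a_P].
tau_E, tau_P, t_go = 1.0, 0.5, 4.0
M = [[0, 1, 0, 0], [0, 0, 1, -1], [0, 0, -1 / tau_E, 0], [0, 0, 0, -1 / tau_P]]
B_u = [0, 0, 1 / tau_E, 0]
C = [1, 0, 0, 0]
x = [10.0, -3.0, 2.0, -5.0]

Y = adjoint_Y(M, C, t_go)
Z = sum(yi * xi for yi, xi in zip(Y, x))      # Z(t) = Y^T x
b = sum(yi * bi for yi, bi in zip(Y, B_u))    # b(t) = Y^T B_u, cf. Eq. (30)
# Closed-form check for this model: b = tau_E * psi(t_go / tau_E), psi(s) = e^-s + s - 1
b_closed = tau_E * (math.exp(-t_go / tau_E) + t_go / tau_E - 1.0)
print(Z, zem_forward(M, C, x, t_go), b, b_closed)
```

The practical appeal of the adjoint form is that Y depends only on $t_{go}$, not on the current state. A table of Y can therefore be precomputed once and reused at every guidance update.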

Derivation of NDG-based evasion command in evasion strategy
In this subsection, the derivation of $u_2$ is given on the basis of the NDG method [41]. According to the principle of the NDG, the expression of $v(t)$ cannot be assumed in advance during the derivation. Therefore, unlike $Z_1(t)$, the zero-effort miss distance $Z_2(t)$ and its derivative are given by
$Z_2(t) = C\,\phi_2(t_f, t)\,x(t)$ (32)
$\dot Z_2(t) = C\,\phi_2(t_f, t)\,[B_u u(t) + B_v v(t)]$ (33)
with the boundary condition $\phi_2(t_f, t_f) = I$, so that $Z_2(t_f) = y(t_f)$, where $\phi_2(\cdot,\cdot)$ is the state transition matrix corresponding to the state matrix A.
After this transformation, the condition for successful evasion of the AHV becomes $|Z_2(t_f)| > M$ (36). On this basis, Problem 1 in Sect. 2.2 can be converted into Problem 3.
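For a concrete feel of Eq. (32), the zero-effort miss has a well-known closed form under the hypothetical first-order-lag model: $x = [y, \dot y, a_E, a_P]$, $\ddot y = a_E - a_P$, and each acceleration lags its command with time constant τ. The paper's own model may differ. The function names below are illustrative. Only the state matrix A enters, so both commands are set to zero.

```python
import math


def psi(s):
    """psi(s) = exp(-s) + s - 1, the displacement factor of a first-order lag."""
    return math.exp(-s) + s - 1.0


def zem_first_order(y, y_dot, a_E, a_P, t_go, tau_E, tau_P):
    """Z_2(t) = C phi_2(t_f, t) x(t) for a hypothetical first-order-lag model.

    y = y_E - y_P; the current accelerations a_E and a_P decay with time
    constants tau_E and tau_P because both commands are zero (state matrix A only).
    """
    return (y + y_dot * t_go
            + tau_E ** 2 * psi(t_go / tau_E) * a_E
            - tau_P ** 2 * psi(t_go / tau_P) * a_P)


# Hypothetical state: 5 m offset, lateral rate -2 m/s, AHV pulling 3 g, t_go = 4 s.
print(zem_first_order(5.0, -2.0, 3.0 * 9.81, 0.0, t_go=4.0, tau_E=0.5, tau_P=0.3))
```

Two limits help sanity-check the formula. For a very slow lag (large τ), $\tau^2\psi(t_{go}/\tau) \to t_{go}^2/2$, which recovers the familiar $\tfrac12 a\,t_{go}^2$ term. For an ideal, instantaneous response (τ → 0), the acceleration term vanishes because an uncommanded acceleration disappears immediately.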

Problem 3
Consider the PE game model given by Eq. (33). The evasion strategy should be derived to minimize the energy consumption given by Eq. (12), while the miss distance satisfies Eq. (36) and the control satisfies the constraint in Eq. (11).
According to Fig. 3, when the NDG-based command $u_2$ is employed, the AHV tries to evade with the maximum miss distance. The corresponding performance index is given in Eq. (37), subject to Eq. (38). The Hamiltonian of the problem is given in Eq. (39), and the adjoint equation is
$\dot\lambda_2(t) = -\dfrac{\partial H}{\partial Z_2(t)} = 0$
so that $\lambda_2$ is constant. On the basis of the extremum principle,
$u_2^*(t) = \arg\max_{|u| \le u_{max}} H$ (40)
Substituting Eq. (40) into Eq. (33) yields Eq. (41). In the NDG, both sides are assumed to adopt their optimal strategies. Because the maximum acceleration of the AHV is lower than that of the pursuer, $L < 0$ (42). Integrating both sides of Eq. (41) gives Eq. (43). Substituting the resulting relation, Eq. (44), into Eq. (40) yields the NDG-based evasion command $u_2$ in Eq. (45), where $Z_2(t)$ can be calculated from Eq. (32). Finally, substituting Eqs. (31) and (45) into Eq. (17) gives the composite evasion command $u_c$ in Eq. (46), in which $u_1$ is given by Eq. (31) and $u_2$ by Eq. (45).

Remark 4
Compared with the existing studies [3–21], the major innovations of the proposed method are as follows.
(i) Much previous work [3–18] either ignores the acceleration constraint or adopts bang-bang control directly and ignores energy optimization. In contrast, the proposed evasion strategy introduces the boundary value M (the lower bound of the miss distance for successful evasion) and combines the SMD concept with NDG theory. This solves the energy optimization problem subject to the acceleration constraint.
(ii) In some previous work [19–21], the evader tries to evade with an SMD, i.e. $Z(t_f) = M$, where $Z(t_f)$ is the zero-effort miss distance at the final time. This equality constraint places excessive demands on the evader's maneuverability. In this paper, the condition for successful evasion is treated as the inequality $|Z(t_f)| > M$, which does not place such demands on the evader's maneuverability. The evasion command in this work is therefore more practical and more widely applicable.
(iii) In all the previous work cited above [3–21], the evasion strategies are developed on the assumption of perfect information about the pursuers. In this paper, the pursuer's dynamic information is estimated with the gradient descent method, which yields a more realistic dynamic model of the pursuer, and the evasion strategy is designed on the basis of this estimated model. To the best of the authors' knowledge, this is the first evasion strategy that achieves successful evasion against a pursuer with unknown dynamics.
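As a hedged illustration of the NDG-based step derived above, consider the generic structure of linear games with a miss-distance payoff and bounded controls. Because $\lambda_2$ is constant, maximizing H over $|u| \le u_{max}$ gives a bang-bang command $u = u_{max}\,\mathrm{sign}(\lambda_2 h_u)$, where $h_u = C\phi_2(t_f,t)B_u$ is the coefficient of u in Eq. (33). The sketch assumes $\mathrm{sign}(\lambda_2) = \mathrm{sign}(Z_2(t))$, the usual choice in the regular region of such games. It illustrates that structure only and is not a reproduction of Eq. (45). The function name is hypothetical.

```python
import math


def ndg_bang_bang(z2, h_u, u_max, tie_sign=1.0):
    """Generic bang-bang evasion command u = u_max * sign(h_u * Z_2).

    z2: current zero-effort miss Z_2(t); h_u: coefficient of u in dZ_2/dt,
    i.e. C phi_2(t_f, t) B_u (assumed known); u_max: AHV acceleration limit.
    When h_u * Z_2 is exactly zero the direction is arbitrary; tie_sign picks it.
    """
    s = h_u * z2
    if s == 0.0:
        return math.copysign(u_max, tie_sign)
    return math.copysign(u_max, s)


print(ndg_bang_bang(z2=2.0, h_u=0.8, u_max=30.0))
```

Such a command always sits at the acceleration limit. This is consistent with the large trajectory deflection and energy cost attributed to $u_2$ in the simulation analysis, and it is why the composite law reserves $u_2$ for the saturated phase.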

Simulation of pursuer dynamics estimation
In the pursuer dynamics estimation, the simulation parameters are set as follows:
1. Without prior information, the initial values of the coefficients $\theta_i$ are randomly selected as $\theta_0 = 7$ and $\theta_1 = 2$.
2. The learning rate α must be chosen carefully. If α is too small, the loss function $J(\theta_0, \theta_1)$ decreases slowly and the training consumes a lot of resources; if α is too large, $J(\theta_0, \theta_1)$ may not converge. In this paper, α = 0.001.
The estimation results of the pursuer dynamics are shown in Figs. 4, 5 and 6, and the analysis is as follows:
1. It can be seen from Fig. 4 … However, the oscillation does not affect the accuracy of the final result: after several iterations, the oscillation disappears completely.
4. In summary, $\hat A_P = \theta_0 = -0.5077$ and $\hat b_P = \theta_1 = 0.5073$. Substituting $\hat A_P$ and $\hat b_P$ into Eq. (9), the estimated pursuer dynamic model is
$\dot a_P = \hat A_P a_P + \hat b_P a_{Pc} \approx -0.5077\,a_P + 0.5073\,a_{Pc}$
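The identified model can be read as a first-order lag $\tau_P \dot a_P + a_P = k\,a_{Pc}$. This reading is an interpretation assumed here, not stated in the text, and time is assumed to be in seconds. Under it, the two estimates translate into a time constant and a steady-state gain:

```latex
\hat\tau_P = -\frac{1}{\hat A_P} = \frac{1}{0.5077} \approx 1.97\ \mathrm{s},
\qquad
\hat k = -\frac{\hat b_P}{\hat A_P} = \frac{0.5073}{0.5077} \approx 0.999
```

A gain this close to one would mean that the pursuer's achieved acceleration settles at its command after a lag of roughly two seconds.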

Simulation of evasion guidance of the AHV
In the evasion guidance simulation, the assumptions and simulation parameters are set as follows:
1. The augmented proportional navigation (APN) guidance law is selected as the pursuer's guidance law.
2. The basic parameters of both sides and the simulation parameters of the engagement scenario are listed in Tables 1 and 2, respectively.
In Table 1, a denotes the speed of sound. In the existing literature, to meet the hit-to-kill condition, the maneuverability of the pursuer should be more than three times the AHV's acceleration [42]. To ensure the robustness of the evasion strategies in this paper, $|a_{P,max}/g|$ is set to 8.0. In addition, to satisfy the HOPS condition, the initial trajectory deflection angle $\theta_{E0}$ is set within [0°, 10°]. In Table 2, M is set to 1 m, which is the maximum distance that meets the hit-to-kill condition. The initial distance $r_0$ is set within [60, 100] km because pursuers can detect and track the evader within 100 km.
In this section, the composite evasion command $u_c$ (Eq. (46)) is simulated, with $u_c$ computed from the estimation results in Sect. 4.1. The key index, the miss distance $y(t_f)$, is used to evaluate whether evasion succeeds. For comparison, simulation results for $u_1$ (Eq. (31)) and $u_2$ (Eq. (45)) are also given. Here, $u_c$ is the composite evasion command, $u_1$ the SMD-based evasion command, and $u_2$ the NDG-based evasion command.
The simulation results are given in Figs. 7, 8, 9, 10, 11 and 12 and Table 3. The analysis is as follows:
1. According to Eq. (10), the criterion for successful evasion is $|y(t_f)| > M = 1$ m. The flight trajectories under the different evasion commands are given in Figs. 7, 8 and 9. They show that the AHV evades successfully with $u_2$ or $u_c$ but not with $u_1$, even though $u_1$ violates the acceleration constraint (Fig. 10). It can therefore be concluded that $u_2$ and $u_c$ have stronger evasion capabilities than $u_1$.
2. Fig. 11 shows that the choice of evasion command strongly affects the AHV's trajectory deflection angle at the end of the evasion. Compared with $u_1$ and $u_c$, $u_2$ produces a large trajectory deflection angle. Given the AHV's high flight speed, a large trajectory deflection angle causes the AHV to deviate considerably from its original route. Moreover, the energy consumption in Table 3 shows that $u_2$ achieves successful evasion only at a high cost, which is detrimental to the combat mission after evasion.
3. Fig. 12 shows that the AHV still evades successfully with $u_c$ as the initial trajectory deflection angle $\theta_{E0}$ and the initial distance $r_0$ vary. The evasion command $u_c$ therefore enables the AHV to complete the evasion under different initial conditions.
In summary, compared with $u_1$ and $u_2$, the proposed command $u_c$ is the only guidance command that enables the AHV to evade successfully while accounting for the acceleration constraint and energy optimization simultaneously.
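For reference, a textbook form of the pursuer's APN law in the linearized setting commands $N$ times the zero-effort miss over $t_{go}^2$. This zero-effort miss includes the evader's current acceleration but ignores the pursuer's own lag. The sketch below is hypothetical: the navigation gain N = 3 and the function name are assumptions, and the paper's exact APN form is not given here. The 8 g saturation follows $|a_{P,max}/g| = 8.0$ from Table 1.

```python
G = 9.81  # gravitational acceleration, m/s^2


def apn_command(y, y_dot, a_E, t_go, N=3.0, a_max=8.0 * G):
    """Linearized augmented proportional navigation for the pursuer (sketch).

    a_Pc = N * (y + y_dot*t_go + 0.5*a_E*t_go^2) / t_go^2 with y = y_E - y_P,
    clipped to +/- a_max. N = 3 is an assumed navigation gain.
    """
    if t_go <= 0.0:
        raise ValueError("t_go must be positive")
    zem = y + y_dot * t_go + 0.5 * a_E * t_go ** 2
    a_cmd = N * zem / t_go ** 2
    return max(-a_max, min(a_max, a_cmd))


print(apn_command(20.0, -5.0, 3.0 * G, t_go=6.0))
```

The $\tfrac{N}{2}a_E$ part of this law is what makes it "augmented". It also explains why a constant evader maneuver is largely cancelled by such a pursuer, so a useful evasion command must account for the pursuer's response rather than simply pull a fixed acceleration.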

Conclusions
In this paper, considering the unknown dynamics of the pursuer, a novel evasion strategy based on the estimation of the pursuer dynamics is proposed for the AHV. The proposed evasion algorithm combines the estimation of the unknown pursuer dynamics based on gradient descent with an evasion guidance command that was designed to satisfy the acceleration constraint and energy optimization at the same time. The simulation results show that, with the proposed evasion algorithm, the AHV can evade a pursuer with unknown dynamics successfully while accounting for energy optimization and the acceleration constraint simultaneously. Future work could address strategies for multiplayer PE games. Specifically, cooperative strategies among multiple teammates and pursuit-evasion strategies against multiple enemies could be studied with the aid of computational intelligence.

Declarations
Conflicts of Interest The authors declare that there is no conflict of interest regarding the publication of this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.