1 Introduction

The rapid development of hypersonic vehicles has dramatically stimulated related research [1, 2]. In the past decades, advances in hypersonic weapons have driven a revolution in the capabilities and strategies of pursuers. Consequently, hypersonic vehicles will be confronted with rigorous pursuit threats, and it is significant to carry out research on their evasion strategies.

As an emerging research area, studies on the evasion strategies of hypersonic vehicles are scarce. The few existing studies [3,4,5] are all aimed at known waypoints and no-fly zones, utilizing trajectory planning to achieve evasion, while research on pursuit-evasion (PE) games of hypersonic vehicles is almost blank. However, the evasion guidance laws in PE games of other offensive weapons (such as ballistic missiles and cruise missiles) have been extensively studied, mainly based on optimal control theory and the differential game method.

Optimal control theory has been widely applied in the research of PE games [6,7,8,9,10,11,12]. In Ref. [6], the optimal evasion strategy against proportionally guided missiles is proposed under the assumption of two-dimensional linearized kinematics. The analytical expression of the specific maneuvering moment is provided, demonstrating that the maneuvering switching times are related to the pursuer's proportional guidance coefficient. The three-dimensional optimal evasion strategy for a linear kinematics model is investigated in Ref. [7]. Reference [8] puts forward the optimal evasion strategy with a path-angle constraint against two pursuers. The optimal acceleration command is a bang-bang non-singular one governed by a switching function, and the number of switching points and their locations depend on the dynamics of the evader and the pursuers, as well as the constraints of the problem. Considering the physical constraints of angular velocity and angle of attack, Ref. [9] studies an optimal three-dimensional avoidance trajectory. Similarly, in [10,11,12], optimal control theory is used to construct the guidance laws.

There are also some evasion strategies derived by the differential game method [13,14,15,16,17,18,19,20]. The linear quadratic PE game with terminal velocity constraints is discussed via the differential game method in Ref. [13]. A large family of feedback solutions is obtained from a simple-looking performance index that contains a weighted combination of the pursuer's control effort, the evader's control effort, the miss distance, and the terminal lateral velocity. In Ref. [14], the problem of capturing a maneuvering target is formulated as a non-cooperative zero-sum differential game. The state-dependent Riccati equation technique is used to obtain the guidance law directly from the nonlinear state equations, without linearizing them or making restrictive assumptions, and implementing the proposed guidance law does not require a time-to-go estimate. Reference [15] poses the problem of intercepting a maneuvering target at a prespecified impact angle in a nonlinear zero-sum differential game framework. A feedback-form solution is proposed by extending the state-dependent Riccati equation method to nonlinear zero-sum differential games. In addition, Refs. [16,17,18] studied multiplayer PE games from other perspectives based on differential games.

In the aforementioned studies [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], the evasion tasks are all achieved by maximizing the miss distance or the line-of-sight angle rate. However, for the purpose of energy optimization and the subsequent combat missions after evasion, maximizing the miss distance may cause undesired energy consumption and deviation from the original course, and thus may not be the best choice. In recent years, by introducing the concept of evasion with specified miss distance (SMD), some evasion strategies for PE games have attempted evasion with minimum energy consumption [19,20,21]. In Refs. [19,20,21], to obtain the analytical expression of the evasion guidance command, the miss distance constraint is regarded as an equality constraint. As a result, the miss distance can be controlled to a precise value, but at the cost of excessive requirements on the evader's maneuverability. Therefore, the existing evasion strategies with SMD have a limited scope of application.

On the other hand, in the above-mentioned studies [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21], all the evasion strategies are developed on the basis of perfect information about the pursuers. However, the dynamic information of the pursuers is unknown to the evader in real scenarios. This problem can be solved by introducing machine learning. In recent years, computational intelligence has been applied in theoretical research on hypersonic vehicles [22,23,24] and multiplayer PE games [25,26,27,28]. However, Refs. [22,23,24] all concern the intelligent control of hypersonic vehicles, while Refs. [25,26,27,28] study the cooperative pursuit strategy of multiple pursuers; none of these studies deals with the evasion guidance of hypersonic vehicles. Regarding the dynamic uncertainty estimation problem mentioned above, Refs. [29,30,31] employ machine learning to estimate model dynamics. However, Refs. [29,30,31] do not address evasion-oriented information estimation and are difficult to apply to the PE games of hypersonic vehicles.

Based on the studies mentioned above, it is obvious that, on the premise that the dynamics of the pursuer are unknown and the evader's maneuverability is not dominant, the existing strategies cannot achieve evasion while considering the acceleration constraints and energy optimization simultaneously. It has to be pointed out that Refs. [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18] all try to achieve evasion by maximizing the miss distance or the line-of-sight angle rate, in which case it is difficult to balance acceleration constraints and energy optimization. References [19,20,21] derive evasion strategies with SMD, but these impose excessive requirements on the evader's maneuverability, which is unsuitable for the air-breathing hypersonic vehicle (AHV) because of its acceleration constraint. In addition, Refs. [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21] assume that the pursuers' information is fully known and do not take into account the uncertainty of the pursuer dynamics. Finally, in the computational intelligence studies [22,23,24,25,26,27,28,29,30,31], the estimation of the uncertainty of pursuer dynamics is not addressed.

In view of the above discussion, and considering the flight characteristics of the AHVs studied in this paper, the main problems addressed in this work on evasion guidance are as follows:

  1. (i) The unknown pursuer dynamics will affect the result of the PE game.

  2. (ii) The acceleration constraint must be guaranteed during the whole process of evasion.

  3. (iii) In order to better perform the subsequent combat missions after evasion, it is necessary to perform energy optimization during the evasion process.

Compared with the existing studies (which assume that the enemy's dynamics are known), in this paper the gradient descent method is applied to estimate the unknown dynamic parameters instead of directly assuming them to be known. At the same time, the flight capabilities of both aircraft are taken into account during the design of the guidance law.

2 Problem formulation and preliminaries

2.1 Pursuit-Evasion game model

In this paper, it is assumed that both the AHV and the pursuer are within the detection range of each other, and the evasion-pursuit relationship and the definition of relative angles in horizontal plane are shown in Fig. 1.

Fig. 1 Planar engagement geometry

The AHV and the pursuer are regarded as mass points, and the subscripts E & P indicate the states of the AHV and the pursuer, respectively. Both flight speeds VE & VP are assumed to be constant, and the flight path varies with the lateral accelerations \(a_{E}\, \& \, a_{P}\), which are normal to the velocity direction. \(a_{{E,c}} \, \& \,a_{{P,c}}\) are the guidance commands and \(\theta _{E} \, \& \,\theta _{P}\) are the trajectory deflection angles, respectively. r denotes the relative distance between the AHV and the pursuer.

The combat scenario is set such that the AHV is under a head-on pursuit situation (HOPS). Therefore, it is assumed that the AHV and the pursuer are both on the X axis at the initial time \(t_{0}\), with almost opposite moving directions.

It is assumed [19, 20, 32, 33] that the dynamic equations of both sides can be linearized near the initial line of sight, so the following model can be established:

$$\begin{aligned} {\mathbf{\dot{x}}}_{i} &= A_{i} {\mathbf{x}}_{i} + b_{i} a_{{i,c}} ,{\mathbf{x}}_{i} \in R^{{n_{i} }} ,i = E{\text{,}}P \hfill \\ a_{i}^{ \bot } &= c_{i}^{T} {\mathbf{x}}_{i} + d_{i} a_{{i,c}} \hfill \\ \end{aligned}$$
(1)

where \(a_{i}^{ \bot } = a_{i} \cos \theta _{i} ,i = E,P\). The coefficients \(A_{i} \, \& \,b_{i}\) in Eq. (1) can be regarded as constants. According to Fig. 1, the state variable can be chosen as \({\mathbf{x}} = \left[ {\begin{array}{*{20}c} y & {\dot{y}} & {{\mathbf{x}}_{E} } & {{\mathbf{x}}_{P} } \\ \end{array} } \right]^{{\text{T}}}\), where \(y = y_{E} - y_{P}\) is the position deviation normal to the initial line of sight, and \({\mathbf{x}}_{E}\) and \({\mathbf{x}}_{P}\) denote the state variable vectors of the AHV and the pursuer, respectively. The relative motion equations can be expressed as:

$${\mathbf{\dot{x}}} = A{\mathbf{x}} + B_{E} a_{{E,c}} + B_{P} a_{{P,c}}$$
(2)

where

$$\begin{gathered} A(t) = \left[ {\begin{array}{*{20}l} 0 & 1 & {0_{{1 \times n_{E} }} } & {0_{{1 \times n_{P} }} } \\ 0 & 0 & {c_{E}^{T} } & { - c_{P}^{T} } \\ {0_{{n_{E} \times 1}} } & {0_{{n_{E} \times 1}} } & {A_{E} } & {0_{{n_{E} \times n_{P} }} } \\ {0_{{n_{P} \times 1}} } & {0_{{n_{P} \times 1}} } & {0_{{n_{P} \times n_{E} }} } & {A_{P} } \\ \end{array} } \right] \hfill \\ B_{E} = \left[ {\begin{array}{*{20}c} 0 \\ {d_{E} } \\ {b_{E} } \\ {0_{{n_{P} \times 1}} } \\ \end{array} } \right],{\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} B_{P} = \left[ {\begin{array}{*{20}c} 0 \\ { - d_{P} } \\ {0_{{n_{E} \times 1}} } \\ {b_{P} } \\ \end{array} } \right] \hfill \\ \end{gathered}$$
(3)
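As an illustration of Eqs. (2) and (3), the block matrices can be assembled numerically once the individual dynamics are fixed. The sketch below assumes first-order acceleration lags for both vehicles (\(n_E = n_P = 1\), \(c_i = 1\), \(d_i = 0\)); the time constants are illustrative placeholders, not values from this paper.

```python
import numpy as np

# A sketch of assembling A, B_E, B_P from Eq. (3), assuming first-order
# acceleration lags for both vehicles (n_E = n_P = 1, c_i = 1, d_i = 0);
# the time constants tau_E, tau_P are illustrative placeholders.
def build_matrices(tau_E=0.2, tau_P=0.1):
    A_E, b_E, c_E, d_E = -1.0 / tau_E, 1.0 / tau_E, 1.0, 0.0
    A_P, b_P, c_P, d_P = -1.0 / tau_P, 1.0 / tau_P, 1.0, 0.0
    # State x = [y, y_dot, x_E, x_P] as in Eq. (2).
    A = np.array([[0.0, 1.0, 0.0,  0.0],
                  [0.0, 0.0, c_E, -c_P],
                  [0.0, 0.0, A_E,  0.0],
                  [0.0, 0.0, 0.0,  A_P]])
    B_E_vec = np.array([0.0, d_E, b_E, 0.0])
    B_P_vec = np.array([0.0, -d_P, 0.0, b_P])
    return A, B_E_vec, B_P_vec

A, B_E, B_P = build_matrices()
```

With these placeholder lags, the second row of A reproduces \(\ddot{y} = a_E^{\bot} - a_P^{\bot}\), matching the structure of Eq. (3).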

The initial state \({\mathbf{x}}\) at \(t_{0}\) can be given by

$${\mathbf{x}}(t_{0} ) = \left[ {\begin{array}{*{20}c} {y_{0} } & {\dot{y}_{0} } & {{\mathbf{x}}_{{E_{0} }} } & {{\mathbf{x}}_{{P_{0} }} } \\ \end{array} } \right]^{T}$$
(4)

To facilitate the analysis of the problem, the time to go is defined as

$$t_{{go}} = t_{f} - t_{0} = \frac{r}{{V_{r} }}$$
(5)

where \(t_{f}\) is the terminal engagement time point and \(V_{r}\) is the closing speed between the two sides.

In order to simplify expressions, Eq. (2) can be further written as

$${\mathbf{\dot{x}}} = A{\mathbf{x}} + B_{u} u(t) + B_{v} v(t)$$
(6)

where \(u\, \& \,v\) are the guidance commands \(a_{{E,c}} \, \& \,a_{{P,c}}\) of both sides, respectively. The position deviation is expressed by scalar output:

$$y = {\mathbf{c}}^{{\mathbf{T}}} {\mathbf{x}}$$
(7)

where \({\mathbf{c}}^{{\mathbf{T}}} = [\begin{array}{*{20}c} 1 & 0 & {0_{{1 \times n_{E} }} } & {0_{{1 \times n_{P} }} } \\ \end{array} ]\). The absolute value \(\left| {y(t_{f} )} \right|\) is regarded as the miss distance.

In this paper, it is assumed that the guidance law of the pursuer is known, which is given as

$$v = F(t){\mathbf{x}} + G(t)$$
(8)

On the basis of Eq. (1), in order to simplify the derivation process, the dynamic equation of the pursuer can be expressed as

$$\dot{a}_{P} = A_{P} a_{P} + b_{P} a_{{Pc}}$$
(9)

where the coefficients \(A_{P} \, \& \,b_{P}\) in Eq. (9) are unknown to the evader.

Remark 1

The scramjet is used as the power system of the AHV. If the evasion maneuver occurs in the longitudinal plane, the sharp changes of attitude angle and angular velocity will affect the working state of the scramjet [34, 35]. A lateral maneuver can instead be carried out at a fixed altitude and fixed speed, avoiding the impact on the attitude angle and angular velocity caused by changes in speed and altitude during evasion. In some existing literature on the evasion of hypersonic vehicles, the modelling and simulations are likewise conducted in two dimensions [32, 33].

Remark 2

In the PE game model, in order to simplify the derivation process, the dynamic equations of the aircraft can be expressed by first-order equations, and the dynamic parameters in the equations are constants. This form of expression can be seen in many studies [19, 20, 32, 33, 36, 37] on PE games.

Remark 3

Since the values of \(A_{P} \, \& \,b_{P}\) in Eq. (9) depend on the acceleration response speed of the pursuer, in a real scenario this dynamic information is unknown to the evader.

2.2 Design goals

\(M\) is chosen as the lower bound of the miss distance for successful evasion, so the condition for successful evasion can be written as:

$$\left| {y(t_{f} )} \right| \ge M$$
(10)

The acceleration command for the AHV is limited and can be given by

$$\left| {u(t)} \right| \le u_{{\max }}$$
(11)

The energy consumption of the AHV during the evasion process is given by

$$\int\limits_{{t_{0} }}^{{t_{f} }} {u^{2} } {\text{d}}t$$
(12)

In summary, the main problem in this paper can be summarized as Problem 1.

Problem 1

Consider the PE game model given by Eq. (6) with the unknown pursuer dynamics given by Eq. (9). The evasion strategy should be derived to minimize the energy consumption given by Eq. (12), subject to the miss distance condition in Eq. (10) and the control constraint in Eq. (11).

3 Main results

3.1 The structure of evasion strategy

The design of the evasion strategy in this article is divided into two steps. In detail, we first collect the flight data of the pursuer and utilize the gradient descent method to train an estimation model of the unknown pursuer dynamics. Then the energy-optimized evasion guidance algorithm, which accounts for the system constraints, is derived on the basis of the estimated parameters.

The framework of the evasion strategy design in this paper is shown as Fig. 2.

Fig. 2 Framework of the evasion strategy design

3.2 The estimation of unknown pursuer dynamics

According to Eq. (9), the pursuer’s dynamic coefficients \(A_{P} \, \& \,b_{P}\) need to be estimated.

The key to estimating the unknown coefficients in the equation is determining the influence of two or more related inputs on the corresponding output.

In this paper, for the two inputs \(a_{P} \, \& \,a_{{Pc}}\) in Eq. (9), their influence on the output is determined through multiple iterations over a large amount of data. That is, machine learning is used to obtain the coefficients \(A_{P} \, \& \,b_{P}\) in Eq. (9).

As one of the basic algorithms of machine learning, the gradient descent method has the following advantages: low standard deviation, a small number of iterations required for convergence, and a learning efficiency that does not decrease [38, 39]. The basic idea of the gradient descent algorithm is to find the minimum value of the loss function by moving in the direction of its negative gradient. Therefore, the gradient descent algorithm is employed to learn the pursuer dynamics in this work.

In this paper, to estimate the unknown pursuer dynamics in Eq. (9), set \(h_{\theta } (x_{0} ,x_{1} ) = \dot{a}_{P} = A_{P} a_{P} + b_{P} a_{{Pc}}\); then we have

$$h_{\theta } (x_{0} ,x_{1} ) = \theta _{0} x_{0} + \theta _{1} x_{1}$$
(13)

where \(x_{0} \, \& \,x_{1}\) correspond to \(a_{P} \, \& \,a_{{Pc}}\), respectively, \(\theta _{0} \, \& \,\theta _{1}\) correspond to the unknown parameters \(A_{P} \, \& \,b_{P}\), respectively.

Equation (13) is a multiple regression equation, in which the response variable \(h_{\theta } (x)\) is expressed in terms of the independent variables \(x_{i} (i = 0,1)\), and \(\theta _{i} (i = 0,1)\) are constant coefficients. The coefficients \(\theta _{0} \, \& \,\theta _{1}\) are calculated so as to minimize the differences between the observed and predicted response values.

For the multiple regression equation Eq. (13), once the number of samples \(m\) is determined, the samples are \((x_{0}^{{(1)}} ,x_{1}^{{(1)}} ,y_{1} )\), \((x_{0}^{{(2)}} ,x_{1}^{{(2)}} ,y_{2} )\), …, \((x_{0}^{{(m)}} ,x_{1}^{{(m)}} ,y_{m} )\). Then the loss function can be expressed as

$$J(\theta _{0} ,\theta _{1} ) = \frac{1}{{2m}}\sum\limits_{{j = 1}}^{m} {(h_{\theta } (x_{0}^{{(j)}} ,x_{1}^{{(j)}} ) - y_{j} )^{2} }$$
(14)

Calculate the partial derivative of the current loss function

$$\frac{{\partial J(\theta _{0} ,\theta _{1} )}}{{\partial \theta _{i} }} = \frac{1}{m}\sum\limits_{{j = 1}}^{m} {(h_{\theta } (x_{0}^{{(j)}} ,x_{1}^{{(j)}} ) - y_{j} )} x_{i}^{{(j)}}$$
(15)

Define \(\alpha\) as the update step size, representing the learning rate. When the learning rate is determined, the expression of \(\theta _{i}\) is updated to

$$\theta _{i} = \theta _{i} - \alpha \frac{1}{m}\sum\limits_{{j = 1}}^{m} {(h_{\theta } (x_{0}^{{(j)}} ,x_{1}^{{(j)}} ) - y_{j} )} x_{i}^{{(j)}}$$
(16)

Based on Eqs. (13)–(16), the steps to estimate \(A_{P} \, \& \,b_{P}\) can be expressed as follows.

  1. Establish a pursuit-evasion game simulation system, conduct pursuit-evasion tests under unknown dynamic parameters, and collect sample data from the historical game process.

  2. Take Eq. (13) as the multiple regression equation and, according to the calculation method in Eqs. (14)–(16), compute the partial derivative \(\frac{{\partial J(\theta _{0} ,\theta _{1} )}}{{\partial \theta _{i} }}\) of the loss function and the update expression of \(\theta _{i}\).

  3. The minimum value of the loss function \(J(\theta _{0} ,\theta _{1} )\) and the corresponding coefficients \(\theta _{0} \, \& \,\theta _{1}\) can be obtained through iterative calculation (with data from step 1). \(\theta _{0} \, \& \,\theta _{1}\) are then equivalent to \(A_{P} \, \& \,b_{P}\) in Eq. (9).
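The estimation steps above can be sketched as follows. The sample data here are synthetic, generated from an assumed "true" first-order pursuer model in place of the recorded game data of step 1; all numerical values are illustrative.

```python
import numpy as np

# A minimal sketch of the gradient descent estimation, Eqs. (13)-(16).
# Synthetic sample data stand in for the recorded pursuit-evasion data;
# the "true" parameters below are assumptions for illustration only.
rng = np.random.default_rng(0)
A_true, b_true = -0.5, 0.5
x0 = rng.uniform(-40.0, 40.0, 500)      # samples of a_P
x1 = rng.uniform(-40.0, 40.0, 500)      # samples of a_Pc
y = A_true * x0 + b_true * x1           # observed a_P_dot (noise-free here)

theta = np.array([7.0, 2.0])            # initial guess, as in Sect. 4.1
alpha = 0.001                           # learning rate
m = len(y)

for _ in range(10000):
    residual = theta[0] * x0 + theta[1] * x1 - y          # h_theta - y
    grad = np.array([residual @ x0, residual @ x1]) / m   # Eq. (15)
    theta -= alpha * grad                                 # Eq. (16)

print(theta)   # converges toward (A_true, b_true)
```

With noise-free synthetic data the iterates converge essentially exactly; with real flight data the residual of the loss function bounds the estimation accuracy, as discussed in Sect. 4.1.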

3.3 The evasion guidance command

The fact that the AHV's maximum acceleration is inferior to the pursuer's poses a great challenge to the evasion of the AHV. As a result, effective evasion algorithms are required to achieve successful evasion while giving consideration to the acceleration constraints and energy optimization simultaneously.

For the purpose of energy optimization, the evasion command \(u_{1}\) is designed by introducing the SMD. To respect the acceleration constraints, the evasion command \(u_{2}\) based on NDG is designed. When the pursuer's capability is strong enough to cause the AHV's evasion command \(u_{1}\) to saturate, the evasion command is switched from \(u_{1}\) to \(u_{2}\). The composition structure of the evasion command is shown in Fig. 3.

Fig. 3 Composition structure of evasion command \(u_{c}\)

To make the proposed framework clearer, the following explanations are provided. To achieve evasion with minimum energy consumption, the guidance command \(u_{1}\) is given priority. However, when the pursuer's capability is strong enough that the AHV cannot achieve successful evasion with \(u_{1}\), the evasion command \(u_{2}\) is employed.

Accordingly, the composite evasion command can be given by

$$\left\{ \begin{aligned} u_{c} (t) = u_{1} (t){\kern 1pt} \quad {\kern 1pt} \left| {u_{1} (t)} \right| < u_{{\max }} (t \in \left[ {t_{0} ,t_{f} } \right]) \hfill \\ u_{c} (t) = \left\{ \begin{aligned} u_{1} (t),t \in \left[ {t_{0} ,t_{1} } \right] \hfill \\ u_{2} (t),t \in \left( {t_{1} ,t_{f} } \right] \hfill \\ \end{aligned} \right.{\kern 1pt} ,{\kern 1pt} \quad {\kern 1pt} \left| {u_{1} (t)} \right| \ge u_{{\max }} \hfill \\ \end{aligned} \right.$$
(17)

where \(t_{1}\) is the first moment at which \(u_{1}\) reaches saturation (\(\left| {u_{1} (t_{1} )} \right| \ge u_{{\max }}\)).

This section is divided into two parts. First, the derivation of the SMD-based evasion command \(u_{1}\) is presented; second, the derivation of the NDG-based evasion command \(u_{2}\) is presented. With Eq. (17) and the expressions of \(u_{1}\) and \(u_{2}\), the composite guidance command \(u_{c}\) is generated.

3.3.1 Derivation of SMD–based evasion command in evasion strategy

In this subsection, the process of deriving \(u_{1}\) is given.

Substituting Eq. (8) into Eq. (6) yields

$${\mathbf{\dot{x}}} = \left( {A + B_{v} F} \right){\mathbf{x}} + B_{u} u(t) + B_{v} G{\text{(t)}}$$
(18)

To facilitate the derivation process, the zero-effort miss distance \(Z_{1} (t)\) is introduced here [40]. According to Eq. (18), the expression of \(Z_{1} (t)\) and its derivative are given by

$$Z_{1} (t) = {\mathbf{c}}^{{\mathbf{T}}} \left[ {\phi _{1} (t_{f} ,t){\mathbf{x}}(t) + \int_{t}^{{t_{f} }} {\phi _{1} (t_{f} ,\tau )B_{v} G{\text{(}}\tau {\text{)d}}\tau } } \right]$$
(19)
$$\dot{Z}_{1} (t) = b(t)u_{1} (t)$$
(20)
$$b(t) = {\mathbf{c}}^{{\mathbf{T}}} \phi _{1} (t_{f} ,t)B_{u} (t)$$
(21)

with the boundary conditions

$$\left\{ \begin{aligned}& Z_{1} (t_{0} ) = {\mathbf{c}}^{{\mathbf{T}}} \left[ {\phi _{1} (t_{f} ,t_{0} ){\mathbf{x}}(t_{0} ) + \int_{{t_{0} }}^{{t_{f} }} {\phi _{1} (t_{f} ,\tau )B_{v} G{\text{(}}\tau {\text{)d}}\tau } } \right] \hfill \\& Z_{1} (t_{f} ) = y(t_{f} ) \hfill \\ \end{aligned} \right.$$
(22)

where \(\phi _{1} ( \cdot , \cdot )\) is the state transition matrix, corresponding to the state matrix \(\left( {A + B_{v} F} \right)\).

After this transformation, the miss distance can be expressed as \(\left| {Z_{1} (t_{f} )} \right|\), and the condition for successful evasion of the AHV is given by

$$\left| {Z_{1} (t_{f} )} \right| \ge M$$
(23)

On this basis, Problem 1 in Sect. 2.2 can be converted into Problem 2.

Problem 2

Based on the PE game model given by Eq. (20), the evasion strategy should be derived to minimize the energy consumption given by Eq. (12), subject to the miss distance condition in Eq. (23) and the control constraint in Eq. (11).

To solve Problem 2, in consideration of the bounded acceleration of the AHV, the performance index is given by

$$J = \frac{1}{2}\left[ {Q\left[ {Z_{1} (t_{f} ) - M} \right]^{2} + \int\limits_{{t_{0} }}^{{t_{f} }} {u_{1}^{2} (t){\text{d}}t} } \right]$$
(24)

subject to

$$\left| {u_{1} (t)} \right| < u_{{\max }}$$

where \(Q\left( {Q > 0} \right)\) is a weighting coefficient to be designed. By adjusting \(Q\), the miss distance can be made larger than the boundary value \(M\) while \(u_{1}\) satisfies the acceleration constraint.

As mentioned in Sect. 2.1, the AHV and the pursuer are both on the \(X\) axis at the initial time \(t_{0}\), with almost opposite moving directions. As a result, \(Z_{1} (t_{0} ) = 0\). On this basis, considering the evasion condition Eq. (23), the condition \(Z_{1} (t_{f} ) > M\) is chosen as the criterion of successful evasion.

According to Eq. (24), the analytic expression of \(u_{1}\) can be derived and expressed as

$$u_{1}^{*} (t) = - Q\left( {Z_{1} (t_{f} ) - M} \right)b(t)$$
(25)

Substituting Eq. (25) into Eq. (20) and integrating, we obtain

$$u_{1}^{*} (t) = \frac{{ - Qb(t)}}{{1 + Q\int_{t}^{{t_{f} }} {b^{2} (t){\text{d}}t} }}\left[ {{\text{Z}}_{1} (t) - M} \right]$$
(26)

To calculate \(Z_{1} (t)\) and \(b(t)\), the vector \({\mathbf{Y}}\) is introduced to simplify the computation.

$${\mathbf{Y}}(t) = \phi _{1}^{T} (t_{f} ,t_{f} - t){\mathbf{c}}$$
(27)

It is easy to prove that \({\mathbf{Y}}\) satisfies

$${\mathbf{\dot{Y}}} = (A + B_{v} F)^{T} (t_{f} - t){\mathbf{Y}}, {\mathbf{Y}}(0) = {\mathbf{c}}$$
(28)

Equation (28) is the adjoint equation of the model in Eq. (18). According to Eqs. (27) and (28), Eqs. (19) and (21) can be expressed as

$$Z_{1} (t) = {\mathbf{Y}}^{T} (t_{f} - t){\mathbf{x}}(t) + \int\limits_{t}^{{t_{f} }} {{\mathbf{Y}}(t_{f} - \tau )B_{v} {\text{(}}\tau {\text{)}}G{\text{(}}\tau {\text{)d}}\tau }$$
(29)
$$b(t) = {\mathbf{Y}}^{T} (t_{f} - t)B_{u} (t)$$
(30)

In summary, the energy-optimized evasion command \(u_{1}\) is given by

$$u_{1}^{*} (t) = \frac{{ - Qb(t)}}{{1 + Q\int_{t}^{{t_{f} }} {b^{2} (t){\text{d}}t} }}\left[ {{\text{Z}}_{1} (t) - M} \right]$$
(31)

where

$$\left\{ \begin{aligned} & Z_{1} (t) = {\mathbf{Y}}^{T} (t_{f} - t){\mathbf{x}}(t) + \int\limits_{t}^{{t_{f} }} {{\mathbf{Y}}(t_{f} - \tau )B_{v} {\text{(}}\tau {\text{)}}G{\text{(}}\tau {\text{)d}}\tau } \hfill \\ & b(t) = {\mathbf{Y}}^{T} (t_{f} - t)B_{u} (t) \hfill \\ \end{aligned} \right.$$
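A numerical sketch of evaluating \(u_1^*\) is given below. The closed-loop matrix, \(B_u\), and \(\mathbf{c}\) are placeholder values for a toy three-state model (not the full model of this paper); the adjoint propagation follows Eq. (28) with explicit Euler integration.

```python
import numpy as np

# A minimal sketch of Eqs. (28)-(31): propagate the adjoint vector Y in
# sigma = tf - t, form b = Y^T B_u (Eq. (30)), accumulate the denominator
# integral of Eq. (31), and evaluate u1*. A_cl (= A + B_v F), B_u and c
# are placeholders for a toy state x = [y, y_dot, a_P]; values assumed.
A_P_hat = -0.5077                            # estimated pursuer lag (Sect. 4.1)
A_cl = np.array([[0.0, 1.0,  0.0],
                 [0.0, 0.0, -1.0],           # toy signs: y_ddot = u - a_P
                 [0.0, 0.0,  A_P_hat]])
B_u = np.array([0.0, 1.0, 0.0])
c = np.array([1.0, 0.0, 0.0])

tf, dt = 10.0, 0.001
n = int(tf / dt)

# Eq. (28): dY/d(sigma) = A_cl^T Y, Y(0) = c, via explicit Euler.
Y = np.empty((n, 3))
Y[0] = c
for k in range(1, n):
    Y[k] = Y[k - 1] + dt * (A_cl.T @ Y[k - 1])

b = Y @ B_u                                  # b indexed by sigma = tf - t
B2 = np.cumsum(b ** 2) * dt                  # integral of b^2 over [t, tf]

def u1_star(t, Z1, Q=0.1, M=100.0):
    """Eq. (31): energy-optimized command at time t given Z1(t)."""
    k = max(0, int(round((tf - t) / dt)) - 1)
    return -Q * b[k] / (1.0 + Q * B2[k]) * (Z1 - M)

cmd = u1_star(t=0.0, Z1=0.0)                 # command at engagement start
```

For this toy model \(b(t)\) reduces to the time-to-go \(t_f - t\), so the command is largest early in the engagement and vanishes at \(t_f\), consistent with the structure of Eq. (31).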

3.3.2 Derivation of NDG-based evasion command in evasion strategy

In this subsection, the process of deriving \(u_{2}\) is given, based on the NDG method [41].

During the derivation, according to the principle of the NDG, the expression of \(v(t)\) cannot be assumed in advance. Therefore, differently from \(Z_{1} (t)\), the zero-effort miss distance \(Z_{2} (t)\) is given by

$$Z_{2} (t) = {\mathbf{c}}^{{\mathbf{T}}} \phi _{2} (t_{f} ,t){\mathbf{x}}(t)$$
(32)
$$\dot{Z}_{2} (t) = b_{u} (t)u_{2} (t) + b_{v} (t)v(t)$$
(33)
$$\left\{ \begin{aligned} b_{u} (t) = {\mathbf{c}}^{{\mathbf{T}}} \phi _{2} (t_{f} ,t)B_{u} (t) \hfill \\ b_{v} (t) = {\mathbf{c}}^{{\mathbf{T}}} \phi _{2} (t_{f} ,t)B_{v} (t) \hfill \\ \end{aligned} \right.$$
(34)

with the boundary conditions

$$\left\{ \begin{aligned} & Z_{2} (t_{0} ) = {\mathbf{c}}^{{\mathbf{T}}} \phi _{2} (t_{f} ,t_{0} ){\mathbf{x}}_{0} \hfill \\ &Z_{2} (t_{f} ) = y(t_{f} ) \hfill \\ \end{aligned} \right.$$
(35)

where \(\phi _{2} ( \cdot , \cdot )\) is the state transition matrix, corresponding to the state matrix \(A\).

After this transformation, the condition for successful evasion of the AHV is given by

$$\left| {Z_{2} (t_{f} )} \right| \ge M$$
(36)

On this basis, Problem 1 in Sect. 2.2 can be converted into Problem 3.

Problem 3

Based on the PE game model given by Eq. (33), the evasion strategy should be derived to minimize the energy consumption given by Eq. (12), subject to the miss distance condition in Eq. (36) and the control constraint in Eq. (11).

According to Fig. 3, when the NDG-based command \(u_{2}\) is employed, the AHV tries to evade with the maximum miss distance, and the performance index is given by

$$J = \left| {Z_{2} (t_{f} )} \right|$$
(37)

subject to

$$\left| {u_{2} (t)} \right| \le u_{{\max }}$$

The Hamiltonian of the problem is

$$H = \lambda _{2} (t)\dot{Z}_{2} (t)$$
(38)

The adjoint equations are

$$\left\{ {\begin{array}{*{20}c} {\dot{\lambda }_{2} (t) = - \frac{{\partial H}}{{\partial Z_{2} (t)}} = 0} \\ {\lambda _{2} (t_{f} ) = \frac{{\partial J}}{{\partial Z_{2} }}\left| {_{{t_{f} }} } \right. = \text{sgn} \left[ {Z_{2} (t_{f} )} \right]} \\ \end{array} } \right. \Rightarrow \lambda _{2} (t) = \text{sgn} \left[ {Z_{2} (t_{f} )} \right]$$
(39)

On the basis of the extremum principle, we have

$$\left\{ {\begin{array}{*{20}c} {u_{2}^{*} (t) = \arg {\kern 1pt} {\kern 1pt} \mathop {\max }\limits_{{u_{2} }} H = u_{{\max }} \text{sgn} \left[ {Z_{2} (t_{f} )} \right]} \\ {v^{*} (t) = \arg {\kern 1pt} {\kern 1pt} \mathop {\min }\limits_{v} H = v_{{\max }} \text{sgn} \left[ {Z_{2} (t_{f} )} \right]} \\ \end{array} } \right.$$
(40)

Substituting Eq. (40) into Eq. (33) yields

$$\left\{ \begin{gathered} \dot{Z}_{2} (t) = L\text{sgn} \left[ {Z_{2} (t_{f} )} \right] \hfill \\ L = {\mathbf{c}}^{{\mathbf{T}}} \phi _{2} (t_{f} ,t)B_{u} (t)u_{{\max }} {\text{ + }}{\mathbf{c}}^{{\mathbf{T}}} \phi _{2} (t_{f} ,t)B_{v} (t)v_{{\max }} \hfill \\ \end{gathered} \right.$$
(41)

In the NDG, both sides should adopt their optimal strategies. On this basis, since the maximum acceleration of the AHV is smaller than the pursuer's, we have

$$L < 0$$
(42)

Integrating both sides of Eq. (41), we obtain

$$\begin{aligned} {\text{Z}}_{2} (t) & = {\text{Z}}_{2} (t_{f} ) - \int\limits_{t}^{{t_{f} }} {\dot{Z}_{2} (t){\text{d}}t} \hfill \\ & = {\text{Z}}_{2} (t_{f} ) - \int\limits_{t}^{{t_{f} }} {Lsgn[Z_{2} (t_{f} )]{\text{d}}t} \hfill \\ & = {\text{Z}}_{2} (t_{f} ) - {\text{sgn}}[Z_{2} (t_{f} )]\int\limits_{t}^{{t_{f} }} {L{\text{d}}t} \hfill \\ \end{aligned}$$
(43)

Since \(\int_{t}^{{t_{f} }} {L{\text{d}}t} < 0\), we have

$${\text{sgn}}[Z_{2} (t)] = {\text{sgn}}[Z_{2} (t_{f} )]$$
(44)

Substituting Eq. (44) into Eq. (40), the NDG-based evasion command \(u_{2}\) is given by

$$u_{2}^{*} (t) = u_{{\max }} \text{sgn} \left[ {Z_{2} (t)} \right]$$
(45)

where \(Z_{2} (t)\) can be calculated by Eq. (32).

To sum up, by substituting Eqs. (31) and (45) into Eq. (17), the composite evasion command \(u_{c}\) is given by

$$\left\{ \begin{gathered} u_{c} (t) = u_{1} (t)\quad {\kern 1pt} \left| {u_{1} (t)} \right| < u_{{\max }} (t \in \left[ {t_{0} ,t_{f} } \right]) \hfill \\ u_{c} (t) = \left\{ \begin{gathered} u_{1} (t),t \in \left[ {t_{0} ,t_{1} } \right] \hfill \\ u_{2} (t),t \in \left( {t_{1} ,t_{f} } \right] \hfill \\ \end{gathered} \right.{\kern 1pt} ,{\kern 1pt} {\kern 1pt} {\kern 1pt} {\kern 1pt} \left| {u_{1} (t)} \right| \ge u_{{\max }} \hfill \\ \end{gathered} \right.$$
(46)

where

$$\left\{ \begin{gathered} u_{1}^{{}} (t) = \frac{{ - Qb(t)}}{{1 + Q\int_{t}^{{t_{f} }} {b^{2} (t)dt} }}\left[ {{\text{Z}}_{1} (t) - M} \right] \hfill \\ u_{2}^{{}} (t) = u_{{\max }} \text{sgn} \left[ {Z_{2} (t)} \right] \hfill \\ \end{gathered} \right.$$
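The switching logic of Eq. (46) can be sketched schematically as follows; `u1_of_t`, `Z2_of_t`, and `u_max` are hypothetical stand-ins for the quantities computed by the surrounding guidance loop.

```python
import numpy as np

# A schematic sketch of the switching logic in Eq. (46): use the
# energy-optimized command u1 until it first saturates, then switch
# permanently to the NDG bang-bang command u2 = u_max * sgn(Z2(t)).
def make_uc(u1_of_t, Z2_of_t, u_max):
    state = {"switched": False}

    def uc(t):
        if not state["switched"]:
            u1 = u1_of_t(t)
            if abs(u1) < u_max:
                return u1                      # first branch of Eq. (46)
            state["switched"] = True           # t1: first saturation
        return u_max * np.sign(Z2_of_t(t))     # u2 of Eq. (45)

    return uc

# Illustrative stand-ins: u1 grows linearly and saturates at t = 2.
uc = make_uc(u1_of_t=lambda t: 5.0 * t,
             Z2_of_t=lambda t: 1.0,            # positive zero-effort miss
             u_max=10.0)
outputs = [uc(t) for t in (0.5, 1.0, 2.5, 3.0)]
# outputs -> [2.5, 5.0, 10.0, 10.0]
```

Note that the switch is one-way: once \(u_1\) saturates at \(t_1\), the command stays on the bang-bang branch for the remainder of the engagement, as specified by the second case of Eq. (46).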

Remark 4

Compared with the existing studies [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21], the major innovations of the proposed method are highlighted as follows.

  1. (i) In much previous work [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18], either the acceleration constraint is not taken into account, or bang-bang control is directly adopted and energy optimization is ignored. In contrast, the proposed evasion strategy introduces the boundary value \(M\) (the lower bound of the miss distance for successful evasion) and combines the SMD concept with NDG theory, so as to solve the energy optimization problem under the premise of acceleration constraints.

  2. (ii) In some previous work [19,20,21], the evader tries to evade with SMD (\(Z(t_{f} ) = M\), where \(Z(t_{f} )\) is the zero-effort miss distance). But the equality constraint \(Z(t_{f} ) = M\) places excessive requirements on the evader's maneuverability. In this paper, the condition of successful evasion is treated as the inequality \(Z(t_{f} ) > M\), which does not place excessive requirements on the evader's maneuverability. Therefore, the evasion command in this work is more practical and has wider applicability.

  3. (iii) In all the above-mentioned previous work [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21], the evasion strategies are developed on the basis of perfect information about the pursuers. In this paper, the dynamic information of the pursuer is estimated by introducing the gradient descent method. With this estimation, a more realistic dynamic model of the pursuer is obtained, and the evasion strategy is designed on the basis of the estimated model. As far as the authors know, the evasion strategy in this work is the first that can achieve successful evasion against a pursuer with unknown dynamics.

4 Results and discussion

4.1 Simulation of pursuer dynamics estimation

In the process of pursuer dynamics estimation, the relevant simulation parameters are set as follows:

  1. Without prior information, the initial values of the coefficients \(\theta _{i}\) are randomly selected as \(\theta _{0} = 7\), \(\theta _{1} = 2\).

  2. The learning rate \(\alpha\) needs to be carefully selected. If \(\alpha\) is too small, the loss function \(J(\theta _{0} ,\theta _{1} )\) decreases slowly and consumes considerable resources; if \(\alpha\) is too large, \(J(\theta _{0} ,\theta _{1} )\) may not converge. In this paper, \(\alpha = 0.001\).

The estimation results of the pursuer dynamics are shown in Figs. 4, 5 and 6, and the analysis is as follows:

Fig. 4

Calculation of loss function \(J(\theta _{0} ,\theta _{1} )\). a 10,000 iterations, b first 10 iterations

Fig. 5

Estimation of coefficient \(\theta _{0}\). a 10,000 iterations, b first 10 iterations

Fig. 6

Estimation of coefficient \(\theta _{1}\). a 10,000 iterations, b first 10 iterations

  1. It can be seen from Fig. 4 that the loss function converges to the order of 0.005, so the parameter estimates can be considered highly accurate.

  2. According to Figs. 5 and 6, after the 6380th and 6144th iterations, the parameters \(\theta _{0}\) and \(\theta _{1}\) converge to \(-0.5077\) and \(0.5073\), respectively.

  3. As can be seen in Figs. 4, 5 and 6, during the first 10 iterations the curves of all parameters oscillate severely. The reason is that the initial values of \(\theta _{0}\) and \(\theta _{1}\) are randomly selected. However, the oscillation does not affect the accuracy of the final result: after several iterations, it disappears completely.

  4. In summary, \(\hat{A}_{P} = \theta _{0} = - 0.5077\) and \(\hat{b}_{P} = \theta _{1} = 0.5073\). Substituting \(\hat{A}_{P}\) and \(\hat{b}_{P}\) into Eq. (9), the estimated pursuer dynamic model can be expressed as \(\dot{a}_{P} = \hat{A}_{P} a_{P} + \hat{b}_{P} a_{{Pc}} \approx - 0.5077a_{P} + 0.5073a_{{Pc}}\).
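Under the estimated coefficients, the pursuer behaves as a first-order lag. The sketch below (an illustrative Euler integration, not the paper's solver) shows the unit-step response of the estimated model settling near its steady-state gain \(\hat{b}_{P} /( - \hat{A}_{P} ) \approx 0.999\):

```python
# Estimated pursuer model from the fit above (Eq. (9) with the
# estimated coefficients): a_dot_P = A_P * a_P + b_P * a_Pc.
A_P, b_P = -0.5077, 0.5073

def step_response(a_Pc=1.0, dt=0.001, T=10.0):
    """Forward-Euler response of the estimated lag to a constant command."""
    a_P = 0.0
    for _ in range(int(T / dt)):
        a_P += (A_P * a_P + b_P * a_Pc) * dt
    return a_P

# The time constant is roughly 1 / 0.5077 ≈ 2 s, so after 10 s the
# achieved acceleration has settled near b_P / (-A_P) ≈ 0.999 of the command.
```

This lag is what the evasion command must anticipate: the pursuer cannot realize a commanded acceleration instantaneously.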

4.2 Simulation of evasion guidance of the AHV

In the process of evasion guidance, the relevant assumptions and simulation parameters are set as follows:

  1. The augmented proportional navigation (APN) guidance law is selected as the guidance law of the pursuer.

  2. The basic parameters of both sides and the simulation parameters of the engagement scenario are shown in Tables 1 and 2, respectively.

Table 1 Basic parameters of the AHV and the pursuer
Table 2 Simulation parameters

In Table 1, \(a\) is the speed of sound. In the existing literature, to meet the hit-to-kill condition for the pursuit, the maneuverability of pursuers should be more than three times the AHV’s acceleration [42]; to guarantee the robustness of the evasion strategies in this paper, the value of \(\left| {a_{{P\max }} /g} \right|\) is therefore set as 8.0. In addition, to meet the conditions of HOP, the initial trajectory deflection angle \(\theta _{{E_{0} }}\) is set within \(\left[ {0,10} \right]\)°.

In Table 2, \(M\) is set as 1 m, the maximum distance satisfying the hit-to-kill condition. \(r_{0}\) is set within \(\left[ {60,100} \right]\) km because pursuers can detect and track the evader within 100 km.

In this section, the composite evasion command \(u_{c}\) (Eq. (46)) is simulated, based on the estimation result in Sect. 4.1. The key performance index \(\left| {y(t_{f} )} \right|\) (miss distance) is reported below to evaluate the success of the evasion. For comparison, the simulation results of \(u_{1}\) (Eq. (31)) and \(u_{2}\) (Eq. (45)) are also given.

Here, \(u_{c}\) denotes the composite evasion command, \(u_{1}\) the SMD-based evasion command, and \(u_{2}\) the NDG-based evasion command.
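For intuition, the sketch below simulates a hypothetical linearized planar engagement of the kind evaluated here: the pursuer flies APN through the estimated first-order lag model of Sect. 4.1, while the evader applies a constant-maneuver placeholder. The placeholder evader command and all numerical choices (\(N = 4\), closing speed, Euler step) are assumptions for illustration; the paper's commands \(u_{1}\), \(u_{2}\), \(u_{c}\) (Eqs. (31), (45), (46)) are not reproduced:

```python
import numpy as np

G = 9.81

def engagement(a_E_cmd=3 * G, r0=80e3, Vc=3000.0, dt=0.001,
               A_P=-0.5077, b_P=0.5073, N=4.0, a_P_max=8 * G):
    """Linearized planar engagement; returns the miss distance |y(t_f)|.

    The evader command a_E_cmd is a constant-maneuver placeholder,
    NOT the paper's u_1 / u_2 / u_c.
    """
    t_go = r0 / Vc                       # time to go at constant closing speed
    y, y_dot, a_P = 0.0, 0.0, 0.0        # relative separation states
    while t_go > dt:
        # Linearized LOS rate: lam_dot = ZEM / (Vc * t_go^2)
        lam_dot = (y + y_dot * t_go) / (Vc * t_go ** 2)
        # APN command with evader-acceleration augmentation, saturated
        a_Pc = np.clip(N * Vc * lam_dot + 0.5 * N * a_E_cmd,
                       -a_P_max, a_P_max)
        a_P += (A_P * a_P + b_P * a_Pc) * dt   # estimated pursuer lag
        y_dot += (a_E_cmd - a_P) * dt          # relative kinematics normal to LOS
        y += y_dot * dt
        t_go -= dt
    return abs(y)
```

The returned \(\left| {y(t_{f} )} \right|\) plays the role of the miss-distance index above; substituting a smarter evader command for the placeholder is exactly what the compared strategies do.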

The simulation results are given in Figs. 7, 8, 9, 10, 11 and 12 and Table 3. The analysis is as follows:

Fig. 7

Flight trajectory with command \(u_{1}\)

Fig. 8

Flight trajectory with command \(u_{2}\). a Flight trajectory. b Partial enlarged view of (a)

Fig. 9

Flight trajectory with command \(u_{c}\)

Fig. 10

Different evasion guidance commands of the AHV

Fig. 11

Trajectory deflection angle of the AHV. a Trajectory deflection angle, b Partial enlarged view of (a)

Fig. 12

Miss distance of different scenarios. a Different initial trajectory deflection angle \(\theta _{{E_{0} }}\), b Different initial distance \(r_{0}\)

Table 3 Miss distance, energy consumption and acceleration constraint
  1. According to Eq. (10), the criterion for successful evasion is \(\left| {y(t_{f} )} \right| > M = 1\;{\text{m}}\). The flight trajectories with the different evasion commands are given in Figs. 7, 8 and 9, from which we can see that the AHV achieves successful evasion with \(u_{2}\) or \(u_{c}\), but not with \(u_{1}\) (even though \(u_{1}\) violates the acceleration constraint, as shown in Fig. 10). Therefore, it can be concluded that \(u_{2}\) and \(u_{c}\) have stronger evasion capabilities than \(u_{1}\).

  2. It can be seen from Fig. 11 that the evasion commands differ considerably in their effect on the AHV’s trajectory deflection angle at the end of the evasion. Compared with \(u_{1}\) and \(u_{c}\), the evasion command \(u_{2}\) produces a large trajectory deflection angle. Considering the AHV’s high flight speed, a large trajectory deflection angle causes the AHV to deviate from its original route to a great extent. Moreover, combining this with the energy consumption in Table 3, it can be seen that \(u_{2}\) enables the AHV to evade successfully only at great cost, which is not conducive to the combat mission after evasion.

  3. As can be seen in Fig. 12, as the initial trajectory deflection angle \(\theta _{{E_{0} }}\) and the initial distance \(r_{0}\) change, the AHV still achieves successful evasion with \(u_{c}\). Therefore, the evasion command \(u_{c}\) enables the AHV to complete the evasion under different initial conditions.

In summary, compared with \(u_{1}\) and \(u_{2}\), the proposed command \(u_{c}\) is the only guidance command that enables the AHV to achieve successful evasion while considering the acceleration constraint and energy optimization simultaneously.

5 Conclusions

In this paper, considering the unknown dynamics of the pursuer, a novel evasion strategy based on estimating the pursuer dynamics is proposed for the AHV. The proposed evasion algorithm combines gradient-descent-based estimation of the unknown pursuer dynamics with a composite evasion command based on SMD and NDG. Through the gradient descent method, the dynamic model of the pursuer is obtained; on the basis of the estimated model, an evasion strategy that satisfies the acceleration constraint and energy optimization at the same time is designed. The simulation results show that, with the proposed algorithm, the AHV achieves successful evasion against a pursuer with unknown dynamics while accounting for energy optimization and the acceleration constraint simultaneously. A future direction for this work is strategies in multiplayer PE games; specifically, the cooperative strategy of multiple teammates and pursuit-evasion strategies against multiple enemies can be studied with the aid of computational intelligence.