1 Introduction

UAV swarms play an important role in disaster relief, city management, geological reconnaissance and other tasks that are difficult for humans [1]. Such swarms can operate in complex and dangerous environments at low cost and with fewer casualties [2]. The cooperative work of a UAV swarm depends on communication, which requires determining an appropriate routing path [3].

It is difficult for traditional routing protocols to meet the demands of UAV swarm networks because UAVs move at high speed and the relative motion within a UAV swarm is intense [4]. Therefore, the network topology is highly dynamic and cannot be used as prior information when determining the routing path. To address these problems, many studies have focused on routing methods based on machine learning (ML), since they require less prior information and are more flexible [5, 6]. The ML algorithms applied to routing include reinforcement learning (RL) [7,8,9], neural networks (NNs) [5, 10,11,12], swarm intelligence algorithms [13], and combinations of different ML algorithms [13,14,15,16,17]. Although ML-based routing methods are intelligent, they can cause routing oscillations because the underlying ML algorithms are not robust.

To find the routing path more intelligently and stably, some researchers have employed ML to evaluate routing information, such as the performance of neighbor nodes and routing paths [18,19,20]. In this case, the ML algorithms do not directly determine the next hop; instead, the predicted routing information serves as a reference for routing. These ML-based methods can estimate the optimal routing path for moving nodes or predict the performance of neighboring nodes through trial-and-error structures, imitating path searching and optimization problem solving [21, 22]. However, ML algorithms that depend on online learning are difficult to apply because they occupy a large share of the UAVs' computing resources [23, 24]. In addition, some ML-based routing information prediction methods ignore the features of UAV movement, even though these features can serve as prior information for more precise prediction and can easily be obtained from the navigation and control systems of the UAV [25].

To apply the predicted routing information to routing path determination in UAV swarm networks, a routing prediction strategy based on the combination of pigeon-inspired optimization (PIO) and an NN (PIONN) is proposed. The strategy consists of offline training and routing prediction. In offline training, the states of the UAV swarm and network are used as prior information for precise prediction, and the PIO method is applied to find the optimal weight matrices of the NN-based routing prediction model, which reduces the computational complexity and avoids differentiation in the learning process. In routing prediction, the matrix of the hop count index function is calculated according to the PIONN routing prediction (PIONNRP) model.

The remainder of this paper is organized as follows: in Sect. 2, the features of UAV movement are extracted as the prior information for the routing prediction model, and the PIONN framework is designed. In Sect. 3, the routing prediction strategy based on PIONN is presented. In Sect. 4, simulations are conducted to evaluate the performance of PIONNRP. Section 5 presents the conclusions of this paper.

2 System Model

2.1 Feature Extraction of Movement

The UAV swarm includes \(N_{{{\text{UAV}}}}\) nodes and \({\mathcal{A}} = \left\{ {a_{1} ,\;a_{2} , \cdots ,\;a_{{N_{{{\text{UAV}}}} }} } \right\}\) denotes the set of the swarm, with node \(a_{i} \;(i \in \left[ {1,\;N_{{{\text{UAV}}}} } \right],\;i \in {\mathbb{N}}^{ + } )\). Let node \(a_{j}^{i} \in {\mathcal{A}}_{{{\text{NEI}}}}^{i} \;\) be the neighbor of node \(a_{i}\), in which \({\mathcal{A}}_{{{\text{NEI}}}}^{i} \; = \left\{ {\left. {a_{j}^{i} } \right|{\text{hop}}(a_{i} ,\;a_{j} ) = 1} \right\}\) is the set of neighbors and \({\text{hop}}(a_{i} ,\;a_{j} ) \in {\mathbb{N}}^{ + }\) is the function of the minimum hop count from node \(a_{i}\) to node \(a_{j}\).

Determining a proper next hop optimizes the routing path. A next hop that reduces \({\text{hop}}(a_{i} ,\;a_{t} )\) is typically a neighbor of the originating node \(a_{i}\) that is flying toward the target node \(a_{t}\). Hence, the relative positions and relative velocities among the originating node, the neighboring node and the target node should be considered.

As shown in Fig. 1, to evaluate the performance of node \(a_{j}^{i}\) as the next hop, the forward distance \(r_{{{\text{for}}}} (i,\;j,\;t)\) and forward speed \(v_{{{\text{for}}}} (i,\;j,\;t)\) are defined as:

$$ \left\{ \begin{gathered} r_{{{\text{for}}}} (i,\;j,\;t) = \left\| {{\mathbf{r}}_{{{\text{for}}}} (i,\;j,\;t)} \right\| = \left\| {{\mathbf{r}}(i,\;j)\cos \left[ {\theta_{{\text{r}}} (i,\;j,\;t)} \right]} \right\| \hfill \\ v_{{{\text{for}}}} (i,\;j,\;t) = \left\| {{\mathbf{v}}_{{{\text{for}}}} (i,\;j,\;t)} \right\| = \left\| {{\mathbf{v}}(i,\;j)\cos \left[ {\theta_{{\text{v}}} (i,\;j,\;t)} \right]} \right\| \hfill \\ \end{gathered} \right., $$
(1)

where \({\mathbf{r}}_{{{\text{for}}}} (i,\;j,\;t) \in {\mathbb{R}}^{3}\) and \({\mathbf{v}}_{{{\text{for}}}} (i,\;j,\;t) \in {\mathbb{R}}^{3}\) are the forward position and forward velocity among the originating node \(a_{i}\), neighboring node \(a_{j}^{i}\) and target node \(a_{t} \in {\mathcal{A}}\) (\(t \ne i,\;j\)), respectively. \({\mathbf{r}}(i,\;j) \in {\mathbb{R}}^{3}\) is the relative position between node \(a_{i}\) and node \(a_{j}^{i}\), and \({\mathbf{v}}(i,\;j) \in {\mathbb{R}}^{3}\) is the velocity of node \(a_{j}^{i}\). \(\theta_{{\text{r}}} (i,\;j,\;t) \in {\mathbb{R}}\) is the angle between \({\mathbf{r}}(i,\;j)\) and \({\mathbf{r}}(i,\;t)\), and \(\theta_{{\text{v}}} (i,\;j,\;t) \in {\mathbb{R}}\) is the angle between \({\mathbf{v}}(i,\;j)\) and \({\mathbf{r}}(i,\;t)\).

Fig. 1 Motion feature of UAV swarm
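As a concrete illustration of Eq. (1), the following minimal NumPy sketch computes the forward distance and forward speed from the relative position vectors and the neighbor velocity; the function names and the toy input values are illustrative assumptions, not part of the original model.

```python
import numpy as np

def _cos_angle(a, b):
    """Cosine of the angle between two 3-D vectors (assumes non-zero norms)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def forward_features(r_ij, r_it, v_j):
    """Forward distance and forward speed of neighbor a_j^i toward target a_t, cf. Eq. (1).

    r_ij : relative position from node a_i to neighbor a_j^i (3-vector)
    r_it : relative position from node a_i to target a_t (3-vector)
    v_j  : velocity of neighbor a_j^i (3-vector)
    """
    cos_theta_r = _cos_angle(r_ij, r_it)  # angle between r(i,j) and r(i,t)
    cos_theta_v = _cos_angle(v_j, r_it)   # angle between v(i,j) and r(i,t)
    r_for = abs(np.linalg.norm(r_ij) * cos_theta_r)  # projection of r(i,j) onto the target direction
    v_for = abs(np.linalg.norm(v_j) * cos_theta_v)   # projection of v(i,j) onto the target direction
    return r_for, v_for

# toy example: neighbor 100 m ahead at 45 degrees, flying at 20 m/s toward the target
r_for, v_for = forward_features(np.array([100.0, 100.0, 0.0]),
                                np.array([1000.0, 0.0, 0.0]),
                                np.array([20.0, 0.0, 0.0]))
```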

2.2 Routing Prediction Model Based on the PIONN Framework

2.2.1 Routing Prediction Model

The hop count index function of the neighbor node \(a_{j}^{i}\) is defined to evaluate the performance of this neighbor as the next hop to the target node \(a_{t}\). In a network with a known path, the hop count index function \(H(a_{j}^{i} ,\;a_{t} ) \in {\mathbb{R}}\) is the minimum hop count from node \(a_{j}^{i}\) to node \(a_{t}\). Otherwise, it is obtained by prediction.

$$ H(a_{j}^{i} ,\;a_{t} ) = \left\{ \begin{gathered} {\text{hop}}(a_{j}^{i} ,\;a_{t} ),\;{\text{known}}\;{\text{path}} \hfill \\ \hat{h}(a_{j}^{i} ,\;a_{t} )\quad ,\;{\text{else}} \hfill \\ \end{gathered} \right., $$
(2)

where \(\hat{h}(a_{j}^{i} ,\;a_{t} ) \in {\mathbb{R}}\) is the predicted hop count index function of the neighbor node \(a_{j}^{i}\).

\(\hat{h}(a_{j}^{i} ,\;a_{t} )\) is predicted using the PIONN, which is a fully connected neural network with two hidden layers. The first hidden layer includes 16 neurons, and the second hidden layer includes 4 neurons.

The input \({\mathbf{u}}(i,\;j,\;t) \in {\mathbb{R}}^{4}\) and output \(\hat{y}(i,\;j,\;t) \in {\mathbb{R}}\) are designed as:

$$ \left\{ \begin{gathered} {\mathbf{u}}(i,\;j,\;t) = \left[ {\begin{array}{*{20}c} {r_{{{\text{for}}}} (i,\;j,\;t)} & {v_{{{\text{for}}}} (i,\;j,\;t)} & {T_{{{\text{de}}}} (i,j)} & {B(i,j)} \\ \end{array} } \right]^{{\text{T}}} \hfill \\ \hat{y}(i,\;j,\;t) = \hat{h}(a_{j}^{i} ,\;a_{t} ) \hfill \\ \end{gathered} \right., $$
(3)

where \(T_{{{\text{de}}}} (i,j) \in {\mathbb{R}}\) and \(B(i,j) \in {\mathbb{R}}\) are the delay and maximum bandwidth from node \(a_{i}\) to node \(a_{j}^{i}\), respectively.

The hop count index function is then predicted by:

$$ \left\{ \begin{gathered} {\mathbf{z}}_{1} (i,\;j,\;t) = \tanh \left[ {{\mathbf{W}}_{{{\text{uh}}}} {\mathbf{u}}(i,\;j,\;t)} \right] \hfill \\ {\mathbf{z}}_{2} (i,\;j,\;t) = \tanh \left[ {{\mathbf{W}}_{{{\text{hh}}}} {\mathbf{z}}_{1} (i,\;j,\;t)} \right] \hfill \\ \hat{y}(i,\;j,\;t) = {\mathbf{W}}_{{{\text{hy}}}} {\mathbf{z}}_{2} (i,\;j,\;t) \hfill \\ \end{gathered} \right., $$
(4)

where \({\mathbf{W}}_{{{\text{uh}}}} \in {\mathbb{R}}^{16 \times 4}\), \({\mathbf{W}}_{{{\text{hh}}}} \in {\mathbb{R}}^{4 \times 16}\) and \({\mathbf{W}}_{{{\text{hy}}}} \in {\mathbb{R}}^{1 \times 4}\) are the weight matrices, and \({\mathbf{z}}_{1} (i,\;j,\;t) \in {\mathbb{R}}^{16}\) and \({\mathbf{z}}_{2} (i,\;j,\;t) \in {\mathbb{R}}^{4}\) are the outputs of the first and second hidden layers, respectively.
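A minimal sketch of the prediction in Eq. (4) is given below, using randomly initialized weight matrices with the 16- and 4-neuron hidden layers stated above; the input values are illustrative only.

```python
import numpy as np

def pionn_predict(u, W_uh, W_hh, W_hy):
    """Forward pass of the 4-16-4-1 routing prediction network, cf. Eq. (4)."""
    z1 = np.tanh(W_uh @ u)   # first hidden layer, 16 neurons
    z2 = np.tanh(W_hh @ z1)  # second hidden layer, 4 neurons
    return float(W_hy @ z2)  # predicted hop count index

# weight shapes follow the dimensions stated in the text
rng = np.random.default_rng(0)
W_uh = rng.standard_normal((16, 4))
W_hh = rng.standard_normal((4, 16))
W_hy = rng.standard_normal((1, 4))
u = np.array([120.0, 15.0, 0.02, 5.0])  # [r_for, v_for, T_de, B], illustrative values
h_hat = pionn_predict(u, W_uh, W_hh, W_hy)
```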

2.2.2 PIONN Framework

To accurately predict the hop count index function, it is necessary to obtain the optimal weight matrices of the NN-based routing prediction model. To address this problem, the PIO-based NN framework is proposed.

Definition 1

In the learning process of the PIONN, let \({\mathcal{W}}_{{{\text{uh}}}} \in {\mathbb{R}}^{16 \times 4}\), \({\mathcal{W}}_{{{\text{hh}}}} \in {\mathbb{R}}^{4 \times 16}\), and \({\mathcal{W}}_{{{\text{hy}}}} \in {\mathbb{R}}^{1 \times 4}\) be three different pigeon groups:

$$ \left\{ \begin{gathered} {\mathcal{W}}_{{{\text{uh}}}} \triangleq \left\{ {{\mathbf{W}}_{{{\text{uh}}}} (1),\;{\mathbf{W}}_{{{\text{uh}}}} (2),\; \ldots ,\;{\mathbf{W}}_{{{\text{uh}}}} (N_{{{\text{set}}}} )} \right\} \hfill \\ {\mathcal{W}}_{{{\text{hh}}}} \triangleq \left\{ {{\mathbf{W}}_{{{\text{hh}}}} (1),\;{\mathbf{W}}_{{{\text{hh}}}} (2), \ldots ,\;{\mathbf{W}}_{{{\text{hh}}}} (N_{{{\text{set}}}} )} \right\} \hfill \\ {\mathcal{W}}_{{{\text{hy}}}} \triangleq \left\{ {{\mathbf{W}}_{{{\text{hy}}}} (1),\;{\mathbf{W}}_{{{\text{hy}}}} (2),\; \ldots ,\;{\mathbf{W}}_{{{\text{hy}}}} (N_{{{\text{set}}}} )} \right\} \hfill \\ \end{gathered} \right., $$
(5)

where \(N_{{{\text{set}}}}\) is the number of sample pairs in the database. This means that each pigeon group includes \(N_{{{\text{set}}}}\) individuals.
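To make Definition 1 concrete, the sketch below initializes the three pigeon groups as lists of candidate weight matrices, one individual per sample pair; the random initialization scale is an assumption.

```python
import numpy as np

def init_pigeon_groups(n_set, rng=None):
    """Each pigeon group holds one candidate weight matrix per sample pair, cf. Eq. (5)."""
    rng = rng or np.random.default_rng()
    W_uh_group = [rng.standard_normal((16, 4)) * 0.1 for _ in range(n_set)]
    W_hh_group = [rng.standard_normal((4, 16)) * 0.1 for _ in range(n_set)]
    W_hy_group = [rng.standard_normal((1, 4)) * 0.1 for _ in range(n_set)]
    return W_uh_group, W_hh_group, W_hy_group
```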

Definition 2

Pigeon groups \({\mathcal{W}}_{{{\text{uh}}}}\), \({\mathcal{W}}_{{{\text{hh}}}}\), and \({\mathcal{W}}_{{{\text{hy}}}}\) move in the spaces \({\mathcal{V}}_{{{\text{uh}}}} \in {\mathbb{R}}^{16 \times 4}\), \({\mathcal{V}}_{{{\text{hh}}}} \in {\mathbb{R}}^{4 \times 16}\), and \({\mathcal{V}}_{{{\text{hy}}}} \in {\mathbb{R}}^{1 \times 4}\), respectively. At the \(k^{{{\text{th}}}}\) iteration, the positions of the pigeon groups are:

$$ \left\{ \begin{gathered} {\mathcal{W}}_{{{\text{uh}}}} (k) \triangleq \left\{ {{\mathbf{W}}_{{{\text{uh}}}} (1,\;k),\;{\mathbf{W}}_{{{\text{uh}}}} (2,\;k),\; \ldots ,\;{\mathbf{W}}_{{{\text{uh}}}} (N_{{{\text{set}}}} ,\;k)} \right\} \hfill \\ {\mathcal{W}}_{{{\text{hh}}}} (k) \triangleq \left\{ {{\mathbf{W}}_{{{\text{hh}}}} (1,\;k),\;{\mathbf{W}}_{{{\text{hh}}}} (2,\;k),\; \ldots ,\;{\mathbf{W}}_{{{\text{hh}}}} (N_{{{\text{set}}}} ,\;k)} \right\} \hfill \\ {\mathcal{W}}_{{{\text{hy}}}} (k) \triangleq \left\{ {{\mathbf{W}}_{{{\text{hy}}}} (1,\;k),\;{\mathbf{W}}_{{{\text{hy}}}} (2,\;k),\; \ldots ,\;{\mathbf{W}}_{{{\text{hy}}}} (N_{{{\text{set}}}} ,\;k)} \right\} \hfill \\ \end{gathered} \right.. $$
(6)

Three pigeon groups perform a cooperative flight to decrease the error function of the output layer \(e_{{\text{y}}} (p,k) \in {\mathbb{R}}\):

$$ e_{{\text{y}}} (p,k) = \tilde{y}(k) - \hat{y}(p,k), $$
(7)

with the theoretical output of the output layer \(\tilde{y}(k) = \tilde{h}(k)\).

To satisfy \(\lim_{{k \to k_{\max } }} e_{{\text{y}}} (p,k) = 0\), where \(k_{\max }\) is the index of the last iteration, the center positions \({\mathbf{W}}_{{{\text{uh}}}}^{{\text{c}}} (k)\), \({\mathbf{W}}_{{{\text{hh}}}}^{{\text{c}}} (k)\), and \({\mathbf{W}}_{{{\text{hy}}}}^{{\text{c}}} (k)\) are defined as the guidance. All individuals in each pigeon group fly toward the center of that group, as shown in Fig. 2 with \(N_{{{\text{set}}}} = 6\) as an example.

Fig. 2 The movement of pigeon group \({\mathcal{W}}_{{{\text{uh}}}}\)

The central positions \({\mathbf{W}}_{{{\text{uh}}}}^{{\text{c}}} (k)\), \({\mathbf{W}}_{{{\text{hh}}}}^{{\text{c}}} (k)\), and \({\mathbf{W}}_{{{\text{hy}}}}^{{\text{c}}} (k)\) are calculated from the fitness of the \(p^{{{\text{th}}}}\) pigeon individual, \({\text{fitness}}_{{{\text{N1}}}} (p,k)\), \({\text{fitness}}_{{{\text{N2}}}} (p,k)\), and \({\text{fitness}}_{{\text{y}}} (p,k)\):

$$ {\mathbf{W}}_{{{\text{hy}}}}^{{\text{c}}} (k) = \frac{{\sum\nolimits_{p = 1}^{{N_{{{\text{set}}}} }} {{\mathbf{W}}_{{{\text{hy}}}} (p,k){\text{fitness}}_{{\text{y}}} (p,k)} }}{{N_{{{\text{set}}}} \sum\nolimits_{p = 1}^{{N_{{{\text{set}}}} }} {{\text{fitness}}_{{\text{y}}} (p,k)} }}, $$
(8a)
$$ {\mathbf{W}}_{{{\text{hh}}}}^{{\text{c}}} (k) = \frac{{\sum\nolimits_{p = 1}^{{N_{{{\text{set}}}} }} {{\mathbf{W}}_{{{\text{hh}}}} (p,k){\text{fitness}}_{{{\text{N2}}}} (p,k)} }}{{N_{{{\text{set}}}} \sum\nolimits_{p = 1}^{{N_{{{\text{set}}}} }} {{\text{fitness}}_{{{\text{N2}}}} (p,k)} }}, $$
(8b)
$$ {\mathbf{W}}_{{{\text{uh}}}}^{{\text{c}}} (k) = \frac{{\sum\nolimits_{p = 1}^{{N_{{{\text{set}}}} }} {{\mathbf{W}}_{{{\text{uh}}}} (p,k){\text{fitness}}_{{{\text{N1}}}} (p,k)} }}{{N_{{{\text{set}}}} \sum\nolimits_{p = 1}^{{N_{{{\text{set}}}} }} {{\text{fitness}}_{{{\text{N1}}}} (p,k)} }}, $$
(8c)

in which:

$$ \left\{ \begin{gathered} {\text{fitness}}_{{\text{y}}} (p,k) = \left\| {e_{{\text{y}}} (p,k)} \right\| \hfill \\ {\text{fitness}}_{{{\text{N2}}}} (p,k) = \left\| {{\mathbf{e}}_{{{\text{N2}}}} (p,k)} \right\| \hfill \\ {\text{fitness}}_{{{\text{N1}}}} (p,k) = \left\| {{\mathbf{e}}_{{{\text{N1}}}} (p,k)} \right\| \hfill \\ \end{gathered} \right., $$
(9)

where \({\mathbf{e}}_{{{\text{N1}}}} (p,k) \in {\mathbb{R}}^{16 \times 1}\) and \({\mathbf{e}}_{{{\text{N2}}}} (p,k) \in {\mathbb{R}}^{4 \times 1}\) are the virtual error function of the first and second hidden layers, respectively:

$$ \left\{ \begin{gathered} {\mathbf{e}}_{{{\text{N2}}}} (p,k) = {\tilde{\mathbf{z}}}_{2} (p,k) - {\mathbf{z}}_{2} (p,k) \hfill \\ {\mathbf{e}}_{{{\text{N1}}}} (p,k) = {\tilde{\mathbf{z}}}_{1} (p,k) - {\mathbf{z}}_{1} (p,k) \hfill \\ \end{gathered} \right.. $$
(10)
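Under the assumption that the per-individual error terms of a layer are already available, a minimal sketch of the fitness values in Eq. (9) and the fitness-weighted center position in Eq. (8) could look as follows; the small constant added to the denominator is only a numerical guard and is not part of the model.

```python
import numpy as np

def fitness_weighted_center(weights, errors):
    """Center position of one pigeon group, cf. Eqs. (8)-(9).

    weights : list of N_set weight matrices W(p, k) of one layer
    errors  : list of N_set error terms e(p, k) of the corresponding layer
    """
    n_set = len(weights)
    fitness = np.array([np.linalg.norm(e) for e in errors])      # Eq. (9)
    weighted_sum = sum(W * f for W, f in zip(weights, fitness))   # sum_p W(p,k) * fitness(p,k)
    return weighted_sum / (n_set * fitness.sum() + 1e-12)         # Eq. (8), with a small guard
```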

Definition 3

In the \(k^{{{\text{th}}}}\) iteration, the virtual target output of the first hidden layer \({\tilde{\mathbf{z}}}_{1} (p,k) \in {\mathbb{R}}^{16 \times 1}\) and the virtual target output of the second hidden layer \({\tilde{\mathbf{z}}}_{2} (p,k) \in {\mathbb{R}}^{4 \times 1}\) of the \(p^{{{\text{th}}}}\) sample pair are defined as:

$$ \left\{ \begin{gathered} {\tilde{\mathbf{z}}}_{2} (p,\;k) \triangleq \mathop {\arg }\limits_{{{\mathbf{z}}_{2}^{\# } \in {\mathbb{R}}^{4 \times 1} }} \left[ {{\mathbf{W}}_{{{\text{hy}}}} (p,\;k){\mathbf{z}}_{2}^{\# } : = \tilde{y}(p)} \right] \hfill \\ {\tilde{\mathbf{z}}}_{1} (p,\;k) \triangleq \mathop {\arg }\limits_{{{\mathbf{z}}_{1}^{\# } \in {\mathbb{R}}^{16 \times 1} }} \left\{ {\tanh \left[ {{\mathbf{W}}_{{{\text{hh}}}} (p,\;k){\mathbf{z}}_{1}^{\# } } \right]: = {\tilde{\mathbf{z}}}_{2} (p,\;k)} \right\} \hfill \\ \end{gathered} \right.. $$
(11)

Based on this definition, the virtual error functions \({\mathbf{e}}_{{{\text{N1}}}} (p,k)\) and \({\mathbf{e}}_{{{\text{N2}}}} (p,k)\) are derived as:

$$ \begin{gathered} {\mathbf{e}}_{{{\text{N2}}}} (p,k) = {\tilde{\mathbf{z}}}_{2} (p,k) - {\mathbf{z}}_{2} (p,k) \hfill \\ \quad \quad \quad = {\mathbf{W}}_{{{\text{hy}}}}^{ + } (p,\;k)\left( {{\mathbf{W}}_{{{\text{hy}}}} (p,\;k){\tilde{\mathbf{z}}}_{2} (p,k) - {\mathbf{W}}_{{{\text{hy}}}} (p,\;k){\mathbf{z}}_{2} (p,k)} \right) \hfill \\ \quad \quad \quad = {\mathbf{W}}_{{{\text{hy}}}}^{ + } (p,\;k)\left( {\tilde{y}(p,k) - \hat{y}(p,k)} \right) \hfill \\ \quad \quad \quad = {\mathbf{W}}_{{{\text{hy}}}}^{ + } (p,\;k)e_{{\text{y}}} (p,k), \hfill \\ \end{gathered} $$
(12)
$$ \begin{gathered} {\mathbf{e}}_{{{\text{N1}}}} (p,k) = {\tilde{\mathbf{z}}}_{1} (p,k) - {\mathbf{z}}_{1} (p,k) \hfill \\ \quad \quad \quad = {\mathbf{W}}_{{{\text{hh}}}}^{ + } (p,\;k)\left( {{\mathbf{W}}_{{{\text{hh}}}} (p,\;k){\tilde{\mathbf{z}}}_{1} (p,k) - {\mathbf{W}}_{{{\text{hh}}}} (p,\;k){\mathbf{z}}_{1} (p,k)} \right) \hfill \\ \quad \quad \quad = {\mathbf{W}}_{{{\text{hh}}}}^{ + } (p,\;k)\left( {{\tilde{\mathbf{z}}}_{2} (p,k) - {\mathbf{z}}_{2} (p,k)} \right) \hfill \\ \quad \quad \quad = {\mathbf{W}}_{{{\text{hh}}}}^{ + } (p,\;k){\mathbf{e}}_{{{\text{N2}}}} (p,k), \hfill \\ \end{gathered} $$
(13)

where \({\mathbf{W}}_{{{\text{hy}}}}^{ + } (p,\;k)\) and \({\mathbf{W}}_{{{\text{hh}}}}^{ + } (p,\;k)\) are the pseudoinverse matrices of \({\mathbf{W}}_{{{\text{hy}}}} (p,k)\) and \({\mathbf{W}}_{{{\text{hh}}}} (p,k)\), respectively.

Since \({\mathbf{W}}_{{{\text{hy}}}} (p,k)\) and \({\mathbf{W}}_{{{\text{hh}}}} (p,k)\) are not square matrices, their right pseudoinverses are used, and Eqs. (12)–(13) are rewritten as:

$$ {\mathbf{e}}_{{{\text{N}}2}} (p,k) = {\mathbf{W}}_{{{\text{hy}}}}^{{\text{T}}} (p,k)\left( {{\mathbf{W}}_{{{\text{hy}}}} (p,k){\mathbf{W}}_{{{\text{hy}}}}^{{\text{T}}} (p,k)} \right)^{ - 1} e_{{\text{y}}} (p,k) ,$$
(14)
$$ {\mathbf{e}}_{{{\text{N1}}}} (p,k) = {\mathbf{W}}_{{{\text{hh}}}}^{{\text{T}}} (p,k)\left( {{\mathbf{W}}_{{{\text{hh}}}} (p,k){\mathbf{W}}_{{{\text{hh}}}}^{{\text{T}}} (p,k)} \right)^{ - 1} {\mathbf{e}}_{{{\text{N2}}}} (p,k). $$
(15)
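The right pseudoinverses in Eqs. (14)–(15) can be evaluated directly, as in the following sketch; the function name is illustrative.

```python
import numpy as np

def virtual_errors(e_y, W_hy, W_hh):
    """Back-propagate the output error to virtual hidden-layer errors, cf. Eqs. (14)-(15)."""
    # e_N2 = W_hy^T (W_hy W_hy^T)^{-1} e_y  -- right pseudoinverse of the 1x4 output matrix
    e_N2 = W_hy.T @ np.linalg.inv(W_hy @ W_hy.T) @ np.atleast_1d(e_y)
    # e_N1 = W_hh^T (W_hh W_hh^T)^{-1} e_N2 -- right pseudoinverse of the 4x16 matrix
    e_N1 = W_hh.T @ np.linalg.inv(W_hh @ W_hh.T) @ e_N2
    return e_N1, e_N2
```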

3 Routing Prediction Strategy Based on PIONN

The routing prediction strategy is designed to predict the hop count index function \(H(a_{j}^{i} ,\;a_{t} )\), which evaluates the performance of the neighbor node \(a_{j}^{i}\) as the next hop to the target node \(a_{t}\). The prediction results are used as a reference when choosing the route. The proposed strategy provides only the predicted hop count index function, and it can directly select the next hop or can be combined with most routing methods. In this case, the routing choice can be more intelligent but still stable.

The structure of the proposed routing prediction strategy is shown in Fig. 3. Before the mission, offline training based on PIONN is conducted to obtain the routing prediction model, using a database created from the historical states of the network and the UAV swarm and their corresponding hop count index functions. Since the routing paths in previous missions are known, the hop count index function in the database is the actual hop count from the neighbor node to the target node.

Fig. 3 The structure of the routing prediction based on PIONN

During the mission, the hop count index function of each neighbor node is predicted based on the routing prediction model and the state of the network and UAV swarm in real time. The real-time network state includes the real-time delay \(T_{{{\text{de}}}} (i,j)\) and maximum bandwidth \(B(i,j)\) from the originating node \(a_{i}\) to its neighbor node \(a_{j}^{i}\), while the real-time state of the UAV swarm includes the real-time forward distance \(r_{{{\text{for}}}} (i,\;j,\;t)\) and forward speed \(v_{{{\text{for}}}} (i,\;j,\;t)\) from the originating node \(a_{i}\) to the target node \(a_{t}\) with node \(a_{j}^{i}\) as the next hop.

3.1 Offline Training

The offline training procedures based on PIONN are shown in Fig. 4:

Fig. 4 Offline training

Step 1: Importing the database

The input and output of the PIONN are obtained based on the database:

$$ \left\{ \begin{gathered} {\mathbf{u}}(p) = \left[ {\begin{array}{*{20}c} {r_{{{\text{for}}}} (p)} & {v_{{{\text{for}}}} (p)} & {T_{{{\text{de}}}} (p)} & {B(p)} \\ \end{array} } \right]^{{\text{T}}} \hfill \\ \tilde{y}(p) = H(p) \hfill \\ \end{gathered} \right., $$
(16)

where \(p \le N_{{{\text{set}}}} \;(p \in {\mathbb{N}}^{ + } )\) is the index of the sample pair.
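For illustration, the sample pairs of Eq. (16) might be assembled from historical logs as follows; the record format (one dict per logged link with the listed keys) is a hypothetical assumption.

```python
import numpy as np

def build_training_set(records):
    """Assemble PIONN sample pairs from historical logs, cf. Eq. (16).

    `records` is assumed to be an iterable of dicts with keys
    'r_for', 'v_for', 'T_de', 'B' and the known hop count 'hop'.
    """
    U = np.array([[rec['r_for'], rec['v_for'], rec['T_de'], rec['B']] for rec in records])
    y = np.array([rec['hop'] for rec in records], dtype=float)
    return U, y  # U: (N_set, 4) inputs, y: (N_set,) target hop count index values
```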

Step 2: Initialization of the learning process

The initial iteration index of the learning process is set to \(k = 1\), and the initial weight matrices \({\mathbf{W}}_{{{\text{uh}}}} (p,0)\), \({\mathbf{W}}_{{{\text{hh}}}} (p,0)\) and \({\mathbf{W}}_{{{\text{hy}}}} (p,0)\) are chosen.

Step 3: Calculation of the output and error functions of each layer

The output of the first and second hidden layers and the output of the output layer are obtained by:

$$ \left\{ \begin{gathered} {\mathbf{z}}_{1} (p,k) = \tanh \left[ {{\mathbf{W}}_{{{\text{uh}}}} (p,k){\mathbf{u}}(p,k)} \right] \hfill \\ {\mathbf{z}}_{2} (p,k) = \tanh \left[ {{\mathbf{W}}_{{{\text{hh}}}} (p,k){\mathbf{z}}_{1} (p,k)} \right] \hfill \\ \hat{y}(p,k) = {\mathbf{W}}_{{{\text{hy}}}} (p,k){\mathbf{z}}_{2} (p,k) \hfill \\ \end{gathered} \right.. $$
(17)

The error function of the output layer \(e_{{\text{y}}} (p,k)\) is obtained based on Eq. (7), while the virtual error functions of the first and second hidden layers are obtained based on Eqs. (14), (15).

Step 4: Training the pigeon groups

The fitness values of the \(p^{{{\text{th}}}}\) pigeon individual, \({\text{fitness}}_{{{\text{N1}}}} (p,k)\), \({\text{fitness}}_{{{\text{N2}}}} (p,k)\), and \({\text{fitness}}_{{\text{y}}} (p,k)\), are calculated based on Eq. (9). The central positions \({\mathbf{W}}_{{{\text{uh}}}}^{{\text{c}}} (k)\), \({\mathbf{W}}_{{{\text{hh}}}}^{{\text{c}}} (k)\), and \({\mathbf{W}}_{{{\text{hy}}}}^{{\text{c}}} (k)\) are obtained based on Eq. (8). Then, each pigeon individual flies toward its central position:

$$ \left\{ \begin{gathered} {\mathbf{W}}_{{{\text{hy}}}} (p,k + 1) = {\mathbf{W}}_{{{\text{hy}}}} (p,k) + \eta_{{{\text{hy}}}} \left[ {{\mathbf{W}}_{{{\text{hy}}}}^{{\text{c}}} (k) - {\mathbf{W}}_{{{\text{hy}}}} (p,k)} \right] \hfill \\ {\mathbf{W}}_{{{\text{hh}}}} (p,k + 1) = {\mathbf{W}}_{{{\text{hh}}}} (p,k) + \eta_{{{\text{hh}}}} \left[ {{\mathbf{W}}_{{{\text{hh}}}}^{{\text{c}}} (k) - {\mathbf{W}}_{{{\text{hh}}}} (p,k)} \right] \hfill \\ {\mathbf{W}}_{{{\text{uh}}}} (p,k + 1) = {\mathbf{W}}_{{{\text{uh}}}} (p,k) + \eta_{{{\text{uh}}}} \left[ {{\mathbf{W}}_{{{\text{uh}}}}^{{\text{c}}} (k) - {\mathbf{W}}_{{{\text{uh}}}} (p,k)} \right] \hfill \\ \end{gathered} \right., $$
(18)

where \(\eta_{{{\text{hy}}}}\), \(\eta_{{{\text{hh}}}}\), and \(\eta_{{{\text{uh}}}}\) are the learning steps.

Step 5: Determination

If \(e_{{\text{y}}} (p,k) \le \varepsilon\) is satisfied for a small positive threshold \(\varepsilon\), or if \(k = k_{\max }\), the learning process ends. The optimal weight matrices are:

$$ \left\{ \begin{gathered} {\mathbf{W}}_{{{\text{hy}}}}^{{{\text{opt}}}} = {\mathbf{W}}_{{{\text{hy}}}}^{{\text{c}}} (k) \hfill \\ {\mathbf{W}}_{{{\text{hh}}}}^{{{\text{opt}}}} = {\mathbf{W}}_{{{\text{hh}}}}^{{\text{c}}} (k) \hfill \\ {\mathbf{W}}_{{{\text{uh}}}}^{{{\text{opt}}}} = {\mathbf{W}}_{{{\text{uh}}}}^{{\text{c}}} (k) \hfill \\ \end{gathered} \right., $$
(19)

Otherwise, let \(k = k + 1\) and return to Step 3.
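Putting Steps 2–5 together, a compact sketch of the offline training loop is given below. It follows Eqs. (7)–(19) with a shared learning step for all three groups, a random initialization scale, and a small numerical guard in the center computation; these choices are illustrative assumptions rather than settings from the paper.

```python
import numpy as np

def train_pionn(U, y, k_max=200, eps=1e-3, eta=0.2, seed=0):
    """Offline PIONN training (Steps 2-5), a compact sketch of Eqs. (7)-(19).

    U : (N_set, 4) array of input samples u(p); y : (N_set,) target hop count index values.
    """
    rng = np.random.default_rng(seed)
    n_set = len(U)
    # Step 2: one candidate weight matrix per sample pair in each pigeon group
    W_uh = [rng.standard_normal((16, 4)) * 0.1 for _ in range(n_set)]
    W_hh = [rng.standard_normal((4, 16)) * 0.1 for _ in range(n_set)]
    W_hy = [rng.standard_normal((1, 4)) * 0.1 for _ in range(n_set)]

    def center(group, fit):  # fitness-weighted center, Eq. (8), with a small numerical guard
        return sum(W * f for W, f in zip(group, fit)) / (n_set * fit.sum() + 1e-12)

    for k in range(1, k_max + 1):
        e_y, e_n1, e_n2 = [], [], []
        for p in range(n_set):
            # Step 3: forward pass, Eq. (17), and output error, Eq. (7)
            z1 = np.tanh(W_uh[p] @ U[p])
            z2 = np.tanh(W_hh[p] @ z1)
            err = y[p] - float(W_hy[p] @ z2)
            # virtual errors via the right pseudoinverses, Eqs. (14)-(15)
            eN2 = W_hy[p].T @ np.linalg.inv(W_hy[p] @ W_hy[p].T) @ np.array([err])
            eN1 = W_hh[p].T @ np.linalg.inv(W_hh[p] @ W_hh[p].T) @ eN2
            e_y.append(err); e_n2.append(eN2); e_n1.append(eN1)
        # Step 4: fitness values, Eq. (9), and group centers, Eq. (8)
        fit_y = np.abs(np.array(e_y))
        fit_n2 = np.array([np.linalg.norm(e) for e in e_n2])
        fit_n1 = np.array([np.linalg.norm(e) for e in e_n1])
        Wc_hy, Wc_hh, Wc_uh = center(W_hy, fit_y), center(W_hh, fit_n2), center(W_uh, fit_n1)
        # each pigeon individual flies toward its group center, Eq. (18)
        W_hy = [W + eta * (Wc_hy - W) for W in W_hy]
        W_hh = [W + eta * (Wc_hh - W) for W in W_hh]
        W_uh = [W + eta * (Wc_uh - W) for W in W_uh]
        # Step 5: termination check, Eq. (19)
        if fit_y.max() <= eps or k == k_max:
            return Wc_uh, Wc_hh, Wc_hy
```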

3.2 Routing Prediction

The hop count index function of the \(j^{{{\text{th}}}}\) neighbor node is predicted by:

$$ H(a_{j}^{i} ,\;a_{t} ) = \hat{h}(a_{j}^{i} ,\;a_{t} ) = {\mathbf{W}}_{{{\text{hy}}}}^{{{\text{opt}}}} \left\{ {\tanh \left[ {{\mathbf{W}}_{{{\text{hh}}}}^{{{\text{opt}}}} \left( {\tanh \left[ {{\mathbf{W}}_{{{\text{uh}}}}^{{{\text{opt}}}} {\mathbf{u}}(i,\;j,\;t)} \right]} \right)} \right]} \right\}, $$
(20)

and the matrix of hop count index functions of the originating node \(a_{i}\) is:

$$ {{\varvec{\Gamma}}}\left( {a_{i} } \right) = \left[ {\begin{array}{*{20}c} {H(a_{1}^{i} ,\;a_{1} )} & {H(a_{2}^{i} ,\;a_{1} )} & \cdots & {H(a_{{N_{{{\text{NEI}}}} }}^{i} ,\;a_{1} )} \\ {H(a_{1}^{i} ,\;a_{2} )} & {H(a_{2}^{i} ,\;a_{2} )} & \cdots & {H(a_{{N_{{{\text{NEI}}}} }}^{i} ,\;a_{2} )} \\ \vdots & \vdots & \ddots & \vdots \\ {H(a_{1}^{i} ,\;a_{{N_{{{\text{TAR}}}} }} )} & {H(a_{2}^{i} ,\;a_{{N_{{{\text{TAR}}}} }} )} & \cdots & {H(a_{{N_{{{\text{NEI}}}} }}^{i} ,\;a_{{N_{{{\text{TAR}}}} }} )} \\ \end{array} } \right], $$
(21)

where \(N_{{{\text{NEI}}}}\) and \(N_{{{\text{TAR}}}}\) are the numbers of neighbor nodes and target nodes, respectively.
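Given the optimal weight matrices, the matrix in Eq. (21) can be filled entry by entry with the prediction of Eq. (20); in the sketch below, the layout of the input array U (one 4-dimensional input vector per target–neighbor pair) is an assumption.

```python
import numpy as np

def hop_count_index_matrix(U, W_uh, W_hh, W_hy):
    """Predicted hop count index matrix Gamma(a_i), cf. Eqs. (20)-(21).

    U is assumed to be an (N_TAR, N_NEI, 4) array: U[t, j] is the input vector
    u(i, j, t) for neighbor j and target t of the originating node a_i.
    """
    n_tar, n_nei, _ = U.shape
    gamma = np.empty((n_tar, n_nei))
    for t in range(n_tar):
        for j in range(n_nei):
            z1 = np.tanh(W_uh @ U[t, j])
            z2 = np.tanh(W_hh @ z1)
            gamma[t, j] = float(W_hy @ z2)  # H(a_j^i, a_t), Eq. (20)
    return gamma
```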

4 Performance Evaluation

To evaluate the performance of the proposed routing prediction strategy, four routing strategies are compared in the simulation scenarios: greedy perimeter stateless routing (GPSR) [26], the back-propagation NN-based routing prediction (BPNNRP) strategy [27], PIONNRP, and the combination of PIONNRP and GPSR (PIONNRP + GPSR), as shown in Table 1:

Table 1 Routing strategies in simulations

If PIONNRP is implemented without other routing methods, the next hop is selected by the following equation:

$$ a_{{{\text{next}}}} = \mathop {\arg \min }\limits_{{a_{j}^{i} }} \left\{ {{\mathbf{W}}_{{{\text{hy}}}}^{{{\text{opt}}}} \left\{ {\tanh \left[ {{\mathbf{W}}_{{{\text{hh}}}}^{{{\text{opt}}}} \left( {\tanh \left[ {{\mathbf{W}}_{{{\text{uh}}}}^{{{\text{opt}}}} {\mathbf{u}}(i,\;j,\;t)} \right]} \right)} \right]} \right\}} \right\}. $$
(22)

Since PIONNRP is a routing prediction strategy, it can be combined with other routing methods to balance stability against hop count. Therefore, PIONNRP + GPSR is also used to determine the next hop, following a simple strategy:

$$ a_{{{\text{next}}}} = \left\{ \begin{gathered} \mathop {\arg \min }\limits_{{a_{j}^{i} }} \left\{ {{\mathbf{W}}_{{{\text{hy}}}}^{{{\text{opt}}}} \left\{ {\tanh \left[ {{\mathbf{W}}_{{{\text{hh}}}}^{{{\text{opt}}}} \left( {\tanh \left[ {{\mathbf{W}}_{{{\text{uh}}}}^{{{\text{opt}}}} {\mathbf{u}}(i,\;j,\;t)} \right]} \right)} \right]} \right\}} \right\},\;{\text{rand}} < \zeta \hfill \\ {\text{GPSR}}(i,\;j,\;t),\quad \quad \quad \quad \quad \;\;{\text{else}} \hfill \\ \end{gathered} \right.. $$
(23)

This equation means that if \({\text{rand}} < \zeta\), the next hop is selected by PIONNRP; otherwise, it is selected by GPSR, where \({\text{rand}} \in \left[ {0,\;1} \right]\) is a random number drawn for each decision and \(\zeta \in \left( {0,\;1} \right)\) is a constant.
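A minimal sketch of the next-hop rules in Eqs. (22)–(23) is shown below, assuming the predicted hop count indices toward the current target are available as one row of \({{\varvec{\Gamma}}}(a_i)\); the GPSR fallback is represented by a precomputed neighbor index, and the value of \(\zeta\) is illustrative.

```python
import numpy as np

def next_hop_pionnrp(h_row):
    """Pure PIONNRP: pick the neighbor with the smallest predicted hop count index, Eq. (22)."""
    return int(np.argmin(h_row))

def next_hop_pionnrp_gpsr(h_row, gpsr_choice, zeta=0.7, rng=None):
    """PIONNRP + GPSR, Eq. (23): with probability zeta use the prediction, otherwise fall back to GPSR.

    `gpsr_choice` is the neighbor index that a separate GPSR implementation would return;
    zeta = 0.7 is an illustrative value, not taken from the paper.
    """
    rng = rng or np.random.default_rng()
    return next_hop_pionnrp(h_row) if rng.random() < zeta else gpsr_choice

# usage: h_row is one row of Gamma(a_i), i.e. the predicted indices toward one target
h_row = np.array([3.2, 1.7, 2.4])
j_star = next_hop_pionnrp_gpsr(h_row, gpsr_choice=0)
```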

The parameters of the UAV swarm in the simulation are shown in Table 2, and the positions and velocities of the UAVs are shown in Fig. 5. The swarm flew toward the destination while avoiding the forbidden zone, which is shown as a sphere in Fig. 5a.

Table 2 Parameters of UAV swarms
Fig. 5 Movement of UAV swarm

There are three cases in the simulation, which differ in the half-power beam width (HPBW). The HPBW is the angular separation between the points at which the magnitude of the radiation pattern decreases by 3 dB from the peak of the main beam; therefore, within limits, a larger HPBW yields more neighbors. The HPBW of case 1 is larger than that of case 2, and the HPBW of case 2 is larger than that of case 3, so the average number of neighbors differs among the cases. In each case, the number of sample pairs in the database is \(N_{{{\text{set}}}} = 73500\), and the number of test simulations is 1,225,000. Each case includes four scenarios, and in each scenario, PIONNRP, PIONNRP + GPSR, GPSR, and BPNNRP are implemented. Their parameters are shown in Table 3.

Table 3 Parameters of scenarios

The training times of BPNNRP and PIONNRP were 7311.79 s and 2189.06 s, respectively; the training time of PIONNRP was much shorter than that of BPNNRP, reflecting its reduced computational complexity. For a more detailed comparison, different time-to-live (TTL) values are used in the simulations. The TTL is the limit imposed on a data packet to prevent it from circulating in the network forever; it can be regarded as the maximum number of hops for which the data packet remains valid.

The performance evaluation of case 1 is shown in Fig. 6. In this case, the average number of neighbors was considerable, as shown in Fig. 6a, so it was not difficult to find an appropriate routing path. Therefore, the delivery failure ratios of all routing strategies were less than 10% with \({\text{TTL}} = \left\{ {10,\;15,\;20,\;25} \right\}\). PIONNRP and PIONNRP + GPSR performed much better than GPSR and BPNNRP. Compared with the traditional routing protocol, the average hop counts of PIONNRP were approximately 20.5%, 39.7%, 44.6%, 46.9%, and 47.8% less than those of GPSR with varying TTL, because GPSR considers only the distance among nodes and ignores historical experience. Compared with the ML-based routing strategy, although both PIONNRP and BPNNRP could find a short path, the delivery failure ratios of PIONNRP were 43.8%, 70.0%, 77.5%, 84.1%, and 91.1% less than those of BPNNRP with varying TTL, respectively. In particular, PIONNRP + GPSR performed slightly better than PIONNRP because it was more stable, with a lower delivery failure ratio.

Fig. 6 Performance evaluation with varying TTL of case 1

The simulation results of case 2 are shown in Fig. 7. The average number of neighbors was smaller than in case 1 due to the smaller HPBW. Again, PIONNRP and PIONNRP + GPSR performed better than GPSR and BPNNRP: the delivery failure ratios, average hop counts, and hop count variances of GPSR and BPNNRP were larger than those of PIONNRP and PIONNRP + GPSR. Comparing PIONNRP and PIONNRP + GPSR, PIONNRP favored shorter routing paths, while PIONNRP + GPSR balanced stability against path length. In this case, the average hop counts of PIONNRP were 9.3%, 20.1%, 18.6%, 15.0%, and 12.9% less than those of PIONNRP + GPSR with varying TTL, while its delivery failure ratios were 56.8%, 79.3%, and 99.2% larger than those of PIONNRP + GPSR with \({\text{TTL}} = \left\{ {5,\;10,\;15} \right\}\), respectively. With \({\text{TTL}} = \left\{ {20,\;25} \right\}\), the delivery failure ratios of PIONNRP + GPSR were 0%.

Fig. 7 Performance evaluation with varying TTL of case 2

The simulation results of case 3 are shown in Fig. 8. Owing to the smallest HPBW, the average number of neighbors was lower than in cases 1 and 2, and the average delivery failure ratio increased. Again, PIONNRP and PIONNRP + GPSR performed better than GPSR and BPNNRP; as shown in Fig. 8c, the delivery failure ratios of GPSR and BPNNRP were unacceptable. Comparing PIONNRP and PIONNRP + GPSR, PIONNRP + GPSR still maintained a balance between the delivery success ratio and hop count, even though the neighbors were not dense.

Fig. 8 Performance evaluation with varying TTL of case 3

The comparisons among the four routing strategies are shown in Table 4, and the conclusions are given as follows:

Table 4 Comparisons among routing strategies
(a) The traditional routing protocol, GPSR, failed to find an appropriate routing path for the UAV swarm network.

(b) Although BPNNRP could find a shorter path with a lower end-to-end delay than GPSR, its delivery failure ratio was still unacceptable.

(c) Compared with GPSR and BPNNRP, PIONNRP could find a shorter path and decrease the delivery failure ratio for the UAV swarm network.

(d) PIONNRP + GPSR performed better than PIONNRP, since it found an appropriate routing path with a low end-to-end delay and a low delivery failure ratio.

Since the combination of PIONNRP and GPSR performs better in most cases, it is important to choose \(\zeta\) appropriately. A larger \(\zeta\) is suggested for cases with a more dynamic topology because PIONNRP can cope with the resulting uncertainties.

5 Conclusions

In this paper, a routing prediction strategy based on the combination of PIO and an NN is designed for UAV swarm networks. The proposed strategy predicts the performance of neighbors on routing paths through a PIO-based NN framework without requiring the topology as prior information; therefore, it can be applied to networks with highly dynamic topologies. PIONNRP can select the next hop directly according to the prediction results or be combined with other routing methods. Simulation results demonstrate the efficiency of the proposed strategy. This routing prediction strategy provides a way to apply ML to routing while maintaining a balance between intelligence and stability.