Hybrid swarm intelligent algorithm for multi-UAV formation reconfiguration

Formation flight of unmanned aerial vehicles (UAVs) utilizes reconfiguration procedures to handle a variety of emergencies, such as collision avoidance, malfunctions, fuel savings, and member replacement. As UAVs have limited computing power and energy resources, it is necessary to optimize the control inputs to reduce the distance travelled by UAVs while reducing the computing costs during formation reconfiguration. In this paper, the problem of multi-UAV reconfiguration is decoupled into two stages: task assignment and control input optimization of UAVs. For a solution to the above problem, we propose an adaptive hybrid particle swarm optimization and differential evolution algorithm (AHPSODE) to optimize minimize the distance of the total movement and reduce the computing cost of formation reconfiguration. Based on the idea of receding horizon control (RHC) and the nonlinear model of multi-UAV formation reconfiguration, an RHC controller using AHPSODE is designed to optimize the control input of the UAV group to obtain the shortest movement distance, and this method can reduce the computation time. We use the CEC 2017 test suit to test the performance of our proposed AHPSODE algorithm, and simulate the AHPSODE-based RHC controller to manage formation reconfiguration. The results show that our proposed AHPSODE performed well in convergence and accuracy and the RHC controller is effective.


Introduction
Multi-UAV formation flight [1] refers to the formation arrangement with precisely defined geometries and task assignment to meet the requirements of different tasks. Multi-UAV formation flights enable the onboard equipment to be distributed to various UAVs, and a complex task can be broken down into several relatively simple tasks performed by different individuals. These features can make multi- UAV formation flights more stable, reliable, and redundant than single UAV flights. Consequently, multi-UAV formation flight has been widely used in various areas, including target strike, communications relay, electronic countermeasures, and battlefield assessment.
An mutil-UAV formation flight consists of a full process that includes formation creation, maintenance, and reconfiguration, as well as the planning, organization, and execution of tasks [2]. In the absence of reconfiguration capabilities, multi-UAV formation flights would suffer from several issues, including collisions, inability to withstand external attacks, restricted scaling, and instabilities. Therefore, formation reconfiguration is of extreme significance, since the formation shape must be continuously reconfigured to cope with changes in group composition, environments, and tasks. This can enhance the robustness, reliability, and efficiency of a formation flight. However, UAVs are a kind of flight equipment with limited battery life and computing power. Therefore, the main challenge for multi-UAV formation reconfiguration is to design a controller that can optimize the control inputs to achieve the shortest total movement distance with fast computing speed and low computational overhead. Research to reduce energy consumption and computing consumption in multi-UAV formation reconfiguration can be divided into three categories: path planning for reconfiguration, improvements to UAV controllers, and optimization of control inputs. In the path planning method, different path planning algorithms are improved and optimized to improve the performance of formation reconfiguration [3], such as A* [4], Dijkstra [5], Voronoi graph [6], artificial potential [7], and Dubin trajectory [8], etc. Despite their abilities to solve formation reconfiguration problems, these path planning methods cannot produce a moving trajectory with the shortest total distance, and are usually applied in a two-dimensional environment. The improvement of the control algorithm can improve the controller's performance and thus improve the efficiency of formation reconfiguration. The learning-based control algorithm has become a focus of research in this area, since it requires no complex mathematical model and is highly adaptable [9,10]. However, learning-based control algorithms always involve considerable uncertainty and mathematical complexity. The requirement of complexity and abundant computational resources are limitations to the use of learning-based control algorithms.
In comparison with improving the control algorithm, the method of optimizing the control input is more direct, effective, and scalable. In this method, the formation reconfiguration problem of UAV is transformed into a time-most control problem [11], in which swarm intelligent optimization methods can be an excellent choice to solve these optimization problems [12]. Wei et al. [13] propose a hybrid GA algorithm to solve the multi-UAV formation reconfiguration problem in 2D space. In the following work, Duan et al. [14] and Zhang [15] adapt CPTD (Control Parameters Time Discrete) method to construct an objective function and optimize the problem with different bionic algorithms. However, the CPTD method cannot solve the problem with too many control parameters for substantial computational overhead. Literature [16,17] adapt the rolling time-domain (RHC) method to decrease the computational overhead and improve the real-time performance of multi-UAV reconfiguration. Using this method, the global control problem can be divided into several local optimization problems with the same objective function. Consequently, by reducing the number of control parameters, the computational complexity and computational time can be substantially reduced. However, these two methods do not consider the problem of target assignment of the formation final position.
As a solution to the problem of reducing computation overhead and determining a trajectory for formation reconfiguration with the shortest total moving distance, this paper splits the reconfiguration process into two parts: task assignment for the final positions and optimization of control inputs. We propose an adaptive hybrid particle swarm optimization and differential evolution algorithm (AHPSODE), which is designed to solve the target assignment problem in discrete space and create an RHC controller based on AHPSODE to optimize the control inputs for formation reconfiguration. Finally, to verify the validity of the proposed AHPSODE algorithm, we compared the performances of the AHPSODE with the standard PSO, standard DE, HPP-SOGA [14], PIO [15], and IDE [17] using the CEC 2017 test suit [18] and the experimental results show that the proposed AHPSODE have good performances. Furthermore, we simulate the AHPSODE-based RHC controller to conduct the multi-UAV reconfiguration. The simulation results show that the convergence time and search precision of the AHPSODE-based RHC are improved when achieving multi-UAV formation reconfiguration. Table 1 shows a comparison of some algorithms for solving multi-UAV formation reconfiguration.
The primary contributions of this paper are as follows: 1. We propose a new adaptive hybrid intelligent algorithm AHPSODE, which has a faster convergence speed and better ability to jump out local optimal than standard PSO and DE. 2. We first consider the task assignment problem of the final position of each UAV in a multi-UAV reconfiguration and solve this problem by a discrete AHPSODE. This method can reduce the total movement distance of the UAV formation. 3. We design an AHPSODE-based RHC controller to optimize multi-UAV formation reconfiguration control inputs' sequence, which can significantly reduce computation consumption, have higher search precision, and can obtain a better control input sequence.
4. We conduct a series of numerical simulation experiments to test the performance of the AHPSODE and to simulate the effectiveness of the AHPSODE-based RCH controller. The experiment results show that the algorithm and controller are effective.
The rest sections of this paper are organized as follows. In the section "Related works", we make a summary of the related work. In the section "Application scenario and algorithm principle", the application scenarios and some principles are introduced. In the section "Approach overview", the approach is overviewed. Section "Mathematical model presents the dynamics model of the quadrotor and the mathematical description of the target assignment. Section "AHPSODE algorithm" introduces the AHPSODE algorithm. In the section "Design of formation reconfiguration controller, the implementation of the AHPSODE-based RHC controller is explained in detail. Section "Experimental verification" shows the experiment. Section "Conclusion" makes a summary of this paper.

Related works
The reconfiguration of multi-UAV formations is essentially a problem of formation control. In this section, we review research on formation control of multi-UAV reconfiguration from three perspectives: control structure, control mechanism, and control algorithm.

The control structure of UAV formation
The control approaches for formation reconfiguration can be classified into centralized and decentralized approaches based on their control structure. Each UAV is controlled by a single controller in the former method. The latter method involves each UAV acquiring information about its local environment from its neighbors to control itself.
Brandao [19] designed a multi-layer control scheme based on centralized formation control to guide a UAV group in desired trajectory tracking missions. After that, in [20], a centralized management structure of UAVs in the form of triangles was proposed to solve the intragroup collision problem for the UAV group. Obviously, centralized control is more efficient when there are fewer UAVs in the group. In formations with a large number of members, however, the decentralized formation control strategy offers the advantages of reducing the computing burden, improving computation efficiency, and increasing the real-time performance of the formation control. In [21], an adaptive full-order sliding mode control framework was proposed for robust multi-UAV synchronized formation motion under uncertainty. A robust 3D UAV formation flight can be achieved with this control framework, as demonstrated in experiments. In a decentralized UAV formation flight, [22] proposed two control techniques for rejecting time-varying disturbances, PID, and integral sliding modes. In [23], a decentralized behavior-based control algorithm for UAV formation was designed. In this method, only the information of the relative position between neighboring group members is used.

The control mechanism of UAV formation
Control approaches for the formation reconfiguration can be categorized based on control mechanisms into leaderfollower methods, virtual structures, and behavioral behavioralbased approach. Leader-follower approaches have been widely used in the control and management of UAV formations, and this approach has been shown to reduce the energy consumption of reconfiguring formations. Sajad Mousavi [24] designed a formation control algorithm based on the leader follower method to solve the multi-objective optimization problem of coalition formation in large-scale UAV networks. This method is inspired by quantum evolutionary algorithms and has better performance than NSGA-II. In [25], the focus is on the control problem of fixed-wing UAV formations in three-dimensional space. They discussed control law convergence using a Lyapunov stability tool, proposed an adaptive disturbance observer, and used an artificial potential method to solve the collision problem. Despite its versatility, the leader-follower method has very stringent requirements for security and safety on the leader UAV. Additionally, if the leader UAV suffers serious damage, the whole formation will be destroyed. Consequently, the visual structure method is proposed. Cezary Kownacki [26] combined the navigator algorithm and virtual structure method to improve the reliability and throughput of information sharing between UAVs in formation flight. A command generation method for UAV formations was proposed in [27] to solve the problem of wind disturbance and obstacle avoidance. Using this method, position control can be improved in the presence of wind and collisions can be avoided. Behavioral control defines several basic behaviors without the involvement of human operators, including following to avoid obstacles and composing formations, etc. Fan et al. [28] proposed an adaptive null-space-based behavioral (NSB) method to deal with the inadaptability of the NSB method in a formation control situation. The method can effectively improve formation performance. Based on the idea of method behavior-based formation control, Ferik [29] designed a behavioral adaptive fuzzy-based cluster space intelligent controller, which can well solve the formation maintenance problem. Although behavior-based approaches are very flexible in dealing with obstacle avoidance problems, they can interfere destructively in a formation group, and they can be difficult to analyze mathematically.

The control algorithms of UAV formation
From the perspective of the control algorithms, the existing UAV flight controllers can be classified into linear flight control algorithms, model-based nonlinear flight control algorithms, and learning-based control algorithms. To solve the UAV formation rearrangement problem, Mahfouz [30] designed a PID controller combined with the concept of backstepping. Vlahakis [31] improved the distributed LQR control methods to solve heterogeneous UAV formation stability problems. However, these linear control approaches easily suffer from performance degradation when the UAV departs from the nominal conditions. Some nonlinear flight control algorithms have been developed to overcome the limitations of linear algorithms based on the nonlinear model of quadrotor dynamics. Zhang [32] presented a backstepping approach that could be used to accomplish the reconfiguration of Multi-UAV formations and formation keeping. To improve the computational efficiency of the traditional MPC method, a Virtual Target Guidance (VTG)-based distributed MPC scheme for formation control of multi-UAV was proposed by CAI Zhihao [33], and the numerical simulation confirms the expectations of this proposed scheme. Learning-based control algorithms are usually independent of the UAV system model. However, these algorithms need a large amount of data to train the UAV flight control system. Examples include fuzzy logic (FL), neural networks (ANN), and reinforcement learning (RL). Quesada [9] proposed an algorithm based on a fuzzy logic approach capable of guiding a robot swarm to keep a leader-follower formation. A distributed optimal control method was also proposed using the idea of reinforcement learning to address the heterogeneous multi-UAV formation trajectory tracking problem. In [10]. This method can predict the optimal control input online without any knowledge of the dynamics of other follower UAVs. Although learning-based control algorithms do not depend on the mathematical model, these methods always involve considerable uncertainty and mathematical complexity. The requirement of complexity and abundant computational resources are limitations to the use of learningbased control algorithms.

Application scenario
The multi-UAV formation reconfiguration problem can be described as follows: during a formation flight mission, the geometry shape of the UAV formation needs to change according to its environmental and mission requirements to avoid attacks or obstacles and improve the mission success rate. Meanwhile, the process of formation reconfig-uration usually needs to consider some constraints, such as the computational overhead, energy consumption, obstacle avoidance, and air disturbance. Figure 1 shows a simplified application scenario for multi-UAV formation flight and reconfiguration. In this example, both three UAVs are in landed mode at departure points A and B, respectively. Then, the procedure of multi-UAV formation reconfiguration can then be described as follows: S1: The ground station receives the mission instructions from the operators and sends these instructions to corresponding UAVs landed in departure points A and B.
S2: After receiving the mission instructions, the UAVs will fly to the target area and complete the formation assembling.
S3: The assembled UAVs will form a formation first. Then, this UAV formation will take the corresponding flight strategy flying to the mission area to execute its task (the leader-follower method is adopted in Fig. 1). S4: The UAV will perform tasks, such as traffic control, data collection, and communication assistance after entering the task area.
S5: After mission completion, all UAVs will assemble according to their mission requirements during the return process and maintain this flight formation.
Due to the possibility of UAV malfunction or damage during the execution of the mission, and changes in the environment, the UAVs' flight formations need to be changed during each of the five flight sessions. Some cases that need formation reconfiguration, such as obstacle avoidance, adjustment of formation members, and UAV malfunction, are shown at the bottom of Fig. 1.
In this paper, the AHPSODE-based RHC method is used to optimize UAVs' control input, so the reconfiguration of multi-UAV formations can take the shortest time possible. Furthermore, all constraints can be met, such as preventing the UAVs from collisions and maintaining their communication distance.

Principle of PSO algorithm
PSO algorithm was first proposed by Kennedy and Eberhart in 1995 from the perspective of social and cognitive behavior [34] and has been widely applied in the engineering field. In the standard PSO algorithm, each particle will fly in the solution space and calculate the value of the current position's objective function to determine the next move according to the optimal position of the individual particle history and the whole group's optimal position. Ultimately, the swarm, as a whole, the same as a flock of birds working together to find food, is likely to move in the direction that can obtain the best value of the target function. In the continuous space coordinate system, the mathematical description of the PSO algorithm is as follows: A group composed of m particles moves in a D-dimensional search space, and each particle has a certain speed. When the group is searching for the best position, each particle will consider the historical best position that it has searched for and the historical best position of the other particles in the group. Then, each particle will update its position and speed.
For each particle x id (t), its velocity and position update formula in d dimension ( where p id (t) is individual optimal position, p gd (t) is global optimal position, and the acceleration constant c1 and c2 are two non-negative values. These constants can help each particle self-summarize and learn from excellent individuals to approach its own historical best position and the group's global best position. c1 and c2 are usually equal to 2, and r 1 r 2 are two random number ranging from [0,1].

Principle of DE algorithm
Differential evolutionary algorithm (DE) is an evolutionary algorithm based on population differences proposed by Storn and Price [35] to solve Chebyshev polynomials. The basic idea of DE is to use the differences between the individuals of the current population to recombine the population to obtain the intermediate population and then use the competition between the parent and the offspring to obtain the next-generation population.
In the process of mutation, three individuals are chosen randomly from the population, symbolized as x r 1 , x r 2 , x r 3 , and these three individuals must be different from those selected from x i . We can then define the mutated individual v i as follows: where F is the magnification factor and usually valued from [0, 2]. Figure 2 is the mutation process diagram: Using mutated individuals v i and target individual x i , the reorganization process can be expressed as follows: where randb is a random number between [0,1], and C R is the crossover factor used to control the crossover probability. randr is a random integer between [1,D], which can be used to guarantee that at least one-dimension value of reconstituted individual u ji comes from the mutant individ-  Figure 3 shows the detail process of crossover. This recombination process can be described by Fig. 3. During selection, the offspring is determined by the fitness values of the initial individual x i and the mutant recombination individual u i . Equation (4) describes this selection principle

Receding horizon control (RHC)
Receding horizon control (RHC) is also called model predictive control (MPC). At each discrete moment of sampling, the system's current state is used as the initial condition to solve a finite-time-domain open-loop optimal control problem and obtain the optimal control sequence. For each time-domain, the cost function of local optimization is the same as global optimization. Figure 4 shows a simple generation process of control inputs' sequences through RHC. In Fig. 4, each time horizon will calculate p inputs, but only the first several m inputs can be applied to the actual system.

Approach overview
During multi-UAV formation reconfiguration, each UAV's target formation must be determined, which is equivalent to a task assignment problem. A successful assignment can reduce the overall moving distance and time cost of reconfiguring a formation. At the same time, the optimization of the controller's control inputs at every moment is also important for reconfiguring a multi-UAV formation. In this paper, the AHPSODE algorithm is proposed to optimize the target assignment problem. The control inputs of each UAV can also be optimized using an AHPSODE-based RHC controller. Figure 5 shows a complete process for multi-UAV formation reconfiguration. The process is mainly composed of three parts: reconfiguration determination, task assignment optimizing, and control input optimizing.
Reconfiguration determination In multi-UAV formation flight, formation reconfiguration is usually required when there occur changes in the external environment or mission instruction. These situations include obstacle avoidance, malfunction of formation members, adjustment of formation members, and reducing energy consumption. The UAVs can acquire external environment information through the onboard sensors to judge whether the current formation needs to reconfigure. Each UAV will transmit its current status parameters to the ground station through the MavLink protocol. Then, the ground station will determine the shape of the formation and analyze the UAV group's status information to select the leader UAV.
Optimizing task assignment After determining the UAV formation's initial state and the target position, the target assignment problem in the reconfiguration process will be solved by the discretized AHPSODE algorithm. In the target assignment's optimization process, the UAV group's matching relationship is optimized with the goal of the shortest total length. Then, the UAVs' relative coordinates, relative to the leader UAV, Xre f can be determined, where Xre f plays an essential role in optimizing the cost function.
Optimizing control input This paper uses receding horizon control (RHC) to divide the global control problem into several local optimization problems with the same objective function. The complexity of computation and the amount of computation can be reduced by reducing the number of optimized control inputs. Additionally, as the reconfiguration of multi-UAV formations is a complex nonlinear problem, the RHC control method uses existing optimization techniques to generate the optimal sequence of control inputs. Therefore, the AHPSODE algorithm proposed in this paper is adopted to quickly calculate the optimal control input sequence in each time-domain. After optimization, the control input sequence is sent to the UAV group for execution. Afterward, each UAV will provide feedback on its current status to the RHC controller. The formation reconfiguration will be terminated if the current UAV group's status information meets the termination condition. Otherwise, roll the time-domain and repeat the above optimization process until the termination condition is satisfied.

Mathematical model
In this paper, we focus on the optimization of control inputs for multi-UAV formation reconfiguration in 3D space. Therefore, in this section, we analyze the dynamic model of the quadrotor and determine the cost function. Finally, the target assignment problem is mathematically explained.

The dynamic model of UAV
The quadrotor in the experimental environment consists of four fixed-pitch rotors mounted at the four ends of a simple cross frame, as shown in Fig. 6.
where φ, θ, ψ are, respectively, roll angle, pitch angle, and yaw angle, which are the main control variables for attitude control. F1, F2, F3, F4 are the forces generated from these four motors. mg is the weight of the quadrotor, and ox yz is the body-fixed coordinate system. Transform matrix from bodyfixed coordinate (ox yz) to ground-fixed coordinate (O XY Z) is shown in Eq. (5) The lifting forces F1, F2, F3, and F4 in body-fixed coordinate (ox yz) are as follows: Then, we need to transform the lifting forces F B from the body-fixed coordinate to the ground-fixed coordinate F G by rotation matrix R According to the motion equations and Eq. (7), we can deduce the following equation: Then, we can deduce attitude change from Newton-Euler formula where I = diag(I φ , I θ , I ψ ) are, respectively, equivalent moment of inertia about the roll axis, pitch axis, and yaw axis. ω = (θ,φ,ψ) T is the torque, which consists of the torque M i generated by lifting forces. Then, according to Eq. (9), we can deduce Eq. (10) According to the mechanic principle, we can get Eq. (11) Here, we define the input variables of the UAV as follows: where U 1 , U 2 , U 3 , and U 4 are the control variables of vertical speed, roll angle, pitch angle, and yaw angle, respectively. i is the rotating speed of the ith rotor. K t is the coefficient of lifting forces; K d is the coefficient of torques. Considering Eqs. (7)-(12), the mathematical model of a quadrotor can be finally presented as Eq. (13) To simplify the problem, this paper only considers UAVs' the position in three-dimensional space, and does not consider the state change of attitude angle. Therefore, the control inputs of ith UAV can be defined as is the velocity of UAV ith under body coordinates), and the state variable can be expressed as Therefore, the dynamic equation of the ith UAV in a UAV formation can be described as follows: According to Eq. (14), we can know that if control input u i and initial state x 0 are provided, the state of ith UAV at any time t can be obtained as follows: This paper adopts the centralized control structure and the leader-follower method as our control mechanism (Fig. 7). The state vector of the leader UAV is expressed as x L . Then, the relative state of the ith UAV in the formation relative to the leader can be expressed as X re f ,i = x i − x L , and we set the initial time as t = 0 and the termination time as t = T during a formation reconfiguration process. The formation reconfiguration process in this paper can be regarded as a control optimization problem. The goal of this optimization problem is to find a set of continuous control inputs U = (u 1 , u 2 , . . . , u n ) that can achieve formation reconfiguration and make the cost function minimum. The quadratic form of the cost function is as follows: where is the vector of the relative states of the all UAV in the formation relative to the leader UAV at time t. x i (t|u i ) is the state of ith UAV at time t under the control inputs sequential u i (t). For each UAV, given initial state x i (t 0 ), the state x i (t) will be deduced by u i . The control input u i will be limited according to the performance of UAV and Q = diag(q 1 , q 2 , q 3 , q 4 ) is a positive definite matrix. d i j (t) , i, j = 1, . . . , N represents the distance between any two UAVs in the formation of UAVs, and its calculation formula is as follows: To prevent collisions within the formation during the formation of reconfiguration, it is required that d i j (t) must be greater than the safe distance D safe in the period of reconfiguration At the same time, to guarantee the communication between UAVs in formation, we need d i j (t) less than the communication distance D com between UAVs during the whole time-period of reconfiguration By combining the Eqs. (16)(17)(18)(19), a new cost function can be obtained as follows: where ω is the penalty coefficient of distance, which is usually a large constant.

Mathematical description of target assignment
Target assignment of multi-UAV formation reconfiguration problem is equivalent to a task assign problem (TAP). Stone first proposes TPA in 1997 [36], which assigns more than one processor in a distributed system and minimizes the execution cost of task allocation and communication cost under the constrained system resource condition. In this paper, we attempt to solve a target assignment problem similar to TAP, in which the main objective is to allocate the initial position of each UAV to a position in the target formation. This assignment needs to minimize the total moving distance during formation reconstruction without considering resource constraints.
The process of multi-UAVs formation reconfiguration can be divided into two stage. In the first stage, the geometry shape of target formation and its position are determined according to changes in the external environment or mission requirements. This stage also deals with the tasks assignment problem mentioned in this article The second stage is to control the movement of the UAV group to realize the change of the formation. This stage is related to the control parameter optimization problem mentioned in this paper, that is, the motion planning.
We decouple the formation reconfiguration problem into these two stages for two main reasons: 1. For multi-UAV formation reconfiguration, besides using the shortest path as the optimization goal for motion planning, it is also necessary to determine the moving targets of each UAV in the initial formation when determining the geometries of the target formation. Here is a straight- , and we expect that the initial line formation will transform into the target V formation. In this study, UAV heterogeneity is not taken into account, so both a-A b-B c-C and a-C b-A c-B can achieve the same target formation. Obviously, the total movement distance of the former assignment is smaller than that of the lat- ter. Therefore, when the number of UAVs increases, it is necessary to separate the task assignment problem from the entire formation reconfiguration. This will reduce the total movement distance of the overall formation reconstruction. 2. The process of optimizing task assignment requires some computing time and consumption, and considering the real-time requirements in the process of formation reconfiguration, we hope to reduce the time costing for optimizing control parameters as much as possible. As a result, the task assignment is decoupled from the overall reconstruction process and handled before the formation change, which can reduce the complexity of the problem.
This paper uses natural numbers encoding each group of assignment results in the target assignment problem to solve the multi-UAV formation reconstruction problem. To solve the multi-UAV formation reconstruction problem, this paper uses natural numbers to encode each group of assignment results. Suppose a UAV formation consists of N UAVs, and then, there will be n! targets assigned. Let x i be the ith target assignment result of the group, then x i can be represented as follows: where x i j represent the target position of jth UAV in the ith, which is determined by ith target assignment result.
Suppose there is a flight team made up of five UAVs, and this team needs to change its formation. In the initial formation, the serial number of UAVs is set to 1-5, and the UAV positions in the target formation are also set to 1-5. Then, a set of target assignment results x i can be expressed as 5]. Where x i j represents the assigned position in target formation of UAV j and 5 is the total number of UAVs in the flight formation. If x i = [1,3,5,4,2], then the result of the target assignment can be represented as Table 2: To minimize the total moving distance of the formation reconfiguration by target assignment, distance overhead matrix D is constructed as where D jm represents the distance between UAV j in initial formation and position m in target formation, j = 1, 2, . . . , n, m = 1, 2, . . . , n. Furthermore, according to the target allocation vector x i and the distance cost matrix D i , we can obtain the total distance Total D Therefore, the x i that can minimize the value of TotalD is the optimal target allocation result.

AHPSODE algorithm
The PSO algorithm is known for its low computational overhead and fast convergence speed, but it is easy to fall into the local optimum. Through the crossover process of the DE algorithm, it is possible to increase the amount of search space by encouraging information exchange between individuals. In addition, the mutation process can increase the diversity of the population and improve the algorithm's ability to escape from the local minimum. Therefore, the AHPSODE proposed in this paper combines the advantage of the PSO algorithm and the DE algorithm to find an optimal solution.
First, the PSO algorithm is used to update the position of the initial population. To reduce the complexity of the algorithm, this process runs 50% max iterations. Then, we set the population classification threshold as 30%, sort the particles' fitness values, and divide the population into superior and inferior group. The top 30% of excellent particles are superior groups, and the rest are inferior group. The individuals in inferior groups will be reinitialized to increase the diversity of the whole population. Then merge inferior groups and inferior group to generate a new population as the initial population for DE. In the early stage of this hybrid algorithm, the PSO algorithm can quickly get an approximate optimal solution for the advantage of fast convergence and the low computational cost. Meanwhile, the mutation operation of the DE algorithm can effectively prevent the algorithm from local optimization and improve the search precision.
Generally, w in the PSO has a great impact on the search ability of the algorithm. The w in the standard PSO algorithm is a constant value. As a result, the standard PSO cannot dynamically adjust its search capability during different periods of iteration and is easy to falling into local optimality. In the iteration process of the PSO algorithm, the distribution of the whole population gradually changes from a dispersive state to a centralized state. However, excessive concentration may result in the algorithm falling into local optimization. As a consequence, we need to enhance the global search abil-ities of the PSO when the distribution of the population is too concentrated. And we need to enhance the local search abilities when the distribution is dispersed. In other words, as population density increases, the inertial weight w needs to adjust dynamically. This article improves the inertia weight w of the standard particle swarm, which integrates the distance information between particles into w, so that w can be adjusted according to the distribution of the particle swarm adaptively.
First, define the average distance between the ith particle and other N − 1 particles as d i where N is the population size, and D is the dimension. Then, define the adaptation factor σ as follows: In Eq. (25), d g represents the average distance between of global optimal point. d max and d min represent the maximum and minimum average distances in the population, respectively. Then, according to the properties of adaptive factor σ , we construct an adaptive inertia weight W (δ) According to Eq. (26), the value of W (δ) is positively correlated with σ , which means that the larger σ is, the more massive W (δ) is, and vice versa. At the beginning of the iteration of PSO, the overall distribution of particles is scattered, which means that the σ is relatively large. Therefore, the PSO algorithm with the adaptive factor σ has better global search capabilities at the early stage of the iteration. As the algorithm converges, the position of the particle swarm is relatively concentrated, so that the σ will decrease, which can lead to the decrease of the W (δ) and increase the local search ability of the algorithm. Besides, the coefficients of W (δ) are used to adjust the transformation range, and the current coefficient can determine that the range of W (δ) is [0.4, 0.9].
In DE, the crossover probability CR has a great impact on the population's diversity. When the value of CR is large, there will be a big probability of mutation for the population, which can guarantee the population's diversity, and the algorithm is more likely to find the global optimal solution. When the value of CR is small, the population's information is easier to be inherited, which is conducive to the algorithm for stable search in the current solution space. Therefore, this for i = 1 to M do 3: x i j (t) = Minx i j + rand(0, 1) · Max x i j − Minx i j 4: v i j (t) = Minv i j + rand(0, 1) · Maxv i j − Minv i j 5: end for 6: P i = x i (Pbest) 7: end for 8: P g = min(P i ) (Gbest) 9: t ⇐ 1 10: while (| f (Gbest)| ≥ ε) or (t ≤ 0.5 * T ) do 11: for i = 1 to M do 12: (Adaptive factor of PSO and DE) 13: Calculate for j = 1 to D do 35: According to Eq. (27), then range of CR(δ) is [0.5, 0.9], and the improved DE algorithm will adaptively adjust the cross probability according to the concentration degree of population, which can improve the search accuracy of the standard DE algorithm. Algorithm (1) is the pesudo of AHPSODE.

RHC-based formation reconfiguration
RHC divides the global control problem into several local optimization problems, and these local optimization problems have the same optimization goal as the global optimization. In the kth sampling period, the state equation of the ith UAV in the UAV formation can be described as follows: The constraints for the u i (k) and x i (k) are as follows: where , v bi (k) T ∈ R 4 represents the corresponding control input, which will keep stable until the next sampling period, and T represents the sampling period's time interval. At time horizon k, the RHC controller can calculate each UAV's control input sequence in the current time horizon and the future p predicted time horizon through the UAV formation's current state and constraint conditions. These control input sequences can be expressed as u i (k|k), u i (k + 1|k), . . . , u i (k + p − 1|k). Then, the corresponding states in the next p time horizons k can be represented as Through Eq. (16) and Eqs. (28)-(30), we can obtain the quadratic cost by the following cost function at the kth time horizon: Finding a solution that minimizes the fitness function at a given time horizon k, we can obtain the optimal control input sequence u * i (k + j −1|k), j = 1, 2, . . . , p, where we apply the preceding m control input u * (k|k), u * (k + 1|k), . . . , u * (k + m − 1|k)) , (0 ≤ m ≤ p) to the corresponding UAVs in the current and following m − 1 time horizons, respectively. The final state of multi-UAV formation can then be approximated using the RHC method. The process of an RHC method can be described in Fig. 8.
Although the RHC method can divide a global control problem into a series of local optimization problems to reduce the computational complexity and the time overhead, the multi-UAV formation reconfiguration problem is essentially a constrained nonlinear problem. Therefore, it is hard to solve this problem using traditional optimization methods. Therefore, how to calculate the control law of RHC is a crucial technology. Aiming at the above problems, an adaptive hybrid intelligent algorithm AHPSODE is proposed to optimize the target assignment problem and the control inputs' sequence of RHC.

AHPSODE-based RHC controller
In this paper, we use the AHPSODE algorithm to optimize each UAV's control input to realize the reconfiguration of multi-UAV formation. Suppose there is a UAV flight formation composed of N UAVs, and the size of predicted time horizon is p. Then, the control input of ith UAV in the formation at time horizon k can be expressed as i = 1, . . . , N , j = 1, . . . , p. Since UAV's control input has four parameters, the dimensionality of the search space of the AHPSODE algorithm can be obtained as D = 4 * N * p. Furthermore, the particle x i in Fig. 9 Schematic of AHPSODE-based RHC controller the AHPSODE algorithm at time horizon k can be expressed as The schematic of the AHPSODE-based RHC controller is shown in Fig. 9. Here, X re f = x re f ,1 , . . . , x re f , . . . , x re f ,N is the relative state vector between the leader UAV and other UAVs. x ref is the reference state of the Leader UAV x L , and x ref,i represents the ith UAV's expected state relative to x L .
The target formation determination module is used to determine the leader UAV's number and the state of the target formation. All of the above information will send to the target assignment module. Subsequently, the target assignment module optimizes each UAV's moving targets through the information transmitted by the target formation determination module and the current state of the UAV group. Then, the controller will update the relative state.
After acquiring the X ref , the online optimization module will calculate the optimal control sequence U in the current time horizon and send the control input sequence to the corresponding UAV for execution. After each UAV completing the execution of control input, the UAV formation will feedback the current status to the online optimization module and repeat the above steps until the UAV team's states satisfy the termination condition. Considering the complexity of computation and the controller's timeliness, we adopted the first-order predictive control strategy in this controller. Therefore, we set p = m = 1.

Performance analysis of AHPSODE
We employed a set of 29 benchmark functions for explaining the performance of the proposed AHPSODE. The current well-known single objective optimization test suites contain   -20), and ten composition functions (F21-F30). Please refer to [18] for more details about CEC 2017. We compared the performance of AHPSODE with eight relevant algorithms, namely: SPSO, SDE, APSO, ADE, HPSOGA [14], IDE [17], and PIO [15]. The APSO algorithm and ADE algorithm are derived from SPSO and SDE using distance adaptive factors. By comparing APSO and ADE to PSO and DE, the effectiveness of the proposed distance adaptive factor can be validated. On this premise, the comparison of the AHPSODE algorithm with ADE and APSO can further demonstrate the performance of the AHPSODE algorithm. HPSOGA, IDE, and PIO algorithms are used in similar work to solve the formation reconfiguration problem, and hence, these three algorithms are also chosen to compare with AHPSODE.

Experiment setting
To have a fair comparison, all the experiments were coded in MATLAB language and were implemented on Windows 10 Pro-64 bit of a PC with 32 GB of RAM and Inter(R) Core(TM)i7-9700 CPU @ 3.0GHz.
We set the population N as 100 and the maximum generation G as 3000 for all algorithms. On the premise of retaining the theme idea of the algorithm, we will ensure that the parameters of the tested algorithm are consistent. For example, the ADE and the IDE have different improvement strategies for parameter CR, which means that our goal is to keep their range of changes for CR consistent. The detailed parameter settings are shown in Table 3.

Comparison for AHPSODE with other algorithms
In this part, the performance results on CEC 2017 at 30-D and 50-D are presented. In CEC 2017, there are a total of 30 test  functions (F2 has been excluded, because it shows unstable behavior, especially for higher dimensions). The maximum generation G for each algorithm is set as 3000. Each algorithm needs to run 15 times for a same test function at each dimension and calculate the mean value and standard deviation. Tables 4 and 5 present the statistical data for mean value and standard deviation at 30-D and 50-D, respectively. The Function Type is the collection of all test functions of the same type (e.g., Unimodal Function = F1, F3). The values in the table represent the number of times that the algorithm finds the optimal value compared with other algorithms for the same function type and dimension. In this situation, a blank cell indicates that the algorithm did not get a good result on the test function when compared to other algorithms. SUM represents the sum optimum statistics for a given algorithm over all test functions. The detailed results are shown in the Appendix, Tables 14 and 15, respectively. Table 4 shows that the APSO algorithm and the ADE algorithm are not more effective than other algorithms when handling 30-dimensional problems, but the SPSO and SDE algorithms in F7 F8 F9 F12 F13 F18 and F14 F15 obtain better solutions. However, in 30 dimensions, SDE can only get a better solution than ADE when solving F10 F14, so distance adaptive factor does have some improvement for ADE to deal with low-dimensional problems. In 50 dimensions, the mean value of APSO on the 25 test functions is better than those of SPSO. On the other hand, SPSO and SDE did not find the best solution for 50-dimensional problems, whereas APSO found a better solution for F13 and F14. Consequently, an enhanced SPSO and SDE incorporating distance adaptation factor do not have excellent performance in low dimensions. However, the distance factor has a much better effect when dealing with high dimension problems. In Table 4, we can also find that AHPSODE has achieved optimal solutions for all F1-F10 problems with 30 dimensions. This illustrates that the AHPSO algorithm is effective in solving low-dimensional unimodal and multimodal problems. In 50 dimensions, there are 22 optimal solutions to 30 test functions using AHPSODE. This shows that the AHP-SODE algorithm has a significant performance improvement compared to SPSO and SDE, especially when solving high-dimensional complex problems, the algorithm can avoid falling into local optimums and improve global search capability. Additionally, AHPSODE provides better solutions than HPSOGA IDE and SPIO algorithms when dealing with 30-dimensional problems. However, AHPSODE performs similarly to HPSOGA when solving complex problems of high dimension. Table 5 shows that although the AHPSODE algorithm proposed in this paper has good global search ability, the standard deviation is lower than that of SDE for 30-dimensional problems. This means that the AHPSODE algorithm has a small probability calculating a less desirable solution when dealing with low-dimensional problems. SPIO is significantly more stable in 30 dimensions than SDE. However, the algorithm's accuracy is poor.
The boxplots for experiment data are shown in Appendix  Figs. 19, 20, 21, 22, With the boxplots, we can intuitively visualize the calculation results of each algorithm on the specified test function. Data deviation is especially intuitive with the box plot. We also plot the convergence curves of all algorithms in different dimensions for different test functions as shown in Appendix Figs. 23, 24, 25, 26. These figures show that at the early stage of convergence, AHPSODE has a faster convergence speed than SDE. At the later stages of the algorithm, AHPSPODE has a better search accuracy than SPSO. The above data analysis demonstrates that by integrating the SPSO and SDE algorithms and including the distance adaptive factor, the AHPSODE performs much better than the other algorithms in global search.

Run-time complexity
In this subsection, we use the widely known running-time complexity criterion proposed in CEC 2017 to measure the efficiency of AHPSODE with SPSO, SDE, APSO, and ADE. In Table 6, the running-time complexity results of above algorithms on the CEC 2017 at 10-D, 30-D, and 50-D are calculated.
From Table 6, we can see that the SDE consumes more running-time than SPSO, and the running-time of the SPSO increases faster with the improvement of problem dimensions than that of DE. We also find that the running-time of the ADE and APSO is larger than that of the standard SPSO and SDE. The reason is that APSO and ADE need to calculate the distance adaptive factor at each iteration. However, this distance adaptive can improve the calculation accuracy of the SPSO and SDE. Due to the fact that the AHPSODE algorithm integrates the distance adaptive factor, the algorithm's running-time is also bigger than that of the SPSO and SDE. However, the running-time of the AHPSODE is substantially lower than that of ADE for the algorithm combines the properties of SPSO and SDE, and the running-time will not increase much as the problem dimension expands. Meanwhile, the algorithm's computational precision is superior than that of other algorithms. In Table 4, this finding has been verified.

Simulation of multi-UAV reconfiguration
In this paper, the problem of multi-UAV formation reconfiguration is decoupled into two parts. In the first part, we construct the target position determination problem in the formation change process as a task assignment problem. Then, we solve it through the AHPSODE algorithm to determine the target position of each UAV. In the second part, the control input sequence for each UAV is generated by the AHPSODEbased RHC controller to reform the target formation with the shortest moving distance. Therefore, this section will simulate the above two decoupled processes to evaluate the effectiveness of AHPSODE-based RCH controller for multi-UAV reconfiguration. All the simulation experiments were coded in MATLAB language and were implemented on Windows 10 Pro-64 bit of a PC with 32 GB of RAM and Inter(R) Core(TM)i7-9700 CPU @ 3.0GHz.

Simulation for target assignment
In this subsection, we use AHPSODE to solve the discrete task assignment problem in multi-UAV formation reconfiguration and test its performance and effectiveness. We first set up a virtual formation reconfiguration scene. In this scene, a group of 10 UAVs are randomly generated limited to a spherical sphere with (0, 0, 0) as its center and 30m as its radius. The target is to control these UAVs to form a circle forma- tion. The coordinates of the initial state and the target state are shown in Table 7.
Since the above task assignment problem is discrete, we discretize AHPSODE to solve this problem with the goal of minimizing the total moving distance. The SPSO, SDE, APSO, ADE, IDE, AHPSOGA, and SPIO algorithms are also compared to the AHPSODE algorithm in the present experiment. The parameters of the above algorithms are shown in Table 8. According to the experiment results, we find the optimal solution gbest = [4 1 5 6 8 9 7 3 10 2]. Figure 10 shows the initial position and target position of each UAV, and the solid lines in different colors correspond to gbest. Figure 11 shows the convergence curves of SPSO, SDE, APSO, ADE, IDE, PIO, HPSOGA, and AHPSODE in solving the target assignment problem of above UAV formation reconfiguration problem. The x-coordinate is the number of iterations, and the y-coordinate is the fitness value (the total moving distance).
As shown in Fig. 11a, the APSO algorithm has better convergence accuracy than the SPSO algorithm, since it is easier to jump out of the local optimum. In comparison to SDE, the ADE algorithm has a faster convergence speed, mainly because the distance adaptive factor adjusts the mutation probability of the algorithm, helping it obtain better global search ability in the early stages of algorithm iteration. Based on the aforementioned results, the distance adaptive factor Fig. 10 The optimal task assignment result of 10 UAVs in 3D space proposed in this paper can effectively improve the accuracy of the algorithm in the later iteration of the discrete PSO and discrete DE when dealing with the problem of target assignment. At the same time, according to the convergence curve, it can be found that the convergence performance of AHPSODE algorithm is better than that of ADE and APSO.
As illustrated in Fig. 11b, the convergence speed of the AHPSODE algorithm in the early stages of the iteration is slightly lower than that of SPIO, but the global convergence speed is higher than SPIO, IDE, and HPSOGA. The AHP-SODE also has better convergence accuracy than the above algorithm. As a result, the AHPSODE algorithm can not only find the optimal assignment result in this problem of multi-UAV formation reconstruction, but also provide a relatively good performance.

Simulation of AHPSODE-based RHC controller
In this part, we construct a virtual scene for multi-UAV formation reconfiguration. To solve this formation problem, the AHPSODE-based RHC controller is simulated in our experiment. Additionally, we compare AHPSODE-based RHC controller with similar methods PIO, HPSOGA, and IDE on the basis of RHC controllers. The parameters of the above test algorithm are displayed in the Table 9.
Simulation setting In our experiment scenario, there is a UAV group of 9 UAVs with the same specifications. We numbered these UAVs as 1-9, and the initial state of each UAV is shown in Table 10. Among this UAV group, U AV 5 is selected as leader UAV. The goal is to transform this initial formation into a circular formation, and we use our proposed controller to generate optimized control inputs for each UAV. The states of the target circular formation are also shown in Table 10.
Furthermore, we add several constraints on the control inputs in our experiment to ensure that the optimized control   input sequence can provide a smooth trajectory for the UAV group (Table 11).

Performance analysis of AHPSODE-based RHC controller
Based on the states of Table 10, we can get the optimal task assignment solution assign = [5 9 4 3 1 2 7 8] by the discrete AHPSODE algorithm. Then, the relative states X ref can be determined. Figure 12 shows the initial position and target position of the UAV group. The blue star indicates the initial   by the discrete AHPSODE. Figure 13 illustrates the effect of constraints on the generated trajectory. By the AHPSODEbased RHC controller, the original straight line formation can be successfully changed into the target circular formation. However, the trajectory generated without constraints is quite jagged. The convergence curve of the IDE, SPIO, HPSOGA, and AHPSODE is shown in Fig. 14. In Fig. 14, every 100 iterations represent a control time domain of 1s. We also divided the 800 iterations into the first 400 iterations (Fig. 14a) and the last 400 iterations (Fig. 14b) to better observe the conver-gence characteristics. According to the convergence curve, the controller will converge in time-domain 8, and the proposed AHPOSDE has a faster convergence speed than IDE and SPIO. It also has a higher accuracy than HPSOGA. At the same time, according to the convergence curve of each time-domain, it can be found that the AHPSODE is easy to jump out local optimum and search a new solution with better fitness. This can help AHPSODE have a bigger probability finding a better solution when dealing with real problems.
To verify the performance of the RHC method, we also compared RHC method with the CPTD method mentioned in Fig. 15 The convergence curve of AHPSODE-based RHC controller and AHPSODE-based CPTD controller [14]. We first construct an AHPSODE-based RHC controller and AHPSODE-based CPTD controller. For a fair comparison, the population size is set as 100, and all populations will update 800 times. Then, two controllers are used to solve the problem of multi-UAV formation reconfiguration mentioned in this section. Figure 15 shows the convergence curve of AHPSODE-based RHC controller and AHPSODE-based CPTD controller. Although the search accuracy of these two methods is similar at the same iteration, the CPTD method takes 28s for 800 iterations, while the RHC method only takes 7 s. Therefore, the RHC method can save more computing resources than the CPTD method. Figure 16 illustrates the optimized control inputs in eight time domains for each UAV. The results demonstrate that each UAV's control inputs in all time-domains satisfy our constraint. Figure 17 depicts the distance between UAV5 and other UAVs in each time domain, and it can be seen that UAV5 satisfies both the safe distance and communication distance requirements in each time domain.
The average fitness values and average time cost of ten repeated experiment for different algorithms in each timedomain are shown in Table 12. From Table 12, the AHPSODE method requires the least amount of time and has the highest level of computing precision compared to all other methods.
To verify the performance of our proposed method when solving large scale UAV formation reconfiguration, we set the number of population size n = 50, the max iteration time G = 50, and the GroupScale = [10, 20, 30, . . . , 100], v B = 15. As this part of the experiment investigates the time consumption of the method described in this paper in solving the reconfiguration problem of multi-UAV formations of different group scale, the initial positions are equidistant   Fig. 18. Table 13 shows the experimental results for different scales of UAV formation reconfiguration scenarios. It can be seen from Table 13 that the time consumption increases rapidly as the number of UAVs increases. Therefore, our proposed AHPSODE-based method cannot solve the real-time formation reconfiguration problem for large-scale UAV groups.
According to the above simulations, we can conclude that our proposed AHPSODE algorithm can utilize the advantages of both DE and PSO in solving optimization problems. Meanwhile, the convergence curve of different algorithms also indicates that the AHPSODE has higher search veracity than SPSO, more rapid convergence speed. Finally, the proposed AHPSODE-Based RHC Controller can successfully transform the initial UAV formation to the desired target formation in 3-D space and each UAV satisfies all constraints in the reconfiguration process.

Conclusion
In this paper, an AHPSODE-based RHC controller is proposed to solve multiple-UAV formation reconfiguration problem in a 3-D environment. The proposed AHPSODE algorithm is first used to solve the discrete target assignment problem of formation reconfiguration. Then combined with the idea of rolling time-domain control, the global control problem is decomposed into several local optimization problems, and then, the AHPSODE algorithm is used to  find the optimal solution of the local optimization problem. Through the numerical simulation experiment, it can be proved that the AHPSODE-based RHC controller has higher search precision and speed compared. At the same time, due to the decomposition of the global control problem, the complexity and computational overhead of multi-UAV reconfiguration is reduced, and the real-time performance of the controller in the formation reconfiguration task is improved. In future work, the problem of formation reconfiguration with multi-objective optimization and large-scale problem will be consider. Furthermore, the optimization of the controller within a framework of optimal control will also be considered as a further research direction to enhance the integrity and practicality of the formation reconfiguration strategy.pg