Introduction

Unmanned aerial vehicle swarm (UAVs) technology covers the design, construction, and deployment of UAVs, and a UAVs system is able to solve problems and execute tasks collaboratively in a coordinated manner [1, 2]. UAVs systems are primarily applied to exploration, patrolling, and surveillance in the contexts of earthquakes, wildfires, natural resources, and environmental monitoring [3, 4]. Current research on UAVs systems involves various aspects, such as target search [5, 6], path planning [7, 8], and task assignment between UAVs [9, 10]. The application of intelligent optimization algorithms to target search is a significant research direction. A UAVs system can outperform an individual UAV when the optimization algorithm facilitates effective cooperation between UAVs [11]. Notably, if the optimization algorithm gives the UAVs system a strong cooperative capability, complex target search tasks can be executed more efficiently and reliably.

For cooperative target search of UAVs in unknown environments, the main approaches can be categorized into scanning search methods [12], dynamic search methods [13], and intelligent optimization methods [14]. The typical scanning search method uses a small number of robots or specific devices to search the relevant areas repeatedly [15], which often leads to long search times and significant resource consumption. A Nash-combined adaptive differential evolution method has been proposed by combining the adaptive differential evolution algorithm with Nash optimization for dynamically searching targets [16], but the method consumes substantial computational resources and reduces search efficiency. Currently, the relevant research primarily focuses on swarm intelligent optimization algorithms for target search. Extensive studies have shown that swarm intelligent optimization algorithms can offer promising solutions for multi-UAV cooperative control, improving problem-solving efficiency in complex environments [17]. A particle swarm optimization algorithm with adaptive inertia weight is proposed for multi-robot target search [18]. An adaptive robot grey wolf optimization algorithm incorporates the optimal learning strategy and strategies related to inertia weight and obstacle avoidance to address cooperative target search [19]. An improved sparrow search algorithm utilizes a position update strategy for cooperative target search of multi-UAV [20]. However, these swarm intelligent optimization algorithms usually leave issues such as collisions between UAVs, low search efficiency, and local optima unaddressed.

Swarm intelligent optimization methods, mainly based on biomimicry of animal groups, require relatively complex usage conditions to be established. For instance, various behavioral parameters are defined for individuals, such as speed, inertia weight, acceleration coefficient, pheromone evaporation factor, and intensity. To achieve optimal performance, the configuration of these parameters needs to be studied thoroughly, which involves high computational costs [21, 22]. Beyond animal bionics, most plant populations in nature also have excellent survival and reproduction strategies with “intelligence” [23]. Nature Plants also published a review article on the intelligence of plants, stating that plants integrate information from various external and internal environments and thereby achieve extraordinary adaptability [24]. Botanists are generally impressed by the intricate responses of plant populations and their ability to adapt to changing environments, and these remarkable survival and reproduction characteristics further propel the advancement of plant bionic intelligence. The bean optimization algorithm (BOA) [25] is constructed by analyzing the excellent dispersal methods and population adaptive evolution strategies of plants in nature. BOA dynamically displays the process of survival, reproduction, and distribution evolution of approximately static plant populations by significantly compressing the timeline. It exhibits characteristics such as close distributed collaborative interaction and significant emergence of collective intelligence. The algorithm demonstrates the robust collective adaptive survival capability of plant populations, making it highly suitable for research on swarm robotics in target search [26, 27].
The robotic bean optimization algorithm (RBOA) is a promising BOA-based method with a straightforward structure for cooperative target search of UAVs [28,29,30]. However, the initialization efficiency of RBOA is low, and its global search capability is limited.

In this paper, a reinforced robotic bean optimization algorithm (RRBOA) is proposed based on hybrid strategies for cooperative target search of UAVs. Specifically, the hybrid strategies include a region segmentation exploration strategy for avoiding collisions between UAVs, a neutral evolution strategy for improving the global exploration capability of UAVs, and an adaptive Levy flight strategy for preventing the UAVs search from becoming trapped in local optima. RRBOA is a swarm intelligent optimization algorithm that can be effectively used in UAVs systems. The major contributions of this work are summarized as follows:

  1. An initialization mechanism of UAVs is designed based on a region segmentation exploration strategy, which involves uniformly partitioning the search area to address a concentrated distribution problem of UAVs initialization, thereby avoiding collisions between UAVs and enhancing the coverage capability of UAVs for regional exploration.

  2. A neutral evolution strategy is presented based on the spatial distribution of UAVs to make the majority of UAV individuals maintain their positions for independent exploration, avoiding the clustering of UAV individuals around the current optimal positions and enhancing the global search capability of the UAVs.

  3. A wide-area cooperative search mechanism is constructed based on an adaptive Levy flight strategy, which expands the search range of UAVs and enhances the diversity of UAVs exploration, thus avoiding the UAVs search falling into local optima.

The remainder of the paper is organized as follows. The related works on target search are reviewed in Sect. 2. In Sect. 3, the RBOA algorithm is described in detail. In Sect. 4, the proposed RRBOA algorithm is presented in detail, including its three strategies: the region segmentation exploration strategy, the neutral evolution strategy, and the adaptive Levy flight strategy. The effectiveness of RRBOA is verified by various experiments in Sect. 5. Finally, the conclusions are drawn in Sect. 6.

Related works

This paper primarily focuses on the cooperative target search problem among UAVs in unknown environments. Considering that the communication capabilities among UAVs have been well developed [31], the UAVs are assumed in this paper to have real-time processing and communication capabilities that satisfy the requirements of cooperative target search. For cooperative target search in unknown environments, although the UAVs are unaware of the specific locations of targets, they know that the search area is bounded. The search environment may contain uncertain external factors, including potential environmental threats and obstacles. For example, magnetic interference is an external environmental factor affecting the movement trajectory of UAVs. Specifically, target search algorithms should satisfy certain requirements [32]: being simple to calculate, using distributed control instead of centralized control, and avoiding long-term information sharing between UAVs. It is worth noting that swarm intelligent optimization algorithms are applied differently in various scenarios and are capable of completing various complex tasks. Indeed, swarm intelligent optimization algorithms have been widely and successfully applied in target search tasks of UAVs because of advantages such as concise structure, simplicity of implementation, and high search efficiency.

Nowadays, the most popular swarm intelligent optimization algorithms, such as those based on glowworm swarm optimization (GSO) [33] and particle swarm optimization (PSO) [34], have been applied in research on target search of robots. The performance of GSO is closely related to the communication range, and it does not exhibit an obvious advantage in the field of target search. PSO has found extensive application in addressing the target search problem of robots. Phung et al. propose a motion-coded particle swarm algorithm for dynamic target search of UAVs [35], but the algorithm is prone to getting stuck in local optima. Zhang et al. design a UAV signal source localization model based on PSO, solving the issue of signal strength decreasing as the detection distance of the UAV gradually increases with flight altitude [36]. However, the method does not address the issue of collaboration among UAVs. Garg et al. introduce a robot PSO algorithm with adaptive exploration to address multi-objective search problems [37], but its obstacle avoidance effectiveness is comparatively poor. Masoud et al. propose a classical multi-robot target search algorithm called A-RPSO, which is built upon the PSO algorithm. A-RPSO treats each particle as a robot and utilizes the improved position and velocity to update the positions of robots in the search space, introducing an obstacle avoidance mechanism to reduce collisions and incorporating an adaptive inertia weight to avoid local optima [18]. However, the effectiveness of A-RPSO relies on substantial communication among robots, and the algorithm is prone to premature convergence.

Besides, other swarm intelligent optimization algorithms have also been tailored for target search of UAVs. Cai et al. introduce an improved PSO with a potential field function for target search of multiple robots in a complex unknown environment [38], while the drawback of the algorithm is the utilization of a centralized control method. Patricia et al. apply the bat algorithm to a swarm robotics search model and propose a new robot bat algorithm [39]. Wang et al. construct an intrusion model of dynamic target, and they propose an improved bat algorithm to solve the trajectory optimization problem of UAVs for tracking the intrusive target [40]. While the bat algorithm has advantages such as fast convergence and easy implementation in addressing cooperative target search problems, there are still some issues to be resolved. For instance, the algorithm has a relatively high number of control parameters, making it prone to premature convergence and getting trapped in local optima. Tang et al. propose an adaptive robot grey wolf optimization algorithm, named RGWO [19]. In RGWO, every grey wolf represents a robot, and their movement is restricted by a maximum speed. These robots can only update their locations by moving from their initial positions. RGWO extends the original GWO algorithm [41] by incorporating the optimal learning strategy and strategies related to inertia weight and obstacle avoidance, aiming to address the cooperative target search problem for a group of robots in unknown environments. However, this algorithm is limited in the search diversity of robots, which may result in getting stuck in local optima.

Recently, some swarm intelligent optimization algorithms have been developed for target search of UAVs. Niu et al. propose an improved sand cat swarm optimization algorithm by combining the motion-encoded mechanism, the elite pooling strategy and the adaptive T-distribution to increase planning efficiency and avoid local optima, thus enhancing the effectiveness of locating moving targets with UAVs [42]. However, it fails to address issues such as UAV collisions. Fei et al. propose an improved sparrow search algorithm for target search, called ISSA [20]. In ISSA, a new cooperative framework is designed to control the observation locations of UAVs, and a position update strategy with local communication is utilized to adjust the UAVs’ positions in the search space. Nevertheless, ISSA still suffers from UAV collisions and potential local optima.

Robotic BOA

RBOA is proposed based on BOA and features efficient search ability, distributed collaboration, and emergence of population intelligence. The initialization of the robot population in RBOA involves the number of UAV individuals, the number of parent UAV individuals, the distance threshold between parent UAV individuals, the chosen evolutionary model of plant population distribution, and the parameter settings of the model. The initialization of UAVs takes the form of Eq. (1):

$$\begin{aligned} INI\left( N\right) =rand\left( N,M,d_m,G\left( X\right) ,R\right) , \end{aligned}$$
(1)

where INI denotes the random initialization of UAVs, and rand is random function. N, M, and \(d_m\) denote the total number of UAVs, the number of parent UAV individuals, and the distance threshold between two parent UAV individuals, respectively. G, X and R respectively represent the plant population distribution model, UAV position, and the search space.

As shown in Fig. 1, the spatial hierarchy structure of UAVs in the scheduling process consists of parent UAV individuals layer, temporary dispatch layer and UAV individuals layer. The parent UAV is selected from the relatively optimal individuals based on the overall condition of the UAVs in each iteration. The parent UAV individuals layer is used to calculate the UAV distribution and free motion space of the sub-UAVs layer. The UAV individuals perform the search tasks in their free motion space, updating the optimal fitness value and the corresponding position information, and then sending them to the parent UAV individuals. The temporary dispatch layer is utilized to adjust the position of the UAV individuals. The distribution formula of sub-UAVs is shown in Eqs. (2) and (3):

$$\begin{aligned}{} & {} N_{CBi}=P_i*N, \end{aligned}$$
(2)
$$\begin{aligned}{} & {} P_i=\frac{\left[ f\left( FB_i\right) \right] }{\left[ {\textstyle \sum _{i=1}^{M}}f\left( FB_i\right) +\alpha _i\right] }, \end{aligned}$$
(3)

where \(N_{CBi}\) is the number of UAV individuals deployed by parent \(UAV_i\); \(P_i\) is the proportion of UAV individuals allocated to parent \(UAV_i\); N is the total number of UAVs; M is the number of parent UAV individuals; \(FB_i\) is the current position of the parent \(UAV_i\), and \(f(FB_i)\) is the fitness function of the parent \(UAV_i\) in the current position \(FB_i\); \(\alpha _i\) is the offset of the UAV allocation proportion of the parent UAV, and its default value is zero. The UAV individuals generate new positions around their parent UAV individuals according to the chosen distribution model G, and they perform search tasks in their free motion space. Figure 2 shows the free motion space division of UAV individuals.
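As a minimal illustration of Eqs. (2) and (3), the following Python sketch computes the allocation proportions and the resulting numbers of UAV individuals for a set of parent fitness values. The function name `allocate_sub_uavs` and its arguments are hypothetical, a single offset `alpha` is assumed for all parents, and because Eq. (2) specifies no rounding, the counts are returned as real numbers.

```python
def allocate_sub_uavs(parent_fitness, n_total, alpha=0.0):
    """Return (proportions, counts) for the parent UAVs, following Eqs. (2)-(3).

    parent_fitness: f(FB_i) for each of the M parent UAVs (hypothetical input).
    n_total: total number of UAVs N.
    alpha: offset of the allocation proportion (assumed equal for all parents).
    """
    denominator = sum(parent_fitness) + alpha
    if denominator == 0:
        raise ValueError("sum of parent fitness values plus alpha must be nonzero")
    proportions = [f / denominator for f in parent_fitness]  # Eq. (3)
    counts = [p * n_total for p in proportions]              # Eq. (2), no rounding
    return proportions, counts


# Illustrative values only: three parents and twelve UAVs.
proportions, counts = allocate_sub_uavs([3.0, 2.0, 1.0], n_total=12)
print(proportions, counts)
```

With the illustrative fitness values 3, 2 and 1 and N = 12, the parents receive approximately 6, 4 and 2 UAV individuals, respectively.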

Fig. 1

Spatial hierarchy structure of UAVs in the scheduling process. The structure consists of parent UAV individuals layer, temporary dispatch layer and UAV individuals layer, where the parent UAV is selected from the relatively optimal individuals based on the overall condition of the UAVs

Fig. 2

Free motion space division of UAV individuals. The free motion space of each UAV is given by the Voronoi polygon partitioning

The free motion space of UAV individuals is divided based on Voronoi polygons, which effectively avoids collisions between UAVs during the search process. After that, each UAV individual is connected to the vertices of its polygon, and the motion trajectory points are randomly selected on these line segments. As shown in Fig. 3, let j be the number of polygon vertices corresponding to \(UAV_i\), c the number of its traversal cycles in the free motion space, and cj the number of trajectory points traversed by \(UAV_i\); the trajectory points are then denoted as \(P_{11},P_{12},...,P_{c1},...,P_{cj}\). Starting from the UAV individual IN1, the trajectory points are connected to generate the motion trajectory of IN1 in the free motion space. The generation formula of the trajectory point is shown in Eq. (4):

$$\begin{aligned} P_{cj}=ttv*INi+\left( 1-ttv\right) *P_j, \end{aligned}$$
(4)

where INi is the position of the ith UAV, \(P_j\) is the jth vertex of its polygon, and ttv is a generated random number. During its movement, each UAV individual updates the optimal fitness value and the corresponding position information and sends them to its parent UAV. After integrating the information, the parent UAV is dispatched to the position with the optimal fitness value, and the UAV individuals are dispatched again until the suspected target area is found.
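The trajectory point generation of Eq. (4) can be sketched in a few lines of Python. The sketch assumes that ttv is drawn uniformly from [0, 1), so that every point lies on the segment between the UAV and a polygon vertex, and it uses a hand-written square cell instead of a computed Voronoi polygon. The name `trajectory_points` is hypothetical.

```python
import random


def trajectory_points(uav_pos, vertices, cycles, rng=None):
    """Generate c*j trajectory points inside one free motion space via Eq. (4)."""
    rng = rng or random.Random()
    points = []
    for _ in range(cycles):                  # c traversal cycles
        for vx, vy in vertices:              # j polygon vertices P_j
            ttv = rng.random()               # assumption: uniform on [0, 1)
            x = ttv * uav_pos[0] + (1 - ttv) * vx
            y = ttv * uav_pos[1] + (1 - ttv) * vy
            points.append((x, y))
    return points


# Illustrative square cell standing in for a Voronoi polygon.
cell = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
path = trajectory_points((2.0, 2.0), cell, cycles=2, rng=random.Random(0))
print(len(path))  # 8 points: 2 cycles x 4 vertices
```

Because each point is a convex combination of the UAV position and a vertex of its own cell, the trajectory stays inside a convex cell, which is consistent with the collision avoidance role of the partition described above.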

Fig. 3

Motion trajectory of \(UAV_1\). Starting from the UAV individual IN1, the trajectory points are connected to generate the motion trajectory of IN1 in its free motion space

RRBOA algorithm

To improve the initial search efficiency of UAVs, enhance their global exploration capability, and prevent the search from falling into local optima, this paper proposes RRBOA for cooperative target search of UAVs. Based on RBOA, the proposed algorithm introduces a region segmentation exploration strategy to achieve coarse-grained full coverage of the search area when the UAVs are initialized, thus improving obstacle avoidance and global search. During evolution, the combination of neutral evolution and adaptive Levy flight improves the diversity of the UAVs search, and a fine search is added for the suspected target area, thus enhancing the exploration ability of UAVs and helping the search escape local optima.

The overall framework of RRBOA

The framework of RRBOA can be illustrated in Fig. 4. The framework diagram mainly consists of three components: initialization based on region segmentation exploration strategy, cooperative search based on neutral evolution strategy and Levy flight strategy, and the fine search component within the RBOA algorithm.

Fig. 4

Framework diagram of RRBOA. The framework starts with initialization based on the region segmentation exploration strategy. If the UAVs do not meet the conditions of fine search, the neutral evolution strategy and the Levy flight strategy are performed for cooperative search

As shown in Algorithm 1, RRBOA divides the target area evenly during initialization and assigns a separate UAV to each region for searching. The fitness values of the points traversed by each UAV are calculated, and the optimal point of each region is selected as the initial location of its UAV. The algorithm then checks whether the evenly initialized positions meet the threshold for fine search. If the fine search conditions are not met, the UAVs detection positions are generated using the neutral evolution strategy. Subsequently, the motion space of each UAV is partitioned based on Voronoi polygons, and finally, the Levy flight strategy is employed to conduct wide-ranging cooperative searches in the search space, further refining the search for the target.

Algorithm 1

RRBOA Algorithm

The region segmentation exploration strategy

For the target search task of UAVs in an unknown environment, randomly generated initial positions easily lead to low search efficiency because of the uncertainty of the environment. Therefore, this paper proposes a region segmentation exploration strategy for full coverage to enhance the initial search efficiency. In this paper, all UAV positions are projected onto the same horizontal plane to carry out cooperative search tasks. Figure 5 shows the schematic diagram of the region segmentation exploration strategy for full coverage. The specific steps are as follows:

  1. Let the search area be a polygon P with area \(L_x * L_y\). Take any edge of the polygon as the starting edge, and divide the task area equally according to the total number N of UAV individuals. The first vertex of this edge in the clockwise direction is the starting position of the first UAV, and the \({N-1}\) equal-division points are the starting positions of the remaining \({N-1}\) UAVs. Each UAV occupies one of the equally divided areas and waits at its starting position. The area occupied by each UAV is denoted as S(N) and is calculated as shown in Eq. (5):

    $$\begin{aligned} S(N)=\frac{L_y}{N}*L_x. \end{aligned}$$
    (5)
  2. Take the adjacent edge of the starting edge and divide the area occupied by each UAV into \(1*m\) discrete cells. The set of all discrete cells is \(E=\{(i,j)\mid i=1,2,...,N,\ j=1,2,...,m\}\), where (i, j) represents the jth search cell of the ith UAV. When the UAV cluster is large, taking more uniform sampling points may waste resources. Therefore, the number of discrete cells in each region is calculated as shown in Eq. (6):

    (6)

    where \( \lceil \rceil \) represents the ceiling function.

  3. m is the number of trajectory points traversed transversely by each UAV. A trajectory point is randomly generated in each grid. The UAV individual searches transversely in its divided area. Assuming that the location of IN1 is the plane coordinate origin (0, 0), the generating formula of UAV track points is shown in Eq. (7):

    $$\begin{aligned} P_{ij}=\left( \left( r+j-1\right) *\frac{L_x}{N}, \left( r+i-1\right) *\frac{L_y}{N}\right) , \end{aligned}$$
    (7)

    where r is a random number in (0, 1).

  4. Taking IN1 as an example, its motion track points are \(P_{11}\), \(P_{12}\), \(P_{13}\), \(P_{14}\), and \(P_{15}\). The trajectory of the UAV is as follows:

    $$\begin{aligned} P_{11}\rightarrow P_{12}\rightarrow P_{13}\rightarrow \dots \rightarrow P_{1m}. \end{aligned}$$
    (8)
  5. When the horizontal search is completed, the position with the optimal fitness value in each divided area is evaluated, and each UAV is then dispatched to that position in its area. The position determination formula is shown in Eq. (9):

    $$\begin{aligned} \begin{aligned}&INi=IN{P_{ij}}_{best},\\&i=1,\dots ,\text {N}, j=1,\dots ,m, \end{aligned} \end{aligned}$$
    (9)

    where \(IN{P_{ij}}_{best}\) represents the position with the best fitness value among the m trajectory points traversed by the \(UAV_i\).
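The steps above can be sketched in Python for a rectangular area. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the number of cells m is passed in directly because Eq. (6) is not reproduced here, the transverse cell width is taken as \(L_x/m\) so that the m points of a strip span the whole width (Eq. (7) as printed uses \(L_x/N\) for this coordinate), and a lower fitness value is assumed to be better. The names `region_segmentation_init` and `fitness` are hypothetical.

```python
import random


def region_segmentation_init(fitness, n_uavs, m, lx, ly, rng=None):
    """Return (S(N), starting positions) for N UAVs on an lx-by-ly area."""
    rng = rng or random.Random()
    strip_area = ly / n_uavs * lx        # Eq. (5): area S(N) of one strip
    cell_w = lx / m                      # assumption: m cells span L_x
    strip_h = ly / n_uavs                # height of one UAV's strip
    starts = []
    for i in range(1, n_uavs + 1):
        best_point, best_value = None, float("inf")
        for j in range(1, m + 1):
            r = rng.random()             # one r per point, as in Eq. (7)
            point = ((r + j - 1) * cell_w, (r + i - 1) * strip_h)
            value = fitness(point)
            if value < best_value:       # assumption: lower fitness is better
                best_point, best_value = point, value
        starts.append(best_point)        # Eq. (9): best point of strip i
    return strip_area, starts


# Illustrative fitness: squared distance to a hypothetical target at (5, 4).
def distance_to_target(p):
    return (p[0] - 5.0) ** 2 + (p[1] - 4.0) ** 2


area, starts = region_segmentation_init(distance_to_target, 4, 5, 10.0, 8.0,
                                        rng=random.Random(0))
print(area, starts)
```

Each UAV starts inside its own strip, so the initial positions cannot coincide, and the \(N*m\) fitness evaluations made here are the ones that should be counted toward the evaluation times of the initialization.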

Fig. 5

Schematic diagram of the region segmentation exploration strategy for full coverage. The initial position selection of the UAVs is guided by the region segmentation exploration strategy. Taking IN1 as an example, its motion track points are \(P_{11}\), \(P_{12}\), \(P_{13}\), \(P_{14}\), and \(P_{15}\)

The neutral evolution strategy

The key point of neutral evolution theory is that evolution is driven mainly by neutral variation rather than favorable variation, since most variations are neutral. Such variations are neither harmful nor beneficial to individual survival, and selection has no effect on them. Under no selection pressure, these neutral mutations float freely in the gene pool and become fixed in the population through random drift. Based on the neutral evolution theory, a small number of UAV individuals are dispatched around the parent UAV individuals according to a certain distribution. The positions of these UAV individuals are inferior relative to the whole UAVs, so they tend to change in their own favor and are subject to evolutionary selection. However, most UAV individuals maintain their positions, which are neutral relative to the whole UAVs: neither optimal nor disadvantageous. They are under no selection pressure and conduct free detection in the mission area. The schematic diagram of the position generation of UAVs detection based on neutral evolution is shown in Fig. 6.

Fig. 6

Position generation of UAVs detection based on neutral evolution strategy. The UAV individuals outside the circle are guided by the neutral evolution strategy for free exploration, while the UAV individuals inside the circle are allocated to the parent UAV

The specific steps for the position generation of UAVs detection based on neutral evolution strategy are as follows:

  1. After the cooperative search of UAVs, the detected information is integrated, and the system evaluates the location points with the best fitness values. The number of these location points depends on the number of parent UAV individuals. The UAV with the best fitness value is selected as the first parent UAV, and all parent UAV individuals are numbered and sorted according to their fitness values.

  2. After that, sub-UAVs are allocated to the parent UAV individuals. The allocation rule for the parent \(UAV_i\) is as follows: the number of UAV individuals allocated to it is determined by its fitness value, as shown in Eqs. (10) and (11):

    $$\begin{aligned}{} & {} N_{CBi}= \lceil P_i*N*\gamma \rceil , \end{aligned}$$
    (10)
    $$\begin{aligned}{} & {} P_i=\frac{\left[ f\left( FB_i \right) \right] }{ {\textstyle \sum _{i=1}^{M}} f\left( FB_i \right) }, \end{aligned}$$
    (11)

    where \(P_i\) is the proportion of sub-UAVs that parent \(UAV_i\) can be assigned, \(\gamma \) is the total proportion of sub-UAVs that are assigned, and M is the number of parent UAV individuals. The total number of sub-UAVs that can be allocated satisfies Eq. (12):

    $$\begin{aligned} PN= {\textstyle \sum \limits _{i=1}^{M}} N_{CBi}=N*\gamma . \end{aligned}$$
    (12)
  3. Sort the UAVs by the fitness values of their locations. The PN individuals at the bottom of the ranking form the assignable sub-UAVs set S, and the remaining UAV individuals form the randomly drifting population of the neutral evolution theory. The specific updating formula of the sub-UAVs is shown in Eq. (13):

    $$\begin{aligned} X_{ij}\left( t+1 \right) = {\left\{ \begin{array}{ll} &{} \text {} G\left( X_i\left( t \right) \right) , INi\in S \\ &{} \text {} X_j\left( t \right) , else\end{array}\right. }, \end{aligned}$$
    (13)

    where \( G\left( X_i\left( t\right) \right) \) is a new position of the \(UAV_j\) generating around its parent \(UAV_i\) according to the Gaussian distribution model; \( X_j\left( t\right) \) is the position of \(UAV_j\) at time t; \( X_i\left( t\right) \) is the position of parent \(UAV_i\) at time t; INi is the \(UAV_i\). The formulas related to the Gaussian model are shown in Eqs. (14), (15) and (16):

    $$\begin{aligned}{} & {} G\left( X \right) = \frac{1}{\delta _i\sqrt{2\pi } } exp\left( {\frac{-\left( X-\mu _i\right) ^2}{2\delta _i^2}}\right) , \end{aligned}$$
    (14)
    $$\begin{aligned}{} & {} \mu _i=X_i\left( t \right) , \end{aligned}$$
    (15)
    $$\begin{aligned}{} & {} \delta _i=d_r+\left[ \frac{d_{max}}{N_{CBi}} \right] *\alpha _{\delta _i}, \end{aligned}$$
    (16)

    where \(\delta _i\) is the dispersion degree of the new location of UAV, and \(\mu _i\) is the centralized trend of the distribution of the sub-UAVs. \(d_r\) is the minimum safe distance between UAVs. \(d_{max}\) is the boundary distance of the mission area, and \(\alpha _{\delta _i} \) is the bias of the dispersion degree of the individual UAV position distribution, whose default value is 0.

  4. After the new locations of the UAVs are determined, the free motion space is divided based on the Voronoi diagram, and the UAV cluster then performs wide-area cooperative search based on Levy flight.
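A compact Python sketch of steps 1 to 3 is given below. It is an illustration under stated assumptions rather than the authors' code: the scores are positive and a larger score is treated as better (for example, a simulated signal strength), which matches the proportional rule of Eq. (11); PN is capped at \(N-M\) so that parent UAVs are never reassigned; and a new position is sampled independently in each coordinate from a normal distribution with mean \(\mu_i\) and standard deviation \(\delta_i\). The name `neutral_evolution_step` is hypothetical.

```python
import math
import random


def neutral_evolution_step(positions, scores, n_parents, gamma, d_r, d_max,
                           alpha_delta=0.0, rng=None):
    """Reassign the worst PN UAVs around the parents; all others keep position."""
    rng = rng or random.Random()
    n = len(positions)
    if min(scores) <= 0 or not 0 < gamma <= 1:
        raise ValueError("scores must be positive and gamma must lie in (0, 1]")
    order = sorted(range(n), key=lambda k: scores[k], reverse=True)  # best first
    parents = order[:n_parents]
    total = sum(scores[k] for k in parents)
    # Eqs. (10)-(11): quota of sub-UAVs per parent, rounded up.
    quotas = [math.ceil(scores[k] / total * n * gamma) for k in parents]
    pn = min(sum(quotas), n - n_parents)   # assumption: parents are not reassigned
    assignable = order[n - pn:]            # set S: the PN lowest-ranked UAVs
    new_positions = list(positions)        # Eq. (13), "else" branch: unchanged
    pending = iter(assignable)
    for parent, quota in zip(parents, quotas):
        delta = d_r + (d_max / quota) * alpha_delta      # Eq. (16)
        px, py = positions[parent]                       # Eq. (15): mu_i
        for _ in range(quota):
            child = next(pending, None)
            if child is None:
                break
            new_positions[child] = (rng.gauss(px, delta), rng.gauss(py, delta))
    return new_positions, parents, assignable


# Illustrative swarm of ten UAVs on a diagonal with scores 1..10.
swarm = [(float(k), float(k)) for k in range(10)]
scores = [float(k + 1) for k in range(10)]
new_swarm, parents, s = neutral_evolution_step(swarm, scores, 2, 0.3, 0.5, 10.0,
                                               rng=random.Random(0))
print(parents, sorted(s))
```

In this example the two best-scored UAVs act as parents and only the four lowest-ranked individuals are moved, while the remaining UAVs keep their positions as the neutral, freely exploring part of the swarm.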

The adaptive Levy flight strategy

Levy flight is a Markov process proposed by Paul Pierre Levy. Some studies have indicated that the dispersal of plants is influenced by various factors, such as wind and water currents. This phenomenon displays the typical characteristics of Levy flight, which has been applied to the field of optimization. In this paper, the combination of RBOA and the Levy flight strategy expands the search range of UAVs and enhances the diversity of the population, so that the algorithm can more easily jump out of local optima. The UAVs perform wide-area cooperative search based on Levy flight, for which a Levy step size method with an adaptive scaling factor is designed. The adaptive scaling factor of the step size is adjusted according to the change of the optimal fitness value in the group, and the positions of the UAVs are then updated based on the newly obtained step size. The updating formulas of the UAV individual position are shown in Eqs. (17), (18) and (19):

$$\begin{aligned}{} & {} X\left( t+1 \right) =X\left( t \right) +\partial *\omega *Levy\left( \beta \right) , \end{aligned}$$
(17)
$$\begin{aligned}{} & {} \partial =\frac{\partial *\left( maxT-t*b \right) }{maxT}, \end{aligned}$$
(18)
$$\begin{aligned}{} & {} Levy\left( \beta \right) = \frac{\mu }{\nu ^{\frac{1}{\beta }}}, \end{aligned}$$
(19)

where \(\partial \) denotes the adaptive scaling factor of the Levy step size; b is a positive constant used to control the scaling rate of \(\partial \); maxT is the maximum iteration number of the algorithm; \(\omega \) is a constant related to the size of the task area; \(Levy\left( \beta \right) \) is the Levy random path; \(\beta \) is an exponential constant, \(1\le \beta \le 3\); \(\mu \) and \(\nu \) follow the normal distributions \(\mu \sim N\left( 0,\sigma _\mu ^{2}\right) \) and \(\nu \sim N\left( 0,\sigma _\nu ^{2}\right) \), respectively. \(\sigma _\mu \) and \(\sigma _\nu \) are the scale parameters of the normal distributions, and their definitions are shown in Eq. (20):

$$\begin{aligned} \sigma _\mu= & {} \left\{ \frac{\tau \left( 1+\beta \right) \sin \left( \frac{\pi \beta }{2} \right) }{\tau \left( \frac{1+\beta }{2} \right) \beta 2^\frac{\beta -1}{2} } \right\} ^\frac{1}{\beta }, \sigma _\nu =1, \end{aligned}$$
(20)

where \(\tau \) is the standard Gamma function. If the new position of a UAV individual exceeds the boundary of its free motion space, a new position for the UAV is generated between the boundary intersection point and its original position, ensuring the rationality of both motion and distribution.
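Equations (17) to (20) can be sketched in Python as follows. The sketch makes several labeled assumptions: the denominator of Eq. (19) uses \(|\nu|\), as in the commonly used Mantegna form, so that the root is real; \(\beta\) is restricted to \(0<\beta<2\), where \(\sin(\pi\beta/2)\) in Eq. (20) is positive; an independent Levy step is drawn for each coordinate; and the boundary handling described above is omitted. All function names are hypothetical.

```python
import math
import random


def levy_sigma_mu(beta):
    """Scale parameter sigma_mu of Eq. (20), with sigma_nu = 1."""
    if not 0 < beta < 2:
        raise ValueError("this sketch supports 0 < beta < 2 only")
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(beta, rng):
    """One Levy random step, Eq. (19), using |nu| (assumption)."""
    mu = rng.gauss(0.0, levy_sigma_mu(beta))
    nu = rng.gauss(0.0, 1.0)
    while nu == 0.0:                      # avoid a division by zero
        nu = rng.gauss(0.0, 1.0)
    return mu / abs(nu) ** (1 / beta)


def update_scale(scale, t, max_t, b):
    """Adaptive scaling factor of Eq. (18), applied to the previous value."""
    return scale * (max_t - t * b) / max_t


def levy_move(position, scale, omega, beta, rng=None):
    """Position update of Eq. (17), one independent step per coordinate."""
    rng = rng or random.Random()
    return tuple(x + scale * omega * levy_step(beta, rng) for x in position)


rng = random.Random(0)
scale = update_scale(1.0, t=10, max_t=100, b=1.0)   # 0.9 for these inputs
print(levy_move((2.0, 3.0), scale, omega=0.1, beta=1.5, rng=rng))
```

For \(\beta = 1.5\), `levy_sigma_mu` evaluates to roughly 0.6966. The heavy tail of the step distribution produces occasional long jumps, which is the property the strategy relies on to leave local optima.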

Simulation experiments and results analysis

Experimental descriptions

To verify the effectiveness of the proposed algorithm, this paper uses the reciprocal of function values to simulate signal strength, aiming to identify the region with the highest signal intensity. Experimental comparisons are conducted between the proposed RRBOA and three representative target search algorithms, A-RPSO, RGWO and ISSA, on distinct benchmark simulations and a pollution source search. The maximum number of iterations for all compared algorithms is set to 100. The other parameters of the compared algorithms are configured according to their corresponding references. The average evaluation times of target found and the success rate of target found over 100 repeated experiments, denoted as Et and Suc, respectively, are used as performance metrics. If the fitness value of a UAV individual is lower than the target threshold, the target is considered to be found successfully. Notably, to ensure a fair assessment, the evaluation times of RRBOA include those of the initial region segmentation exploration.
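The two metrics can be computed from per-run records as in the short Python sketch below. It assumes that Et is averaged over the successful runs only, which is one reading of "average evaluation times of target found"; the record format and the name `summarize_runs` are hypothetical.

```python
def summarize_runs(runs):
    """Return (Et, Suc) from a list of (found, evaluations) records.

    Et: mean number of fitness evaluations over runs that found the target
        (assumption: failed runs are excluded); None if no run succeeded.
    Suc: fraction of runs in which the target was found.
    """
    if not runs:
        raise ValueError("at least one run is required")
    successful = [evals for found, evals in runs if found]
    suc = len(successful) / len(runs)
    et = sum(successful) / len(successful) if successful else None
    return et, suc


# Illustrative records only, not experimental data.
et, suc = summarize_runs([(True, 40), (False, 100), (True, 60), (True, 50)])
print(et, suc)  # 50.0 0.75
```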

First, the experiments adopt nine two-dimensional benchmark functions to simulate nine different detection areas. These functions are Sphere (F1), Schwefel 2.22 (F2), Schwefel 1.2 (F3), Schwefel 2.21 (F4), Rosenbrock (F5), Schwefel 2.26 (F6), Rastrigin (F7), Ackley (F8), and Griewank (F9). Because these functions are difficult to minimize quickly, this paper employs them to assess the performance of the algorithm. The search ranges and target threshold values are specified based on the characteristics of each function. The detailed information of the benchmark functions is provided in Table 1.

Table 1 Detailed information of nine benchmark functions. The information of functions includes function name, mathematical representation, initial range, and target threshold
Fig. 7 Simulated pollution concentration distribution map and contour map. a Shows the concentration distribution based on the pollution source dispersion model defined by Eq. (21), exhibiting multiple local maxima and three global maxima. b Depicts the contour map generated by the pollution source dispersion model, and three pollution source targets are marked in the region

Table 2 Parameter settings in the pollution source search experiment
Fig. 8 Average evaluation times of target found for RRBOA performed on nine benchmark functions under different \(\gamma \)

Subsequently, this paper establishes a new pollution source search scenario, based on atmospheric pollution source dispersion models such as the Gaussian plume model, the Sutton model, the FEM3 model, and the CALPUFF model, to validate the practicality of RRBOA. The pollution source dispersion model is defined by a benchmark function with multiple extrema, as shown in Eq. (21):

$$\begin{aligned} z\left( x, y \right)&=20\exp \left\{ -0.2\left[ 0.5\times \left( x^2+y^2 \right) \right] ^{\frac{1}{2}} \right\} \nonumber \\&\quad +\exp \left\{ 0.5\times \left[ \cos \left( 2\pi x \right) + \cos \left( 2\pi y \right) \right] \right\} \nonumber \\&\quad -\exp \left( 1 \right) -6. \end{aligned}$$
(21)
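The expression in Eq. (21) can be evaluated directly, as in the sketch below. It covers only the printed single-peak expression; how the three pollution sources are combined into the final model is not given here, so that step is left out rather than guessed. The function name `concentration` is hypothetical.

```python
import math


def concentration(x, y):
    """Evaluate z(x, y) exactly as printed in Eq. (21)."""
    # Slowly decaying radial term centred at the origin.
    radial = 20.0 * math.exp(-0.2 * math.sqrt(0.5 * (x * x + y * y)))
    # Cosine modulation that creates the interference peaks.
    ripple = math.exp(0.5 * (math.cos(2.0 * math.pi * x) + math.cos(2.0 * math.pi * y)))
    return radial + ripple - math.exp(1.0) - 6.0
```

At the origin the two exponential terms reduce to 20 and exp(1), so z(0, 0) = 20 + exp(1) - exp(1) - 6 = 14, which is the highest value of the printed expression. Neighbouring integer grid points such as (1, 0) are lower local peaks.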
Table 3 Average evaluation times and success rates of target found in 100 experiments for RRBOA, A-RPSO, RGWO, and ISSA when the size of UAVs is 10
Table 4 Average evaluation times and success rates of target found in 100 experiments for RRBOA, A-RPSO, RGWO, and ISSA when the size of UAVs is 20

This function is characterized by a nearly flat region modulated by cosine waves, which forms multiple peaks with multiple local maxima and a global maximum. The resulting undulating surface simulates the influencing factors found in natural environments. When searching for pollution sources, the UAVs must be able to traverse these interference peaks to locate the pollution source. Considering the possibility of multiple pollution sources, the defined pollution source dispersion model includes three pollution sources. The pollution concentration distribution map is depicted in Fig. 7a. The positions of the three pollution sources, \(\left( 0,0\right) \), \(\left( -1,-3\right) \), and \(\left( 3,3\right) \), are marked with red pentagrams in the contour map in Fig. 7b. In Fig. 7, the actual coordinate system is mapped onto a unit coordinate system. In Fig. 7a, b, the x-axis and y-axis represent the two search directions and indicate the range of the search area, where the unit of each axis is one hundred meters. In Fig. 7a, the z-axis represents the pollutant concentration in milligrams per cubic meter (mg/m\(^3\)). The experimental parameters are listed in Table 2, which gives the upper and lower bounds of the map size, the pollution source locations, the target threshold, the population size, and the maximum number of iterations.

Table 5 Average evaluation times and success rates of target found in 100 experiments for RRBOA, A-RPSO, RGWO, and ISSA when the size of UAVs is 30
Fig. 9 Convergence curves of RRBOA, A-RPSO, RGWO, and ISSA on nine benchmark functions

Fig. 10 Positions of UAVs for RRBOA with generation 1, generation 10, generation 20, and generation 26 on benchmark function F7

Fig. 11 Positions of UAVs for A-RPSO with generation 1, generation 20, generation 40, and generation 48 on benchmark function F7

Fig. 12 Positions of UAVs for RGWO with generation 1, generation 15, generation 30, and generation 38 on benchmark function F7

Fig. 13 Positions of UAVs for ISSA with generation 1, generation 20, generation 40, and generation 45 on benchmark function F7

Experimental results on benchmark simulations

Because RRBOA is sensitive to \(\gamma \), the total proportion of sub-UAVs allocated by the parent UAV, a series of experiments is carried out to determine its optimal value. This paper conducts 100 independent experiments on the nine benchmark functions for various values of \(\gamma \) in [0.1, 0.9], with the size of UAVs set to 20, and the average evaluation times of target found are calculated to evaluate the performance of the algorithm. Figure 8 shows the average evaluation times of target found for RRBOA with different \(\gamma \). The average evaluation times required for RRBOA to find the target gradually decrease as \(\gamma \) increases from 0.1 to 0.5, and then increase as \(\gamma \) increases from 0.5 to 0.9. These results illustrate the effectiveness of the neutral evolution strategy adopted in this paper. When \(\gamma \) equals 0.5, half of the UAVs are allowed to use the neutral evolution strategy for position updates, which leads to the best performance of RRBOA in cooperative target search.

Next, RRBOA is compared with A-RPSO, RGWO, and ISSA on the benchmark simulation experiments, where the value of \(\gamma \) is 0.5 and the size of UAVs is set to 10, 20, and 30 in turn. Et and Suc on the nine benchmark functions are calculated to evaluate the performance of the algorithms. As shown in Table 3, when the size of UAVs is 10, RRBOA has the lowest Et on six benchmark functions \(\left( F1,F3,F4,F5,F6,F9\right) \). A-RPSO, RGWO, and ISSA have the minimum Et only on functions F8, F7, and F2, respectively. Suc of RRBOA is \(100\%\) on functions F1, F2, F3, F4, F8, and F9, and exceeds \(80\%\) on functions F5, F6, and F7. For F5, RRBOA achieves a success rate over \(90\%\) with approximately 350 evaluation times, while A-RPSO and RGWO each require almost 500 evaluation times to achieve a success rate over \(70\%\). ISSA performs slightly better than A-RPSO and RGWO on F5, but it still requires nearly 400 evaluation times to achieve a success rate over \(80\%\). On function F7, although RRBOA needs slightly more evaluations than RGWO, its success rate is the highest among the compared algorithms. Although the Et of RRBOA is slightly inferior to that of A-RPSO and RGWO on F2 and F8, RRBOA still achieves a \(100\%\) success rate on both functions. Among the compared algorithms, ISSA performs the best on function F2. RRBOA performs relatively well when the size of UAVs is small because it uses neutral evolution and Levy flight for cluster exploration, which strengthens the global search capability of the UAVs to some extent. Although A-RPSO, RGWO, and ISSA possess fine search capabilities and converge quickly in simple environments, they are not as effective as RRBOA in the complex environments \(\left( F5, F6, F7\right) \) because they are prone to falling into local optima.

Tables 4 and 5 present the experimental results on the benchmark simulation experiments when the size of UAVs is 20 and 30, respectively. As the size of UAVs increases, RRBOA finds the target with the minimum Et on seven benchmark functions \(\left( F1, F2, F3, F4, F5, F6, F9 \right) \). A-RPSO performs well only on F8. Although RGWO slightly outperforms RRBOA on F7, it exhibits a lower Suc. The tables show that Suc improves as the size of UAVs increases. RRBOA achieves a \(100\%\) success rate in eight experimental scenarios when the sizes of UAVs are 20 and 30, and a success rate of over \(90\%\) in the F6 scenario. For every size of UAVs, the performance of A-RPSO and RGWO is notably poor on functions F6 and F7, and the performance of ISSA is relatively poor on functions F6, F7, and F8. Thus, RRBOA obtains better overall performance than A-RPSO, RGWO, and ISSA under different sizes of UAVs. Moreover, RRBOA solves the problems with lower Et and higher Suc when the size of UAVs is 20.

Figure 9 shows the convergence curves of the four algorithms on the nine benchmark simulations, where the value of \(\gamma \) is 0.5 and the size of UAVs is 20. The horizontal coordinate is the number of iterations, and the vertical coordinate is the average optimal fitness value of each generation over 100 experiments. Figure 9a–i show that RRBOA converges faster and reaches lower average fitness values on every function except F7. On F7, although RRBOA does not converge as fast as RGWO after the middle stage, it converges the fastest in the early stage and outperforms A-RPSO and ISSA. The experimental results indicate that RRBOA exhibits good convergence performance on the nine simulation experiments, allowing for quick discovery and identification of the target. Additionally, RRBOA always achieves the best average fitness value in the first generation in all experimental scenarios, reflecting the effectiveness of the region segmentation exploration strategy adopted in the initialization.
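The vertical coordinate of such a convergence curve can be computed as sketched below. The sketch assumes that each experiment logs the best fitness of every generation and that all experiments record the same number of generations. How runs that stop early are handled is not described, so the sketch rejects unequal lengths instead of guessing. The name `average_convergence` is hypothetical.

```python
def average_convergence(histories):
    """Average the per-generation best fitness over repeated experiments.

    histories: list of runs, each a list of best fitness values per generation.
    Assumption: all runs have the same number of generations.
    """
    n_runs = len(histories)
    n_gens = len(histories[0])
    if any(len(h) != n_gens for h in histories):
        raise ValueError("all runs must record the same number of generations")
    return [sum(h[g] for h in histories) / n_runs for g in range(n_gens)]
```

For two hypothetical runs `[4, 2, 1]` and `[2, 2, 0]`, the averaged curve is `[3.0, 2.0, 0.5]`.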

Figures 10, 11, 12, and 13 show the positions of UAVs for RRBOA, A-RPSO, RGWO, and ISSA, respectively, at different generations on benchmark function F7. The red pentagram in the center marks the target area of the UAV search. The first panels of Figs. 10, 11, 12, and 13 show the initial distribution of UAVs for RRBOA, A-RPSO, RGWO, and ISSA, respectively, and indicate that RRBOA distributes the UAVs more uniformly than A-RPSO and RGWO. This highlights the advantage of the region segmentation strategy proposed in this study. As the iterations progress, A-RPSO and RGWO exhibit a distinct aggregation pattern of UAVs, which not only leads to a local optimum but also poses a risk of UAV collisions. Although ISSA does not exhibit an aggregation pattern of UAVs, it has difficulty locating the target quickly. RRBOA finds the target location when the number of iterations reaches 26, while A-RPSO, RGWO, and ISSA find the target location after 48, 38, and 45 iterations, respectively, because they may fall into local optima. For RRBOA, UAV individuals move within their respective independent regions, effectively avoiding collisions with others. Moreover, the neutral evolution strategy and the Levy flight strategy further help to prevent UAVs from getting stuck in local optima. The experimental results demonstrate the effectiveness of RRBOA in cooperative target search problems.

Experimental results on pollution source search

The experimental results of RRBOA in the simulated search for complex pollution sources are presented in Fig. 14. Once a pollution source is found, it is marked with a red pentagram in the graph. Figure 14a depicts the initial positions assigned to the UAV individuals by the region segmentation exploration strategy. Figure 14b shows that the UAVs find the first pollution source, positioned at \(\left( 0,0\right) \), in the 5th generation. In Fig. 14c, the UAVs find the second pollution source, positioned at \(\left( 3,3\right) \), in the 13th generation. Figure 14d shows that the UAVs discover the third pollution source, positioned at \(\left( -1,-3\right) \), in the 23rd generation. For RRBOA, after a pollution source is found, the UAVs can efficiently re-allocate themselves to parent UAV individuals that have not yet located a pollution source, thereby enhancing the global exploration capability of the algorithm. Figure 15 compares RRBOA and A-RPSO in terms of the average iteration times of target found in 100 experiments, with a UAVs size of 20. RRBOA finds all targets within approximately 23 iterations, while A-RPSO needs a larger number of iterations to recognize all target regions. RRBOA has a higher search efficiency than A-RPSO because the presence of multiple targets increases the probability of being trapped in local optima, a risk that RRBOA mitigates through its neutral evolution and Levy flight strategies.

Fig. 14 Positions of UAVs for RRBOA with generation 1, generation 5, generation 13, and generation 23 on pollution source search

Ablation experiment

To validate the effectiveness of the three proposed strategies, this section conducts ablation experiments on RRBOA. Different strategies are compared on the benchmark simulation experiments with 20 UAV individuals, and Et is calculated as the performance indicator. The ablation experimental results are presented in Table 6. RRBOA-1, RRBOA-2, and RRBOA-3 represent RRBOA without the region segmentation exploration strategy, RRBOA without the neutral evolution strategy, and RRBOA without the Levy flight strategy, respectively. RRBOA-1, RRBOA-2, and RRBOA-3 all have higher values of Et than RRBOA on the nine benchmark experiments. The ablation results therefore indicate that each strategy enhances the effectiveness of the UAV search, and that the combination of the three strategies gives RRBOA better performance in cooperative target search problems.

In summary, RRBOA uses the region segmentation exploration strategy, the neutral evolution strategy, and the adaptive Levy flight strategy to avoid collisions between UAVs, to improve the global exploration capability of UAVs, and to prevent UAVs from being trapped in local optima, respectively. Therefore, RRBOA achieves good performance for cooperative target search of UAVs, and each of the three proposed strategies is effective. For the two internal parameters, the suggested value of \(\gamma \) and the suggested size of UAVs in the experiments are 0.5 and 20, respectively. Moreover, the larger the size of UAVs is, the higher the average evaluation times and the success rate are. It is noteworthy that the proposed approach is designed for the static target search problem and does not take into account realistic constraints on the communication and energy consumption of UAVs.

Fig. 15 Average iteration times of target found in 100 experiments for RRBOA and A-RPSO when the size of UAVs is 20

Conclusions

This paper has proposed a reinforcement robot bean optimization algorithm, denoted as RRBOA, which aims to enhance the efficiency of cooperative target search of UAVs in unknown environments. RRBOA employs a region segmentation exploration strategy to ensure a uniform distribution of UAVs, which avoids collisions and improves the coverage capability of the UAV search. Moreover, a neutral evolution strategy is incorporated to improve the global exploration capability of UAVs. Finally, an adaptive Levy flight strategy is introduced to enhance the diversity of the UAV search and thus prevent it from converging to local optima. Experimental comparisons among RRBOA, A-RPSO, RGWO, and ISSA are conducted on nine benchmark functions and on simulated pollution source search experiments. The suggested value of the total proportion of sub-UAVs allocated by the parent UAV and the suggested size of UAVs are 0.5 and 20, respectively. Experimental results indicate that RRBOA outperforms the compared algorithms in the majority of experiments, requiring fewer evaluations and achieving a higher success rate in locating the target.

Our work still has some shortcomings in both theory and practical application. The proposed approach is designed for static target search and does not consider dynamic target search, potential environmental threats, or obstacles in unknown environments. Additionally, practical applications may involve communication constraints and energy consumption constraints in cooperative target search of UAVs. In future work, these issues can be addressed to make the proposed approach more suitable for complex scenarios. For example, the proposed approach can be improved to address cooperative target search of UAVs under actual constraint requirements.

Table 6 Average evaluation times of target found in 100 experiments for RRBOA and its variants on nine benchmark functions