1 Introduction

As the city grows, there are an increasing number of outdated buildings, making incidents like building collapses more common. Regular inspections and repair works are conducive and indispensable for building health and urban safety. These inspections typically require workers to enter dangerous areas that are risky and difficult to access. With the maturity of unmanned aerial technologies, in many emergent scenarios, unmanned aerial vehicles (UAVs) play an essential role in collecting data efficiently and safely throughout the inspection process [1,2,3,4,5,6] (such as infrared images, optical images, etc.).

For inspection with a single UAV, Phung et al. [7] formulated the building inspection path planning problem into a traveling salesman problem (TSP) and verified their method on open datasets. Tan et al. [8] proposed a building information model (BIM) and UAV-integrated method to realize automatic inspection data collection. The inspection path planning problem is solved by a genetic algorithm based on the BIM of the target building. For real-time path planning, Kuang et al. [9] proposed a real-time flight path planning method based on the reconstructed point clouds and potential coverage areas with the top view of the building as the initial model. The approach can achieve coverage of the target building.

Fig. 1
figure 1

CUHK Cheng Ming Building. A 52 m long, 14 m wide, and 16 m high building. a Physical building. b Reconstruction result

However, they concentrate on using a single UAV to complete the inspection task, which takes a long time to cover the entire building. Take the building shown in Fig. 1 as an example. It is 52 m long, 14 m wide, and 16 m high building. To achieve full coverage of this building, 870 viewpoints need to be reached, and it takes about 3 h for a single UAV to complete the whole task.

To provide fast and high-quality data for 3D building models, multiple UAVs are used for inspection and reconstruction problems to resolve task complexity and improve time efficiency. Jing et al. [10] proposed an offline approach by converting a multi-UAV inspection problem into a set-covering vehicle routing problem and used a modified genetic algorithm to solve it. In [11], Zheng et al. presented a UAV path planning methodology based on the pre-obtained rough 3D model of the building to cover the target building. The above works [12,13,14], either generated the viewpoints and path offline, or needed an initial model from pre-flight, Google earth, or open datasets for planning. However, in many real applications, the 3D model of the building and surrounding environment cannot be acquired in advance. Therefore, it becomes especially important to design an exploration method that can quickly acquire information about the target building and environment.

Recently, more and more researchers focus on using multiple UAVs to autonomously explore the unknown environment and reach their best regions. More specifically, numerous solutions tend to use cellular decomposition methods to reduce the workload for each UAV and perform a complete coverage task in real time. The most popular method is to utilize the Voronoi partition. It is tractable for a large number of robots or in large-scale environments, and beneficial to obtaining a quick and feasible solution only based on the position of robots [15, 16]. Collins et al. [17] proposed a Voronoi-based scalable approach by splitting the non-convex area for each UAV to minimize the task completion time. However, this work mainly focused on 2D environments without considering the 3D geometry of the target. The researchers further expand upon the convergence findings of the aforementioned work, transitioning from planar environments to encompass 3D scenarios [18]. However, it is important to note that these studies primarily focus on addressing the challenges posed by a static Voronoi partition, assuming prior knowledge of the environment. Regrettably, the consideration of unforeseen out-of-range obstacles occurring abruptly is seldom taken into account within these works.

To tackle these challenges, this paper presents an explore-then-exploit framework for multi-UAV cooperative building inspection and reconstruction. Building upon our previous work in [16], this framework guarantees the coverage of the target buildings, balanced workload assignment for each UAV, and realizes online data collection without a pre-obtained 3D model of the target building. The main contributions of this paper are as follows:

  • We propose an explore-then-exploit framework for multi-UAV cooperative building inspection and reconstruction, which can significantly enhance data collection quality.

  • We design a density map construction method that supplies both global and local information for real-time planning and execution. Concurrently, a dynamic Voronoi-based safe region is established to decrease the total inspection time by evenly distributing the workload among multi-UAV.

  • Simulations are conducted in various environments to show the time efficiency, robustness, and effectiveness of the proposed framework. Furthermore, our developed system, which includes a team of three UAVs, is used to physically inspect different large-scale industrial buildings. This approach considerably reduces the overall operation time, effectively minimizing potential hazards.

The rest of the paper is organized as follows. In Sect. 2, the overall framework is introduced. The design of the density map, spatial deployment, exploration, and exploitation method has been covered in Sect. 3. The real applications are described in Sect. 4. Experimental results and evaluations are provided in Sect. 5. Conclusion is given in Sect. 6.

Fig. 2
figure 2

Overview of the proposed framework

2 Explore-then-exploit framework

Existing methods for multi-UAV inspection problems can be divided into two steps. In the first step, a set of viewpoints are generated around the surface of the target facility. Then, in the second step, the shortest paths that travel through these viewpoints are founded by solving a multiple traveling salesman problem (mTSP). However, mTSP is an NP-hard (Non-deterministic polynomial hard) problem when the environment is large and complex. It takes a long time to solve mTSP with plenty of viewpoints to be optimized. Moreover, for large-scale infrastructures, even though multiple UAVs are used, each UAV needs to travel back and forth frequently, which is very time-consuming. As a result, we propose an explore-then-exploit framework to decouple this complex and time-consuming problem and solve it in two stages. This framework reduces the requirement of knowing the 3D model of the target building and the whole process can be processed online. The overall framework is shown in Fig. 2.

During the online exploration of unknown environments, a density map is updated with consideration of the building’s reconstructability. According to the information obtained from the density map and relative state information, a Voronoi-based planner is implemented to construct the safe corridor of the flight region, and UAVs cooperatively converge their corresponding working domains in a distributed way to cover the building as best as they can. This balanced workload deployment can reduce the total inspection time and improve flight efficiency. The Voronoi corridor is constructed to avoid obstacles in the movement. The advantages of performing multi-agent coverage exploration in a distributed way include robustness to agents being added to or leaving the system, less information must be shared with each other agent, and there is no central point of failure.

After completion of exploration, the UAVs have access to environmental and building information. The viewpoints for each UAV are generated within its working region. With regard to the number of viewpoints and the endurance of the batteries of each UAV, a capacity-constrained Voronoi is used to distribute the viewpoints for each flight. Therefore, a complex building inspection task is simplified into a series of TSPs with a small number of viewpoints that can be calculated online. Furthermore, in consideration of the safety constraints of the UAV, an A* planner is used to connect each viewpoint, and the distance of the trajectory is regarded as the travel cost. Eventually, the shortest trajectory with minimization of task completion time that travels through all these viewpoints can be generated by solving TSPs.

During both the exploration and exploitation stage, the online sensing and positioning module plays a significant role in providing robust localization and mapping information. We utilize a Lidar-based simultaneous localization and mapping (SLAM) [19] to calculate the local position and update the surface point cloud of the target building in real time. Euclidean distance map can be built with the acceleration of GPU [20]. Furthermore, the density map can be updated based on the surface point cloud, the reconstructability, the Euclidean distance map, and the local position. It is used to predict the distribution of viewpoints to achieve balanced workload deployment.

With the combination of the above three modules, multiple UAVs can fly autonomously in an unknown environment without the 3D model of the target building. UAVs can cover the entire building with an optimal and collision-free trajectory and reach their best region. After task assignment, UAVs can follow the generated trajectory and collect the images at specified locations and orientations. The collected images can be used in reconstruction and inspection for further applications.

3 Methodology

3.1 Density map construction

In multi-UAV region division, especially for the centroid Voronoi-based method, just using an occupancy grid map or Euclidean distance map to represent the environment is not enough. A map that represents the information about the environment is needed to guide the UAV. To reduce the complexity of tasks and improve time efficiency, the whole area needs to be divided into non-intersecting regions uniformly for each UAV, and the best region division is highly related to the viewpoints for inspection in each region. However, before getting the 3D model of the target building, the distribution of the viewpoints is unknown. As a result, a density map needs to be built to predict the distribution of the viewpoints over the environment \(\mathcal {W}\).

A common method for creating a density map involves utilizing an occupancy grid map to divide the environment into discrete spaces, each represented by a voxel that holds a predicted density value. This density value is influenced by two factors: the reconstructability and the distance to the obstacles. The distance to the obstacles can be obtained from the Euclidean distance map built by the GPU acceleration method [20]. Furthermore, the calculation of reconstructability is based on two theorems: stereo camera-based theorem [21] and light field-based theorem [22].

As shown in Fig. 3, the reconstruction of two images can be simplified into a stereo camera image matching and depth estimation process. The depth estimation error of a stereo camera is determined by the baseline between two cameras and the distance between the observed object and the cameras [23]. Based on this definition, the reconstructability can be calculated by the view correlation relationship between each viewpoint. As shown in Fig. 4. \(\alpha \) is the parallax angle between two viewpoints \(V_1\) and \(V_2\), and s is the surface point of the building. Based on the stereo camera matching theorem, the error in reconstruction grows as the distance increases and the parallax decreases. Concurrently, the capability to reconstruct points using stereo-image matching reduces when dealing with a larger parallax. As a result, we define the view correlation relationship between two viewpoints as follows:

$$\begin{aligned} c(\alpha )=\frac{1}{1+e^{-(\alpha -\alpha _1)}}\left( 1-\frac{1}{1+e^{-(\alpha -\alpha _2)}}\right) , \end{aligned}$$
(1)

where \(\alpha _1\) and \(\alpha _2\) are the lower and upper bounds of the parallax angle.

Fig. 3
figure 3

The relationship between stereo camera depth estimation and the reconstruction process

Fig. 4
figure 4

View correlation relationship

Furthermore, according to the light field theorem, as the camera gets farther from the measured object, the effective field of view of the camera becomes smaller. To determine which surface points are effect, the effective field of view needs to be calculated. We define \(\theta _{\max }\) being the maximum field of view of each viewpoint, \(d_i\) being the distance between \(V_i\) and the surface f. \(d_{\min }\) is the minimum attenuation distance and \(\gamma _\textrm{decay}\) is the decay parameter. When the distance between the viewpoint and the surface is less than the minimum attenuation distance, the effective field of view \(\theta ^{i}_\textrm{eff}\) will not attenuate. Based on the above definition, the effective field of view of each viewpoint \(V_i\) can be calculated by Eq. (2)

$$\begin{aligned} \theta ^i_\textrm{eff} = \theta _{\max }e^{\frac{{-\max }\left( d_i-d_{\min },0\right) }{\gamma _{\textrm{decay}}}}. \end{aligned}$$
(2)

After getting the effective field of view of each viewpoint, we can use the following equation to determine if the point s is in the field of view of \(V_i\):

$$\begin{aligned} \varGamma _{V_i}^{s}=\left\{ \begin{array}{ll} 1,&{} \text{ if } \big \vert {\textrm{arccos}}\left( \dfrac{\varvec{n}_s\cdot \varvec{v}_i}{\Vert \varvec{n}_s\Vert \Vert \varvec{v}_i\Vert }\right) \big \vert <\theta ^i_{\textrm{eff}},\\ 0,&{} \text{ otherwise },\\ \end{array}\right. \end{aligned}$$
(3)

where \(\varvec{n}_s\) represents the normal of point s and \(\varvec{v}_i\) represents the vector of \(V_i\) and s. As shown in Fig. 5, when the distance \(d_i\) is larger than \(d_{\min }\), the effective field of view becomes smaller as the distance becomes farther.

Fig. 5
figure 5

Light field theorem. \(s_3,s_4,\dots ,s_7\) can be seen in \(V_1\). \(s_2,s_3,\dots ,s_8\) can be seen in \(V_2\)

The density value is also related to the distance to the obstacles, because the UAV needs to maintain a safe distance from the environment. Define the distance value of each voxel as \(D_i\). Based on the above equations, the total reconstructability stored as a value on each voxel can be calculated by Eq.  (4)

$$\begin{aligned} R_i=\textstyle \sum \limits _{j=1,j\ne {i}}^{N_v}\textstyle \sum \limits _{k=1}^{N_s}\varGamma _{V_j}^{s_k}c(\alpha )+\lambda D_i, \end{aligned}$$
(4)

where \(N_v\) is the total number of voxels, \(N_s\) is the total number of surface points, and \(\lambda \) is coefficient. Each voxel coordinate corresponds to a position on a map. As a result, the density map of the environment can be represented as a function of position \(R(\varvec{p})\), \( \varvec{p}\in \mathcal {W}\) to guide the UAV inspecting the building. A sample of generated density map is shown in Fig. 6.

Fig. 6
figure 6

An illustration of density map in top view, where the voxel with high value means high reconstructability

3.2 Spatial deployment

After constructing the density map, the area coverage method needs to be designed to drive each UAV moving to its working region to achieve the best coverage of the target building. In particular, the Voronoi partition is conducive to obtaining convergence results for complete coverage problems while strictly avoiding duplicated task execution and collisions.

The Voronoi cell is used for generating non-intersecting convex regions for each UAV. For each time t, the Voronoi cell for each UAV is only determined by neighboring UAVs and the building, thus can be formed as the intersection of the following half-spaces:

  1. 1.

    \(n-1\) half-spaces separating UAV i from UAV j with the parameters of separating hyper-planes \(\varvec{a}_{ij}\) and \(b_{ij}\), where \(i,j \in \{1,\dots ,n\}\);

  2. 2.

    One half-spaces separating UAV i from target building with parameters of separating hyper-planes \(\varvec{a}_{io}\) and \(b_{io}\).

To establish a hyper-plane for inter-robot, we need to separate a safe region for UAV i, so that each point in this region is closer to the position of the UAV than to other UAVs. We can calculate \(\varvec{a}_{ij}\) and \(b_{ij}\) by finding a perpendicular line of any two positions of UAV \(\varvec{p}_{i},\varvec{p}_{j}\) using the following equations:

$$\begin{aligned} \left\{ \begin{aligned}&\varvec{a}_{i j} = \varvec{p}_{i j}=\varvec{p}_{i }-\varvec{p}_{j},\\&b_{i j} = \varvec{p}_{i j}^\textrm{T}\frac{\varvec{p}_{i}+\varvec{p}_{j}}{2}. \end{aligned}\right. \end{aligned}$$
(5)

In the context of constructing hyper-planes between the robot and the building, we regard the building as a bounded convex hull decided by a vertex vector \(\varvec{q}_o =[\varvec{q}_1,\dots , \varvec{q}_{n_o}] \in \mathbb {R}^{d\times n_o}\). The object is to minimize the length of the collision vector from the UAV to the building, and the \(\varvec{a}_{io}\) can be quickly calculated by solving the following low-dimensional optimization problem:

$$\begin{aligned} \begin{aligned} \min&~~\varvec{a}_{io}^\textrm{T}\varvec{a}_{io},\\ ~\text {s.t.}~&~~(\varvec{q}_l-\varvec{p}_i)^\textrm{T} \varvec{a}_{io} \ge 1, \ \forall l \in \{1,\dots ,n_o\}. \end{aligned} \end{aligned}$$
(6)

We then shift the hyper-plane to be tight with the building. Thus, \(b_{io} =\min {\varvec{a}_{io}^\textrm{T}\varvec{q}_o}\).

Based on the above, for a team of n UAVs localized at \(\varvec{P} = [\varvec{p}_1,\dots ,\varvec{p}_n]\) in an environment \(\mathcal {W}\), the Voronoi tessellation, \(\mathcal {V^{\textrm{r}}}(\varvec{P}) = \{\mathcal {V}_r^1,\dots ,\mathcal {V}_r^n\}\) is defined by

$$\begin{aligned} \begin{aligned} \mathcal {V}_r^i&= \left\{ \varvec{p} \in \mathcal {W} \vert \varvec{a}_{ij}^\textrm{T}\varvec{p}\le b_{ij},\forall j\ne i, i,j\in \mathcal {I},\right. \\&\quad \left. \varvec{a}_{io}^\textrm{T}\varvec{p}\le b_{io}\right\} . \end{aligned} \end{aligned}$$
(7)

As a result, each UAV can compute its own working domain region \(\mathcal {V}_r^i\) according to the relative positions of others in a distributed fashion.

To evaluate how well the structure is inspected by the UAVs, we can use the following coverage function to optimize the location of deployment based on the density map:

$$\begin{aligned} \begin{aligned} \mathcal {H}(\varvec{P}) = {\textstyle \sum \limits _{i=1}^{n} \mathcal {H}_i(\varvec{P}) =\textstyle \sum \limits _{i=1}^{n}}\int _{\mathcal {V}^{\textrm{r}}_i}\Vert \varvec{p}-\varvec{p}_i \Vert ^2 R(\varvec{p})\textrm{d}\varvec{p}. \end{aligned} \end{aligned}$$
(8)

Solving the above optimization problem, a gradient decent control policy \(\varvec{u}_i\) is designed to steer UAVs to converge to a centroidal Voronoi tessellation (CVT) [24], that is

$$\begin{aligned} \begin{aligned} \varvec{u}_i = -u_{\max }\frac{\varvec{p}_i - \varvec{C_{\mathcal {V}_r^i}}}{\Vert \varvec{p}_i - \varvec{C_{\mathcal {V}_r^i}} \Vert }, \end{aligned} \end{aligned}$$
(9)

where \(u_{\max }\) is the maximum velocity of UAV. Using the control policy in Eq. (9), Each UAV can cooperatively update its disjoint Voronoi cell according to its and neighbors’ positions at each time step. Eventually, a load-balanced separation of the workspace can be constructed. This division, which is crucial for optimal operations, ensures that the workload is distributed evenly among the available spaces. In this specific context, each UAV is assigned a distinct portion of the building to cover.

3.3 Online exploration

We now give a detailed process of our online exploration method described in Algorithm 1. In real-world applications, each agent is equipped with a sensor with limited sensing capability. The density map that reflects the information on the environment needs to be updated online. At the beginning of the inspection task, we assume that the building is a rectangle with estimated surface points \(\bar{S}\), and the exact surface points S of the building will be constantly updated as the group of UAVs moves. First, the global density map R is constructed based on the rough estimate of the building. It is formulated as a grid map and the initial value of \(R(\varvec{p})\) is defined as

$$\begin{aligned} R(\varvec{p})=\left\{ \begin{array}{ll} 1, &{} \text{ if }~~\varvec{p} \in \mathcal {W}\setminus \mathcal {B}, \\ 0, &{} \text{ if }~~\varvec{p} \in \mathcal {B},\\ \end{array} \right. \end{aligned}$$
(10)

where \(\mathcal {B}\) is the target building.

Algorithm 1
figure e

Distributed online exploration

During exploring the surface of the building at each time step, the Voronoi region \(\mathcal {V}^{\textrm{r}}_i\) for each UAV i can be constructed based on the relative positions with other UAVs and the building. Then, the control policy in Eq. (9) drives each UAV to cover the building guided by the density map. During the movement, the shape of the surface of the building is gradually updated according to the limited sensor range and the point clouds generated by Lidar SLAM. The point clouds that are in the building area \(\mathcal {B}\) are extracted to replace the initial surface points. Meanwhile, the density map can be updated based on the surface points and the relative position using Eq. (4). After several iterations of these processes, a group of agents will complete the exploration of the unknown environment. Subsequently, a spatial deployment can be formulated, aiming to offer locally optimal sensor coverage across the structure.

3.4 Exploitation

Upon completion of coverage exploration, the multi-agent system has access to environmental and structure surface information S. The online exploitation module is performed to let the UAVs collect the reconstruction and inspection data. First, \(N_p\) viewpoints are generated according to the shape of the surface of the target building. The viewpoints are generated in the normal direction of the surface with the consideration of the shooting distance and obstacles [25].

To consider the endurance of battery life during flight, we need to uniformly divide the number of viewpoints to be traveled \(N_p\) for each UAV on each flight. Partitioning \(N_p\) into m clusters will be equivalent to a problem that divides the working area of each UAV \(\mathcal {V}_r^i\) into m smaller Voronoi cells. Therefore, capacity-constrained Voronoi diagrams are introduced to solve this clustering problem.

Consider a set \(\varvec{P}_m\) of m generators determines a new Voronoi partition \(\mathcal {V}_r^i(\varvec{P}_m) =\{\mathcal {V}^{\textrm{vpt},i}_1,\dots ,\mathcal {V}^{\textrm{vpt},i}_m \}\) in the region \(\mathcal {V}_r^i,\varvec{p}\in \mathcal {V}_r^i\). Denote the capacity \(\vert \mathcal {V}^{\textrm{vpt},i}_k\vert \) of a generator \(\varvec{p}_k \in \varvec{P}_m\) with respect to its Voronoi region \(\mathcal {V}^{\textrm{vpt},i}_k \in \mathcal {V}_r^i (\varvec{P}_m)\) as follows:

$$\begin{aligned} \begin{aligned} \vert \mathcal {V}^{\textrm{vpt},i}_k\vert = \int _{\varvec{p}\in \mathcal {V}^{\textrm{vpt},i}_k} R(\varvec{p})\textrm{d}\varvec{p}. \end{aligned} \end{aligned}$$
(11)

To improve the flight efficiency and reduce the total execution time, the distribution of m generators needs to satisfy the capacity constraint \(\varvec{C}=\{c_k\vert k \in \{1,\dots , m\} \}\), where \(c_k\) is defined as

$$\begin{aligned} \begin{aligned} c_k = \frac{1}{m}\int _{\mathcal {V}_r^i }R(\varvec{p})\textrm{d}\varvec{p}. \end{aligned} \end{aligned}$$
(12)

Then, a capacity-constrained Voronoi tessellation with a capacity constraint fulfills the following equation:

$$\begin{aligned} \begin{aligned} \textstyle \sum \limits _{k}^{m}\left( \vert \mathcal {V}^{\textrm{vpt},i}_k\vert - c_k\right) = 0. \end{aligned} \end{aligned}$$
(13)

Our method first generates an initial partition that fulfills the capacity constraint for all sub-regions. Similar to the above centroid Voronoi tessellation, this partition is then optimized so that m generators are relocated to the centroids of their sub-regions, while simultaneously maintaining the capacity for each site. As a result, a capacity-constrained Voronoi partition separates the working region of each UAV into a set of sub-regions that balance the workload for the batteries and minimize the sum of the shortest distances from the viewpoints to assigned generators.

After viewpoints distribution, the \(N_p\) number of viewpoints is divided into several small subsets \(N^m_p\). For each subset, to get the shortest path that travels through all these viewpoints, we formulate it into a collision-free trajectory-based TSP. Based on the occupancy grid map, every two viewpoints are connected by A* planner [26]. A collision-free path can be extracted, and the distance of the path is the travel cost \(c_{ij}\) between every two viewpoints. As a result, it becomes an asymmetric traveling salesman problem. The formulation is shown in Eq. (14). \(v_{ij}\) represents the connection relationship between viewpoint \(v_i\) and \(v_j\). \(u_i\) and \(u_j\) are any real numbers. The objective function of this optimization problem is to minimize the total distance traveling through all these viewpoints

$$\begin{aligned} \begin{aligned} \min&\textstyle \sum \limits _{i=1}^{n} \textstyle \sum \limits _{j=1}^{n} c_{i j} v_{i j}, \\ \text{ s.t. }&\textstyle \sum \limits _{j \in \bar{N}} v_{i j}=1, \ \forall i \in N^m_p, \\&\textstyle \sum \limits _{i \in \bar{N}} v_{i j}=1, \ \forall j \in N^m_p,\\&u_{i}-u_{j}+n v_{i j} \le n-1, \ \forall i,j \in N^m_p, \\&v_{i j} \in \{0,1\}, \ \forall i,j \in N^m_p,\\&u_{i}, u_{j} \in \mathbb {R}, \ \forall i,j \in N^m_p.\\ \end{aligned} \end{aligned}$$
(14)

Because the number of each set is small, the TSP of each set can be solved online. Finally, the UAVs will follow the generated paths and collect the images of the target building.

4 Applications

To verify the performance of our coverage control within a real-time operating platform, particularly when faced with limited computational resources, we have implemented the system into a real-world inspection application. The inspection task is conducted by three fully autonomous UAVs. These UAVs are tasked with the role to inspect and reconstruct an academic building, as shown in Fig. 1.

The building exhibits a range of defects of various magnitudes, with some necessitating more in-depth inspections to ensure the safety of the surrounding infrastructure. To completely cover this building, the UAVs must access roughly 900 different viewpoints for image capture. Carrying out this task is a lengthy process for a single UAV, especially considering the vast scale of the environment. As a result, to accomplish these tasks and decrease the total time needed, it is important to utilize a spatially load-balanced deployment of multiple UAVs. This approach allows for more efficient use of resources and a significant reduction in the time required to complete the inspection.

Fig. 7
figure 7

Coverage process of our algorithm. Top row: paths of 3 UAVs taking off from a corner in an obstacles-dense environment. Bottom row: density map information

We deploy area coverage, sensing, viewpoints generation, and path generation module on these UAVs. Starting from one corner of the building, three UAVs autonomously fly to their own load-balanced areas based on the reconstructability obtained from the density map. The flight path and the generated density map are shown in Fig. 7. They coordinate with other UAVs to expand the range of perceived buildings gradually. They collaborate with other drones to progressively broaden the scope of buildings they can perceive until reaching the best regions. This figure suggests that our coverage control strategy encourages each agent to investigate the unexplored surroundings in a decentralized manner, ensuring safety throughout the dynamic coverage operation.

After reaching the best regions, as shown in Fig. 8a, the 3D point cloud of this building can be obtained and the viewpoints can be generated along the vertical direction of the surface. Furthermore, the viewpoints are divided into several sub-regions by considering the battery endurance of the UAV.

Fig. 8
figure 8

Illustration of point clouds and viewpoints

Finally, the viewpoints are connected by solving the TSP and UAVs travel all of the viewpoints in the shortest path to collect images, which can be used by AI detection. The region division result and travel paths are shown in Fig. 8b, c. The paths with different colors represent the task of each flight. 3D reconstruction [27] is conducted using these collected images to verify the effectiveness. The result shown in Fig. 9 can achieve millimeter-level accuracy. To further evaluate the generated model, we conduct a scan to BIM [28] process. The generated building information model is shown in the middle row of Fig. 9. The RGB and infrared information that reflect the found defects are shown in the right row. Despite some instability of the GPS signal and uncertainty of localization, these three UAVs successfully finished the inspection and reconstruction task using our proposed framework.

Fig. 9
figure 9

Sample of reconstruction result, building information model, and collected images

5 Experiment

To elaborate on the proposed explore-then-exploit cooperative framework, we conduct several experiments in both simulators and real scenarios. We use Gazebo as the simulation platform and Pixhawk as the flight controller. The simulation and algorithms are running on a laptop with an Intel i7-9700 CPU and 16 GB RAM. The parameters of our algorithm are set as follows. The resolution of the grid map is 1 m for all axes. To meet the real-world requirements, the maximum velocity of the agent is 2 m/s, and the time interval is chosen as \(\Delta t = 0.1\) s. We assume that each UAV is equipped with a lidar with the same range radius.

5.1 Validate time efficiency

To show the real-time performance of our proposed work, we select mTSP as the benchmark and conduct the experiments on a large-scale environment with different numbers of UAVs, as shown in Fig. 10. The scene is a school with 10 m high, 20 m wide, and 10 m long. We use the calculation time (CT) per re-planning iteration and total inspection time (IT) as the evaluation metrics. For a fair comparison, because the mTSP can only work in known environments, we assume that the 3D model of the building is known, and mTSP has converged. As shown in Table 1, CT per iteration for each UAV in our method remains the same basically, which shows that our method is intractable for a large-scale multi-UAV system and satisfied with real-time requirements. On the other hand, as the number of coordinated UAVs becomes larger, IT has decreased dramatically shorter than the mTSP method. This is because our method distributes the multi-UAV into more workload-balanced working regions. It is observed that compared with others, our proposed work gains higher time efficiency satisfying the real-time requirements.

Table 1 Scalability analysis
Fig. 10
figure 10

Simulation environments

Fig. 11
figure 11

Coverage paths of 5 UAVs taking off from a random initial configuration. One of the UAVs (light blue) suddenly failed and others dynamically adjust the responsible area

5.2 Validate robustness

We continue to assess how well our proposed system handles a situation where a single UAV fails during the mission. To check how broadly our algorithm can be applied, we run it in a different, messy environment. We keep the dimensions of this environment and the target building identical to those used earlier. The obstacles are scattered randomly and come from different heights. As depicted in Fig. 11, five agents, beginning from random starting points, collaboratively aim to cover the building extensively, guided by the reconstructability data from the density map. When the light blue agent becomes non-functional at the 7.6 s mark, its nearby agents (the dark blue and red ones) adjust their operational areas to make up for the coverage shortfall due to the faulty drone. Ultimately, all agents manage to fully cover the building by employing our distributed spatial deployment algorithm, achieving this at the 14.6 s mark. It is noticeable that even with a UAV failure mid-flight, the remaining drones can still successfully complete the mission by adapting to the changes in the information density map.

6 Conclusion

This paper proposes an explore-then-exploit multi-UAV cooperative framework for building inspection and reconstruction. The proposed framework guarantees the coverage of the target buildings, balanced workload assignment for each UAV, and realizes online data collection without a pre-obtained 3D model of the target building. Despite an increase in obstacle density, the computational time of our suggested algorithm remains consistent, and the collision avoidance technique guarantees a safe region. We have effectively carried out a comprehensive inspection of a large industrial building using the system we developed. The high-quality images gathered by the multi-agent system can help property managers identify any structural issues that need to be addressed. The system we have proposed holds substantial value for the maintenance and management of industrial facilities. Furthermore, the comparison results show that our method has higher time efficiency compared with other method and the experiments verify the effectiveness of our method in different large-scale environments.