Introduction

Unmanned aerial vehicles (UAVs), informally known as drones, have been receiving increasing interest in academic research as well as industrial applications due to the increase of sensing and compute capabilities and the decrease of their form factors and price. In addition, the development of several technologies such as on-board intelligence and autonomous capabilities has increased the utilization of UAV systems in more applications. As a result, different types of unmanned aerial platforms are now being used in applications such as aerial photography and remote sensing [1], infrastructure inspection [2, 3], payload transportation [4], precision agriculture [5], surveillance [6], and search and rescue missions [7], to name a few.

Small and light-weight UAVs, also known as micro aerial vehicles (MAVs), have further stretched the boundaries of their applications. In addition to their portability, smaller UAVs are more agile, which allows them to navigate through narrow environments, and they cause less damage to their surroundings due to their light weight. The small size of MAVs, however, limits their flight time, on-board sensing and compute power, and payload, which, as a result, significantly reduces the number of tasks that they can perform individually. This has motivated the development of aerial swarms [8•, 9•] in which multiple UAVs cooperate in large teams to overcome the limitations of the individual robots. Robot swarms are envisioned to be fully distributed systems where each robot observes its local neighboring environment and coordinates with other robots to execute local actions that collectively lead to achieving an overall swarm goal. Indeed, this is a complex multi-disciplinary system that requires tight integration of multiple subsystems such as global and relative localization, safe trajectory planning, and swarm-level task coordination.

In this article, we review the main motivating applications of aerial swarms in the section "Overview of Aerial Swarm Applications" and discuss the core elements of an aerial swarm system in the section "Core Elements of a Swarm System". In the section "Towards a Generalized Swarm System Architecture", we share some thoughts that can help advance swarm systems towards real-world applications, together with a proposed modular system architecture. Finally, our conclusions are presented in the "Conclusions" section.

Overview of Aerial Swarm Applications

Although single-UAV systems have been heavily studied in the literature and already have numerous applications, as mentioned in the introduction, they have limitations that can be overcome by using a group of aerial vehicles. These applications are discussed in this section.

Entertainment

Aerial drone shows are considered one of the most successful entertainment applications of UAV swarms. In 2016, Intel set the first Guinness world record for the most UAVs airborne simultaneously with a formation of 100 drones equipped with LEDs (Footnote 1), followed by another record of 500 drones in the same year (Footnote 2). Ehang, a Chinese company, set new records by performing a light show with 1000 drones in 2017 (Footnote 3), followed by another show with 1374 drones in 2018 (Footnote 4). The latest record for the largest drone show was set by Intel in 2018 with 2018 drones (Footnote 5).

In such performances, each drone is usually a simple UAV platform (e.g., a quadcopter) equipped with an on-board flight controller, GPS sensor for positioning, customized LEDs, and a communication module to communicate with the ground station. The ground station is used to pre-compute the required individual missions (collision-free trajectories in open 3D space) of all drones during the show. Then, each mission is uploaded to the corresponding drone which is executed by the on-board flight controller. The ground station also continuously monitors the swarm status during the show and provides controls for any required emergency actions.

Computing the individual missions involves a trade-off between the real-time performance of an algorithm and its optimality [10]. For instance, one line of work presented a method of multi-robot goal assignment based on the linear sum assignment algorithm, together with time-parameterized collision-free trajectories in obstacle-free space. This work was extended by the authors in [11] to further optimize the shape, scale, and location of the goal formation.
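The goal assignment step mentioned above can be illustrated with a small sketch. It assumes that the cost of sending a drone to a goal is the straight-line distance, and it enumerates all permutations instead of using a dedicated linear sum assignment solver, so it is only practical for a handful of drones. The function name `assign_goals` and the coordinates are hypothetical and are not taken from the cited works.

```python
from itertools import permutations
from math import dist


def assign_goals(starts, goals):
    """Return (assignment, total_cost) minimizing the summed start-to-goal distance.

    assignment[i] is the index of the goal given to drone i.
    Brute force over all permutations (assumption: very small teams only).
    """
    best_cost = float("inf")
    best_perm = None
    for perm in permutations(range(len(goals))):
        cost = sum(dist(starts[i], goals[g]) for i, g in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


# Hypothetical take-off positions and show-formation goals (x, y, z) in meters.
starts = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
goals = [(10.0, 1.0, 5.0), (0.0, 9.0, 5.0), (1.0, 0.0, 5.0)]
assignment, total = assign_goals(starts, goals)
print(assignment, round(total, 3))
```

For larger teams, a polynomial-time solver such as the Hungarian algorithm would replace the enumeration; the objective stays the same.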

There are also research works that extend conventional aerial choreography to allow user interaction. For example, [12, 13] present methods that translate online user commands into real-time feasible behaviors of quadrotor groups in theatrical performances.

Security and Surveillance

It is only a matter of time before drones are deployed at scale for industrial and commercial security around the world. Due to the rapid advances in drone-related and AI technologies and the noticeable decrease in price, several off-the-shelf drones can now easily be integrated with on-board monitoring sensors such as RGB and thermal cameras for aerial monitoring and surveillance. Surveillance drones offer several advantages over traditional methods (fixed CCTV cameras and human patrols), such as extending the monitored areas and reducing the risks and costs associated with human patrols. Single drones have already been used as electronic eyes in manual and automated operations. Drone swarms extend these advantages further, as they can cover much larger areas in shorter times. With on-board intelligence, UAVs are not only able to collect information about automatically detected and identified objects, but are also able, as a fleet, to self-coordinate their tasks and collaborate in order to accomplish surveillance tasks such as optimal area coverage given vehicle and sensor constraints [14, 15], convoy protection for a group of ground vehicles [16], and persistent surveillance [17, 18].

The rapid advances in UAV technologies, along with their affordability, come with increasing risks of malicious and unauthorized use. There are over 250 recorded UAV incidents all over the world, ranging from unauthorized UAV use over private properties to drone attacks on critical assets [19].

Several technologies have been developed to help provide counter-drone solutions that are able to detect unauthorized drone operations and provide mitigation methods. A counter-drone system usually consists of two main subsystems: UAV detection and tracking, and mitigation or neutralization. Detection and tracking can be accomplished by individual sensor technologies such as radio frequency (RF) detection and spoofing, RADAR, optical sensors (RGB and thermal cameras), and acoustic sensors, and can also be done by fusing the aforementioned technologies in order to maximize detection and tracking accuracy. Mitigation and neutralization can be performed using GPS jamming, RF hijacking, high-power laser shooting, or a drone system carrying a net gun for intruder capturing and retrieval. In the case of multi-UAV threats, the previous technologies become less efficient. Therefore, there has been increasing interest in developing multi-UAV pursuit systems.

Although swarm-based anti-UAV systems have not been commercially realized yet, there are several research efforts in that regard. For example, multi-agent pursuit-evasion games [20] are a common way to describe such problems. Research works approach this problem within three main frameworks: game-theoretic [21], optimization-based [22], and reinforcement learning [23]. Another important aspect of such problems is the decision-making architecture, which can be centralized or decentralized. In centralized control settings, a single control station is responsible for computing actions for all agents, which are then communicated to the agents for execution. In decentralized control settings, by contrast, robot actions are computed on-board, which avoids the single-point-of-failure issue of the centralized settings. Most of the research results on this problem are based on simulations. Only a few research works, such as [24, 25], have reported experimental results and test beds using real multi-UAV systems.

UAV groups are also useful in surveillance applications where large areas can be searched and covered in shorter times compared to single UAV use. Examples of research works in this area can be found in [26] where a full system of multiple quadrotors capable of planning an autonomous surveillance mission is presented, and in [6] where authors presented a multi-UAV system for optimal sensor coverage and with vision-based relative localization.

Collaborative Transportation

Aerial payload transportation has been receiving increasing attention, especially in the logistics sector for package delivery applications (Footnote 6). Several companies have already used single UAVs for small package delivery, such as Flirtey with its first FAA-approved autonomous urban drone delivery in the USA in 2016 (Footnote 7), Amazon [27] making its first delivery using a drone in the UK in 2016, and a Wingcopter drone delivering COVID-19 test kits in Scotland in 2020 (Footnote 8), to name a few. Currently, single UAVs can carry only relatively small packages due to payload and energy constraints. As a result, heavier payloads would require a larger and heavier UAV that is difficult to deploy due to safety and regulation constraints.

On the other hand, larger payloads can be transported using a group of small UAVs, which has been demonstrated in several research works. For example, an early research work in [28] presented methods of controlling a group of quadrotors to grasp and transport a rigidly attached payload with known mass. The presented control laws of each quadrotor are decentralized, given that each quadrotor knows its fixed relative position and orientation with respect to the body and the payload goal in terms of hover position or desired trajectory. Although the controllers were decentralized, the required state estimation of quadrotor positions and velocities was done in a centralized way using an overhead motion capture system. For transporting a certain class of flexible structures (e.g., a flexible ring) using multiple UAVs, with the structure still rigidly attached to the UAVs, the authors in [29] presented methods to estimate payload deformation and stabilize it in 3D using a centralized LQR controller. The quadrotor positions and velocities were also accurately provided by a motion capture system.

Most of the works in this area present experimental results that leverage the accurate estimation of translational and rotational states provided by overhead motion capture systems. Motion capture systems are usually suitable for indoor settings such as laboratories. In outdoor settings, however, state estimation of full six degree-of-freedom rigid body models is more difficult. With the advances in onboard compute power, several visual- and inertial-based state estimation methods have been developed which provide good estimation performance for real outdoor applications, including multi-UAV payload transportation. For example, the authors in [30] presented a system of multiple small quadrotors that carry an attached rigid rod and navigate using only an onboard RGB camera and an IMU sensor.

Environmental Monitoring

There are potential advantages in using groups of UAVs in environmental monitoring applications. For example, the authors in [31] presented a multi-UAV system for real-time flood monitoring and tracking, a task that conventional forecasting methods usually cannot accomplish accurately. Each UAV carries a set of disposable sensors that are carried along by flood streams and communicate with the UAVs so that flood direction and velocity can be estimated. Other research works related to environmental monitoring, such as pollution level monitoring and assessment of forest environments, can be found in [32, 33].

Flying Cellular Networks

Although using UAVs for image and video acquisition is currently the most popular application, developing the so-called flying cellular networks [34] has been receiving increasing attention. On one hand, UAVs can be equipped with cellular communication modules in order to extend their operation range, therefore significantly improving their service. On the other hand, UAVs offer a unique opportunity to deploy flying base stations that can be dynamically located in 3D in order to boost coverage and optimize user experience [35].

One of the main challenges for this kind of application is the UAV’s limited endurance, as a typical electric UAV would require recharging every hour or so. This problem can be overcome by utilizing tethered UAVs (TUAVs), see [36] and [37]. A TUAV receives continuous power and high-bandwidth communication through a tether connected to a base station. Interestingly, tethered drones for this application outperform free-flying UAVs, especially for 5G networks, as 5G equipment is heavier and consumes more power than 4G equipment.

Several organizations have made efforts into realizing this potential application. For example, in 2014, DARPA, one of the research arms of the US military, announced the establishment of a program called Hotspot to develop a swarm of drones that could provide one gigabit per second communications for troops operating in remote areas [38]. Also, Google is working on a project called SkyBender that experiments with a group of solar-powered UAVs to test millimeter-wave radio transmissions, a technology that could theoretically transmit gigabits of data every second, up to 40 times more than today’s 4G LTE systems [39].

Core Elements of a Swarm System

A group of robots can exhibit a swarm behavior by integrating coordination mechanisms in their controls. The multi-agent systems literature provides numerous tools and algorithms for coordinated motion, including formation control [40], consensus [41], rendezvous [42], and flocking [43]. For instance, Reynolds simulated the flocking behavior at the individual level with three rules: collision avoidance (staying out of the collision zone of the nearby peers), velocity matching (achieving the same velocity in the limit), and flock centering (moving cohesively) [44]. By integrating environmental factors into the model, more comprehensive swarm models can be realized. Coordination methods can be thought of as tools for a general module of swarm-level mission planning. Swarm missions can be formation control, pursuit-evasion problems, and optimal coverage, to name a few. Swarm mission planning requires another module for swarm-level state estimation in general and localization in particular. In this section, swarm localization and planning methods are discussed.
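The three flocking rules described above can be sketched for a single agent as follows. This is a minimal 2D illustration, not Reynolds' original formulation: it assumes that every other agent is a neighbor, and the gains `k_sep`, `k_align`, `k_coh` and the separation radius `r_sep` are hypothetical tuning parameters.

```python
def flocking_update(i, positions, velocities, r_sep=1.0,
                    k_sep=1.0, k_align=0.5, k_coh=0.2):
    """Velocity correction for agent i from the three flocking rules (2D)."""
    px, py = positions[i]
    vx, vy = velocities[i]
    others = [j for j in range(len(positions)) if j != i]
    if not others:
        return (0.0, 0.0)

    # Rule 1: collision avoidance, push away from peers closer than r_sep.
    sep_x = sep_y = 0.0
    for j in others:
        dx, dy = px - positions[j][0], py - positions[j][1]
        d = (dx * dx + dy * dy) ** 0.5
        if 0.0 < d < r_sep:
            sep_x += (dx / d) * (r_sep - d)
            sep_y += (dy / d) * (r_sep - d)

    m = len(others)
    # Rule 2: velocity matching, steer toward the mean peer velocity.
    align_x = sum(velocities[j][0] for j in others) / m - vx
    align_y = sum(velocities[j][1] for j in others) / m - vy
    # Rule 3: flock centering, steer toward the mean peer position.
    coh_x = sum(positions[j][0] for j in others) / m - px
    coh_y = sum(positions[j][1] for j in others) / m - py

    return (k_sep * sep_x + k_align * align_x + k_coh * coh_x,
            k_sep * sep_y + k_align * align_y + k_coh * coh_y)


same_speed = [(1.0, 0.0), (1.0, 0.0)]
# Far apart: only flock centering is active, pulling agent 0 toward agent 1.
far = flocking_update(0, [(0.0, 0.0), (4.0, 0.0)], same_speed)
# Too close: collision avoidance dominates and pushes agent 0 away.
near = flocking_update(0, [(0.0, 0.0), (0.5, 0.0)], same_speed)
```

In practice the neighbor set would be limited to agents within sensing or communication range, which is what makes the rules local.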

State Estimation and Localization

To apply coordination algorithms on robot swarms, each robot must possess a sense of situational awareness by perceiving its environment continuously. In aerial robot swarms, this requirement corresponds to acquiring the state variables of neighbor robots such as position, velocity, and attitude. However, the exchange of such data among swarm members entails designing localization and communication mechanisms and may lead to a high computational demand as the swarm size increases.

Localization refers to estimating a robot’s position in a given map of the environment. As the robot travels in the environment, it associates the collected sensory data with the possible locations in the free space of the map. Since this method leads to multiple hypotheses, Bayesian filtering methods are usually applied to estimate the robot position. Although single-robot localization is well understood as one of the main research fields in robotics, only preliminary results have been proposed for multi-robot localization to date. The multi-robot localization objective has unique demands. First, one is interested in the relative quantities among the robots instead of the positions of the individual robots. For instance, formation control algorithms employ relative positions between robots to achieve a desired formation shape. Second, the search map is usually the complete 2D or 3D space in a multi-robot setting as opposed to a constrained map. Third, the multi-robot localization setting usually lacks a reference landmark and is treated independently of the environment. These unique challenges emphasize the complexity of the problem and call for advanced techniques beyond the classical localization solutions.

In indoor environments, motion capture (mocap) systems can provide a precise localization solution for multi-robot systems (Fig. 1-left). In this approach, a set of infrared camera modules connected to a ground station calculates and broadcasts the positions of all robots in the system’s coverage area. Therefore, they enable the demonstration of swarm coordination and intelligence by the use of centralized algorithms [45, 46]. From a practical perspective, distributed coordination algorithms offer more flexible, robust, and resilient robot swarm realizations compared to the centralized approaches. Since such algorithms are based on local interactions in the corresponding swarm graph, they impose onboard sensing and communication capabilities on each individual robot. Although a mocap system can be employed to mimic distributed implementations, e.g., by restricting the information flow according to the distributed rules, its immobile infrastructure restricts the swarm’s freedom. Essentially, aerial robot swarms are expected to function in unstructured environments as well.

Fig. 1

(Left) Indoor localization solution: A set of motion capture cameras or wireless devices compute and broadcast the drones’ positions via a ground station. (Right) Outdoor localization solution: Each drone fuses several sensor measurements such as GPS, camera, and ultrawideband to compute the relative positions to its neighbor drones

In outdoor environments, a naive attempt would be employing onboard GPS sensors for swarm coordination. However, most commercial GPS sensors provide absolute position data within three meters accuracy, which may not be sufficient for operations where the drones fly close by. Thus, either wireless communication among swarm members [47] or onboard sensor fusion [48] can be used to improve the positioning accuracy. Besides, the swarm must behave robustly in case of GPS signal degradation, e.g., in tunnels or buildings. Therefore, swarm robots should have the onboard capabilities to handle such challenging scenarios (Fig. 1-Right).

With the ultimate goal of providing a flexible localization solution, researchers have designed several onboard localization frameworks for aerial swarms. Onboard localization frameworks can be divided into two categories: distance-based and vision-based. Vision-based architectures rely on onboard camera sensors to detect the neighbor robots [49,50,51,52]. A common practice is to place patterns or ultraviolet lights on the robots that can be detected by basic computer vision algorithms. Recently, with the significant developments in the speed and performance of computational boards, embedding computationally demanding deep learning algorithms in small boards on drones has become possible. Preliminary results were demonstrated on a two-drone system integrating the you only look once (YOLO) algorithm in [53, 54]. Although remarkable performance was obtained with the vision-based approaches, vision-only algorithms have structural drawbacks. For instance, the swarm configuration is constrained by the cameras’ fields of view, which requires maintaining a suitable configuration during operation. Furthermore, the detection performance is sensitive to ambient conditions and, for the learning-based detection approaches, to the size of the training dataset.

Collaborative simultaneous localization and mapping (C-SLAM) algorithms can provide a viable coordination mechanism if a reliable communication framework among swarm members is guaranteed [55]. They achieve both the localization and mapping objectives in a swarm at the expense of additional computation and communication burden. The fundamental requirement of continuous communication with a central station poses a challenge in real-world implementations, which was recently addressed in [56].

Distance-based onboard localization architectures rely on the inter-robot distances, which can be acquired by wireless communication devices such as ultrawideband (UWB), radio frequency, or Bluetooth modules. Research in this direction has focused on estimating the relative positions to neighbor robots in a robot’s local (body) coordinate frame by utilizing Bayesian filtering methods [57,58,59,60,61]. The relative positions are calculated based on various geometric or optimization methods and filtered to obtain a precise estimation. In particular, omnidirectional UWB sensors eliminate the field-of-view (FOV) limitation and ambient condition requirements of the vision-based approaches; see Fig. 2 for a depiction of an outdoor experiment of two drones in a leader-follower configuration using only UWB sensors for localization. Also, the UWB device can be modified to produce the bearing angle toward the neighbor robots as well [62]. On the other hand, UWB sensor implementation poses new challenges such as non-line-of-sight measurements and accurate calibration.
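As a rough illustration of the geometric step in distance-based localization, the sketch below recovers a planar relative position from three distance measurements by linearizing the range equations. It is a simplified stand-in, not the method of the cited works: it assumes noise-free ranges, a 2D problem, and a hypothetical layout of three UWB anchors in the body frame; a real system would filter the result as described above.

```python
from math import dist


def trilaterate_2d(anchors, distances):
    """Solve for (x, y) given three anchor positions and ranges to the target.

    Subtracting the first range equation from the other two removes the
    quadratic terms and leaves a 2x2 linear system, solved by Cramer's rule.
    """
    (x0, y0), (x1, y1), (x2, y2) = anchors
    d0, d1, d2 = distances
    a11, a12 = 2.0 * (x1 - x0), 2.0 * (y1 - y0)
    a21, a22 = 2.0 * (x2 - x0), 2.0 * (y2 - y0)
    b1 = d0 ** 2 - d1 ** 2 + x1 ** 2 + y1 ** 2 - x0 ** 2 - y0 ** 2
    b2 = d0 ** 2 - d2 ** 2 + x2 ** 2 + y2 ** 2 - x0 ** 2 - y0 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not observable")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


# Hypothetical anchor positions on the measuring drone's body frame (meters).
anchors = [(0.3, 0.0), (-0.15, 0.26), (-0.15, -0.26)]
true_neighbor = (5.0, 2.0)
ranges = [dist(a, true_neighbor) for a in anchors]  # ideal, noise-free ranges
estimate = trilaterate_2d(anchors, ranges)
```

Because the anchors sit close together relative to the measured ranges, small range errors are amplified in the solution, which is one reason filtering and careful calibration matter.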

Fig. 2

Front (left) and top (right) views of an outdoor experiment in which a drone with three UWB sensors (hexacopter) estimates the relative position of another drone (quadcopter) with a single UWB sensor by using the three distance measurements. The estimation signal is fed back to the control algorithms of the drones for a coordinated flight. The drones rely on their onboard sensors only and do not use a GPS sensor for localization

The idea of improving the performance of multi-robot localization by integrating the inter-robot communications motivated the development of the cooperative localization concept [63,64,65]. In such a framework, all robots run a common filter such that once a robot receives a measurement, the measured data together with some filter parameters are transmitted to the other robots. Cooperative localization proves how communication among team members can enhance the localization performance, and the entire algorithm can be designed in a centralized or distributed way. However, as a natural outcome of the additional communication layer, the processing burden increases with the size of the swarm [63].

Swarm Path Planning

Path planning for aerial swarms is an active area of research in the robotics and control communities [66]. Numerous approaches exist in the literature for driving a robotic swarm from some initial configuration to a desired configuration. In one approach, all the robots comprising a swarm are considered as a team. In the team-based approaches, the path planning problem is formulated as a multi-stage optimization problem in which the global objective function is defined as a sum of local cost functions that are typically convex [67,68,69]. In this setup, each local cost function corresponds to a robot but depends on the state of the entire system, which implies that the actions of all the robots are dependent on each other. An optimal solution to this global optimization problem consists of optimal trajectories for all the robots in the network.

The global optimization problem can be solved by a centralized authority, which then communicates optimal action trajectories to individual agents. The optimization problem can also be solved in a distributed manner. For distributed computation of optimal solutions, each robot starts with a local estimate of a globally optimal solution. Then, the robots exchange their local estimates with each other via communication. Finally, each robot updates its local estimate by using this information and solving a local optimization problem, and the process is repeated. If this communication is among all the robots, then the centralized setup is recovered. Typically, each robot is only allowed to communicate with a subset of other robots. If the multi-stage optimization problem satisfies certain properties like linear dynamics and quadratic cost, then convergence to a globally optimal solution can be guaranteed. For systems with uncoupled dynamics and non-linear stage costs, distributed algorithms with stability guarantees exist in the literature. However, the optimality and stability guarantees are under the conditions that the robots are allowed to communicate infinitely often and the communication network topology remains connected [69,70,71].
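The exchange-and-update loop described above can be sketched with the simplest possible case. The example assumes each robot i has a scalar quadratic local cost (x - c_i)^2, so the global minimizer is the average of the c_i, and it uses plain neighbor averaging (a consensus iteration) as a stand-in for the local optimization step. The graph, step size, and values are hypothetical.

```python
def consensus_step(estimates, neighbors, step=0.3):
    """One round: every robot moves its estimate toward its neighbors' estimates."""
    return [x + step * sum(estimates[j] - x for j in neighbors[i])
            for i, x in enumerate(estimates)]


# Line-shaped communication graph: robot 0 - 1 - 2 - 3 (connected, not complete).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# Each robot starts from the minimizer of its own local cost.
estimates = [0.0, 4.0, 8.0, 20.0]
for _ in range(200):
    estimates = consensus_step(estimates, neighbors)
print([round(x, 6) for x in estimates])
```

All estimates approach the global minimizer (the average of the initial values) even though no robot talks to every other robot. The result relies on the graph staying connected and on repeated communication, which mirrors the conditions stated above.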

Solving the multi-stage optimization problem generates optimal trajectories for an entire swarm. However, implementing these trajectories in real time is another equally important problem. One approach is to implement trajectory tracking controllers that are designed to follow the optimal trajectories. Even if the optimal trajectories are collision free, the trajectory tracking controllers are augmented with local controllers for obstacle and collision avoidance [72,73,74,75]. These local controllers are often based on potential functions that generate repulsive forces for avoiding collisions among neighboring robots or obstacles. The idea of using potential functions for obstacle and collision avoidance has been around for a while, see [76] and [44]. However, using potential functions for an aerial swarm with a large number of robots has its challenges as well, since these approaches often lead to local minima and cannot provide safety guarantees in the case of a large number of fast-moving robots [77].
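A minimal sketch of a tracking controller augmented with a repulsive potential is given below. It assumes a common inverse-distance potential that is active only within an influence radius `d0`; the gains and function names are hypothetical, and the sketch offers no safety guarantee, in line with the limitations noted above.

```python
def repulsive_force(p, q, d0=2.0, gain=1.0):
    """Force on a robot at p pushing it away from a neighbor or obstacle at q.

    Derived from U = 0.5 * gain * (1/d - 1/d0)^2 for d < d0, and zero otherwise.
    """
    dx, dy = p[0] - q[0], p[1] - q[1]
    d = (dx * dx + dy * dy) ** 0.5
    if d >= d0 or d == 0.0:
        return (0.0, 0.0)
    mag = gain * (1.0 / d - 1.0 / d0) / d ** 2
    return (mag * dx / d, mag * dy / d)


def velocity_command(p, p_ref, neighbors, k_track=1.0):
    """Proportional tracking of the reference point plus summed repulsion."""
    ux = k_track * (p_ref[0] - p[0])
    uy = k_track * (p_ref[1] - p[1])
    for q in neighbors:
        fx, fy = repulsive_force(p, q)
        ux += fx
        uy += fy
    return (ux, uy)


# A neighbor sits between the robot and its reference point.
cmd = velocity_command((0.0, 0.0), (2.0, 0.0), [(1.0, 0.0)])
```

When the attraction toward the reference and the repulsion cancel exactly, the command becomes zero away from the goal, which is the local-minimum problem mentioned above.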

Another prevalent approach is to solve the multi-stage optimization problem in a model predictive control (MPC) setting as in [67, 69,70,71, 78]. In an MPC setting, an optimal trajectory is typically computed over a finite horizon. Then, the computed trajectory is executed for the first stage only and the problem is solved again. MPC-based solutions are popular when operating in unknown environments, since solving the problem in a receding horizon setting compensates for unknown sources of disturbances. For swarm applications, MPC makes sense because even in the case of complete information about the environment, the robots themselves act as a source of dynamic uncertainty that has to be actively compensated.
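The receding horizon idea can be illustrated with a toy single-robot example. The sketch assumes a 1D single-integrator model, a small discrete action set, and exhaustive search over the horizon in place of a real optimizer; all names and numbers are hypothetical.

```python
from itertools import product

ACTIONS = (-1.0, 0.0, 1.0)  # candidate velocities in m/s (assumed action set)


def plan(x, goal, horizon=3, dt=0.5):
    """Return the action sequence with the lowest finite-horizon cost."""
    best_cost, best_seq = float("inf"), None
    for seq in product(ACTIONS, repeat=horizon):
        xk, cost = x, 0.0
        for u in seq:
            xk += u * dt                          # predicted next position
            cost += (xk - goal) ** 2 + 0.01 * u * u
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq


def run_mpc(x0, goal, steps=20, dt=0.5, disturbance=0.0):
    """Apply only the first planned action each step, then re-plan."""
    x = x0
    for _ in range(steps):
        u = plan(x, goal, dt=dt)[0]
        x += u * dt + disturbance                 # unmodeled drift added per step
    return x


final_nominal = run_mpc(0.0, 3.0)
final_disturbed = run_mpc(0.0, 3.0, disturbance=0.1)
```

Because the plan is recomputed from the measured position at every step, the constant drift that the model does not know about is repeatedly corrected and the robot stays near the goal.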

While solving a multi-stage optimization problem in a distributed setting can lead to an optimal solution, these methods are often not feasible for aerial swarms because of their computational and communication requirements. There has to be a trade-off between the real-time performance of an algorithm and its optimality [10]. For instance, in MPC-based solutions, the robots have to communicate their entire trajectories to their neighbors a large number of times (ideally infinitely often). Moreover, after each communication, the robots have to solve a complex multi-stage optimization problem to update their local estimates. The time required for this extensive communication and computation is not available for decision-making in applications involving aerial swarms. In a system comprising a large number of UAVs moving at high speeds, a slight delay in decision-making may lead to collisions among several robots and break down the entire system. In such applications, efficiently computing feasible actions that can guarantee obstacle and collision avoidance is often more desirable than computing optimal actions with delay. Thus, computing suboptimal actions with real-time performance guarantees is also an active area of research in multi-robot path planning [22, 25, 68].

The other major approach for path planning in multi-robot systems is motivated by the game theory literature, in which each robot is modeled as a self-interested decision maker [79,80,81]. In this setup, each robot is assigned a utility function that depends on its local information only. The local information of a robot typically includes its own action and the actions of its immediate neighbors, which it can either observe using on-board sensors or receive by communicating with the neighbors. Then, the robots update their actions to myopically maximize their utility through local learning rules. Learning in games is an active area of research, and various learning rules have been proposed in the literature that guarantee convergence to Nash equilibria that often correspond to desired global configurations (see for instance [82] and [83]). Game-theoretic approaches have been used for various applications like vehicle target assignment problems [84], distributed coverage [85], and collision avoidance [80]. One important research challenge in this approach is utility design for individual robots [86]. If the utility functions of individual robots are not properly aligned with the global objective function, the quality of the global solution may deteriorate. Various quality measures like price of anarchy and price of stability have been developed to quantify the performance loss caused by the misalignment between local incentives and the global objective [87].
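To make the myopic update rule concrete, the sketch below runs best-response dynamics on a toy target assignment game. It assumes an equal-share utility (a target's value divided by the number of robots selecting it), which is an illustrative choice rather than the utility design of the cited works; the target values and function names are hypothetical.

```python
def utility(robot, choice, choices, values):
    """Equal-share utility: target value split among the robots selecting it."""
    count = 1 + sum(1 for r, c in enumerate(choices)
                    if r != robot and c == choice)
    return values[choice] / count


def best_response_dynamics(values, n_robots, max_rounds=100):
    """Robots take turns switching to their best target until nobody moves."""
    choices = [0] * n_robots                     # everyone starts on target 0
    for _ in range(max_rounds):
        changed = False
        for r in range(n_robots):
            best = max(range(len(values)),
                       key=lambda t: utility(r, t, choices, values))
            if utility(r, best, choices, values) > utility(r, choices[r], choices, values):
                choices[r] = best
                changed = True
        if not changed:                          # no profitable deviation: Nash
            break
    return choices


values = [6.0, 3.0, 2.0]                         # hypothetical target values
choices = best_response_dynamics(values, n_robots=3)
print(choices)
```

In the resulting configuration no robot can improve its own utility by switching targets, yet the lowest-value target is left uncovered, which hints at the utility design and price-of-anarchy questions raised above.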

Towards a Generalized Swarm System Architecture

It is clear that aerial swarm systems require further real-world oriented development in order to be deployable at a large scale in realistic environments. Any UAV swarm system, regardless of the specific application, should include two main modules: state estimation (at the robot-level and swarm-level) and swarm mission planning (also at the robot-level and swarm-level).

As discussed in the "State Estimation and Localization" section, there are several ways to develop swarm localization systems. However, those are usually sensor-specific and, therefore, feasible to use only under certain environment conditions (e.g., feature-rich environments for vision-based methods, outdoor environments for GPS-based methods). One way to enhance the localization accuracy is to develop a multi-modal localization system, i.e., multi-sensor fusion. More specifically, GPS sensors can be used for outdoor localization, visual inertial odometry can be used for locally accurate localization when the GPS signal is degraded or not available, and UWB can be used to provide omni-directional relative localization in featureless environments where vision-based methods fail. By fusing all these methods together, for example, using Kalman filter–based algorithms, the overall robot-level and swarm-level localization accuracy can be significantly improved and becomes less sensitive to environment conditions by overcoming the individual sensor limitations.
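The benefit of fusing sensors with different accuracies can be seen in the scalar Kalman measurement update sketched below. It assumes a static 1D position, independent Gaussian measurement errors, and hypothetical variances (about 3 m standard deviation for GPS and 0.2 m for a second, more precise source); a real system would use a multi-dimensional filter with a motion model.

```python
def kalman_update(x, p, z, r):
    """Fuse estimate (mean x, variance p) with measurement z of variance r."""
    k = p / (p + r)                 # Kalman gain: trust in the new measurement
    return x + k * (z - x), (1.0 - k) * p


# Start from a GPS fix: 10.0 m with variance 9.0 m^2 (about 3 m std, assumed).
x, p = 10.0, 9.0
# Fuse a more precise measurement: 10.4 m with variance 0.04 m^2 (assumed).
x, p = kalman_update(x, p, z=10.4, r=0.04)
print(round(x, 4), round(p, 4))
```

The fused estimate lands close to the more precise measurement, and its variance is smaller than that of either source alone, which is the effect the multi-modal approach aims for.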

Complete swarm-level mission planning that accounts for each individual robot’s trajectory planning and obstacle avoidance is known to be computationally prohibitive, especially in large swarm systems operating in unknown dynamic environments. To simplify such a complex system, a multi-layer mission planning architecture can be used where swarm-level mission planning and robot-level planning are decoupled. The swarm-level planning layer would be responsible for task assignment, high-level multi-robot path planning, and defining swarm behaviors and communication requirements for the mission. The output of the swarm-level mission planning would guide the individual robot’s task execution, including, for example, local trajectory planning and reactive obstacle avoidance. The abstraction of the swarm-level planning layer makes the swarm system modular and more general with respect to the robot platform type and capabilities, and also enables the development of heterogeneous swarm systems consisting of different types of robot platforms such as aerial and ground robots.

The aforementioned abstraction layers (swarm-level mission planning and state estimation) facilitate the development of generalized software architectures for swarm systems. For example, Fig. 3 shows a proposed hybrid system architecture where the swarm-level mission planning runs on a central control station and communicates with robot-level state estimation and mission execution modules to provide the desired high-level swarm behaviors. The individual robots locally execute the assigned missions while coordinating with each other by means of local sensing and communications. This architecture can be useful for small multi-UAV systems, especially when the individual robots' on-board capabilities cannot support the swarm-level mission planning tasks. Figure 4, on the other hand, shows a completely distributed architecture where each individual robot has a swarm-level planning module, which requires coordination among the robots by means of local sensing and communications.

Fig. 3

Proposed swarm system architecture with a centralized swarm-level mission planning and distributed robot-level mission execution, navigation, and state estimation modules. The top block (green) acts like an interactive interface between the operator and the swarm-level mission module (red) which are both running on a centralized control station. The remaining lower blocks (blue) run local state estimation and mission execution on individual robots

Fig. 4

Proposed swarm system architecture with both distributed swarm-level mission planning and robot-level mission execution, navigation, and state estimation modules. The operation interface module (green) provides interaction with the swarm-level mission module. In a distributed architecture, each robot has a local copy of the swarm-level mission planning module (red) which exchanges information with other robots for overall swarm coordination

Conclusions

This paper presented a summary of the main applications of aerial swarm systems and the associated research works. Furthermore, a summary of research findings related to the main components of any swarm system, namely localization and mission planning, was presented. Finally, this paper proposed an abstraction of an aerial swarm system architecture that can help developers understand the main required modules.