
An integrated solution for autonomous drone racing in indoor environments


The use of drones is becoming more present in modern daily life. One of the most challenging tasks associated with these vehicles is the development of perception and autonomous navigation systems. Competitions such as the Artificial Intelligence Robotic Racing (AIRR) and the Autonomous Drone Racing were launched to drive advances in such strategies, requiring the development of integrated systems for autonomous navigation in a racing environment. In this context, this paper presents an improved integrated solution for autonomous drone racing, which focuses on simple, robust, and computationally efficient techniques to enable application on embedded hardware. The strategy is divided into four modules: (i) a trajectory planner that computes a path passing through the sequence of desired gates; (ii) a perception system that obtains the global pose of the vehicle using an onboard camera; (iii) a localization system that merges several sources of sensed information to estimate the drone’s states; and (iv) an artificial vector field-based controller in charge of following the plan using the estimated states. To evaluate the performance of the entire integrated system, we extended the FlightGoggles simulator to use gates similar to those used in the AIRR competition. A computational cost analysis demonstrates the high performance of the proposed perception method running on limited hardware commonly embedded in aerial robots. In addition, we evaluate the localization system by comparing it with ground truth and under disturbances in the location of the gates. Results in a representative race circuit based on the AIRR competition show the proposed strategy’s competitive performance, presenting it as a feasible solution for drone racing systems.


In recent years, the presence of robotics in everyday life has increased rapidly. Common examples are autonomous mobile robots cleaning houses, drones capturing images, and manipulators assisting in surgeries. Drones receive special attention due to their capability to execute different tasks with better precision and lower cost than other traditional aerial vehicles. Several studies address the use of these flying robots for mapping [21], surveillance [31, 57], inspections [2, 9], and package delivery [44, 58], among other tasks.

This growth has also influenced the entertainment and competition market, and some challenges emerged as an incentive to research on robots capable of performing specific tasks autonomously. The Autonomous Drone Racing (ADR), for example, is an event promoted by the IEEE Robotics and Automation Society (RAS) at a conference where autonomous drones must navigate as fast as possible in a cluttered indoor environment without relying on any external sensing, with all processing done onboard [38].

In 2019, the North American companies Lockheed Martin and the Drone Racing League (DRL) launched the Alpha Pilot Challenge and the Artificial Intelligence Robotic Racing (AIRR). The first competition stage, called Alpha Pilot, consisted of the development of algorithms to make the drone fly through a sequence of gates in the FlightGoggles [18, 51] simulation environment (Fig. 1). It also included an image processing challenge in which the competing algorithms needed to identify gates in a set of images. In this phase, 424 teams from 81 countries participated. The best 9 teams were selected for the AIRR, where the races occur in real environments. The authors of this paper are members of the XQuad team, placed 8th and 7th in each stage, respectively.

The current paper presents an integrated solution for autonomous drone racing, considering an indoor circuit. The proposed navigation system consists of the integration of four main functional modules. The first one is the planner, which uses an optimization formulation to plan smooth paths that pass through the gates. The second module is perception, responsible for detecting the gates the drone must cross; this task is performed using an onboard camera. The third module is localization, which uses an Extended Kalman Filter (EKF) to merge the information from the perception block and from other onboard sensors to obtain a global estimation of the drone’s states. Finally, the fourth is the control module, which computes input signals that lead the drone to the planned path based on the estimated states.

In the competition, the drone must pass through a sequence of gates that defines the course of the race. As in our previous paper [45], the adopted strategy considers simple, robust, and computationally efficient techniques so that the drone’s limited embedded hardware can perform all computations. However, the current paper includes additional localization strategies using visual perception and odometry, and several other technical upgrades to the system. The proposed improvements include: (i) a localization strategy based on gate perception, which allows the drone to improve its pose estimate to pass through the center of the gate quickly but safely; (ii) a visual odometry system, which greatly improves the localization when the perception block detects no gate; (iii) a new EKF, including bias states that allow the fusion of pose estimates from two different sources that mutually diverge; (iv) an improved version of the nonlinear controller, which comes with convergence proofs [46]; and (v) a novel method, based on the frames of the curve (as in the Frenet–Serret frames), that allows the computation of the artificial vector field [17] used by the control algorithm.

Fig. 1

Image of the drone’s frontal camera in the FlightGoggles Simulator

The system is evaluated with simulated results in a modified version of FlightGoggles. We have developed an environment that incorporates gates with the design used in the AIRR. This scenario allows us to evaluate the entire proposed solution, including gate detection via image processing in the simulated environment. Figure 2a presents a view of the AIRR scenario in FlightGoggles, and Fig. 2b shows an example of the real race circuit. In addition, the perception algorithm underwent a performance analysis on different hardware, which was a determining factor in its choice, considering the need for a fast and accurate system for this application.

The remainder of the paper proceeds with Sect. 2 presenting related research. Section 3 describes the technical details of the drone racing challenge and a general view of the proposed solution. Sections 4, 5, 6 and 7 present the deployed methodologies, associated with planning, perception, localization and control, respectively. The results obtained in the photorealistic simulator are presented in Sect. 8. Finally, Sect. 9 concludes the work and presents future developments.

Fig. 2

Competition environment comparison: a New FlightGoggles scenario with the AIRR gates; b closer view of the first race in Orlando-FL

Related works

The development of an autonomous navigation system involves the integration of several techniques in robotics [60]: (i) trajectory (or path) planning; (ii) perception; (iii) localization and filtering; and (iv) control. These methods take into account the robotic platform and the task to be performed, which can range from load transportation to a race like the AIRR.

Trajectory Planning

Several trajectory planning methods are present in the literature, each with advantages and disadvantages. For instance, the Optimal Rapidly-exploring Random Tree (RRT*), presented by Karaman and Frazzoli [26], has the great advantage of considering obstacles in the environment. However, it does not generate smooth paths. Focusing on the ADR, Han et al. [19] propose a trajectory planner in the SE(3) space that also considers obstacles in the environment. The strategy is able to find a trajectory that passes through narrow gaps (gates), requiring the vehicle to adjust its orientation. This narrow-gap crossing ability was also achieved by Falanga et al. [10] with a polynomial trajectory and onboard detection of the gate. Other trajectory planning strategies rely on Bézier curves, which are able to take dynamic constraints into account during planning [22].

The recent work by Foehn et al. [13] proposes time-optimal trajectory planning for a quadrotor flying through a sequence of points. A minimum-time trajectory is computed given the maximum propulsion of each motor. The planner takes dozens of minutes to compute and is thus only applicable to race scenarios with a previously defined course. Using a Model Predictive Control (MPC) controller and a motion capture system to feed back the vehicle states, the autonomous system was able to beat professional human pilots. Further research is needed to achieve equivalent results with onboard localization.

Mellinger and Kumar [37] present an algorithm that enables the real-time generation of optimal trajectories. They show that quadrotors have better performance in tracking trajectories with low snap values (the acceleration’s second derivative). Similarly, Richter et al. [47] present a quadratic programming method to generate polynomial trajectories for quadcopters that allow navigation in cluttered indoor environments. In the present paper, we use the methodology of Richter et al. [47] to obtain a trajectory between each pair of consecutive gates. The quadratic program constraints ensure continuity and smoothness of the concatenated path.

Perception, Localization and Filtering

In addition to trajectory and path planning, localization is always a challenge in mobile robotics and is essential for autonomous navigation. It is common to use sensor fusion methods, combining data from several sensors, to improve the estimation of the desired states. Many papers in the literature consider different sensor fusion methodologies to improve state estimation [23]. Some examples are the EKF, the Unscented Kalman Filter (UKF), and the Particle Filter [55]. Saadeddin et al. [50] present a comparison between two estimation filters, the EKF and the Extended Information Filter (EIF). The proposed filters are used to merge information from a Global Positioning System (GPS), an Inertial Measurement Unit (IMU) and a velocity encoder to estimate the position, velocity and attitude of ground or aerial vehicles. Despite the good results, this approach depends on the GPS and cannot be used in indoor environments.

Searching for an alternative for the GPS-denied environments, Urzua et al. [56] propose a localization method based on an EKF that uses data of a monocular camera, an ultrasonic range-finder and a barometer. In their work, a monocular Simultaneous Localization and Mapping (SLAM) technique is used as feedback information for a quadrotor. Due to the problem of scale in the monocular SLAM, the authors propose an EKF that merges this information with the altitude data provided by the barometer and depth provided by the ultrasonic range-finder. Simulated experiments are presented.

As an alternative to Kalman filtering, Li et al. [33] propose a novel sensor fusion method called Visual Model–Predictive Localization (VML) that compensates for the delay in the visual information. Experimental results obtained with the VML show a good ability to deal with outliers, an advantage over simple Kalman filtering.

Schauwecker and Zell [52] present a Micro Aerial Vehicle (MAV) that uses two pairs of stereo cameras for localization and mapping. One of these stereo pairs is positioned facing forward and the other facing downward. The use of two pairs improves the localization and reduces error in the SLAM. The authors also implement an EKF that fuses the pose estimation of the visual odometry with the IMU data. This method corrects some errors associated with the use of one stereo pair facing forward as a map and angular drifts. However, it requires high processing capabilities, raising the costs and weight of the onboard computers. An alternative is to use only one stereo pair of cameras for localization. For instance, Mur-Artal and Tardós [40] propose an algorithm with low computational costs for SLAM and consider a strategy of loop closure to improve the mapping.

Hu et al. [20] consider a depth RGB camera (RGB-D) for visual SLAM and present a method to improve the algorithm shown by Mur-Artal and Tardós [40], minimizing the tracking losses due to pure rotation, sudden movements and noise. In experimental results, the proposed method shows improvements in the tracking accuracy with low computational costs.

Karaduman et al. [25] also tackle the perception and visual localization problem, and they use road detection as feedback for navigation. Mustafah et al. [42] present an off-board stereo vision system to estimate the Unmanned Aerial Vehicle’s (UAV) localization. The method’s precision is high, but the communication delay of the information is a drawback, especially for an autonomous navigation system at high speeds.

The localization approach addressed in this paper follows the guidelines of the majority of the references listed. We use an EKF to fuse data from the IMU, visual odometry, and gate detection. Instead of feeding the filter with velocity estimates from the visual odometry, our approach includes a set of bias states capable of estimating the cumulative error of the odometry relative to the gates’ positions. This strategy decreases the noise level of the filter, since velocity estimates are relatively noisier than position estimates.
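The idea of adding bias states for the odometry can be illustrated with a minimal 1-D Kalman filter sketch. This is an assumption-laden toy example, not the paper's actual EKF: the state is [position, velocity, odometry bias], the odometry measures position plus a constant bias, and an intermittent gate detection measures the true position.

```python
import numpy as np

# Toy 1-D fusion sketch (hypothetical, for illustration only).
dt = 0.05
F = np.array([[1.0, dt, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])          # odometry bias modeled as constant
H_odom = np.array([[1.0, 0.0, 1.0]])     # odometry measures position + bias
H_gate = np.array([[1.0, 0.0, 0.0]])     # gate detection measures true position
Q = np.diag([1e-4, 1e-3, 1e-6])          # process noise (assumed values)
R_odom = np.array([[1e-3]])
R_gate = np.array([[1e-2]])

def kf_update(x, P, z, H, R):
    y = z - H @ x                        # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ y, (np.eye(3) - K @ H) @ P

x, P = np.zeros(3), np.eye(3)
true_v, true_b = 1.0, 0.5                # simulated ground truth
for k in range(200):
    true_p = true_v * (k + 1) * dt
    x, P = F @ x, F @ P @ F.T + Q        # predict
    x, P = kf_update(x, P, np.array([true_p + true_b]), H_odom, R_odom)
    if k % 5 == 0:                       # gate fix only available intermittently
        x, P = kf_update(x, P, np.array([true_p]), H_gate, R_gate)
```

With noiseless simulated measurements, the bias estimate `x[2]` should approach the true odometry bias, showing how gate detections anchor the otherwise drifting odometry.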

During an autonomous operation, the pose estimation from SLAM algorithms alone may not be sufficient to complete the task. Therefore, perception algorithms are commonly used to obtain information about specific structures of the environment, such as obstacles and gates to be crossed in a race. Neural networks have been used for this purpose, for instance, to identify the gates that define the course of a race and even the relative pose between these objects and the drone.

In general, there are two main approaches to these perception tasks in the literature. In the first, an end-to-end neural network is responsible for planning and control [27, 53]. This type of approach uses Reinforcement Learning (RL) or similar techniques and consumes many simulated and real flight hours to teach the neural network to steer correctly on the racetrack. Other approaches use the output of the network to locate the racetrack obstacles directly. These methods are considered in the localization and planning phases of a flight algorithm [5, 12]. In this work, we adopt the second approach, as it decouples perception from planning and control, allowing a more flexible racing algorithm.

The current state-of-the-art object detection and recognition algorithms for monocular images explore convolutional neural networks with great success [35]. Thus, to detect the gates, we use the YOLOv4 [3] network, which presents one of the best trade-offs between accuracy and speed over competitor neural networks.


Good state estimation is necessary to enable the computation of the control signals, which is another challenge when performing aggressive flights. Yousefi and Monfared [61] present a control method using a feedback linearization strategy for trajectory tracking. To deal with the nonlinear dynamic model, the authors propose a cascaded feedback linearization method and a Proportional-Derivative (PD) controller to regulate the tracking error. Dolatabadi and Yazdanpanah [8] show a backstepping approach in conjunction with Linear-Quadratic Regulator (LQR) techniques to control a quadrotor UAV. Al Younes et al. [1] present a control method based on Model-Free Control (MFC). This methodology is integrated with an integral backstepping technique and uses local models to compensate for uncertainties and disturbances. Experiments demonstrate the control system’s performance.

Another widely used control strategy is MPC, which considers the robot’s model to predict its behavior in the near future and choose the best control inputs given a defined cost function. Falanga et al. [11] merge the planning and control problems in a single MPC. Considering an onboard camera for localization, they minimize the velocity of the projection of a point of interest in the image plane, reducing blur in the images and improving the features extraction.

More recent techniques are explored to control quadrotors and other mobile robots. Koch et al. [29] propose a promising methodology to control the attitude of a quadrotor by using an intelligent flight control system trained with RL algorithms. The training is performed using simulation data of the controller that actuates in the inner control loop. Comparisons with different techniques of RL and Proportional-Integral-Derivative (PID) controllers are presented. Similarly, Kaufmann et al. [28] propose an intelligent control system for quadrotors to perform autonomous flights with extreme acrobatic maneuvers, using only onboard sensing and computation. The system training uses demonstrations from an optimal controller with privileged feedback information in simulations. A comparison with other control strategies shows the high performance of the proposed method.

Another theory used to control quadrotors is differential flatness [37, 39, 62]. The 3D position and the yaw angle of a quadrotor form a flat output of the system [37]. Using this property, Mellinger and Kumar [37] design a controller that has a Proportional-Derivative (PD) action and requires a linearization around the hover point. Lee et al. [32] present a similar controller. Morrell et al. [39] use the differential flatness property to design a controller that does not have singularities at \(90^\circ \) roll and pitch angles during aggressive maneuvers.

Different techniques are explored by considering a controller at the lowest level, acting on the motors to control the thrust and angular velocities. In the Alpha Pilot and AIRR challenges, a PID controller is assumed to guarantee the desired angular velocities for the quadrotor. In this context, Rezende et al. [46] propose a nonlinear path-following method based on artificial vector fields to control a quadrotor in Acro mode. In the context of robot navigation, an artificial vector field is a function \(F({\mathbf {p}}):{\mathbb {R}}^n \rightarrow {\mathbb {R}}^n\) used to guide the robot. For each point \({\mathbf {p}}\) of the space, \(F({\mathbf {p}})\) is a velocity reference to be followed by the robot. In general, vector fields are designed so that the trajectories of the system \(\dot{{\mathbf {p}}} = F({\mathbf {p}})\) converge to and follow the desired curve. The design methodology proposed by Gonçalves et al. [17] requires a representation of the curve as the intersection of the zero level sets of two functions \(\alpha _1({\mathbf {p}})\) and \(\alpha _2({\mathbf {p}})\), a hard task for paths with a generic shape. Here, we incorporate a novel numerical method to compute the surfaces required for the field presented by Gonçalves et al. [17] and use the strategy of Rezende et al. [46] to control the drone.
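The level-set construction above can be sketched for a concrete curve. The following is a simplified convergence-plus-circulation composition in the spirit of [17], not the paper's exact field: the curve is the unit circle at height zero, encoded as \(\alpha_1 = z\) and \(\alpha_2 = x^2 + y^2 - 1\); the gains kG and kT are assumptions.

```python
import numpy as np

# Hypothetical example field for the unit circle at z = 0, given by the
# intersection of the zero level sets of alpha1(p) = z and
# alpha2(p) = x^2 + y^2 - 1.
def vector_field(p, kG=1.0, kT=1.0):
    x, y, z = p
    a1, a2 = z, x**2 + y**2 - 1.0
    grad1 = np.array([0.0, 0.0, 1.0])          # gradient of alpha1
    grad2 = np.array([2.0 * x, 2.0 * y, 0.0])  # gradient of alpha2
    converge = -kG * (a1 * grad1 + a2 * grad2)  # pushes toward the curve
    circulate = kT * np.cross(grad1, grad2)     # moves along the curve
    return converge + circulate
```

On the curve the convergence term vanishes and only the tangential circulation remains; away from it, the field points back toward both zero level sets.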

Solutions for drone racing

Recently, several researchers have addressed the drone racing problem. The emerging results use a wide range of the navigation strategies already discussed here. Guerra et al. [18], the developers of FlightGoggles, presented a performance overview of the 20 best Alpha Pilot solutions. Most teams relied on more traditional pipelines (planning, localization, and control) to complete the challenge. Variations of the Kalman and particle filters were the most used techniques for localization. Linear control methods led the teams’ choices, followed by MPC techniques.

Rojas-Perez and Martinez-Carranza [48] present an overview of several hardware choices for the Autonomous Drone Racing problem. They discuss the drones, onboard computers, and sensors used by several research groups around the world. The advantages of using a Graphics Processing Unit (GPU), instead of only a Central Processing Unit (CPU), to run artificial intelligence algorithms are clear.

Focusing on high-speed performance in the drone racing context, Li et al. [34] propose a computationally efficient solution. The gates are detected with an efficient pose estimation algorithm that has better tolerance to detection noise than the commonly used Perspective-n-Point (PnP) algorithms. The navigation control is simple and based on arcs of circles and straight lines. Cyba et al. [6] present a simpler, higher-level strategy. The solution is based on the identification of ArUco markers, and the authors compare two control strategies, one designed in the drone’s coordinates and the other in the ArUco coordinates. Experiments are performed with a real drone, but all processing is done offline.

As in the present paper, some works address indoor navigation systems for quadrotors, using a combination of techniques to perform tasks autonomously [7, 30, 54]. Besides that, specific autonomous navigation strategies for a racing drone are also discussed in the literature. Recent papers [24, 36], and some previously commented [5, 12, 33] address methods of localization, control and planning for autonomous navigation in racing environments without obstacles.

Some solutions to the drone racing problem are strongly based on neural networks for localization [27] and trajectory planning [53]. Wang and Chang [59] use a Convolutional Neural Network (CNN) approach in a perception module that generates high-level references for the vehicle in a drone race. The CNN is trained to imitate an expert policy, such as that performed by a professional human pilot. In fact, the attempt to mimic human behavior is becoming more common. For instance, the gaze of professional pilots during a race has already been studied in order to provide insights to improve autonomous navigation systems [43].

Problem setup

The problem proposed in the drone racing challenge consists of making an autonomous drone fly through a sequence of gates. In AIRR, the races take place in indoor environments, with gates of the same size (\(11 \times 11\) feet). Figure 3 displays an upper view of our modified scenario in FlightGoggles, with a course that contains a sequence of 9 gates similar to those used in the competition. These gates may have their nominal position shifted in the x and y directions and their orientation changed about the z-axis. The perturbations on position are limited to \(2~\text {m}\) and on orientation to \(5^\circ \). Thus, the solution must be robust to such situations.

Fig. 3

Image of the FlightGoggles simulator with an example of trajectory to be performed

Available sensors

To enable the task execution, FlightGoggles provides data from several simulated sensors: (i) an IMU with accelerometer and gyroscope; (ii) a frontal stereo pair of RGB cameras; and (iii) a downward-pointing rangefinder. Seeking simplicity and robustness, we do not use the rangefinder, since the simulated environment contains several objects on the ground, making it hard to define a reference plane.

Noise is incorporated into the simulated IMU measurements, and bias may also be considered. The simulated cameras have no calibration noise. The image data are provided by the simulator at 60 FPS in HD resolution (\(1280 \times 720\)).

The original FlightGoggles simulator also provides information on which pixels of the camera image correspond to the projections of the gates’ vertices. These pixels correspond to the red points in Fig. 1. These points are similar to infrared beacons (IR-beacons). The modified simulator (Fig. 2a) has gates with the design used in the AIRR competition. Thus, instead of considering the ground truth beacons to detect the gates, it is necessary to identify them via image processing.

Quadcopter model

The quadcopter used in the drone racing challenge operates in the Acro mode, i.e., the control inputs are the thrust force and the angular body rates. In this work, a nonlinear model based on the one proposed by Rezende et al. [46] is considered. It is given by:

$$\begin{aligned}&\dot{{\mathbf {p}}} = {\mathbf {v}}, \end{aligned}$$
$$\begin{aligned}&\dot{{\mathbf {v}}} = R_b^w\hat{{\mathbf {z}}}\frac{\tau }{m} - g\hat{{\mathbf {z}}} - \frac{k_b{\mathbf {v}}}{m}, \end{aligned}$$
$$\begin{aligned}&{\dot{R}}_b^w = R_b^wS(\omega ), \end{aligned}$$

in which \({\mathbf {p}} = [x,\ y,\ z]^T\) corresponds to the Cartesian coordinates of the quadcopter in a given inertial frame (world frame), \({\mathbf {v}} = [v_x,\ v_y,\ v_z]^T\) is the drone’s velocity in the world frame, and \(R_b^w\in \hbox {SO}(3)\) is a rotation matrix that represents the orientation of the vehicle in the world frame. The mass of the drone is given by m, the gravity acceleration by g, and the drag coefficient associated with air resistance is given by \(k_b>0\). The control inputs are the thrust force \(\tau \), and the angular velocity \(\omega =[\omega _x,\ \omega _y,\ \omega _z]^T\). Finally, \(S(\omega )\) is the skew-symmetric matrix that implements the cross product in \({\mathbb {R}}^3\), such that \({\omega } \times {\mathbf {p}} = S({\omega }){\mathbf {p}}\).
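A minimal forward-Euler integration of this model can be sketched as follows. The parameter values (mass, drag coefficient, time step) are illustrative assumptions, not the competition drone's parameters.

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix S(w) such that w x p = S(w) p."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def step(p, v, R, tau, omega, m=1.0, g=9.81, kb=0.1, dt=0.001):
    """One Euler step of the quadcopter model with inputs tau, omega."""
    zhat = np.array([0.0, 0.0, 1.0])
    a = R @ zhat * (tau / m) - g * zhat - kb * v / m  # translational dynamics
    p = p + v * dt
    v = v + a * dt
    R = R @ (np.eye(3) + skew(omega) * dt)  # first-order rotation update
    return p, v, R

# Sanity check: hovering with tau = m*g and zero body rates keeps the drone still.
p, v, R = np.zeros(3), np.zeros(3), np.eye(3)
for _ in range(100):
    p, v, R = step(p, v, R, tau=9.81, omega=np.zeros(3))
```

Note the first-order rotation update does not keep R exactly on SO(3) over long horizons; a practical simulator would re-orthonormalize or use the exponential map.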

The dynamic model in (1) considers a rotation matrix to represent the drone’s orientation. Thus, it is a better model than the one used by Rezende et al. [45], which is based on Euler angles and, consequently, admits representation singularities. Now, the only part of the solution that uses Euler angles representation is the EKF, further presented in Sect. 6.3.

Proposed Solution

The proposed solution can be divided into four main modules: (i) planning module, which plans the trajectory to be followed; (ii) perception module, that detects the gates and its corners; (iii) localization module, which has the goal of estimating the states of the drone, by merging the sensors’ information; and (iv) the control module, which computes the control signals. Figure 4 shows the information flow between these modules.

Fig. 4

Block diagram that shows the information flow between the modules used to solve the challenge

The autonomous drone navigation system is designed taking into account the embedded sensors and race conditions. Cameras are the only sensors able to detect the gates, which are rich in features (see Fig. 2b), and are thus used for perception. Visual odometry is essential to provide a permanent position estimate, regardless of whether a gate is being detected. The use of a pre-planned polynomial path is advantageous given that no online computations are necessary. Even if the actual gates are slightly displaced, the proposed method does not require any re-planning. For simplicity and low computational cost, an EKF fuses all information obtained from the sensors. We also use a vector field-based nonlinear controller with guaranteed stability independent of precise gain tuning. This is of great importance for further deployment of the code on real platforms.

Trajectory planning

The first module of the system consists of planning a trajectory that passes through the sequence of gates, similar to the one presented by Rezende et al. [45]. The plan corresponds to a concatenation of n smaller trajectories (sub-trajectories), each represented by an N-th-order polynomial \({\varvec{\gamma }}^i(s) \in {\mathbb {R}}^3, \ i = 1,2,...,n\). The i-th stretch ends at gate i. Differently from the previous work [45], here we consider a parametrization such that \(0\le s \le 1\).

Inspired by the results obtained by Mellinger and Kumar [37], we use the method proposed by Richter et al. [47] to minimize the snap of the sub-trajectories. The limitation \(s\le 1\) avoids bad numerical conditioning for the optimization problem when higher-order polynomials are used. The constraints of the problem are defined to: (i) ensure smoothness of the concatenated trajectory; (ii) make the trajectory cross the gate perpendicularly; (iii) avoid aggressive maneuvers when the drone is about to cross a gate.

Let \({\mathbf {g}}^i \in {\mathbb {R}}^3\) be the central point of gate i and \({\mathbf {g}}^0 \in {\mathbb {R}}^3\) the drone’s starting position, known a priori. Let also \({\mathbf {w}}^i \in {\mathbb {R}}^3\) be the unit norm vector that represents the normal direction of the i-th gate and the sense it must be crossed. The vector \({\mathbf {w}}^0 = [0,\ 0,\ 0]^T\) represents the drone’s initial null speed.

Figure 5 depicts the strategy used to define the constraints between gates \(i-1\) and i. Note that if \(i-1=0\), we have the starting point. The red cube represents a region through which the trajectory is required to pass. The cube has edges with length \(\delta _q\) and is centered at \({\mathbf {q}}^i = {\mathbf {g}}^i - l{\mathbf {w}}^i\), with \(l>0\) being a predefined fixed distance from the gate to the cube. The idea is that, by forcing the drone to pass through the cube, we avoid aggressive maneuvers close to the gate, since the vehicle will be in a good position to cross the gate perpendicularly. The parameter \(s_q^i\) at which the trajectory passes through the cube is defined by the heuristic \(s_q^i=\Vert {\mathbf {g}}^{i-1}-{\mathbf {q}}^{i}\Vert /L^i\), where \(L^i\) is an estimate for the length of the trajectory, given by \(L^i = \Vert {\mathbf {g}}^{i-1}-{\mathbf {q}}^{i}\Vert + \Vert {\mathbf {q}}^{i}-{\mathbf {g}}^{i}\Vert \). The heuristic for \(L^i\) corresponds to the length of the red lines as illustrated in Fig. 5.

To ensure continuity, both the start and end of a segment are always at the center of a gate. To ensure smoothness, the starting velocity of trajectory i must be equal to the end velocity of trajectory \(i-1\). Moreover, these velocities are perpendicular to the gate’s plane, as represented by the red arrows in Fig. 5. Since \(0\le s\le 1\), the magnitude of this velocity constraint on the parameter s (consider the derivative \({\varvec{\gamma }}' = d{\varvec{\gamma }}/ds\)) must be proportional to the length of the sub-trajectory.

Fig. 5

Representation of a trajectory segment between two consecutive gates

The problem in Model 1 computes the x component of a trajectory \({\varvec{\gamma }}^i\). The y and z components are computed analogously. The complete trajectory is given by the concatenation of \({\varvec{\gamma }}^i(s) = \left[ \gamma ^i_x(s), \ \gamma ^i_y(s), \ \gamma ^i_z(s) \right] ^T\) for \(i = 1,2,...,n\), thus, 3n problems similar to the one in Model 1 need to be solved in the planning stage. The drone starts the race following trajectory \(i=1\). When the localization system detects that the robot has reached the end of trajectory i (crossed the gate i), the change \(i \leftarrow i+1\) is made and the drone starts to follow the next segment.

Model 1

(Trajectory  Optimizer)

$$\begin{aligned} \begin{aligned}&\min _{{\varvec{\beta }}_x^i}: \int _0^{1} \left( \frac{d^4}{d s^4}\gamma _x(s')\right) ^2 \ ds' \\&{{subject \,to:}} \\&{Continuity \,conditions:} \\&\gamma _x(0) = g_x^{i-1}, \ \\&\gamma _x(1) = g_x^{i}. \\&{Smoothness \,conditions:} \\&\gamma '_x(0) = L^iw_x^{i-1}, \\&\gamma '_x(1) = L^iw_x^{i}. \\&{Alignment \,conditions:} \\&q_x^{i}-\delta _q \le \gamma _x(s_q^i) \le q_x^{i}+\delta _q, \\ \end{aligned} \end{aligned}$$

where \({\varvec{\beta }}_x^i\) is the vector of \(N+1\) coefficients of the trajectory \(\gamma _x(s)\), i.e., the optimization variables.
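A sketch of Model 1 for one axis follows. This is a simplified assumption-laden version: the alignment box constraint is replaced by an equality pinning the trajectory to the cube's center, which turns the problem into an equality-constrained quadratic program solvable via its KKT system (the paper's solver handles the inequality directly).

```python
import numpy as np

N = 5  # assumed polynomial order; N + 1 coefficients per axis

def snap_cost_matrix(N):
    """Q[j,k] = integral over [0,1] of d4(s^j)/ds4 * d4(s^k)/ds4 ds."""
    Q = np.zeros((N + 1, N + 1))
    for j in range(4, N + 1):
        for k in range(4, N + 1):
            cj = j * (j - 1) * (j - 2) * (j - 3)
            ck = k * (k - 1) * (k - 2) * (k - 3)
            Q[j, k] = cj * ck / (j + k - 7)
    return Q

def powers(s, d=0):
    """Row vector of the d-th derivative of [1, s, ..., s^N] at s."""
    row = np.zeros(N + 1)
    for k in range(d, N + 1):
        c = 1.0
        for r in range(d):
            c *= (k - r)
        row[k] = c * s ** (k - d)
    return row

def plan_axis(p0, p1, v0, v1, sq, q):
    """min beta^T Q beta s.t. A beta = b, solved through the KKT system."""
    Q = snap_cost_matrix(N)
    A = np.vstack([powers(0), powers(1),          # continuity
                   powers(0, 1), powers(1, 1),    # smoothness (velocities)
                   powers(sq)])                   # alignment (cube center)
    b = np.array([p0, p1, v0, v1, q])
    n, m = N + 1, len(b)
    KKT = np.block([[2 * Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(KKT, np.concatenate([np.zeros(n), b]))
    return sol[:n]  # polynomial coefficients beta

# Example segment: start at 0 with zero speed, end at gate x = 3 with
# speed L * w_x = 3, passing the cube center x = 2 at s = 2/3.
beta = plan_axis(p0=0.0, p1=3.0, v0=0.0, v1=3.0, sq=2/3, q=2.0)
```

The y and z axes are solved with the same routine, matching the 3n independent problems mentioned above.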

The optimization problem in Model 1 only gives a position reference (x, y and z). The heading angle can still be chosen arbitrarily. The reference heading angle \(\psi _r\) is chosen to increase the chances of the camera capturing the next gate to be crossed. In order to point the camera toward the next gate, we define:

$$\begin{aligned} \psi _r = {{\,\mathrm{atan2}\,}}\left( g^i_y-y,\ g^i_x-x\right) , \end{aligned}$$

where \({{\,\mathrm{atan2}\,}}\) is the arctangent function in four quadrants.
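As a quick sketch of this heading reference (`heading_ref` is our name for it):

```python
import math

def heading_ref(gate_xy, drone_xy):
    """Reference yaw psi_r pointing the camera at the next gate's center."""
    gx, gy = gate_xy
    x, y = drone_xy
    return math.atan2(gy - y, gx - x)

# drone at the origin, next gate straight ahead on the +x axis -> yaw 0
psi_ahead = heading_ref((5.0, 0.0), (0.0, 0.0))
# gate behind the drone -> yaw pi, which atan2's four quadrants resolve
psi_behind = heading_ref((-5.0, 0.0), (0.0, 0.0))
```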

Our trajectory planning is based on the nominal positions of the gates, which may differ from the actual ones. However, no replanning is required. This is because the global pose estimation comes from the relative pose between the drone and the gate. Since the drone assumes that the gates are at their nominal positions, the position estimation error will be equal to the gate’s displacement with respect to the world. In these cases, the trajectory executed by the drone will be sub-optimal in the sense of the cost function in Model 1. However, the robustness gained from the simplicity of this replanning-free approach compensates for the loss of optimality.

Visual perception

Our proposed visual perception module is responsible for detecting the gates and estimating their location in the camera frame. The estimate of the gate’s pose is used in the EKF presented in Sect. 6. Therefore, the perception system must be accurate enough to detect the gate under the adverse illumination conditions and unknown gate poses found along the competition course, and fast enough to be executed directly on the drone’s embedded computer. Our proposal is a series of chained steps (Fig. 6) that begins with (i) the use of a state-of-the-art convolutional neural network (CNN) single-shot detector to estimate the rough location of the gate and its respective corners as a set of bounding boxes, followed by (ii) a refinement step to estimate the actual corner order and location, and (iii) the association of the gate pose with a gate identification number via ArUco planar markers. At the end of this processing, the beacons and the gate identification are sent to the localization module.

Fig. 6

Proposed visual perception pipeline for gate detection and localization

CNN for gate detection

The CNN selected for identifying the gates is the YOLOv4 [3] network, which is an accurate and fast object detection network for monocular RGB images. Networks such as YOLOv4 perform fast single-shot object detection over the images, returning a bounding box containing the object and the confidence probability of the detection. This network has two main architectures, full and tiny, which offer different trade-offs between accuracy and frames per second (FPS). Considering the limited computing power of the embedded processor, we chose the tiny architecture, so that the increased processing speed could partially compensate for the drone’s fast pace, sharp turns, and blurry images. The smaller memory footprint of the tiny version also allows for more efficient use of the embedded resources without degrading the performance of the drone’s other modules.

The training dataset was initially formed of \(\sim 9000\) images of a real-size marker, taken with different cameras, from different angles, and under varying illumination. Besides this data, we extracted images from the simulated and real scenarios and performed several data augmentation techniques, such as warping, rotation, flipping, noise, contrast, saturation, and partial image removal. We also sampled multiple gate instances and replaced the background with random backgrounds from the DTD dataset [4] to increase detection accuracy in multiple competition environments. Samples from the image dataset can be observed in Fig. 7.

Fig. 7

Samples of the training dataset for gate detection using YOLOv4: (top row) original dataset, (bottom row) augmented dataset

The network parameters using the darknet frameworkFootnote 7 are defined as shown in Table 1.

Table 1 Training parameters for the YOLOv4 gate detector

Corner refinement

After detecting a gate and extracting the bounding boxes of the gate’s visible corners, we need to sort them clockwise to better locate the gate position in space with respect to the viewing angle when performing the pose estimation procedures. Since corners are critical for pose detection, corners with an associated probability below 0.7 are discarded to reduce false positives. To estimate the location of the gate’s internal vertex in each of the four corners, we exploit the highly recognizable checkerboard markings located at every corner. We extract the points between the checkerboards using an automatic sub-pixel checkerboard pattern detection algorithm [16]. Then, the geometric mean of the x and y locations of those intersection points gives us the internal corner location in the camera frame, cx and cy. If the checkerboard pattern extraction cannot return the intersection points, the centroid of the bounding box is used instead. With the internal corner of the gate located for every corner bounding box, the following equation defines the relative order of every corner in the gate:

$$\begin{aligned} \text {cornerPosition}(\alpha )= {\left\{ \begin{array}{ll} 1, &{} \text {if } \alpha \ge \pi \text { and } \alpha< \frac{3\pi }{2},\\ 2, &{} \text {if } \alpha \ge \frac{3\pi }{2} \text { and } \alpha< 2\pi ,\\ 3, &{} \text {if } \alpha \ge 0 \text { and } \alpha < \frac{\pi }{2},\\ 4, &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

where \(\alpha \) is the relative angle of the estimated corner bounding box centroid (cx and cy), given by:

$$\begin{aligned} \alpha ={{\,\mathrm{atan2}\,}}(\hbox {d}y, \hbox {d}x)+\pi . \end{aligned}$$
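A small sketch of this corner-ordering rule (function and argument names are ours; we take dx and dy from the gate’s bounding-box center, and read the first case of the piecewise rule as ending at \(3\pi /2\)):

```python
import math

def corner_position(cx, cy, gate_cx, gate_cy):
    """Relative order (1..4) of a corner, from the angle alpha of its
    centroid (cx, cy) around the gate's bounding-box center. The +pi
    offset maps atan2's (-pi, pi] range into [0, 2*pi)."""
    alpha = math.atan2(cy - gate_cy, cx - gate_cx) + math.pi
    if math.pi <= alpha < 3 * math.pi / 2:
        return 1
    if 3 * math.pi / 2 <= alpha < 2 * math.pi:
        return 2
    if 0 <= alpha < math.pi / 2:
        return 3
    return 4

# one corner per quadrant around a gate centered at (0, 0)
order = [corner_position(dx, dy, 0.0, 0.0)
         for dx, dy in [(1, 1), (-1, 1), (-1, -1), (1, -1)]]
```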

Gate identification

The gates have an identification number extracted via an ArUco [15] planar marker located near the gate. With this identification number, the localization module can check whether the drone is following the correct gate sequence and improve the drone’s relative localization along the competition course.

The complete visual perception pipeline for gate detection and location is summarized in Algorithm 1.



Localization

In this section, we present the localization approach of our navigation solution. First, two vision-based strategies estimate the vehicle’s pose in the environment: a visual odometry method that extracts features from the images of the environment, and a method that uses the perceived gates to provide the pose relative to the gate. Then, a sensor-fusion strategy combines the data from the vision-based algorithms with inertial measurements.

Visual odometry

SLAM is a technique that uses sensors to build a map of an unknown environment while, at the same time, localizing the sensor in this map. As in the works of Mur-Artal et al. [41] and Urzua et al. [56], it is possible to perform SLAM with a monocular camera. However, a single camera cannot determine depth, causing scale errors in the map and the localization. Since the drone carries a stereo pair of cameras, this problem is avoided.

The ORB-SLAM2Footnote 8 is an open-source visual SLAM algorithm that performs three important tasks in parallel based on the information of a stereo pair of cameras, as described by Mur-Artal and Tardós [40]: tracking to localize the camera; local mapping; and loop closing to detect loops in the map and correct accumulated drifts in camera pose.

Although loop closing is an important feature of this algorithm, our solution does not use it. The trajectory performed in the AIRR competition is not a closed loop, which makes this technique unnecessary. Moreover, disabling this feature makes the algorithm run faster. The tracking and local mapping of ORB-SLAM2, on the other hand, proved to be a robust visual odometry technique with a low computational cost.

The execution of the tracking and local mapping depends on the stereo pair parameters and images. Using the camera’s intrinsic parameters, the algorithm executes a rectification process in each image captured by the left and right cameras. After that, ORB features [49] are extracted from both rectified images and a search is made to find a match between these features in the images. These features generate stereo keypoints with the coordinates of the ORB in the left camera and the horizontal coordinate of its right match, as illustrated in Fig. 8. These points can be triangulated and provide scale, translation and rotation information.

Fig. 8

ORB-SLAM2 tracked features in rectified image

Therefore, the input data for ORB-SLAM2 are the images and parameters of each camera of the stereo pair. The output of the algorithm is a homogeneous matrix \(M \in \hbox {SE}(3)\), which contains a rotation matrix \(R_s \in \hbox {SO}(3)\) and a translation vector \({\mathbf {p}}_s \in {\mathbb {R}}^3\). The matrix M contains the pose of the camera with respect to the SLAM map frame, whose origin is at the starting point of the algorithm (\(M_{c}^{m}\)). Since we know the transformation matrix from the camera frame to the quadrotor and the quadrotor’s initial pose in the world frame, it is possible to transform the algorithm’s result from the camera pose in the SLAM map frame to the drone pose in the world frame, as follows:

$$\begin{aligned} M_{d}^{w} = M_{m}^{w} M_{c}^{m} \left( M_{c}^{d}\right) ^{-1}, \end{aligned}$$

where \(M_{c}^{d}\) represents the pose of the camera with respect to the drone and \(M_{m}^{w}\) the pose of the map with respect to the world, which corresponds to the drone’s initial position. Both are known a priori. From the matrix \(M_d^w\), we extract a position \({\mathbf {p}}_s\) and a vector of Euler angles \({\mathbf {r}}_s = [\phi _s,\ \theta _s,\ \psi _s]^T\), used later in the EKF.
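This frame composition can be checked with a few lines of numpy (`hom` and `drone_pose_world` are our helper names):

```python
import numpy as np

def hom(R, p):
    """Build a 4x4 homogeneous transform from a rotation R and a translation p."""
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = p
    return M

def drone_pose_world(M_m_w, M_c_m, M_c_d):
    """M_d^w = M_m^w  M_c^m  (M_c^d)^{-1}: camera pose in the SLAM map
    frame transformed to the drone pose in the world frame."""
    return M_m_w @ M_c_m @ np.linalg.inv(M_c_d)

# sanity check: with identity rotations, pure translations compose additively
M = drone_pose_world(hom(np.eye(3), [1.0, 0.0, 0.0]),
                     hom(np.eye(3), [2.0, 0.0, 0.0]),
                     np.eye(4))
```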

Mur-Artal and Tardós [40] present a performance analysis including several datasets and comparing ORB-SLAM2 with other SLAM systems, showing better accuracy in the estimated trajectory. Moreover, an advantage of the visual odometry described in this section is that it does not require a specific object in the camera image, providing more persistent information. However, a disadvantage is that it accumulates error over time, which makes it insufficient for long missions. Also, this method alone cannot handle scenarios in which the real positions of the gates are displaced. To avoid these errors, a global pose measurement is necessary. In the next section, we describe how this pose can be obtained.

Global pose measurement

Techniques of visual localization generally consist of associating known objects in the environment with the corresponding pixels in the camera image. If the global position of the objects is known, the transformation between the camera frame and the world frame can be estimated using a PnP algorithm [14]. This technique was described in our previous paper [45]. In the same way, consider a set of at least 4 pairs of points \(\left( {\mathbf {c}}^i_j, \ {\mathbf {b}}^i_j\right) , \ j = 1,2,3,4\), where \({\mathbf {c}}^i_j \) is the approximate position of the i-th gate’s corners in the world frame and \( {\mathbf {b}}^i_j \) the pixels corresponding to the projections of these vertices on the camera image. The PnP algorithm uses these data to compute the position \({\mathbf {p}}_g \) and orientation \({\mathbf {r}}_g = [\phi _g,\ \theta _g,\ \psi _g]^T\) of the drone with respect to the world frame.

Figure 9 illustrates the four-points projection of three-dimensional space on the camera plane (gray rectangle). The red balls on the gate are equivalent to the corners of a gate, while the crosses in the plane represent the image pixels identified by the perception module.

Fig. 9

Four-point projection of the gate’s corners on the camera image
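For intuition, the forward pinhole model that the PnP algorithm inverts can be sketched as follows (illustrative camera intrinsics and gate geometry; `project` is our helper name):

```python
import numpy as np

def project(points_w, R_wc, p_wc, K):
    """Pinhole projection of world points into the image: the forward
    model inverted by PnP. (R_wc, p_wc) map world coordinates into the
    camera frame; K holds the intrinsic parameters."""
    pc = (R_wc @ points_w.T).T + p_wc   # world -> camera frame
    uv = (K @ pc.T).T
    return uv[:, :2] / uv[:, 2:3]       # perspective division

# four gate corners (world frame) seen by a camera 5 m in front of the gate,
# with its axes aligned with the world axes (illustrative numbers)
corners = np.array([[-1.0, -1.0, 0.0], [1.0, -1.0, 0.0],
                    [1.0, 1.0, 0.0], [-1.0, 1.0, 0.0]])
K = np.array([[400.0, 0.0, 320.0], [0.0, 400.0, 240.0], [0.0, 0.0, 1.0]])
b = project(corners, np.eye(3), np.array([0.0, 0.0, 5.0]), K)
```

PnP solves the inverse problem: given the pairs \(({\mathbf {c}}^i_j, {\mathbf {b}}^i_j)\), recover the rotation and translation used above.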

When the drone follows the i-th trajectory, the pose estimation considers only the information from the i-th gate. This is an important premise to reduce estimation errors due to uncertainties in the values of \({\mathbf {c}}^i_j\) caused by disturbances in the gate’s position.

In many situations, the camera does not detect all four beacons of the gate. A common case is when the drone flies at high speed while maintaining a relatively high pitch angle to compensate for air resistance. Unlike our previous work [45], in the present paper the PnP is also used when only 3 points are detected. In this case, an initial estimate must be provided to the algorithm, and the current filter estimate is used for this purpose.

Note that when the drone passes through a gate and begins to locate itself based on the next gate, its position estimate may shift, because the gates have different displacements. In this case, the vector field makes the drone converge again to the curve, i.e., the followed trajectory corresponds to an integral line of the vector field. The adopted control strategy, presented in Sect. 7, is robust to such shifts, meaning that the drone does not respond aggressively to these changes in the localization.

State estimation

The proposed strategy uses an EKF to estimate a vector of states \(\bar{{\mathbf {x}}}\) that contains the information needed to compute the control laws. The filter is responsible for fusing the data from the IMU, the visual odometry, and the global pose measurement. The implementation of an EKF on a digital computer uses its discrete form, which consists of prediction and correction stages.

For simplicity, the filter equations are presented with the notation \(b\leftarrow a\) to indicate that b is updated with the value of a. The discrete prediction stage of an EKF, for the states \(\bar{{\mathbf {x}}}\in {\mathbb {R}}^{15}\) and the covariance matrix \(P \in {\mathbb {R}}^{15{\times }15}\), is:

$$\begin{aligned} \bar{{\mathbf {x}}}\leftarrow & {} f(\bar{{\mathbf {x}}},{\mathbf {u}},\varDelta t), \end{aligned}$$
$$\begin{aligned} P\leftarrow & {} \hbox {FPF}^T + \hbox {WQW}^T, \end{aligned}$$

in which f represents the propagation model that depends on the current states \(\bar{{\mathbf {x}}}\), a vector of inputs \({\mathbf {u}}\) (which is, in fact, also a measurement) and the step time \(\varDelta t\). The matrix \(F \equiv F(\bar{{\mathbf {x}}},{\mathbf {u}},\varDelta t)\in {\mathbb {R}}^{15{\times }15}\) is the partial derivative of f with respect to \(\bar{{\mathbf {x}}}\) and the matrix \(W \equiv W(\bar{{\mathbf {x}}},{\mathbf {u}},\varDelta t)\in {\mathbb {R}}^{15{\times }6}\) the partial derivative of f with respect to \({\mathbf {u}}\). The matrix \(Q\in {\mathbb {R}}^{6{\times }6}\) is the covariance of the measurement \({\mathbf {u}}\), thus \(\hbox {WQW}^T\) computes its effect on the process noise.

Before the definition of the correction stage, let us define an auxiliary function \(\zeta \). It is used to approximate the difference of angle states (those who belong to the space of the circle \({\mathbb {S}}\)), which may have undesirable shifts of \(2\pi \) radians. Given a vector \({\mathbf {o}}\), with elements \(o_i\), each i-th element of \(\zeta \) is defined as:

$$\begin{aligned} \zeta _i({\mathbf {o}}) = \left\{ \begin{array}{ll} o_i \quad \quad \quad \,\, \text{ if } \ \ o_i \in {\mathbb {R}},\\ \sin (o_i) \quad \text{ if } \ \ o_i \in {\mathbb {S}}.\\ \end{array}\right. \end{aligned}$$

The idea behind the definition of the function \(\zeta \) is that the \(\sin \) function is invariant to shifts of \(2\pi \). Assuming small values for \(o_i\) when \(o_i\in {\mathbb {S}}\), we have that \(\sin (o_i+k2\pi ) \approx o_i, \ k\in {\mathbb {Z}}\). Now, the discrete correction stage for the states and the covariance matrix is defined by:

$$\begin{aligned} \bar{{\mathbf {x}}}\leftarrow & {} \bar{{\mathbf {x}}} + K\zeta \left( {\mathbf {z}}-h(\bar{{\mathbf {x}}})\right) , \end{aligned}$$
$$\begin{aligned} P\leftarrow & {} \left( I-KH\right) P, \end{aligned}$$

where \({\mathbf {z}}\) is a vector of measurements and \(h(\bar{{\mathbf {x}}})\) is the measurement model, i.e., the expected value of \({\mathbf {z}}\) given the current state estimate \(\bar{{\mathbf {x}}}\). The matrix H is the Jacobian of \(h(\bar{{\mathbf {x}}})\) and I is the identity matrix. The quantity \(\zeta ({\mathbf {z}}-h({\bar{\mathbf{x}}}))\) is what we consider as the innovation of the filter. Note that in a working filter \({\mathbf {z}} \approx h(\bar{{\mathbf {x}}})\), which ensures that the function \(\zeta \) basically removes shifts of \(2\pi \). The matrix K is the Kalman gain, given by:

$$\begin{aligned} K = PH^T\left( HPH^T+N\right) ^{-1}, \end{aligned}$$

with N representing the covariance of the measured data \({\mathbf {z}}\).
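The correction equations above, including the \(\zeta \)-wrapped innovation, can be sketched numerically (`ekf_correct` is our helper name; angle residuals pass through `sin()` as the \(\zeta \) function prescribes):

```python
import numpy as np

def ekf_correct(x, P, z, h, H, N, is_angle):
    """One EKF correction step: Kalman gain, state update with the
    zeta-wrapped innovation, and covariance update. h maps states to
    expected measurements; H is its Jacobian; N is the measurement
    covariance; is_angle flags the angle components of z."""
    S = H @ P @ H.T + N
    K = P @ H.T @ np.linalg.inv(S)
    innov = z - h(x)
    innov = np.where(is_angle, np.sin(innov), innov)  # the zeta function
    x = x + K @ innov
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# scalar example: prior variance 1, measurement variance 1 -> gain 0.5;
# a 2*pi jump added to an angle measurement is absorbed by zeta
x1, P1 = ekf_correct(np.array([0.0]), np.array([[1.0]]),
                     np.array([0.01 + 2 * np.pi]), lambda s: s,
                     np.array([[1.0]]), np.array([[1.0]]),
                     np.array([True]))
```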

Equations (6)–(11) are well known in the literature. The strategy relies on the choices of the state vector \(\bar{{\mathbf {x}}}\), the propagation inputs \({\mathbf {u}}\), the measurements \({\mathbf {z}}\), and the functions f and h.

In general, velocities and accelerations are used in the prediction stage, while position and orientations in the correction stage. Then, we use the IMU measurements as an input of the propagation model, defining \({\mathbf {u}} = [{\mathbf {u}}_\omega , \ {\mathbf {u}}_a]^T = [u_{\omega x}, \ u_{\omega y}, \ u_{\omega z}, \ u_{ax}, \ u_{ay}, \ u_{az}]^T\).

There are two measurement sources of position and orientation: the visual odometry (Sect. 6.1) and the global pose measurement (Sect. 6.2). Since the visual odometry accumulates error over time and the pose from the PnP does not, these two measurements will diverge over time. If we assume measurement models that directly correct the position and orientation with both data, the estimate will keep oscillating between them; the two measurements would “compete” with each other. The Kalman filter theory assumes that the noise of the signals has zero mean, which is not the case for the visual odometry due to its cumulative error. Thus, a straightforward strategy consists of using only the increments of the odometry in the filter. An example of this strategy is the Robot Operating System (ROS) package robot_pose_ekfFootnote 9. In this work, we used a different approach, which considers extra states representing the accumulated error of the visual odometry. These states influence the measurement model of the odometry and are updated when the filter receives global pose measurements.

Finally, the states vector used in the filter is given by:

$$\begin{aligned} \bar{{\mathbf {x}}} = \big [\underbrace{\bar{x}, \ \bar{y}, \ \bar{z}}_{\text{ position }}, \ \underbrace{{\bar{\phi }}, \ {\bar{\theta }}, \ {\bar{\psi }}}_{\text{ angles }}, \ \underbrace{\bar{v}_x, \ \bar{v}_y, \ \bar{v}_z}_{\text{ velocities }}, \ \underbrace{\bar{b}_{\omega x}, \ \bar{b}_{\omega y}, \ \bar{b}_{\omega z}}_{\text{ bias } \text{ gyro }}, \ \underbrace{\bar{b}_{ax}, \ \bar{b}_{ay}, \ \bar{b}_{az}}_{\text{ bias } \text{ acc }}, \ \underbrace{\bar{b}_{px}, \ \bar{b}_{py}, \ \bar{b}_{pz}}_{\text{ bias } \text{ pos }}, \ \underbrace{\bar{b}_{\phi }, \ \bar{b}_{\theta }, \ \bar{b}_{\psi }}_{\text{ bias } \text{ angles }}\big ]^T. \end{aligned}$$

The first nine states are the position, angles and velocities (in the body frame) used in the control system. The following six states are the bias on the measurements of the IMU, whose consideration allows for better precision of the prediction stage. The last six are the bias (or accumulative errors) of the visual odometry.

Let \(\bar{{\mathbf {p}}} = [\bar{x}, \ \bar{y}, \ \bar{z}]^T\), \(\bar{{\mathbf {r}}} = [{\bar{\phi }}, \ {\bar{\theta }}, \ {\bar{\psi }}]^T\) and \(\bar{{\mathbf {v}}}_b = [\bar{v}_x, \ \bar{v}_y, \ \bar{v}_z]^T\). Let also \(\bar{{\mathbf {b}}} = [\bar{{\mathbf {b}}}_\omega ^T, \ \bar{{\mathbf {b}}}_a^T, \ \bar{{\mathbf {b}}}_p^T, \ \bar{{\mathbf {b}}}_r^T]^T\) be the 12 dimensional vector with the bias states. The nine first elements of the function \(f(\bar{{\mathbf {x}}},{\mathbf {u}},\varDelta t)\) are based on the movement of a rigid body. The bias states do not change on the propagation, only on the correction; thus, we have that \(f(\bar{{\mathbf {x}}},{\mathbf {u}},\varDelta t)\) is given by the vector:

$$\begin{aligned} \left[ \begin{array}{l} \bar{{\mathbf {p}}} + \left( R_d^w\bar{{\mathbf {v}}}_b\right) \varDelta t\\ \bar{{\mathbf {r}}} + J_{r}\left( {\mathbf {u}}_\omega -\bar{{\mathbf {b}}}_\omega \right) \varDelta t\\ \bar{{\mathbf {v}}}_b + \left( ({\mathbf {u}}_a-\bar{{\mathbf {b}}}_a) + ({\mathbf {u}}_\omega -\bar{{\mathbf {b}}}_\omega )\times \bar{{\mathbf {v}}}_b - g(R_d^w)^T{\hat{{\mathbf {z}}}}\right) \varDelta t\\ \bar{{\mathbf {b}}} \end{array}\right] , \end{aligned}$$

in which \(J_r \equiv J_r\left( {\bar{\phi }},{\bar{\theta }}\right) \) is the Jacobian matrix that transforms the angular velocities \({\mathbf {u}}_\omega \) into the derivatives of the Euler angles \({\mathbf {r}}\). Note that the measurements \({\mathbf {u}}_\omega \) and \({\mathbf {u}}_a\) are corrected with the bias states of the IMU. It is important to emphasize that the term \(({\mathbf {u}}_\omega -\bar{{\mathbf {b}}}_\omega )\times \bar{{\mathbf {v}}}_b\) is the Coriolis compensation and \(\hat{{\mathbf {z}}}=[0,\ 0,\ 1]^T\).
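A numerical sketch of this propagation step (`f_predict` is our function name; a ZYX roll-pitch-yaw Euler convention is assumed here, which the text does not fix explicitly):

```python
import numpy as np

def f_predict(p, r, v_b, b, u_omega, u_a, dt, g=9.81):
    """One prediction step: rigid-body motion for position, Euler angles
    and body-frame velocity; the bias vector b (gyro, acc, pos, angles)
    passes through unchanged."""
    phi, th, psi = r
    cph, sph = np.cos(phi), np.sin(phi)
    cth, sth = np.cos(th), np.sin(th)
    cps, sps = np.cos(psi), np.sin(psi)
    R_dw = np.array([  # body -> world rotation (ZYX convention)
        [cps*cth, cps*sth*sph - sps*cph, cps*sth*cph + sps*sph],
        [sps*cth, sps*sth*sph + cps*cph, sps*sth*cph - cps*sph],
        [-sth,    cth*sph,               cth*cph]])
    J_r = np.array([   # body rates -> Euler angle rates
        [1.0, sph*sth/cth, cph*sth/cth],
        [0.0, cph,         -sph],
        [0.0, sph/cth,     cph/cth]])
    b_omega, b_a = b[0:3], b[3:6]
    zhat = np.array([0.0, 0.0, 1.0])
    p_new = p + (R_dw @ v_b) * dt
    r_new = r + (J_r @ (u_omega - b_omega)) * dt
    v_new = v_b + ((u_a - b_a) + np.cross(u_omega - b_omega, v_b)
                   - g * (R_dw.T @ zhat)) * dt
    return p_new, r_new, v_new, b

# hover check: level attitude, zero rates, thrust acceleration equal to g
p1, r1, v1, b1 = f_predict(np.zeros(3), np.zeros(3), np.zeros(3),
                           np.zeros(12), np.zeros(3),
                           np.array([0.0, 0.0, 9.81]), 0.01)
```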

The filter uses two measured data in the correction steps. In order to use the data that comes from the gate’s perception, we define the measurement model \(h_1(\bar{{\mathbf {x}}})\) and use the measurement \({\mathbf {z}} = [{\mathbf {p}}_g^T,\ {\mathbf {r}}_g^T]^T\). The data from the visual odometry will be used with the model \(h_2(\bar{{\mathbf {x}}})\) and measurement \({\mathbf {z}} = \left[ {\mathbf {p}}_s^T,\ {\mathbf {r}}_s^T\right] ^T\). The function \(h_1(\bar{{\mathbf {x}}})\) is:

$$\begin{aligned} h_1(\bar{{\mathbf {x}}}) = \left[ \begin{array}{l} \bar{{\mathbf {p}}} \\ \bar{{\mathbf {r}}} \end{array}\right] . \end{aligned}$$

Model \(h_1\) is given by the pose states directly, since the information from the gates’ perception does not accumulate errors over time. The function \(h_2(\bar{{\mathbf {x}}})\) is:

$$\begin{aligned} h_2(\bar{{\mathbf {x}}}) = \left[ \begin{array}{l} R(\bar{{\mathbf {b}}}_r)\bar{{\mathbf {p}}} + \bar{{\mathbf {b}}}_p \\ {{\,\mathrm{angles}\,}}\left( R(\bar{{\mathbf {b}}}_r)R(\bar{{\mathbf {r}}})\right) \end{array}\right] , \end{aligned}$$

where \(R(\cdot ):{\mathbb {R}}^3\rightarrow SO(3)\) is a rotation matrix equivalent to a given Euler angles vector and the function \({{\,\mathrm{angles}\,}}(\cdot )\) is its inverse, returning the associated vector of Euler angles. Model (15) is the currently estimated pose corrected with the estimated deviation of the visual odometry.

Note that, even though no direct measurement of velocity is inserted into the filter, the EKF can properly estimate the velocity states. The values of all bias states are also naturally estimated during the correction stages. It is also important to comment that the matrices F, W and H, i.e., the derivatives of f and h, can be easily computed numerically.


Control

The strategy adopted to control the quadcopter along the planned trajectories is based on artificial vector fields [17, 46]. In three-dimensional space, a vector field is a function \(F({\mathbf {p}}):{\mathbb {R}}^3\rightarrow {\mathbb {R}}^3\) that assigns a velocity reference F to each position \({\mathbf {p}}\). This reference guides the navigation of the drone. Figure 10 presents the control diagram used. The vector field provides not only a velocity vector but also the associated Jacobian matrix, which is used in the control. The inverse model of the drone is used to generate references for the vehicle’s orientation. The blocks inside the red rectangle are the low-level PID controller and the simulated model in FlightGoggles.

Fig. 10

Representation of the nonlinear controller structure based on artificial vector fields

Vector field

The vector field structure used here is based on the work of Gonçalves et al. [17], who propose a method to define a function \(F({\mathbf {p}})\) that guides the robot toward a given curve. Their strategy requires the three-dimensional curve to be represented as the intersection of two zero level sets, \(\alpha _1({\mathbf {p}})=0\) and \(\alpha _2({\mathbf {p}})=0\). The scalar functions \(\alpha _1\) and \(\alpha _2\) are usually defined analytically. In this paper, due to the generic shapes of the paths, it is not easy to define these functions analytically. Thus, we present a novel numerical method that computes functions \(\alpha _1\) and \(\alpha _2\) such that the intersection of their zero level sets is the curve parametrized by \({\varvec{\gamma }}^i(s)\). Figure 11 illustrates the level sets \(\alpha _1=0\) and \(\alpha _2=0\) obtained with the proposed strategy.

Fig. 11

Functions \(\alpha _1\) (green) and \(\alpha _2\) (blue), used in the computation of the vector field for a path in between two gates. Functions \(\alpha _1\) and \(\alpha _2\) are obtained as the projections of the vector distance \({\mathbf {D}}\) in moving frame \([\hat{{\mathbf {x}}}_\gamma \ \hat{{\mathbf {y}}}_\gamma \ \hat{{\mathbf {z}}}_\gamma ]\) (color figure online)

The first step of the numerical method to compute \(\alpha _1\) and \(\alpha _2\) is to obtain the distance between the curve and the drone’s position \({\mathbf {p}}\) as:

$$\begin{aligned}&s^*({\mathbf {p}}) = \arg \ \min _{0 \le s \le 1} \Vert {\mathbf {p}}-\varvec{\gamma }^i(s)\Vert , \end{aligned}$$
$$\begin{aligned}&{\mathbf {D}}^*({\mathbf {p}}) = {\mathbf {p}}-{\varvec{\gamma }}^i(s^*({\mathbf {p}})). \end{aligned}$$

The parameter \(s^*\) corresponds to the point \({\varvec{\gamma }}^i(s^*)\) of the path that is closest to \({\mathbf {p}}\). The vector \({\mathbf {D}}^*\) goes from \({\varvec{\gamma }}^i(s^*)\) to \({\mathbf {p}}\). The optimization problem in (16) can be solved numerically with good precision using a golden-section search.
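The minimization in (16) can be sketched with a plain golden-section search (our implementation; it assumes the distance to the path is locally unimodal, which holds near the curve):

```python
import numpy as np

def closest_param(gamma, p, lo=0.0, hi=1.0, tol=1e-8):
    """Golden-section search for s* = argmin ||p - gamma(s)|| on [lo, hi]."""
    invphi = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if np.linalg.norm(p - gamma(c)) < np.linalg.norm(p - gamma(d)):
            b, d = d, c                      # minimum lies in [a, d]
            c = b - invphi * (b - a)
        else:
            a, c = c, d                      # minimum lies in [c, b]
            d = a + invphi * (b - a)
    return 0.5 * (a + b)

# straight segment from (0,0,0) to (1,0,0); query point above its midpoint
gamma = lambda s: np.array([s, 0.0, 0.0])
s_star = closest_param(gamma, np.array([0.5, 0.3, 0.0]))
```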

Now, to compute the \(\alpha \) functions, consider an orthonormal basis \([\hat{{\mathbf {x}}}_\gamma (s^*), \ \hat{{\mathbf {y}}}_\gamma (s^*), \ \hat{{\mathbf {z}}}_\gamma (s^*)]\). Vector \(\hat{{\mathbf {x}}}_\gamma (s^*)\) is defined as the tangent of the path at the parameter \(s^*\):

$$\begin{aligned} \hat{{\mathbf {x}}}_\gamma = \frac{\partial \varvec{\gamma }^i}{\partial s}(s^*) \left\| \frac{\partial \varvec{\gamma }^i}{\partial s}(s^*)\right\| ^{-1}. \end{aligned}$$

Vector \(\hat{{\mathbf {z}}}_\gamma (s^*)\) is chosen to point upwards. Thus, it is defined from \(\hat{{\mathbf {z}}}\) by removing its component on \(\hat{{\mathbf {x}}}_\gamma \) and normalizing the result:

$$\begin{aligned} \hat{{\mathbf {z}}}_\gamma = \frac{\hat{{\mathbf {z}}} - \left( \hat{{\mathbf {z}}}^T\hat{{\mathbf {x}}}_\gamma \right) \hat{{\mathbf {x}}}_\gamma }{\left\| \hat{{\mathbf {z}}} - \left( \hat{{\mathbf {z}}}^T\hat{{\mathbf {x}}}_\gamma \right) \hat{{\mathbf {x}}}_\gamma \right\| }. \end{aligned}$$

To define \(\hat{{\mathbf {z}}}_\gamma \) properly, we need \(\hat{{\mathbf {x}}}_\gamma \ne \hat{{\mathbf {z}}}\); in other words, the path cannot be vertical. This is a weak assumption, since we do not consider gates positioned such that the vector \({\mathbf {w}}^i\) (see Sect. 4) is vertical.

Finally, vector \(\hat{{\mathbf {y}}}_\gamma (s^*)\) is defined as:

$$\begin{aligned} \hat{{\mathbf {y}}}_\gamma = \hat{{\mathbf {z}}}_\gamma \times \hat{{\mathbf {x}}}_\gamma . \end{aligned}$$

A frame formed by \(\hat{{\mathbf {x}}}_\gamma \), \(\hat{{\mathbf {y}}}_\gamma \) and \(\hat{{\mathbf {z}}}_\gamma \) is depicted in Fig. 11. Functions \(\alpha _1\) and \(\alpha _2\) are defined as the dot products between the distance vector \({\mathbf {D}}^*\) and the axes \(\hat{{\mathbf {y}}}_\gamma (s^*)\) and \(\hat{{\mathbf {z}}}_\gamma (s^*)\), respectively:

$$\begin{aligned}&\alpha _1({\mathbf {p}}) = \hat{{\mathbf {y}}}_\gamma ({\mathbf {p}})^T{\mathbf {D}}^*({\mathbf {p}}), \nonumber \\&\alpha _2({\mathbf {p}}) = \hat{{\mathbf {z}}}_\gamma ({\mathbf {p}})^T{\mathbf {D}}^*({\mathbf {p}}). \end{aligned}$$

Note that the first-order optimality condition of (16) implies \(\hat{{\mathbf {x}}}_\gamma ^T{\mathbf {D}}^*=0\). Since \(\hat{{\mathbf {x}}}_\gamma \), \(\hat{{\mathbf {y}}}_\gamma \) and \(\hat{{\mathbf {z}}}_\gamma \) form an orthonormal basis of \({\mathbb {R}}^3\) and \(\hat{{\mathbf {x}}}_\gamma ^T{\mathbf {D}}^*=0\), we can conclude that \(\alpha _1=\alpha _2=0 \iff \Vert {\mathbf {D}}^*\Vert =0\). Thus, the surfaces in Fig. 11 correspond to the zero level sets of the functions in (21). The level set \(\alpha _1({\mathbf {p}})=0\) in Fig. 11 corresponds to a sweep of the vector \(\hat{{\mathbf {z}}}_\gamma \) over the path, while the set \(\alpha _2({\mathbf {p}})=0\) is the equivalent sweep of \(\hat{{\mathbf {y}}}_\gamma \). With this method, we now have the necessary functions to compute the vector field F that guides the drone toward the planned curve \({\varvec{\gamma }}\), defined in Sect. 4.
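Putting the moving frame and the projections together, \(\alpha _1\) and \(\alpha _2\) can be computed numerically as follows (`alphas` is our helper name; the caller supplies the path, its derivative, and a non-vertical tangent):

```python
import numpy as np

def alphas(gamma, dgamma, s_star, p):
    """alpha_1 and alpha_2 as projections of D* = p - gamma(s*) onto the
    moving-frame axes y_hat and z_hat built from the path tangent."""
    zhat = np.array([0.0, 0.0, 1.0])
    D = p - gamma(s_star)
    x_hat = dgamma(s_star) / np.linalg.norm(dgamma(s_star))
    z_hat = zhat - (zhat @ x_hat) * x_hat    # remove the tangent component
    z_hat = z_hat / np.linalg.norm(z_hat)    # requires a non-vertical path
    y_hat = np.cross(z_hat, x_hat)
    return y_hat @ D, z_hat @ D

# horizontal path along x; point 0.3 m above it: alpha_1 = 0, alpha_2 = 0.3
a1, a2 = alphas(lambda s: np.array([s, 0.0, 0.0]),
                lambda s: np.array([1.0, 0.0, 0.0]),
                0.5, np.array([0.5, 0.0, 0.3]))
```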

Now, given \(\alpha _1\) and \(\alpha _2\) we use the methodology by Gonçalves et al. [17] to compute the vector field \(F({\mathbf {p}})\) as:

$$\begin{aligned} F({\mathbf {p}}) = v_p\left( G\frac{\nabla V}{\Vert \nabla V\Vert } + H\frac{\nabla \alpha _1\times \nabla \alpha _2}{\Vert \nabla \alpha _1\times \nabla \alpha _2\Vert }\right) , \end{aligned}$$

in which \(V = \alpha _1^2 + \alpha _2^2\), the gradients \(\nabla \alpha _1\), \(\nabla \alpha _2\) and \(\nabla V\) are easily computed numerically and \(v_p\) is the reference speed for the robot. The functions G and H are defined as:

$$\begin{aligned}&G = -\frac{2}{\pi }{{\,\mathrm{atan}\,}}\left( k_f\sqrt{V}\right) , \end{aligned}$$
$$\begin{aligned}&H = \sqrt{1-G^2}, \end{aligned}$$

in which \(k_f>0\) is a gain that controls the intensity of the convergent action of the vector field.

Note that the first term in (22) points in the direction of decrease of V; thus, it drives \(\alpha _1\) and \(\alpha _2\) to 0 and, consequently, makes the trajectory of the system \(\dot{{\mathbf {p}}}=F({\mathbf {p}})\) approach the curve. The second term is orthogonal to the gradients of \(\alpha _1\) and \(\alpha _2\); thus, for \(\alpha _1=\alpha _2=0\), this component is parallel to the curve. The functions G and H are modulation factors that regulate the balance between the convergence and circulation components. Formal convergence proofs of this vector field are presented by Gonçalves et al. [17].
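A minimal numerical sketch of the field in (22) (`field` is our helper name; the caller supplies \(\alpha _1\), \(\alpha _2\) and their gradients):

```python
import numpy as np

def field(alpha1, alpha2, grad_a1, grad_a2, v_p, k_f):
    """Vector field of Eq. (22): a convergence term along the normalized
    gradient of V = alpha1^2 + alpha2^2 (scaled by G < 0) and a
    circulation term along grad(alpha1) x grad(alpha2), blended by G and H."""
    V = alpha1**2 + alpha2**2
    G = -(2.0 / np.pi) * np.arctan(k_f * np.sqrt(V))
    H = np.sqrt(1.0 - G**2)
    grad_V = 2.0 * alpha1 * grad_a1 + 2.0 * alpha2 * grad_a2
    nV = np.linalg.norm(grad_V)
    conv = grad_V / nV if nV > 0 else np.zeros(3)  # direction undefined on the curve
    circ = np.cross(grad_a1, grad_a2)
    circ = circ / np.linalg.norm(circ)
    return v_p * (G * conv + H * circ)

# on the curve (alpha1 = alpha2 = 0): G = 0, H = 1, so the field is purely
# tangential; for a path along x, grad(a1) = y_hat and grad(a2) = z_hat
F = field(0.0, 0.0, np.array([0.0, 1.0, 0.0]),
          np.array([0.0, 0.0, 1.0]), 2.0, 1.0)
```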

The speed \(v_p \equiv v_p(s^*)>0\) sets the magnitude of the vector field. One option is to use a constant value, i.e., given a desired speed \(v_d\), set \(v_p=v_d\). A better result was obtained by modulating \(v_d\) with the speed profile of the plan \({\varvec{\gamma }}^i\) (see Sect. 4) evaluated at the parameter \(s^*\). It is computed by:

$$\begin{aligned} v_p(s^*) = \frac{v_d}{L^i} \left\| \frac{d{\varvec{\gamma }}^ i(s^*)}{ds}\right\| . \end{aligned}$$

The planner used in this work was originally designed to compute a trajectory. However, the vector-field-based controller is a path controller: given an instant t, the desired position of the drone is not defined, and the parameter \(s^*\) in Eq. (16) depends only on the drone’s position. Thus, the reference for the controller is not directly tied to time, making it a path controller rather than a trajectory controller. In other words, the reference velocity \(F({\mathbf {p}})\) is a function of \({\mathbf {p}}\) only and does not depend on time.

This approach is particularly interesting because it avoids some common issues of trajectory-tracking control: (i) if the reference is far from the drone’s starting point, the drone may undergo an aggressive transient; (ii) if there is a temporary system response failure that causes the drone to stop, when responsiveness returns the reference may be far from the current state, causing an extra transient; (iii) in the case of shifts in the estimated states of the vehicle, due to disturbances in the gates’ positions, for example, the control undergoes one more transient; and (iv) if the drone cannot keep tracking the trajectory, due to actuators’ saturation, the tracking error keeps increasing unless an additional supervisory algorithm slows down the reference. None of these problems is present in a path controller such as the vector-field-based one.

Nonlinear control

Given that the vector field \(F({\mathbf {p}})\) is already defined, it is now necessary to compute the control signals \(\tau \) and \(\omega \). Here, we use the control strategy presented by Rezende et al. [46], to which the reader is referred for the convergence proofs.

The control structure is presented in Fig. 10. The velocity control block computes the desired acceleration that makes the velocity of the drone converge to the velocity of the field. The desired acceleration is computed as:

$$\begin{aligned} {\mathbf {a}}_d = J_F({\mathbf {p}}){\mathbf {v}} + k_v(F({\mathbf {p}})-{\mathbf {v}}), \end{aligned}$$

in which \(J_F({\mathbf {p}})\) is the Jacobian of \(F({\mathbf {p}})\) and \(k_v>0\) is a gain. Note that (26) does not contain the partial derivative of F with respect to time [46], since the curve is static.
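A minimal sketch of this velocity-control law is shown below, approximating the field Jacobian by central finite differences (in practice, the analytic Jacobian of F would be used):

```python
import numpy as np

def desired_acceleration(F, p, v, k_v=1.0, eps=1e-6):
    """Compute a_d = J_F(p) v + k_v (F(p) - v) for a field F: R^3 -> R^3."""
    J = np.zeros((3, 3))
    for j in range(3):
        dp = np.zeros(3)
        dp[j] = eps
        # finite-difference approximation of the j-th Jacobian column
        J[:, j] = (F(p + dp) - F(p - dp)) / (2.0 * eps)
    return J @ v + k_v * (F(p) - v)
```

The first term is a feed-forward accounting for how the field changes as the drone moves; the second is proportional feedback on the velocity error.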

The thrust of the drone acts along the axis \(\hat{{\mathbf {z}}}_b\). The resulting acceleration also depends on gravity and drag forces. Thus, to compute \(\tau \), we consider the acceleration \({\mathbf {a}}_r\) defined as:

$$\begin{aligned} {\mathbf {a}}_r = {\mathbf {a}}_d + g\hat{{\mathbf {z}}} + \frac{k_b{\mathbf {v}}}{m}. \end{aligned}$$

If the \(\hat{{\mathbf {z}}}_b\) axis is not aligned with \({\mathbf {a}}_r\), it is impossible to apply the acceleration \({\mathbf {a}}_d\) to the drone instantaneously. Rezende et al. [46] choose the value of \(\tau \) that minimizes the difference between \((\tau /m)\hat{{\mathbf {z}}}_b\) and \({\mathbf {a}}_r\). Thus, \(\tau \) is defined as:

$$\begin{aligned} \tau = \arg \min _{\tau '} \left\| {\mathbf {a}}_r-(\tau '/m)\hat{{\mathbf {z}}}_b\right\| ^2 = m\left( \hat{{\mathbf {z}}}_b^T{\mathbf {a}}_r\right) . \end{aligned}$$
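This least-squares problem reduces to a projection onto the body axis; a sketch, assuming a unit-norm \(\hat{{\mathbf {z}}}_b\):

```python
import numpy as np

def thrust(a_r, z_b, m):
    """Scalar thrust minimizing ||a_r - (tau/m) z_b||^2 for unit z_b.

    The minimizer of this convex quadratic is the projection
    tau = m * (z_b . a_r).
    """
    return m * float(z_b @ a_r)
```

Intuitively, the drone produces the component of \({\mathbf {a}}_r\) that its current thrust axis can realize; the attitude loop below rotates the axis to recover the rest.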

To fully apply the kinematic acceleration \({\mathbf {a}}_r\), it is necessary to align \(\hat{{\mathbf {z}}}_b\) with \({\mathbf {a}}_r\). Thus, we define a reference matrix \(R_r = [\hat{{\mathbf {x}}}_r,\ \hat{{\mathbf {y}}}_r,\ \hat{{\mathbf {z}}}_r]\). The axis \(\hat{{\mathbf {z}}}_r\) is given by \(\hat{{\mathbf {z}}}_r = {\mathbf {a}}_r/\Vert {\mathbf {a}}_r\Vert \). A vector that represents the reference heading angle \(\psi _r\) (see Eq. (2)) is \({\mathbf {w}}_\psi = [\cos (\psi _r),\ \sin (\psi _r),\ 0]^T\). The axis \(\hat{{\mathbf {x}}}_r\) is obtained from \({\mathbf {w}}_\psi \) by removing its component along \(\hat{{\mathbf {z}}}_r\) and normalizing the result:

$$\begin{aligned} {{\hat{\mathbf{x}}}}_r = \frac{{\mathbf {w}}_\psi - \left( {\mathbf {w}}_\psi ^T{{\hat{\mathbf{z}}}}_r\right) {{\hat{\mathbf{z}}}}_r}{\left\| {\mathbf {w}}_\psi - \left( {\mathbf {w}}_\psi ^T{{\hat{\mathbf{z}}}}_r\right) {{\hat{\mathbf{z}}}}_r\right\| }. \end{aligned}$$

Finally, \({{\hat{\mathbf{y}}}}_r = {{\hat{\mathbf{z}}}}_r \times {{\hat{\mathbf{x}}}}_r\).
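The construction of \(R_r\) follows directly from these definitions; a sketch:

```python
import numpy as np

def reference_attitude(a_r, psi_r):
    """Build R_r = [x_r, y_r, z_r]: z_r along a_r, x_r from the heading
    vector w_psi with its z_r-component removed, y_r = z_r x x_r."""
    z_r = a_r / np.linalg.norm(a_r)
    w_psi = np.array([np.cos(psi_r), np.sin(psi_r), 0.0])
    x_r = w_psi - (w_psi @ z_r) * z_r  # remove component along z_r
    x_r /= np.linalg.norm(x_r)
    y_r = np.cross(z_r, x_r)           # completes a right-handed frame
    return np.column_stack([x_r, y_r, z_r])
```

For hover (\({\mathbf {a}}_r\) along \(\hat{{\mathbf {z}}}\)) and \(\psi _r = 0\), this yields the identity; for any \({\mathbf {a}}_r\) not parallel to \({\mathbf {w}}_\psi \), the result is a valid rotation matrix by construction.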

Given the reference defined by \(R_r\), the orientation error can be represented as the matrix \(R_e = (R_d^w)^TR_r^w\). To define the control law for \(\omega \), we consider the map to the axis-angle representation:

$$\begin{aligned} R_e \mapsto (\hat{{\mathbf {n}}},\beta ), \end{aligned}$$

where \(\hat{{\mathbf {n}}}\) is the axis and \(\beta \) is the angle. Finally, the control law for \(\omega \) is given by:

$$\begin{aligned} \omega = \omega _r + k_\beta \sin (\beta ){{\hat{\mathbf{n}}}}, \end{aligned}$$

in which \(\omega _r\) is a feed-forward component obtained from the relation \(S(\omega _r) = \left( R_d^w\right) ^T{\dot{R}}_r^w\left( R_e\right) ^T\). See the work by Rezende et al. [46] for more details on this control strategy.
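The quantity \(\sin (\beta ){{\hat{\mathbf{n}}}}\) needed by the law can be extracted directly from the skew-symmetric part of \(R_e\), avoiding an explicit axis-angle decomposition. A sketch, with \(\omega _r\) assumed given:

```python
import numpy as np

def attitude_rate_command(R_e, omega_r, k_beta=2.8):
    """omega = omega_r + k_beta * sin(beta) * n, where (n, beta) is the
    axis-angle form of R_e; vee((R_e - R_e^T)/2) equals sin(beta) * n."""
    S = 0.5 * (R_e - R_e.T)
    sin_beta_n = np.array([S[2, 1], S[0, 2], S[1, 0]])  # vee map
    return omega_r + k_beta * sin_beta_n
```

When the error is small, \(\sin (\beta ){{\hat{\mathbf{n}}}} \approx \beta {{\hat{\mathbf{n}}}}\), so the law behaves like a proportional controller on the rotation error.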


Experimental results

In this section, we first present comparison results of the perception algorithm, which is, in general, the most demanding task in terms of computing resources. Then, we present the results of the complete system developed to solve the racing problem. The experiments were conducted on our modified version of the FlightGoggles simulator. The source code of the proposed pipeline, implemented with the Robot Operating System (ROS) under Ubuntu 16.04, is available online. Videos of the results using the proposed methods can also be accessed online.

Visual perception

The visual perception module uses the YOLOv4 single-shot detector to estimate the gate location, including the visible corners. An Nvidia GTX Titan X GPU with 12 GB of GDDR5 memory was used for training the models. The training loss and mean average precision (mAP) of the two evaluated models, trained on a dataset of 16,044 images, are depicted in Figs. 12 and 13 for the full and tiny versions, respectively. For the validation set, we use roughly 9% of the training dataset (1368 images). The training loss of the YOLOv4-Tiny model is marginally lower than that of the full model. However, the mAP on the validation dataset is higher for the full model at a confidence threshold of 0.25. The full version of the network is expected to be more accurate than the tiny version, given its larger number of parameters. Nevertheless, as shown in Table 2, the smaller network maintained similar or better precision and recall for different confidence thresholds.

Fig. 12

Training of the larger network for gate detection: YOLOv4-Full training loss and mAP

Fig. 13

Training of the smaller network for gate detection: YOLOv4-Tiny training loss and mAP

Table 2 Accuracy of the YOLOv4 models over the validation dataset

The larger number of parameters of the full network comes at the cost of increased execution time, as shown in Fig. 14. On the Nvidia Tegra TX2 board, the tiny version runs over ten times faster, while on a desktop computer with an Nvidia TITAN X it runs over five times faster. On the other devices, we were only able to run the tiny version of the network, given its lower memory requirements. In many instances of fast-paced movement, the increase in speed is desirable even at the cost of a slight decrease in raw accuracy. In this sense, given the smaller model's higher frame rate, broader deployment capabilities, and overall performance, we selected YOLOv4-Tiny for the vision pipeline.

Fig. 14

Comparison of average frames per seconds in various processing machines using YOLOv4-Full and YOLOv4-Tiny

The main purpose of the visual perception module is to estimate the gate's location in the view of the drone's camera. The filter uses this information to improve the estimation of the drone's pose when passing through the gate. Thus, to assess the accuracy of this information, we measure the error in estimating the gate's location as the drone approaches it at an average speed of \(5.0~\mathrm {m/s}\). Figure 15 shows the average error and the \(95\%\) confidence interval for 30 trials. The \(\epsilon _y\) error corresponds to the gate's height, and the \(\epsilon _z\) error to the distance from the gate, both with respect to the drone frame.

Fig. 15

Error in estimating the position of the gate as the drone approaches it using the proposed visual perception pipeline. The bold lines are the average errors concerning each axis, and the dashed lines represent the \(95\%\) confidence interval after performing 30 trials

In this experiment, we observe that the error is nearly proportional to the distance between the drone and the gate, reaching its minimum when the drone is close to it. We note that the \(\epsilon _x\) and \(\epsilon _z\) errors are the ones that most affect the estimate of the gate's position, mainly at long distances (above \(8~\mathrm {m}\)). However, at short distances (less than \(6~\mathrm {m}\)), the details of the gate's corners are more prominent, allowing a refinement step that improves the gate's location and decreases the error norm to less than \(0.4~\mathrm {m}\). This strategy is essential to improve the estimation of the gate's position, enabling the drone to pass through the gate quickly but safely.

FlightGoggles simulations

The integrated system algorithms run in ROS. The trajectory planning code was implemented in Python. The optimization was solved with the quadratic programming solver solvers.qp from the CVXOPT library. The perception, EKF, and vector-field-based controller algorithms were implemented in C++. The solvePnP function of the OpenCV library was used to execute the PnP step of the gate detection algorithm. A ROS version of ORB-SLAM2 was used.

Figure 16 illustrates the results obtained in the execution of the task in the new scenario of the FlightGoggles simulator. The blue rectangles correspond to the nominal position of each gate. In this first experiment, no disturbance was added to the nominal position of the gates. The black line is the trajectory planned to cross the gates at their nominal positions. The magenta line is the trajectory executed by the drone (ground truth), and the green line is the trajectory estimated by the EKF. Note the performance of the system in following the trajectory and estimating the drone's pose.

The trajectory planner parameters are \(v_d = 5.0~\mathrm {m/s}\), \(l=3.0~\mathrm {m}\), and \(\delta _q = 0.8~\mathrm {m}\). The controller gains were set to \(k_f = 1.2\), \(k_v =1.0\), and \(k_\beta =2.8\). The EKF's covariance matrices were defined empirically. The filter, perception, and SLAM codes run at \(60~\mathrm {Hz}\), and the controller at \(120~\mathrm {Hz}\).

Fig. 16

Obtained trajectories: planned, measured and performed

Figure 17 presents an illustration of the visual localization strategies. Note that the gate detection algorithm sometimes cannot detect enough features for a confident estimation of the drone's pose with the PnP. On the other hand, the ORB-SLAM2 algorithm, in this experiment, presented an error in the estimated orientation, which is propagated over time. Even in these adverse situations, the EKF provides a satisfactory estimate of the drone's pose by fusing the information from these visual localization methods and the IMU, which shows the robustness of the proposed method.

Fig. 17

Localization strategies: Extended Kalman Filter, ORB-SLAM2, Gate Detection

Figure 18 compares the reference velocities given by the vector field (in red), those estimated by the EKF (in green), and those performed by the drone (in blue) during the trajectory shown in Fig. 16. Note that the EKF estimates the velocities without direct measurements of them. In Fig. 19, the red lines represent the reference signals for the attitude angles (roll, pitch, and yaw), the green lines the estimates, and the blue lines the performed angles. The reference Euler angles are obtained from the matrix \(R_r\) only to plot the graph. In both figures, the performed velocities and angles are close to the references, and the estimates track the real states. In Fig. 19, there is a discontinuity in \(\psi \), especially at time \(16~\mathrm {s}\); in fact, \(\psi _r\) is discontinuous at the gate transitions.

Fig. 18

Reference for velocity \(F(\bar{{\mathbf {p}}})\) (in red), the estimated (in green), and the performed ones (in blue) (color figure online)

Fig. 19

Reference for angles (\(\phi _r\), \(\theta _r\) and \(\psi _r\)) (in red), the estimated (in green), and the performed ones (in blue) (color figure online)

Figure 20 presents another test in the FlightGoggles simulator, with the gates at disturbed positions (yellow rectangles) with respect to the nominal ones (blue rectangles). Note that the planned path, in black, passes through the nominal gates, while the trajectory performed by the drone (magenta) passes through the real gates at their disturbed locations. As expected, the estimated trajectory, in green, passes through the nominal gates. In fact, since the drone localizes itself with respect to the next gate to be crossed, the system measures that a gate is being crossed at its nominal position while the real drone crosses the gate at its disturbed location. When the next gate to be crossed is detected for the first time, the filter corrects the estimate according to this new target. This correction is evident, for instance, between gates 2 and 3.

Figure 21 presents a top view (xy plane) of Fig. 20. It shows that the drone passes through the disturbed gate even though the planned and estimated trajectories cross the nominal gate. Note that before reaching gate 3, the estimated trajectory (green) converges to the planned one (black), and the drone's real position (magenta) crosses the disturbed gate (yellow rectangle). Since gate 4 is not disturbed that much, when the drone traverses gate 3 and begins to localize based on gate 4 (see the red dots), the estimated trajectory converges to the real one, and both converge to the planned trajectory. This global localization relative to the next gate is what allows navigation without replanning the trajectories.

Conclusions and future work

This paper presented the integration of various mobile robotics techniques as a solution for a drone racing challenge. In this sense, our proposal comprises a complete pipeline for a competition drone, including trajectory planning in three-dimensional environments, visual perception modules for gate pose estimation, localization, and robust control algorithms using vector fields. The proposed pipeline is designed to run on the small, lightweight computers embedded in the drone and does not depend on external localization systems.

The methodology was validated in a photorealistic simulator. An earlier version of our strategy placed the XQuad team 8th in the AlphaPilot challenge, qualifying for the AIRR competition, and 7th in the AIRR finals. We showed simulations in an adapted version of FlightGoggles, illustrating how our solution operates and demonstrating its robustness when the gates are displaced from their nominal positions. The results showed the proposed strategy's competitive performance, presenting it as a feasible pipeline for drone racing systems. The modified version of the simulation and the source code of our proposed solution are available online.

Fig. 20

EKF and control correction in a track with disturbed gates

Fig. 21

Example of drone’s path correction in between gates 3 and 4

The perception modules proved to be the most challenging part, especially at high speeds. In these cases, problems such as camera blur and in-place rotations make it difficult to achieve good perception and localization. Thus, besides the difficulties with the drone's dynamics at higher speeds, the control module must also handle poor state estimates from the perception module. In this sense, as speed increases, decreasing computational costs becomes critical. A holistic overview of the proposed pipeline shows that the most significant bottleneck to increasing the performance of the integrated system lies in the visual perception modules, including mapping, localization, and object detection.

As future work, we intend to execute experiments embedding the proposed system in a physical drone, including a collision-avoidance mechanism using stereo cameras. Considering that the obstacles are known, a strategy of repulsive vector fields can be easily embedded in the controller. We also intend to take advantage of our proposed navigation system to improve the execution of tasks with autonomous drones such as package delivery, inspection, monitoring, among other industrial and commercial drone activities. The inclusion of other sensors in our localization method should also contribute to the execution of these tasks.

Data availability

Algorithms and data sets can be made available upon acceptance and publication.




1. Al Younes Y, Drak A, Noura H, Rabhi A, El Hajjaji A (2016) Robust model-free control applied to a quadrotor UAV. J Intell Robot Syst 84(1–4):37–52

2. Araar O, Aouf N (2014) Visual servoing of a quadrotor UAV for autonomous power lines inspection. In: 22nd Mediterranean conference on control and automation. IEEE, pp 1418–1424

3. Bochkovskiy A, Wang CY, Liao HYM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934

4. Cimpoi M, Maji S, Kokkinos I, Mohamed S, Vedaldi A (2014) Describing textures in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)

5. Cocoma-Ortega JA, Martinez-Carranza J (2019) A CNN based drone localisation approach for autonomous drone racing. In: Proceedings of the 11th international micro air vehicle competition and conference, Madrid, Spain, vol 30

6. Cyba A, Szolc H, Kryjak T (2021) A simple vision-based navigation and control strategy for autonomous drone racing. arXiv preprint arXiv:2104.09815

7. Deng H, Fu Q, Quan Q, Yang K, Cai KY (2019) Indoor multi-camera based testbed for 3D tracking and control of UAVs. IEEE Trans Instrum Meas 69(6):3139–3156

8. Dolatabadi SH, Yazdanpanah MJ (2015) MIMO sliding mode and backstepping control for a quadrotor UAV. In: 2015 23rd Iranian conference on electrical engineering. IEEE, pp 994–999

9. Durdevic P, Ortiz-Arroyo D, Li S, Yang Z (2019) Vision aided navigation of a quad-rotor for autonomous wind-farm inspection. IFAC-PapersOnLine 52(8):61–66

10. Falanga D, Mueggler E, Faessler M, Scaramuzza D (2017) Aggressive quadrotor flight through narrow gaps with onboard sensing and computing using active vision. In: 2017 IEEE international conference on robotics and automation (ICRA), pp 5774–5781

11. Falanga D, Foehn P, Lu P, Scaramuzza D (2018) PAMPC: perception-aware model predictive control for quadrotors. In: 2018 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 1–8

12. Foehn P, Brescianini D, Kaufmann E, Cieslewski T, Gehrig M, Muglikar M, Scaramuzza D (2020) AlphaPilot: autonomous drone racing. Robot Sci Syst

13. Foehn P, Romero A, Scaramuzza D (2021) Time-optimal planning for quadrotor waypoint flight. Sci Robot

14. Gao XS, Hou XR, Tang J, Cheng HF (2003) Complete solution classification for the perspective-three-point problem. IEEE Trans Pattern Anal Mach Intell 25(8):930–943

15. Garrido-Jurado S, Muñoz-Salinas R, Madrid-Cuevas FJ, Marín-Jiménez MJ (2014) Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recognit 47(6):2280–2292

16. Geiger A, Moosmann F, Car Ö, Schuster B (2012) Automatic camera and range sensor calibration using a single shot. In: 2012 IEEE international conference on robotics and automation. IEEE, pp 3936–3943

17. Gonçalves VM, Pimenta LC, Maia CA, Dutra BC, Pereira GA (2010) Vector fields for robot navigation along time-varying curves in \(n\)-dimensions. IEEE Trans Robot 26:647–659

18. Guerra W, Tal E, Murali V, Ryou G, Karaman S (2019) FlightGoggles: photorealistic sensor simulation for perception-driven robotics using photogrammetry and virtual reality. In: 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp 6941–6948

19. Han Z, Wang Z, Xu C, Gao F (2021) Fast-racing: an open-source strong baseline for SE(3) planning in autonomous drone racing. arXiv preprint arXiv:2105.10276

20. Hu F, Cheng J, Bao Y, He Y (2020) Accuracy enhancement for the front-end tracking algorithm of RGB-D SLAM. Intell Serv Robot 13(2):207–218

21. Huang AS, Bachrach A, Henry P, Krainin M, Maturana D, Fox D, Roy N (2017) Visual odometry and mapping for autonomous flight using an RGB-D camera. Robotics research. Springer, New York, pp 235–252

22. Jayasinghe J, Athauda A (2016) Smooth trajectory generation algorithm for an unmanned aerial vehicle (UAV) under dynamic constraints: using a quadratic Bezier curve for collision avoidance. In: 2016 manufacturing & industrial engineering symposium (MIES). IEEE, pp 1–6

23. Jing X, Cui J, He H, Zhang B, Ding D, Yang Y (2017) Attitude estimation for UAV using extended Kalman filter. In: 2017 29th Chinese control and decision conference (CCDC). IEEE, pp 3307–3312

24. Jung S, Hwang S, Shin H, Shim DH (2018) Perception, guidance, and navigation for indoor autonomous drone racing using deep learning. IEEE Robot Autom Lett 3(3):2539–2544

25. Karaduman M, Çınar A, Eren H (2019) UAV traffic patrolling via road detection and tracking in anonymous aerial video frames. J Intell Robot Syst 95(2):675–690

26. Karaman S, Frazzoli E (2011) Sampling-based algorithms for optimal motion planning. Int J Robot Res 30(7):846–894

27. Kaufmann E, Gehrig M, Foehn P, Ranftl R, Dosovitskiy A, Koltun V, Scaramuzza D (2019) Beauty and the beast: optimal methods meet learning for drone racing. In: 2019 international conference on robotics and automation (ICRA), pp 690–696

28. Kaufmann E, Loquercio A, Ranftl R, Müller M, Koltun V, Scaramuzza D (2020) Deep drone acrobatics. arXiv preprint arXiv:2006.05768

29. Koch W, Mancuso R, West R, Bestavros A (2019) Reinforcement learning for UAV attitude control. ACM Trans Cyber-Phys Syst 3(2):1–21

30. Kwon W, Park JH, Lee M, Her J, Kim SH, Seo JW (2019) Robust autonomous navigation of unmanned aerial vehicles (UAVs) for warehouses inventory application. IEEE Robot Autom Lett 5(1):243–249

31. Lee KS, Ovinis M, Nagarajan T, Seulin R, Morel O (2015) Autonomous patrol and surveillance system using unmanned aerial vehicles. In: 2015 IEEE 15th international conference on environment and electrical engineering (EEEIC). IEEE, pp 1291–1297

32. Lee T, Leok M, McClamroch NH (2010) Geometric tracking control of a quadrotor UAV on SE(3). In: 49th IEEE conference on decision and control (CDC), pp 5420–5425

33. Li S, van der Horst E, Duernay P, De Wagter C, de Croon G (2019) Visual model-predictive localization for computationally efficient autonomous racing of a 72-gram drone. arXiv preprint arXiv:1905.10110

34. Li S, Ozo MM, De Wagter C, de Croon GC (2020) Autonomous drone race: a computationally efficient vision-based navigation and control strategy. Robot Auton Syst 133:103621

35. Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietikäinen M (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318

36. Loquercio A, Kaufmann E, Ranftl R, Dosovitskiy A, Koltun V, Scaramuzza D (2019) Deep drone racing: from simulation to reality with domain randomization. IEEE Trans Robot 36(1):1–14

37. Mellinger D, Kumar V (2011) Minimum snap trajectory generation and control for quadrotors. In: 2011 IEEE international conference on robotics and automation, pp 2520–2525

38. Moon H, Martinez-Carranza J, Cieslewski T, Faessler M, Falanga D, Simovic A, Scaramuzza D, Li S, Ozo M, De Wagter C et al (2019) Challenges and implemented technologies used in autonomous drone racing. Intell Serv Robot 12(2):137–148

39. Morrell B, Rigter M, Merewether G, Reid R, Thakker R, Tzanetos T, Rajur V, Chamitoff G (2018) Differential flatness transformations for aggressive quadrotor flight. In: 2018 IEEE international conference on robotics and automation (ICRA), pp 5204–5210

40. Mur-Artal R, Tardós JD (2017) ORB-SLAM2: an open-source SLAM system for monocular, stereo and RGB-D cameras. IEEE Trans Robot 33(5):1255–1262

41. Mur-Artal R, Montiel JM, Tardós JD (2015) ORB-SLAM: a versatile and accurate monocular SLAM system. IEEE Trans Robot 31(5):1147–1163

42. Mustafah YM, Azman AW, Akbar F (2012) Indoor UAV positioning using stereo vision sensor. Procedia Eng 41:575–579

43. Pfeiffer C, Scaramuzza D (2021) Human-piloted drone racing: visual processing and control. IEEE Robot Autom Lett 6(2):3467–3474

44. Raffo GV, de Almeida MM (2016) Nonlinear robust control of a quadrotor UAV for load transportation with swing improvement. In: 2016 American control conference (ACC). IEEE, pp 3156–3162

45. Rezende AM, Miranda VR, Machado HN, Chiella AC, Gonçalves VM, Freitas GM (2019) Autonomous system for a racing quadcopter. In: 2019 19th international conference on advanced robotics (ICAR). IEEE

46. Rezende AM, Gonçalves VM, Pimenta LC (2020) Robust quadcopter control with artificial vector fields. In: 2020 IEEE international conference on robotics and automation (ICRA). IEEE, pp 6381–6387

47. Richter C, Bry A, Roy N (2016) Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments. Robotics research. Springer, New York, pp 649–666

48. Rojas-Perez LO, Martinez-Carranza J (2021) On-board processing for autonomous drone racing: an overview. Integration 80:46–59

49. Rublee E, Rabaud V, Konolige K, Bradski G (2011) ORB: an efficient alternative to SIFT or SURF. In: 2011 international conference on computer vision. IEEE, pp 2564–2571

50. Saadeddin K, Abdel-Hafez MF, Jarrah MA (2014) Estimating vehicle state by GPS/IMU fusion with vehicle dynamics. J Intell Robot Syst 74(1–2):147–172

51. Sayre-McCord T, Guerra W, Antonini A, Arneberg J, Brown A, Cavalheiro G, Fang Y, Gorodetsky A, McCoy D, Quilter S et al (2018) Visual-inertial navigation algorithm development using photorealistic camera simulation in the loop. In: 2018 IEEE international conference on robotics and automation (ICRA). IEEE, pp 2566–2573

52. Schauwecker K, Zell A (2013) On-board dual-stereo-vision for autonomous quadrotor navigation. In: 2013 international conference on unmanned aircraft systems (ICUAS). IEEE, pp 333–342

53. Song Y, Steinweg M, Kaufmann E, Scaramuzza D (2021) Autonomous drone racing with deep reinforcement learning. arXiv preprint arXiv:2103.08624

54. Sung Y, Kwak J, Jeong YS, Park JH (2016) Beacon distance measurement method in indoor ubiquitous computing environment. Advances in parallel and distributed computing and ubiquitous services. Springer, New York, pp 125–130

55. Thrun S, Burgard W, Fox D (2005) Probabilistic robotics. MIT Press, Cambridge

56. Urzua S, Munguía R, Grau A (2019) Monocular SLAM system for MAVs aided with altitude and range measurements: a GPS-free approach. J Intell Robot Syst 94(1):203–217

57. Vaidis M, Otis MJD (2020) Toward a robot swarm protecting a group of migrants. Intell Serv Robot 1–16

58. Villa DK, Brandao AS, Sarcinelli-Filho M (2019) A survey on load transportation using multirotor UAVs. J Intell Robot Syst 1–30

59. Wang T, Chang DE (2021) Robust navigation for racing drones based on imitation learning and modularization. arXiv preprint arXiv:2105.12923

60. Yang H, Lee Y, Jeon SY, Lee D (2017) Multi-rotor drone tutorial: systems, mechanics, control and state estimation. Intell Serv Robot 10(2):79–93

61. Yousefi F, Monfared SB (2018) Cascade feedback linearization controller with actuators dynamic for trajectory tracking of flying robot. In: 2018 8th conference of AI & robotics and 10th RoboCup Iranopen international symposium (IRANOPEN). IEEE, pp 46–51

62. Zhou D, Schwager M (2014) Vector field following for quadrotors using differential flatness. In: 2014 IEEE international conference on robotics and automation (ICRA), pp 6567–6572



This work was developed with the financial support of Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), the companies Vale S.A., Instituto Tecnológico Vale (ITV), and DTI Digital. From the Universidade Federal de Minas Gerais (UFMG), we are thankful for the support of the Graduate Program in Electrical Engineering (PPGEE), the Department of Electrical Engineering (DEE), the Department of Computer Science (DCC), and the School of Engineering.


Universidade Federal de Minas Gerais (UFMG); Instituto Tecnológico Vale (ITV); Vale S.A.; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

Author information



Corresponding author

Correspondence to Adriano M. C. Rezende.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Rezende, A.M.C., Miranda, V.R.F., Rezeck, P.A.F. et al. An integrated solution for an autonomous drone racing in indoor environments. Intel Serv Robotics (2021).



Keywords

  • Autonomous drone racing
  • Localization
  • Trajectory planning
  • Artificial vector fields
  • Computer vision
  • Nonlinear control