1 Introduction

An autonomous agent acts on its environment by its own means, relying little to nothing on other parties. An agent has to model its environment and locate itself within it; it can then create a plan and execute it to achieve a given goal.

This translates into a set of subproblems to turn a UAV into an agent: (a) model its environment and estimate its current status; (b) given a certain goal, generate a feasible plan for the UAV to execute; (c) control the vehicle so that it can execute the plan. In the following sections we describe the multilayered hierarchical approach we propose to solve all these subproblems and, more importantly, the use of computer vision to locate the vehicle and the possibility of operating different kinds of aerial vehicles. The results show how this approach allowed the operation of heterogeneous vehicles while describing different three-dimensional trajectories.

2 Related Work

A robot consists of a series of highly heterogeneous systems that are complex in nature and require an orchestrated integration to function properly. For a robot with given mechanical features, and depending on the problem it is intended to solve, there are many approaches to control it, but the archetypes are only three [5]: hierarchical, behavioral and hybrid. The hierarchical approach follows a sense-plan-act scheme, prioritizing deliberative control above all else, which makes it not very flexible. Behavioral approaches follow a bottom-up priority scheme and use behaviors to react to the environment in real time; because of their reactive nature, achieving complex objectives is often difficult. Hybrid architectures try to make the best of the hierarchical and behavioral approaches, combining the deliberative skills of the former with the flexibility of the latter.

Some examples of high-level and computationally demanding features are path planning, human interaction and multi-robot coordination; examples of low-level routines are sensor reading, actuator control and localization. A hybrid multi-layered architecture has proven successful in the field of mobile robotics because it allows high-level and low-level routines to be properly interfaced, upgraded and coordinated [14], interacting with humans [6] and with a group of identical robots [10, 18]. This kind of architecture, like the one shown in Fig. 1 [3], consists of three layers: (1) the low-level control layer directly manages and accesses all hardware peripherals in real time and also implements some reactive behaviors; (2) the planner represents a set of high-level processes that, given the current status of the robot and its environment, create a plan for the robot to achieve a certain goal; (3) the sequencer is the intermediate layer between the low-level control and the planner that executes the steps of the plan in sequence. In case an error occurs, it updates the planner with the current status and asks for a new course of action.

Fig. 1. The three-layer software architecture for an autonomous robot.

In the field of UAVs, control schemes have been tested following a reactive approach, i.e. they act proportionally to an error metric, usually defined by tracking and triangulating salient features with computer vision [19, 20]. There are two schemes for profiting from the payload of a UAV: in the first, all information generated during the flight is stored in non-volatile storage for analysis after landing [9, 13]; in the second, all gathered information is sent to the Ground Control Station (GCS) for further analysis and decision making [1, 2]. In either scheme, the UAV acts as a teleoperated entity with little autonomy to react to its flying conditions, whether adverse or not. Even so, these schemes are popular and good enough for most civilian and military applications.

Nowadays, there is a growing research community working on a third scheme, in which the flight plan of the UAV is not only dictated by a set of georeferenced waypoints or remotely piloted. Instead, the UAV processes the information collected from the onboard sensors to further understand its environment and react to or interact with it. This problem is known as Simultaneous Localization and Mapping (SLAM) [11, 17]. Solving the SLAM problem means that a robot is able to navigate autonomously through its environment while it creates a virtual representation of its surroundings: the map [4, 12]. The work presented in this article is related to the third scheme. We describe how we plan to go one step further than the reactive approach by introducing a three-layer architecture for the control of UAVs, and the first steps we have taken.

3 Hardware Description

This work was successfully tested with two different UAVs: the first one we tested was the Solo from 3D Robotics and the second was the AR-Drone v2, manufactured by Parrot (see Fig. 2). Both vehicles are ready-to-fly UAVs and feature an onboard monoscopic camera. To communicate with these vehicles we only had to adapt the low-level control routines: for the AR-Drone we used the ROS package created to communicate with it, and to gain access to the Solo we used Gstreamer to receive the video feed and DroneKit (the software library to interface with UAVs compatible with the MAVLink protocol [15]). As a physical interface to operate the vehicle we used a hardware remote control, giving higher priority to pilot commands over autonomous control; in case of unforeseen situations, the pilot can bypass the autonomous control immediately by operating the hardware controller. The software development was based on the Linux operating system and the Robot Operating System (ROS) [16].
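As an illustration of this low-level interface, the following is a minimal sketch of reaching a MAVLink-compatible vehicle such as the Solo through DroneKit; the connection string and the values read are illustrative assumptions that depend on the actual network setup, not a transcription of our implementation.

```python
# Minimal sketch: connecting to a MAVLink-compatible vehicle with DroneKit.
# The connection string is an assumption; it depends on the network setup.
from dronekit import connect, VehicleMode

vehicle = connect('udpin:0.0.0.0:14550', wait_ready=True)

# Read basic telemetry exposed by DroneKit.
print('Mode:     %s' % vehicle.mode.name)
print('Attitude: %s' % vehicle.attitude)  # roll, pitch, yaw
print('Altitude: %s' % vehicle.location.global_relative_frame.alt)

# Control can be handed back to the pilot at any time by switching flight mode.
vehicle.mode = VehicleMode('LOITER')
vehicle.close()
```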

Fig. 2. The two drones tested.

4 Proposed Approach and Methodology

The test scenario is shown in Fig. 4: the UAV follows the desired trajectory marked in blue while, overflying artificial markers fixed on the ground, the downward-looking camera captures the aerial view and transmits the video feed to the GCS. Figure 4 also shows the reference frames attached to the monoscopic camera, the world reference frame, the center of gravity (CoG) of the vehicle and the navigation frame (X: North, Y: East, Z: Down).

At the top level, the proposed hierarchical multi-layer architecture defines the desired trajectory \(\mathbf {r}_d(t)\) with respect to the world frame as a flight plan; at the bottom level, it produces noisy estimates of the position of the vehicle's CoG with respect to the world frame using computer vision. These noisy estimates are filtered with a Kalman Filter and then compared with the desired position, resulting in an error that is minimized by driving the vehicle towards the desired position.

Figure 3 shows the structure of the architecture proposed in this paper. At the top, we show the high-level planner node, in charge of computing the desired flight plan for the UAV. At the bottom are the low-level nodes, including the hardware interface to the vehicle and the camera, the computer vision localization nodes and the controller. The trajectory generator node defines the desired position for the UAV according to the parameters defined by the planner. For now, the trajectory generator can produce either a lemniscate or a spline trajectory; the sequencer is in charge of switching between the two, depending on the flight plan, as sketched below.
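The following sketch illustrates how such a sequencer could walk through the flight plan and activate one trajectory generator at a time; the node, topic and message names are hypothetical and not those of our actual implementation.

```python
# Minimal sequencer sketch (hypothetical names): walks through the flight
# plan delivered by the planner and activates one trajectory generator at
# a time, reporting back to the planner if a step cannot be executed.
import rospy
from std_msgs.msg import String

class Sequencer(object):
    def __init__(self, flight_plan):
        # flight_plan: list of steps such as {'type': 'spline', 'duration': 60.0}
        self.flight_plan = flight_plan
        self.mode_pub = rospy.Publisher('trajectory_mode', String, queue_size=1)

    def run(self):
        for step in self.flight_plan:
            if step['type'] not in ('spline', 'lemniscate'):
                rospy.logwarn('Unknown step %s, asking planner for a new plan', step)
                return False  # report back so the planner can re-plan
            self.mode_pub.publish(String(step['type']))
            rospy.sleep(step.get('duration', 30.0))  # naive step termination
        return True

if __name__ == '__main__':
    rospy.init_node('sequencer')
    plan = [{'type': 'spline', 'duration': 60.0},
            {'type': 'lemniscate', 'duration': 60.0}]
    Sequencer(plan).run()
```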

Fig. 3. The three-layer architecture for the UAV.

To deal with spatial relationships between reference frames, we used rigid body transformations in homogeneous coordinates, denoted as:

$$\begin{aligned} \mathbf {T}= \begin{bmatrix} \mathbf {R}&\mathbf {t}\\ \mathbf {0}^\top&1 \end{bmatrix} \end{aligned}$$

where \(\mathbf {R}\) and \(\mathbf {t}\) are the rotation and translation components, respectively. Within the multi-layered architecture, we used the work from Foote [7] to manage all rigid body transformations.
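The work from Foote [7] corresponds to the ROS transform library (tf/tf2); a minimal sketch of querying a transform between two frames is shown below, where the frame names 'world' and 'base_link' are assumptions for illustration.

```python
# Minimal sketch of querying a rigid body transformation with tf2, the ROS
# transform library [7]; frame names are assumptions for illustration.
import rospy
import tf2_ros

rospy.init_node('transform_listener_example')
buffer = tf2_ros.Buffer()
listener = tf2_ros.TransformListener(buffer)
rospy.sleep(1.0)  # give the listener time to fill its buffer

try:
    # Latest available transform from 'world' to 'base_link'.
    tf_msg = buffer.lookup_transform('world', 'base_link', rospy.Time(0))
    t = tf_msg.transform.translation
    q = tf_msg.transform.rotation
    print('t = (%.2f, %.2f, %.2f)' % (t.x, t.y, t.z))
    print('q = (%.2f, %.2f, %.2f, %.2f)' % (q.x, q.y, q.z, q.w))
except (tf2_ros.LookupException, tf2_ros.ExtrapolationException) as e:
    rospy.logwarn('Transform not available yet: %s', e)
```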

Note that once the pose of the camera with respect to the world frame is solved with computer vision, we can locate the CoG of the vehicle with respect to the world frame, because the camera is rigidly mounted on the UAV (the rigid body transformation from the camera to the center of mass of the vehicle is known beforehand). To estimate the pose of the camera we used the technique developed by Garrido et al. [8], which consists of segmenting the artificial markers from the images and estimating the pose of the camera from all detected corners. The location of the CoG with respect to the world frame is then obtained by composing this estimate with the known camera-to-CoG transformation.
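The marker system of Garrido et al. [8] is available in OpenCV as the ArUco module; the sketch below estimates the camera pose from a marker board using the pre-4.7 contrib API. The intrinsics, dictionary and board dimensions are placeholders, not the values used in our experiments.

```python
import cv2
import numpy as np

# Assumed camera intrinsics from a prior calibration (placeholders).
K = np.array([[520.0, 0.0, 320.0],
              [0.0, 520.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)

# Assumed board layout: 2 x 5 grid of 0.4 m markers with 0.1 m separation.
aruco = cv2.aruco
dictionary = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
board = aruco.GridBoard_create(2, 5, 0.4, 0.1, dictionary)

def camera_pose_from_frame(frame):
    """Return the camera pose w.r.t. the board (world) frame, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = aruco.detectMarkers(gray, dictionary)
    if ids is None:
        return None
    rvec = np.zeros((3, 1))
    tvec = np.zeros((3, 1))
    n_used, rvec, tvec = aruco.estimatePoseBoard(
        corners, ids, board, K, dist, rvec, tvec)
    if n_used == 0:
        return None
    # estimatePoseBoard gives the board pose in the camera frame;
    # invert it to obtain the camera pose in the board frame.
    R, _ = cv2.Rodrigues(rvec)
    return R.T, -R.T @ tvec
```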

Fig. 4. The use case scenario for the UAV. For every reference frame, the color convention is: X axis red, Y axis green and Z axis blue. (Color figure online)

As discussed earlier, we used a Kalman Filter over the vision-based pose estimate to improve its accuracy and reduce the effects of errors in camera parameter computation, corner detection, image rectification and pose estimation. The state vector of the Kalman filter contains the position, orientation and velocities of the vehicle's CoG with respect to the world frame. From the onboard inertial measurement unit (IMU), we receive the horizontal velocity components, the flight altitude and the heading; the a priori estimate of the Kalman filter is updated using only these inertial measurements. The state transition model, with k defining the time instant, propagates the state between instants k and k+1 using these measurements.

The a posteriori step runs at 24 Hz, a slower rate than the a priori step, using as measurement the pose of the camera estimated by computer vision [8]. After the innovation step of the Kalman filter, the state vector holds the latest estimate of the pose of the UAV.
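As a concrete illustration, the sketch below implements a Kalman filter of this kind under an assumed constant-velocity transition model and an assumed state layout [x, y, z, psi, vx, vy, vz]; it is not a transcription of our filter, whose exact transition matrix depends on the IMU data actually available.

```python
import numpy as np

# Assumed state layout: s = [x, y, z, psi, vx, vy, vz]  (7 states).
N = 7

class PoseKalmanFilter(object):
    def __init__(self, dt):
        self.s = np.zeros(N)          # state estimate
        self.P = np.eye(N)            # state covariance
        self.Q = 0.01 * np.eye(N)     # process noise (assumed)
        self.R = 0.05 * np.eye(4)     # measurement noise (assumed)
        # Constant-velocity transition: positions integrate velocities.
        self.F = np.eye(N)
        self.F[0, 4] = self.F[1, 5] = self.F[2, 6] = dt
        # The vision system measures the pose [x, y, z, psi].
        self.H = np.zeros((4, N))
        self.H[0, 0] = self.H[1, 1] = self.H[2, 2] = self.H[3, 3] = 1.0

    def predict(self, v_xyz, psi):
        """A priori step: inject IMU velocities/heading, then propagate."""
        self.s[4:7] = v_xyz
        self.s[3] = psi
        self.s = self.F.dot(self.s)
        self.P = self.F.dot(self.P).dot(self.F.T) + self.Q

    def update(self, pose_xyzpsi):
        """A posteriori step at 24 Hz, using the vision-based camera pose."""
        y = np.asarray(pose_xyzpsi) - self.H.dot(self.s)   # innovation
        S = self.H.dot(self.P).dot(self.H.T) + self.R
        K = self.P.dot(self.H.T).dot(np.linalg.inv(S))
        self.s = self.s + K.dot(y)
        self.P = (np.eye(N) - K.dot(self.H)).dot(self.P)
        return self.s
```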

Fig. 5. The graph representing the rigid body transformations between frames.

The desired trajectory \(\mathbf {r}_d(t)\) to be described by the vehicle is dynamically computed from the waypoints delivered by the planner, joined together by a cubic spline or by a lemniscate. The spline is such that \(\mathbf {\dot{r}}_d(t)\) is continuous, creating a smooth trajectory, while the parametric equation of the lemniscate defines the smooth trajectory as:

$$\begin{aligned} \mathbf {r}_d(t)= \begin{bmatrix} x_d(t)\\y_d(t)\\z_d(t)\\\psi _d(t) \end{bmatrix} = \begin{bmatrix} a \sin (\frac{t}{\epsilon })\\ b \sin (\frac{2t}{\epsilon })\\ c \sin (\frac{3t}{\epsilon })\\ 0 \end{bmatrix} \end{aligned}$$
(1)
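A minimal sketch of both generators follows: the lemniscate evaluates Eq. (1) directly, while the spline relies on scipy's CubicSpline (whose derivative is continuous, so \(\mathbf {\dot{r}}_d(t)\) is smooth). The uniform knot timing and the total duration are assumptions for illustration; the waypoints are those listed in the results section.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def lemniscate(t, a, b, c, eps):
    """Desired trajectory r_d(t) = [x_d, y_d, z_d, psi_d] from Eq. (1)."""
    return np.array([a * np.sin(t / eps),
                     b * np.sin(2.0 * t / eps),
                     c * np.sin(3.0 * t / eps),
                     0.0])

def spline_trajectory(waypoints, duration):
    """Cubic spline through the planner waypoints; returns r_d and its derivative."""
    waypoints = np.asarray(waypoints)
    knots = np.linspace(0.0, duration, len(waypoints))  # assumed uniform timing
    cs = CubicSpline(knots, waypoints, axis=0)
    return cs, cs.derivative()

# Example: evaluate both generators at t = 5 s.
r_d, rdot_d = spline_trajectory([[-1.3, -1.3, 1.0],
                                 [-0.9,  0.0, 1.0],
                                 [-0.3,  0.9, 1.0],
                                 [ 0.45, -0.1, 0.0],
                                 [ 1.7,  1.0, 1.0]], duration=60.0)
print(r_d(5.0), rdot_d(5.0))
print(lemniscate(5.0, a=1.0, b=0.8, c=0.2, eps=30.0))
```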

Figure 5 shows the resulting directed graph of spatial relationships, using nodes as reference frames and edge labels as the modules of the architecture that update the spatial relationship between two reference frames. The direction of every edge represents the origin and target frames of the homogeneous transform. In turn, the error measurement with respect to \(\mathbf {B}\) is given by the rigid body transformation between the desired pose and the estimated pose of the vehicle.

After decomposing \(\mathbf {R}_e\) into its three Euler angles \((\theta ,\phi , \psi )_e\), we can compute a control command using a Proportional-Derivative controller:

$$\begin{aligned} \mathbf {u}=\mathbf {K}_p \begin{bmatrix} \mathbf {r}_d(t)-\mathbf {x}\\\psi _e \end{bmatrix} +\mathbf {K}_d (\mathbf {\dot{r}}_d-\mathbf {\dot{x}}) \end{aligned}$$

where \(\mathbf {x}=[x, y, z, \psi ]\) and \(\mathbf {\dot{x}}=[\dot{x}, \dot{y}, \dot{z}, \dot{\psi }]\) are estimated by the Kalman filter described before, and \(\mathbf {K}_p\) and \(\mathbf {K}_d\) are diagonal matrices in \(\mathbb {R}^{4 \times 4}\).
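A minimal sketch of this PD law follows; the gain values are placeholders rather than the gains used in flight, and the error vector is taken, under our reading of the control law, as the first three components of \(\mathbf {r}_d-\mathbf {x}\) stacked with the yaw error \(\psi _e\).

```python
import numpy as np

# Placeholder diagonal gains in R^{4x4}; the real gains are tuned per vehicle.
K_p = np.diag([0.5, 0.5, 0.8, 0.4])
K_d = np.diag([0.2, 0.2, 0.3, 0.1])

def pd_command(r_d, rdot_d, x, xdot, psi_e):
    """u = K_p [position part of r_d - x; psi_e] + K_d (rdot_d - xdot)."""
    e = np.empty(4)
    e[:3] = r_d[:3] - x[:3]  # position error in x, y, z
    e[3] = psi_e             # yaw error from the decomposition of R_e
    return K_p.dot(e) + K_d.dot(np.asarray(rdot_d) - np.asarray(xdot))

# Example call with dummy values.
u = pd_command(r_d=np.array([1.0, 0.5, 1.2, 0.0]),
               rdot_d=np.zeros(4),
               x=np.array([0.8, 0.4, 1.1, 0.1]),
               xdot=np.zeros(4),
               psi_e=-0.1)
print(u)
```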

Fig. 6. The software architecture at work, creating a virtual representation of the real world and locating the drone with respect to the center of the board.

5 Results

The proposed approach was tested with the AR-Drone 2.0 and the 3DR Solo. We made the front camera of the AR-Drone point downwards so we could get a higher-quality image from above. The Solo had a gimbal installed; as a result, we had to update the camera-to-vehicle transformation using the navigation data we received from the UAV. The settings of the Solo's GoPro camera are very versatile; for this exercise, we used a narrow field of view with a resolution of \(1028\times 720\) pixels. Furthermore, to better estimate the pose of the camera, the video feed was rectified using the camera intrinsic parameters. The computer vision algorithm was set to track a board of artificial markers: for the Solo the board measured \(1.4\times 2.4\) m (\(2\times 5\) artificial markers), and for the AR-Drone the board measured \(4\times 4\) m (\(20\times 21\) markers, see Fig. 6a).
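Rectification of this kind can be performed with OpenCV once the intrinsics are known from calibration; a brief sketch follows, with placeholder intrinsic and distortion values rather than our calibration results.

```python
import cv2
import numpy as np

# Placeholder intrinsics and distortion coefficients from a prior calibration.
K = np.array([[600.0, 0.0, 514.0],
              [0.0, 600.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.25, 0.08, 0.0, 0.0, 0.0])

def rectify(frame):
    """Undistort one video frame before marker detection and pose estimation."""
    return cv2.undistort(frame, K, dist)
```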

To observe the current status of the vehicle and its environment, we used Rviz to show the virtual representation of the world. Figure 7 shows a screenshot of Rviz displaying the location of the vehicle, the trajectory being followed and the detected board. The tests were done with a computer running Ubuntu Linux, with an Intel i5 processor and 8 GB of RAM.

Fig. 7. Flight path along the spline. The markers are displayed as white squares on the ground (\(z=0\)) at the moment they are detected by the computer vision system. The desired position and the estimated location of the drone are displayed as two coordinate frames; the two are almost always overlapping because of the accurate tracking of the trajectory.

Fig. 8. Plot of the desired position vs. the estimated position of the vehicle while describing the spline trajectory.

Fig. 9. Plot of the desired position vs. the estimated position of the vehicle while describing the lemniscate trajectory.

In Figs. 8 and 9, we display the results as measured by the computer vision system while executing the spline and lemniscate maneuvers, in x and y coordinates with respect to the world frame. The \(\mathbf {r}_d\) plot is the desired trajectory generated from the waypoints. For completeness, we also display the error plot. The maximum measured error was 30 cm for the lemniscate trajectory and 22 cm for the spline trajectory. The waypoints used for the spline were a set of coordinates in 3D space: \(p_1=\begin{bmatrix}-1.3, -1.3, 1.0\end{bmatrix}\), \(p_2=\begin{bmatrix}-0.9, 0.0, 1.0 \end{bmatrix}\), \(p_3=\begin{bmatrix}-0.3, 0.9, 1.0\end{bmatrix}\), \(p_4=\begin{bmatrix}0.45,-0.1, 0.0\end{bmatrix}\), \(p_5=\begin{bmatrix}1.7,1.0,1.0\end{bmatrix}\). The parameters for the lemniscate trajectory with the AR-Drone were \(a=1.0\), \(b=0.8\), \(c=0.2\), \(\epsilon =30.0\), with a height offset of \(z=1.2\).
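For reference, with these parameters Eq. (1) confines the lemniscate to roughly \(\pm a = \pm 1.0\) m in x, \(\pm b = \pm 0.8\) m in y and \(\pm c = \pm 0.2\) m around the 1.2 m height offset (altitudes between 1.0 m and 1.4 m), and one full period of the curve takes \(2\pi \epsilon \approx 188\) s.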

6 Conclusions and Future Work

We have discussed a three-layer architecture intended for the control of UAVs that successfully guided the vehicle along the lemniscate and spline trajectories. Because the framework we used for this development runs on multiple platforms, including ARM-based embedded computers, it is plausible to execute it onboard the UAV. Further development of the Sequencer and Planner layers would make the UAV an autonomous agent and lead the way towards a swarm of UAVs.

This document shows the results of the first step in our development and implementation roadmap. The next step is to execute the architecture onboard the UAV. We are currently working towards extending the computer vision system with a visual odometry approach.