Online trajectory planning and control of a MAV payload system in dynamic environments

Micro Aerial Vehicles (MAVs) can be used for aerial transportation in remote and urban spaces where their portability can be exploited to reach previously inaccessible and inhospitable spaces. Current approaches for path planning of MAV swung-payload systems either compute conservative minimal-swing trajectories or pre-generate agile collision-free trajectories. However, these approaches have failed to address the prospect of online re-planning in uncertain and dynamic environments, which is a prerequisite for real-world deployability. This paper describes an online method for agile and closed-loop local trajectory planning and control that relies on Non-linear Model Predictive Control and that addresses the mentioned limitations of contemporary approaches. We integrate the controller in a full system framework, and demonstrate the algorithm's effectiveness in simulation and in experimental studies. Results show the scalability and adaptability of our method to various dynamic setups with repeatable performance over several complex tasks that include flying through a narrow opening and avoiding moving humans.


Introduction
The small size, agility, and low upfront costs of Micro Aerial Vehicles (MAVs) could instigate their widespread use and rapid deployment for payload transport in areas that are inaccessible or dangerous for humans and conventional (aerial) vehicles. Current applications for MAVs with slung payloads (the MAVP system) include search and rescue (Ryan and Hedrick 2005), package/cargo delivery, and construction (Lee 2018), primarily in large, rural, obstacle-free spaces.

(The Micro Aerial Vehicle Lab, Delft University of Technology, 2629 HS Delft, The Netherlands)
Operation of MAVPs in urban settings presents notable challenges given the complex and dynamic environment within which they would operate. A MAVP system must be able to quickly, safely, and autonomously navigate an obstacle-ridden space while adapting to different situations. Carrying a swinging payload vastly increases the system's spatial footprint, making operation in restrictive spaces challenging. In such situations, MAV trajectory planning and control is necessary to generate the desired swing motions to avoid collisions with obstacles. Failing to acknowledge the system's future response when performing agile flight could result in inevitable collisions, as by the time an obstacle is to be avoided, the MAV might be unable to divert the swinging payload away. Working around the problem, one may pre-generate trajectories for fully defined (and thus static) environments, or actively minimise swing to reduce the system's dynamic response; however, we will demonstrate that both workarounds undermine the real-world practicality of these approaches in dynamic environments.

Contributions
Our main contribution is an online local motion planner and controller for the safe, agile, and collision-free flight of a MAVP system in dynamic, multi-obstacle settings. Our method is formulated as a constrained optimisation using a finite-horizon Non-linear Model Predictive Control (NMPC). The optimisation problem can be efficiently solved thanks to contemporary fast solvers.
A full system framework is outlined integrating the NMPC controller in a combined hardware and software control loop. The proposed framework is verified and validated in simulated and experimental studies where we showcase our method's scalability, adaptability, and performance over various complex tasks in static and dynamic environments. We compare it to state-of-the-art methods and discuss the effect of computation time on the resulting system performance. Unlike previous works, we successfully demonstrate the safe operation and readiness of our method in realistic settings involving a MAVP system operating amongst multiple moving humans.

Related work
Historically, studies of aerial vehicle control with suspended payloads involved helicopter systems with applications to load transportation, as shown in Cicolani and Kanning (1992); however, with the advent of MAVs, research into MAVP systems has gained traction. This paper addresses MAVP motion planning for collision avoidance, which we broadly classify into two types, namely open-loop planning with feedback control, and unified closed-loop planning and control; our method contributes to the latter. We introduce contemporary approaches for both, followed by a general discussion of NMPC control for MAV(P) systems and its application to closed-loop planning and control.

Open-loop MAVP trajectory planning
Most contemporary approaches to collision-free trajectory planning for MAVP systems have addressed the tracking of pre-generated, possibly agile, trajectories in static workspaces. We refer to these as offline, open-loop planning approaches as there is no in-the-loop dynamic re-planning of trajectories.
Using pre-generated trajectories, planar and spatial tracking of MAVP trajectories has successfully been demonstrated through accurate modelling and stabilisation of the vehicle (Feng et al. 2014; Pizetta et al. 2015), sometimes utilising swing minimisation (Bisgaard et al. 2010; Palunko et al. 2012b; Trachte et al. 2014) to mitigate coupling disturbance effects. The latter approach is energetically inefficient and over-conservative as the vehicle devotes considerable control effort to reducing swing, resulting in a sluggish system. To accomplish desirable yet feasible MAV and payload responses, the pre-generated trajectories are computed taking the MAVP system model into account. Algorithms to achieve this have included, amongst others, optimisation and Reinforcement Learning (RL) techniques. In the former, optimal trajectories are computed as a cost minimisation problem subject to the task objectives and the MAVP model, and are then encoded as full state evolutions (Foehn et al. 2017; Palunko et al. 2012a) or a reduced-dimension state using differential flatness (Sreenath et al. 2013; Tang and Kumar 2015). In RL, as used in Palunko et al. (2013), Faust et al. (2013) and Faust et al. (2017), feasible action policies (the trajectory) are generated that enforce the MAVP model on state transitions.
The main limitation of pre-generating MAVP trajectories is the reliance on task-specific full motion planning which is consequently inherently non-adaptive at run-time thus precluding handling of uncertain, dynamic obstacles. Additionally, any trajectory infeasibility at run-time is catastrophic as the planning method is unable to accommodate for this. Therefore, the aforementioned studies only address fully known environments with static obstacles limiting the practical application of these methods to limited specialised demonstrative cases. In contrast, our method is able to rapidly re-plan even if the local trajectories intermittently become infeasible due to prevailing conditions.

Closed-loop MAVP trajectory planning
Motion planning in dynamic environments requires re-planning at run-time to accommodate the changing environment. Closed-loop re-planning of full motion trajectories on a global level is intractable for a high-dimensional system, such as that of a MAVP, thus necessitating the use of local planners with finite time-horizons (Brock and Khatib 2000). Additionally, local planners need to cope with infeasibility during run-time while still conforming to a higher global planning objective.
In De Crousaz et al. (2014), an agile and collision-free local trajectory generation and control method was demonstrated in simulated and experimental setups with static obstacles using iterative Linear Quadratic Gaussian (iLQG) control. The optimal control iLQG method relies on a cost function that is minimised at every control step such that user-defined planning objectives are met; the result is a local trajectory satisfying the objectives and system dynamics. The iLQG's iterative algorithm is exploited to generate locally optimal linear feedback controls to achieve real-time, closed-loop performance. In a similar fashion to Tang and Kumar (2015), De Crousaz et al. (2014) apply iLQG to demonstrate impressive manoeuvres, which included flight through a narrow opening. However, the study by De Crousaz et al. (2014) did not consider planning in a dynamic environment. Furthermore, the required rotor thrust inputs were generated at run-time and could saturate, as the method did not account for vehicle constraints during planning. The consequence of saturating inputs over sustained time periods is potential system instability and/or significant deviation of the actual from the planned trajectory. In contrast, our approach takes into account the vehicle and environmental constraints to ensure the physical feasibility of the generated trajectories.

Non-linear model predictive control and unified planning and control of MAV(P)s
Early studies have successfully demonstrated the use of (N)MPC for real-time MAV (Kim et al. 2002; Tzes et al. 2012) and MAVP (Trachte et al. 2014; Gonzalez et al. 2015) simple trajectory tracking. Focussing on the latter, Trachte et al. (2014) addressed NMPC for MAVP trajectory tracking control with a comparison to LQR control; the results demonstrated NMPC's superior physical constraint handling for feasibility guarantees, and the larger attainable MAVP flight envelope afforded by the non-linear MAVP model description. Overall, NMPC outperformed LQR in simulated tasks involving swing minimisation and agile manoeuvres. In Gonzalez et al. (2015), the studies from Trachte et al. (2014) were extended to an experimental setup validating the results; however, unlike De Crousaz et al. (2014), both studies only addressed the control aspect of tracking pre-generated trajectories. In contrast, our method unifies online planning and control, extending the capability beyond simple tracking of a pre-defined plan. Traditionally, (N)MPC algorithms for unified motion planning and control of MAVs have seldom been studied as the required real-time re-planning was computationally intractable (Neunert et al. 2016). With today's improved computing capabilities and fast optimisation solvers, applications have been demonstrated for a MAV without a swung payload (Neunert et al. 2016; Naegeli et al. 2017). The method of Neunert et al. (2016) utilises the Sequential Linear Quadratic (SLQ) algorithm to solve an unconstrained optimal control problem that is paired with an external high-level planner that pre-generates feasible waypoints accounting for only static obstacles. Therefore, their method is only deployable to MAVs in static environments for traversing a specific pre-generated set of waypoints.
Building upon the NMPC planning algorithm introduced by Naegeli et al. (2017), we utilise an interior-point based algorithm from Domahidi and Jerez (2014) within a NMPC setting to solve a constrained optimal control problem that is subject to the constraints on the system dynamics and environment. Introduction of constraints in our method allows us to perform real-time obstacle avoidance while still providing guarantees on the trajectory feasibility. Thus, in this work we demonstrate the viability of real-time NMPC based unified motion planning and control for MAVPs in dynamic, multi-obstacle settings.

Paper organisation
We introduce preliminaries in Sect. 2 with our notations and system models. In Sect. 3 we describe our method for online and closed-loop, unified motion planning and control with NMPC. For the simulated and experimental studies performed, we outline our system setup and framework in Sect. 4. In Sects. 5 and 6 we discuss our findings followed by concluding remarks in Sect. 7.

Notation
The following notations are observed; scalars x, vectors x, matrices X, sets X, and reference frames {X}. Time derivatives use dot accenting. Position vectors are denoted by p ∈ R³. Unless otherwise stated, vectors are expressed in the East-North-Up (ENU) inertial frame {I}. For a vector x ∈ Rⁿ and a positive semi-definite n×n matrix Q, the weighted squared norm is ‖x‖²_Q = xᵀQx. Rotation matrices are denoted by R ∈ SO(3), and basic axial rotations around axis x by R_x ∈ SO(3).

Quadrotor with swung payload model
The system comprises a quadrotor of mass m_q and a suspended point mass m_l attached by a cable of length l from the quadrotor centroid. Let p_q, p_l be the quadrotor and load positions, and r_ql = p_l − p_q. All reference frames are defined in Fig. 1. The load suspension angles θ_l, φ_l parametrise the orientation of {L} relative to {S}. The intermediary frame {S} is used to avoid the singularity that arises for a downward equilibrium load position when computing a rotation from {L} to {E} directly. The MAVP configuration and its time derivative are then given by the variables q = [p_qᵀ, θ_l, φ_l]ᵀ and q̇, and let θ_q, φ_q be the true quadrotor pitch and roll with yaw remaining constant. The following additional model assumptions are adopted;

- rigid, massless cable with free suspension points,
- coincident cable suspension point, quadrotor centroid, and centre of mass,
- negligible cable aerodynamic drag.

A non-rigid cable would introduce switching dynamics, increasing the complexity of the system modelling. A preliminary study on cable rigidity during agile manoeuvres, as shown in "Appendix A", shows the cable rarely slacks during flight, allowing this assumption to be made. The second assumption, regarding coincidence, is made to simplify the model as the listed points tend to be in close physical proximity. Though the effects of the real point offsets on the system dynamics were not considered, for the purpose of performing predictions in the order of seconds the effect was considered negligible. The assumption of no cable drag stems from the rationale that the cable's exposed surface area to the flow is small, resulting in negligible effects on the system dynamics.
We first describe the quadrotor's input handling and the aerodynamic drag model. We then complete the model by derivation of the coupled quadrotor-load dynamics.

Quadrotor inputs
As in Klausen et al. (2015), we abstract the quadrotor motor inputs and assume fast attitude and motor control such that, by actuating the quadrotor's pitch and roll and setting a thrust command, we produce an inertial control force F_u in any desired direction for realising translational motion. Therefore, let the inputs be a desired quadrotor pitch θ̄_q (radians), roll φ̄_q (radians), and thrust command w̄_q (metres/second) defined in {E}, giving u = [θ̄_q, φ̄_q, w̄_q]ᵀ. This input choice is consistent with our chosen Parrot Bebop 2 quadrotor that internally controls the motors based on the inputs u to achieve full spatial flight; the internal controller is schematised in "Appendix B". The input magnitude ranges are limited and hardware specific; these are accounted for in the design of the model predictive controller using constraints, as further discussed in Sect. 3.3. We note that our method is not limited to our chosen hardware and could easily be adapted to other quadrotors.
As the hardware-specific internal controller dynamics u → F_u are not documented, we empirically model the function. We define F_q as the purely vertical control force generated by the four rotors when commanded by the input w̄_q. The quadrotor's true pitch and roll responses and the vertical control force F_q resulting from the commanded inputs are given by

[θ_q, φ_q, F_q]ᵀ = [h_θ(θ̄_q), h_φ(φ̄_q), h_F(w̄_q)]ᵀ, (1)

where, using the method presented in Stanculeanu and Borangiu (2011), h_θ, h_φ, h_F are identified for the fast dynamics and decoupled as three linear second-order black-box models with model states x_s and state transition

ẋ_s = f_s(x_s, u). (2)

Note that with h_F we model w̄_q → F_q directly, as the internal vertical velocity stabiliser controls the vertical control force F_q (in {E}) generated by the motors based on the thrust command w̄_q (see "Appendix B"). Then, similar to Naegeli et al. (2017), using the outputs from (1) and based on equilibrium relations, the input control force is given by

F_u = F_q R_y(θ_q) R_x(φ_q) e₃ − m g e₃,

where m = m_q + m_l, g = 9.81 m/s², and e₃ = [0, 0, 1]ᵀ. See "Appendix C" for a derivation of this relation. The full form of (1) identified for the Parrot Bebop 2 quadrotor is provided in "Appendix D".

Aerodynamic drag effects
We derive the drag induced forces on the MAVP system. As in Derafa et al. (2006), assuming relatively low quadrotor velocities ṗ_q (up to 5 m/s), we model a proportional linear drag force on the quadrotor with drag constant k_Dq, giving

F_Dq = −k_Dq ṗ_q. (4)

As in Klausen et al. (2015), for the payload we only consider the rotational load motion relative to the quadrotor; hence, its drag force is assumed to always be perpendicular to the moment arm (the rigid cable). The approximation introduces swing-associated damping due to the relatively large rotational payload velocities. This allows the load's drag to be described in terms of the suspension angles and their derivatives, which are components of q and q̇. We also avoid defining the drag in terms of the load velocity as this term is not a variable available in q̇. Additionally, following from our free suspension point assumption, there are no payload drag induced reactive forces or moments on the quadrotor. We note that prolonged linear translation of the system would make the linear drag contribution to the load dynamics significant as the load would trail behind the quadrotor; this could be included in future studies. Under these simplifications, the load's signed quadratic drag force with drag constant k_Dl is given by

F_Dl = −k_Dl v|v|, (5)

where v = ωl for circular motion, with v, ω the linear and angular load velocities and l the cable length. Substituting ω in (5) by the load's suspension angular rates and computing the induced moment at the suspension point, we obtain

τ_θ = −k_Dl l³ ω_θ|ω_θ|,  τ_φ = −k_Dl l³ ω_φ|ω_φ|, (6)

where ω_θ = θ̇_l, ω_φ = φ̇_l and τ_θ, τ_φ are the load's drag force induced moments on the suspension angles θ_l, φ_l. With (4) and (6), the total exogenous system drag term is

D = [F_Dqᵀ, τ_θ, τ_φ]ᵀ. (7)
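For concreteness, the drag terms above can be sketched as follows. This is a minimal sketch under the reconstructed relations (4)-(6); the drag constants passed in are illustrative, not the identified hardware values.

```python
import numpy as np

def quadrotor_drag_force(p_q_dot, k_Dq):
    # Linear quadrotor drag, Eq. (4): proportional to and opposing velocity.
    return -k_Dq * np.asarray(p_q_dot, dtype=float)

def load_drag_moments(omega_theta, omega_phi, l, k_Dl):
    # Signed quadratic load drag, Eqs. (5)-(6): force k_Dl*v|v| with v = omega*l,
    # perpendicular to the cable, induces a moment (arm l) at the suspension point.
    tau_theta = -k_Dl * l**3 * omega_theta * abs(omega_theta)
    tau_phi = -k_Dl * l**3 * omega_phi * abs(omega_phi)
    return tau_theta, tau_phi
```

Note that both terms oppose the direction of motion, which is what introduces the swing-associated damping discussed above.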

System kinematics and dynamics
The MAVP Equations of Motion (EOMs) are derived in frame {I} according to Lagrangian mechanics. With frame transformations and l = [0, 0, l]ᵀ the rigid cable vector in {L}, we define the load position as

p_l = p_q + R_IL l,

where R_IL is the rotation from {L} to {I} parametrised by the suspension angles. The payload velocity is then given by

ṗ_l = ṗ_q + Ṙ_IL l.

The Lagrangian in terms of the system kinetic and potential energies is

L = ½ [ṗ_qᵀ, ṗ_lᵀ] K [ṗ_qᵀ, ṗ_lᵀ]ᵀ − g e₃ᵀ (m_q p_q + m_l p_l),

where K = diag(m_q 1_(1×3), m_l 1_(1×3)) and e₃ = [0, 0, 1]ᵀ. Using Lagrange's equations according to D'Alembert's principle, the non-linear EOMs describing the MAVP dynamics in compacted form are given by

M(q) q̈ + C(q, q̇) q̇ + G(q) = F + D, (13)

with force F = [F_uᵀ, 0, 0]ᵀ ∈ R⁵, mass M, drag D from (7), Coriolis C, and gravitational G matrix terms. Equation (13) in its full form is presented in Klausen et al. (2015). Using (13), the system state and state transition are given by

x_q = [qᵀ, q̇ᵀ]ᵀ,  ẋ_q = [q̇ᵀ, (M⁻¹(F + D − Cq̇ − G))ᵀ]ᵀ. (14)

Full MAVP model
Combining the quadrotor input model and the system model from (2) and (14), we denote the full MAVP state and state transition by

x = [x_qᵀ, x_sᵀ]ᵀ ∈ R¹⁶,  ẋ = f(x, u), (15)

where x_q are the MAVP dynamics states from (14) and x_s are the input model states from (2).

Obstacle model
Obstacles, each with position p_o, are user-specified as cuboids and subsequently modelled by enclosing ellipsoids. Human obstacles are also specified as cuboids of comparable size. Ellipsoids create smooth convex bounding volumes for (non-convex) obstacles, making them appropriate for representing obstacles within a gradient-based optimisation.

Obstacle ellipsoid definitions
We define two ellipsoids with buffers β as shown in Fig. 2: a bounding ellipsoid S_o with buffer β_o that encloses the obstacle cuboid, and an expanded ellipsoid S_e with buffer β_e. Note that by setting β, a minimum cuboid-to-ellipsoid separation of β is warranted. Buffers β_o, β_e are used for collision-avoidance purposes as will become clear later.

Obstacle motion prediction
Static obstacle positions are assumed to be readily available for planning. As in Naegeli et al. (2017), we assume a constant velocity model for dynamic obstacles and predict their future positions based on a velocity estimate produced by a linear Kalman Filter using measured obstacle position data.
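The constant-velocity prediction above can be sketched as a one-axis Kalman filter; this is a minimal illustration in which the process and measurement noise parameters (q, r) and the filter structure are our own illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

def cv_kalman_step(x, P, z, dt, q=1.0, r=0.01):
    # One predict/update cycle for a 1-D constant-velocity model,
    # state x = [position, velocity], measuring only position z.
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    H = np.array([[1.0, 0.0]])              # position-only measurement
    Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
    R = np.array([[r]])
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the measured obstacle position
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + (K @ y).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

def predict_positions(x, dt, N):
    # Constant-velocity rollout of the estimate over the planning horizon.
    return [x[0] + x[1] * k * dt for k in range(1, N + 1)]
```

In the planner, the rollout over the horizon supplies the predicted obstacle positions p_o used by the collision costs and constraints at each stage.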

MAVP-obstacle collision avoidance requirements
Imperative to collision avoidance is ensuring separation between the MAVP and obstacles. By quantifying the quadrotor, load, and cable's proximity to an obstacle we define mathematical requirements to guarantee a collision-free system.

Point to ellipsoid distance
The point-to-ellipsoid signed distance is approximated, as the true value cannot be expressed in closed form (Uteshev and Goncharova 2018). For a generic ellipsoid S with buffered dimensions (a_o + β, b_o + β, c_o + β) and position p_o = [x_o, y_o, z_o]ᵀ, the approximate signed distance to a point p = [x, y, z]ᵀ, based on the ellipsoid equation, is

d_o(p, S) = ((x − x_o)/(a_o + β))² + ((y − y_o)/(b_o + β))² + ((z − z_o)/(c_o + β))² − 1. (16)

When p is inside or on S, d_o ≤ 0, and as p moves further away from S, d_o increases from 0 to infinity.
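The ellipsoid-equation distance of (16) can be sketched directly:

```python
import numpy as np

def d_o(p, p_o, dims, beta=0.0):
    # Approximate signed point-to-ellipsoid distance, Eq. (16).
    # dims = (a_o, b_o, c_o); returns <= 0 inside or on the buffered ellipsoid,
    # growing without bound as p moves away from it.
    a, b, c = (d + beta for d in dims)
    dx, dy, dz = np.asarray(p, float) - np.asarray(p_o, float)
    return (dx / a)**2 + (dy / b)**2 + (dz / c)**2 - 1.0
```

Being a smooth quadratic in p, this expression keeps the collision costs and constraints differentiable for the interior-point solver.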

Quadrotor and payload proximity
We model the quadrotor and payload individually by a bounding sphere with an associated radius r_c. Without loss of generality, we assume an equal r_c for the quadrotor and payload. Consider the quadrotor: using the obstacle's bounding ellipsoid S_o and setting β_o > r_c and p = p_q, then using (16) we can guarantee the quadrotor does not collide with the cuboid-shaped obstacle provided

d_o(p_q, S_o) ≥ 0. (17)

Similarly, considering the payload-associated bounding sphere and position p_l gives

d_o(p_l, S_o) ≥ 0. (18)

Rigid cable proximity
Modelling the cable as a mobile finite line segment, we identify the cable's Closest Point of Approach (CPA) to S_o, denoted by p*_c; this is the cable's most critical point for collisions, simplifying the check to a point-to-ellipsoid computation. Given that the cable cross-section dimensions are negligible, no buffer is required, so β_o = 0. Using (16), p*_c is computed by

p*_c = argmin_{p ∈ P_c} d_o(p, S_o),  P_c = { p_q + t r_ql : t ∈ [0, 1] }. (19)

"Appendix E" shows that problem (19) is expressible in closed form and analytically solvable. Using (19), the cable is guaranteed to be collision-free with respect to the cuboid obstacle provided

d_o(p*_c, S_o) ≥ 0. (20)

Requirements (17), (18), and (20) must be satisfied with respect to each obstacle to guarantee a collision-free MAVP system.
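Because d_o along the segment is quadratic in the segment parameter t, the minimiser of (19) is closed-form; a sketch (with β_o = 0 for the cable, and axis-aligned ellipsoid dimensions dims):

```python
import numpy as np

def cable_cpa(p_q, p_l, p_o, dims):
    # Closest point of the cable segment p_q -> p_l to the ellipsoid of Eq. (16).
    # In scaled coordinates d_o(t) = A t^2 + B t + C, so t* = -B/(2A), clipped
    # to [0, 1] to stay on the finite segment.
    w = 1.0 / np.asarray(dims, float)                   # beta_o = 0 for the cable
    r = (np.asarray(p_l, float) - np.asarray(p_q, float)) * w
    q = (np.asarray(p_q, float) - np.asarray(p_o, float)) * w
    A, B = r @ r, 2.0 * (q @ r)
    t = 0.0 if A == 0.0 else float(np.clip(-B / (2.0 * A), 0.0, 1.0))
    return np.asarray(p_q, float) + t * (np.asarray(p_l, float) - np.asarray(p_q, float))
```

The clipping handles the case where the unconstrained minimiser falls beyond a cable endpoint, in which case the endpoint itself is the CPA.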

Method overview
The planning and control objective is to navigate the MAVP system from an initial position p start to a user-definable goal position p goal in a safe, agile, and collision-free manner.
To accommodate for the dynamic environment, we perform dynamic and closed-loop local motion planning using NMPC which is a receding finite-horizon controller.

Receding horizon dynamic planning
Denote by Δt the time-step, by k the stage index, and by N the finite planning horizon (number of stages). At every sampling instance t we generate a local trajectory of duration NΔt encoded as a sequence of N + 1 states that includes the initial state x_0, the transition states x_k, and a terminal state x_N, thus giving

x̃ = {x_0, x_1, …, x_N}. (21)

For state realisation, the associated input sequence up to the terminal state is denoted by

ũ = {u_0, u_1, …, u_{N−1}}. (22)

Following execution of u_0, the planning is receded by Δt to t + Δt. At the next sampling instance the new obstacle positions and a new initial state estimate x_0 are obtained. Subsequently, a local trajectory is re-generated by initialising the solver with a time-shifted version of the previous solution.
The time-shift in simulation studies is a fixed simulation time step, and in real experiments the actual control loop time is used. This approach results in our method's computationally efficient closed-loop performance with robustness to model uncertainties (Naegeli et al. 2017); we illustrate this process in Fig. 3.
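The warm-start time-shift can be sketched as follows; the stage-duplication used as the terminal guess is our illustrative choice, as the paper does not detail how the newly appended stage is initialised.

```python
import numpy as np

def shift_warm_start(x_prev, u_prev):
    # Shift the previous NMPC solution (rows = stages) by one stage to
    # initialise the next solve; the last stage is duplicated as a simple
    # guess for the newly appended stage at the end of the horizon.
    x_init = np.vstack([x_prev[1:], x_prev[-1:]])
    u_init = np.vstack([u_prev[1:], u_prev[-1:]])
    return x_init, u_init
```

At each sampling instance, the solver is then called with the new initial state estimate and (x_init, u_init) as its starting point, which is what makes the closed loop computationally cheap relative to solving from scratch.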

Local trajectory generation
At every sampling instance we solve a constrained optimisation problem. The designer encodes the desired planning objectives in an objective function, using costs to quantify the generated trajectory's performance. The costs are designed to decrease with increasing satisfaction of the objective. For every trajectory stage k, we evaluate an associated scalar cost, giving a cost sequence

c̃ = {c_0, c_1, …, c_N}. (23)

Within (23), the trajectory stage costs are given by

c_k = c_s(x_k, u_k, *_k), k = 0, …, N − 1, (24)

where function c_s is evaluated on the predicted state, input, and any online environment variables (obstacle positions, navigation goal, slacks, etc.) that we denote by *. The terminal cost is given by

c_N = c_t(x_N, *_N), (25)

where function c_t is evaluated on the variables of the terminal stage N. Terminal costs are used to achieve closed-loop stability of the finite-horizon planner (Mayne et al. 2000). We then quantify the full trajectory's performance by the objective function defined as

J = Σ_{k=0}^{N−1} c_k + c_N. (26)

Constraints are introduced to limit the solution space for the trajectory, which is encoded in x̃ and ũ, thus providing (feasibility) guarantees for the computed trajectory. To ensure the optimiser always returns a solution at run-time, we may tolerate minor constraint violations by introducing non-negative slack variables that soften the constraints (Zheng and Morari 1995). We encode the slack variables associated with the trajectory in

s̃ = {s_0, s_1, …, s_N}. (27)

A planning violation occurs when the optimiser produces positive entries of s̃. A physical violation only occurs when the real system breaches constraints, i.e., the current slack s_0 of s̃ is positive. By associating a high slack-related cost in the optimisation objective function, we avoid positive entries of s̃ and accordingly any planning and physical violations (Zheng and Morari 1995). During optimal trajectory generation we minimise (26) while respecting the constraints, resulting in an NΔt-length locally feasible trajectory.
In subsequent sections we introduce the costs and constraints after which we formalise the optimisation algorithm in Sect. 3.4.

Costs
We introduce cost terms derived from our planning objectives presented in Sect. 3.1. We use our weighted square norm definition from Sect. 2.1 with an n×n identity matrix denoted by I_n to make all cost terms scalar and positive.

Point-to-point navigation
For navigation we minimise the displacement between the quadrotor position and the goal p_goal. Let p_start be the start position; we normalise the cost to treat all start-to-goal distances equally. The cost term is given by

c_nav = ‖p_q − p_goal‖²_{I₃} / ‖p_start − p_goal‖². (28)

Making (28) a stage cost would mean the shortest path (a straight line) is always preferred, which may result in deadlock in cases where it is necessary to go around an obtrusive obstacle. Therefore, we use (28) only as a terminal cost, thus allowing curved paths to be generated such that locally and terminally the system reaches a more favourable position.
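The normalised terminal cost (28) can be sketched as:

```python
import numpy as np

def navigation_cost(p_q, p_goal, p_start):
    # Terminal navigation cost, Eq. (28): squared goal displacement normalised
    # by the start-to-goal distance, so the cost is 1 at the start and 0 at
    # the goal regardless of task length.
    d = np.linalg.norm(np.asarray(p_start, float) - np.asarray(p_goal, float))
    e = np.linalg.norm(np.asarray(p_q, float) - np.asarray(p_goal, float))
    return float(e**2 / d**2)
```

The normalisation means one weight w_nav can be reused across tasks with very different start-to-goal distances.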

Potential field based obstacle separation
For obstacle separation, we employ a MAVP to obstacle proximity related cost analogous to a reactive potential field (Khatib 1986). We combine this with constraints to guarantee collision-free trajectories as is presented in Sect. 3.3.3. This two layered approach, similar to Naegeli et al. (2017), enhances the operational safety by pro-actively reducing the collision risk especially for unmodelled system and obstacle dynamics.
Let p be the quadrotor, load, or cable's CPA position [see (19)]; for each object we compute a cost. Let p_o be the obstacle's predicted position; the potential cost term activates when p is in the obstacle's expanded ellipsoid S_e, i.e., using (16), d_o(p, S_e) < 0. We choose the S_e-associated buffer β_e such that β_e ≫ β_o. Observing that |d_o(p, S_e)| increases from zero to one as point p moves from the ellipsoid surface towards its centre, by penalising p more towards the centre we naturally discourage p from getting closer to the smaller bounding ellipsoid S_o. Given that S_o models the actual obstacle, using this method we promote separation from the obstacle. With this insight, and using (16), the cost is formalised as

c_pf = ‖min(0, d_o(p, S_e))‖²_{I₁}. (29)
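A sketch of the reconstructed potential cost (29), with an optional weight w standing in for the w_pf entries of the objective:

```python
import numpy as np

def potential_cost(p, p_o, dims, beta_e, w=1.0):
    # Reactive potential cost, Eq. (29): zero outside the expanded ellipsoid
    # S_e, growing smoothly from 0 at its surface to w at the obstacle centre.
    a, b, c = (d + beta_e for d in dims)
    dx, dy, dz = np.asarray(p, float) - np.asarray(p_o, float)
    d = (dx / a)**2 + (dy / b)**2 + (dz / c)**2 - 1.0
    return w * min(0.0, d)**2
```

The squared clipped distance keeps the cost continuously differentiable at the S_e surface, where it switches on.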

Input magnitude regulation
The input magnitude associated cost is given by

c_u = ‖u‖²_{I₃}. (30)

For our agile manoeuvres, we weigh this cost low. Associating a high cost would improve the system's energy efficiency through the conservative use of large inputs.

Payload suspension angles regulation
The suspension angle associated cost is given by

c_swing = ‖[θ_l, φ_l]ᵀ‖²_{I₂}. (31)

For our agile manoeuvres, we weigh this cost low. Associating a high cost will minimise the swing angles if desired.

Constraints
We derive planning constraints from our system and setup limits in addition to the global planning objectives.

MAVP dynamics
The process model state transition given by (15) is discretised and enforced on the trajectory state evolution by an interstage equality constraint

x_{k+1} = f_d(x_k, u_k), (32)

where k is the stage index and f_d denotes the discretised state transition.

State and input limits
The state and input values are bounded to the system's allowable ranges. Let X_min, X_max and U_min, U_max denote the state and input range limits; then the following inequalities must be satisfied:

X_min ≤ x_k ≤ X_max, (33)
U_min ≤ u_k ≤ U_max. (34)

We specify the hardware-specific limits in Sect. 4.

Collision-free planning
Collision-free trajectory planning is guaranteed by constraining the system's allowable spatial states. Let p be the quadrotor, load, or cable's CPA position [see (19)]; for each we define a constraint. Adopting the requirements (17), (18), and (20) for a collision-free MAVP system as presented in Sect. 2.4, and using (16), the associated constraint is formalised as

d_o(p, S_o) + s_c ≥ 0, (35)

with the non-negative scalar slack s_c. Note that even though (35) is defined for every obstacle, only one slack s_c is defined and used in all obstacle-associated inequality constraints per time-step in our optimisation problem definition. When any d_o(p, S_o) becomes negative, s_c assumes the highest value such that all obstacle-associated inequality constraints are satisfied according to (35) for that time-step. Ideally s_c is zero, hence the system is collision-free with respect to all obstacles. Provided a sufficiently high penalty cost is associated with the slack, the feasible solution is recovered, as shown in Kerrigan and Maciejowski (2000). Summarising, only one slack is necessary to differentiate between a system in a colliding or collision-free state, with no differentiation necessary with respect to the obstacle with which a collision occurs.
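The shared-slack logic of (35) reduces to one line: the smallest non-negative s_c satisfying all obstacle inequalities at a time-step is determined by the worst (most negative) distance.

```python
def collision_slack(distances):
    # Smallest non-negative s_c such that d_o + s_c >= 0 holds for every
    # obstacle (Eq. (35)); zero when the system is collision-free.
    return max(0.0, max(-d for d in distances))
```

This is why a single slack per time-step suffices: penalising s_c penalises only the deepest violation, which upper-bounds all others.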

Workspace limits
For confined (indoor) operation, the quadrotor and load positions are limited to the workspace limits. Assuming a cuboid workspace W, let W_min, W_max denote the minimum and maximum workspace coordinates in frame {I}, between which the cuboid's space diagonal is defined; then the following inequalities must be satisfied:

W_min − s_q 1₃ ≤ p_q ≤ W_max + s_q 1₃, (36)
W_min − s_l 1₃ ≤ p_l ≤ W_max + s_l 1₃, (37)

with 1₃ = [1, 1, 1]ᵀ and the non-negative scalar slacks s_q, s_l. When a constraint violation occurs, the slacks assume the highest value required to satisfy the associated workspace inequalities. Under workspace convexity, we also guarantee the rigid cable remains inside W. Note that the inequalities are written in short form; for implementation, each vector dimension has an individually defined inequality.

Scalability to large obstacle rich workspaces
We set a maximum omnidirectional obstacle detection range originating from the quadrotor position, whereby any obstacles beyond this range are disregarded for planning purposes. The previously introduced obstacle-related costs and constraints are therefore implemented dynamically, only for obstacles within range.

Optimisation algorithm
Local trajectory generation is formulated as a constrained optimisation problem subject to the following costs and constraint definitions;

Costs
We define the stage and terminal cost functions based on the cost term definitions (28)-(31). Let w denote a user-definable weighting used to assign relative importance to the costs and their associated objectives. The stage cost function, as introduced in (24), is defined as

c_s = w_pfᵀ c_pf + w_u c_u + w_swing c_swing + w_slack ‖s‖²_{I₃}, (38)

where w_pf and c_pf are vectors of weights and costs equal in size to the number of obstacles, and w_slack ‖s‖²_{I₃} is the cost associated with all the slacks, where s = [s_c, s_q, s_l]ᵀ ∈ R³. The terminal cost function, as introduced in (25), is defined as

c_t = w_nav c_nav + w_pfᵀ c_pf + w_slack ‖s‖²_{I₃}. (39)

Constraints
We impose the system dynamics and state/input constraints, as introduced in Sect. 3.3, on the optimiser. By the function g(x_k, u_k, *_k) we denote all additional inequality constraints as defined in (35)-(37), which are functions of the state, input, and additional (time-specific) variables/parameters, e.g., obstacle positions and workspace limits. Combining our previously introduced trajectory variables (21), (22), and (27), we denote the optimisation variable by

z̃ = {x̃, ũ, s̃}. (40)

With the estimated initial state x_0, we optimise z̃ such that the objective function (26) is minimised, resulting in a locally optimal and feasible trajectory. With costs and constraints stacked together over all stages and obstacles, the optimisation problem that is solved at every planning time instance t is formally defined as

minimise over z̃: J
subject to: initial state x_0,
interstage dynamics (32),
state and input limits (33)-(34),
inequality constraints g(x_k, u_k, *_k) ≤ 0,
slacks s_k ≥ 0, for all stages k. (41)

Problem dimensionality
Variable z̃ is optimised at every planning time instance, encoding the optimised local trajectory in its solution. As given by (40), z̃ comprises a sequence of N + 1 states x ∈ R¹⁶, N inputs u ∈ R³, and N + 1 slacks s ∈ R³, hence z̃ ∈ R^(22N+19).
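The dimension count can be verified directly:

```python
def nmpc_variable_dim(N, nx=16, nu=3, ns=3):
    # (N+1) states of size nx, N inputs of size nu, (N+1) slacks of size ns:
    # (N+1)*16 + N*3 + (N+1)*3 = 22N + 19 for the stated sizes.
    return (N + 1) * nx + N * nu + (N + 1) * ns
```

This linear growth in N is what keeps the per-instance solve tractable for the short horizons used in receding-horizon planning.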

Optimality and feasibility
We use a fast non-linear programming based optimiser, namely FORCES Pro (Zanelli et al. 2017), on our non-convex optimal control problem. FORCES Pro implements a primal-dual interior-point constrained optimisation algorithm, which is outlined in "Appendix F". Consequently, the computed trajectories are only locally optimal over the planning horizon N, with the possibility of deadlocks when the planned trajectory converges to a local optimum in the solution space. When this occurs, our planner holds the MAVP at this locally optimal state until a solution becomes available as a result of the changing environment or an external perturbation to the system. Planning feasibility is guaranteed over the full N stages when all optimised slacks s̃ are zero. When full planning feasibility is not realised, the current system state and inputs remain feasible provided that at least the current slack s_0 is zero. Re-planning at a future instance can then re-establish full planning feasibility.
A comprehensive overview of the optimality and stability of (N)MPC algorithms is available in Mayne et al. (2000).

System setup and framework
We outline our implementation of the system model and NMPC controller for the simulated and experimental studies, including the state estimator.

System properties and hardware
The MAVP system properties used for all studies, including the maximum pitch, roll and thrust command limits, are given in Table 2.

Workspace
We perform studies in a simulated and a real workspace measuring 6.0 × 3.0 × 2.6 m (L×W×H). The real indoor workspace has an OptiTrack Motion Capturing System (MCS) that tracks markers to obtain rigid-body pose measurements in SE(3) at around 120 Hz.

Programmed control system framework
The control framework is schematised in Fig. 5; the on-board control component is expanded upon in "Appendix B". The off-board components run on an Intel i7-3610QM Quad Core 3.3 GHz processor PC and are programmed in MATLAB, with the efficient C-language solver FORCES Pro performing the online NMPC computations (Zanelli et al. 2017). All studies are performed on the same computer with at most one core utilised by FORCES Pro at run-time. The on-board components run on the MAVP hardware; for simulated studies we replicate this with our system model. In experiments, communication between hardware is performed over a ROS based network. As shown in Sect. 5, the controller loop time was on median in the order of 10^-2 s; therefore, for simulation and controller design, explicit 2nd-order Runge-Kutta integration of the system model is used. This scheme was chosen as a preliminary examination of the predicted system response showed it provides a good balance between loop time performance and dynamics approximation. An in-depth comparison of other discretisation methods is beyond the scope of this paper and should be investigated in future studies. The implemented NMPC cost weights are given in Table 3.
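The explicit 2nd-order Runge-Kutta integration mentioned above can be sketched as the classic midpoint step. This is an illustrative Python sketch under our own naming (`rk2_step`), not the MATLAB implementation used in the paper.

```python
import numpy as np

def rk2_step(f, x, u, dt):
    """One explicit midpoint (2nd-order Runge-Kutta) step of x' = f(x, u):
    evaluate the derivative at the midpoint and advance by dt."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    return x + dt * k2
```

For the MAVP system, `f` would be the non-linear state derivative of the process model, evaluated with the current input held constant over the step.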

Cascaded Kalman filter state estimator
Kalman Filtering (KF) based state estimation of x is performed using the system process model and MCS based sub-millimetre measurements of the quadrotor and load's pose in SE(3) (Point 2009). As the Parrot Bebop's standard interface lacks a high frequency output of internal sensor measurements to off-board clients, these were unusable for our state estimation routine. The MAVP system, for which we present process models in Sect. 2.2, comprises (i) a quadrotor specific input controller, and (ii) the general non-linear MAVP system. We distribute the state estimation over two KFs, permitting individual treatment of the subsystems and maintaining modularity. A Linear KF is used to estimate the state x_c of the quadrotor input model (2); the MCS provides measurements of the real pitch θ_q and roll φ_q for estimating the corresponding states. Lacking the necessary measurements of the vertical control force F_q, the states x_F,1, x_F,2 are only predicted, without performing the KF measurement update step. Similar to Bisgaard et al. (2007), a non-linear Unscented KF is used to estimate the MAVP states x_q of the system model (14). Measurements of the state variable q are directly reconstructed from MCS data using the kinematic relations introduced in Sect. 2.2.3. Using the process model, the Unscented KF is primarily tuned to provide noise-reduced estimates of the time derivatives of q in x_q. The Unscented KF directly uses the non-linear and observable process model for state prediction without performing the linear approximations traditionally required by the Extended KF, usually improving the prediction accuracy (Julier and Uhlmann 1997).
The full cascaded KF based estimator design is schematised in Fig. 6: the Linear KF estimates x_c based on the quadrotor input model and the measured true pitch, roll and inputs; the Unscented KF estimates x_q based on the MAVP model, the measured MAVP configuration q and the control force inputs computed by (3) using the Linear KF outputs.
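The predict/update steps of the Linear KF stage in the cascade follow the textbook form sketched below. The function names and the usage are ours, not the paper's implementation; note that for the F_q states the paper runs only the prediction step.

```python
import numpy as np

def kf_predict(x, P, A, B, u, Q):
    """Linear KF prediction: propagate state and covariance."""
    x = A @ x + B @ u
    P = A @ P @ A.T + Q
    return x, P

def kf_update(x, P, z, H, R):
    """Linear KF measurement update with measurement z."""
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

Skipping `kf_update` for a state corresponds exactly to the "prediction only" treatment of x_F,1, x_F,2 described above.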

Simulation study
We showcase our method's scalability, robustness and performance in simulated studies. This is presented as a precursor to our experimental results involving flight amongst multiple human obstacles and acrobatic manoeuvres.
The following metrics are used. Let the system's distance-to-goal be defined as d_goal = ‖p_q − p_goal‖; then the time-to-goal is the elapsed run-time after which d_goal strictly remains below 0.2 m.
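These metrics can be computed as sketched below (hypothetical helper names; only the 0.2 m threshold is from the text).

```python
import numpy as np

def distance_to_goal(p_quad, p_goal):
    """Euclidean distance between quadrotor and goal position."""
    return float(np.linalg.norm(np.asarray(p_quad, float) - np.asarray(p_goal, float)))

def time_to_goal(times, positions, p_goal, threshold=0.2):
    """Earliest time after which d_goal stays strictly below `threshold`;
    None if that never happens within the run."""
    below = [distance_to_goal(p, p_goal) < threshold for p in positions]
    for i, t in enumerate(times):
        if all(below[i:]):
            return t
    return None
```

The "strictly remains below" clause matters: a trajectory that dips inside 0.2 m and then overshoots out again does not yet register a time-to-goal.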

Scalability of the optimisation problem
The scalability of the optimisation problem is studied against the number of planning stages and separately the number of obstacles.

Scaling with number of planning stages
The quadrotor, with a randomised initial swing θ_l, φ_l < 10°, starts at (−2.0, 0.0, 1.1) with the dynamic obstacle at (2.0, 0.0). A collinear position swap is performed with the obstacle moving at 0.5 m/s such that the head-on paths critically test the predictive planning behaviour. The number of planning stages N is increased from 10 to 26 in increments of 4. Using Δt = 0.05 s and default cost weights, we perform 16 runs per case. Results in Fig. 7 show the scaling of the NMPC solve time with N; the positive correlation is expected as the optimisation variable, given by z̃ ∈ R^(22N+19), increases the problem dimension with larger N. As shown in Fig. 8a, where N = 10, the system responds late to the incoming obstacle, leading to a near-miss or collision (physical violation). The late response means the attempted aggressive evasive behaviour causes the system to move far off-track, sometimes leading to workspace limit violations, and overall increasing the time-to-goal. As depicted in Fig. 8b, with a higher N the collisions are averted and the system responds with a smooth agile motion. However, with N = 26, the planner favours a greater MAVP to obstacle separation over each generated trajectory, resulting in a lower potential field associated cost in the objective function. As a result of the greater separation, the total distance covered from start to goal is greater, thus increasing the time-to-goal. Importantly, the generated planning remains tunable through modification of the cost function weights, resulting in different system behaviours.
Based on the results and qualitative observations, N = 18 was used for all subsequent studies as it balances run-time and planning performance. To address the pitfalls of using a low N, a novel assistive steering approach can be used that guides the planned trajectory away from obstacles even when the obstacle is not within the prediction horizon; this method is outlined in "Appendix G" and its results and benefits are discussed in "Appendix H". For short prediction horizons, i.e., low N, assistive steering aids the planning performance for obstacle avoidance. For N higher than 14, and at the system speeds achieved in our experiments, the planner's lengthened prediction horizon makes the assistive steering redundant as no benefits are apparent. Therefore, in the experiments discussed in this paper, we do not employ assistive steering.

Scaling with number of dynamic obstacles
We perform a navigation task from (−2.5, −1.0, 1.0) to (2.5, 1.0, 1.0) amongst n_o randomly placed dynamic obstacles. Results in Fig. 9 indicate a positive trend in NMPC solve time with n_o, resulting from the additional costs and constraints introduced into the optimisation problem per additional obstacle. The time-to-goal shows an increasing spread with n_o as the obstacles are more likely to obstruct the system's most direct path to the goal, requiring a re-route and thus a lengthier path. In Fig. 10 we show one run demonstrating the MAVP's agile response amongst 8 dynamic obstacles. The outlier at six obstacles is the result of a temporary deadlock situation that is resolved by a lengthier planned route. As mentioned, NMPC is locally optimal; the deadlock therefore arises from a local minimum of the objective function that occurs when several obstacles corner or obstruct the MAVP's path. In those cases, the planner may not be able to detour around the obstruction as the objective function over the planning length may only have a positive gradient. This local optimality is a limiting characteristic of local planning algorithms (LaValle 2011). However, a benefit of our local planning method is that the system holds its position during a deadlock and continues identifying solutions in the evolving solution space (due to dynamic obstacles). Therefore, in most cases it is capable of self-resolving the deadlock given sufficient time.

Performance comparison to contemporary approaches
We compare the total task completion time for three methods: (i) our NMPC, (ii) pre-generated, and (iii) minimal swing trajectory planning and control. A navigation task is performed for a simple static obstacle setup and a difficult slalom setup. For (i) we use N = 18 and default cost weights; for (ii) we use our optimiser with N = 200, providing sufficient stages to pre-plan the entire trajectory, which is then simply tracked; and for (iii) we use N = 18 with a high swing cost w_swing = 1. We use Δt = 0.05 s and perform 4 repeated runs. Table 4 compares the total task completion times (off-line computation plus trajectory execution), and Fig. 11 depicts the executed trajectories using the three approaches. As expected, the pre-generated trajectory has the shortest time-to-goal for both tasks due to its highly optimised planning, which however requires large off-line computation times. The minimal swing approach results in sharp turns as the system accelerates and decelerates at the turning points, making the motion slow and space inefficient, as substantial effort is required to maintain a low swing angle through the turns. The NMPC based trajectory is marginally slower and less optimal than pre-generation; however, direct deployability means the simple task is completed within 2.65 s, a 48% reduction, and the difficult task within 5.35 s, a sizeable 65% reduction, in total completion time compared to pre-generation.
Fig. 11: Comparison of executed trajectories for manoeuvring around an obstacle (simple) and completing a slalom course (difficult) using NMPC with N = 18, pre-generated and minimal swing planning and control. Pre-generation leads to the smoothest, most optimal path resulting from its global planning scope. The NMPC response resembles pre-generation and only initially reacts later to the presence of obstacles due to local planning. The minimal-swing response is sluggish as its turning motions are less suited for agile behaviour. Top-down view with t_3 > t_2 > t_1 > t_0 shown.
Unlike pre-generation where a task-specific trajectory is generated, our NMPC method adapts to both tasks without any reconfiguration. With increasing task complexity and duration, greater reductions can be realised making NMPC's scalability unparalleled. Furthermore, our NMPC method applies to dynamic scenarios.

Robustness to change in control time step and lags
We demonstrate the robustness of our method by (i) increasing Δt from 0.05 s to 0.20 s to simulate a slower NMPC controller (on a less powerful computer), and (ii) artificially adding a 0.1 s lag between NMPC generating input commands and executing them. We use the simple task from Sect. 5.2 for analysis, with N = 18 and default cost weights. Given that N is a pre-configured design value of the MPC controller and that Δt depends on the real loop-time, this study aims to show the effects of different Δt resulting from the use of hardware with varying computational capability. It is therefore important to note that the lookahead horizon NΔt differs in each case. Comparing Fig. 12a, b with the different Δt, notice that NMPC automatically adjusts and reduces the computed input magnitudes for the longer time step, resulting in a slower, less agile system; this is apparent from the distance-to-goal and load angle plots. With Δt = 0.20 s, agile manoeuvres are inconceivable as large inputs over the long time-step would result in excessive accelerations with detrimental consequences on overall performance. Using the process model, NMPC is able to appropriately adapt its planning and control to the time-step size to realise the desired motion.
With a 0.10 s lag, notice in Fig. 12c that the system's distance-to-goal and load angles are similar to those with no lag in Fig. 12a. Due to our method's closed-loop setup, the true system behaviour is continually used to re-initialise the planning instance thus modelling errors do not accumulate. If pre-generated trajectories were used, any unaccounted lag would result in significant deviations of the real system from the planned path due to model mismatch. NMPC is therefore more robust to small modelling inaccuracies making it a safer and more practical method for real-world applications.
Increasing Δt further to 0.25 s and the lag to 0.15 s destabilises the NMPC controller in simulation. We attribute this to several causes: first, large time-steps used in combination with NMPC's discretised process model can result in prediction error divergence; second, unmodelled time lags result in the prolonged execution of the large magnitude inputs required for agile flight, producing excessive, destabilising accelerations (for short lags, the closed-loop control is able to prevent this). By acknowledging the presence of a long time-step and/or lag in the controller design, the method's prediction accuracy can be improved; this is future work.
In the simulation study we were able to select Δt for a pre-configured N and show its effects on the system dynamics approximation and system performance. In the experimental setup, Δt is equal to the real control loop-time, which positively correlates with N as shown in Sect. 5.1. Therefore, in experimental studies it is important to tune N such that the obtained Δt is compatible with the discretisation method used on the system dynamics.
Fig. 12: Distance-to-goal for a simple point-to-point navigation task, load suspension angles, and NMPC generated pitch and roll inputs with N = 18, with (b) the NMPC time-step increased from Δt = 0.05 s to 0.20 s and (c) a control-to-execution time lag of 0.10 s. In (b), observe that NMPC compensates for the long time-step over which inputs are executed by reducing the input magnitudes, resulting in a stable yet slower, less agile motion. Comparing (a) and (c), observe that even with a high time lag, the MAVP responds in a stable and agile manner.

Experimental study
We showcase complex, agile behaviour in static and dynamic experimental setups. First, we analyse the effect of prediction accuracy on our planner's performance, as it is integral to the discussion of the experimental results. The same distance/time-to-goal definitions as introduced in Sect. 5 are used. Videos of the experiments are available at https://youtu.be/9C7O34W1w8Y.

Planning prediction accuracy over prediction horizon
To demonstrate the effect of our local planner's prediction accuracy on system performance, we replicated the simulation described in Sect. 5.1.1 in a real-world setup. Six identical runs were performed with the NMPC planner configured to use the real time-step, N = 18 and default cost weights. As our method is online and local, closed-loop trajectory planning is achieved every cycle by using the estimated system and dynamic obstacle states, and a plant model for propagating these states over the N planning stages. For each generated local trajectory x̃, we compute the absolute error between the predicted and the future true measured quadrotor/payload position for all N stages; with Δt our time-step, for the k-th stage we compare the predicted position to the true measurements obtained kΔt in the future. As shown in Fig. 13, our planner's prediction errors of the quadrotor and payload position are in the order of magnitude of 10^-2 m up to 3 stages. As stated in Sect. 1, each local plan must remain feasible during execution for a full computation cycle, so it is important that these first stages are accurately predicted: during one computation cycle of Δt, the system translates to the first predicted state of the plan before we re-plan, and accurate prediction of this state is critical to guarantee feasibility during execution. The metrics in Fig. 13 demonstrate that our planner performs as desired over this critical prediction horizon.
Furthermore, as NMPC utilises plant model propagation over the N stage prediction, the accumulation of inter-stage errors partially contributes to the reduction of prediction accuracy at higher stage counts. In addition, with our local planning approach in a dynamic environment, planned trajectories can differ significantly from cycle to cycle as new routes become feasible at run-time. This can skew prediction error results as a predicted trajectory may never actually be executed. These two factors result in an increase of absolute position error with the number of prediction stages, as seen in Fig. 13. We refer to this prediction accuracy analysis in the discussion of our experimental results to support the explanation of observed system behaviour.

Agile acrobatic manoeuvres
Two complex agile manoeuvres are performed: (i) the MAVP must fly over a high bar at 0.95 m under a virtual ceiling of 1.8 m, and (ii) similar to De Crousaz et al. (2014) and Tang and Kumar (2015), the MAVP must fly through a narrow 0.7 × 0.7 m opening. For both manoeuvres, three passes over/through the obstacle are performed in a rapid, successive and bidirectional manner. The tasks are impossible to execute without reducing the system's total vertical dimension (0.9 m when stationary) by swinging the load. The NMPC uses the real time-step, N = 18 and default cost weights. For the narrow opening, the maximum pitch/roll input is increased to 20°.
In Fig. 14 the two agile manoeuvres and the obstacle to MAVP clearance over all passes is shown. As the planning must excite the load's swing over a relatively short distance, large rapid inputs are commanded. Following the manoeuvre, the controller is able to stabilise the system at the goal position. As we do all computations online, and perform the passes in rapid succession, the clearances over the three passes differ while maintaining acceptable separation to the obstacle(s). For both manoeuvres, the entire system setup is identical with only the obstacles changed exemplifying our method's adaptability to different tasks.
Of the 48 tests performed over both tasks, 77% were successfully executed. In the rare cases when the manoeuvre was not successful, the system would either end up in a deadlock in front of the obstacle or make momentary contact with the obstacle. Flight was recoverable following the contact in all but four tests. For the unsuccessful tests, the primary cause was traced to our local planning approach or to inaccuracies in the model resulting in a discrepancy between the observed and planned motion.
As addressed in Sect. 3.5, local planning methods such as ours are prone to deadlocks. In this particular case, one reason could be an insufficient planning horizon for computing a feasible trajectory over/through the obstacle. Furthermore, our NMPC optimisation algorithm utilises the time-shifted previous solution as an initial guess for the optimisation problem, thus reducing computation time; however, this strategy contributes to an increased likelihood of deadlocks. The latter can be mitigated, at the cost of a few longer computation cycles, by randomising the initial solution when deadlock situations arise. An automated and structural method to address these situations within our local planning framework is an interesting topic for future work.
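The time-shifted warm start mentioned above can be sketched as follows. This is a minimal sketch with our own names; the actual FORCES Pro initialisation interface differs.

```python
import numpy as np

def time_shift_warm_start(x_prev, u_prev):
    """Shift the previous solution one stage forward to seed the next
    optimisation: drop stage 0 and duplicate the final stage.
    x_prev: (N+1, n_x) state trajectory, u_prev: (N, n_u) input trajectory."""
    x0 = np.vstack([x_prev[1:], x_prev[-1:]])
    u0 = np.vstack([u_prev[1:], u_prev[-1:]])
    return x0, u0
```

Because the seeded guess sits near the previous local optimum, the solver tends to stay in the same basin of attraction, which is exactly why this strategy can prolong deadlocks and why randomising the initial solution helps escape them.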
The cases involving obstacle contact were only observed in the flight through an opening experiment as the margins of error were small. Our method enables us to mitigate the effects of long horizon planning inaccuracies by using observed system states to re-plan rapidly. However, in this case, the prediction accuracy over longer horizons is critical to ensure the planner only initiates an agile manoeuvre that is feasible over the full planned trajectory. In the rare case the planner experiences an intermittent infeasibility during runtime, such as system-obstacle contact, it will quickly recover stable controlled flight once feasible trajectories become available.
The setup was extended to the case of a moving high bar manoeuvre for agile dynamic obstacle avoidance (see video).

MAVP human obstacle avoidance
Obstacle avoidance performance is demonstrated amongst dynamic human obstacles with (i) test cases involving intersecting MAVP-human paths, and (ii) random motion in a shared MAVP-human space. The humans are represented by ellipsoids (cf. Fig. 16). Note that we define the MAVP's closest approach to a human's associated ellipsoid as the smaller of the quadrotor-to-ellipsoid and load-to-ellipsoid distances.

One human with crossing paths
A human walks perpendicularly and diagonally on a path crossing the MAVP performing a navigation task from (−1.5, 0, 1.9) to (1.5, 0, 1.9). In Fig. 15 we show a snapshot and the full executed trajectory for both cases. Observe the MAVP's smooth, safe and agile execution of the task, which includes full spatial avoidance exploiting the available horizontal and vertical space around the obstacle (the video shows this clearly). NMPC's predictive capability means the load is actively swung away from the human's direction of motion to avoid a potential load-human collision. The minimum MAVP to human separations for the perpendicular and diagonal crossing tasks were 0.45 m and 0.61 m, respectively.

Two humans walking randomly
Two humans walk for 150 s in random directions crossing the MAVP's path. The MAVP autonomously follows a goal position moving anti-clockwise with a 7 s period along a circle with a 1.5 m radius at a constant 1.4 m height. The experiment was conducted in a larger workspace measuring 3.2 × 3.2 m.
In Fig. 16 we show a snapshot from our experiment alongside the system's closest approach to the humans and the framework/NMPC loop time. As shown in Fig. 16c, a safe distance is maintained by the MAVP from the humans with no collisions over the entire run; the minimum observed separation was 0.38 m to the human's collision avoidance limit. In some cases the humans were apprehensive about the MAVP getting too close and would perform a precautionary evasive motion; however, as shown in Fig. 16c, the MAVP still always maintains a safe separation. To address this observed closeness, it is possible to enlarge the obstacle associated buffers to increase the separation. Observe from Fig. 16d that the NMPC solve time resembles the statistics obtained from the simulated study with two dynamic obstacles shown in Fig. 9; the NMPC computation performance is therefore preserved going from simulation to the experimental setup. As the optimiser is initialised using the time-shifted previous solution, a roughly constant solve time is achieved. Spiking occurs when the optimiser's iterative solver requires more time to compute solutions, primarily when considerable re-planning is required. Examples where we observed spikes included situations where the humans would inhibit the NMPC planner from feasibly planning a path to the goal position, or where the MAVP would be trapped.
Fig. 16: The MAVP follows a circular path avoiding two randomly walking human obstacles modelled as ellipsoids moving at a mean 0.78 m/s with a maximum of 1.45 m/s. As shown in (c), the 0.2 m buffered collision avoidance limit is never violated; the system only intermittently enters the larger potential field ellipsoids with a 1.0 m buffer. In (d), observe the steady loop times for the full framework (mean 71 ms), which includes the NMPC controller (mean 46 ms); brief spiking arises from situations where significant re-planning was required.
The spikes only lasted one to two time-steps, and observations showed the overall performance was not degraded. Specific to the experiments is a mean 25 ms overhead (on top of the NMPC solve time) associated with the framework's state estimation, communication and data parsing. This low overhead means the controller's performance is not severely affected.
Thanks to its online and receding-horizon nature, our method can execute agile and safe continuous manoeuvres and avoid dynamic obstacles such as humans. Our method is extendable to larger spaces with more humans/obstacles as we have already demonstrated in simulation with eight obstacles.

Conclusion
In this paper, we presented an optimisation based unified motion planner and controller to accomplish online, closed-loop and agile flight of a Micro Aerial Vehicle slung payload system. We formulated the optimisation objective function and constraints, and relied on a state-of-the-art NMPC solver to achieve collision-free flight in dynamic environments over various complex tasks, including flying through a narrow opening and avoiding moving humans. With simulation and experimental studies we demonstrated the method's (i) scalability with the number of planning stages and obstacles, (ii) robustness to different controller time-step durations and input execution lags, (iii) adaptability and repeatability over various complex tasks, and (iv) fast online performance in experimental conditions. For future studies we recommend extending the method to non-rigid cables, improving the model's realism and accuracy and, consequently, the NMPC prediction performance. Furthermore, a study involving variations of the model parameters would showcase the generality of the approach to different systems and setups. Due to our reliance on off-board NMPC control and motion capturing, we limited our experiments to indoor spaces; however, given the controller frequency achieved off-board, we believe on-board computation would be feasible with hardware available today. Combining our method with contemporary obstacle detection, localisation and state estimation techniques could make urban MAVP operation a reality.

A Analysis of cable slackening during manoeuvres
A preliminary experimental study was performed to identify whether the cable slackens during the flight manoeuvres foreseen to be part of this study. The positions of the cable suspension point and of the attachment to the load were recorded, and their difference was used to compute the cable length. A flight was performed maintaining a close to constant altitude while combining (aggressive) pitch and roll inputs. The maximum pitch and roll angle was capped at 20°, which is the limit set by the internal controller on the quadrotor. Fig. 17 shows the computed cable length, suspension angles and inputs to the system for a section of the flight. As shown, the cable length varies by 0.01 m during the run, about 1.3% of the nominal value of 0.755 m. If slackening had occurred, deep troughs would have been visible in the cable length data; as this was not the case, assuming the cable remains taut is reasonable. Slackening is likely when the vehicle suddenly accelerates in various directions without allowing sufficient time for the payload to respond; however, given the system performance limits, this was not realised.

B On-board control component
Figure 18 shows an expanded schematic of the quadrotor's on-board controller. The pitch and roll inputs θ̃_q, φ̃_q are tracked by the fast attitude controller, resulting in longitudinal and lateral control forces F_x, F_y. From our observations, the Parrot Bebop quadrotor can perform level flight under a pitch/roll tilt, suggesting the absence of an additional vertical force component. The quadrotor's vertical velocity is stabilised by a controller based on the reference thrust command input w̃_q, resulting in the vertical control force F_q which, when trimmed for the system weight, gives F_z. All forces F are in the world East-North-Up inertial frame.

C Input control force derivation
For modelling the input control force relations, the premise is that the quadrotor is able to perform horizontal manoeuvres while maintaining altitude; hence, the total vertical thrust component must always counteract the system weight when not changing elevation. The assumption made is that the horizontal and vertical input force components are decoupled. We also assume elevation changes only occur at negligible roll and pitch angles, so the system is in or near equilibrium. Let T be the total thrust force generated by the four rotors, R_B^E define the quadrotor's orientation and m the system mass; then the generated thrust force vector in frame {E} is

F_T = T [sin(θ_q), −cos(θ_q) sin(φ_q), cos(θ_q) cos(φ_q)]^T.

Additionally, as altitude is maintained,

T cos(θ_q) cos(φ_q) = mg.  (44)

Solving for T in (44) gives

T = mg / (cos(θ_q) cos(φ_q)).  (45)

As we assume elevation changes only occur from a (near) equilibrium state, the vertical control input F_q directly controls the generated vertical thrust force component. Then, using (44) and (45), the thrust force vector is defined as

F_T = [m tan(θ_q)/cos(φ_q) g, −m tan(φ_q) g, F_q + mg]^T ∈ R^3.  (46)
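Under the reconstructed relation (46), the input control force can be evaluated numerically as below. This is an illustrative Python sketch; `thrust_force_vector` is our own name, and the Euler-angle convention is the assumption stated above.

```python
import math

def thrust_force_vector(theta_q, phi_q, F_q, m, g=9.81):
    """Thrust force vector in the ENU frame {E}: horizontal components
    follow from the tilt at constant altitude, the vertical component is
    the control input F_q trimmed by the system weight m*g."""
    Fx = m * g * math.tan(theta_q) / math.cos(phi_q)
    Fy = -m * g * math.tan(phi_q)
    Fz = F_q + m * g
    return (Fx, Fy, Fz)
```

At hover (θ_q = φ_q = 0) the horizontal components vanish and the vertical component reduces to F_q + mg, consistent with the near-equilibrium assumption.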

D Identified Parrot Bebop 2 input model
The quadrotor pitch θ_q, roll φ_q and vertical control force F_q responses to the pitch, roll and thrust command inputs θ̃_q, φ̃_q, w̃_q are identified using the MATLAB system identification toolbox. The linear, second-order, state-space, black-box models, which we denote by h_θ, h_φ, h_F, are given by equations (47), (48) and (49). For system identification, we experimentally collected two datasets (estimation and validation) of the Parrot Bebop 2 response to a 5° amplitude, 0.5 Hz square wave pitch/roll input over 15 s, and a 1 m/s pulse of width 1 s for the thrust command. Table 5 shows the quadrotor input model's fit. We use NRMSE (MATLAB's definition) to facilitate comparison; a 100% NRMSE means a perfect fit.
As the thrust command activates a velocity stabiliser loop, the commanded value will be tracked by the quadrotor, hence a velocity input-output model can be identified. Using this relation, to obtain a model for the generated vertical force F_q, the velocity model's differentiated output is multiplied by the system mass. This approach to modelling the thrust command response is rudimentary, so future studies should explore a better approach. Figure 19 shows the second order models' fit to the validation data experimentally collected on the Parrot Bebop 2; for the thrust command, the velocity input-output model response is shown (thus prior to differentiating the output and multiplying by the system mass).
Fig. 19: Identified second order input model fit to experimentally collected system response validation data on the Parrot Bebop 2. For roll and pitch, a 5°, 0.5 Hz square wave input was used over a duration of 15 s starting at 2 s. For the thrust command, a 1 m/s pulse of width 1 s was used starting at 2 s.
As shown, the identified models properly track the quadrotor's true pitch, roll and vertical velocity responses when given pitch, roll and thrust commands. This is also reflected in the NRMSE model fit shown in Table 5.

E Derivation of the closest point of approach (CPA) of a finite line segment to an ellipsoid
Consider the finite line segment L(s) = p_q + s(p_l − p_q), s ∈ [0, 1], between the end-points p_q and p_l. Let [u, v, w] ≡ p_l − p_q and r_qo = p_o − p_q, with p_o the ellipsoid centre. Substituting L in the ellipsoid equation and expanding the vectors into x, y, z components, we approximate the signed line-to-ellipsoid distance function by

f(s) = (s u − r_qo,x)²/a² + (s v − r_qo,y)²/b² + (s w − r_qo,z)²/c² − 1, (50)

with a, b, c the ellipsoid semi-axes. Minimising (50) with respect to s, we get the ellipsoid's closest point along the infinite extension of line L,

ŝ = (u r_qo,x/a² + v r_qo,y/b² + w r_qo,z/c²) / (u²/a² + v²/b² + w²/c²).

Then on the finite line segment we obtain the Closest Point of Approach (CPA) L(s*), where s* = min{max{ŝ, 0}, 1}.
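The clamped minimiser lends itself to a compact implementation. The sketch below (names are ours; we assume an axis-aligned ellipsoid, consistent with the per-axis scaling in the distance function) returns the CPA point on the segment:

```python
import numpy as np

def cpa_segment_ellipsoid(p_q, p_l, p_o, abc):
    """Closest point of approach of the segment p_q -> p_l to an
    axis-aligned ellipsoid centred at p_o with semi-axes abc."""
    d = np.asarray(p_l, dtype=float) - np.asarray(p_q, dtype=float)  # [u, v, w]
    r = np.asarray(p_o, dtype=float) - np.asarray(p_q, dtype=float)  # r_qo
    w = 1.0 / np.asarray(abc, dtype=float) ** 2   # per-axis weights 1/a^2, ...
    denom = np.dot(w * d, d)
    if denom == 0.0:                              # degenerate (zero-length) segment
        s_hat = 0.0
    else:
        s_hat = np.dot(w * d, r) / denom          # unconstrained minimiser s-hat
    s_star = min(max(s_hat, 0.0), 1.0)            # clamp to the finite segment
    return np.asarray(p_q, dtype=float) + s_star * d
```

The clamping step is what distinguishes the finite-segment CPA from the closest point on the infinite line.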

F Primal-dual interior-point constrained optimisation algorithm
The Model Predictive Controller (MPC) algorithm is a receding, finite-horizon, constrained optimisation problem that solves for quasi-optimal solutions. Global optima can be found for convex problems under specific conditions. For non-convex problems, the optimiser cannot be guaranteed to converge to the global optimum, but rather to any minimum/maximum, local or global. A Newton-type optimisation scheme implemented in the software package FORCES Pro is used to achieve real-time computational performance (Domahidi and Jerez 2014). The specific algorithm is a barrier interior-point method, for which a short theoretical exposition is provided based on the derivation in Vanderbei (2012). The notation convention may differ from the main paper and is made clear from the context or by explicit definition in this appendix.

F.1 Optimisation problem definition
Take the cost-function minimisation problem with total cost J and f : R^n → R, subject to p inequality and q − p equality constraints:

min_x J = f(x)
s.t.  c_i(x) ≥ 0,  i = 1, …, p,
      c_i(x) = 0,  i = p + 1, …, q.

Rewrite an equivalent problem using slack variables s_i to convert the inequality constraints to equality constraints:

min_{x,s} J = f(x)
s.t.  c_i(x) − s_i = 0,  s_i ≥ 0,  i = 1, …, p,
      c_i(x) = 0,  i = p + 1, …, q,

where c(x, s) ∈ R^q denotes the stacked equality constraints, including the slacked inequalities. The non-negativity constraint on the slack variables is rewritten as a logarithmic barrier cost term, giving the equivalent barrier problem

min_{x,s} B = f(x) − μ Σ_{i=1}^p ln(s_i)   s.t.  c(x, s) = 0.

A small positive barrier parameter μ is introduced such that for lim_{μ→0}, min B = min J, and the new optimisation problem is equivalent to the original. By definition, the natural logarithm is undefined for s_i ≤ 0, thereby implicitly enforcing the inequality s_i > 0. Finally, incorporate the equality constraints into the objective function using Lagrange multipliers λ, giving the Lagrange function

L(x, s, λ) = f(x) − μ Σ_{i=1}^p ln(s_i) − λ^T c(x, s), (55)

where x is the primal variable and λ ∈ R^q the dual variable.

F.2 Newton-Raphson iterative root finding search
The optimisation problem is solved iteratively with a Newton-Raphson, gradient-based search-direction algorithm. Setting the gradients of (55) to zero yields the first-order optimality (KKT) conditions

∇_x f(x) − ∇_x c(x, s)^T λ = 0, (59)
ΛSe − μe = 0, (60)
c(x, s) = 0, (61)

where Λ and S are diagonal matrices built from the multipliers and slacks of the inequality constraints, and e is a vector of ones. Applying the Newton-Raphson method for root-finding to the set of Eqs. 59 to 61, the following system of equations is obtained for the Δ search steps:

[ W      0    −A(x)^T ] [Δx]     [ ∇_x f(x) − ∇_x c(x, s)^T λ ]
[ 0      Λ     S      ] [Δs] = − [ ΛSe − μe                   ] , (62)
[ A(x)  −I     0      ] [Δλ]     [ c(x, s)                    ]

where A(x) = ∇_x c(x, s) and W is the Hessian of the Lagrangian L(x, s, λ).
The left matrix is the Jacobian of the set of Eqs. 59 to 61. Computing the search direction given by the vector of Δ steps is the most computationally expensive part of the algorithm (Domahidi et al. 2012). In practice this is achieved through various methods; however, these are beyond the scope of this paper.
The variables x, s, λ are then iteratively found by taking a step of size α along the search directions identified in Eq. 62, resulting in the update

x ← x + αΔx,   s ← s + αΔs,   λ ← λ + αΔλ,

where the step size is regulated to achieve a converging line search.
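To make the iteration concrete, the toy sketch below applies the primal-dual Newton step and update to a one-dimensional problem, min (x − 2)² s.t. x ≥ 0, written with a single slack variable. The fraction-to-boundary step-size rule and the μ-reduction schedule are illustrative choices of ours, not those of FORCES Pro:

```python
import numpy as np

def solve_toy_ip(x=0.5, s=0.5, lam=1.0, mu=1.0, iters=50):
    """Primal-dual interior-point iteration for min (x-2)^2 s.t. x >= 0,
    with slack s (x - s = 0, s >= 0), mirroring Eqs. (59)-(62)."""
    for _ in range(iters):
        # KKT residuals: stationarity, complementarity, feasibility
        r = np.array([2.0 * (x - 2.0) - lam,
                      lam * s - mu,
                      x - s])
        # Jacobian of the KKT system (the left matrix of Eq. 62)
        J = np.array([[2.0,  0.0, -1.0],
                      [0.0,  lam,  s],
                      [1.0, -1.0,  0.0]])
        dx, ds, dlam = np.linalg.solve(J, -r)
        # fraction-to-boundary rule keeps s and lam strictly positive
        alpha = 1.0
        if ds < 0.0:
            alpha = min(alpha, -0.95 * s / ds)
        if dlam < 0.0:
            alpha = min(alpha, -0.95 * lam / dlam)
        x, s, lam = x + alpha * dx, s + alpha * ds, lam + alpha * dlam
        mu *= 0.2  # drive the barrier parameter towards zero
    return x, s, lam
```

The iterates converge to x = 2 with the inequality inactive, so the multiplier λ tends to zero together with μ.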

F.3 Algorithm convergence and extensions
As the Newton-Raphson method is iterative in nature, the KKT conditions defined by Eqs. 59 to 61 are deemed satisfied once the residuals fall below tolerances ε_1, ε_2, ε_3:

max |∇_x f(x) − ∇_x c(x, s)^T λ| ≤ ε_1,
max |ΛSe − μe| ≤ ε_2,
max |c(x, s)| ≤ ε_3.

As the Hessian W is not analytically defined in FORCES Pro, an approximation method is used, resulting in a quasi-Newton method known as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm (Domahidi and Jerez 2014); the specifics of this are beyond the scope of this paper. The standard Newton method as presented in this appendix is sufficient to understand the general method utilised to solve the constrained optimisation problem.
Note that in Karmarkar (1984), Karmarkar first showed that interior-point methods are polynomial-time algorithms; hence the number of algorithmic steps is O(n^k) for a non-negative k and input size n. Furthermore, for interested readers, the article by Forsgren et al. (2002) provides a comprehensive overview of interior-point methods for non-linear optimisation.

G Goal directed assistive steering
Collision-free trajectory generation for low planning horizons can be augmented using an assistive-steering cost term; the idea is inspired by Vector Field Histograms (Borenstein and Koren 1991). The quadrotor position and obstacle ellipsoids are projected onto the world horizontal plane P by the transformation T_P : R^3 → R^2 with T_P = diag(1, 1, 0). On P, we define a set of n_d candidate angular steering directions

D = { γ | γ = i (2π/n_d), i ∈ {1, …, n_d} ⊂ N },

originating from our projected quadrotor position T_P p_q. Checking all γ ∈ D, we determine D_free ⊆ D, the set of all non-obstructed (free) directions in P up to a maximum omnidirectional range from T_P p_q. With γ_goal the heading of the goal position from the quadrotor position, the steering direction is chosen to minimise the angular offset to the goal,

γ* = argmin_{γ ∈ D_free} |γ − γ_goal|.

With T_P ṗ_q the quadrotor's heading, the cost is evaluated as the deviation of this heading from γ*. Under a short planning horizon, evasive actions for obstacles are generally performed abruptly, as there is limited planning foresight. Steering assists planning by guiding the system towards obstacle-free areas for smoother trajectories; we demonstrate the utility of steering when using low planning horizons. As this R^2 steering method is only amenable to planar obstacle avoidance, we disable steering for spatial avoidance in R^3. An extension to R^3 could be constructed analogously.
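A minimal sketch of the direction selection follows, assuming the projected obstacles are approximated by circles and the free-direction check is done by sampling along each candidate ray; both are simplifications of our own for illustration:

```python
import numpy as np

def select_steering_direction(p_q, goal, obstacles_2d, radii,
                              n_d=8, detect_range=3.5):
    """Return the free candidate direction with the smallest angular
    offset to the goal heading, in the spirit of Appendix G."""
    candidates = [i * 2.0 * np.pi / n_d for i in range(1, n_d + 1)]
    gamma_goal = np.arctan2(goal[1] - p_q[1], goal[0] - p_q[0])

    def is_free(gamma):
        # sample along the candidate ray up to the detection range
        for t in np.linspace(0.0, detect_range, 20):
            pt = np.asarray(p_q) + t * np.array([np.cos(gamma), np.sin(gamma)])
            for c, r in zip(obstacles_2d, radii):
                if np.linalg.norm(pt - np.asarray(c)) <= r:
                    return False
        return True

    d_free = [g for g in candidates if is_free(g)]
    if not d_free:
        return gamma_goal  # no free direction: fall back to the goal heading

    def offset(g):
        # wrapped angular offset |gamma - gamma_goal|
        return abs(np.angle(np.exp(1j * (g - gamma_goal))))

    return min(d_free, key=offset)
```

In the full method the selected γ* enters the NMPC objective as a cost term penalising heading deviation, rather than being applied directly as a command.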

H Effects of steering on system response
To address the underperformance of the system response to incoming obstacles when using a low number of stages, assistive steering is used to guide the MAVP towards obstacle-free regions without compromising run-time performance. Inclusion of the goal-directed assistive steering introduced in "Appendix G" is demonstrated by repeating the simulation study from Sect. 5.1. The steering-associated cost weight is set to w_steer = 0.05, with a detection range of 3.5 m, and 8 pre-defined, uniformly spaced steering directions were used. As with the other cost weights, w_steer was found by manual tuning. Figure 20 shows how the NMPC solve time and time-to-goal scale with the number of stages when using the assistive-steering cost. Comparing the NMPC algorithm solve times with assistive steering disabled and enabled, as shown in Figs. 7 and 20 respectively, demonstrates that the difference is negligible. Referring to Fig. 21 for the N = 10 case, observe how the steering-assisted trajectory is guided away from the obstacle, resulting in collision-free task completion. Compare this behaviour to the N = 10 case with no steering shown in Fig. 8a, where the steering clearly guides the system pre-emptively. In general, application of the steering command over the entire planning length makes the path more conservative, thus increasing the time-to-goal, especially for higher N compared to no steering. The benefits of steering are more apparent for low stage counts N, where the guidance is used to improve local planning. A low N with steering is a viable method to maintain a reasonable run-time frequency and collision-free performance. In our simulations we always used the default cost weights; however, by fine-tuning the weights to accommodate a shorter/longer prediction length, better performance may be realised.