1 Introduction

Effective navigation of multiple robots in cluttered environments is key to emerging industries such as warehouse automation (Wurman et al., 2008), autonomous driving (Furda & Vlacic, 2011), and automated intersection management (Dresner & Stone, 2008). One of the core challenges of navigation systems in such domains is trajectory planning. Planning safe trajectories for multiple robots is especially challenging when there is no central entity that plans all robots’ trajectories a priori or re-plans mid-execution if there is a fault. In some cases, such a central entity is undesirable because of the communication link that must be maintained between each robot and the central entity. If the map is not known a priori, building and relaying the observed map of the environment to the central entity through the communication channel adds further challenges. In some cases, it is impractical to have such a central entity because it cannot react to robot trajectory tracking errors and map updates fast enough due to communication and computation delays. In practice, robots must operate safely even if there is limited communication available. This necessitates the ability for robots to plan in a decentralized fashion, where each robot plans a safe trajectory for itself while operating in environments with other robots and obstacles.

Fig. 1: RLSS runs in real-time in dense environments. Each robot plans a trajectory by itself using only the position information of other robots and obstacles

Decentralized algorithms delegate the computation of trajectories: each robot plans for itself and reacts to the environment by itself. In this paper, we introduce RLSS, a real-time decentralized trajectory planning algorithm for multiple robots in a shared environment that requires no communication between robots and relatively few sensing capabilities: each robot senses the relative positions of other robots and obstacles along with their geometric shapes, and is able to distinguish robots from obstacles. RLSS requires fewer robot capabilities than most state-of-the-art decentralized planners used for multi-robot navigation, which typically require communication (Luis et al., 2020; Tordesillas & How, 2021; Wang et al., 2021), higher order derivative estimates (Park & Kim, 2021; Wang et al., 2021), or predicted trajectories of objects (Park & Kim, 2021). However, the ability to distinguish robots from obstacles is not required by some state-of-the-art algorithms (Zhou et al., 2017), which instead model obstacles as robots. RLSS is cooperative in the sense that we assume each robot stays within its current cell of a tessellation of the space until the next planning iteration. We assume obstacles are static, which is required to guarantee collision avoidance.

RLSS explicitly accounts for the dynamic limits of the robots and enforces safety with hard constraints, reporting when they cannot be enforced, thus guaranteeing collision avoidance when it succeeds. The planning algorithm can be guided with desired trajectories, thus it can be used in conjunction with centralized planners. If no centralized planner or plan is available, RLSS can be used on its own, without any central guidance, by setting the desired trajectories to line segments directly connecting robots’ start positions to their goal positions.

There are 4 stages in RLSS.

  1. Select a goal position to plan toward on the desired trajectory;

  2. Plan a discrete path toward the selected goal position;

  3. Formulate and solve a kinematically feasible convex optimization problem to compute a safe smooth trajectory guided by the discrete plan;

  4. Check if the trajectory obeys the dynamic limits of the robot, and temporally rescale the trajectory if not.

RLSS works in a receding horizon fashion: it plans a long trajectory, executes it for a short duration, and replans at the next iteration. It utilizes separating hyperplanes, i.e., linear spatial separations, between robot shapes, obstacles, and sweep geometries (the subsets of space swept by robots while traversing straight line segments) to enforce safety during trajectory optimization.

We demonstrate, through simulation and experiments on physical robots, that RLSS works in dense environments in real-time (Fig. 1). We compare our approach to two state-of-the-art receding horizon decentralized multi-robot trajectory planning algorithms in 3D. In the first, introduced by Zhou et al. (2017), robots plan for actions using a model predictive control-style optimization formulation while enforcing that each robot stays inside its buffered Voronoi cell at every iteration. We refer to this method as BVC. The original BVC formulation works only for discrete single-integrator dynamics and environments without obstacles. We extend the BVC formulation to discrete time-invariant linear systems with position outputs and environments with obstacles, and call the extended version eBVC (short for extended BVC). RLSS and eBVC have similar properties: they require no communication between robots, and require position sensing of other objects in the environment. However, unlike RLSS, eBVC does not require that robots are able to distinguish robots from obstacles, as it treats each obstacle as a robot. In the second, introduced by Park and Kim (2021), robots plan trajectories utilizing relative safe flight corridors, execute them for a short duration, and replan. We refer to this method as RSFC. RSFC requires no communication between robots, and utilizes positions as well as velocities of the objects in the environment, and thus requires more sensing capabilities than RLSS. We demonstrate empirically that RLSS results in no deadlocks or collisions in our experiments in forest-like and maze-like environments, while eBVC is prone to deadlocks and RSFC results in collisions in such environments. However, RLSS results in slightly longer navigation durations compared to both eBVC and RSFC.

The contribution of this work can be summarized as follows:

  • A carefully designed, numerically stable, and effective real-time decentralized planning algorithm for multiple robots in shared environments with static obstacles with relatively few requirements: no communication between robots, position-only sensing of robots and obstacles, and the ability to distinguish robots from obstacles.

  • An extension (eBVC) of a baseline planner (BVC) to environments with obstacles and a richer set of dynamics than only single-integrators.

  • The first comparison of more than two state-of-the-art communication-free decentralized multi-robot trajectory planning algorithms, namely RSFC, eBVC, and RLSS, in complicated forest-like and maze-like environments, some of which contain more than 2000 obstacles.

2 Related work

The pipeline of RLSS contains three stages (discrete planning, safe navigation corridor construction, and trajectory optimization) that are employed by several existing single-robot trajectory planning algorithms. Richter et al. (2013) present a single-robot trajectory planning method for aggressive quadrotor flight which utilizes RRT* (Karaman & Frazzoli, 2010) to find a kinematically feasible discrete path and formulates an unconstrained quadratic program over polynomial splines guided by the discrete path. Collisions are checked after optimization; if there is a collision, additional decision points are added and the optimization is re-run. Chen et al. (2016) present a method that utilizes the OcTree representation (Hornung et al., 2013) of the environment during discrete search. They find a discrete path using unoccupied grid cells of the OcTree. Then, they maximally inflate unoccupied grid cells to create a safe navigation corridor that they use as constraints in the subsequent polynomial spline optimization. Liu et al. (2017) use Jump Point Search (JPS) (Harabor & Grastien, 2011) as the discrete planner, and construct safe navigation corridors that are used as constraints in the optimization stage. Our planning system uses these three stages (discrete planning, safe navigation corridor construction, and corridor-constrained optimization) and extends them to multi-robot scenarios in a decentralized way. We handle robot-robot safety by cooperatively computing a linear partitioning of the environment, and enforcing that each robot stays within its own cell of the partition.

We categorize multi-robot trajectory planning algorithms first by where computation is done, since the location of computation changes the properties of the algorithms. There are two main strategies to solve the multi-robot trajectory planning problem: centralized and decentralized. Centralized algorithms compute trajectories for all robots on a central entity with global information; these trajectories are communicated back to the robots for them to execute. In the decentralized strategy, each robot runs an algorithm on-board to compute its own trajectory; these algorithms may utilize direct communication between robots. Centralized algorithms often provide strong theoretical guarantees on completeness or global optimality because (i) they have complete knowledge about the environment beforehand, (ii) they generally run on powerful centralized computers, and (iii) there are generally less restrictive time limits on their execution. Decentralized algorithms generally forgo such strong theoretical guarantees in favor of fast computation because (i) they are often used when complete information of the environment is not known beforehand, and (ii) they have to work in real-time on-board.

2.1 Centralized algorithms

We further categorize centralized algorithms into those that do not consider robot dynamics during planning and those that do.

Centralized without dynamics: If planning can be abstracted to agents moving synchronously along the edges of a graph, the problem is known as multi-agent path finding (MAPF). Some variants of the MAPF problem are NP-Hard to solve optimally (Yu & LaValle, 2013). For NP-Hard variants, there are many optimal (Sharon et al., 2015; Lam et al., 2019), bounded suboptimal (Barer et al., 2014), and suboptimal (Solovey et al., 2013; Ma et al., 2019) algorithms that perform well in some environments. Trajectories generated by planning algorithms that do not model robot dynamics may not be followed perfectly by real robots, resulting in divergence from the plan.

Centralized with dynamics: These algorithms deliver smooth control or output trajectories that are executable by real robots. Trajectories are usually generated by formulating the problem under an optimization framework, and communicated to the robots for them to execute. Tang and Kumar (2016) propose an approach that combines a discrete motion planning algorithm with trajectory optimization that works only in obstacle-free environments. Another approach combines a MAPF solver with trajectory optimization to plan trajectories for hundreds of quadrotors in environments with obstacles, while ensuring that the resulting trajectories are executable by the robots (Hönig et al., 2018). Park et al. (2020) combine a MAPF solver with trajectory optimization and provide executability and feasibility guarantees. Desai and Michael (2020) propose an approach utilizing position-invariant geometric trees to find kinodynamically feasible trajectories.

Centralized algorithms can provide theoretical completeness and global optimality guarantees under the assumptions of perfect prior knowledge and execution of trajectories. However, centralized replanning is required when the environment is discovered during operation or the robots deviate from the planned trajectories. This necessitates continuous communication between the central computer and the individual robots. In some cases, solving the initial problem from scratch may be required. If the communication between the robots and the central computer cannot be maintained, or solving the initial problem from scratch cannot be done in real-time, the robots may collide with each other or obstacles.

2.2 Decentralized algorithms

When the requirements of the centralized strategies cannot be satisfied, decentralization of the trajectory planning among robots is required. In decentralized planning, robots assess the state of the environment and plan for themselves using their on-board capabilities. We categorize decentralized algorithms into reactive and long horizon algorithms.

Decentralized reactive: In reactive algorithms, each robot computes the next action to execute based on the state of the environment without considering the actions that might follow it. One such approach is Optimal Reciprocal Collision Avoidance (ORCA) (Alonso-Mora et al., 2013). In ORCA, each robot chooses a velocity vector that is as close as possible to a desired velocity vector such that collisions between each pair of robots are provably avoided. It requires robots to (i) be cooperatively executing the same algorithm, (ii) obey single integrator dynamics, and (iii) either sense other robots’ velocities or receive them through communication. While ORCA handles dynamic changes in the environment and is completely distributed, it fails to avoid deadlocks in environments with obstacles (Şenbaşlar et al., 2019). Safety barrier certificates guarantee collision avoidance by computing a safe control action that is as close as possible to a desired control action (Wang et al., 2017). Similarly to ORCA, robots using safety barrier certificates may get stuck in deadlocks. Most learning-based approaches fall into the body of reactive strategies as well. Here, neural networks are trained to compute the next action to execute, given a robot’s immediate neighborhood and goal. For instance, PRIMAL (Sartoretti et al., 2019) is a computationally efficient suboptimal online MAPF solver, but it only works on grids and does not consider robot dynamics. GLAS (Riviere et al., 2020) considers robot dynamics and can guarantee safety for some dynamics by combining the network output with a safety term. It performs better than ORCA in terms of deadlocks, but nevertheless results in deadlocks in dense environments. Another approach employs graph neural networks (GNNs) that allow communication between robots to avoid collisions (Li et al., 2020), but it results in deadlocks in dense environments similar to GLAS. Batra et al. (2021) propose a method that outputs direct motor commands for a quadrotor using the positions and velocities of objects in its immediate neighborhood, but it results in collisions between robots.

Decentralized long horizon: In long horizon algorithms, robots generate a sequence of actions or long trajectories instead of computing a single action to execute. These approaches employ receding horizon planning: they plan long trajectories, execute them for a short duration, and replan. Zhou et al. (2017) present a model predictive control (MPC) based approach in which robots plan for a sequence of actions while enforcing each robot to stay within its buffered Voronoi cell. It requires no inter-robot communication and depends on sensing of other robots’ positions only. Another MPC-based approach approximates robots’ controller behaviors under given desired states as a linear system (Luis et al., 2020). A smooth Bézier curve for each robot is computed by solving an optimization problem, in which samples of desired states are drawn from the curve and fed into the model of the system. The approach requires communication of future states for collision avoidance. Another MPC-based approach generates plans using motion primitives to compute time-optimal trajectories, then trains a neural network to approximate the behavior of the planner, which is used during operation for fast planning (Wang et al., 2021). It requires sensing or communicating the full states of robots. Tordesillas and How (2021) present a method for dynamic obstacle avoidance, handling asynchronous planning between agents in a principled way. However, the approach requires instant communication of planned trajectories between robots to ensure safety. Park and Kim (2021) plan a piecewise Bézier curve by formulating an optimization problem. Their method utilizes relative safe flight corridors (RSFCs) for collision avoidance and requires no communication between robots, but requires position and velocity sensing. In Peterson et al. (2021)’s approach, robot tasks are defined as time window temporal logic specifications. Robots plan trajectories to achieve their tasks while avoiding each other. The method provably avoids deadlocks, but requires communication of planned trajectories between robots.

RLSS falls into this body of algorithms. It requires sensing only the positions of other robots and obstacles, and does not require any communication between robots. The work of Zhou et al. (2017) (BVC) is the only other approach with these properties. Different from their approach, RLSS uses discrete planning to avoid local minima, plans piecewise Bézier trajectories instead of MPC-style input planning, and does not constrain the full trajectory to be inside a constraining polytope. These result in better deadlock and collision avoidance as we show in the evaluation section.

In terms of inter-robot collision avoidance, RLSS belongs to a family of algorithms that ensure multi-robot collision avoidance without inter-robot communication by using the fact that each robot in the team runs the same algorithm. In these algorithms, robots share the responsibility of collision avoidance by computing feasible action sets that would not result in collisions with others when others compute their own feasible action sets using the same algorithm. Examples include BVC (Zhou et al., 2017), which partitions position space between robots and makes sure that each robot stays within its own cell in the partition, ORCA (Alonso-Mora et al., 2013), which computes feasible velocity sets for robots such that there can be no collisions between robots as long as each robot chooses a velocity command from their corresponding sets, and SBC (Wang et al., 2017), which does the same with accelerations.

BVC and RLSS also belong to a family of decentralized multi-robot trajectory planning algorithms that utilize mutually computable separating hyperplanes for inter-robot safety as defined by Şenbaşlar and Sukhatme (2022). These algorithms do not require any inter-robot communication and utilize only position/geometry sensing between robots. The inter-robot safety is enforced by making sure that each pair of robots can compute the same separating hyperplane and enforce themselves to be in the different sides of this hyperplane at every planning iteration. Separating hyperplanes that can be mutually computed by robots using only position/geometry sensing and no communication are defined as mutually computable. BVC uses Voronoi hyperplanes and RLSS uses support vector machine hyperplanes, both of which are mutually computable.

RLSS is an extension of our previous conference paper (Şenbaşlar et al., 2019). We extend it conceptually by

  • supporting robots with any convex geometry instead of only spherical robots; and

  • decreasing the failure rate of the algorithm from \(3\%\) in complex environments to \(0.01\%\) by providing important modifications to the algorithm. These modifications include (i) changing the discrete planning to best-effort \(A^*\) search, thus allowing robots to plan towards their goals even when goals are not currently reachable; (ii) increasing the numerical stability of the algorithm by running discrete planning at each iteration instead of using the trajectory of the previous iteration when it is collision free to define the homotopy class, and adding a preferred distance cost term to the optimization problem in order to create a safety zone between objects when possible; and (iii) ensuring the kinematic feasibility of the optimization problem generated at the trajectory optimization stage.

We extend our previous work empirically by

  • applying our algorithm to a heterogeneous team of differential drive robots in 2D;

  • applying our algorithm to quadrotors in 3D; and

  • comparing the performance of our algorithm to stronger baselines.

A preliminary version of RLSS previously appeared in a workshop (Şenbaşlar et al., 2021). In the workshop version, the RLSS optimization stage fails about \(3\%\) of the time, and switches to a soft optimization formulation when that happens. In the current version, the optimization rarely (close to 1 in every 10000 iterations in our experiments) fails thanks to the newly added preferred distance cost term, changed goal selection stage, and changed discrete planning stage that ensures kinematic feasibility of the optimization problem. Hence, there is no longer a separate soft optimization formulation. In the workshop version, we compare RLSS against the work of Luis et al. (2020), which is based on distributed MPC and requires communication of planned trajectories between robots. Herein, we compare our approach to two planners that do not require communication between robots, as this provides a more comparable baseline.

3 Problem statement

We first define the multi-robot trajectory planning problem, and then indicate the specific case of the problem that we solve.

Consider a team of N robots for which trajectories must be computed at time \(t=0\). The team can be heterogeneous, i.e., each robot might be of a different type or shape. Let \(\mathcal {R}_i\) be the collision shape function of robot i such that \(\mathcal {R}_i(\textbf{p})\) is the collision shape of robot i located at position \(\textbf{p}\in \mathbb R^d, d\in \{2,3\}\), i.e., the subset of space occupied by robot i when placed at position \(\textbf{p}\). We define \(\mathcal {R}_i(\textbf{p}) = \{\textbf{p}+ \textbf{x}\mid \textbf{x}\in \mathcal {R}_{i,0}\}\) where \(\mathcal {R}_{i,0} \subset \mathbb {R}^d\) is the space occupied by robot i when placed at the origin. Note that we do not model robot orientation. If a robot can rotate, the collision shape should contain the union of spaces occupied by the robot for each possible orientation at a given position; which is minimally a hypersphere if all orientations are possible.

We assume the robots are differentially flat (Murray et al., 1995), i.e., the robots’ states and inputs can be expressed in terms of output trajectories and a finite number of derivatives thereof. Differential flatness is common for many kinds of mobile robots, including differential drive robots (Campion et al., 1996), car-like robots (Murray & Sastry, 1993), omnidirectional robots (Jiang & Song, 2013), and quadrotors (Mellinger & Kumar, 2011). When robots are differentially-flat, their dynamics can be accounted for by (i) imposing \(C^c\) continuity on the trajectories for the required c (i.e. continuity up to \(c^{th}\) degree of derivative), and (ii) imposing constraints on the maximum \(k^{th}\) derivative magnitude of trajectories for any required k. For example, quadrotor states and inputs can be expressed in terms of the output trajectory, and its first (velocity), second (acceleration) and third (jerk) derivatives (Mellinger & Kumar, 2011). Hence, to account for quadrotor dynamics, continuity up to jerk should be enforced by setting \(c=3\), and velocity, acceleration, and jerk of the output trajectory should be limited appropriately by setting upper bounds for \(k=1\), 2, and 3.

Let \(\mathcal {O}(t) = \{\mathcal {Q}\subset \mathbb {R}^d\}\) be the set of obstacles in the environment at time t. The union \(\mathcal {Q}(t) = \cup _{\mathcal {Q}\in \mathcal {O}(t)}\mathcal {Q}\) denotes the occupied space at time t. Let \(\mathcal {W}_i \subseteq \mathbb R^d\) be the workspace that robot i should stay inside of.

Let \(\textbf{d}_{i}(t): [0, T_i] \rightarrow \mathbb R^d\) be the desired trajectory of duration \(T_i\) that robot i should follow as closely as possible. Define \(\textbf{d}_{i}(t) = \textbf{d}_{i}(T_i)\ \forall t \ge T_i\). We do not require that this trajectory is collision-free or even dynamically feasible to track by the robot. If no such desired trajectory is known, it can be initialized, for example, with a straight line to a goal location.

The intent is that each robot i tracks a Euclidean trajectory \(\textbf{f}_{i}(t): [0, T]\rightarrow \mathbb R^d\) such that \(\textbf{f}_{i}(t)\) is collision-free, executable according to the robot’s dynamics, is as close as possible to the desired trajectory \(\textbf{d}_{i}(t)\), and ends at \(\textbf{d}_{i}(T_i)\). Here, T is the navigation duration of the team. We generically define the multi-robot trajectory planning problem as the problem of finding trajectories \(\textbf{f}_{1}, \ldots , \textbf{f}_{N}\) and navigation duration T that optimizes the following formulation:

$$\begin{aligned}&\min _{\textbf{f}_{1}, \ldots , \textbf{f}_{N}, T} \sum _{i=1}^{N}\int _{0}^{T} \left\Vert \textbf{f}_{i}(t) - \textbf{d}_{i}(t)\right\Vert _2 \, dt \quad \text {s.t.} \end{aligned}$$
(1)
$$\begin{aligned}&\quad \textbf{f}_{i}(t) \in C^{c_i}&\forall i\in \{1,\ldots ,N\} \end{aligned}$$
(2)
$$\begin{aligned}&\quad \textbf{f}_{i}(T) = \textbf{d}_{i}(T_i)&\forall i \in \{1,\ldots ,N\} \end{aligned}$$
(3)
$$\begin{aligned}&\quad \frac{d^c\textbf{f}_{i}(0)}{dt^c} = \frac{d^c\textbf{p}_{i}^{0}}{dt^c}&\forall i\ \forall c\in \{0, \ldots , c_i\} \end{aligned}$$
(4)
$$\begin{aligned}&\quad \mathcal {R}_i(\textbf{f}_{i}(t)) \cap \mathcal {Q}(t) = \emptyset&\forall i\ \forall t\in [0, T] \end{aligned}$$
(5)
$$\begin{aligned}&\quad \mathcal {R}_i(\textbf{f}_{i}(t)) \cap \mathcal {R}_j(\textbf{f}_{j}(t)) = \emptyset&\forall j \ne i\ \forall t\in [0,T] \end{aligned}$$
(6)
$$\begin{aligned}&\quad \mathcal {R}_i(\textbf{f}_{i}(t)) \subseteq \mathcal {W}_i&\forall i\ \forall t\in [0, T] \end{aligned}$$
(7)
$$\begin{aligned}&\quad \max _{t \in [0, T]}\left\Vert \frac{d^k \textbf{f}_{i}(t)}{dt^k}\right\Vert _2 \le \gamma _i^k&\forall i\ \forall k \in \{1, \ldots , K_i\} \end{aligned}$$
(8)

where \(c_i\) is the order of derivative up to which the trajectory of the \(i^{th}\) robot must be continuous, \(\textbf{p}_{i}^{0}\) is the initial position of robot i (derivatives of which are the initial higher order state components, e.g., velocity, acceleration, etc.), \(\gamma _i^k\) is the maximum \(k^{th}\) derivative magnitude that the \(i^{th}\) robot can execute, and \(K_i\) is the maximum derivative degree that robot i has a derivative magnitude limit on.

The cost (1) of the optimization problem is a metric for the total deviation from the desired trajectories; it is the sum over all robots of the integrated distance between the planned and desired trajectories. (2) enforces that the trajectory of each robot is continuous up to the required degree of derivatives. (3) enforces that planned trajectories end at the endpoints of desired trajectories. (4) enforces that the planned trajectories have the same initial position and higher order derivatives as the initial states of the robots. (5) and (6) enforce robot-obstacle and robot-robot collision avoidance, respectively. (7) enforces that each robot stays within its defined workspace. Lastly, (8) enforces that the dynamic limits of the robot are obeyed.

Note that only constraint (6) stems from multiple robots. However, this seemingly simple constraint couples robots’ trajectories both spatially and temporally, making the problem much harder. As discussed in Sect. 2.1, solving the multi-agent path finding problem optimally is NP-Hard even for the discrete case while the discrete single-agent path finding problem can be solved optimally with classical search methods in polynomial time. This curse of dimensionality affects continuous motion planning as well, where even geometric variants are known to be PSPACE-Hard (Hopcroft et al., 1984).

A centralized planner can be used to solve the generic multi-robot trajectory planning problem in one-shot provided that: the current and future obstacle positions are known a priori, computed trajectories can be sent to robots over a communication link, and robots can track these trajectories well enough that they do not violate the spatio-temporal safety of the computed trajectories. In the present work, we aim to approximately solve this problem in the case where obstacles are static, but not necessarily known a priori by a central entity, and there is no communication channel between robots or between robots and a central entity. Each robot plans its own trajectory in real-time. They plan at a high frequency to compensate for trajectory tracking errors.

4 Preliminaries

We now introduce essential mathematical concepts used herein.

4.1 Parametric curves and splines

Trajectories are curves \(\textbf{f}: [0, T]\rightarrow \mathbb R^d\) that are parametrized by time, with duration T. We use splines, i.e., piecewise polynomials, where each piece is a Bézier curve defined by a set of control points and a duration.

A Bézier curve \(\textbf{f}: [0, T] \rightarrow \mathbb R^d\) of degree h is defined by \(h+1\) control points \(\textbf{P}_{0}, \ldots , \textbf{P}_{h} \in \mathbb R^d\) as follows:

$$\begin{aligned} \textbf{f}(t)&= \sum _{i=0}^h \textbf{P}_{i} \binom{h}{i} \left( \frac{t}{T}\right) ^i\left( 1-\frac{t}{T}\right) ^{h-i}. \end{aligned}$$

Any Bézier curve \(\textbf{f}\) satisfies \(\textbf{f}(0) = \textbf{P}_{0}\) and \(\textbf{f}(T) = \textbf{P}_{h}\). The other control points guide the curve from \(\textbf{P}_{0}\) to \(\textbf{P}_{h}\). Since any Bézier curve \(\textbf{f}\) is a polynomial of degree h, it is smooth, meaning \(\textbf{f}\in C^\infty \).

We choose Bézier curves as pieces because of their convex hull property: the curves themselves lie inside the convex hull of their control points, i.e., \(\textbf{f}(t) \in ConvexHull\{\textbf{P}_{0},\ldots ,\textbf{P}_{h}\} \forall t\in [0, T]\) (Farouki, 2012). Using the convex hull property, we can constrain a curve to be inside a convex region by constraining its control points to be inside the same convex region. Tordesillas and How (2020) discuss the conservativeness of the convex hulls of control points of Bézier curves and show that the convex hulls are considerably less conservative than those of B-Splines (Piegl & Tiller, 1995), which are another type of curve with the convex hull property. Yet, they also show that convex hulls of the control points of the Bézier curves can be considerably more conservative compared to the smallest possible convex sets containing the curves.
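To make the convex hull property concrete, here is a minimal Python sketch (assuming numpy is available) that evaluates a Bézier curve through the Bernstein basis; because the basis weights are nonnegative and sum to one, every point on the curve is a convex combination of the control points.

```python
import numpy as np
from math import comb

def bezier(control_points, t, T=1.0):
    """Evaluate a degree-h Bezier curve (h+1 control points) at t in [0, T]."""
    P = np.asarray(control_points, dtype=float)  # shape (h+1, d)
    h = len(P) - 1
    s = t / T
    # Bernstein basis: C(h,i) * s^i * (1-s)^(h-i); nonnegative and sums to 1,
    # so f(t) is a convex combination of the control points (convex hull property).
    basis = np.array([comb(h, i) * s**i * (1 - s)**(h - i) for i in range(h + 1)])
    return basis @ P

# Endpoint interpolation: f(0) = P_0 and f(T) = P_h.
P = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
assert np.allclose(bezier(P, 0.0), P[0]) and np.allclose(bezier(P, 1.0), P[-1])
```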

4.2 Linear spatial separations: half-spaces, convex polytopes, and support vector machines

A hyperplane \(\mathcal {H}\) in \(\mathbb R^d\) can be defined by a normal vector \(\mathcal {H}_\textbf{n}\in \mathbb R^d\) and an offset \(\mathcal {H}_a\) as \(\mathcal {H}= \{\textbf{x}\in \mathbb {R}^d \mid \mathcal {H}_\textbf{n}^\top \textbf{x}+ \mathcal {H}_a = 0\}\). A half-space \(\tilde{\mathcal {H}}\) in \(\mathbb {R}^d\) is a subset of \(\mathbb {R}^d\) that is bounded by a hyperplane such that \(\tilde{\mathcal {H}} = \{\textbf{x}\in \mathbb R^d \mid \mathcal {H}_\textbf{n}^\top \textbf{x}+ \mathcal {H}_a \le 0\}\). A convex polytope is an intersection of a finite number of half-spaces.

Our approach relies heavily on computing safe convex polytopes and constraining spline pieces to be inside these polytopes. Specifically, we compute hard-margin support vector machine (SVM) (Cortes & Vapnik, 1995) hyperplanes between spaces swept by robots along line segments and the obstacles/robots in the environment, and use these hyperplanes to create safe convex polytopes for robots to navigate in.
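As an illustration of this construction, the following sketch computes a separating hyperplane between two point sets, e.g., sampled vertices of a swept region and of an obstacle, using scikit-learn; the hard margin is approximated with a large penalty C, which matches the hard-margin solution when the sets are linearly separable. The input point sets are illustrative assumptions.

```python
import numpy as np
from sklearn import svm

def separating_hyperplane(points_a, points_b):
    """SVM hyperplane H_n . x + H_a = 0 separating two point sets.

    A large C approximates the hard-margin SVM; for linearly separable
    inputs the result coincides with the hard-margin hyperplane.
    """
    X = np.vstack([points_a, points_b])
    y = np.hstack([np.zeros(len(points_a)), np.ones(len(points_b))])
    clf = svm.SVC(kernel="linear", C=1e9)
    clf.fit(X, y)
    return clf.coef_[0], clf.intercept_[0]  # H_n, H_a

# Illustrative inputs: vertices of a robot's swept region and of one obstacle.
robot_pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
obstacle_pts = np.array([[3.0, 3.0], [4.0, 3.0], [3.0, 4.0], [4.0, 4.0]])
n, a = separating_hyperplane(robot_pts, obstacle_pts)  # robot side: n.x + a < 0
```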

5 Assumptions

Here, we list our additional assumptions about the problem formulation defined in Sect. 3 and the capabilities of robots.

We assume that obstacles in the environment are static, i.e., \(\mathcal {O}(t) = \mathcal {O}\ \forall t\in [0, \infty )\), and convex. Many existing, efficient, and widely-used mapping tools, including occupancy grids (Homm et al., 2010) and octomaps (Hornung et al., 2013), internally store obstacles as convex shapes; such maps can be updated in real-time using visual or RGBD sensors, and use unions of convex axis-aligned boxes to approximate the obstacles in the environment.

Similarly, we assume that the shapes of the robots are convex.

We assume that the workspace \(\mathcal {W}_i \subseteq \mathbb {R}^d\) is a convex polytope. It can be set to a bounding box that defines a room that a ground robot must not leave, a half-space that contains vectors with positive z coordinates so that a quadrotor does not hit the ground, or simply \(\mathbb R^d\). A non-convex workspace \(\tilde{\mathcal {W}_i}\) can be modeled by a convex workspace \(\mathcal {W}_i\) such that \(\tilde{\mathcal {W}_i} \subseteq \mathcal {W}_i\) and a static set of convex obstacles \(\tilde{\mathcal {O}}\) that block portions of the convex workspace so that \(\tilde{\mathcal {W}_i} = \mathcal {W}_i \backslash \left( \cup _{\mathcal {Q}\in \tilde{\mathcal {O}}} \mathcal {Q}\right) \).

To provide a guarantee of collision avoidance, we assume that robots can perfectly sense the relative positions of obstacles and robots as well as their shapes in the environment. The approach we propose does not require sensing of higher order state components (e.g., velocity, acceleration, etc.) or planned/estimated trajectories of objects, as the former is generally a noisy signal which cannot be expected to be sensed perfectly and the latter would require either communication or a potentially noisy trajectory estimation.

RLSS treats robots and obstacles differently. It enforces that each robot stays within a spatial cell that is disjoint from the cells of other robots until the next planning iteration to ensure robot-robot collision avoidance. To compute the spatial cell for each robot, RLSS uses positions and shapes of nearby robots, but not obstacles, in the environment. Therefore, robots must be able to distinguish other robots from obstacles. However, we do not require individual identification of robots.

We assume that the team is cooperative in that each robot runs the same algorithm using the same replanning period.

Lastly, we assume that planning is synchronized between robots, meaning that each robot plans at the same time instant. The synchronization assumption is needed to ensure robot-robot collision avoidance when planning succeeds.

6 Approach

Under the given assumptions, we solve the generic multi-robot trajectory planning problem approximately using decentralized receding horizon planning. Each robot plans a trajectory, executes it for a short period of time, and repeats this cycle until it reaches its goal. We call each plan-execute cycle an iteration.

We refer to the planning robot as the ego robot, and henceforth drop indices for the ego robot from our notation as the same algorithm is executed by each robot. Workspace \(\mathcal {W}\) is the convex polytope in which the ego robot must remain. \(\mathcal {R}\) is the collision shape function of the ego robot such that \(\mathcal {R}(\textbf{p})\subset \mathbb {R}^d\) is the space occupied by the ego robot when it is placed at position \(\textbf{p}\). The ego robot is given a desired trajectory \(\textbf{d}(t):[0, T]\rightarrow \mathbb {R}^d\) that it should follow. The dynamic limits of the robot are modeled using required derivative degree c up to which the trajectory must be continuous, and maximum derivative magnitudes \(\gamma ^k\) for required degrees \(k \in \{1, \ldots , K\}\).

At every iteration, the ego robot computes a piecewise trajectory \(\textbf{f}(t)\), which is dynamically feasible and safe up to the safety duration s, that it executes for the replanning period \(\delta t \le s\), and fully re-plans, i.e., runs the full planning pipeline, for the next iteration. The only parameter that is shared by all robots is the replanning period \(\delta t\).

Without loss of generality, we assume that navigation begins at \(t=0\), and at the start of planning iteration u, the current timestamp is \(\tilde{T} = u\delta t\).

RLSS fits into the planning part of the classical robotics pipeline using perception, planning, and control. The inputs from perception for the ego robot are:

  • \(\mathcal {S}\): Shapes of other robots. \(\mathcal {S}_j \in \mathcal {S}\), where \(j \in \{1,\ldots , i-1, i+1,\ldots ,N\}\), is the collision shape of robot j sensed by the ego robot such that \(\mathcal {S}_j \subseteq \mathbb {R}^d\).

  • \(\mathcal {O}\): The set of obstacles in the environment, where each obstacle \(\mathcal {Q}\in \mathcal {O}\) is a subset of \(\mathbb {R}^d\).

  • \(\textbf{p}\): Current position of the ego robot, from which derivatives up to the required degree of continuity can be computed.

We define \(\hat{\mathcal {O}} = \cup _{\mathcal {Q}\in \mathcal {O}}\mathcal {Q}\) as the space occupied by the obstacles, and \(\hat{\mathcal {S}} = \cup _{\mathcal {S}'\in \mathcal {S}}\mathcal {S}'\) as the space occupied by the robot shapes. Robots sense the set of obstacles and the set of robot shapes and use those sets in practice. We use spaces occupied by obstacles and robots for brevity in notation.

Fig. 2: RLSS planning pipeline. Based on the sensed robots \(\mathcal {S}\), sensed obstacles \(\mathcal {O}\), and current position \(\textbf{p}\), the ego robot computes the trajectory \(\textbf{f}(t)\) that is dynamically feasible and safe up to time s

There are 4 main stages of RLSS: (1) goal selection, (2) discrete planning, (3) trajectory optimization, and (4) temporal rescaling. The planning pipeline is summarized in Fig. 2. At each planning iteration, the ego robot executes the four stages to plan the next trajectory. In the goal selection stage, a goal position and the corresponding timestamp on the desired trajectory \(\textbf{d}(t)\) are selected. In the discrete planning stage, a discrete path from the robot’s current position toward the selected goal position is computed and durations are assigned to each segment. In the trajectory optimization stage, the discrete segments are smoothed into a piecewise trajectory. In the temporal rescaling stage, the dynamic limits of the robot are checked and duration rescaling is applied if necessary.

Next, we describe each stage in detail. Each stage has access to the workspace \(\mathcal {W}\), the collision shape function \(\mathcal {R}\), the desired trajectory \(\textbf{d}(t)\) and its duration T, the maximum derivative magnitudes \(\gamma ^k\), and the derivative degree c up to which the trajectory must be continuous. We call these task inputs. They describe the robot shape, robot dynamics, and the task at hand; these are not parameters that can be tuned freely and they do not change during navigation.
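In code, the pipeline can be summarized with the following structural sketch; the stage functions are hypothetical placeholders whose contracts are specified in the subsections below.

```python
# Structural sketch of one RLSS planning iteration (cf. Fig. 2). The stage
# functions are hypothetical placeholders for the four stages described below.
def goal_selection(d, T, t_now, S, O, p):                    # Sect. 6.1
    raise NotImplementedError

def discrete_planning(p, g, T_goal, t_now, S, O, W):         # Sect. 6.2
    raise NotImplementedError

def trajectory_optimization(endpoints, durations, S, O, W):  # Sect. 6.3
    raise NotImplementedError

def temporal_rescaling(f, gammas):                           # temporal rescaling stage
    raise NotImplementedError

def rlss_iteration(p, S, O, d, T, t_now, W, gammas):
    g, T_goal = goal_selection(d, T, t_now, S, O, p)
    endpoints, durations = discrete_planning(p, g, T_goal, t_now, S, O, W)
    f = trajectory_optimization(endpoints, durations, S, O, W)
    return temporal_rescaling(f, gammas)  # executed for delta_t, then fully re-planned
```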

6.1 Goal selection

At the goal selection stage (Algorithm 1), we find a goal position \(\textbf{g}\) on the desired trajectory \(\textbf{d}(t)\) and a timestamp by which it should be reached. These are required in the subsequent discrete planning stage, which is a goal-oriented search algorithm.

Goal selection has two parameters: the desired planning horizon \(\tau \) and safety distance D. It uses the robot collision shape function \(\mathcal {R}\), desired trajectory \(\textbf{d}(t)\) and its duration T, and workspace \(\mathcal {W}\) from the task inputs. The inputs of goal selection are the shapes of other robots \(\mathcal {S}\), obstacles in the environment \(\mathcal {O}\), current position \(\textbf{p}\), and the current timestamp \(\tilde{T}\).

At the goal selection stage, the algorithm finds the timestamp \(T'\) that is closest to \(\tilde{T} + \tau \) (i.e., the timestamp that is one planning horizon away from the current timestamp) when the robot, if placed on the desired trajectory at \(T'\), is at least safety distance D away from all objects in the environment. We use the safety distance D as a heuristic to choose goal positions that have free volume around them in order not to command robots into tight goal positions. Note that goal selection only chooses a single point on the desired trajectory that satisfies the safety distance; the actual trajectory the robot follows will be planned by the rest of the algorithm. Formally, the problem we solve in the goal selection stage is given as follows:

$$\begin{aligned} T' = \arg \min _{t}\ &|t-(\tilde{T}+\tau )|\quad s.t.\\&t\in [0, T]\\&\text {min-dist}(\mathcal {R}(\textbf{d}(t)), \hat{\mathcal {O}} \cup \hat{\mathcal {S}} \cup \partial \mathcal {W}) \ge D \end{aligned}$$
(9)

where \(\partial \mathcal {W}\) is the boundary of workspace \(\mathcal {W}\), and min-dist returns the minimum distance between two sets.

We solve (9) using linear search on timestamps starting from \(\tilde{T} + \tau \) with small increments and decrements.

Figure 3 demonstrates the goal selection procedure for a particular instance.

If there is no safe point on the desired trajectory, i.e. if the robot is closer than D to objects when it is placed on any point on the desired trajectory, we return the current position and timestamp. This allows us to plan a safe stopping trajectory.
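A minimal sketch of this search is given below; d(t) maps a timestamp to the desired position, min_dist is an assumed helper returning the clearance \(\text {min-dist}(\mathcal {R}(\textbf{d}(t)), \hat{\mathcal {O}} \cup \hat{\mathcal {S}} \cup \partial \mathcal {W})\), and the search resolution dt is a tunable assumption.

```python
import numpy as np

def select_goal(d, p, T, t_now, tau, D, min_dist, dt=0.01):
    """Solve (9) by linear search around t_now + tau (a sketch)."""
    center = t_now + tau
    # Offsets 0, +dt, -dt, +2dt, -2dt, ...: the first safe timestamp found is
    # the closest one to t_now + tau at this resolution.
    max_steps = int(np.ceil((T + tau) / dt)) + 1
    for k in range(max_steps):
        for t in (center + k * dt, center - k * dt):
            if 0.0 <= t <= T and min_dist(t) >= D:
                return d(t), t
    # No safe point on the desired trajectory: return the current position and
    # timestamp so that the pipeline plans a safe stopping trajectory.
    return p, t_now
```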

Algorithm 1: Goal selection

Note that while the selected goal position has free volume around it, it may not be reachable by the ego robot. For example, the goal position may be encircled by obstacles or other robots. Therefore, we use a best-effort search method during discrete planning (as described in Sect. 6.2) that plans a path towards the goal position.

The goal and the timestamp are used in the subsequent discrete planning stage as suggestions.

Fig. 3: Goal Selection. a Blue and red squares are robots, while obstacles are black boxes. The desired trajectories \(\textbf{d}_{red}(t)\) and \(\textbf{d}_{blue}(t)\) of each robot are given as dotted lines. Safety distance D is set to 0 for clarity. b The desired trajectory of the red robot is not collision-free at timestamp \(\tilde{T} + \tau \). It selects its goal timestamp \(T'\) (and hence its goal position) by solving (9); \(T'\) is the closest timestamp to \(\tilde{T} + \tau \) at which the robot, when placed on the desired trajectory, would be collision-free. c Since the desired trajectory of the blue robot is collision-free at timestamp \(\tilde{T} + \tau \), it selects its goal timestamp \(T' = \tilde{T} + \tau \) after solving (9). d Selected goal positions are shown

6.2 Discrete planning

Algorithm 2: Discrete planning

Discrete planning (Algorithm 2) performs two main tasks: (i) it finds a collision-free discrete path from the current position \(\textbf{p}\) towards the goal position \(\textbf{g}\), and (ii) it assigns durations to each segment of the discrete path. The discrete path found at the discrete planning stage represents the homotopy class of the final trajectory. Trajectories in the same homotopy class can be smoothly deformed into one another without intersecting obstacles (Bhattacharya et al., 2010). The subsequent trajectory optimization stage computes a smooth trajectory within the homotopy class. The trajectory optimization stage utilizes the discrete path to (i) generate obstacle avoidance constraints and (ii) guide the computed trajectory by adding distance cost terms between the discrete path and the computed trajectory. It uses the durations assigned by the discrete planning stage as the piece durations of the piecewise trajectory it computes.

Finding a discrete path from the start position \(\textbf{p}\) to the goal position \(\textbf{g}\) is done with best-effort \(A^*\) search (Line 2, Algorithm 2), which we define as follows. If there are valid paths from \(\textbf{p}\) to \(\textbf{g}\), we find the least-cost path among such paths. If no such path exists, we choose the least-cost path to the position that has the lowest heuristic value (i.e., the position that is heuristically closest to \(\textbf{g}\)). This modification of \(A^*\) search is needed because the goal position may not always be reachable, since the goal selection stage does not enforce reachability.

The ego robot plans its path in a search grid where the grid has hypercubic cells (i.e. square in 2D, cube in 3D) with edge size \(\sigma \), which we call the step size of the search. The grid shifts in the environment with the robot in the sense that the robot’s current position always coincides with a grid center. Let \(\mathcal {F}= \mathcal {W}\backslash (\hat{\mathcal {O}} \cup \hat{\mathcal {S}})\) be the free space within the workspace (Line 1, Algorithm 2). We do not map free space \(\mathcal {F}\) to the grid. Instead, we check if the robot shape swept along any segment on the grid is contained in \(\mathcal {F}\) or not, to decide if a movement is valid. This allows us to (i) model obstacles and robot shapes independently from the grid’s step size \(\sigma \), and (ii) shift the grid with no additional computation since we do not store occupancy information within the grid.

The states during the search have two components: position \(\varvec{\pi }\) and direction \(\varvec{\Delta }\). Robots are allowed to move perpendicularly or diagonally to the grid, which translates to 8 directions in 2D and 26 directions in 3D. Goal states are states that have position \(\textbf{g}\) and any direction. We model directions using vectors \(\varvec{\Delta }\) of d components where each component is in \(\{-1, 0, 1\}\). When the robot moves 1 step along direction \(\varvec{\Delta }\), its position changes by \(\sigma \varvec{\Delta }\). The initial state of the search is the ego robot’s current position \(\textbf{p}\) and direction \(\textbf{0}\).

Table 1 Discrete search actions and their costs

There are 3 actions in the search formulation: ROTATE, FORWARD, and REACHGOAL, summarized in Table 1. The ROTATE action has a cost of 1. The cost of the FORWARD action is the distance travelled divided by the step size (i.e., cells travelled), which is equal to the norm \(\left\Vert \varvec{\Delta }\right\Vert _2\) of the direction vector. REACHGOAL has the cost of one rotation plus the cells travelled from \(\varvec{\pi }\) to the goal position \(\textbf{g}\): \(1 + \frac{\left\Vert \varvec{\pi }- \textbf{g}\right\Vert _2}{\sigma }\). The one-rotation cost is included because at least one rotation is almost surely required before moving from a cell to the goal. ROTATE actions in all directions are always valid whenever the current state is valid. FORWARD and REACHGOAL actions are valid whenever the robot shape \(\mathcal {R}\) swept along the movement is contained in the free space \(\mathcal {F}\).

For any state (\(\varvec{\pi }, \varvec{\Delta }\)), we use the Euclidean distance from position \(\varvec{\pi }\) to the goal position \(\textbf{g}\) divided by step size \(\sigma \) (i.e. cells travelled when \(\varvec{\pi }\) is connected to \(\textbf{g}\) with a straight line) as the admissible heuristic.
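The action set, costs, and heuristic translate directly into code; the sketch below omits the validity checks (sweeping \(\mathcal {R}\) along a move and testing containment in \(\mathcal {F}\)), which depend on the collision-checking backend.

```python
import itertools
import numpy as np

# Action model of the discrete search (cf. Table 1).
def directions(d):
    """All moves with components in {-1, 0, 1}, excluding the zero vector:
    8 directions in 2D, 26 in 3D."""
    return [np.array(v) for v in itertools.product((-1, 0, 1), repeat=d) if any(v)]

def rotate_cost():
    return 1.0                                    # ROTATE costs 1

def forward_cost(delta):
    return float(np.linalg.norm(delta))           # cells travelled = ||Delta||_2

def reachgoal_cost(pi, g, sigma):
    # one rotation plus cells travelled on the straight line to the goal
    return 1.0 + float(np.linalg.norm(pi - g)) / sigma

def heuristic(pi, g, sigma):
    # admissible: straight-line distance to the goal measured in cells
    return float(np.linalg.norm(pi - g)) / sigma
```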

Lemma 1

In the action sequence of the resulting plan

  1. Each ROTATE action must be followed by at least one FORWARD action,

  2. The first action cannot be a FORWARD action,

  3. And no action can appear after a REACHGOAL action.

Proof Sketch

1. After each ROTATE action, a FORWARD action must be executed in a least-cost plan because (i) a ROTATE action cannot be the last action since goal states accept any direction and removing any ROTATE action from the end would result in a valid lower cost plan, (ii) the REACHGOAL action cannot appear after a ROTATE action because REACHGOAL internally assumes robot rotation and removing the ROTATE action would result in a valid lower cost plan, and (iii) there cannot be consecutive ROTATE actions in a least-cost path as each rotation has the cost of 1 and replacing consecutive rotations with a single rotation would result in a valid lower cost plan.

2. The first action cannot be a FORWARD action since initial direction is set to \(\textbf{0}\) and FORWARD action is available only when \(\varvec{\Delta }\ne \textbf{0}\).

3. No action after a REACHGOAL action can appear in a least cost plan because REACHGOAL connects to the goal position, which is a goal state regardless of the direction. Removing any action after REACHGOAL action would result in a valid lower cost plan. \(\square \)

By Lemma 1, the action sequence can be described by the following regular expression in POSIX-Extended Regular Expression Syntax:

$$\begin{aligned} (\text {(ROTATE)}\text {(FORWARD)}^+)^*\text {(REACHGOAL)}^{\{0,1\}} \end{aligned}$$

We collapse consecutive FORWARD actions in the resulting plan and extract discrete segments (Line 3, Algorithm 2). Each \(\text {(ROTATE)}\text {(FORWARD)}^+\) sequence and each REACHGOAL becomes a separate discrete segment. Let \(\{\textbf{e}_1, \ldots , \textbf{e}_{L}\}\) be the endpoints of the discrete segments. We prepend the first endpoint to the endpoint sequence in order to have a 0-length first segment, for reasons that we will explain in Sect. 6.3 (Line 4, Algorithm 2). The resulting L segments are described by \(L+1\) endpoints \(\{\textbf{e}_0, \ldots , \textbf{e}_{L}\}\) where \(\textbf{e}_0 = \textbf{e}_1\). Example discrete paths for two robots are shown in Fig. 4, and a sketch of the segment extraction step follows the figure.

Fig. 4: Discrete Planning. a Goal positions of two robots computed at the goal selection stage are given. b The red robot plans a discrete path from its current position to its goal position on a search grid that is aligned on its current position. c The blue robot plans a discrete path from its current position to its goal position on a search grid that is aligned on its current position. Computed discrete paths are prepended with robot start positions to have 0-length first segments. d The resulting discrete paths are given
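A sketch of the segment extraction step (Lines 3-4, Algorithm 2); the (action, resulting position) plan representation is an assumed output format of the best-effort \(A^*\) search.

```python
def extract_segments(plan, start):
    """Collapse the action sequence into discrete segment endpoints.

    plan: list of (action, resulting_position) pairs from best-effort A*;
    start: the robot's current position p.
    """
    endpoints = [start]
    for action, pos in plan:
        if action == "FORWARD":
            endpoints[-1] = pos  # extend the segment opened by the last ROTATE
        else:                    # "ROTATE" or "REACHGOAL" opens/ends a segment
            endpoints.append(pos)
    # Duplicate the first endpoint: the 0-length first segment of Sect. 6.3.
    return [endpoints[0]] + endpoints
```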

Next, we assign durations to each segment. The total duration of the segments is computed using the ego robot’s maximum velocity \(\gamma ^1\), the timestamp \(T'\) by which the goal position should be reached, and the current timestamp \(\tilde{T}\). We use \(T' - \tilde{T}\) as the desired duration of the plan. However, if \(T' - \tilde{T}\) is negative (i.e., \(T' < \tilde{T}\), meaning that the goal position should have been reached in the past), or \(T' - \tilde{T}\) is so small that the robot cannot traverse the discrete segments even at its maximum velocity \(\gamma ^1\), we adjust the desired duration to a feasible one, specifically the total length of the segments divided by the maximum velocity (Line 6, Algorithm 2). We distribute the feasible duration to the segments except for the first one, proportional to their lengths (Loop at line 8, Algorithm 2). We set the duration of the first segment, which has zero length, to the safety duration s (Line 7, Algorithm 2); the reason for this will be explained in Sect. 6.3.
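The duration assignment can be sketched as follows; endpoints is assumed to include the duplicated first endpoint, gamma1 is the maximum velocity \(\gamma ^1\), and s is the safety duration.

```python
import numpy as np

def assign_durations(endpoints, T_goal, t_now, gamma1, s):
    """Assign piece durations to discrete segments (a sketch of Lines 6-9,
    Algorithm 2). endpoints = [e_0, e_1, ..., e_L] with e_0 == e_1."""
    e = np.asarray(endpoints, dtype=float)
    lengths = np.linalg.norm(np.diff(e, axis=0), axis=1)  # segment lengths
    total_len = lengths.sum()                              # lengths[0] == 0
    # Desired duration T' - T~, made feasible for the maximum velocity gamma^1.
    duration = max(T_goal - t_now, total_len / gamma1)
    durations = np.zeros(len(lengths))
    durations[0] = s  # the zero-length first segment gets the safety duration
    if total_len > 0:
        durations[1:] = duration * lengths[1:] / total_len
    return durations
```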

The outputs of discrete planning are segments described by endpoints \(\{\textbf{e}_0, \ldots , \textbf{e}_L\}\) with assigned durations \(\{T_1, \ldots , T_L\}\) that are used in the trajectory optimization stage.

6.3 Trajectory optimization

In the trajectory optimization stage (Algorithm 3), we formulate a convex quadratic optimization problem (QP) to compute a piecewise trajectory \(\textbf{f}(t)\) by smoothing discrete segments. The computed trajectory is collision-free and continuous up to the desired degree of derivative. However, it may be dynamically infeasible (i.e., derivative magnitudes may exceed the maximum allowed derivative magnitudes \(\gamma ^k\)); this is resolved during the subsequent temporal rescaling stage.

The decision variables of the optimization problem are the control points of an L-piece spline where each piece is a Bézier curve. The duration of each piece is assigned during discrete planning. Let \(T = \sum _{i=1}^{L}T_i\) denote the total duration of the planned trajectory. The degree of the Bézier curves is tuned with the parameter h. Let \(\textbf{P}_{i, j} \in \mathbb {R}^d\) be the \(j^{th}\) control point of the \(i^{th}\) piece where \(i\in \{1,\ldots ,L\}, j\in \{0,\ldots ,h\}\). Let \(\mathcal {P}= \{\textbf{P}_{i, j} \mid i\in \{1,\ldots ,L\}, j\in \{0,\ldots ,h\}\}\) be the set of all control points.

6.3.1 Constraints

There are 4 types of constraints on the trajectory; all are linear in the decision variables \(\mathcal {P}\).

1) Workspace constraints: We shift each bounding hyperplane of workspace \(\mathcal {W}\) (there are a finite number of such hyperplanes as \(\mathcal {W}\) is a convex polytope) to account for the robot’s collision shape \(\mathcal {R}\), such that when the robot’s position is on the safe side of the shifted hyperplane, the entire robot is on the safe side of the original hyperplane (Line 2, Algorithm 3).

Algorithm 3: Trajectory optimization

Let \(\Upsilon _{\mathcal {W}}\) be the set of shifted hyperplanes of \(\mathcal {W}\). We constrain each control point of the trajectory to be on the valid sides of the shifted hyperplanes (Line 3, Algorithm 3). Since Bézier curves are contained in the convex hulls of their control points, constraining control points to be in a convex set automatically constrains the Bézier curves to stay within the same convex set.
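To illustrate how the convex hull property turns workspace containment into linear constraints on the decision variables, the sketch below constrains every control point to the valid side of each shifted hyperplane; cvxpy is used here as a stand-in QP modeling layer, not necessarily what an implementation of RLSS would use, and the sizes and hyperplane values are illustrative.

```python
import cvxpy as cp
import numpy as np

# Decision variables: control points P[i][j] of an L-piece, degree-h spline.
L, h, d = 3, 5, 2  # example sizes
P = [[cp.Variable(d) for _ in range(h + 1)] for _ in range(L)]

# One shifted workspace hyperplane (H_n, H_a'): H_n . x + H_a' <= 0 is the
# valid side. Values here are illustrative.
shifted = [(np.array([0.0, 1.0]), -5.0)]

# Constraining every control point to the valid side constrains the whole
# spline there, by the convex hull property of Bezier curves.
constraints = [n @ P[i][j] + a <= 0
               for (n, a) in shifted
               for i in range(L)
               for j in range(h + 1)]
```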

2) Robot-robot collision avoidance constraints: For robot-robot collision avoidance, recall that each robot replans with the same period \(\delta t\) and planning is synchronized between robots.

At each iteration, the ego robot computes its SVM cell within the SVM tessellation of robots using hard-margin SVMs. SVM tessellation is similar to Voronoi tessellation; the only difference is that pairwise SVM hyperplanes are computed between collision shapes instead of Voronoi hyperplanes. We choose SVM tessellation because (i) hard-margin SVM is convex, hence pairs of robots can compute the exact same hyperplane under the assumption of perfect sensing, (ii) it allows for a richer set of collision shapes than basic Voronoi cells, which are valid only for hyperspherical objects, and (iii) SVM cells are always convex unlike generalized Voronoi cells.

We buffer the ego robot’s SVM cell to account for its collision shape \(\mathcal {R}\), and constrain the first piece of the trajectory to stay inside the buffered SVM cell (BSVM) (loop at Line 5, Algorithm 3). Only the first piece of the trajectory is constrained to remain in the buffered SVM cell, since the entire planning pipeline is run after \(\delta t\), which is smaller than the duration s of the first piece. At that time, planning begins at a new location, generating a new first piece that must remain in the new buffered SVM cell.

Fig. 5: Trajectory Optimization. a Discrete segments of two robots computed at the discrete planning stage. b Robot-robot collision avoidance constraints are computed using BSVMs. The green hyperplane is the SVM hyperplane between the two robots. Each robot shifts the SVM hyperplane to account for its geometry, and constrains the first piece of its trajectory with the resulting BSVM. c-e Active set of robot-obstacle collision avoidance constraints for three different pieces (one belonging to the blue robot, two belonging to the red robot). In each figure, the region swept by the robot while traversing the segment is shown in the robot's color. SVM hyperplanes between the swept region and the obstacles are given as light-colored lines. SVM hyperplanes are buffered to account for the robot's collision shape and shown as dark-colored lines (BSVMs). The shift operations are shown as arrows. Obstacles and the constraints generated from them are colored using the same colors. For each piece, the feasible region induced by the robot-obstacle collision avoidance constraints is colored in gray. f Trajectories computed by the trajectory optimization stage are given

Buffering is achieved by changing the offset of the hyperplane. \(\mathcal {R}(\textbf{x})\) is the shape of the robot when placed at \(\textbf{x}\), defined as \(\mathcal {R}(\textbf{x}) = \{\textbf{x}\}\oplus \mathcal {R}_0\), where \(\mathcal {R}_0\) is the shape of the robot when placed at the origin and \(\oplus \) is the Minkowski sum operator. Given a hyperplane with normal \(\mathcal {H}_\textbf{n}\) and offset \(\mathcal {H}_a\), we find \(\mathcal {H}_{a'}\) that ensures \(\mathcal {R}(\textbf{x})\) is on the negative side of the hyperplane with normal \(\mathcal {H}_\textbf{n}\) and offset \(\mathcal {H}_a\) whenever \(\textbf{x}\) is on the negative side of the hyperplane with normal \(\mathcal {H}_\textbf{n}\) and offset \(\mathcal {H}_{a'}\), and vice versa, by setting \(\mathcal {H}_{a'} = \mathcal {H}_a + \max _{\textbf{y}\in \mathcal {R}_0}\mathcal {H}_\textbf{n}\cdot \textbf{y}\). The following shows that whenever \(\mathcal {R}(\textbf{x})\) is on the negative side of the hyperplane \((\mathcal {H}_\textbf{n}, \mathcal {H}_a)\), \(\textbf{x}\) is on the negative side of the hyperplane \((\mathcal {H}_\textbf{n}, \mathcal {H}_{a'})\). The converse can be shown by following the steps backwards.

$$\begin{aligned}&\forall \textbf{y}\in \mathcal {R}(\textbf{x})\ \mathcal {H}_\textbf{n}\cdot \textbf{y}+ \mathcal {H}_a \le 0\\&\implies \max _{\textbf{y}\in \mathcal {R}(\textbf{x})}\mathcal {H}_\textbf{n}\cdot \textbf{y}+ \mathcal {H}_a \le 0\\&\implies \max _{\textbf{y}\in \{\textbf{x}\} \oplus \mathcal {R}_0}\mathcal {H}_\textbf{n}\cdot \textbf{y}+ \mathcal {H}_a \le 0\\&\implies \max _{\textbf{y}\in \mathcal {R}_0}\mathcal {H}_\textbf{n}\cdot (\textbf{y}+\textbf{x}) + \mathcal {H}_a \le 0\\&\implies \mathcal {H}_\textbf{n}\cdot \textbf{x}+ \mathcal {H}_a + \max _{\textbf{y}\in \mathcal {R}_0}\mathcal {H}_\textbf{n}\cdot \textbf{y}\le 0 \end{aligned}$$
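For a box-shaped robot, the support term \(\max _{\textbf{y}\in \mathcal {R}_0}\mathcal {H}_\textbf{n}\cdot \textbf{y}\) has a closed form, so buffering reduces to a one-line offset update. A minimal sketch, assuming an axis-aligned box collision shape centered at the origin:

```python
import numpy as np

def buffer_hyperplane(n, a, half_extents):
    """Shift offset a so that the whole box-shaped robot stays on the
    negative side whenever its center does: a' = a + max_{y in R0} n.y.
    For a box centered at the origin this maximum is sum_i |n_i| * r_i."""
    return a + np.abs(n) @ np.asarray(half_extents)

# The 0.2 m cubes used in the experiments have half-extents of 0.1 m.
a_buffered = buffer_hyperplane(np.array([1.0, 0.0, 0.0]), -0.5, [0.1] * 3)
```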

Since the duration of the first piece was set to the safety duration \(s\ge \delta t\) in discrete planning, the robot stays within its buffered SVM cell for at least \(\delta t\). Moreover, since planning is synchronized across all robots, the pairwise SVM hyperplanes they compute will match, thus the buffered SVM cells of robots are disjoint. This ensures robot-robot collision avoidance until the next planning iteration.

Computed SVMs and BSVMs are shown in Fig. 5b for a two robot case.

To ensure that the number of constraints of the optimization problem does not grow indefinitely, SVM hyperplanes are only computed against those robots that are at most \(\tilde{r}\) away from the ego robot. This does not result in unsafe trajectories so long as \(\tilde{r}\) is more than the total maximum distance that can be traversed by two robots while following the first pieces of their trajectories, for which an upper bound is \(\max _{i,j\in \{1,\ldots ,N\}}(\gamma _i^1s_i+\gamma _j^1s_j)\), where \(s_i\) and \(s_j\) are the durations of the first pieces of the trajectories of robots i and j respectively.

3) Robot-obstacle collision avoidance constraints: Buffered SVM hyperplanes are used for robot-obstacle collision avoidance as well. Let \({\zeta }_i \subset \mathbb {R}^d\) be the region swept by the ego robot while traversing the \(i^{th}\) segment from \(\textbf{e}_{i-1}\) to \(\textbf{e}_{i}\). We compute the SVM hyperplane between \({\zeta }_i\) and each object in \(\mathcal {O}\), buffer it as explained before to account for the robot's collision shape, and constrain the \(i^{th}\) piece by the resulting buffered SVM (loop at Line 8, Algorithm 3). This ensures that trajectory pieces do not cause collisions with obstacles.
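Because the robot shape is convex, the region swept along a segment equals the convex hull of the shape placed at the two endpoints (cf. Lemma 2 in Appendix A), so a vertex set spanning \({\zeta }_i\) is easy to enumerate and can be fed directly to the SVM computation. A minimal sketch with illustrative names:

```python
import numpy as np

def swept_region_vertices(robot_vertices_at_origin, e_prev, e_next):
    """Vertices spanning the region swept by a convex robot along a segment.

    The sweep of a convex shape along a line segment is the convex hull of
    the shape placed at both endpoints, so the union of the two translated
    vertex sets spans the swept region."""
    v = np.asarray(robot_vertices_at_origin)
    return np.vstack([v + np.asarray(e_prev), v + np.asarray(e_next)])
```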

The use of SVMs for collision avoidance against obstacles is a choice of convenience, as we already use them for robot-robot collision avoidance. For robot-obstacle collision avoidance, one can use any separating hyperplane between \({\zeta }_i\) and objects in \(\mathcal {O}\), while in the case of robot-robot collision avoidance, pairs of robots must compute matching hyperplanes using the same algorithm.

Elements of the active set of SVM and BSVM hyperplanes for robot-obstacle collision avoidance are shown in an example scenario in Fig. 5c–e for three different pieces.

Similar to robot-robot collision avoidance, to ensure that the number of constraints of the optimization problem does not grow indefinitely, we only compute SVM hyperplanes between \({\zeta }_i\) and obstacles that are not more than \(\tilde{o}\) away from \({\zeta }_i\).

Let \(\Upsilon _i\) be the set of buffered SVM hyperplanes that constrain the \(i^{th}\) piece \(\forall i\in \{1,\ldots ,L\}\). \(\Upsilon _1\) contains both robot-robot and robot-obstacle collision avoidance hyperplanes, while \(\Upsilon _j\ \forall j\in \{2,\ldots ,L\}\) contains only robot-obstacle collision avoidance hyperplanes, because the first piece is the only one executed before the next planning iteration and hence the only one we constrain with robot-robot collision avoidance hyperplanes.

4) Continuity constraints: We add two types of continuity constraints: (i) continuity constraints between planning iterations, and (ii) continuity constraints between trajectory pieces (Line 15, Algorithm 3).

To enforce continuity between planning iterations, we add constraints that enforce

$$\begin{aligned} \frac{d^j\textbf{f}(0)}{dt^j} = \frac{d^j\textbf{p}}{dt^j}\ \forall j\in \{0,\ldots ,c\} \end{aligned}$$

where c is the task input denoting the continuity degree up to which the resulting trajectory must be continuous and \(\textbf{p}\) is the current position.

To enforce continuity between trajectory pieces, we add constraints that enforce

$$\begin{aligned} \frac{d^j\textbf{f}_{i}(T_i)}{dt^j}&= \frac{d^j\textbf{f}_{i+1}(0)}{dt^j}\\&\forall i\in \{1,\ldots ,L-1\}\ \forall j\in \{0,\ldots ,c\} \end{aligned}$$

where \(\textbf{f}_{i}(t)\) is the \(i^{th}\) piece of the trajectory.
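Both types of continuity constraints are linear in the control points, because the endpoint derivatives of a Bézier piece have closed forms. For a degree-h piece \(\textbf{f}_i\) with duration \(T_i\) and control points \(\textbf{P}_{i,0},\ldots ,\textbf{P}_{i,h}\), the standard identities for the first two derivatives are

$$\begin{aligned} \frac{d\textbf{f}_i(0)}{dt}&= \frac{h}{T_i}\left( \textbf{P}_{i,1}-\textbf{P}_{i,0}\right) ,\quad \frac{d\textbf{f}_i(T_i)}{dt} = \frac{h}{T_i}\left( \textbf{P}_{i,h}-\textbf{P}_{i,h-1}\right) \\ \frac{d^2\textbf{f}_i(0)}{dt^2}&= \frac{h(h-1)}{T_i^2}\left( \textbf{P}_{i,2}-2\textbf{P}_{i,1}+\textbf{P}_{i,0}\right) \end{aligned}$$

and analogously for higher orders, so each continuity equation is a linear equality over the decision variables.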

Remark 1 discusses that the SVM problems generated during trajectory optimization are feasible, i.e., the trajectory optimization stage will always succeed in constructing the QP.

Remark 1

All SVM problems generated for robot-robot and robot-obstacle collision avoidance from the discrete path outputted by the discrete planning stage are feasible.

Reasoning

The discrete planning stage outputs a discrete path such that a robot with collision shape \(\mathcal {R}\) following the path does not collide with any obstacles in \(\mathcal {O}\) or any other robots in \(\mathcal {S}\). It also ensures that \(\textbf{p}=\textbf{e}_0\) since the search starts from the robot’s current position \(\textbf{p}\). Hence, \(\mathcal {R}(\textbf{p})\) does not intersect with any \(\mathcal {S}_j \in \mathcal {S}\). Since each robot is assumed to be convex, there exists at least one hyperplane that separates \(\mathcal {R}(\textbf{p})\) from \(\mathcal {S}_j\) for each j by the separating hyperplane theorem. SVM is an optimal separating hyperplane according to a cost function. Therefore, SVM problems between \(\mathcal {R}(\textbf{p})\) and \(\mathcal {S}_j\in \mathcal {S}\) for robot-robot collision avoidance are feasible.

Since the robot moving along the discrete segments is collision free, \({\zeta }_i\) is collision free for all i. Also, since it is a sweep of a convex shape along a line segment, \({\zeta }_i\) is convex as shown in Lemma 2 in Appendix A. Similar to the previous argument, since \({\zeta }_i\) is collision-free and convex and all obstacles in the environment are convex, each SVM problem between \({\zeta }_i\) and \(\mathcal {Q}\in \mathcal {O}\) for robot-obstacle collision avoidance is feasible by the separating hyperplane theorem.

Workspace, robot-robot collision avoidance, robot-obstacle collision avoidance, and position continuity constraints are kinematic constraints. Higher-order continuity constraints are dynamic constraints. Remark 2 discusses the ensured feasibility of the kinematic constraints. The feasibility of the dynamic constraints cannot be ensured for arbitrary degrees of continuity.

Remark 2

Kinematic constraints of the optimization problem generated from a discrete path output from the discrete planning stage are feasible for the same path when the degree of Bézier curves \(h\ge 1\).

Reasoning

Any Bézier curve of degree \(h\ge 1\) can represent a discrete segment by setting half of its control points to the start of the segment and the other half to its end. Hence, we only need to show that a discrete path output from the discrete planning stage by itself satisfies the generated kinematic constraints. Remember that the discrete path output from the discrete planning stage has the property \(\textbf{p}=\textbf{e}_0=\textbf{e}_1\).

Each robot-robot SVM problem is feasible (see Remark 1). Let \(\mathcal {H}_\textbf{n}\) be the normal and \(\mathcal {H}_a\) be the offset of any of the robot-robot SVMs. It is shifted by setting the offset to \(\mathcal {H}_{a'} = \mathcal {H}_a + \max _{\textbf{y}\in \mathcal {R}_0}\mathcal {H}_\textbf{n}\cdot \textbf{y}\). The point \(\textbf{p}\) satisfies \(\mathcal {H}_\textbf{n}\cdot \textbf{p}+ \mathcal {H}_{a'} \le 0\) because \(\mathcal {R}(\textbf{p})\) is on the negative side of the SVM. The robot-robot BSVM hyperplanes are used to constrain the first piece of the trajectory, and setting the first piece as a 0-length segment with \(\textbf{p}=\textbf{e}_0=\textbf{e}_1\) satisfies the robot-robot collision avoidance constraints.

Each robot-obstacle SVM problem is feasible (see Remark 1). Let \(\mathcal {H}_\textbf{n}\) be the normal and \(\mathcal {H}_a\) be the offset of any of the robot-obstacle SVMs that is between \({\zeta }_i\) and an obstacle. It is shifted by setting the offset to \(\mathcal {H}_{a'} = \mathcal {H}_a + \max _{\textbf{y}\in \mathcal {R}_0}\mathcal {H}_\textbf{n}\cdot \textbf{y}\). All points on the line segment \(\textbf{p}_i(t) = \textbf{e}_{i-1} + t(\textbf{e}_i - \textbf{e}_{i-1}), t\in [0,1]\) from \(\textbf{e}_{i-1}\) to \(\textbf{e}_i\) satisfy the BSVM constraint because \(\mathcal {R}(\textbf{p}_i(t))\) is on the negative side of the SVM hyperplane \(\forall t\in [0,1]\). Since each SVM hyperplane between \({\zeta }_i\) and obstacles is only used to constrain the \(i^{th}\) piece, constraints generated by it are feasible for the segment from \(\textbf{e}_{i-1}\) to \(\textbf{e}_i\).

The feasibility of the workspace constraints is trivial: the robot moving along the discrete segments is contained in the workspace, and we shift the bounding hyperplanes of the workspace in the same way as SVM hyperplanes. Hence, the discrete path satisfies the workspace constraints.

Initial point position continuity of the robot is satisfied by the given discrete segments since \(\textbf{p}=\textbf{e}_0\). Position continuity between segments is trivially satisfied by the given discrete segments, since the discrete path is position continuous by definition. \(\square \)

6.3.2 Cost function

The cost function of the optimization problem has 3 terms: (i) energy usage, (ii) deviation from the discrete path, and (iii) preferred distance to objects.

We use the sum of integrated squared derivative magnitudes as a metric for energy usage (Line 16, Algorithm 3), similar to the works of Richter et al. (2013) and Hönig et al. (2018). Parameters \(\varvec{\lambda }= \{\lambda _j\}\) are the weights for the integrated squared derivative magnitudes, where \(\lambda _j\) is the weight for the \(j^{th}\) degree of derivative. The energy term \(\mathcal {J}_{energy}(\mathcal {P})\) is given as

$$\begin{aligned} \mathcal {J}_{energy}(\mathcal {P}) = \sum _{\lambda _j \in \varvec{\lambda }} \lambda _j \int _0^T\left\Vert \frac{d^j\textbf{f}(t)}{dt^j}\right\Vert _2^2dt. \end{aligned}$$
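Since each derivative of a Bézier curve is itself a Bézier curve of lower degree, \(\mathcal {J}_{energy}\) is a quadratic function of the control points. The sketch below evaluates the term numerically for a single piece; the function names and the quadrature resolution are illustrative:

```python
import numpy as np
from math import comb

def bezier_eval(P, T, t):
    """Evaluate a Bézier curve with control points P ((h+1) x d) on [0, T]."""
    h = len(P) - 1
    u = t / T
    return sum(comb(h, k) * u**k * (1 - u)**(h - k) * P[k] for k in range(h + 1))

def derivative_ctrl_pts(P, T):
    """Control points of the derivative curve; the degree drops by one."""
    return (len(P) - 1) / T * (P[1:] - P[:-1])

def energy(P, T, lambdas, samples=500):
    """Approximate sum_j lambda_j * int ||f^(j)(t)||^2 dt with a Riemann sum;
    lambdas[j-1] is the weight of the j-th derivative."""
    ts = np.linspace(0.0, T, samples, endpoint=False)
    total, Q = 0.0, np.asarray(P, dtype=float)
    for lam in lambdas:
        Q = derivative_ctrl_pts(Q, T)  # control points of the next derivative
        vals = np.array([bezier_eval(Q, T, t) for t in ts])
        total += lam * np.sum(vals**2) * (T / samples)
    return total

# One cubic 3D piece with duration 1 s and the weights used in Sect. 7.
P = np.array([[0, 0, 0], [0.5, 0, 0], [1, 1, 0], [1.5, 1, 1]], dtype=float)
print(energy(P, 1.0, lambdas=[2.0, 2.8]))
```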

We use squared Euclidean distances between trajectory piece endpoints and discrete segment endpoints as a metric for deviation from the discrete path (Line 17, Algorithm 3). Remember that each Bézier curve piece i ends at its last control point \(\textbf{P}_{i, h}\). Parameters \(\varvec{\theta }=\{\theta _i\}\in \mathbb {R}^L\) are the weights for each segment endpoint. The deviation term \(\mathcal {J}_{dev}(\mathcal {P})\) is given as

$$\begin{aligned} \mathcal {J}_{dev}(\mathcal {P}) = \sum _{i\in \{1,\ldots ,L\}} \theta _i\left\Vert \textbf{P}_{i, h} - \textbf{e}_i\right\Vert _2^2. \end{aligned}$$

The last term of the cost function models the preferred distance to objects. We use this term to discourage robots from getting closer to other objects in the environment; this increases the numerical stability of the algorithm by driving robots away from tight states.

We shift each hyperplane in \(\Upsilon _1\) (i.e., buffered SVM hyperplanes constraining the first piece) by the preferred distance to objects, \(\tilde{p}\), to create the preferred hyperplanes \(\overline{\Upsilon }_1\) (Line 18, Algorithm 3). Since each robot replans with the period \(\delta t\), we add a cost term that drives the robot closer to the preferred hyperplanes at the replanning period (Line 19, Algorithm 3). We take the sum of squared distances between \(\textbf{f}(\delta t)\) and hyperplanes in \(\overline{\Upsilon }_1\):

$$\begin{aligned} \mathcal {J}_{pref}(\mathcal {P}) = \alpha \sum _{\mathcal {H}\in \overline{\Upsilon }_1} \left\Vert \mathcal {H}_{\textbf{n}}^\top \textbf{f}(\delta t) + \mathcal {H}_a\right\Vert _2^2 \end{aligned}$$

where \(\mathcal {H}_{\textbf{n}}\) is the normal and \(\mathcal {H}_a\) is the offset of hyperplane \(\mathcal {H}\), and \(\alpha \) is the weight of the \(\mathcal {J}_{pref}\) term. Notice that this term is defined over the control points of the first piece only, since \(\delta t\le s = T_1\); this is why using \(\Upsilon _1\) alone suffices.

The overall trajectory optimization problem is:

$$\begin{aligned} \min _{\mathcal {P}}\ {}&\mathcal {J}_{energy}(\mathcal {P}) + \mathcal {J}_{dev}(\mathcal {P}) + \mathcal {J}_{pref}(\mathcal {P}) \text { s.t}.\\&\mathcal {H}_{\textbf{n}}^\top \textbf{P}_{i, j} + \mathcal {H}_{a} \le 0 \ \forall i\in \{1, \ldots , L\}\\&\forall j\in \{0,\ldots ,h\} \\&\forall \mathcal {H}\in \Upsilon _i \cup \Upsilon _\mathcal {W}\\&\frac{d^j\textbf{f}(0)}{dt^j} = \frac{d^j\textbf{p}}{dt^j}\forall j\in \{0,\ldots ,c\} \\&\frac{d^j\textbf{f}_{i}(T_i)}{dt^j} = \frac{d^j\textbf{f}_{i+1}(0)}{dt^j}\ \forall i\in \{1,\ldots ,L-1\}\ \\&\forall j\in \{0,\ldots ,c\}. \end{aligned}$$
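The following is a minimal, self-contained CVXPY sketch of this QP for a toy instance. The hyperplane sets, discrete endpoints, durations, and weights are made-up placeholders, only the deviation cost is included, and continuity is enforced up to velocity (\(c=1\)); the paper's implementation solves the same structure with CPLEX.

```python
import cvxpy as cp
import numpy as np

d, h, L = 3, 12, 3                     # dimension, Bézier degree, pieces
T = [0.11, 1.0, 1.0]                   # piece durations; the first equals s
e = np.linspace([0, 0, 0], [5, 0, 0], L + 1)  # toy discrete endpoints
theta = [0.0, 150.0, 240.0]            # endpoint weights (values of Sect. 7)
p_now, v_now = e[0], np.zeros(d)       # current position and velocity
# Toy hyperplane sets Upsilon_i: a single workspace-like plane z - 5 <= 0.
Upsilon = [[(np.array([0.0, 0.0, 1.0]), -5.0)] for _ in range(L)]

P = [cp.Variable((h + 1, d)) for _ in range(L)]  # control points per piece
cons = []
for i in range(L):
    # Safety: every control point behind each hyperplane in Upsilon_i;
    # by the convex hull property the whole piece then satisfies it.
    for n, a in Upsilon[i]:
        cons.append(P[i] @ n + a <= 0)
# Continuity with the current state (j = 0, 1).
cons += [P[0][0] == p_now,
         (h / T[0]) * (P[0][1] - P[0][0]) == v_now]
# Continuity between consecutive pieces (j = 0, 1).
for i in range(L - 1):
    cons += [P[i][h] == P[i + 1][0],
             (h / T[i]) * (P[i][h] - P[i][h - 1])
             == (h / T[i + 1]) * (P[i + 1][1] - P[i + 1][0])]

dev = sum(theta[i] * cp.sum_squares(P[i][h] - e[i + 1]) for i in range(L))
cp.Problem(cp.Minimize(dev), cons).solve()
```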

Notice that we formulate continuity and the safety of the trajectory using hard constraints. This ensures that the resulting trajectory is kinematically safe and continuous up to degree c if the optimization succeeds for all robots and planning is synchronized.

6.4 Temporal rescaling

At the temporal rescaling stage, we check whether the dynamic limits of the robot are violated. If the resulting trajectory is valid, i.e., the derivative magnitudes of the trajectory are less than or equal to the maximum derivative magnitudes \(\gamma ^k\ \forall k\in \{1,\ldots ,K\}\), the trajectory is returned as the output of RLSS. If not, temporal rescaling is applied to the trajectory, similar to Hönig et al. (2018) and Park et al. (2020), by increasing the durations of the pieces so that the trajectory becomes valid. We scale the durations of the pieces by multiplying them with a constant parameter greater than 1 until the dynamic limits are obeyed.
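Stretching all piece durations by a factor \(a>1\) scales the \(k^{th}\) derivative of the reparametrized trajectory by \(a^{-k}\), so the scaling loop always terminates. A minimal sketch of this loop, with illustrative names:

```python
def rescale_durations(durations, max_derivs, limits, factor=1.1):
    """Multiply piece durations by `factor` until the dynamic limits hold.

    max_derivs[k-1] is the current maximum magnitude of the k-th derivative
    and limits[k-1] the corresponding limit gamma^k. Scaling all durations
    by `factor` divides the k-th derivative magnitudes by factor**k."""
    durations, max_derivs = list(durations), list(max_derivs)
    while any(m > g for m, g in zip(max_derivs, limits)):
        durations = [factor * T for T in durations]
        max_derivs = [m / factor ** (k + 1) for k, m in enumerate(max_derivs)]
    return durations

# Example with the limits of Sect. 7: gamma^1 = 3.67, gamma^2 = 4.88.
print(rescale_durations([0.11, 1.0, 1.0], [5.0, 9.0], [3.67, 4.88]))
```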

We choose to enforce dynamic feasibility outside of the trajectory optimization problem as a post-processing step because (i) the output of the trajectory optimization stage is often dynamically feasible, hence rescaling is rarely needed, and (ii) adding piece durations as variables to the optimization problem would make it a non-quadratic program, potentially decreasing performance.

7 Evaluation

Here, using synchronized simulations, we first evaluate our algorithm's performance under different parameters in Sect. 7.1. Second, we conduct an ablation study in Sect. 7.2 to show the effects of two important steps, namely the prepend operation of the discrete planning stage and the preferred distance cost term of the trajectory optimization stage, on the performance of the algorithm. Third, we compare our algorithm to two state-of-the-art baseline planners in Sect. 7.3. Finally, we show our algorithm's applicability to real robots by using it on quadrotors and differential drive robots in Sect. 7.4.

We conduct our simulations on a laptop computer with an Intel i7-8565U @ 1.80GHz running Ubuntu 20.04. We implement our algorithm to run on a single core, both for implementation simplicity and for fairness to the baseline planners, which are not parallelized. The memory usage of each simulation is 30MB on average for our algorithm.

In synchronized simulations, we compute trajectories for each robot using the same snapshot of the environment, move robots perfectly according to the computed trajectories for the re-planning period, and replan. The effects of planning iterations taking longer than the re-planning period are not modeled in the synchronized simulations. We show that all algorithms (RLSS and the baselines) can run at 1–10Hz on the hardware we use, and assert that more powerful computers can be used to shorten planning times. In addition, parallelization of the A* search on GPUs is possible. For example, Zhou and Zeng (2015) show the possibility of a 6–7× speedup in A* search for pathfinding problems on GPUs. Moreover, parallelization of quadratic program solving is possible, either by (i) running multiple competing solvers on multiple cores and returning the answer of the first one that solves the problem, or (ii) parallelizing individual solvers. For instance, the IBM ILOG CPLEX Optimizer and the Gurobi Optimizer support running multiple competing solvers concurrently. The IBM ILOG CPLEX Optimizer also supports a parallel barrier optimizer, which parallelizes a barrier algorithm for solving QPs. Gondzio and Grothey (2009) propose a parallel interior point method solver that exploits the nested block structure of problems. The performance improvements of these approaches are problem dependent, and we do not investigate how much the performance of our trajectory optimization stage could be improved with such methods.

In all experiments, 32 robots are placed in a circle formation of radius 20m in 3D, and the task is to swap all robots to the antipodal points on the circle. The workspace \(\mathcal {W}\) is set to an axis-aligned bounding box from \(\begin{bmatrix}-25\text {m}&-25\text {m}&0\text {m}\end{bmatrix}^\top \) to \(\begin{bmatrix}25\text {m}&25\text {m}&5\text {m}\end{bmatrix}^\top \). Robot collision shapes are modeled as axis-aligned cubes with 0.2m edge lengths. The desired planning horizon \(\tau \) is set to 5s. The safety distance D of goal selection is set to 0.2m. Robots have velocity limit \(\gamma ^1 = 3.67\frac{m}{s}\) and acceleration limit \(\gamma ^2 = 4.88\frac{m}{s^2}\), both chosen arbitrarily. The safety duration s is set to 0.11s and the re-planning period \(\delta t\) is set to 0.1s. We set integrated derivative cost weights \(\lambda _1 = 2.0\) for velocity and \(\lambda _2 = 2.8\) for acceleration. We set endpoint cost weights to \(\theta _1 = 0\), \(\theta _2 = 150\), \(\theta _3 = 240\), and \(\theta _i = 300\ \forall i \ge 4\). Setting \(\theta _1=0\) allows the optimization to stretch the first 0-length segment freely, while the remaining \(\theta \)s incrementally increase the importance of the tail segments. We set the preferred distance to objects to \(\tilde{p}=0.6\)m and the preferred distance cost weight to \(\alpha = 0.3\).

We use the OcTree data structure from the octomap library (Hornung et al., 2013) to represent the environment. Each leaf-level occupied axis-aligned box of the OcTree is used as a separate obstacle in all algorithms. OcTree allows fast axis-aligned box queries, which return the obstacles intersecting a given axis-aligned box, an operation we use extensively in our implementation. For example, we use axis-aligned box queries in discrete planning as broadphase collision checks to find the obstacles close to the volume swept by the ego robot while traversing a given segment, and check collisions only between the robot traversing the segment and the obstacles returned from the query. Also, to generate robot-obstacle collision avoidance constraints, we execute axis-aligned box queries around the segments to retrieve nearby obstacles, and generate BSVM constraints only against those nearby obstacles that are no more than \(\tilde{o}\) away from the robot traversing the segment. Other popular approaches for environment representation include (i) Euclidean signed distance fields (ESDFs) (Oleynikova et al., 2017), which support fast distance queries to the nearest obstacles, (ii) 3D circular buffers (Usenko et al., 2017), which aim to limit the memory usage of maps and support fast occupancy checks, and (iii) Gaussian mixture models (O'Meadhra et al., 2019), which represent occupancy continuously instead of discretizing the environment as the former approaches do. None of these representations is as suitable as OcTrees for RLSS, since they do not allow fast querying of obstacles in the vicinity of segments.

We use the IBM ILOG CPLEX Optimizer to solve our optimization problems. In all experiments, robots that are closer than 0.25m to their goal positions are considered to have reached their goals. If a robot that has not reached its goal does not change its position by more than 1cm over the last 1s of the simulation, it is considered deadlocked. At any point of the simulation, if each robot is either at its goal or deadlocked, we conclude the simulation.

7.1 Effects of selected parameters

We evaluate the performance of our algorithm when 4 important parameters are changed: step size \(\sigma \) of the search grid, degree h of Bézier curves, obstacle check distance \(\tilde{o}\), and robot check distance \(\tilde{r}\). Step size \(\sigma \) of the search grid is the parameter that affects discrete planning performance most because it determines the amount of movement at each step during the \(A^*\) search. The degree h of Bézier curves is important in trajectory optimization because it determines the number of decision variables. The obstacle check distance \(\tilde{o}\) and robot check distance \(\tilde{r}\) determine the number of collision avoidance constraints in the optimization problem.

In all of the parameter evaluations, we use a random 3D forest environment with OcTree resolution 0.5m in which 10% of the environment is occupied. There are 2332 leaf-level boxes in the OcTree, translating to 2332 obstacles in total. We set the desired trajectory of each robot to the straight line segment connecting the robot's start and goal positions. We set the duration of the segment to the length of the line segment divided by the maximum velocity of the robot. We enforce continuity up to velocity, hence we set \(c=1\).

In our experiments, our algorithm does not result in any collisions or deadlocks.

We report the average computation time per iteration (Fig. 6) and the average navigation duration of robots (Fig. 7). Average navigation duration is the total navigation time summed over all robots, divided by the number of robots.

Fig. 6 Average computation time per stage when different parameters are used. The goal selection and temporal rescaling steps are reported as "other" since they take a small amount of time compared to the other two stages. All experiments are done in a 3D random forest environment with 10% occupancy

Fig. 7 Average navigation duration of robots from their start positions to their goal positions when the selected parameters are changed. In all cases, the average navigation duration is not affected by changes in the selected parameters within the chosen ranges

7.1.1 Step size \(\sigma \) of discrete search

We evaluate our algorithm's performance when step size \(\sigma \) is changed. We set \(\sigma \) to values between 0.25m and 1.5m with 0.25m increments. We set obstacle check distance \(\tilde{o}=1.0\)m, robot check distance \(\tilde{r}=2.0\)m, and degree of Bézier curves \(h=12\) in all cases, which are determined by preliminary experiments and the results of the experiments in Sects. 7.1.2, 7.1.3, and 7.1.4. The results are summarized in Figs. 6a and 7a.

As the step size gets smaller, discrete search takes more time, but the algorithm can still run at about 2Hz even when \(\sigma = 0.25\)m. The average navigation duration of robots is close to 22.5s in each case, suggesting the robustness of the algorithm to changes in this parameter. At \(\sigma \ge 0.75\)m, the time discrete planning takes is less than 6% of the time trajectory optimization takes.

We also run the algorithm with \(\sigma = 3\)m, \(\sigma = 6\)m, and \(\sigma = 12\)m, which decreases the flexibility of the discrete search considerably. In all of those cases, the discrete search results in fluctuations between planning iterations, and some robots get stuck in livelocks, in which they move between the same set of positions without ever reaching their goal positions.

7.1.2 Degree h of Bézier curves

Next, we evaluate our algorithm's performance when the degree h of Bézier curves is changed. We set h to values in \(\{5,\ldots ,12\}\). We set step size \(\sigma =0.77\)m, obstacle check distance \(\tilde{o}=1.0\)m, and robot check distance \(\tilde{r}=2.0\)m, which are determined by preliminary experiments and the results of the experiments in Sects. 7.1.1, 7.1.3, and 7.1.4. The results are summarized in Figs. 6b and 7b.

Even though the degree of the Bézier curves determines the number of decision variables of the trajectory optimization, the computation time of the trajectory optimization stage increases by no more than 10% between degree 5 and degree 12 Bézier curves. Also, the average navigation duration of robots is close to 22.5s in each case, suggesting the robustness of the algorithm to changes in this parameter as well.

7.1.3 Obstacle check distance \(\tilde{o}\)

Next, we evaluate our algorithm's performance when the obstacle check distance \(\tilde{o}\) is changed. We set \(\tilde{o}\) to values between 0.5m and 3m with 0.5m increments. Since the replanning period \(\delta t = 0.1\)s and the maximum velocity \(\gamma ^1 = 3.67\frac{m}{s}\), the maximum distance that a robot can traverse until the next planning iteration is 0.367m; the obstacle check distance must be more than this value for safety. We set step size \(\sigma =0.77\)m, robot check distance \(\tilde{r}=2.0\)m, and degree of Bézier curves \(h=12\), which are determined by preliminary experiments and the results of the experiments in Sects. 7.1.1, 7.1.2, and 7.1.4. The results are summarized in Figs. 6c and 7c.

The obstacle check distance is the most important parameter determining the speed of trajectory optimization, and hence of the planning pipeline. As \(\tilde{o}\) increases, the number of SVM computations and the number of constraints in the optimization problem increase, which results in increased computation time. Average navigation durations of the robots are close to 22.5s in all cases, suggesting the robustness of the algorithm to this parameter. We explain the reason for this robustness as follows. All obstacles are considered during discrete search, and \(\tilde{o}\) only determines the obstacles considered during trajectory optimization. Therefore, the path suggested by the discrete search is already very good, and the obstacle avoidance behavior of trajectory optimization is only important when the discrete path is close to obstacles. In those cases, all tested obstacle check distances capture the obstacles in the vicinity of the path. Therefore, the quality of the planned trajectories does not increase as \(\tilde{o}\) increases.

7.1.4 Robot check distance \(\tilde{r}\)

Last, we evaluate our algorithm's performance when the robot check distance \(\tilde{r}\) is changed. We set \(\tilde{r}\) to values between 1m and 3.5m with 0.5m increments. The robot check distance must be at least twice the distance that a robot can traverse in one planning iteration, i.e., 0.734m, because in the worst case two robots travel toward each other at maximum speed. We set step size \(\sigma =0.77\)m, obstacle check distance \(\tilde{o}=1.0\)m, and degree of Bézier curves \(h=12\), which are determined by preliminary experiments and the results of the experiments in Sects. 7.1.1, 7.1.2, and 7.1.3.

The speed of the algorithm is not considerably affected by the robot check distance, because there are only 32 robots in the environment; from the perspective of the ego robot, at most 31 constraints are generated from other robots. Since there are more than 2000 obstacles in the environment, the effects of the obstacle check distance are more drastic than those of the robot check distance. Similar to the other cases, the average navigation duration of robots is not affected by the choice of \(\tilde{r}\), because constraints generated by faraway robots do not actually constrain the trajectory of the ego robot: the ego robot cannot move faster than its maximum speed, and hence the first piece of its trajectory is never affected by those constraints.

Overall, RLSS does not result in collisions or deadlocks when parameters are not set to extreme values. In addition, changes in parameters, outside of extreme ranges, do not result in significant changes in the average navigation durations. These suggest that RLSS does not need extensive parameter tuning.

7.2 Ablation study

We investigate the effects of (i) the prepend operation of the discrete planning stage (Line 4, Algorithm 2) and (ii) the preferred distance cost term \(\mathcal {J}_{pref}\) of the trajectory optimization stage on the performance of the algorithm. The prepend operation ensures the kinematic feasibility of the generated optimization problem, as shown in Remark 2. The preferred distance cost term increases the numerical stability of the algorithm by encouraging robots to create a gap between themselves and other objects.

We consider four versions of our algorithm: RLSS, RLSS without the prepend operation (RLSS w/o prepend), RLSS without the preferred distance cost term (RLSS w/o pref. dist.), and RLSS with neither. We set step size \(\sigma =0.77\)m, obstacle check distance \(\tilde{o}=1.0\)m, robot check distance \(\tilde{r}=2.0\)m, degree of Bézier curves \(h=12\), and \(c=1\) (continuity up to velocity). Robots navigate in 3D maze-like environments. The desired trajectories are set to straight line segments connecting robot start positions to goal positions, and the durations of the segments are set to the lengths of the line segments divided by the maximum speed of the robots. During simulation, robots continue using their existing plans when planning fails. Also, colliding robots continue navigating and are not removed from the experiment.

Table 2 The results of the ablation study

We generate ten 3D maze-like environments and list the average and standard deviation values of our metrics in Table 2. We report the failure rate of each algorithm as the ratio of the number of planning failures to the number of planning iterations (Fail Rate column in Table 2), the number of robots that are involved in at least one collision during navigation (# Coll. column in Table 2), and the average navigation duration of all robots (Avg. Nav. Dur. column in Table 2).

The failure rate of RLSS is \(0.01\%\) on average. RLSS w/o pref. dist. fails \(0.06\%\) of the time. RLSS w/o prepend fails \(12.73\%\) of the time. This is drastically more than RLSS w/o pref. dist. because our prepend operation ensures the kinematic feasibility of the optimization problem, while our preferred distance cost term only tackles numerical instabilities. RLSS with neither results in a failure rate of \(10.62\%\). Interestingly, RLSS with the preferred distance cost term but without the prepend operation (RLSS w/o prepend) results in a higher failure rate than RLSS with neither. We do not investigate the root cause of this, since the failure rates in both cases are much higher than that of RLSS.

The effects of failures are seen in the next two metrics. RLSS results in no collisions. The number of colliding robots increases to 0.67 in RLSS w/o pref. dist., 6.78 in RLSS w/o prepend, and 8.67 in RLSS with neither, on average. The average navigation duration of robots is lowest in RLSS; it increases by \(3.65\%\) in RLSS w/o pref. dist., \(54.66\%\) in RLSS w/o prepend, and \(87.03\%\) in RLSS with neither, compared to RLSS.

These results show that the prepend operation is more important than the preferred distance cost term for the success of the algorithm. Nevertheless, RLSS needs both to be safe and effective.

7.3 Comparisons with baseline planners

We compare the performance of our planner to two baseline planners that do not require communication in 3D experiments. We set step size \(\sigma =0.77\)m, obstacle check distance \(\tilde{o}=1.0\)m, robot check distance \(\tilde{r}=2.0\)m, and degree of Bézier curves \(h=12\) in RLSS in all cases.

7.3.1 Extended buffered Voronoi cell (eBVC) planner

The first baseline planner is an MPC-style planner based on buffered Voronoi cells introduced by Zhou et al. (2017), which we call BVC. In the BVC approach, each robot computes its Voronoi cell within the Voronoi tessellation of the environment, using the position information of the other robots. Robots buffer their Voronoi cells to account for their collision shapes and plan their trajectories within the corresponding buffered Voronoi cells. Similar to RLSS, BVC does not require any communication between robots, requires perfect sensing of robot positions in the environment, and its resulting trajectories are safe only when planning is synchronized between robots. It also requires each robot to stay within its buffered Voronoi cell until the next planning iteration. Unlike RLSS, BVC is based on buffered Voronoi cells and cannot work with arbitrary convex objects; instead, robots are modeled as hyperspheres.

The original formulation presented in the BVC article only allows position state for the robots, which cannot model a rich set of dynamics, including double, triple, or higher-order integrators. We extend their formulation to all discrete linear time-invariant systems with position output. We define such systems with three matrices \(\textbf{A}, \textbf{B}, \textbf{C}\). Since the output of the system is the position of the robot, it cannot depend on the current input, hence \(\textbf{D}= \textbf{0}\). This allows us to formulate the problem for a richer set of dynamics and to have constraints on derivatives of higher order than robot velocity.
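As a concrete instance, the discrete double integrator with position output, which we use when velocity continuity is required, can be written with the following matrices. This is a standard zero-order-hold discretization, shown here as a sketch:

```python
import numpy as np

def double_integrator(dt, d=3):
    """Discrete LTI system x_{i+1} = A x_i + B u_i, p_i = C x_i (D = 0).

    The state x stacks position and velocity; the input u is acceleration."""
    I, Z = np.eye(d), np.zeros((d, d))
    A = np.block([[I, dt * I], [Z, I]])
    B = np.vstack([0.5 * dt**2 * I, dt * I])
    C = np.hstack([I, Z])
    return A, B, C

A, B, C = double_integrator(dt=0.1)  # the timestep used in the experiments
```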

BVC as published also does not consider obstacles in the environment. We extend the formulation to static obstacles by modeling obstacles as robots. Since obstacles are static, they stay within their buffered Voronoi cells at all times. And since we model obstacles as robots, extended BVC does not require the ability to distinguish robots from obstacles.

Our extended BVC formulation, which we call eBVC, is as follows:

$$\begin{aligned}&\min _{\textbf{u}_0, \ldots , \textbf{u}_{M-1}} \sum _{i=1}^M \lambda _i \left\Vert \textbf{p}_i- \textbf{d}(\tilde{T} + M\Delta t)\right\Vert _2^2+ \sum _{i=0}^{M-1}\theta _i \left\Vert \textbf{u}_i\right\Vert _2^2 s.t.\\&\ \ \textbf{x}_{i+1} = \textbf{A}\textbf{x}_{i} + \textbf{B}\textbf{u}_{i}\quad \quad {\forall i\in \{0,\ldots ,M-1\}}\\&\ \ \textbf{p}_{i} = \textbf{C}\textbf{x}_{i}\quad \quad {\forall i \in \{0,\ldots ,M\}}\\&\ \ \textbf{p}_{i} \in \mathcal {V}\quad \quad {\forall i \in \{0,\ldots ,M\}}\\&\ \ \textbf{p}_i \in \mathcal {W}\quad \quad {\forall i \in \{0,\ldots ,M\}}\\&\ \ \textbf{u}_{min} \preceq \textbf{u}_i \preceq \textbf{u}_{max} \quad \quad {\forall i \in \{0, \ldots , M-1\}}\\&\ \ \textbf{x}_{min} \preceq \textbf{x}_i \preceq \textbf{x}_{max} \quad \quad {\forall i \in \{0, \ldots , M\}} \end{aligned}$$

where \(\tilde{T}\) is the current timestamp, \(\textbf{d}(t)\) is the desired trajectory for the robot, \(\Delta t\) is the discretization timestep of the system, M is the number of steps to plan for, \(\textbf{u}_i\) is the input to apply from timestep i to \(i+1\), \(\textbf{x}_i\) is the state at timestep i, \(\textbf{p}_i\) is the position at timestep i, \(\mathcal {V}\) is the buffered Voronoi cell of the robot, \(\textbf{u}_{min}\) and \(\textbf{u}_{max}\) are the limits for the inputs, \(\textbf{x}_{min}\) and \(\textbf{x}_{max}\) are the limits for the states (which can be used to bound velocity in a double integrator system, or velocity and acceleration in a triple integrator system for example), and \(\mathcal {W}\) is the workspace of the robot. \(M\Delta t\) is the planning horizon. A robot plans toward the position \(\textbf{d}(\tilde{T} + M\Delta t)\), which is the position of the robot after the planning horizon if it could follow the desired trajectory perfectly. \(\textbf{x}_0\) is the current state of the robot. The first term of the cost function penalizes deviation from the goal position for the final and each intermediate position with different weights \(\lambda _i\). The second term of the cost function is the input cost that penalizes input magnitudes with different weights \(\theta _i\). We apply the first input of the solution for duration \(\Delta t\), and replan at the next timestep.

We use our own implementation of eBVC as explained above during the comparisons.

Table 3 The results of the comparisons of RLSS, eBVC, and RSFC are summarized

7.3.2 Relative safe flight corridor (RSFC) planner

The second planner we compare against is presented by Park and Kim (2021), in which piecewise Bézier curves are computed, executed for a short duration, and replanning is done at the next iteration, similar to our work. It utilizes the fact that the difference of two Bézier curves is another Bézier curve, constraining these relative Bézier curves to be inside safe regions (relative safe flight corridors, or RSFCs) defined according to robot collision shapes. We call this algorithm RSFC for short. RSFC does not require any communication between robots. It utilizes both the positions and velocities of other objects in the environment, hence requires more sensing information than our algorithm. Velocities are used to predict the trajectories of other robots as piecewise Bézier curves. While RSFC can handle dynamic obstacles as well, we use it in static environments in our comparisons, since RLSS does not handle dynamic obstacles explicitly.

We use the authors’ implementation of RSFC during our comparisons.

7.3.3 Experiments and results

We compare RLSS against eBVC and RSFC in 10 different experiments differing in required degree of continuity, desired trajectories, and map of the environment. In all experiments, 32 robots are placed in a circle formation with radius 20m in 3D. The task is to swap the positions of robots to the antipodal points on the circle. The results of the experiments are summarized in Table 3.

Fig. 8 The desired trajectories and the forest map of experiment 4 are given in a; the same forest map is used in each forest experiment. The desired trajectories and the maze map of experiment 7 are given in b; the same maze map is used in each maze experiment. c Executed trajectories of robots running RLSS in experiment 4, viewed from the top. d Executed trajectories of robots running RLSS in experiment 7, viewed from the top

There are 3 maps we use: empty, forest (Fig. 8a), and maze (Fig. 8b), listed in the map column of Table 3. In the empty map, there are no obstacles in the environment. The forest map is a random forest with 10% occupancy; it has a radius of 15m, and each tree is a cylinder with radius 0.5m. The maze map is a maze-like environment with choke regions.

We compute the desired trajectories of the robots by running a single-agent shortest path using the discrete planning stage of RLSS. We run single-agent shortest path on a prior map, which is set to either a full map of the environment or to an empty map; this is listed in the prior map column of Table 3. When the prior map is empty, single-agent shortest paths are straight line segments connecting robot start positions to robot goal positions.

We set two different continuity requirements: velocity and acceleration, listed in the continuity column of Table 3. When velocity continuity is required, the system of eBVC is a double integrator with position output. When acceleration continuity is required, the system of eBVC is a triple integrator with position output.

We compare RSFC to RLSS and eBVC only with the acceleration continuity requirement, because the authors' implementation hard-codes acceleration continuity, while in theory the method can work with any degree. Also, we compare RSFC only in the case of an empty prior map in order not to change RSFC's source code, while in theory it can be guided with arbitrary trajectories.

We plan 5s-long trajectories every 0.1s with all algorithms. In eBVC, we plan for \(M=50\) steps with discretization timestep \(\Delta t = 0.1\)s. We set state and input upper and lower bounds in eBVC so as to obey the dynamic limits of the robots. We set the distance-to-goal weights to \(\lambda _1=120\) and \(\lambda _i = 20\ \forall i\ge 2\) in eBVC, putting more importance on the position of the robot after 1 timestep. We set \(\theta _i=1\ \forall i\) in eBVC.

Both eBVC and RSFC require spherical obstacles. We use the smallest sphere containing each leaf-level box of the OcTree structure as an obstacle in eBVC and RSFC. Similar to RLSS, we use the robot and obstacle check distances to limit the number of obstacles considered at each iteration. We set both the obstacle and robot check distances to \(\tilde{o} = \tilde{r} = 2.0\)m in eBVC, and to \(\tilde{o} = \tilde{r} = 5.0\)m in RSFC, since smaller values in RSFC result in a high number of collisions and higher values do not improve the success of either algorithm. Robots are modeled as spheres in eBVC and RSFC as well. We set the robot shapes to spheres with radius 0.173m, the smallest spheres containing the actual robot shapes (0.173m is the half-diagonal \(0.1\sqrt{3}\)m of the 0.2m cube). We count collisions only when the contained leaf-level OcTree boxes and the contained robot shapes intersect. This gives both eBVC and RSFC buffer zones for collisions.

We report the number of deadlocking robots (# Deadlocks column in Table 3), number of robots that are involved in at least one collision (# Coll. Robots column in Table 3), collision duration of robots that are involved in collisions averaged over robots (Avg. Collision Dur. column in Table 3), average navigation duration of non-deadlocking robots from their start positions to goal positions (Avg. Nav. Dur. column in Table 3), and computation time per iteration averaged over each planning iteration of each robot (Avg. Comp. Time column in Table 3). We continue the navigation of colliding robots and do not remove them from the experiment.

Executed trajectories of robots running RLSS are shown in Fig. 8c for experiment 4, and in Fig. 8d for experiment 7 as examples.

RLSS does not result in any deadlocks or collisions in any of the experiments, while eBVC suffers from a significant number of deadlocks and RSFC results in collisions in the experiments with obstacles.

When there are no obstacles in the environment, e.g., in experiments 1 and 2, no algorithm results in deadlocks or collisions. When velocity continuity is required, e.g., in experiment 1, the average navigation duration of robots running eBVC is 18% lower than that of robots running RLSS. Both eBVC and RLSS run at close to 9Hz on average. When acceleration continuity is required, e.g., in experiment 2, the average navigation duration of robots running RSFC is 10% lower than that of robots running RLSS, and the average navigation duration for eBVC is 7% lower than that for RLSS. RLSS runs at about 9Hz on average, eBVC at close to 6Hz, and RSFC at close to 21Hz. When there are no obstacles in the environment, the discrete search of RLSS results in unnecessary avoidance movements, which is the main reason for the differences in average navigation duration.

When there are obstacles in the environment, performance of both eBVC and RSFC degrades in terms of the number of deadlocks and collisions.

In experiment 3, even though the full prior map of the environment is given for the initial discrete search and only velocity continuity is required, 10 out of 32 robots deadlock, and 7 of the remaining 22 are involved in at least one collision when eBVC is used. When acceleration continuity is required (experiment 4), 8 robots deadlock and 13 other robots are involved in collisions, leaving only 11 robots that reach their goal volumes without collisions when eBVC is used. RLSS both runs faster than eBVC and results in no deadlocks or collisions in those cases.

When the prior map is not known in the forest environment (experiments 5 and 6), eBVC results in many deadlocks and collisions. All robots in experiment 5 either deadlock or collide when eBVC is used. In experiment 6, RSFC does much better than eBVC: RSFC results in no deadlocks, while eBVC results in 11. 15 of the remaining 21 robots are involved in at least one collision when eBVC is used, with an average collision duration of 0.71s. 6 out of 32 robots running RSFC collide at least once, with an average collision duration of 0.42s. RLSS does not result in any deadlocks or collisions.

When the environment is a complicated maze, the performance of both eBVC and RSFC degrades further. With the full prior map and velocity continuity (experiment 7), 21 out of 32 robots deadlock and 9 of the remaining 11 are involved in collisions, leaving only 2 that reach their goal volumes without incident when eBVC is used. With the full prior map and acceleration continuity (experiment 8), 17 out of 32 robots deadlock and 13 of the remaining 15 are involved in collisions, again leaving only 2 that reach their goal volumes without incident when eBVC is used. RLSS does not result in any deadlocks or collisions in these scenarios. When the prior map is empty, i.e., the desired trajectories are straight line segments, the performance of eBVC degrades even more. All robots deadlock in the case with velocity continuity (experiment 9), while 30 out of 32 robots deadlock and the rest are involved in collisions in the case with acceleration continuity (experiment 10). No robot running RSFC deadlocks in experiment 10, but 22 out of 32 are involved in collisions at least once, with a 0.34s average collision duration. RLSS does not result in any deadlocks or collisions in those cases.

Overall, robots running RLSS have higher navigation durations than those running eBVC or RSFC in all experiments. While navigation duration is not an important metric when the other algorithms cause deadlocks or collisions, it is an important metric when they do not (experiments 1 and 2 with no obstacles in particular). eBVC does not have an integrated discrete planner, which is the reason for its good performance in terms of average navigation duration. When robots are close to each other, RLSS uses the free space less effectively, since discrete planning has a step size of 0.77m; eBVC does not rely on discrete planning and hence avoids other robots by executing small direction changes. RSFC utilizes the velocities of other robots on top of their positions, which allows it to estimate the intents of robots more effectively, resulting in better usage of the free space and hence better navigation durations on average. Since RLSS does not utilize communication and uses only the positions of other robots, it cannot deduce the intents of other robots. This results in fluctuations of plans between planning iterations, which increases the average navigation durations of robots. Fluctuations of plans increase when the environment is dense, as seen in the supplemental video. If the environment becomes overly constraining, e.g., tens of robots trying to pass through a narrow tube, these fluctuations may turn into livelocks.

However, when obstacles are introduced in the environment, the performance of RLSS is better than other algorithms. eBVC suffers greatly from deadlocks. RSFC does not result in deadlocks but results in collisions, even while using more information than RLSS (velocity and position instead of position only).

A note about the statistical significance of the results: in each experiment, we run each algorithm on a single randomly generated forest-like or maze-like environment. To show that the results are consistent across environments of the same type, we generate 10 forest-like environments with the same parameters as in experiment 3 and run RLSS and eBVC on each. RLSS does not result in deadlocks or collisions in any of the cases. The average navigation duration of robots, averaged over the 10 environments, is 19.15s with standard deviation 0.24s for eBVC and 23.59s with standard deviation 0.25s for RLSS. The ratio between the average navigation durations of eBVC and RLSS is consistent with the values reported in Table 3 for experiment 3.

7.4 Physical robots

Fig. 9 Physical robot experiments using Turtlebot2s, Turtlebot3s, iRobot Create2s, and Crazyflie 2.0s. RLSS works in real-time under external disturbances

We implement and run RLSS on physical robots using iRobot Create2s, Turtlebot2s, and Turtlebot3s in 2D, and Crazyflie 2.0s in 3D. We use a VICON motion tracking system for localization. Robots do not sense each other; instead, each receives the positions of the others through the VICON system. The iRobot Create2s, Turtlebot3s, and Turtlebot2s are equipped with ODROID XU4 and ODROID C1+ single board computers running ROS Kinetic on Ubuntu 16.04. In all cases, the algorithm runs on a centralized base station computer using a separate process for each robot. Therefore, unlike in the simulations, planning is not synchronized between robots in the physical robot deployments. RLSS does not result in any deadlocks or collisions in these asynchronous deployments either. Commands are sent to the 2D robots over a WiFi network, and to the Crazyflie 2.0s directly over their custom radio.

We conduct external disturbance experiments with 2 iRobot Create2s, 3 Turtlebot3s, and 2 Turtlebot2s (Fig. 9a). A human changes the positions of some robots by moving them arbitrarily several times during execution. In all cases, the robots replan in real-time and avoid each other successfully.

We demonstrate the algorithm in 3D using 6 Crazyflie 2.0s. We conduct an experiment without obstacles in which the Crazyflies swap positions, with straight lines as desired trajectories (Fig. 9b). In another experiment, we show that the Crazyflies can navigate through an environment with obstacles (Fig. 9c, d). In each case, we externally disturb the Crazyflies and show that they replan in real-time.

The recordings of our physical robot experiments are included in the supplemental video.

8 Conclusion

In this article, we present RLSS, a real-time decentralized long-horizon trajectory planning algorithm for the navigation of multiple robots in shared environments with obstacles, which guarantees collision avoidance whenever the resulting problems are feasible. The optimization problem generated to compute a smooth trajectory is convex and kinematically feasible. RLSS does not require any communication between robots; it requires only position sensing of obstacles and robots in the environment, and the ability to distinguish robots from obstacles. With its comparatively minimal sensing requirements and no reliance on communication, it presents a new baseline for algorithms that require communication and sensing or prediction of higher-order state components of other robots. The algorithm considers the dynamic limits of the robots explicitly and enforces safety with hard constraints.

We show in synchronized simulation that RLSS performs better than two state-of-the-art planning algorithms (eBVC and RSFC), one of which requires velocity sensing on top of position sensing, in environments with obstacles, in terms of the number of deadlocks and the number of colliding robots. In our experiments, RLSS does not result in any deadlocks or collisions, while eBVC suffers from deadlocks and RSFC results in collisions (while RLSS provides theoretical guarantees on collision avoidance when it succeeds, it does not provide theoretical guarantees on deadlock avoidance). When there are no obstacles in the environment, RSFC and eBVC outperform RLSS in terms of average navigation duration by 7–20%.

In future work, we would like to extend our planner to asynchronously planning robots where each robot starts and finishes planning at different unknown timestamps by utilizing communication. Also, we would like to incorporate the noise in the sensing systems within our algorithm in order to solve the cases with imperfect sensing in a principled way.