Autonomous Robots, Volume 37, Issue 1, pp 1–25

Online decentralized information gathering with spatial–temporal constraints


  • S. K. Gan
    • Australian Centre for Field Robotics (ACFR), The University of Sydney
  • Robert Fitch
    • Australian Centre for Field Robotics (ACFR), The University of Sydney
  • Salah Sukkarieh
    • Australian Centre for Field Robotics (ACFR), The University of Sydney

DOI: 10.1007/s10514-013-9369-5

Cite this article as:
Gan, S.K., Fitch, R. & Sukkarieh, S. Auton Robot (2014) 37: 1. doi:10.1007/s10514-013-9369-5


Abstract

We are interested in coordinating a team of autonomous mobile sensor agents in performing a cooperative information gathering task while satisfying mission-critical spatial–temporal constraints. In particular, we present a novel set of constraint formulations that address inter-agent collisions, collisions with static obstacles, network connectivity maintenance, and temporal-coverage in a resource-efficient manner. These constraints are considered in the context of the target search problem, where the team plans trajectories that maximize the probability of target detection. We model constraints continuously along the agents’ trajectories and integrate these constraint models into decentralized team planning using a computationally efficient solution method based on the Lagrangian formulation and decentralized optimization. We validate our approach in simulation with five UAVs performing search, and through hardware experiments with four indoor mobile robots. Our results demonstrate team planning with spatial–temporal constraints that preserves the performance of unconstrained information gathering and is feasible to implement with reasonable computational and communication resources.


Keywords: Team planning · Decentralized optimization · Information gathering · Spatial–temporal constraints · Collision avoidance · Network connectivity · Temporal-coverage

1 Introduction

Coordinated teams of mobile sensor agents are useful in performing information gathering tasks. Example application areas include atmospheric mapping (Kovacina et al. 2002), sampling of dynamic fields (Antonelli et al. 2012), area coverage (Renzaglia et al. 2012), target searching (Hollinger et al. 2009) and target tracking (Frew et al. 2008). In these tasks, sensor mobility provides flexibility and adaptability in dynamically changing environments, potentially yielding a higher rate of information gain. However, teams of agents are often required to move in close proximity to each other in a shared workspace, which may contain obstacles. Agents may also need to communicate with each other for decentralized data fusion and decentralized decision making. Therefore, it is important to develop coordinated team motion planning strategies that satisfy general mission-critical constraints such as collisions between agents, collisions between agents and obstacles, and loss of network communication integrity. In this paper, we address the problem of multi-agent coordination for information gathering under such constraints.

Agents involved in a cooperative mission often operate in a common workspace. Collisions between teammates must be taken into account during cooperative planning so that all planned reference trajectories are feasible. Mitigating the risk of collision is especially crucial for airborne systems, where a collision is likely to result in catastrophic loss of the platform and possible damage to the environment. If team planning does not explicitly consider safety constraints, the optimized team decisions may not be executable as planned, and an additional layer of real-time reactive avoidance may be activated frequently. This may waste the computational power and communication effort spent during team planning, and may also induce uncertainty in the overall system behavior.

Additional constraints often appear in real-world scenarios, and neglecting these constraints can limit the applicability of multi-agent coordination strategies. Decentralized data fusion and decentralized decision making may require the team to maintain a certain communication topology. A planned trajectory that disregards team network integrity could potentially result in disconnected communication links that limit further information sharing and decentralized planning. There may exist further spatial constraints such as obstacles and prohibited regions of entry on the field, or temporal-coverage constraints where agents must pass by a ground station or a user to report their observations at arbitrary time intervals.

A common approach to planning for information gathering tasks in continuous domains is to formulate the problem as the optimization of an information-theoretic objective function over a finite future action horizon. Integrating constraints into this optimization problem is challenging for several reasons. Collision and network connectivity constraints are continuous along the agent’s action horizon. Constraints may also vary with time due to moving obstacles. Furthermore, constraints may not be fixed during planning. For example, when considering agent-agent collisions, constraints are coupled during optimization. Finding a feasible trajectory for one agent depends on the trajectories chosen by other agents in the team.

In this paper, we present a novel set of continuous spatial–temporal constraints for efficient online decision making in decentralized information gathering problems. We show how these continuous constraints can be converted to discrete form while preserving continuity guarantees, and derive analytical gradients. Constraints are integrated into a unified optimization framework using an augmented Lagrangian formulation and solved using asynchronous decentralized gradient-descent. Planning is then performed using a multi-step receding horizon approach. Because the constraints are represented discretely and have closed-form gradients, decision making can be performed efficiently and in an online manner while ensuring continuous constraint satisfaction along the agents’ trajectories.

Using this formulation, we first consider the case of inter-agent collision constraints. We present our constraint derivation and online team planning algorithm along with complexity analysis. We demonstrate the algorithm in the context of coordinated search, where agents seek to maximize the probability of target detection, and present simulation results from 200 randomized trials with a team of five UAVs where we compare our algorithm to the search-only and collision-avoidance-only cases. These simulation results complement the complexity analysis by showing empirically that our algorithm finds feasible solutions within a reasonable time horizon, and also demonstrate that our algorithm preserves the information gathering performance of the unconstrained case while minimizing the chance of collisions.

To further validate computational and communication efficiency, we demonstrate our algorithm using a team of four non-holonomic mobile robots in an indoor environment. The robots are equipped with on-board processing and communication capabilities. Because our work concentrates on optimizing motion, sensors are simulated using a probabilistic sensor model. Each robot travels at near-constant velocity and performs continuous planning online. No centralized processing is involved. These experimental results verify that our algorithm is feasible with respect to realistic computational and communication resources available on modest robot platforms.

We then consider three additional constraints for coordinated information gathering: (1) network integrity constraints, (2) stationary obstacle collision constraints, and (3) temporal-coverage constraints where timed objectives cannot be met within the planning horizon. These cases cover important practical scenarios that can arise in information gathering tasks. Network integrity constraints limit the maximum inter-agent distance, which is useful in modeling limited communication range. Stationary collision constraints force agents to avoid known (stationary) obstacles. Temporal-coverage constraints allow agents to plan long-term patrolling behavior that extends far beyond the agents’ action horizon. Again, we derive explicit gradient models for each case and present simulation results in the context of coordinated search with a team of simulated UAVs.

A preliminary version of this work appeared in (Gan et al. 2012). Here, the work is extended to include experimental results with real robots, the three additional constraint cases, and associated experimental results in simulation.

The paper is organized as follows. Related work is discussed in Sect. 2. The team information gathering problem is introduced in Sect. 3. Inter-agent collision constraints are presented in Sect. 4 and integrated into online team planning in Sect. 5, with experimental results presented in Sect. 6. Additional spatial–temporal constraints are presented in Sect. 7, and Sect. 8 concludes the paper. Details of gradient derivations are provided in three appendices.

2 Related work

Research that studies decentralized multi-agent team negotiation for information gathering has focused primarily on coordination strategies in the absence of spatial–temporal constraints. Representative work includes environmental monitoring (Casbeer et al. 2006), area exploration and mapping (Grocholsky et al. 2003), perimeter surveillance (Kingston et al. 2008), target search (Tisdale et al. 2009), target surveillance and tracking (Tang and Ozguner 2005) and search and rescue (Furukawa et al. 2006).

The problem of considering continuous spatial–temporal constraints during decentralized trajectory planning for information gathering has not been comprehensively studied in the literature. In airborne systems, for example, a team of UAVs is usually assigned to operate in non-conflicting altitude layers to avoid inter-agent collisions. This assumption is commonly found in both simulations (Ahmadzadeh et al. 2006; Leung et al. 2006) and real-life demonstrations (Cole et al. 2008; How and King 2004). This technique, however, is not suitable for missions that require fixed-altitude operations, such as a mapping scenario, and is certainly not applicable to ground-based robot teams.

Another common approach is to decouple the constraints from the decision-making process. In this approach, team planning is performed at a mission level without considering constraints. The planned trajectories, which may be non-feasible, are modified during closed-loop execution by reactive systems in separate lower-level controllers (Ayanian and Kumar 2010). Strategies for generating collision-free maneuvers include potential field methods (Tanner and Christodoulakis 2007; Zavlanos and Pappas 2007), geometric approaches (Bretl 2012), navigation functions (Dimarogonas et al. 2006), reachable sets (Gillula et al. 2011; Wu and How 2012) and barrier certificates (Barry et al. 2012). For network integrity, graph Laplacian methods are commonly used for maintaining network connectivity (Zavlanos and Pappas 2008; Sabattini et al. 2012). Reactive systems can often guarantee constraint satisfaction, especially under the assumption of holonomic robot kinematics, and are important for online systems. However, reactive strategies usually deviate the robots from their nominal plan (Lapierre and Zapata 2012), in time, in space, or both, which induces uncertainty in the overall system behavior. Our intention is to explicitly incorporate spatial–temporal constraints while simultaneously optimizing the team trajectory over some finite action horizon to achieve a common mission objective.

Approaches that do incorporate constraints during planning typically apply to holonomic agents. A decentralized gradient controller is derived in (Julian et al. 2012) for a team of holonomic agents surfing the maximum information gradient while implicitly satisfying network connectivity. We focus on more challenging scenarios involving non-holonomic agents with a minimum speed requirement, such as fixed-wing UAVs in a time-critical search and rescue mission. Another common approach is to assume a discretized workspace. Static environmental constraints such as obstacles can often be easily represented in a discrete world and incorporated into planning (Bhattacharya et al. 2010; Durham et al. 2012). In contrast, we address the constrained team planning problem in a continuous workspace. This is important for many information gathering tasks since a discrete workspace potentially limits the ability to gather critical information.

Inter-agent constraints are commonly addressed with the approach of sequential planning (Schouwenaars et al. 2004; Hollinger and Singh 2010; Desaraju and How 2012). At any moment, only one agent plans its trajectory while fixing the trajectories of others. With this sequential planning approach, coupled constraints are usually easier to satisfy at the expense of longer planning time as the team size increases. Asynchronous algorithms, on the other hand, do not rely on an ordered sequence. Each agent is able to perform continuous iterations based on the latest communicated knowledge from others, fully utilizing the time allocated for team planning (Mathews et al. 2009).

Work that does incorporate constraints while maximizing some team objective falls under the category of decentralized constraint optimization. The classical approach, which we adopt, transforms a constrained problem into an unconstrained problem using an augmented Lagrangian formulation and then solves using a gradient-based algorithm (Bertsekas 1996). The work of (Hoffmann and Tomlin 2010) demonstrates the use of decentralized optimization in solving for a one-step action that maximizes information gain and satisfies inter-agent safety constraints. Instead of planning for a one-step action, a less myopic approach performs planning for a finite horizon trajectory (Chung and Burdick 2007). Trajectory planning can also be seen in the work of multi-agent team formation and navigation problems, where agents are constrained to remain in some formation or constrained to prevent colliding into each other while traversing some pre-planned team reference trajectory (Raffard et al. 2004; Kuwata and How 2006; Inalhan 2004). Constraints, however, are applied at discretized states along the action horizon without the consideration of the full continuous trajectory.

3 A cooperative team problem

In this section, we introduce the motion model and multi-step look-ahead action horizon setup for a single agent. We then formulate the cooperative team planning problem as an optimization problem over this action horizon.

3.1 Motion model and action vector

Consider a mobile agent involved in a decision-making process, as shown in Fig. 1.
Fig. 1

Parameterized action vector for N-step look-ahead actions. At the end of each action segment is the agent’s expected sensing state \(s\). Within each segment is a constant action variable \(u\). For the case of platform safety constraints, the end of each segment is also the agent’s safety region \(\Omega \), represented as a disc

At every decision-making instant \(k\), the agent plans its local open-loop action vector \(v^k= \left[ {u^k,u^{k+1}, \ldots , u^{k+N-1}} \right] ^{T} \in \mathcal {V}\), where \(N\) is the number of look-ahead actions and \(u^k\in \mathcal {U}\subset \mathbb {R}\) is an action variable. Actions are held constant within a segment. The duration of action segment \(u^k\) is defined as \(\Delta t^k= t^{k+1} - t^k\). The total action horizon is thus \(H= \sum \limits _{l =0}^{N-1} {\Delta t^{k+l}}\).

The agent utilizes a non-holonomic motion planner. We use a common 2D constant velocity kinematic model to govern the agent’s state transition:
$$\begin{aligned} \dot{s} = \left[ \begin{array}{l} \dot{x}\\ \dot{y}\\ \dot{\psi }\\ \end{array}\right] = \left[ \begin{array}{l} V\cos \psi \\ V\sin \psi \\ u\\ \end{array} \right] , \end{aligned}$$
where \(s\in \mathcal {S}\) is the agent’s state, \(\left[ x,y\right] ^T = p\in \mathbb {R}^2\) is the agent’s position vector in the Euclidean plane, \(\psi \) is the heading angle, \(V\) is the nominal speed and \(u\) is the action variable. In this case, the action variable is the agent’s turn rate command \(\dot{\psi }\), bounded between \(\pm \dot{\psi }_{\text {max}}\).
The state at the end of each action segment \(k\) is governed by its state transition function \(F_{u} : \mathcal {S}^k\times \mathcal {U}^k\rightarrow \mathcal {S}^{k+1}\):
$$\begin{aligned} \!\!s^{k+1}&= s^{k} + \mathop \int \limits _{t^k}^{t^{k+1}} {\dot{s}} \,\mathrm {d}t\nonumber \\ \!\!&= s^{k} + \left[ \begin{array}{l} {\frac{V}{u^k} \left( \sin \left( \psi ^{k}+\Delta \psi ^k \right) -\sin {\psi ^{k}} \right) } \\ {\frac{V}{u^k} \left( -\cos \left( \psi ^{k}+\Delta \psi ^k \right) +\cos {\psi ^{k}} \right) }\\ {\Delta \psi ^k} \\ \end{array} \right] , \end{aligned}$$
where \(\Delta \psi ^k= u^k\Delta t^k\). Thus, given the initial state \(s^k\) and an action vector \(v^k\), the state at the end of each action segment along \(H\) can be explicitly obtained. Note that in the limit \(u^k\rightarrow 0\), Eq. (2) becomes:
$$\begin{aligned} s^{k+1} = s^{k} + \left[ \begin{array}{l} {{V}{\Delta t^k} \cos {\psi ^{k}}}\\ {{V}{\Delta t^k} \sin {\psi ^{k}}}\\ {0}\\ \end{array} \right] . \end{aligned}$$
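The state transition above can be sketched in a few lines of Python. This is an illustrative implementation of Eqs. (2)–(3), not the authors' code; the function names and the switching threshold `eps` for the straight-line limit are our own choices.

```python
import math

def propagate(s, u, dt, V=1.0, eps=1e-9):
    """One action segment of the constant-speed unicycle model.

    s = (x, y, psi) is the state at the start of the segment, u the constant
    turn-rate command held over the segment, dt the segment duration.
    """
    x, y, psi = s
    dpsi = u * dt
    if abs(u) > eps:
        # Constant-curvature arc, Eq. (2).
        x += (V / u) * (math.sin(psi + dpsi) - math.sin(psi))
        y += (V / u) * (-math.cos(psi + dpsi) + math.cos(psi))
    else:
        # Straight-line limit as u -> 0, Eq. (3).
        x += V * dt * math.cos(psi)
        y += V * dt * math.sin(psi)
    return (x, y, psi + dpsi)

def rollout(s0, v, dt, V=1.0):
    """States at the end of each of the N segments of action vector v."""
    states, s = [], s0
    for u in v:
        s = propagate(s, u, dt, V)
        states.append(s)
    return states
```

Given \(s^k\) and \(v^k\), `rollout` returns exactly the endpoint states used later to impose constraints at the end of each segment.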

3.2 Team objective function

Consider now a team of \(Q\) agents involved in a cooperative information gathering task, where the joint index of agents is \(\mathcal {J}= \left[ 1,2,\ldots ,Q \right] \subset \mathbb {Z}^+\). The joint team action vector \(v_\mathcal {J}^k\) is defined as \(v_\mathcal {J}^k\mathop {=}\limits ^{\Delta } \left[ v_1^k,v_2^k, \ldots ,v_Q^k \right] \in \mathcal {V}_1 \times \mathcal {V}_2 \times \cdots \times \mathcal {V}_Q\mathop {=}\limits ^{\Delta } \mathcal {V}_\mathcal {J}\). The performance loss of a joint team action is captured by the team objective function:
$$\begin{aligned} J: \mathcal {S}_\mathcal {J}\times \mathcal {V}_\mathcal {J}\times \mathcal {E}\rightarrow \mathbb {R} , \end{aligned}$$
where \(\mathcal {E}\) is the state space of the environment where information is to be obtained. If \(s_\mathcal {J}^k\) is deterministic and the state of the environment is known at the start of the optimization problem, the only variable left during optimization is \(v_\mathcal {J}^k\).
The optimization objective is to solve for the optimal joint team action vector that minimizes the team objective function over the action horizon \(H\):
$$\begin{aligned} v_\mathcal {J}^{k*} = \mathop {\arg \min } \limits _{v_\mathcal {J}^k} J\left( v_\mathcal {J}^k \right) . \end{aligned}$$

4 Inter-agent safety constraints

In this section, we show how inter-agent safety constraints can be modeled and integrated into the team objective function in Eq. (5). This constraint can be expressed generally as a mapping \(g: \mathcal {V}_\mathcal {J}\rightarrow \mathbb {R}\). The constrained team optimization problem is thus:
$$\begin{aligned} \mathop {\min } \limits _{v_\mathcal {J}^k} J\left( v_\mathcal {J}^k \right) \quad { \text {s.t.}} \quad g\left( v_\mathcal {J}^k \right) \le 0. \end{aligned}$$
Throughout this paper, s.t. stands for “subject to”.
We now detail the process of modeling \(g\). Agent \(i\) at any time step \(t\) is assumed to have its own safety region \(\Omega _i^t\). A conservative constraint imposed by any pair of agents \(i\) and \(j\) can be defined such that two safety regions indexed to the same time \(t\) shall not overlap:
$$\begin{aligned} \Omega _i^m \cap \Omega _j^m = \emptyset \quad \forall \{i,j\} \in \mathcal {J}, i \ne j, m \in \left( t^k:t^{k+N}\right] . \end{aligned}$$
A common way of representing this safety region is to define it as a disc of radius \(r_i^t\) centered at \({ p}_i^t\), where \({ p}_i^t\) is the \(i\)th agent’s position vector at time \(t\). Agents are assumed to be homogeneous and their safety regions are constant during the mission; thus \(r_i^t=r_j^t=R\).
Equation (7) describes the constraint in the continuous time domain. We can equivalently represent it in the discrete time domain with modified safety regions and through worst-case analysis. Since the action horizon has been discretized into finite segments, we take advantage of this and impose the constraints at the end of each segment (see Fig. 1):
$$\begin{aligned} \Omega _i^m \cap \Omega _j^m = \emptyset \quad \forall \{i,j\} \in \mathcal {J}, i \ne j, m \in \left[ k+1:k+N\right] . \end{aligned}$$
For a non-holonomic platform traveling along a constant curvature segment, a lower bound can be obtained when we consider the segment’s maximum curvature. Figure 2 depicts the worst-case scenario for two agents undergoing a maximum rate of turn next to each other. Let \(R_{mc} = V/{\dot{\psi }_{max}}\) be the agent’s minimum turning radius and \(\Delta \psi _{mc} = {\dot{\psi }_{max}} \left( t^{k+1}-t^k \right) \) be the maximum change in heading angle in a segment, where subscript mc stands for maximum curvature. The distance lower bound can be obtained from geometry:
$$\begin{aligned} d_{mc,lower}&= \inf _{t \in \left[ t^k:t^{k+1} \right] } d_{ij}\left( t \right) \nonumber \\&= \sqrt{ (2r)^2 - {l_{AB}}^2 } -2 \left( R_{mc}-l_{CO} \right) , \end{aligned}$$
where \(l_{CO}= \sqrt{ {R_{mc}}^2 - \frac{1}{4} {l_{AB}}^2 },\) and \( l_{AB}= \sqrt{2 {R_{mc}}^2 \left( 1-\cos {\Delta \psi _{mc}} \right) }\). The radius \(r\) of the modified disc can be solved analytically by setting \(d_{mc,lower} = 2R\):
$$\begin{aligned} r = \sqrt{ R^2 + 2 R_{mc} \left( R+R_{mc} \right) \left( 1-\cos {\frac{\Delta \psi _{mc}}{2}} \right) }. \end{aligned}$$
A pair of segments indexed to the same time step that have non-overlapping modified safety regions at the start and at the end of the segments will be collision-free in the worst-case scenario. This means that the agents will be continuously collision-free, satisfying Eq. (7) while they traverse along this segment pair. Note that Eq. (9) requires \(l_{{AB}} < 2r\). This prevents the two segments from intersecting when Eq. (8) is satisfied.
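As a numerical sanity check of Eqs. (9)–(10), the following sketch (helper names are our own, not the paper's implementation) computes the modified radius \(r\) and verifies that an endpoint separation of \(2r\) yields a worst-case clearance of exactly \(2R\):

```python
import math

def modified_radius(R, V, psi_dot_max, dt):
    """Enlarged disc radius r of Eq. (10): endpoint checks with radius r
    guarantee continuous clearance 2R along a maximum-curvature segment."""
    R_mc = V / psi_dot_max                 # minimum turning radius
    dpsi_mc = psi_dot_max * dt             # max heading change per segment
    return math.sqrt(R**2 + 2.0 * R_mc * (R + R_mc)
                     * (1.0 - math.cos(dpsi_mc / 2.0)))

def worst_case_clearance(r, V, psi_dot_max, dt):
    """Distance lower bound d_mc,lower of Eq. (9) when the endpoint
    separation of the modified discs equals 2r."""
    R_mc = V / psi_dot_max
    dpsi_mc = psi_dot_max * dt
    l_AB = math.sqrt(2.0 * R_mc**2 * (1.0 - math.cos(dpsi_mc)))
    l_CO = math.sqrt(R_mc**2 - 0.25 * l_AB**2)
    return math.sqrt((2.0 * r)**2 - l_AB**2) - 2.0 * (R_mc - l_CO)
```

Substituting the \(r\) from Eq. (10) into Eq. (9) recovers \(d_{mc,lower} = 2R\), since the chord length satisfies \(l_{AB} = 2R_{mc}\sin (\Delta \psi _{mc}/2)\) and hence \(l_{CO} = R_{mc}\cos (\Delta \psi _{mc}/2)\).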
Fig. 2

Minimum shortest distance between two agents in a collision-free configuration. In this configuration, the pair of safety regions \(\Omega _i^k\) and \(\Omega _j^k\) and the pair \(\Omega _i^{k+1}\) and \(\Omega _j^{k+1}\) are marginally non-overlapping and thus collision-free. The theoretical distance lower bound can be obtained by considering the worst-case scenario with maximum turn rate

Having defined the size of the modified disc at the end of the action segments, a suitable function must be designed that captures the constraint between two agents at a particular time step \(k\). This function can be any distance function that is continuous, differentiable and satisfies the condition in Eq. (8). Here, we design this function based on the ratio of area of overlapping discs, which has properties that make it a suitable candidate for our problem. Let \(d_{ij}^k= ||{p}_i^k-{p}_j^k||\) be the Euclidean distance between the centers of the two discs at time step \(k\). We first normalize it with respect to the radii of the modified discs
$$\begin{aligned} \hat{d}_{ij}^k = \frac{d_{ij}^k}{r_i^k+r_j^k} = \frac{d_{ij}^k}{2r}\,, \end{aligned}$$
and define:
$$\begin{aligned} \bar{d}_{ij}^{k} = {\left\{ \begin{array}{ll} \hat{d}_{ij}^k &{} \text {if } \hat{d}_{ij}^k<1 \\ 1 &{} \text {if } \hat{d}_{ij}^k \ge 1 \, . \end{array}\right. } \end{aligned}$$
The constraint function, defined as the ratio of overlapping areas, is:
$$\begin{aligned} G_{ij}^k = \frac{ \mathcal {A} \left( \Omega _{i}^k \cap \Omega _{j}^k \right) }{\mathcal {A} \left( \Omega \right) } = \frac{2}{\pi }\left( \arccos {\bar{d}_{ij}^k} - \bar{d}_{ij}^k \sqrt{1-\left( \bar{d}_{ij}^{k}\right) ^{2}} \right) , \end{aligned}$$
where \(\mathcal {A(\cdot )}\) computes the area of a region. We use \(G\) instead of \(g\) to distinguish the discrete representation of the constraint. We can now represent Eq. (8) as an equality constraint:
$$\begin{aligned} G_{ij}^m=0 \quad \forall \, \{i,j\} \in \mathcal {J}, i \ne j, m \in \left[ k+1:k+N\right] . \end{aligned}$$
Its partial derivative with respect to state is:
$$\begin{aligned} \frac{\partial G_{ij}^k}{\partial s_{i}^k} = \frac{1}{2r} h\left( \bar{d}_{ij}^k \right) \frac{\partial {d}_{ij}^k}{\partial s_{i}^k} , \end{aligned}$$
where \(\frac{\partial d_{ij}}{\partial s_{i}} = \left[ {\frac{(p_i - p_j)^T}{d_{ij}}} , {0} \right] ^T\), and
$$\begin{aligned} h\left( \bar{d}_{ij} \right) = \frac{\partial G_{ij}}{\partial \bar{d}_{ij}} = -\frac{4}{\pi } \sqrt{1-\bar{d}_{ij}^{\,2}}. \end{aligned}$$
This partial derivative will be important in Sect. 5.4 for the computation of the explicit gradient during gradient-based optimization.
There are several notable properties of the constraint function in Eq. (13). As illustrated in Fig. 3, the function value decreases smoothly and approaches zero as \(\bar{d}_{ij}^k\) increases. Moreover, the gradient of this function has a relatively simple form, which has an important computational implication for gradient-based optimization. The function is smooth and twice differentiable, implying that higher-order gradient-based methods are applicable.
Fig. 3

a Constraint \(G_{ij}\) between two safety regions, and b its first derivative \(\frac{\partial G_{ij}}{\partial \bar{d}_{ij}}\). This constraint function attains its maximum value when the two safety regions completely overlap and approaches zero as the overlap decreases
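The constraint and its gradient are cheap to evaluate. A small sketch of Eqs. (11)–(16) for equal-radius discs (illustrative only; the function names are our own), with the chain-rule factor \(1/2r\) from the normalization \(\bar{d} = d/2r\):

```python
import math

def overlap_ratio(d, r):
    """Constraint G_ij: area of overlap of two discs of radius r at centre
    distance d, normalised by the disc area; zero once the discs separate."""
    d_bar = min(d / (2.0 * r), 1.0)        # clamped normalized distance
    return (2.0 / math.pi) * (math.acos(d_bar)
                              - d_bar * math.sqrt(1.0 - d_bar**2))

def overlap_ratio_grad(d, r):
    """dG/dd via h(d_bar) = -(4/pi) * sqrt(1 - d_bar^2) and the chain rule
    through d_bar = d / (2r)."""
    d_bar = min(d / (2.0 * r), 1.0)
    h = -(4.0 / math.pi) * math.sqrt(1.0 - d_bar**2)
    return h / (2.0 * r)
```

A finite-difference check of `overlap_ratio_grad` against `overlap_ratio` confirms the closed-form derivative; the function equals 1 for complete overlap and 0 once \(d \ge 2r\).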

5 Decentralized decision making

This section details our approach to solving the constrained team problem in its decentralized form. First, we discuss the information that must be exchanged between agents to achieve decentralized optimization. We then solve the constrained optimization problem in augmented Lagrangian form using decentralized gradient descent. Finally, we detail the process of implementing team decision making online.

5.1 Decentralization

After modeling and representing the inter-agent safety constraints in Eq. (14), we can rewrite the global constrained problem [Eq. (6)] in a simpler form as an Equality Constraint Problem (ECP):
$$\begin{aligned} v_\mathcal {J}^{k*}&= \mathop {\arg \min } \limits _{v_\mathcal {J}^k} J\left( v_\mathcal {J}^k \right) \nonumber \\ {\text {s.t.}}&\quad G_{ij}^m\left( v_\mathcal {J}^k \right) = 0 \quad \forall \, \{i,j\} \in \mathcal {J}, i \ne j, m \in \left[ k+1:k+N\right] . \end{aligned}$$
For the purpose of enforcing decentralization, agent \(i\) only has access to its local action vector \(v_i^k\) while other information will be communicated from external agents in the form of impact (Mathews et al. 2009). Impact space is a pre-defined space common to every team member. An impact space in its fully decentralized form has the potential to abstract the decision variables, motion model and sensor model of an agent. This eliminates the need for each agent to store potentially different models of every other agent, resulting in a scalable system.
As our focus is on information gathering applications, we first define impact, \(\alpha \), to contain the sensor observation likelihood \(O\) to facilitate decentralized decision making for information gathering. For the case of safety constraints, we further augment the impact to contain spatial–temporal information about the safety region \(\Omega \) as \(\alpha \triangleq \left[ O,p,r,t \right] \). Impact communicated from agent \(j\) to agent \(i\) is denoted as \(\alpha _{ji}^k \triangleq [\alpha _j^{k+1:k+N}]\). It contains a list of future impacts of agent \(j\) along \(H\). The joint impact of all agents \(j \in \mathcal {J}, j \ne i\) is denoted as \(\alpha _{\mathcal {J}^{-i}i}^k\), where \(\mathcal {J}^{-i}\) denotes every other agent excluding agent \(i\). Every agent \(i\) will now solve for the distributed form of Eq. (17):
$$\begin{aligned} v_i^{k*}&= \mathop {\arg \min } \limits _{v_i^k} J_i\left( v_i^k, \alpha _{{\mathcal {J}^{-i}i}}^k \right) \nonumber \\ \text {s.t.}&\quad G_{ij}^m\left( {v_i^k, \alpha _{\mathcal {J}^{-i}i}^k} \right) =0 \quad \forall \, j \in \mathcal {J}, j \ne i, m \in \left[ k+1:k+N\right] . \end{aligned}$$
\(J_i\) is a local copy of the team objective \(J\), which agent \(i\) can evaluate after incorporating delayed information communicated from others.

5.2 Augmented Lagrangian method

We solve Eq. (18) using the Lagrangian multiplier method (Bertsekas 1996). The ECP above is first transformed into an unconstrained problem by defining the augmented Lagrangian function as
$$\begin{aligned} L_i\left( v_i^k,\alpha _{\mathcal {J}^{-i}i}^k, {\lambda }_{i} \right) =&J_i\left( v_i^k,\alpha _{\mathcal {J}^{-i}i}^k \right) + \sum \limits _{\begin{array}{c} j = 1, \\ j \ne i \end{array}}^Q \sum \limits _{m=k+1}^{k+N} \lambda _{ij}^m G_{ij}^m \nonumber \\&+ \frac{1}{2}c \sum \limits _{\begin{array}{c} j = 1, \\ j \ne i \end{array}}^Q \sum \limits _{m=k+1}^{k+N} ||G_{ij}^m||^2\,, \end{aligned}$$
where \(\lambda \) and \(c\) are the multiplier and penalty parameters respectively. Note that at this point there is one multiplier \(\lambda _{ij}^m\) associated with each constraint \(G_{ij}^m\), resulting in \(\left( Q-1\right) \times N\) multipliers per agent. Since the constraint model in Eq. (13) is always non-negative, we can collect the constraints for the same time step into one cumulative constraint:
$$\begin{aligned} G_i^m = \sum \limits _{\begin{array}{c} j = 1, \\ j \ne i \end{array}}^{Q} G_{ij}^m = 0 \quad \, \forall \quad m \in \left[ k+1:k+N\right] . \end{aligned}$$
This reduces the number of multipliers to \(N\), independent of the size of the system. Equation (19) now becomes:
$$\begin{aligned} L_i = J_i + \Delta J_i , \end{aligned}$$
where \(\Delta J_i = \lambda _{i}^T G_i + \frac{1}{2}c ||G_{i}||^2\), \(\lambda _i=\left[ \lambda _i^{k+1}, \ldots ,\lambda _i^{k+N} \right] ^T\) and \(G_i=\left[ G_i^{k+1},\ldots ,G_i^{k+N} \right] ^T\).
Instead of solving the ECP directly, every agent \(i\) now solves the unconstrained augmented problem:
$$\begin{aligned} v_i^{k*} = \mathop {\arg \min } \limits _{v_i^k} L_i\left( v_i^k,\alpha _{\mathcal {J}^{-i}i}^k, {\lambda }_{i} \right) . \end{aligned}$$
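Evaluating the augmented objective of Eq. (21) is inexpensive. A minimal sketch, assuming the local objective \(J_i\) and the per-step cumulative constraints \(G_i^{k+1:k+N}\) have already been computed (the function signature is our own):

```python
def augmented_lagrangian(J, G, lam, c):
    """L_i = J_i + lam^T G_i + (c/2) * ||G_i||^2, as in Eq. (21).

    J is the scalar local objective, G and lam are length-N sequences
    (one entry per horizon step), c is the positive penalty parameter.
    """
    multiplier_term = sum(l * g for l, g in zip(lam, G))
    penalty_term = 0.5 * c * sum(g * g for g in G)
    return J + multiplier_term + penalty_term
```

When every \(G_i^m = 0\) the augmented objective reduces to the information gathering objective \(J_i\), so a feasible team plan pays no penalty.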

5.3 Local gradient-based refinement

One approach to solving Eq. (22) is gradient-based optimization. A standard first-order gradient update has the form:
$$\begin{aligned} v_{i}^{k} \leftarrow v_i^k- {\gamma _i} \frac{{\partial L_i}}{{\partial v_i^k}}^T, \end{aligned}$$
where \(0<\gamma _i<\Gamma _i\) is the update step size, \(\Gamma _i\) is the step-size upper bound that accounts for communication delay and packet drop (Mathews et al. 2009), and \(\frac{{\partial L_i}}{{\partial v_i^k}}\) is the partial derivative of the augmented Lagrangian with respect to each action variable in the local action vector:
$$\begin{aligned} \frac{{\partial L_i}}{{\partial v_i^k}} = \left[ {\frac{{\partial L_i}}{{\partial u_i^k}}, \ldots , \frac{{\partial L_i}}{{\partial u_i^{k+ N- 1}}}} \right] . \end{aligned}$$
Important algorithm criteria include (Bertsekas 1996):

  1. Objective function \(J\) is bounded from below;

  2. Constraint function \(G\) is continuous and differentiable;

  3. The multipliers \(\lambda _i\) form a bounded sequence. A common method for updating \(\lambda \) is:
     $$\begin{aligned} \lambda _{i} \leftarrow \lambda _i + c_i G_i ; \end{aligned}$$

  4. The penalty parameter \(c_i\) is a positive, increasing sequence over iteration loops.

Each agent \(i\) implements decentralized gradient descent recursively and asynchronously based on the latest impact communicated from other agents. For every action vector update, \(v_{i}^{k}\) is converted to its corresponding impact \(\alpha _i^{k+1:k+N}\) and then communicated to other team members. We show other constraint cases in Sect. 7 that follow a similar solution procedure. Equation (21) can incorporate additional constraint cases by adding the corresponding \(\Delta J\) term for that constraint with its own set of Lagrangian multipliers \(\lambda \) and penalty parameter \(c\).

Note that our algorithm guarantees neither solution existence nor global convergence. There are many possible situations, such as deadlocks and saturation of the UAV turn rate, where feasible solutions do not exist in the local action subspace. However, when a feasible team solution is found, and given that all agents execute their plans perfectly, our algorithm guarantees that the entire team is continuously collision-free along the planned action horizon (as shown in Sect. 4).

5.4 Explicit gradient model

The gradient \(\frac{{\partial L_i}}{{\partial v_i^k}}\) in Eq. (23) can be obtained explicitly through mathematical derivatives or implicitly through finite-differencing approximation. The latter approach is usually more computationally expensive and may suffer from numerical instability depending on the chosen perturbation step size. Here, we derive the analytical form of \(\frac{{\partial L_i}}{{\partial v_i^k}}\). This analytical gradient model is exact, which eliminates the sensitivity to the perturbation step size of a finite-differencing approach. More importantly, it is computationally efficient and thus serves an important role in an algorithm intended to operate online. This gradient is obtained as
$$\begin{aligned} \frac{\partial L_i^k}{\partial v_i^k} = \frac{\partial J_i^k}{\partial v_i^k} + \frac{\partial \Delta J_i^k}{\partial v_i^k}, \end{aligned}$$
where \(\frac{\partial \Delta J_i^k}{\partial v_i^k} = \lambda _i^T \frac{\partial G_i}{\partial v_i^k} + c G_i^T \frac{\partial G_i}{\partial v_i^k},\) and \(\frac{\partial G_i}{\partial v_i^k}\) is an \(N \times N\) lower triangular matrix with elements:
$$\begin{aligned} \frac{\partial G_i^m}{\partial u_i^n} = {\left\{ \begin{array}{ll} \frac{\partial G_i^m}{\partial s_i^m} \frac{\partial s_i^m}{\partial u_i^n} &{} \forall \quad m>n,\{m,n\} \in \left[ k+1:k+N\right] \\ 0 &{} \text {otherwise}. \end{array}\right. }\nonumber \\ \end{aligned}$$
Here, \(\frac{\partial s_i^m}{\partial u_i^n}\) is obtainable from the platform motion model derivative included in “Multistep motion model derivatives” section and \(\frac{\partial G_i^m}{\partial s_i^m}\) is from Eq. (15). The term \(\frac{\partial J_i^k}{\partial v_i^k}\) is mission dependent. For the information-theoretic search scenario we have chosen to demonstrate our results in Sects. 6 and 7, its analytical gradient \(\frac{dJ}{dv}\) has been derived in our previous work (Gan and Sukkarieh 2011) and is included in “Information-theoretic search derivatives” section for convenience.
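The lower-triangular structure of \(\frac{\partial G_i}{\partial v_i^k}\) can be illustrated with a scalar single-integrator stand-in for the platform motion model. The motion model, constraint derivative values, and horizon below are illustrative placeholders, not the paper's UAV model:

```python
import numpy as np

def constraint_jacobian(dG_ds, ds_du):
    """Assemble the N-by-N lower-triangular Jacobian dG/dv by the
    chain rule: entry (m, n) = dG^m/ds^m * ds^m/du^n. Controls
    applied after a state is reached cannot influence that state,
    hence the zero upper triangle."""
    N = len(dG_ds)
    Jac = np.zeros((N, N))
    for m in range(N):
        for n in range(m + 1):   # only controls up to segment m contribute
            Jac[m, n] = dG_ds[m] * ds_du[m][n]
    return Jac

# toy: scalar single integrator s^{m+1} = s^m + dt * u^m, so ds^m/du^n = dt
N, dt = 5, 2.0
dG_ds = np.ones(N)               # placeholder constraint derivatives
ds_du = [[dt] * (m + 1) for m in range(N)]
Jac = constraint_jacobian(dG_ds, ds_du)
```

Here row \(m\) corresponds to the state at step \(k+1+m\) and column \(n\) to the control at step \(k+n\), so the nonzero pattern matches the \(m>n\) condition of the paper's indexing.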

5.5 Forward planning architecture

The decision making process of online systems often involves forward planning. This is especially critical for systems with non-zero velocity motion constraints, such as UAV systems. One possible solution to this problem is to plan for future actions while executing the previously planned actions. Algorithm 1 is our proposed decision making architecture for an online decentralized system.
A diagram of this process is depicted in Fig. 4. This type of approach requires an accurate motion model for the planner and minimal disturbances while the agent is executing its planned actions. Failure to meet either requirement results in a poor prediction of the platform state at the next replanning time step, so that the plan cannot be executed at the expected states and time instants.
Fig. 4

A fixed-time forward planning architecture. During each replanning step, the agent implements its latest control action and propagates this control forward in time to determine its expected state at the beginning of the next replanning step. While traversing the current reference path, the agent performs online decision making from this expected future state
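The replanning step of this architecture can be sketched as follows. The motion model and planner interfaces here are hypothetical placeholders standing in for the platform model and the team optimization routine:

```python
def forward_planning_step(state, committed_actions, motion_model, plan_fn):
    """Sketch of the fixed-time forward planning architecture:
    while the committed actions are being executed, predict the
    state at the next replanning instant by propagating them
    through the motion model, then plan the next horizon from
    that expected future state."""
    expected_state = state
    for u in committed_actions:
        expected_state = motion_model(expected_state, u)
    return expected_state, plan_fn(expected_state)

# toy usage with a scalar integrator model and a trivial planner
expected, next_plan = forward_planning_step(
    state=0.0,
    committed_actions=[1.0, 2.0, 3.0],
    motion_model=lambda s, u: s + u,
    plan_fn=lambda s: [0.0] * 3)
```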

5.6 Sampling-based initialization
At the end of each planning horizon, every agent applies its latest negotiated actions and repeats the planning process with a new initial action vector obtained using a sampling-based strategy, in our case Rapidly-exploring Random Trees (RRT) (Yang et al. 2010); see Algo. 2. The choice of initial action vector strongly influences which local minimum the gradient descent algorithm reaches. The focus of this work is not on sampling-based planning; RRT is simply sufficient for our needs in initializing the action vector. It can rapidly span the state space and identify a promising region to explore prior to the optimization routine, potentially yielding better performance. Note that action vector initialization is not limited to sampling-based methods and other approaches are possible.
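A minimal stand-in for this initialization step, using random shooting in place of the RRT of Algo. 2 (the objective function and bounds below are illustrative only):

```python
import numpy as np

def initialize_action_vector(objective, N, u_max, n_samples=200, seed=0):
    """Sample random piecewise-constant turn-rate sequences within
    the actuation bounds and keep the best-scoring one as the
    optimizer's starting point."""
    rng = np.random.default_rng(seed)
    best_v, best_cost = None, np.inf
    for _ in range(n_samples):
        v = rng.uniform(-u_max, u_max, size=N)
        cost = objective(v)
        if cost < best_cost:
            best_v, best_cost = v, cost
    return best_v

# toy objective; a real agent would score expected information gain
v0 = initialize_action_vector(objective=lambda v: float(np.sum(v ** 2)),
                              N=5, u_max=np.deg2rad(30.0))
```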

5.7 Analysis

This section describes the communication and computation complexity of the decentralized constrained optimization algorithm. The variables involved are the team size \(Q\) and the length of the lookahead action vector \(N\). We provide complexity analysis for a single iteration of the optimization algorithm. Multiple iterations are required for convergence, but in practice (for platforms with minimum-velocity constraints) we typically bound the number of iterations by a constant. This analysis therefore represents the work required for agents to choose their next action.

In a fully connected network, each agent has to send messages to every other agent in the team. Thus the communication complexity for each agent is \(\mathcal {O}\bigl (Q\bigr )\) and the overall team has communication complexity \(\mathcal {O}\bigl (Q^2\bigr )\).

The gradient of the local search objective function \(\frac{\partial J_i^k}{\partial v_i^k}\) is a chained partial derivative over the action vector, with computational complexity \(\mathcal {O}\bigl (N^2\bigr )\). The gradient of the local constraints \(\frac{\partial \Delta J_i^k}{\partial v_i^k}\) requires computing the constraints with respect to each agent, with computational complexity \(\mathcal {O}\bigl (N Q\bigr )\). The total computational complexity of gradient evaluation is thus \(\mathcal {O}\bigl (N^2 + N Q\bigr )\).

6 Experiments with inter-agent safety constraints

This section demonstrates the behavior of our algorithm in handling inter-agent safety constraints while performing a cooperative information gathering task. We consider the scenario of five UAVs cooperatively searching for a target in a decentralized network, as shown in Fig. 5a. We compare our algorithm to the case of unconstrained search and to the case of collision-avoidance-only in simulation. We validate the online computational and communication aspects of our algorithm using a team of mobile robots performing search.
Fig. 5

Top view snapshots of the 240-s, five-UAV target search mission. The gray region at \(t=0\) s is a Gaussian-shaped initial target belief placed at the center of the search area. The safety radius here is \(R = 7.5\) m. a–e Pure Search (PS): the inability of PS to handle inter-platform constraints causes a collision event in b. f–j Pure Collision Avoidance (PCA): the lack of an information gathering objective in PCA results in a large unexplored area after \(t=240\) s. k–o Search with Collision Avoidance (SCA): SCA preserves the information gathering capability of PS and prevents inter-platform collision

6.1 Simulation experiments

6.1.1 Problem setup

In this cooperative target search scenario, the UAVs are assumed to operate at a constant altitude, which exposes them to potential inter-UAV collision threats. The search area is 200 m × 200 m. Initially, the target belief is a Gaussian-shaped distribution at the center of the search area. UAVs operate at an altitude of 15 m above the search area at a constant speed of 5 m/s. A downward-pointing sensor is attached to each UAV, taking an observation every 2 s. Sensor observation is simulated using a commonly used distance model, where the no-detection likelihood is a function \(O\) of the distance between sensor and target positions (Wong et al. 2005):
$$\begin{aligned} O\left( \xi ,p \right) = P\left( z = \overline{D} | \xi , p \right) = 1 - P_{d_{max}} e^{-\sigma \left( \frac{d}{d_{max}} \right) ^2}. \end{aligned}$$
Here, \(z\) is the sensor observation about the target existence, \(\overline{D}\) refers to a no-detection event, \(\xi \) is the potential target position and \(d = ||\xi - p||\). The parameters \(P_{d_{max}}\), \(d_{max}\) and \(\sigma \) define the shape of the model. For all simulations in this section, \(P_{d_{max}}=1\), \(\sigma =2\) and \(d_{max}=30\) m.
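This range-dependent model is straightforward to implement; a sketch using the simulation parameters above:

```python
import numpy as np

def no_detection_likelihood(xi, p, P_dmax=1.0, sigma=2.0, d_max=30.0):
    """No-detection likelihood O(xi, p): close to 0 directly under the
    sensor (near-certain detection) and approaching 1 far away."""
    d = np.linalg.norm(np.asarray(xi, float) - np.asarray(p, float))
    return 1.0 - P_dmax * np.exp(-sigma * (d / d_max) ** 2)
```

With \(P_{d_{max}}=1\), the likelihood is exactly zero at zero range (certain detection) and increases monotonically with distance.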

Each UAV has an action vector of 10 s, uniformly discretized into five segments. Each segment is thus 2 s in duration and holds a piecewise-constant turn rate action limited to \(\pm 30\,^{\circ }\)/s. The system employs a multi-channel broadcast communication network in which UAVs can communicate freely with each other without interference. A communication frequency of 10 Hz is used with a 0.1 s message delay. The optimization frequency is 10 Hz and each UAV replans every 6 s, resulting in 60 optimization iterations per UAV per planning horizon.

The objective function \(J\) for the search mission is defined as the joint probability of target no-detection events. Assuming that sensor observations are independent and that the target belief is stationary over the length of the action horizon, the objective function from the perspective of UAV-\(i\) in an impact-based decentralized form can be written as
$$\begin{aligned}&J_i\left( v_i^k, \alpha _{\mathcal {J}^{-i}i}^k\right) \nonumber \\&\qquad = \mathop \int \limits _{\xi } P\left( z_{i}^{k+1:k+N}=\overline{D} | \xi ,p_{i}^{k+1:k+N} \right) {}^{i}\bar{b}_\xi ^k d\xi ,\quad \end{aligned}$$
where \(^{i}\bar{b}_\xi ^k\) is the modified target belief after fusing the expected sensor information contained in the communicated joint impact \(\alpha _{\mathcal {J}^{-i}i}^k\) through Bayes update:
$$\begin{aligned} {}^{i}\bar{b}_\xi ^k = \frac{1}{\eta } \prod \limits _{l=1}^{N} \prod \limits _{\begin{array}{c} j = 1 \\ j \ne i \end{array}}^Q p\left( z_{j}^{l} = \overline{D} | \xi , p_{j}^{l} \right) {}^{i}{b}_\xi ^k \end{aligned}$$
and \(\eta \) is the normalization constant.

Minimizing \(J\) equivalently maximizes the rate of cumulative probability of target detection \(P_D\), which increases the probability of finding the target in a shorter time frame (Wong et al. 2005).
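On a discretized target belief, the impact-based objective and the Bayes fusion of communicated impacts can be sketched as follows (the grid size and likelihood fields below are illustrative, not the paper's configuration):

```python
import numpy as np

def fuse_impacts(belief, other_likelihoods):
    """Fold the no-detection likelihood fields communicated by the
    other agents into the local belief and renormalize (Bayes update)."""
    b = belief.copy()
    for L in other_likelihoods:
        b = b * L
    return b / b.sum()

def search_objective(belief, own_likelihoods):
    """Joint probability of no-detection along the agent's own planned
    observations, integrated over the (modified) target belief."""
    joint = np.ones_like(belief)
    for L in own_likelihoods:
        joint = joint * L
    return float(np.sum(joint * belief))

# toy usage on a 4x4 grid with a uniform prior: a constant 0.5
# no-detection field halves the joint no-detection probability
b = np.full((4, 4), 1.0 / 16.0)
J = search_objective(fuse_impacts(b, [np.ones((4, 4))]),
                     [np.full((4, 4), 0.5)])
```

In a full implementation, each likelihood field would come from the no-detection model evaluated along a planned trajectory.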

6.1.2 Comparison of PS, PCA and SCA

Three mission objectives are compared: (1) Pure Search (PS), (2) Pure Collision Avoidance (PCA) and (3) Search with Collision Avoidance (SCA). This comparison is important as it reveals the advantages and effectiveness of our algorithm.

Figure 5 shows simulation snapshots of the three mission objectives. The main advantages of our SCA algorithm are highlighted in Fig. 6 where its effectiveness in information gathering and collision avoidance are compared to PS and PCA. As is depicted in Fig. 6a, PS favors actions with a higher rate of information compared to PCA, whose objective has no correlation to information gathering. This can also be observed in Fig. 5e, j, where after 240 s into the mission, the PCA approach is left with a relatively large unexplored region compared to PS.
Fig. 6

Comparison of the 240-s mission outcome of PS, PCA and SCA objective functions. Our SCA algorithm is able to maintain the superior rate of information gathering of PS (a) while having the properties of collision avoidance of PCA (b)

Without considering inter-UAV collision constraints, PS results in several potential hazards during the mission. One instance is shown in Fig. 5b, where UAVs cluster in one region and collide. Figure 6b shows that for nearly 36 % of the mission there exists at least one pair of UAVs below the safety distance threshold, which is unacceptable for a real-life mission. The SCA algorithm, however, is able to maintain the superior information gathering rate of PS while adopting the properties of PCA by successfully satisfying the inter-UAV safety requirement. Figure 6b shows that for both PCA and SCA, the minimum distance \(d\) is kept above the mission-specified safety distance \(2R\) of 15 m.

In this particular case, SCA even outperformed PS in reaching the 99 % probability threshold earlier. Some UAVs are forced to turn back earlier due to collision constraints as they are about to cluster around the top right corner, as in Fig. 5l. This causes the UAVs to explore high-information regions earlier. However, this behavior is a side effect of our algorithm for this particular information gathering scenario and is not the main focus of our work. The main advantage is the ability to satisfy inter-agent constraints while maximizing information.

The behavior of the team negotiation process can be observed in Fig. 7. Within each planning horizon, the algorithm attempts to progressively minimize the team objective and eliminate collision constraint violations. At the end of each planning horizon, the constraints are feasible. These data show empirically that the computational efficiency of the algorithm is sufficient to find feasible solutions within a reasonable time window in practice.
Fig. 7

Gradient descent team negotiation behavior of SCA. Top half and bottom half show the mean of the search objective function and maximum constraint of the overall team respectively. The background shading distinguishes each planning horizon. Team negotiation gradually minimizes the search objective within each horizon. Constraints diminish at the end of each planning horizon, indicating feasible actions

6.1.3 Tightly constrained workspace

Figure 8 shows cases with increasing safety radius \(R\) relative to the total size of the workspace. Increasing \(R\) enlarges the safety region of each UAV, which in turn reduces the flexibility in planning a collision-free path, since the whole search space becomes more constrained. The UAVs are placed in initially collision-free positions in all cases. In this test, we are less interested in comparing information rates and instead focus on evaluating performance in satisfying the constraints, shown in Fig. 8d. Results show that the UAV distances are kept above their theoretical lower bound even as constraint difficulty increases.
Fig. 8

Comparison of the 120-s mission outcome of SCA algorithm with three different safety radii R (7.5, 15 and 25 m). As the safety radius increases, the proportion of the UAV safety regions to the mission search area increases, which increases the difficulty of obtaining a team collision-free plan due to a more constrained joint action space. Results in (d) show the effectiveness of our algorithm in successfully bounding these constraints while performing the search mission

6.1.4 Monte Carlo simulation

In this example, a Monte Carlo simulation is performed for PS, PCA and SCA to further validate their information gathering and constraint handling capabilities. The experimental setting remains the same as in Sect. 6.1.2. For each case, simulations are performed for 200 randomly generated initial target belief distributions. Each randomized initial belief is a mixture of \(N_G\) Gaussians, where \(N_G\) is a randomized value between \(1\) and \(10\). Each Gaussian is a two-dimensional normal distribution with mean \(\mu \) and standard deviation \(\sigma \). Variable \(\mu \) is generated from a uniform distribution within the search workspace while \(\sigma \) is a positive random value less than a quarter of the workspace diameter.

Figure 9a–c show the cumulative probability of target detection of PS, PCA and SCA, respectively. It can be observed that PS and SCA have more consistent information gathering curves, while the behavior of PCA is more random due to the lack of an information objective. The solid line on each plot is the mean information gathering curve over 200 runs and is superimposed on Fig. 9d for further comparison. Figure 9d clearly shows that the average information gathering capability of SCA behaves similarly to PS.
Fig. 9

Results of Monte Carlo simulation on 200 sets of randomly generated prior target beliefs. a–c Cumulative probability of target detection of PS, PCA and SCA respectively. The mean cumulative probability of detection for each case is displayed as the solid line and further superimposed on d

The constraint handling capability of each case is illustrated in Fig. 10. PS is seen to have the highest rate of constraint violation due to its lack of constraint handling capability. Furthermore, results show that PCA and SCA do not always produce feasible team solutions. Constraint violations occur mainly in difficult optimization conditions. The use of a constant-velocity motion model and the UAVs' maximum rate of turn can cause deadlock situations, such as cases where multiple UAVs head towards each other. Nevertheless, SCA outperformed PCA, achieving collision-free search missions in about 45 % of runs. The search objective of SCA implicitly spreads the UAVs to explore the workspace and maximize information. This reduces the probability of the team falling into difficult constrained situations, thus increasing the chances of the gradient-based optimizer solving for feasible team solutions.
Fig. 10

Constraint satisfaction analysis of results from Fig. 9. a–c show the ratio of constraint violation of PS, PCA and SCA respectively. PS is shown to have the highest rate of constraint violation due to its lack of constraint handling capability. SCA has the highest success rate, achieving a collision-free search mission in about 45 % of runs. Both PCA and SCA show at most a 12 % rate of constraint violations during search. These constraint violations are usually due to one or a combination of the following: (1) infeasible states (e.g. difficult scenarios where many UAVs head towards each other), (2) non-holonomic constraints and action saturations (i.e. maximum turn rate), and (3) insufficient optimization loops to arrive at a feasible solution

6.2 Mobile robots experiment

We have further validated the SCA algorithm using four iRobot-Create indoor mobile robots. The main purpose of the experiment is to verify the online capabilities of the algorithm on systems with limited computation and communication resources.

6.2.1 Experimental setup

Experiments are undertaken in a \(5\,\mathrm{m}\times 5\,\mathrm{m}\) indoor environment. Each robot is equipped with an acoustic sensor module, a ZigBee communication module and an on-board ARM-architecture processing module running at 72 MHz (Lal and Fitch 2009). Robot localization is achieved by measuring the acoustic waves generated at 2 Hz from three acoustic speakers located around the field. This information is filtered to produce a 2D planar position measurement of the on-board acoustic microphone. This noisy global position measurement, together with odometry measurements, is then incorporated into an Extended Kalman Filter (EKF) that estimates the mean and covariance of the robot's 2D position vector.

The robots operate at a speed of 0.05 m/s. The action horizon is 40 s, discretized into \(5\) uniform segments. The optimization frequency is 2 Hz and replanning occurs every 24 s, resulting in 48 iterations per planning horizon. Communication between robots operates on a point-to-point basis and information is exchanged at 1 Hz with a noisy delay between 0.1 and 2 s. The sensor model used here is the same as in the simulation experiments, with parameters \(P_{d_{max}}=0.85\), \(\sigma =2\) and \(d_{max}=0.75\) m. A pure-pursuit guidance controller is used for reference path following. The safety radius of each robot is set to 0.4 m, taking into account its physical size and expected EKF uncertainty bound.

6.2.2 Experimental results

Figure 11 shows snapshots from a 600-s experimental run. The first row shows the actual experiment in the camera image frame, while the second row is the simulation playback with recorded EKF localization solutions. For ease of visualization, the simulation playback is projected back onto the camera image frame. Note that the EKF-recorded robot safety discs are not centered exactly on the robots due to camera calibration error and EKF uncertainties.
Fig. 11

Snapshots of the 600-s four iRobot-Create target search experiment. a–e illustrate the actual experiment with target belief and safety regions projected into the image frame through camera calibration. f–j show the corresponding top view simulation playback. The smaller discs are the instantaneous safety regions of individual robots. The center of each robot's safety region is obtained from the on-board EKF solution

The performance of this experiment can be observed in Fig. 12. Figure 12a shows the information gathering curve. A high information rate, similar to the SCA simulation results (see Fig. 6a), is expected due to the presence of the search objective function. Figure 12b shows that the minimum distances between robot pairs are kept above the threshold throughout the experiment. The full experiment thus runs entirely without any reactive obstacle avoidance system or human intervention.
Fig. 12

Results of the multi-robot experiment. a shows the cumulative probability of target detection. The high information gathering rate is due to the minimization of the search objective function component. The constraints component successfully maintained a minimum distance between robots and between robots and stationary obstacles, as in b. The process of gradient descent team negotiation can be observed in c, where the top half and bottom half show the mean of the search objective function and maximum constraint of the overall team respectively. The background shading distinguishes each planning horizon. The value of the search objective function generally decreases while constraints are inactive. Constraints diminish at the end of each planning horizon, indicating feasible actions

Figure 12c shows the team optimization process. At the beginning of each planning time slot, the initial action vector of each robot is obtained through RRT planning without knowledge of the others' initial action vectors. This hardware experiment has a higher safety region to workspace area ratio compared to the SCA simulation in Sect. 6.1, resulting in a more constrained workspace. Because of this, constraints are more likely to be violated at the beginning of the planning stages compared to the SCA simulation results in Fig. 7. These constraint violations are subsequently minimized during the team optimization process. Furthermore, the objective function may increase while joint actions are optimized towards a feasible subspace. The objective function resumes in a descent direction once all constraints are feasible. It can be seen that no constraints are violated at the end of each planning time slot, resulting in continuous collision-free team trajectories.

7 Additional constraints

This section extends our approach to modeling inter-agent safety constraints to three additional constraint cases: (1) network integrity constraints, (2) stationary safety regions, and (3) temporal-coverage. Capturing these important spatial–temporal constraints highlights the strength and adaptability of the approach. The constraints in the first two cases are modeled based on worst-case analysis, following an approach similar to the inter-agent safety constraints. For temporal-coverage, we model the constraint analytically using forward reachable set analysis. Simulation results are subsequently presented for each constraint case.

7.1 Network integrity constraints

One of the most critical components in multi-agent systems is communication, especially for decentralized systems. Agents in a team are required to share information for decentralized data fusion and decentralized decision making. The network has to be connected in order for an agent to pass messages to other agents in the same network.

Agent \(i\) at time step \(t\) is assumed to have a communication region \(\Omega _i^t\), as depicted in Fig. 13. Any other agent within this communication region is able to receive information sent from agent \(i\). It is common to model a communication region purely as a function of Euclidean distance, where agent pairs within a distance threshold are able to establish a communication link. With this type of communication model, we can represent the communication region as a disc of radius \(r_i^t\) centered at \(p_i^t\), where \(p_i^t\) is the position vector of agent \(i\) at time \(t\). Agents are again assumed to be homogeneous, thus \(r_i^t=r_j^t=R_{c}\), where \(R_c\) is the required communication range.
Fig. 13

Maximum distance between two agents maintaining a constant communication link configuration. In this configuration, each agent is inside the communication region of the other. The theoretical distance upper bound can be obtained by considering the worst-case scenario with the agents' maximum rate of turn

Since constraints are imposed only at the end of each action segment, we must analyze the worst-case scenario while agents travel along the segments in order to maintain constant communication between agent pairs. Figure 13 depicts the worst-case scenario of agents \(i\) and \(j\) both traveling along their maximum curvature segment. From geometry, the distance upper bound is:
$$\begin{aligned} d_{{mc,upper}}&= \sup _{t \in \left[ t^k:t^{k+1} \right] } d_{ij}\left( t \right) \nonumber \\&= r + 2 l_{{CD}} \nonumber \\&= r + 2\left( R_{{mc}} - l_{{CO}} \right) , \end{aligned}$$
where \(R_{{mc}}\) and \(l_{{CO}}\) are the same as in Eq. (9). The modified range \(r\) applied at the end of each segment can be obtained by setting \(d_{{mc,upper}} = R_{c}\), yielding:
$$\begin{aligned} r = R_c - 2 R_{{mc}} \left( 1 - \cos {\frac{\Delta \psi _{{mc}}}{2}} \right) . \end{aligned}$$
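With the UAV parameters of Sect. 6.1.1 (5 m/s speed, \(30^{\circ }\)/s maximum turn rate, 2-s segments, so \(\Delta \psi _{{mc}}=60^{\circ }\)), the shrinkage of the enforced range is modest; a sketch:

```python
import numpy as np

def modified_comm_range(R_c, v, omega_max, dt):
    """Communication range enforced at segment endpoints so that the
    worst-case mid-segment separation stays within R_c. R_mc is the
    minimum turn radius; dpsi is the max heading change per segment."""
    R_mc = v / omega_max
    dpsi = omega_max * dt
    return R_c - 2.0 * R_mc * (1.0 - np.cos(dpsi / 2.0))

r = modified_comm_range(R_c=100.0, v=5.0,
                        omega_max=np.deg2rad(30.0), dt=2.0)
```

As the segment duration shrinks to zero (straight-line limit), the modified range recovers \(R_c\) exactly.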
In modeling the communication constraint function, we use Eq. (13) as a reference. The difference here is that we want to invoke communication constraints when the distance between agents is greater than the communication range. We approach this by defining an inverted normalized distance induced from the normalized distance
$$\begin{aligned} \hat{d}_{ij}^k = \frac{d_{ij}^k}{r} , \end{aligned}$$
and define:
$$\begin{aligned} \bar{d}_{ij}^{k} = {\left\{ \begin{array}{ll} \hat{d}_{ij}^k &{}\quad \text {if } \hat{d}_{ij}^k > 1 \\ 1 &{}\quad \text {if } \hat{d}_{ij}^k \le 1 . \end{array}\right. } \end{aligned}$$
The inverted normalized distance \(\bar{\delta }_{ij}\) is:
$$\begin{aligned} \bar{\delta }_{ij} = \left( \bar{d}_{ij} \right) ^{-1}. \end{aligned}$$
The communication constraint can now be represented using \(\bar{\delta }_{ij}\) instead of \(\bar{d}_{ij}\) :
$$\begin{aligned} G_{ij}^k = \frac{2}{\pi }\left( \arccos {\bar{\delta }_{ij}^k} - \bar{\delta }_{ij}^k \sqrt{1-\bar{\delta }_{ij}^{k 2}} \right) . \end{aligned}$$
Since this constraint model is now based on inverse distance, its partial derivative with respect to sensor states becomes:
$$\begin{aligned} \frac{\partial G_{ij}^k}{\partial s_{i}^k} = -\frac{r}{{{d}_{ij}^k}^2} h\left( \bar{\delta }_{ij}^k \right) \frac{\partial {d}_{ij}^k}{\partial s_{i}^k}. \end{aligned}$$
The shape of \(G_{ij}\left( \bar{\delta }_{ij} \right) \) and its gradient \(h\left( \bar{\delta }_{ij} \right) \) are shown in Fig. 14. Unlike the safety region constraint in Sect. 4, this inverted distance constraint reaches an upper bound when two agents are far apart and approaches zero when their distance is converging towards the communication range.
Fig. 14

a Communication constraint \(G_{ij}\) between two agents, and b its first derivative \(\frac{\partial G_{ij}}{\partial \bar{d_{ij}}}\) as a function of normalized distance. This constraint reaches an upper bound when two agents are out of communication range and approaches zero as distance decreases
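A sketch of this inverted-distance constraint, following the clamping and inversion definitions above:

```python
import numpy as np

def comm_constraint(d, r):
    """Communication constraint between an agent pair: zero while the
    pair is within the modified range r, growing towards an upper
    bound of 1 as the agents separate."""
    d_bar = max(d / r, 1.0)      # clamped normalized distance
    delta = 1.0 / d_bar          # inverted normalized distance
    return (2.0 / np.pi) * (np.arccos(delta)
                            - delta * np.sqrt(1.0 - delta ** 2))
```

Pairs inside the range incur no penalty, while the penalty saturates for pairs far out of range, matching the profile shown in Fig. 14.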

7.1.1 Simulation: network integrity

This section demonstrates the behavior of Search with Communication Constraints (SCC). The search area is 300 m × 300 m, and the prior target belief is a mixture of Gaussians. The communication range is 100 m. The team network topology is a chain of connected UAVs. The initial belief is designed to split the team towards two information-rich areas; without communication constraints, the chain can easily become disconnected.

Simulation snapshots are shown in Fig. 15. The top and bottom rows show the behavior of SCA and SCC, respectively. In Fig. 15g, the team tries to separate into two groups as in Fig. 15b. The communication constraints, however, successfully maintain the network topology. This can be seen in Fig. 16b, where the maximum distances between connected UAV pairs are upper bounded by the mission-required communication range. However, the presence of communication constraints clusters the team, limiting the flexibility of individual UAVs in exploring the field. As a result, the information gain of SCC lags behind SCA from \(t=90\) s, as can be observed in Fig. 16a.
Fig. 15

Top view snapshots of the 240-s five-UAV target search mission subject to inter-UAV collision constraints and communication constraints. The gray region at \(t=0\) s is the initial target belief represented as a mixture of Gaussians. The safety radius is \(R = 7.5\) m and the communication radius is \(R_c = 100\) m. a–e Search with Collision Avoidance (SCA): the absence of communication constraints allows agents to spread freely around the workspace. f–j Search with Communication Constraints (SCC): the team is tied in a chain of connected agents 3-2-4-1-5 (dark lines), which reduces the flexibility of the team
Fig. 16

Comparison of the 240-s mission outcome of SCA and SCC. a SCA reaches the information threshold earlier, as agents are freer to travel around the field compared to the more constrained SCC scenario. The upper-bounded distances between connected UAV pairs are shown in b

7.2 Stationary safety regions

Our approach to modeling inter-agent safety constraints can be extended to model static obstacles. Let \(\Omega _{o}\) be the safety region of a static obstacle \(o\), simplified and represented as a circular disc with radius \(R_o\) centered at \(p_o\). The constraint is imposed such that the safety region of agent \(i\) and static obstacle \(o\) shall not overlap, as in Fig. 17.
Fig. 17

Minimum shortest distance between an agent and a static obstacle in a collision-free configuration. In this configuration, agent \(i\)’s safety regions \(\Omega _i^k\) and \(\Omega _i^{k+1}\), and the modified safety region of static obstacle \(\Omega _o\) are marginally overlapping and thus collision-free. The theoretical distance lower bound can be obtained by considering the worst-case scenario with the agents’ maximum rate of turn

By analyzing this worst-case scenario, the distance lower bound between agent \(i\) and static obstacle \(o\) can be obtained from geometry:
$$\begin{aligned} d_{{mc,lower}}&= l_{{CE}} - l_{{CD}} \nonumber \\&= l_{{CE}} - \left( R_{{mc}}-l_{{CO}} \right) , \end{aligned}$$
where \(l_{{CO}}\) and \(l_{{AB}}\) are the same as in Eq. (9), and \(l_{{CE}}=\sqrt{\left( R + r_{o} \right) ^2 - \frac{1}{4}{l_{{AB}}}^2}\).
From Eq. (37), we can see that modifying the safety region of the agent and modifying the safety region of the obstacle are mathematically identical. Here, we choose to modify the safety region of static obstacles, as shown in Fig. 17. By setting \(d_{{mc,lower}} = R+R_{o}\), we can solve for the radius \(r_{o}\) of the modified obstacle disc:
$$\begin{aligned}&r_o = -R \quad + \nonumber \\&\quad \sqrt{ \left( R{+}R_o \right) ^2 {+} 2 R_{{mc}} \left( R{+}R_o{+}R_{{mc}} \right) \left( 1-\cos { \frac{\Delta \psi _{{mc}}}{2}} \right) }.\nonumber \\ \end{aligned}$$
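The enlarged obstacle radius can be computed directly from this expression; with \(\Delta \psi _{{mc}}=0\) (straight-line segments) it reduces to \(r_o=R_o\), as expected. The parameter values below are illustrative:

```python
import numpy as np

def modified_obstacle_radius(R, R_o, R_mc, dpsi_mc):
    """Worst-case enlargement of a static obstacle's safety radius so
    that endpoint-only constraint checks guarantee mid-segment
    clearance along a maximum-curvature segment."""
    return -R + np.sqrt((R + R_o) ** 2
                        + 2.0 * R_mc * (R + R_o + R_mc)
                        * (1.0 - np.cos(dpsi_mc / 2.0)))
```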
We could model the constraint using the ratio of overlapping discs. However, this constraint function would be more complicated, since the safety regions of the agent and the obstacle differ in size. Instead, we directly adapt the constraint function from Eq. (13). Let \(d_{io}^k= ||{p}_i^k-{p}_o||\) be the Euclidean distance between the center of agent \(i\) at time step \(k\) and the center of static obstacle \(o\). Rewriting the normalized distance based on Eq. (11) as
$$\begin{aligned} \hat{d}_{io}^k = \frac{d_{io}^k}{R+r_{{o}}} , \end{aligned}$$
and define:
$$\begin{aligned} \bar{d}_{io}^{k} = {\left\{ \begin{array}{ll} \hat{d}_{io}^k &{}\quad \text {if } \hat{d}_{io}^k<1 \\ 1 &{}\quad \text {if } \hat{d}_{io}^k \ge 1 . \end{array}\right. } \end{aligned}$$
The constraint function is thus:
$$\begin{aligned} G_{io}^k = \frac{2}{\pi }\left( \arccos {\bar{d}_{io}^k} - \bar{d}_{io}^k \sqrt{1-\bar{d}_{io}^{k 2}} \right) . \end{aligned}$$
This is equivalent to applying Eq. (13) on two uniform discs with mean radius \(\frac{1}{2}\left( R+r_o \right) \). Its partial derivative is:
$$\begin{aligned} \frac{\partial G_{io}^k}{\partial s_{i}^k} = \frac{1}{R+r_{o}} h\left( \bar{d}_{io}^k \right) \frac{\partial {d}_{io}^k}{\partial s_{i}^k} \,. \end{aligned}$$
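A minimal Python sketch of this saturated overlap constraint (the function name is ours; the derivative scaling \(h(\cdot)\) is omitted):

```python
import math

def obstacle_overlap_constraint(d, R, r_o):
    """Overlap-area constraint G_io: the normalized center distance
    is saturated at 1, so G is zero whenever the agent disc (radius
    R) and the inflated obstacle disc (radius r_o) do not overlap,
    and grows to 1 at full overlap."""
    d_hat = d / (R + r_o)      # normalized center distance
    d_bar = min(d_hat, 1.0)    # saturation: no penalty when separated
    return (2.0 / math.pi) * (
        math.acos(d_bar) - d_bar * math.sqrt(1.0 - d_bar ** 2)
    )
```

The saturation makes the constraint inactive (and its gradient zero) for all collision-free configurations, so only overlapping configurations contribute to the optimization.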

7.2.1 Simulation: a cluttered environment

We demonstrate the behavior of the team performing search in a cluttered environment that contains stationary obstacles of different sizes. Figure 18 shows snapshots of this scenario, with five stationary obstacles scattered in the field. From the traversed paths of the UAVs, it can be observed that the obstacles are avoided while search is performed. This is further depicted in Fig. 19, where the minimum distances of the UAVs to each obstacle are shown. All minimum distances remain bounded above the threshold throughout the mission.
Fig. 18

Top view snapshots of a 240-s, five-UAV target search mission with stationary obstacle avoidance. Five circular obstacles with different radii, represented by the circular regions, lie in the field. With static obstacle avoidance active, the UAVs plan paths that maximize search information while avoiding these restricted regions
Fig. 19

From top to bottom, minimum distances of UAVs to each stationary obstacle. Dashed lines are the mission-specified safety distances between UAVs and each obstacle. Due to the worst-case analysis, all minimum distances are kept above the theoretical lower bound

7.3 Temporal-coverage

There are many scenarios where agents must pass through a certain area of the operational field at a specific time. In remote sensing and reconnaissance, for example, agents may need to deliver cumulative sensor information to ground stations for analysis and data processing at fixed intervals. Agents must therefore account for temporal-coverage requirements while planning their future actions. In a receding horizon framework, one way to account for a temporal-coverage requirement at a future time \(T_s\) is to plan over an action horizon \(H\) long enough to include \(T_s\). However, a longer horizon increases the size of the problem (the action vector grows if the action time interval is held constant), making it more difficult to solve. We propose a novel approach that integrates temporal-coverage requirements into our multi-step look-ahead planning framework without varying the horizon length.

Consider the diagram shown in Fig. 20. Let \(\Omega _s\) be the temporal-coverage attraction region, represented as a circular region with radius \(R_s\) centered at \(p_s\), with required time of arrival \(T_s\). Let \(\Omega _i^k\) be the maximum forward reachable set of agent \(i\) from current time \(t^k\) to \(T_s\). To ensure that a feasible path exists for agent \(i\) to reach \(\Omega _s\) at the required time, \(\Omega _i^k\) must overlap, or at least marginally overlap, with \(\Omega _s\). This temporal-coverage constraint shares a similar intent with the communication constraint described in Sect. 7.1 in that it imposes a minimum distance requirement between two regions. The differences are that the shape of \(\Omega _i\) is irregular rather than circular, and that it varies as a function of the available time horizon. Note that this time horizon is the time available to reach the coverage region, which is independent of the action horizon \(H\). Constraining this temporal-coverage requirement at the end of a finite horizon trajectory ensures that every state along the trajectory has at least one feasible path that intercepts \(\Omega _s\) at time \(T_s\).
Fig. 20

An example of agent \(i\)’s maximum forward reachable set boundary. There exists at least one feasible path that leads the agent to \(\Omega _s\) at time step \(T_s\) if both regions are overlapping or marginally overlapping

A naive approach is to compute the full reachable set by uniformly sampling \(\Omega _i\) and plan for actions that ensure at least one sample lies inside \(\Omega _s\). Although this discrete form of the maximal reachable set can be pre-computed at the beginning of each replanning time slot, performing constraint checks over a large number of samples is relatively expensive. Instead of computing the full reachable set, we propose a method that models this constraint explicitly through feasibility analysis. For a path that satisfies the temporal-coverage requirement, the time available from \(t^k\) to \(T_s\) must be greater than or equal to the time required for agent \(i\) to reach \(\Omega _s\):
$$\begin{aligned} T_s-t^k \ge t_{\widehat{AB}} + t_{BC} - \frac{R_s}{V}, \end{aligned}$$
where \(t_{\widehat{AB}}\) and \(t_{BC}\) are the minimum times required to turn along arc \(\widehat{AB}\) and to travel from \(B\) towards \(p_s\), respectively. Rearranging terms yields:
$$\begin{aligned} V\left( \Delta t^k-t_{\widehat{AB}} \right) + R_s \ge d_{BC}, \end{aligned}$$
where \(\Delta t^k = T_s-t^k\) and \(d_{BC} = ||p_s-p_B||\).
Solving for \(t_{\widehat{AB}}\) and \(d_{BC}\) requires identifying the agent's shortest-path turning circle \(p_O\). The centers of the two candidate turning circles are:
$$\begin{aligned} \tilde{p}_O = p^k + R_{mc}\left[ \begin{array}{l} {\cos ({\psi ^k+\tilde{w}\frac{\pi }{2}})} \\ {\sin ({\psi ^k+\tilde{w}\frac{\pi }{2}})} \end{array} \right] , \quad \tilde{w} \in \{-1,1\}. \end{aligned}$$
The turning circle that yields the shortest path from \(A\) to \(C\) is:
$$\begin{aligned} \{p_O,w\} = \mathop {\arg \min } \limits _{\tilde{p}_O,\tilde{w}} ||\tilde{p}_O-p_s||. \end{aligned}$$
The term \(w\) describes the direction of the turn. Distance \(d_{BC}\) can be obtained as
$$\begin{aligned} d_{BC} = \sqrt{{d_{CO}}^2 - {R_{mc}}^2}, \end{aligned}$$
where \(d_{CO} = ||p_O-p_s||\).
To solve for \(t_{\widehat{AB}}\), two angles \(\alpha \) and \(\beta \) are needed:
$$\begin{aligned}&\alpha = \arcsin {\frac{R_{mc}}{d_{CO}}},\end{aligned}$$
$$\begin{aligned}&\beta = \arccos { \left( \hat{v_1} \cdot {\hat{v_2}} \right) }, \end{aligned}$$
where \(\hat{v_1} = \left[ \cos {\psi ^k},\sin {\psi ^k} \right] ^T\) and \(\hat{v_2} = \frac{p_s-p_O}{d_{CO}}\). The time to turn along arc \(\widehat{AB}\) is:
$$\begin{aligned} t_{\widehat{AB}} = \frac{\alpha +\beta }{\dot{\psi }_{\text {max}}}. \end{aligned}$$
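The turning-circle geometry above (circle selection, tangent distance \(d_{BC}\), and turn time \(t_{\widehat{AB}}\)) can be sketched in a few lines of Python. This is an illustrative routine with our own naming; it assumes \(d_{CO} \ge R_{mc}\), with the degenerate case treated separately in the text below.

```python
import math

def shortest_turn_geometry(p, psi, p_s, R_mc, psi_dot_max):
    """Pick the turning circle whose center is nearest the target
    p_s, then return the turn time t_AB and the straight-line
    tangent distance d_BC. Assumes d_CO >= R_mc."""
    # Candidate turning-circle centers on either side of the heading.
    candidates = []
    for w in (-1, 1):
        cx = p[0] + R_mc * math.cos(psi + w * math.pi / 2.0)
        cy = p[1] + R_mc * math.sin(psi + w * math.pi / 2.0)
        candidates.append(((cx, cy), w))
    # w describes the direction of the chosen turn.
    (p_O, w) = min(candidates,
                   key=lambda c: math.hypot(c[0][0] - p_s[0],
                                            c[0][1] - p_s[1]))
    d_CO = math.hypot(p_O[0] - p_s[0], p_O[1] - p_s[1])
    d_BC = math.sqrt(d_CO ** 2 - R_mc ** 2)   # tangent length
    alpha = math.asin(R_mc / d_CO)
    v1 = (math.cos(psi), math.sin(psi))       # current heading
    v2 = ((p_s[0] - p_O[0]) / d_CO, (p_s[1] - p_O[1]) / d_CO)
    dot = max(-1.0, min(1.0, v1[0] * v2[0] + v1[1] * v2[1]))
    beta = math.acos(dot)
    t_AB = (alpha + beta) / psi_dot_max
    return t_AB, d_BC
```

The dot product is clamped to \([-1,1]\) before \(\arccos\) to guard against floating-point round-off.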
Once we have \(d_{BC}\) and \(t_{\widehat{AB}}\), the inverted normalized distance function can be modeled by manipulating Eq. (44):
$$\begin{aligned} \hat{\delta }_{is}^k = \frac{V\left( \Delta t^k-t_{\widehat{AB}} \right) + R_s}{ d_{BC}}, \end{aligned}$$
$$\begin{aligned} \bar{\delta }_{is}^{k} = {\left\{ \begin{array}{ll} \hat{\delta }_{is}^k &{}\quad \text {if } \hat{\delta }_{is}^k<1 \\ 1 &{}\quad \text {if } \hat{\delta }_{is}^k \ge 1 . \end{array}\right. } \end{aligned}$$
The temporal-coverage constraint is thus:
$$\begin{aligned} G_{is}^k = \frac{2}{\pi }\left( \arccos {\bar{\delta }_{is}^k} - \bar{\delta }_{is}^k \sqrt{1-\bar{\delta }_{is}^{k 2}} \right) . \end{aligned}$$
Its partial derivative is more involved and is given in the “Temporal-coverage derivatives” section.
Special attention is required for Eq. (48). For the case
$$\begin{aligned} d_{CO} < R_{mc}, \end{aligned}$$
\(\alpha \) has no solution since \(p_s\) is inside the nearest turning circle. However, if the problem satisfies the condition
$$\begin{aligned} 2 R_{mc} \le R_{s}, \end{aligned}$$
we will thus have:
$$\begin{aligned}&d_{CO} < R_{mc} \nonumber \\&\therefore \quad d_{CO} + R_{mc} < 2R_{mc} \le R_{s}. \end{aligned}$$
Equation (56) guarantees that, under condition (55), the nearest turning circle is completely contained within \(\Omega _s\) whenever (54) holds. In this case a feasible path satisfying the temporal-coverage constraint always exists, and we can safely set the constraint \(G_{is}\) to zero.
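Combining the constraint above with the degenerate case of conditions (54)–(56) gives a compact evaluation routine. The sketch below uses our own function name; clamping a negative \(\hat{\delta }_{is}^k\) to zero (full violation) is our own choice for an out-of-range input, not specified in the text.

```python
import math

def temporal_coverage_constraint(dt, t_AB, d_BC, d_CO, V, R_s, R_mc):
    """Temporal-coverage constraint G_is. dt is the time remaining
    until T_s. If the nearest turning circle lies inside the
    coverage region (d_CO < R_mc, assuming 2*R_mc <= R_s), a
    feasible path always exists and G is zero."""
    if d_CO < R_mc:
        return 0.0
    # Inverted normalized distance: >= 1 means ample time slack.
    delta_hat = (V * (dt - t_AB) + R_s) / d_BC
    delta_bar = min(delta_hat, 1.0)
    if delta_bar < 0.0:   # our choice: saturate full violation at 1
        delta_bar = 0.0
    return (2.0 / math.pi) * (
        math.acos(delta_bar) - delta_bar * math.sqrt(1.0 - delta_bar ** 2)
    )
```

As with the collision constraints, the constraint value and its gradient vanish whenever the temporal requirement is already satisfiable, so the optimizer is only steered when the deadline is at risk.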

7.4 Simulation: ground station temporal-coverage

In this final scenario, we demonstrate the ability of our algorithm to account for temporal-coverage constraints. A team of \(Q=5\) UAVs with ids \(i \in \{1,2,\ldots ,Q\}\) is assigned to a decentralized search operation. A ground station monitoring the operation is located at the center of the map with a coverage radius of \(R_s = 50\) m. UAVs within this coverage region are assumed to be able to report collected observations to the ground station. In addition to the search mission objective previously defined in Eq. (28), each UAV is required to pass by the ground station at \(H_s = 40\) s intervals in a predefined sequential order. This requirement can be modeled as a temporal-coverage constraint and integrated into decentralized decision making. Each UAV \(i\) is initially assigned the value \(T_s^i = i \times H_s\), and this parameter is incremented by \(Q \times H_s\) after each ground station revisit.
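The revisit-time bookkeeping described above amounts to simple arithmetic. A minimal sketch (function name and dictionary layout are ours):

```python
def revisit_schedule(Q, H_s, num_rounds=2):
    """Sequential revisit times per UAV id i in 1..Q: the first
    visit is at i * H_s, and each subsequent visit comes Q * H_s
    after the previous one."""
    return {i: [i * H_s + r * Q * H_s for r in range(num_rounds)]
            for i in range(1, Q + 1)}
```

For \(Q=5\) and \(H_s=40\) s, UAV 1 is scheduled at 40 s and again at 240 s, consistent with the pass-by ordering shown in Fig. 21.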

The simulation snapshots are illustrated in Fig. 21. Note the successful pass-bys of the UAVs through the ground station coverage region at the predefined time slots. The constraint allows the team to gather information freely around the workspace, but restricts motion away from the ground station as the revisit time approaches. Figure 22 shows that the distance between each UAV and the ground station falls below the coverage radius at the required revisit time slots, indicated by the triangular markers.
Fig. 21

Top view snapshots of a 240-s target search mission with a pre-specified temporal-coverage constraint. The circular region represents the receiver zone of a ground station located at the center of the map. Each UAV is required to sequentially pass by this region every 40 s. (a)–(f) show successful pass-bys of UAVs 1, 2, 3, 4, 5, and 1, respectively
Fig. 22

Distances of UAVs to the ground station. The horizontal line represents the radius of the ground station receiver zone. The triangular markers depict the time slots at which the commanded UAVs were required to pass by the ground station region. Markers below the threshold indicate successful attempts

8 Discussion and future work

We have presented a novel set of spatial–temporal constraints for decentralized multi-agent information gathering tasks. We showed how these constraints can be efficiently integrated into an online constrained optimization framework and demonstrated the behavior of our algorithms through simulated examples of multi-UAV trajectory planning for the target search problem. To validate computational and communication efficiency for practical applications, we performed experiments with a team of indoor mobile robots with modest onboard computation and communication resources.

In this work, we demonstrated coordination in the context of probabilistic search, which is an important information gathering problem with many practical applications. However, our constraint models can also be directly applied to other information gathering objectives such as mapping and tracking.

Our current constraint models are restricted to 2D constant-velocity motion along constant-curvature segments. Important avenues for future work include extending the collision constraint model to 3D space and allowing velocity and curvature to vary along a segment. These extended models could then be applied to many 3D trajectory planning problems, such as 3D collision-free path planning. In particular, ellipsoidal safety regions are important to consider because localization uncertainty is usually represented as a Gaussian ellipsoid. Many missions and approaches can predict the future localization uncertainty of a platform along its planned trajectory, and a generalized collision constraint model could exploit this information during planning. With such a model, it may be possible to represent safety constraints as soft constraints, in which case the optimization problem can be formulated as the minimization of a risk-based reward function. It would also be useful to generalize the constraint model to handle different forms of safety regions and other indoor obstacles such as straight wall edges.

The current form of the network connectivity constraint is based on a constant network topology. This constant topology highly restricts the team's flexibility, since the network is rigidly connected. Extending this model to handle dynamic network topologies would greatly improve team planning flexibility, potentially yielding better information gathering performance. For the temporal-coverage constraint, the current model is derived assuming the absence of other spatial–temporal constraints that could affect the shape of the agent's reachable set. Extensions are necessary to generalize the model to arbitrarily shaped reachable sets. Furthermore, we would like to consider dynamic combinations of constraint models to yield more sophisticated coordination. For example, a network-connected team could decide to split and then regroup at a particular region at a specific time, accepting temporary network disconnection in exchange for a better team reward. When the team splits, the network connectivity constraint could be replaced by a temporal-coverage constraint, then re-activated once the team regroups.

Supplementary material

$$\begin{aligned} s^{k+1} = s^{k} + \left[ \begin{array}{l} \frac{V}{u^k} \left( \sin \left( \psi ^{k}+\Delta \psi ^k \right) -\sin {\psi ^{k}} \right) \\ \frac{V}{u^k} \left( -\cos \left( \psi ^{k}+\Delta \psi ^k \right) +\cos {\psi ^{k}} \right) \\ \Delta \psi ^k \end{array} \right] , \end{aligned}$$

$$\begin{aligned} \frac{\partial {s}^{k+1}}{\partial u^k} = \left[ \begin{array}{l} \frac{V\Delta t^k \cos \psi ^{k+1} - x^{k+1} + x^{k}}{u^k} \\ \frac{V\Delta t^k \sin \psi ^{k+1} - y^{k+1} + y^{k}}{u^k} \\ \Delta t^k \end{array} \right] . \end{aligned}$$

$$\begin{aligned} s^{k+2} = {s}^{k+1} + \left[ \begin{array}{l} \frac{V}{u^{k+1}}\left( \sin \left( \psi ^{k+1} + \Delta \psi ^{k+1} \right) - \sin \psi ^{k+1} \right) \\ \frac{V}{u^{k+1}}\left( -\cos \left( \psi ^{k+1} + \Delta \psi ^{k+1} \right) + \cos \psi ^{k+1} \right) \\ \Delta \psi ^{k+1} \end{array} \right] . \end{aligned}$$

$$\begin{aligned} \frac{\partial {s}^{k+2}}{\partial u^k} = \frac{\partial {s}^{k+1}}{\partial u^k} + \frac{\partial \psi ^{k+1}}{\partial u^k}\left[ \begin{array}{c} -\left( y^{k+2} - y^{k+1} \right) \\ x^{k+2} - x^{k+1} \\ 0 \end{array} \right] . \end{aligned}$$

$$\begin{aligned} \frac{\partial {s}^{k+1:k+N}}{\partial v^k} = \left[ \begin{array}{l@{\quad }l@{\quad }l@{\quad }l} \frac{\partial {s}^{k+1}}{\partial u^k} &{} 0 &{} \ldots &{} 0 \\ \frac{\partial {s}^{k+2}}{\partial u^k} &{} \frac{\partial {s}^{k+2}}{\partial u^{k+1}} &{} 0 &{} \vdots \\ \vdots &{} \frac{\partial {s}^{k+3}}{\partial u^{k+1}} &{} \ddots &{} 0 \\ \frac{\partial {s}^{k+N}}{\partial u^k} &{} \ldots &{} \frac{\partial {s}^{k+N-1}}{\partial u^{k+N-2}} &{} \frac{\partial {s}^{k+N}}{\partial u^{k+N-1}} \end{array} \right] , \end{aligned}$$

$$\begin{aligned} J_i\left( v_i^k, \alpha _{\mathcal {J}^{-i} i}^k \right) = \int \limits _{\xi } P \left( z_{i}^{k+1:k+N}=\overline{D}\,|\, \xi ,s_{i}^{k+1:k+N} \right) {}^{i}\bar{b}^k_\xi \, d\xi . \end{aligned}$$

$$\begin{aligned} P\left( {z}^{k+1:k+N}=\overline{D}\,|\,\xi ,s^{k+1:k+N} \right) &= \prod \limits _{l=1}^N P\left( {z}^{k+l}=\overline{D}\,|\,\xi ,s^{k+l} \right) \\ &= \prod \limits _{l=1}^N O\left( \xi ,s^{k+l} \right) . \end{aligned}$$

$$\begin{aligned} \frac{\partial J_i}{\partial v_i^k} = \left[ \frac{\partial J_i}{\partial u_i^k}, \ldots ,\frac{\partial J_i}{\partial u_i^{k+N-1}} \right] ^T . \end{aligned}$$

$$\begin{aligned} \frac{\partial J_i}{\partial u_i^{k+m}} &= \int \limits _\xi \sum \limits _{n=m+1}^{N} \frac{\partial O\left( s_i^{k+n},\xi \right) }{\partial u_i^{k+m}} \prod \limits _{\begin{array}{c} l=1 \\ l \ne n \end{array}}^{N} O\left( s_i^{k+l},\xi \right) {}^{i}\bar{b}_\xi ^k \, d\xi \\ &= \int \limits _\xi \sum \limits _{n=m+1}^{N} \left( \frac{\partial O\left( s_i^{k+n},\xi \right) }{\partial {s_i}}\frac{\partial s_i^{k+n}}{\partial u_i^{k+m}} \right) \prod \limits _{\begin{array}{c} l=1 \\ l \ne n \end{array}}^{N} O\left( s_i^{k+l},\xi \right) {}^{i}\bar{b}_\xi ^k \, d\xi , \end{aligned}$$

$$\begin{aligned} O\left( \xi , s \right) = 1 - P_{d_{max}} e^{-\sigma \left( \frac{d}{d_{max}} \right) ^2}. \end{aligned}$$

$$\begin{aligned} \frac{\partial O\left( \xi ,s \right) }{\partial {p}} = \frac{2\sigma }{d_{\max }^2}\left( {p} - \xi \right) \left( 1-O\left( \xi ,s \right) \right) , \end{aligned}$$

$$\begin{aligned} \frac{\partial O\left( \xi ,s \right) }{\partial {s}} = \left[ \frac{\partial O\left( \xi ,s \right) }{\partial {p}}, 0 \right] . \end{aligned}$$

Temporal-coverage derivatives

$$\begin{aligned} \frac{\partial G_{is}^k}{\partial s_{i}^k} = h\left( \bar{\delta }_{is}^k \right) \frac{\partial {\bar{\delta }}_{is}^k}{\partial s_{i}^k} , \end{aligned}$$

$$\begin{aligned} \frac{\partial \delta }{\partial s} = -\frac{R_{mc}}{d_{BC}}\left( \frac{\partial \alpha }{\partial s} + \frac{\partial \beta }{\partial s} \right) - \frac{d_{CO}\, \delta }{{d_{BC}}^2} \frac{\partial d_{CO}}{\partial s} . \end{aligned}$$

$$\begin{aligned} \frac{\partial d_{CO}}{\partial s} = \left[ \begin{array}{c} \frac{p_O-p_s}{d_{CO}} \\ \frac{\left( x_O-x \right) \left( y-y_s \right) +\left( y_O-y \right) \left( x_s-x \right) }{d_{CO}} \end{array} \right] , \end{aligned}$$

$$\begin{aligned} \frac{\partial \alpha }{\partial s} = -\frac{R_{mc}}{{d_{CO}}^2\cos {\alpha }} \frac{\partial d_{CO}}{\partial s}, \end{aligned}$$

$$\begin{aligned} \frac{\partial \beta }{\partial s} = \left[ \begin{array}{c} \frac{{d_{CO}}^2\cos {\psi }-a_1(x_s-x_O)}{{d_{CO}}^3\sin {\beta }} \\ \frac{{d_{CO}}^2\sin {\psi }-a_1(y_s-y_O)}{{d_{CO}}^3\sin {\beta }} \\ \frac{a_2\, d_{CO} - wR_{mc}\, d_{CO} + a_1 \frac{\partial d_{CO}}{\partial \psi }}{{d_{CO}}^2\sin {\beta }} \end{array} \right] , \end{aligned}$$

where

$$\begin{aligned} a_1 = (x_s-x_O)\cos {\psi } + (y_s-y_O)\sin {\psi }, \qquad a_2 = (x_s-x_O)\sin {\psi } + (y_s-y_O)\cos {\psi }. \end{aligned}$$

Copyright information

© Springer Science+Business Media New York 2014