Introduction

The demand for multi-robot systems (MRSs) is increasing, due to their performance, flexibility, and fault tolerance [1, 2]. MRSs have been deployed successfully in a range of domains, such as fulfilment centres [3], fruit fields [4], and roads [5]. For safe and robust multi-robot coordination in the real world, it is often desirable to consider formal models of the MRS, which enable policy synthesis for well-defined objectives, as well as a formal analysis of such policies. In this review paper, we consider formal models that capture the task-level behaviour of the MRS. These model high-level capabilities such as navigation or manipulation whilst abstracting the lower-level control required to implement these capabilities. Formal models are used alongside multi-robot planning [6] and reinforcement learning (RL) [7] techniques to synthesise robot behaviour, and alongside model checking [8] and simulation [9] techniques to evaluate task-level metrics of multi-robot performance. However, the success of these techniques is limited by the model's accuracy, in particular its capacity to capture and predict execution-time multi-robot behaviour [10]. For example, if we plan on an inaccurate model, our expectations of robot behaviour during planning diverge from what is observed during execution, which can lead to inefficient execution-time behaviour or, in the worst case, robot failure.

In this paper, we focus on modelling the stochasticity of MRSs as, in any environment, robot behaviour is affected by the stochastic dynamics of the environment and the other robots. For example, a mobile robot operating in an office may fail to navigate upon a door being closed unexpectedly, or it may be unable to dock at a charging station if another robot is charging for longer than expected. We begin by introducing the types of uncertainty encountered by MRSs, including uncertainty over action outcomes [11], a robot’s current state [12], and the duration and start time of robot actions [13, 14••]. Next, we review modelling formalisms which capture these sources of uncertainty. We then describe how formal multi-robot models have been used to support advances in the application of planning, RL, model checking, and simulation techniques to MRSs.

Uncertainty in Multi-Robot Systems

In this section, we outline the common forms and sources of uncertainty experienced by MRSs.

Outcome Uncertainty

Robot uncertainty is most commonly captured over discrete action outcomes [11], such as whether a grasp action is executed successfully. Stochastic outcomes can occur due to robot navigation failure [15], battery depletion [16], or stochastic features of the environment such as hazards [17], resources [18], and doors [19].

Partial Observability

In some MRSs, robots only partially observe the environment, which prevents them from knowing each other’s states. This is often caused by limited communication and sensing capabilities, such as imperfect localisation [20], limited network range [21], or object occlusion [22]. Under partial observability, robots form a belief over the true state of the environment and other robots using possibly noisy observations obtained from sensors.
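
To make belief tracking concrete, the sketch below shows a minimal discrete Bayes filter of the kind that underlies belief updates under partial observability (and in the POMDP models discussed later). The two-state door example and all probabilities are invented for illustration.

```python
# Minimal discrete Bayes filter: a robot maintains a belief (a probability
# distribution over states) and updates it from a noisy observation.
# The two-state "door" example and all probabilities are hypothetical.

def update_belief(belief, obs, transition, observation):
    """One predict-update cycle of a discrete Bayes filter."""
    states = list(belief)
    # Predict: push the belief through the (stochastic) transition model.
    predicted = {
        s2: sum(belief[s1] * transition[s1][s2] for s1 in states)
        for s2 in states
    }
    # Update: weight by the likelihood of the observation, then normalise.
    unnorm = {s: predicted[s] * observation[s][obs] for s in states}
    z = sum(unnorm.values())
    return {s: p / z for s, p in unnorm.items()}

# Hypothetical example: is a door open or closed?
belief = {"open": 0.5, "closed": 0.5}
transition = {"open": {"open": 0.9, "closed": 0.1},
              "closed": {"open": 0.1, "closed": 0.9}}
observation = {"open": {"see_open": 0.8, "see_closed": 0.2},
               "closed": {"see_open": 0.3, "see_closed": 0.7}}
belief = update_belief(belief, "see_open", transition, observation)
print(belief)  # belief shifts towards "open"
```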

Temporal Uncertainty

Sources of temporal uncertainty affect the duration and start time of robot actions during execution [13, 23, 24]. Temporal uncertainty occurs in almost any robot environment, as action durations are affected by environmental disturbances, such as unknown obstacles or adverse weather conditions. For example, a mobile robot's tyre may slip on a carpet whilst navigating through an office, slowing it down. Furthermore, robots may have to wait for stochastic temporal processes in the environment, such as order arrival in a fulfilment centre, before beginning task execution [25].

The Effect of Robot Interactions

A particularly relevant driver of uncertainty in MRSs is the fact that robots typically share resources, such as space or access to a charging station, and must interact with each other [14••]. For example, when multiple mobile robots navigate in the same physical space simultaneously, they may experience congestion, which increases uncertainty over action duration [23]. Alternatively, a robot manipulator may be more likely to fail a grasp if another robot is nearby, restricting its movement.

Formal Multi-Robot Models

In this section, we review modelling formalisms for MRSs, which we summarise in Table 1. At its foundation, each of these models consists of states, which describe a snapshot of the MRS and its environment, and transitions between states, which define the system dynamics.

Table 1 A summary of multi-robot modelling formalisms

Classical Multi-Robot Models

Joint transition systems (JTSs) model MRSs with deterministic dynamics [10, 38,39,40,41]. JTS states are often factored into local states for each robot, e.g. their location and battery level, and a shared set of global state features, such as whether doors in the environment are open. JTSs are fully deterministic and so fail to capture the stochastic dynamics of real robot environments. Multi-agent Markov decision processes (MMDPs) are a natural extension of JTSs to stochastic domains [6]. Similar to JTSs, MMDPs capture robots in a joint state and action space, but MMDP actions have probabilistic outcomes. MMDPs are a common formalism for MRSs and have been used to model drone fleets [42••], warehouse robots [25], and human–robot teams [43]. MMDPs and JTSs assume synchronous execution, i.e. robots execute their actions in lockstep, and all actions have the same duration. Furthermore, the joint state and action spaces yield an exponential blow-up in the number of robots being modelled. In practice, robot action durations are inherently continuous and uncertain, and robot interactions contribute to this uncertainty [14••, 23, 24, 44, 45]. Thus, to accurately capture multi-robot behaviour, we require formalisms which model asynchronous multi-robot execution and uncertainty over action duration. One approach for explicitly doing this is to use continuous-time Markov models, which we discuss later in this section.
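
For reference, an MMDP is commonly written (we follow the general form of [6]; notation varies across the cited works) as a tuple

\[ \mathcal{M} = \langle n, S, A, T, R \rangle, \qquad S = S_1 \times \dots \times S_n \times S_g, \qquad A = A_1 \times \dots \times A_n, \]

where $n$ is the number of robots, $S_i$ is robot $i$'s local state space, $S_g$ holds the shared global state features, $A_i$ is robot $i$'s action space, $T : S \times A \times S \to [0,1]$ is the probabilistic transition function, and $R : S \times A \to \mathbb{R}$ is the reward function. A JTS is the special case in which $T$ is deterministic, and the exponential blow-up mentioned above is visible directly in the product form of $S$ and $A$.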

Avoiding the Exponential Scalability of Joint Models

The number of MMDP or JTS states and actions increases exponentially in the number of robots [6], which makes optimal solutions for planning [46], RL [47], and model checking [10] intractable. Scalability can be improved by making assumptions which simplify the model; indeed, there has been a significant research effort to identify realistic assumptions for specific multi-robot problems. Transition-independent MMDPs (TI-MMDPs) [26] and constrained MMDPs (CMMDPs) [27•] assume the transition dynamics of each robot are independent, but couple the MRS through rewards and shared resources, respectively. Team MMDPs [28] also treat the transition dynamics independently, modelling robots sequentially in the context of simultaneous task allocation and planning problems. Transition independence assumptions allow for weakly coupled models that operate outside of the joint state and action space and reduce the model size, thus facilitating the use of more efficient solution methods. However, in cases where execution-time robot interactions affect the outcome and duration of robot actions, the transition-independent models above are unable to accurately reflect the MRS.
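
Concretely, transition independence means the joint transition function factorises per robot:

\[ T(s' \mid s, a) = \prod_{i=1}^{n} T_i(s'_i \mid s_i, a_i), \]

so each robot's dynamics can be represented and solved in its own state space, with coupling entering only through the reward function (TI-MMDPs) or shared resource constraints (CMMDPs).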

For many multi-robot problems, robots can act independently for the majority of execution, as interactions are sparse. For example, two robots conducting a handover can ignore each other until they are close. Interaction-driven Markov games (IDMGs) [29] and decentralised sparse interaction MDPs (Dec-SIMDPs) [30, 48] exploit this to reduce the space complexity whilst still accounting for execution-time interactions. IDMGs and Dec-SIMDPs are equivalent and capture an MRS using an independent MDP per robot and a set of interaction MMDPs, which define joint MRS behaviour in interaction areas, such as near a doorway. Though interaction MMDPs are joint models, they are significantly smaller than the full MMDP, as they are defined over only a small fraction of the full MMDP state space. However, these models are only useful when interactions are localised to a small, fixed part of the environment. If this does not hold, they become equivalent to the full MMDP.
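
The execution-time structure these models exploit can be sketched as follows. This is a minimal illustration of the idea rather than an implementation from [29, 30]; the rectangle-based area test and the policy interfaces are hypothetical.

```python
# Sketch of decentralised execution with sparse interactions: each robot
# follows its own MDP policy, and a joint (interaction MMDP) policy takes
# over only inside a predefined interaction area, e.g. near a doorway.
# All names and the rectangle-based area test are hypothetical.

INTERACTION_AREAS = [((3, 3), (5, 5))]  # e.g. grid cells around a doorway

def in_interaction_area(*positions):
    """True if every robot involved is inside some shared area."""
    return any(
        all(lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1] for (x, y) in positions)
        for (lo, hi) in INTERACTION_AREAS
    )

def choose_actions(local_policies, joint_policy, states):
    if in_interaction_area(*states):
        # Coordinate: act on the joint state via the interaction MMDP policy.
        return joint_policy[tuple(states)]
    # Otherwise each robot acts independently on its local state.
    return tuple(pi[s] for pi, s in zip(local_policies, states))

# Hypothetical usage with dictionary policies:
pi1 = {(0, 0): "north"}; pi2 = {(9, 9): "west"}
pi_joint = {((4, 4), (4, 5)): ("wait", "go")}
print(choose_actions([pi1, pi2], pi_joint, [(0, 0), (9, 9)]))  # independent
print(choose_actions([pi1, pi2], pi_joint, [(4, 4), (4, 5)]))  # coordinated
```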

Finally, a commonly used approach to avoid the use of joint models whilst still considering robot dependencies and execution-time interactions is to model the MRS as a set of single-robot models that are extended to include some knowledge of the other robots. In [25, 31], spatial task allocation problems (SPATAPs) are modelled using single-robot models which aggregate the response of the other robots. The aggregate response is represented as a distribution which predicts whether any robot is present at a given location. This is computed by combining individual distributions over each robot's location and allows robots to predict which tasks will be handled by other robots during planning. A similar approach is taken in [23], where an MRS is modelled using single-robot time-varying Markov automata (TVMA) which capture the probabilistic effects of congestion caused by the other robots. In this context, congestion is represented as a distribution over the number of robots present in each area of the environment, and distributions of navigation duration in the presence of a specific number of robots are obtained from real-world multi-robot navigation data. To solve multi-robot planning problems, [24] augment single-robot models with a cost function which captures the effects of robot interactions. This cost function is then adjusted iteratively during planning to encourage robot collaboration.
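
As a worked example of the aggregation idea used in the SPATAP models above: if the robots' locations are modelled as independent per-robot distributions, the probability that at least one robot occupies a location follows by combining them. A minimal sketch, with invented numbers and an independence assumption that is ours, not taken from [25, 31]:

```python
# Aggregate-response sketch: combine independent per-robot location
# distributions into the probability that any robot occupies a location.
# The three distributions below are hypothetical.

from math import prod

def prob_any_robot_at(location, robot_location_dists):
    """P(at least one robot at `location`), assuming independence."""
    return 1.0 - prod(1.0 - d.get(location, 0.0) for d in robot_location_dists)

dists = [{"A": 0.7, "B": 0.3},
         {"A": 0.1, "B": 0.9},
         {"A": 0.2, "C": 0.8}]
print(prob_any_robot_at("A", dists))  # 1 - 0.3*0.9*0.8 = 0.784
```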

Partially Observable Multi-Robot Models

Partially observable MDPs (POMDPs) are widely used to model partially observable problems, where robots make observations which update their belief over their current state [12]. Decentralised POMDPs (Dec-POMDPs) extend POMDPs to multi-robot settings [32], where each robot has its own set of local observations. Dec-POMDPs have been used for warehouse robotics [49], cooperative package delivery [34], and teams of unmanned aerial vehicles [50]. If the combined local observations of the robots uniquely identify the joint state, Dec-POMDPs reduce to Dec-MDPs, which are easier to solve [32]. However, these are still joint models, and optimal solvers for both Dec-POMDPs and Dec-MDPs have even higher time complexity than MMDP solvers [32]. To reduce the space complexity related to the joint modelling in Dec-POMDPs, [51, 52•] consider decoupling them into local POMDPs for each robot. For each of these local POMDPs, they compute a distribution which captures how external state factors, including the states of the other robots, influence its local state. This distribution is then used to marginalise out the external state factors to construct single-robot POMDPs. This influence-based abstraction produces smaller models. However, computing influence distributions is intractable in general [52•].

Another class of relevant POMDP-based models are macro-action Dec-POMDPs (MacDec-POMDPs) [33••] and decentralised partially observable semi-MDPs (Dec-POSMDPs) [34], which consider macro actions that execute a series of primitive low-level actions, such as moving one grid cell forward. This hierarchical paradigm is based on the options framework [53] for MDPs and has two main benefits. First, it reduces model size by leveraging existing behaviour, such as navigation, and modelling behaviour at the macro-action level, rather than at each time step. Second, the use of temporally extended actions seamlessly enables asynchronous action execution. Each MacDec-POMDP and Dec-POSMDP has an underlying Dec-POMDP which captures the low-level actions that form the macro actions. For MacDec-POMDPs, the underlying Dec-POMDP and the policies for each macro action are assumed to be known [54]. MacDec-POMDP policies can then be evaluated by unrolling the macro actions on the low-level Dec-POMDP. Unlike MacDec-POMDPs, Dec-POSMDPs capture macro actions using distributions over their completion time, where Dec-POSMDP policies can be evaluated through simulation.
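
A macro action in the options sense pairs a low-level policy with a termination condition. The deterministic grid-navigation example below is a hypothetical illustration of why macro actions are temporally extended: different calls take different numbers of primitive steps, which is precisely what makes multi-robot execution asynchronous at the macro-action level.

```python
# Sketch of a macro action (an "option"): a local policy over primitive
# low-level actions plus a termination condition. Names are hypothetical.

def navigate_to(goal):
    """Macro action: step one grid cell towards `goal` until it is reached."""
    def policy(cell):
        (x, y), (gx, gy) = cell, goal
        if x != gx:
            return "east" if gx > x else "west"
        return "north" if gy > y else "south"
    def terminated(cell):
        return cell == goal
    return policy, terminated

STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}

def execute(macro, cell):
    """Unroll a macro action into primitive actions (deterministic here)."""
    policy, terminated = macro
    steps = 0
    while not terminated(cell):
        dx, dy = STEP[policy(cell)]
        cell = (cell[0] + dx, cell[1] + dy)
        steps += 1
    return cell, steps  # different calls take different numbers of steps

print(execute(navigate_to((3, 2)), (0, 0)))  # ((3, 2), 5)
```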

Continuous-Time Multi-Robot Models

Several models take uncertainty over action durations into account for MRSs that evolve asynchronously. These make use of continuous-time distributions which capture the stochasticity in robot action durations. Continuous-time MDPs (CTMDPs) extend MDPs to include durative transitions represented as exponential delays [35] and have been used to model multi-robot data collection problems [55]. To model asynchronous multi-robot execution, CTMDPs can be defined over a joint state and action space, similar to MMDPs. Thus, as with MMDPs, they scale exponentially in the number of robots. To mitigate this, [55] constructs single-robot CTMDPs assuming transition independence, similar to [26, 27•]. The duration of each action in a CTMDP is modelled with a single exponential distribution. This is a convenience that allows for simpler solution approaches exploiting the memoryless property of the exponential distribution, but it limits the accuracy with which we can capture robot action durations.
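
The memoryless property referred to here is easy to state: if an action's duration is exponential, $D \sim \mathrm{Exp}(\lambda)$, then

\[ \Pr(D > s + t \mid D > s) = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}} = e^{-\lambda t} = \Pr(D > t), \]

so the model never needs to track how long an action has already been running. This is exactly the property that more realistic duration distributions break, which is why they require the heavier machinery discussed next.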

Many multi-robot models can capture heterogeneous MRSs (see Table 1), where robots differ in their capabilities, resource usage, and so on. This is often achieved using local action spaces or reward functions for each robot. Generalised stochastic Petri nets (GSPNs) [36] are a modelling formalism for homogeneous MRSs, i.e. MRSs where all robots are identical, which allows robots to be represented anonymously as tokens. Furthermore, as in CTMDPs, durations are restricted to exponentials. GSPNs remain exponential in the team size, but robot anonymity provides a practical reduction in the number of states. GSPNs have been used to model teams of football robots [56], autonomous haulers [57], and monitoring robots [58]. Generalised semi-MDPs (GSMDPs) can capture concurrent execution and stochastic durations and have been applied to MRSs in [37, 44], but they are complex to define and hard to solve, as GSMDPs allow for arbitrary duration distributions. Multi-robot Markov automata (MRMA) [14••] also allow for arbitrary duration distributions to capture asynchronous multi-robot execution in continuous time. Markov automata (MA) extend MDPs and CTMDPs by explicitly separating instantaneous robot action choice from the duration of robot actions [59]. MRMA are joint models, where robot action durations are represented as phase-type distributions (PTDs), which are sequences of exponentials capable of capturing any nonnegative distribution to an arbitrary level of precision [60]. In an MRMA, there is a different duration distribution for each spatiotemporal situation an action may be executed under, referred to as its context, which captures the effects of robot interactions on action execution. By separating robot decision-making from action duration, robot interactions can be detected at the instant an action is triggered by analysing the joint MRMA state. MRMA are connected to other continuous-time multi-robot models. First, GSPN semantics can be described with an MA [61]. Second, a standard solution for GSMDPs involves converting all duration distributions into PTDs [60], which produces a model similar to an MRMA [37]. However, MRMA are simpler to define and can be solved directly [62], as all durations are exponentials/PTDs by definition.
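
To illustrate what a PTD buys over a single exponential, the sketch below samples an Erlang distribution, the simplest phase-type: several exponential phases in series. The rates are invented; the point is that stacking phases keeps the exponential building block whilst fitting far less dispersed durations.

```python
# Sampling a simple phase-type distribution (PTD): the total duration is the
# time to absorption in a chain of exponential "phases". A sequence of k
# identical phases is an Erlang distribution; mixing and branching phases
# yields more general PTDs. Rates below are hypothetical.

import random

def sample_erlang(k, rate):
    """Duration of k exponential phases in series (an Erlang-k sample)."""
    return sum(random.expovariate(rate) for _ in range(k))

# An Erlang-3 with rate 2.0 per phase has mean 3/2.0 = 1.5 but much lower
# variance than a single exponential with the same mean, so it can fit
# concentrated action durations (e.g. a door traversal) far better.
samples = [sample_erlang(3, 2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # ~1.5
```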

Model Applications

In this section, we discuss how the multi-robot models in Table 1 have been solved and analysed for multi-robot planning, RL, model checking, and simulation. We summarise this discussion in Table 2. Note that in Table 2, we do not list foundational works which apply to more general models, such as heuristic search approaches for MDPs which can be applied to MMDPs [46] or MA model checking techniques which can be applied to MRMA [62].

Table 2 Applications of the models in Table 1 for multi-robot/multi-agent problems

Planning

Multi-robot planning techniques synthesise robot behaviour given a formal model of the system. Many multi-robot models can be solved with standard techniques. MMDPs can be solved exactly using MDP solvers such as value or policy iteration [100, 101]. However, these methods solve for all states, making them intractable for joint multi-robot models. Heuristic and sampling-based methods such as labelled real-time dynamic programming [102] or Monte Carlo tree search [103] improve upon the limited scalability of exact solvers by restricting search to promising areas of the state space. Despite reducing the explored states, heuristic algorithms are slow to converge on large models, but they often provide anytime behaviour, such that valid solutions are synthesised quickly and improved with time. The poor scalability of MMDP planning motivates planning on simplified models. For TI-MMDPs [26], transition independence allows for compact representations of reward dependencies in conditional return graphs, which admits efficient solutions. For Dec-SIMDPs and IDMGs, the single-robot MDPs and interaction MMDPs can be solved separately using standard solvers such as value iteration [29]. Similarly, the SPATAP models in [31] are single-robot MDPs which capture the effects of the other robots, and can be solved separately. CMMDP approaches typically exploit the fact that only the resource constraint couples the agents to scale to larger problems. Planning for CMMDPs has considered a range of constraints over resource consumption, such as bounding its worst case [67], imposing a chance constraint [68, 71], and bounding its conditional value at risk [72].
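
For concreteness, the sketch below implements textbook value iteration, the exact method referred to above. The two-state model is invented; for an MMDP, every state and action here would be a joint one, and the full sweep over all states is exactly what becomes intractable as the team grows.

```python
# Textbook value iteration over an explicit (M)MDP: sweep all states until
# the Bellman backup converges. The two-state model below is hypothetical;
# for an MMDP, each "state"/"action" would be a joint one.

def value_iteration(states, actions, T, R, gamma=0.95, eps=1e-6):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q = [R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

states, actions = ["s0", "s1"], ["stay", "go"]
T = {"s0": {"stay": {"s0": 1.0}, "go": {"s1": 0.8, "s0": 0.2}},
     "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}}}
R = {"s0": {"stay": 0.0, "go": 0.0}, "s1": {"stay": 1.0, "go": 0.0}}
print(value_iteration(states, actions, T, R))
```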

MMDPs can be solved tractably if they are sufficiently small. Therefore, in [63], robots are grouped into clusters based on robot dependencies, and each cluster is solved as a separate MMDP. Similarly, in [64], robots are incrementally added to an MMDP to control scalability.

Recent work [42••] has begun to address the poor scalability of MMDP planning. There, an anytime planner for MMDPs based on Monte Carlo tree search is presented, where robot dependencies are exploited to decompose the value function into a set of factors from which the optimal joint action can be computed. This approach scales to previously intractable problems.

Solution methods for continuous-time multi-robot models differ depending on the objective. To solve CTMDPs for time-abstract objectives, such as expected untimed reward, MDP solvers are applied to an embedded time-abstract MDP. For timed objectives, MDP solvers are instead applied to a uniformised MDP, where each state has the same expected sojourn time [37, 104, 105]. Similarly, GSPNs can be converted to an MDP [57] or an MA [58] depending on the objective and solved with standard techniques. For MRMA, we can plan using MA solution methods [62].
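
In the standard uniformisation construction, one picks a uniform rate $\Lambda \geq \max_s \lambda(s)$, where $\lambda(s)$ is the exit rate of state $s$, and rescales the dynamics so that every state leaves at rate $\Lambda$:

\[ \tilde{P}(s' \mid s, a) = \frac{\lambda(s)}{\Lambda} P(s' \mid s, a) \;\; (s' \neq s), \qquad \tilde{P}(s \mid s, a) = 1 - \frac{\lambda(s)}{\Lambda}\bigl(1 - P(s \mid s, a)\bigr). \]

The added self-loops stand in for the "extra" exponential delays, every state then has expected sojourn time $1/\Lambda$, and standard MDP solvers apply to the resulting model.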

Dec-POMDPs can be solved centrally to synthesise local policies for decentralised execution, which map from local action-observation histories to actions [50, 77,78,79]. As a result, local Dec-POMDP policies are robust to communication limitations and unreliable sensors. Dec-POMDP solutions can be adapted to MacDec-POMDPs and Dec-POSMDPs to synthesise policies over macro actions. In [89], the space of macro-action policies is searched exhaustively, where efficient simulators improve the scalability of policy evaluation [49]. This approach scales poorly, which is addressed in [90], where a heuristic search method optimises finite-state controllers for each robot. However, MacDec-POMDP and Dec-POSMDP solutions have not been shown to scale beyond teams of around four robots [33••, 34].

Reinforcement Learning (RL)

An alternative approach to policy synthesis is RL [47]. Planners synthesise behaviour using a model of the system, whereas RL approaches learn behaviour using data sampled from the environment [46, 47]. Multi-robot RL problems are formulated assuming an underlying multi-robot model which is unknown prior to training. Fully observable, centralised problems can be formulated as an MMDP [65, 66] and solved using standard RL techniques such as deep Q-learning [106]. However, these techniques do not scale to multi-robot problems due to the exponential increase in the state and action space [66, 80••]. In many settings, decentralised policies are required due to limited communication or partial observability [80••, 81]. Here, multi-robot RL can be formulated as a Dec-POMDP and solved under the paradigm of centralised training with decentralised execution [107], which allows additional state information not available during execution, such as the joint state, to be used during training. One example of this paradigm is QMix [80••], which uses a mixing network to estimate the joint Q-value from single-robot Q-values. However, RL techniques for Dec-POMDPs are still slow to converge, and so MacDec-POMDPs can be used to exploit existing behaviours and improve the efficiency of learning [92,93,94,95].
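
As a minimal illustration of the centralised formulation and its scaling issue, the sketch below runs tabular Q-learning on a one-state, two-robot MMDP with a hypothetical coordination reward. Note that even this toy Q-table has one entry per joint action, i.e. |A|^n entries per state, which is the exponential growth discussed above.

```python
# Tabular Q-learning on a (joint) MMDP: the update needs only sampled
# transitions, not the transition function itself. The one-state,
# two-robot toy problem and its reward are hypothetical.

import random
from itertools import product

local_actions = ["stay", "go"]
joint_actions = list(product(local_actions, repeat=2))  # |A|^n = 4 entries
Q = {("s0", a): 0.0 for a in joint_actions}

def reward(joint_action):
    # Hypothetical: robots are rewarded for choosing different actions.
    return 1.0 if joint_action[0] != joint_action[1] else 0.0

alpha, gamma = 0.1, 0.9
for _ in range(2000):
    # Epsilon-greedy action selection over the joint action space.
    a = (random.choice(joint_actions) if random.random() < 0.1
         else max(joint_actions, key=lambda a: Q[("s0", a)]))
    r = reward(a)
    best_next = max(Q[("s0", a2)] for a2 in joint_actions)
    Q[("s0", a)] += alpha * (r + gamma * best_next - Q[("s0", a)])

print(max(Q, key=Q.get))  # a coordinated joint action, e.g. ('stay', 'go')
```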

Model Checking

Model checking techniques evaluate the behaviour induced by robot policies by systematically checking if a property is satisfied in a formal robot model [10]. Properties are often specified with temporal logics such as linear temporal logic (LTL) or continuous stochastic logic (CSL). Similar to planning, many of the multi-robot models in Table 1 can be verified using techniques for more general models. For example, LTL formulae can be verified on JTSs and MMDPs using techniques for transition systems and MDPs [10]. However, exact LTL model checking approaches compute a product of the model and an automaton that captures the LTL formula, which significantly increases the state space, making them unsuitable for multi-robot problems. MRMA can be model checked against CSL formulae using model checking techniques for MA [62]. This also applies to GSPNs, which can be represented as an MA with identical semantics [61]. Similar CSL model checking techniques are available for CTMDPs [108].

Model checking and planning are often combined to synthesise guaranteed multi-robot behaviour. For LTL specifications, we can plan over a joint product automaton; however, this quickly becomes intractable. To overcome this, [28] concatenate single-robot product automata through switch transitions in a team MMDP to reduce the state space. For MMDPs, in [64], robots are added incrementally to a product automaton until the full problem is solved or a fixed computational budget is exceeded. Alternatively, in [41], the product automaton is explored incrementally through sampling for MRSs modelled as a JTS. Combined planning and model checking techniques have been used for multi-robot data gathering [38, 39], monitoring [40], and mobility-on-demand [64].

Statistical model checking (SMC) techniques evaluate properties by sampling through a model given a set of robot policies, which avoids enumerating the state space [109] and bridges the gap between model checking and simulation techniques, which we discuss later in this section. In [8], SMC is used to evaluate quantitative properties of an MRS. SMC techniques can be applied to many of the models in Table 1. For example, we can use SMC techniques for MA [110] to evaluate bounded or unbounded properties on an MRMA. A drawback of SMC is a possible failure to explore states reached with low probability, which can render it unsuitable for safety-critical systems [110].
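
In its simplest form, SMC is Monte Carlo estimation of the probability that a property holds, together with a statistical guarantee. The sketch below uses a normal-approximation confidence interval and a hypothetical one-line "model" of a two-robot mission deadline; both are illustrative assumptions.

```python
# Statistical model checking in miniature: estimate the probability that a
# property holds by sampling executions, with a simple confidence interval,
# instead of enumerating the state space. The "model" is hypothetical.

import random
from math import sqrt

def smc_estimate(run_once, n=10_000, z=1.96):
    hits = sum(run_once() for _ in range(n))
    p = hits / n
    half = z * sqrt(p * (1 - p) / n)  # normal-approximation 95% CI
    return p, (p - half, p + half)

# Hypothetical property: a two-robot mission finishes within a deadline.
def run_once():
    finish = max(random.expovariate(0.5), random.expovariate(0.5))
    return finish <= 3.0

print(smc_estimate(run_once))
```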

Simulation

Simulators evaluate multi-robot behaviour by executing a set of robot policies in an abstracted environment model. Using formal multi-robot models, we can create a discrete-event simulator (DES) by sampling stochastic outcomes and durations and resolving non-determinism using robot policies. DESs mitigate the complexity of physics-based simulators such as Gazebo [111] by abstracting away low-level robot dynamics [112], allowing simulations to run orders of magnitude faster than real time. GSPNs and variants thereof have been used to simulate teams of football robots [56] and human–robot manufacturing teams [99]. In [14••], a DES called CAMAS (context-aware multi-agent simulator) samples through an MRMA to evaluate task-level metrics of multi-robot performance under the effects of robot interactions, such as the time to complete a set of tasks.
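
The sketch below shows the event-driven idea behind such simulators: the clock jumps from one sampled event to the next rather than ticking at a fixed physics rate. It is a generic illustration, not CAMAS; the task list, the earliest-free-robot dispatch rule, and the duration distributions are all invented.

```python
# Minimal discrete-event simulation of a robot team executing a task set:
# the simulator jumps from event to event, sampling each task's duration,
# instead of stepping low-level dynamics at a fixed rate.

import heapq, random

def simulate(tasks, num_robots, sample_duration):
    free_at = [(0.0, i) for i in range(num_robots)]  # (time robot is free, id)
    heapq.heapify(free_at)
    finish = 0.0
    for task in tasks:
        t, robot = heapq.heappop(free_at)            # earliest-free robot next
        done = t + sample_duration(task)             # sampled task duration
        finish = max(finish, done)
        heapq.heappush(free_at, (done, robot))
    return finish                                    # task-level metric: makespan

mean = {"deliver": 4.0, "inspect": 1.5}
makespan = simulate(["deliver", "inspect", "deliver"], num_robots=2,
                    sample_duration=lambda t: random.expovariate(1.0 / mean[t]))
print(f"time to complete all tasks: {makespan:.2f}")
```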

Conclusions

In this paper, we reviewed modelling approaches for capturing the task-level behaviour of MRSs. We focused on stochastic models of multi-robot execution and introduced the different types of uncertainty encountered by MRSs. Furthermore, we discussed how these models have been used for multi-robot planning, RL, model checking, and simulation. Recent research has focused either on constructing models which accurately capture the effects of uncertainty and robot interactions or on constructing models small enough to be solved efficiently. These two objectives are opposed: accurately capturing multi-robot execution often requires joint models, which are frequently intractable to solve or analyse. Therefore, future research should focus on developing smaller multi-robot models which still accurately capture uncertainty and robot interactions. This may be achieved by identifying realistic assumptions over the sources of uncertainty and robot interactions, such as interactions only occurring in small portions of the state space. Exploiting these assumptions allows for smaller models which can be solved efficiently without sacrificing model accuracy. An alternative avenue for research is to exploit the structure of multi-robot problems, such as factored state spaces and dependencies between robots, to develop scalable solution methods for multi-robot models.